A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
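For reference, the basic usage pattern from the Transformer Engine README looks roughly like this (a minimal sketch; exact recipe arguments can differ between releases, so treat the specifics as illustrative):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Replace a standard PyTorch linear layer with Transformer Engine's FP8-capable one.
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(4096, 768, device="cuda")

# FP8 scaling recipe; E4M3 format shown here, other options exist.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Run the forward pass with FP8 autocasting enabled.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```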
It looks like the performance boost would only apply to RTX 40 series cards and above, since it relies on FP8 support in the tensor cores. Even then, it appears to be locked out in that repo, as 40 series owners are reporting that it isn't currently working for them, and doubly so on Windows.
Still, the boost couldn't hurt, but we would also need a --no-FP8 option to disable it.
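If this were integrated, gating it behind the proposed flag could be as simple as checking the GPU's compute capability (FP8 tensor cores need 8.9 Ada or 9.0 Hopper) and honouring an opt-out. A rough sketch, where the flag name and helper are hypothetical rather than anything that exists in the codebase today:

```python
import argparse
import torch

def fp8_supported() -> bool:
    """FP8 tensor cores require compute capability 8.9 (Ada / RTX 40) or 9.0 (Hopper)."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)

parser = argparse.ArgumentParser()
# Hypothetical opt-out flag, following the suggestion above.
parser.add_argument("--no-fp8", action="store_true",
                    help="disable FP8 even on GPUs that support it")
args = parser.parse_args()

# Fall back silently on unsupported GPUs; respect the explicit opt-out otherwise.
use_fp8 = fp8_supported() and not args.no_fp8
print(f"FP8 enabled: {use_fp8}")
```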
Is there an existing issue for this?
What would your feature do?
Proposed workflow
N/A
Additional information
Crossposted on:
Other issues related to potential performance improvements: