Would apex still be useful for non Volta architectures? #14
Comments
Hi Gyuri, thank you for your interest in Apex! Apex is not designed to offer performance gains over "pure FP16" training; it is designed to help with numerical stability. The intent of Apex is to provide numerical stability comparable to "pure FP32" training while maintaining almost all the performance of "pure FP16" training. Apex is intended primarily for Volta. The 1080 Ti is a gaming card, a Pascal part with compute capability 6.1, and has very limited FP16 instruction support. It's still solid for FP32 training, but should not be used for FP16 training. See this post.
Thanks for the helpful information!
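The stability problem Apex targets can be seen in miniature with NumPy's `float16` type (used here purely for illustration; Apex itself operates on CUDA tensors). Small gradient values underflow to zero in fp16, and loss scaling, one of the techniques Apex automates, rescues them:

```python
import numpy as np

# A tiny gradient value that underflows to zero when cast to fp16:
# the smallest fp16 subnormal is 2**-24 (about 6e-8).
grad = 1e-8
assert np.float16(grad) == 0.0  # lost entirely in fp16

# Loss scaling: multiply up before the fp16 cast so the value lands
# in fp16's representable range, then unscale in fp32 afterwards.
scale = 2.0 ** 16                    # a typical static loss scale
scaled = np.float16(grad * scale)    # ~6.55e-4, representable in fp16
recovered = np.float32(scaled) / scale
```

After unscaling, `recovered` is within fp16 rounding error of the original gradient instead of being flushed to zero.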
For completeness, it's also useful on the P100 (which is available from many cloud providers), since that card has full-speed fp16 even though it's Pascal based.
Also for completeness: pure fp16 training on P100 is not recommended and not supported. Pseudo-fp16 (storage in fp16, math in fp32) can be used, but its math throughput is approximately equal to fp32 throughput on P100, so the only benefit comes from the reduced memory footprint.
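The pseudo-fp16 scheme described above (fp16 storage, fp32 arithmetic) can be sketched with NumPy. This is illustrative only; the `pseudo_fp16_matmul` helper is hypothetical and not part of Apex or PyTorch:

```python
import numpy as np

def pseudo_fp16_matmul(a16, b16):
    # Pseudo-fp16: operands are *stored* in fp16 (half the memory),
    # but the multiply-accumulate runs entirely in fp32, then the
    # result is rounded back to fp16 for storage.
    out32 = a16.astype(np.float32) @ b16.astype(np.float32)
    return out32.astype(np.float16)

np.random.seed(0)
a = np.random.randn(64, 64).astype(np.float16)
b = np.random.randn(64, 64).astype(np.float16)
c = pseudo_fp16_matmul(a, b)
```

The result matches a full fp32 matmul up to fp16 output rounding, which is why pseudo-fp16 saves memory without the accuracy loss of fp16 accumulation.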
Why is it not recommended? Is it because the intermediate adds accumulate in fp16 instead of fp32 as on Volta? What happens if you try to use apex on that configuration? Do you just get less accurate convolutions? |
Yes, because of accumulation in fp16. If you use Apex on a P100, it still calls PyTorch, and PyTorch does not support fp16 accumulation there, so you get pseudo-fp16 whether you want it or not.
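The fp16 accumulation problem discussed above is easy to reproduce with NumPy `float16` (illustrative only; real convolutions hit the same effect inside their inner reductions):

```python
import numpy as np

ones = np.ones(4096, dtype=np.float16)

# fp16 accumulation: once the running sum reaches 2048, adding 1.0
# no longer changes it, because the spacing between adjacent fp16
# values at that magnitude is 2 and the sum rounds back down.
acc16 = np.float16(0)
for x in ones:
    acc16 = np.float16(acc16 + x)

# fp32 accumulation of the same fp16 inputs (what Volta tensor cores
# and pseudo-fp16 provide) gives the exact answer.
acc32 = np.float32(0)
for x in ones:
    acc32 += np.float32(x)
```

Here `acc16` stalls at 2048.0 while `acc32` reaches the correct 4096.0, which is why accumulating in fp32 matters for convolutions and matmuls with long reduction dimensions.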
I was looking into the library, and it seems to assume the GPU is a Volta architecture.
This link shows some benchmarks for fp16 training and inference, and the 1080 Ti doesn't gain much performance from fp16.
Would it be useful to apply this library to GPUs besides the Titan V and V100?