Would apex still be useful for non Volta architectures? #14

Closed
imgyuri opened this issue Jun 21, 2018 · 6 comments

Comments

@imgyuri

imgyuri commented Jun 21, 2018

I was looking into the library, and it seems to assume that the GPU has a Volta architecture.

This link shows some benchmarks for fp16 training and inference, and the 1080 Ti doesn't gain that much performance from fp16.

Would it be useful to apply this library to GPUs besides the Titan V and V100?

@mcarilli
Contributor

mcarilli commented Jun 21, 2018

Hi Gyuri,

Thank you for your interest in Apex! Apex is not designed to offer performance gains over "pure FP16" training; it is designed to help with numerical stability. The intent is to provide numerical stability comparable to "pure FP32" while maintaining almost all of the performance of "pure FP16" training.

Apex is intended primarily for Volta. The 1080 Ti is a gaming card, a Pascal with compute capability 6.1, and has very limited FP16 instruction support. It's still solid for FP32 training, but should not be used for FP16 training. See this post.
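
To make the stability-with-FP16-performance idea concrete, here is a minimal sketch of the common pattern behind it (FP32 "master" weights plus static loss scaling), written in plain PyTorch rather than with Apex's own API. The model, sizes, and loss_scale value below are placeholders, not Apex defaults.

```python
import torch

# Sketch of FP32 master weights + static loss scaling (illustration only).
model = torch.nn.Linear(1024, 1024).cuda().half()             # fp16 copy used for forward/backward
master_params = [p.detach().clone().float() for p in model.parameters()]
for mp in master_params:
    mp.requires_grad_(True)
optimizer = torch.optim.SGD(master_params, lr=1e-3)           # optimizer updates the fp32 copies

loss_scale = 128.0                                            # static scale keeps small grads representable in fp16

x = torch.randn(64, 1024, device="cuda").half()
loss = model(x).float().pow(2).mean()                         # reduction done in fp32
(loss * loss_scale).backward()

for p, mp in zip(model.parameters(), master_params):
    mp.grad = p.grad.detach().float() / loss_scale            # unscale into fp32 grads
optimizer.step()
with torch.no_grad():
    for p, mp in zip(model.parameters(), master_params):
        p.copy_(mp.half())                                    # copy updated weights back to the fp16 model
model.zero_grad()
```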

@imgyuri
Author

imgyuri commented Jun 22, 2018

Thanks for the helpful information!

@gcp

gcp commented Dec 20, 2018

> Would it be useful to apply this library to GPUs besides the Titan V and V100?

For completeness, it's also useful on the P100 (which is available from many cloud providers), as that card also has full-speed fp16, even though it's Pascal-based.
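
For reference, one quick way to check which case a given machine falls into (a small sketch using PyTorch's standard device queries; the capability-to-card mapping in the comment covers only the GPUs mentioned in this thread):

```python
import torch

# (7, 0) -> Volta (Titan V / V100), (6, 0) -> P100, (6, 1) -> 1080 Ti
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
```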

@ngimel
Contributor

ngimel commented Dec 20, 2018

Also for completeness: pure fp16 training on P100 is not recommended and not supported. Pseudo-fp16 (storage in fp16, math in fp32) can be used, but its math throughput on P100 is approximately equal to fp32 throughput, so the only benefit comes from the reduced memory footprint.
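
To make "storage in fp16, math in fp32" concrete, here is a rough tensor-level sketch (an illustration only, not what cuDNN or Apex does internally); the sizes are arbitrary:

```python
import torch

# Tensors are *stored* in fp16 (half the memory of fp32), but the arithmetic
# is carried out in fp32 by upcasting before the matmul.
x = torch.randn(4096, 4096, device="cuda").half()    # 32 MB instead of 64 MB
w = torch.randn(4096, 4096, device="cuda").half()

y = (x.float() @ w.float()).half()                   # math in fp32, result stored back as fp16
```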

@gcp

gcp commented Dec 20, 2018

Why is it not recommended? Is it because the intermediate adds accumulate in fp16 instead of fp32 as on Volta?

What happens if you try to use apex on that configuration? Do you just get less accurate convolutions?

@ngimel
Contributor

ngimel commented Dec 20, 2018

Yes, because of accumulation in fp16. If you use Apex on P100, it still calls PyTorch, and PyTorch does not support fp16 accumulation, so you get pseudo-fp16 whether you want it or not.
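
A small, Apex-independent demonstration of why fp16 accumulation hurts: a running fp16 sum of 10,000 × 0.1 stalls once the accumulator grows large enough that 0.1 falls below half the spacing between representable fp16 values.

```python
import torch

acc16 = torch.tensor(0.0, dtype=torch.float16)
acc32 = torch.tensor(0.0, dtype=torch.float32)
for _ in range(10000):
    acc16 = acc16 + torch.tensor(0.1, dtype=torch.float16)
    acc32 = acc32 + torch.tensor(0.1, dtype=torch.float32)

print(acc16.item())  # stalls around 256 -- far from the true sum of 1000
print(acc32.item())  # ~1000
```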
