Would apex still be useful for non Volta architectures? #14

Closed
imgyuri opened this issue Jun 21, 2018 · 6 comments

Comments

@imgyuri

imgyuri commented Jun 21, 2018

I was looking into the library, and it seems to assume that the GPU has a Volta architecture.

This link shows some benchmarks for fp16 training and inference, and the 1080 Ti doesn't gain that much performance from fp16.

Would it be useful to apply this library to GPUs besides the Titan V and V100?

@mcarilli
Contributor

mcarilli commented Jun 21, 2018

Hi Gyuri,

Thank you for your interest in Apex! Apex is not designed to offer performance gains over "pure FP16" training; it is designed to help with numerical stability. The intent is to provide numerical stability comparable to "pure FP32" while maintaining almost all of the performance of "pure FP16" training.

Apex is intended primarily for Volta. The 1080 Ti is a gaming card, a Pascal with compute capability 6.1, and has very limited FP16 instruction support. It's still solid for FP32 training, but should not be used for FP16 training. See this post.
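
To make the stability-with-FP16-performance idea concrete, here is a minimal sketch of the common pattern behind it (FP32 "master" weights plus static loss scaling), written in plain PyTorch rather than with Apex's own API. The model, sizes, and loss_scale value below are placeholders, not Apex defaults.

```python
import torch

# Sketch of FP32 master weights + static loss scaling (illustration only).
model = torch.nn.Linear(1024, 1024).cuda().half()             # fp16 copy used for forward/backward
master_params = [p.detach().clone().float() for p in model.parameters()]
for mp in master_params:
    mp.requires_grad_(True)
optimizer = torch.optim.SGD(master_params, lr=1e-3)           # optimizer updates the fp32 copies

loss_scale = 128.0                                            # static scale keeps small grads representable in fp16

x = torch.randn(64, 1024, device="cuda").half()
loss = model(x).float().pow(2).mean()                         # reduction done in fp32
(loss * loss_scale).backward()

for p, mp in zip(model.parameters(), master_params):
    mp.grad = p.grad.detach().float() / loss_scale            # unscale into fp32 grads
optimizer.step()
with torch.no_grad():
    for p, mp in zip(model.parameters(), master_params):
        p.copy_(mp.half())                                    # copy updated weights back to the fp16 model
model.zero_grad()
```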

@imgyuri
Author

imgyuri commented Jun 22, 2018

Thanks for the helpful information!

@gcp

gcp commented Dec 20, 2018

> Would it be useful to apply this library to GPUs besides the Titan V and V100?

For completeness, it's also useful on the P100 (which is available from many cloud providers), as that card also has full-speed fp16, even though it's Pascal-based.
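
For reference, one quick way to check which case a given machine falls into (a small sketch using PyTorch's standard device queries; the capability-to-card mapping in the comment covers only the GPUs mentioned in this thread):

```python
import torch

# (7, 0) -> Volta (Titan V / V100), (6, 0) -> P100, (6, 1) -> 1080 Ti
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
```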

@ngimel
Contributor

ngimel commented Dec 20, 2018

Also for completeness: pure fp16 training on P100 is not recommended and not supported. Pseudo-fp16 (storage in fp16, math in fp32) can be used, but its math throughput on P100 is approximately equal to fp32 throughput, so the only benefit comes from the reduced memory footprint.
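
To make "storage in fp16, math in fp32" concrete, here is a rough tensor-level sketch (an illustration only, not what cuDNN or Apex does internally); the sizes are arbitrary:

```python
import torch

# Tensors are *stored* in fp16 (half the memory of fp32), but the arithmetic
# is carried out in fp32 by upcasting before the matmul.
x = torch.randn(4096, 4096, device="cuda").half()    # 32 MB instead of 64 MB
w = torch.randn(4096, 4096, device="cuda").half()

y = (x.float() @ w.float()).half()                   # math in fp32, result stored back as fp16
```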

@gcp

gcp commented Dec 20, 2018

Why is it not recommended? Is it because the intermediate adds accumulate in fp16 instead of fp32 as on Volta?

What happens if you try to use apex on that configuration? Do you just get less accurate convolutions?

@ngimel
Contributor

ngimel commented Dec 20, 2018

Yes, because of accumulation in fp16. If you use Apex on P100, it still calls PyTorch, and PyTorch does not support fp16 accumulation, so you get pseudo-fp16 whether you want it or not.
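
A small, Apex-independent demonstration of why fp16 accumulation hurts: a running fp16 sum of 10,000 × 0.1 stalls once the accumulator grows large enough that 0.1 falls below half the spacing between representable fp16 values.

```python
import torch

acc16 = torch.tensor(0.0, dtype=torch.float16)
acc32 = torch.tensor(0.0, dtype=torch.float32)
for _ in range(10000):
    acc16 = acc16 + torch.tensor(0.1, dtype=torch.float16)
    acc32 = acc32 + torch.tensor(0.1, dtype=torch.float32)

print(acc16.item())  # stalls around 256 -- far from the true sum of 1000
print(acc32.item())  # ~1000
```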
