As I worked through this example, I soon encountered the following error message:
This fp16_optimizer is designed to only work with apex.contrib.optimizers.* To update, use updated optimizers with AMP.
I figured out that this makes sense, as I am using AMD hardware with a ROCm build of PyTorch. Still, the training times I get using one node with 8 GPUs are nowhere near the 10 hours reported in the configuration for 50k steps, with the same yaml file (without fp16).
This raises both broad and specific questions. To begin with the latter:
How can I use mixed precision on ROCm? (And, what kind of speedup should I expect?)
Broadly speaking, are there special performance considerations when using ROCm that affect the choice of optimizer, batch size, or parallelization strategy?
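For what it's worth, the error message itself points away from apex's `fp16_optimizer` and toward PyTorch's native AMP, which the ROCm build also exposes (the `torch.cuda.*` namespace maps to HIP on ROCm). A minimal sketch of a native-AMP training step, assuming a placeholder model and optimizer rather than this repo's actual training loop:

```python
import torch

# Placeholder model/optimizer; substitute the ones from your yaml config.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = torch.cuda.is_available()  # AMP autocast needs a GPU (CUDA or ROCm/HIP)

model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# GradScaler scales the loss to avoid fp16 gradient underflow.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(3):
    inputs = torch.randn(8, 16, device=device)
    targets = torch.randn(8, 4, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):  # forward pass in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then optimizer.step()
    scaler.update()                # adjusts the scale factor for the next step
```

On a GPU with fast fp16 throughput this typically gives a meaningful speedup over fp32, but how much you see on a given AMD card depends on the hardware and the ROCm version, so I can't promise a specific number.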
I never had the chance to test with an AMD GPU, so I'm afraid I can't answer those questions.
You may get some answers on AMD/Radeon communities or forums.