
Mixed precision for rocm #2402

Closed
cspink opened this issue Jun 12, 2023 · 1 comment

cspink commented Jun 12, 2023

As I worked through this example, I soon ran into the following error message:

This fp16_optimizer is designed to only work with apex.contrib.optimizers.* To update, use updated optimizers with AMP.
I figured out that this makes sense, since I am using AMD hardware with a ROCm build of PyTorch. Still, the training times I get on one node with 8 GPUs are nowhere near the 10 hours reported in the configuration for 50k steps, using the same YAML file (without fp16).

This raises both broad and specific questions. To begin with the latter:

  1. How can I use mixed precision on ROCm? (And what kind of speedup should I expect?) One possible approach is sketched after this list.
  2. Broadly speaking, are there special performance considerations with ROCm that affect the choice of optimizer, batch size, or parallelization type?
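For context, the error message points away from apex's fp16_optimizer and toward PyTorch-native AMP. Below is a minimal sketch of that path, assuming a recent ROCm build of PyTorch, which reuses the torch.cuda.amp API and the "cuda" device name for AMD GPUs; the model, data, and training loop here are placeholders, not OpenNMT-py's actual training code.

```python
# Minimal sketch of PyTorch-native automatic mixed precision (AMP).
# On ROCm builds of PyTorch, the "cuda" device name and torch.cuda.amp
# are reused for AMD GPUs. Model, optimizer, and data are placeholders.
import torch

model = torch.nn.Linear(1024, 1024).to("cuda")   # "cuda" also targets ROCm devices
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()             # scales losses to avoid fp16 underflow

for step in range(50):                           # stand-in for the real training loop
    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():              # forward pass runs in mixed precision
        outputs = model(inputs)
        loss = torch.nn.functional.mse_loss(outputs, targets)

    scaler.scale(loss).backward()                # backward on the scaled loss
    scaler.step(optimizer)                       # unscales gradients, then steps the optimizer
    scaler.update()                              # adjust the loss scale for the next step
```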
@vince62s (Member)

I never had the chance to test with an AMD GPU, so I'm afraid I can't answer those questions.
You may get some answers on AMD Radeon communities / forums.

vince62s closed this as completed Feb 1, 2024