Out-of-memory issue #277
Comments
If you use Adam Offload, switch to BF16 gradient accumulation.
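The suggestion above (Adam offload plus BF16 gradient accumulation) can be sketched as a DeepSpeed-style config. This is a minimal illustration assuming DeepSpeed's `data_types.grad_accum_dtype` option and ZeRO optimizer offload; the stage and other values are illustrative assumptions, not settings taken from this thread:

```python
# Hedged sketch of a DeepSpeed config combining CPU Adam offload with
# BF16 gradient-accumulation buffers. Key names follow DeepSpeed's JSON
# config schema; values are assumptions for illustration only.
ds_config = {
    "bf16": {"enabled": True},
    # Keep gradient-accumulation buffers in bf16 instead of fp32 to
    # roughly halve accumulation-buffer memory.
    "data_types": {"grad_accum_dtype": "bf16"},
    "zero_optimization": {
        "stage": 3,  # assumed; any ZeRO stage can offload the optimizer
        # Adam states live in (pinned) host RAM rather than on the GPU.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

print(ds_config["data_types"]["grad_accum_dtype"])
```

Note that offloading trades GPU memory for host RAM, which is exactly why the CPU-side usage reported later in this thread grows so large.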
Hi, GPU memory is still insufficient even so. The model is codellama-13b, and the parameters are as follows
If you're running out of GPU memory, you need to use Ray: train_ppo_ray.py
hi @hijkzzz, I observed high RAM usage in the 70B Llama-2 fine-tuning task. I got a CPU RAM OOM (not CUDA OOM) when I tried to run it on a 1 TB RAM machine; each actor uses around 250 GiB. I have already tried bfloat16 as the grad accumulation type, and inference runs fine, but it OOMs in the first training step. If I don't use the bfloat16 grad accumulation, it won't even survive the inference step. I guess this is expected, but I'm curious to hear about your experience. Do you have a rough estimate of how many A100 80 GB GPUs would be needed if we get rid of the Adam CPU offloading and put everything on GPUs? Roughly how much of a speedup would that give? We don't have NVLink enabled on our machine. btw, excellent library, love from the USA
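As a rough lower bound for the GPU-count question above, the standard mixed-precision Adam footprint is about 16 bytes per parameter (2 B bf16 weights + 2 B bf16 grads + 12 B fp32 master weights, momentum, and variance). This back-of-envelope sketch divides that total across 80 GiB A100s, assuming ZeRO-3-style partitioning of all states; it ignores activations, KV cache, and fragmentation, so the real count will be higher:

```python
import math

def min_gpus_for_adam(n_params: float, gpu_mem_gib: float = 80.0,
                      bytes_per_param: int = 16) -> int:
    """Minimum GPU count so the partitioned model + Adam states fit.

    Uses the common ~16 bytes/parameter rule of thumb for mixed-precision
    Adam with no CPU offload. States only: activations are excluded.
    """
    total_gib = n_params * bytes_per_param / 2**30
    return math.ceil(total_gib / gpu_mem_gib)

# 70B parameters: ~1043 GiB of states, i.e. at least 14 x A100 80 GB
# before accounting for activations.
print(min_gpus_for_adam(70e9))  # -> 14
```

This also explains the ~250 GiB-per-actor host RAM figure: with Adam offloaded, the 12 fp32 bytes per parameter move to CPU RAM instead.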
When training a 13B model with PPO, memory usage is extremely high. How should I solve this?