Previously, I have trained a pythia-6.9b using the code here: dolly.
I can train with the below setting on 4xA100 (80G) without GPU OOM:
with deepspeed config here.
I can also evaluate the output model with lm-evaluation-harness on a single GPU with a batch size larger than one.
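For reference, the harness invocation looks roughly like this (model path and tasks are placeholders, not my exact command):

```bash
python main.py \
  --model hf-causal \
  --model_args pretrained=/path/to/sft-output \
  --tasks lambada_openai,hellaswag \
  --batch_size 8 \
  --device cuda:0
```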
However, now I am using model_training to train a reward model, and I can only run it with the below setting on 8xA100 (80G):
```yaml
per_device_train_batch_size: 4  # can be bigger using gradient checkpointing
per_device_eval_batch_size: 4
gradient_accumulation_steps: 4
max len: 2048
gradient checkpointing: true  # otherwise got GPU OOM even with per_device_train_batch_size 1
use_cache: false  # has to be turned off since it conflicts with gradient checkpointing
bf16: true
```
with deepspeed config zero3_config_sft.config. (As you can see, it is very similar to the one above.)
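(For orientation, not the repo's actual file: a typical HF-Trainer-style ZeRO-3 config looks like the sketch below, with the `auto` values filled in by the trainer at launch.)

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```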
In addition, I cannot evaluate the output model using eval_rm.py on a single GPU (even with batch size 1) because of GPU OOM.
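For scale: just loading a ~7B reward model in bf16 and scoring one example under `torch.no_grad()` should need far less than 80 GB, which is why the OOM surprises me. A hypothetical sketch (the path is a placeholder, and `AutoModelForSequenceClassification` stands in for `GPTNeoXRewardModel`; eval_rm.py's actual loading code may differ):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

path = "/path/to/rm-output"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
# bf16 weights only: a ~7B model is roughly 14 GB of parameters
model = AutoModelForSequenceClassification.from_pretrained(
    path, num_labels=1, torch_dtype=torch.bfloat16
).to("cuda").eval()

with torch.no_grad():  # no autograd graph during eval
    batch = tokenizer("example prompt and reply", return_tensors="pt").to("cuda")
    score = model(**batch).logits  # scalar reward per sequence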
I didn't find any code that reduces GPU memory in dolly or lm-evaluation-harness. And GPTNeoXForCausalLM should consume more memory than GPTNeoXRewardModel, judging from the code of the output layers.
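Rough numbers behind that claim (hidden size is the published pythia-6.9b value; the vocab size is approximate, used only for illustration):

```python
hidden_size = 4096   # pythia-6.9b hidden size
vocab_size = 50_304  # ~50k GPT-NeoX vocab (exact padded size varies)

lm_head = hidden_size * vocab_size  # GPTNeoXForCausalLM: Linear(hidden, vocab)
rm_head = hidden_size * 1           # reward model head: Linear(hidden, 1)

print(f"LM head params: {lm_head / 1e6:.1f}M")  # ~206M extra params, plus
                                                # [batch, seq, vocab] logits at runtime
print(f"RM head params: {rm_head}")             # 4096
```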
Yes, I also noticed that our current trainer code / configurations don't work even for smaller models on single 80 GB GPUs. It would be great to get this analyzed/fixed.
@andreaskoepf
I will take a look into this issue and try to fix some of the causes (I think there may be multiple reasons for this). If you have any clue or suggestion, please let me know; I would appreciate it.