how to use 200k model to do SFT #37
Comments
200k = 200000 in the Yi model.
So I just need to modify this one setting?
Change max_seq_len in the script args, and also change max_position_embeddings in the model's config.json to 200k. This will obviously need more GPU memory, though. A single node with 8 cards may not be enough.
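For illustration, the config.json half of that change could be scripted roughly as below. This is only a sketch: the helper name `set_max_position_embeddings` and the model directory path are hypothetical, and it assumes config.json is a plain JSON file that already holds a `max_position_embeddings` key. The `--max_seq_len` script argument still has to be changed separately.

```python
import json
from pathlib import Path


def set_max_position_embeddings(model_dir, max_len):
    """Set max_position_embeddings in <model_dir>/config.json.

    Hypothetical helper, not part of the Yi repo. Returns the previous
    value (or None if the key was absent) so the change can be reverted.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old_value = config.get("max_position_embeddings")
    config["max_position_embeddings"] = max_len
    # Write the file back with indentation so it stays human-readable.
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_value


if __name__ == "__main__":
    # "path/to/model" is a placeholder; point it at the local model folder.
    previous = set_max_position_embeddings("path/to/model", 200000)
    print("previous max_position_embeddings:", previous)
```

Keeping the old value around makes it easy to restore the original config if the longer context does not fit in memory.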
Could a single node with 8 x A100 80G cards run SFT with a 4k context?
In our experiments, SFT at 4k works with ZeRO-2 + offload on a single node, and SFT at 16k works with ZeRO-3 + offload, provided the DeepSpeed configs in utils/ds_utils.py are set properly.
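As a rough picture of what "ZeRO-2 + offload" and "ZeRO-3 + offload" mean at the DeepSpeed config level, the sketch below builds the `zero_optimization` fragment for each case. The function `build_zero_fragment` is hypothetical and is not the code in utils/ds_utils.py; it only uses standard DeepSpeed config keys (`stage`, `offload_optimizer`, `offload_param`), and a real config needs further fields such as batch sizes and precision settings.

```python
def build_zero_fragment(stage, offload=True):
    """Return a minimal DeepSpeed "zero_optimization" config fragment.

    Hypothetical helper for illustration only; the repo's own
    utils/ds_utils.py may structure its config differently.
    """
    if stage not in (2, 3):
        raise ValueError("this sketch only covers ZeRO stage 2 and 3")
    zero = {"stage": stage}
    if offload:
        # Optimizer state can be offloaded to CPU in both stages.
        zero["offload_optimizer"] = {"device": "cpu"}
        if stage == 3:
            # Parameter offload is only available with ZeRO stage 3.
            zero["offload_param"] = {"device": "cpu"}
    return {"zero_optimization": zero}


if __name__ == "__main__":
    print(build_zero_fragment(2))  # the setup reported to work for 4k
    print(build_zero_fragment(3))  # the setup reported to work for 16k
```

Stage 3 additionally partitions the model parameters across GPUs, which is why it is the option mentioned for the longer 16k context.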
Thanks.
6B or 34B?
34B.
Do I only need to change
--max_seq_len 204800
in finetune/scripts/run_sft_Yi_6b.sh?