
how to use 200k model to do SFT #37

Closed
guanghuixu opened this issue Nov 7, 2023 · 8 comments
Comments

guanghuixu commented Nov 7, 2023

Is it enough to only change `--max_seq_len 204800` in `finetune/scripts/run_sft_Yi_6b.sh`?

cArlIcon (Contributor) commented Nov 7, 2023

200k = 200,000 in the Yi model.

guanghuixu (Author) commented:

So I just need to modify that one setting?

ZhaoFancy added the question, sft, and doc labels on Nov 7, 2023
jiangchengSilent (Contributor) commented:
Change `max_seq_len` in the script args, and also set `max_position_embeddings` in the model's `config.json` to 200k. This will obviously need more GPU memory, so a single node with 8 cards may not be enough. Consider multi-node SFT training by launching with `deepspeed -H /path-to-hostfile/hostfile main.py ...`
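
As a minimal sketch of the `config.json` change described above (the checkpoint path below is a placeholder, and the `--max_seq_len` script argument still has to be changed separately in `finetune/scripts/run_sft_Yi_6b.sh`):

```python
import json

# Placeholder path to the downloaded Yi checkpoint; adjust to your setup.
model_dir = "/path/to/Yi-34B-200K"
config_path = f"{model_dir}/config.json"

with open(config_path) as f:
    config = json.load(f)

# Per this thread, "200k" means 200,000 tokens for the Yi models.
config["max_position_embeddings"] = 200000

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```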

dachengai commented:
Can a single node with 8 x A100 80G cards do SFT with a 4k context?

jiangchengSilent (Contributor) commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?

In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.
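
The thread does not show the contents of `utils/ds_utils.py`; as a rough sketch, a ZeRO + CPU-offload DeepSpeed config along the lines suggested above might look like the helper below (the function name is just for illustration, field names follow DeepSpeed's standard config schema, and the batch/precision values are illustrative rather than taken from the repo):

```python
# Rough sketch of a DeepSpeed config with ZeRO + CPU offload, in the spirit of
# the advice above. Values are illustrative, not copied from utils/ds_utils.py.
def get_train_ds_config(stage: int = 3, offload: bool = True) -> dict:
    device = "cpu" if offload else "none"
    zero_opt = {
        "stage": stage,
        "offload_optimizer": {"device": device},
        "overlap_comm": True,
        "contiguous_gradients": True,
    }
    if stage == 3:
        # Parameter offload is only meaningful for ZeRO stage 3.
        zero_opt["offload_param"] = {"device": device}
    return {
        "train_micro_batch_size_per_gpu": 1,
        "gradient_accumulation_steps": 8,
        "bf16": {"enabled": True},
        "zero_optimization": zero_opt,
        "gradient_clipping": 1.0,
    }

# e.g. get_train_ds_config(stage=2) for the 4k setup, stage=3 for 16k.
```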

dachengai commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.

Thanks.

dachengai commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.

6B or 34B?

jiangchengSilent (Contributor) commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.
>
> 6B or 34B?

34B.

Yimi81 added the doc-required label on Mar 8, 2024