
how to use 200k model to do SFT #37

Closed
guanghuixu opened this issue Nov 7, 2023 · 8 comments
Comments

guanghuixu commented Nov 7, 2023

Is it enough to only change `--max_seq_len 204800` in `finetune/scripts/run_sft_Yi_6b.sh`?

cArlIcon (Contributor) commented Nov 7, 2023

200k = 200,000 in the Yi model.

guanghuixu (Author) commented:

So I just need to modify that one setting?

ZhaoFancy added the question, sft, and doc labels on Nov 7, 2023
jiangchengSilent (Contributor) commented:
Change `max_seq_len` in the script args, and also set `max_position_embeddings` in the model's `config.json` to 200k. This will obviously need more GPU memory, so a single node with 8 cards may not be enough. Consider multi-node SFT training by launching with `deepspeed -H /path-to-hostfile/hostfile main.py ...`
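
As a minimal sketch of the `config.json` change described above (the checkpoint path below is a placeholder, and the `--max_seq_len` script argument still has to be changed separately in `finetune/scripts/run_sft_Yi_6b.sh`):

```python
import json

# Placeholder path to the downloaded Yi checkpoint; adjust to your setup.
model_dir = "/path/to/Yi-34B-200K"
config_path = f"{model_dir}/config.json"

with open(config_path) as f:
    config = json.load(f)

# Per this thread, "200k" means 200,000 tokens for the Yi models.
config["max_position_embeddings"] = 200000

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```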

dachengai commented:
Can a single node with 8 x A100 80G cards do SFT with a 4k context?

jiangchengSilent (Contributor) commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?

In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.
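
The thread does not show the contents of `utils/ds_utils.py`; as a rough sketch, a ZeRO + CPU-offload DeepSpeed config along the lines suggested above might look like the helper below (the function name is just for illustration, field names follow DeepSpeed's standard config schema, and the batch/precision values are illustrative rather than taken from the repo):

```python
# Rough sketch of a DeepSpeed config with ZeRO + CPU offload, in the spirit of
# the advice above. Values are illustrative, not copied from utils/ds_utils.py.
def get_train_ds_config(stage: int = 3, offload: bool = True) -> dict:
    device = "cpu" if offload else "none"
    zero_opt = {
        "stage": stage,
        "offload_optimizer": {"device": device},
        "overlap_comm": True,
        "contiguous_gradients": True,
    }
    if stage == 3:
        # Parameter offload is only meaningful for ZeRO stage 3.
        zero_opt["offload_param"] = {"device": device}
    return {
        "train_micro_batch_size_per_gpu": 1,
        "gradient_accumulation_steps": 8,
        "bf16": {"enabled": True},
        "zero_optimization": zero_opt,
        "gradient_clipping": 1.0,
    }

# e.g. get_train_ds_config(stage=2) for the 4k setup, stage=3 for 16k.
```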

dachengai commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.

Thanks.

dachengai commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.

6B or 34B?

jiangchengSilent (Contributor) commented:
> Can a single node with 8 x A100 80G cards do SFT with a 4k context?
>
> In our experiments, it is okay to do SFT at 4k context with ZeRO-2 + offload on a single node, and at 16k with ZeRO-3 + offload, by setting the DeepSpeed configs appropriately in `utils/ds_utils.py`.
>
> 6B or 34B?

34B.

Yimi81 added the doc-required label on Mar 8, 2024