How can SFT support a larger context length? #48

Closed
lucasjinreal opened this issue Nov 7, 2023 · 5 comments

Comments

@lucasjinreal

If max_sequence_len is set to 4k, is the model able to extrapolate to longer contexts automatically?

@ZhaoFancy added the question and sft labels on Nov 7, 2023
@jiangchengSilent
Contributor

Change max_seq_len in the script args, and set max_position_embeddings in the model's config.json to a larger value. This needs more GPU memory, so consider adjusting related hyperparameters such as batch_size.
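
A minimal sketch of the config.json edit, assuming the model directory holds a standard Hugging Face style config.json. The helper name `set_max_position_embeddings` and the example path are hypothetical, not part of the repo's scripts. max_seq_len still has to be changed separately in the script args.

```python
import json
from pathlib import Path


def set_max_position_embeddings(model_dir: str, new_length: int):
    """Rewrite max_position_embeddings in <model_dir>/config.json.

    Hypothetical helper for illustration; returns the previous value
    (or None if the key was absent).
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old_length = config.get("max_position_embeddings")
    # Only this one key is changed; every other setting is written back as-is.
    config["max_position_embeddings"] = new_length
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_length


# Example usage (the path is an assumption; point it at your local model files):
# set_max_position_embeddings("./Yi-6B", 32768)
```

Back up the original config.json first if you want to be able to revert.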

@lucasjinreal
Author

I mean, does it have dynamic NTK or RoPE scaling in the code?

@cArlIcon
Contributor

cArlIcon commented Nov 8, 2023

> I mean, does it have dynamic NTK or RoPE scaling in the code?

No. Just modify max_position_embeddings in config.json. That can extend the context length to 32K for Yi-34B and Yi-6B.

@lucasjinreal
Author

Won't it consume too much memory?

@jiangchengSilent
Contributor

> Won't it consume too much memory?

Yes, it will. It takes a lot of GPU memory, or CPU memory if ZeRO-Offload is enabled. A single node with 8 cards and 900GB of CPU memory is not enough for such a setting. Please try multi-node training.
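
To give a rough feel for why memory grows so fast with sequence length, here is a back-of-the-envelope sketch. It only counts the attention score matrix for one layer when it is fully materialized (batch x heads x seq_len x seq_len). The function name and the head count of 32 are illustrative assumptions, not Yi's actual configuration, and fused kernels such as FlashAttention avoid storing this matrix, so treat it as an upper-bound illustration rather than a measurement.

```python
def naive_attention_scores_bytes(seq_len, num_heads, batch_size=1, bytes_per_elem=2):
    """Bytes needed to hold one layer's full attention score matrix.

    Assumes the scores are materialized in full and stored in a
    2-byte dtype (fp16/bf16) by default. Hypothetical helper.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


# num_heads=32 is an illustrative value, not taken from the Yi configs.
for seq_len in (4096, 32768):
    gib = naive_attention_scores_bytes(seq_len, num_heads=32) / 2**30
    print(f"seq_len={seq_len}: {gib:.0f} GiB per layer")
```

Under these assumptions, going from 4K to 32K multiplies this term by 64 (8x the length, squared), which is one reason reducing batch_size alone may not be enough.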

@Yimi81 added the doc-required label on Mar 8, 2024