Hyperparameter settings for incremental pre-training #28

Open
jamestch opened this issue Apr 14, 2023 · 2 comments

Comments

@jamestch

jamestch commented Apr 14, 2023

I plan to do incremental (continued) pre-training on roughly 20 GB of domain data (about 9B tokens).
Are there any recommendations for hyperparameter settings such as:
learning_rate
max_seq_length
total_steps
save_checkpoint_steps
……
The command used for training the Chinese LLaMA large language model is as follows:
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-7b.bin \
    --dataset_path dataset.pt --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
    --config_path models/llama/7b_config.json \
    --output_model_path models/output_model.bin \
    --world_size 8 --learning_rate 1e-4 \
    --data_processor lm --total_steps 10000 --save_checkpoint_steps 2000 --batch_size 24

@PL2584718785

Hello, may I ask how many GPUs you used for this amount of data, and how much total GPU memory plus RAM? Any guidance would be appreciated, thanks!

@AI-Study-Han

With total_steps 10000 and batch_size 24, you won't even get through one pass over your dataset, will you?
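
A rough back-of-the-envelope check supports this, assuming an effective batch of batch_size × world_size sequences per step and a max_seq_length of 1024 (the sequence length is an assumption, not a value stated in the thread):

```python
# Rough token-coverage estimate for the pretraining command above.
# Assumptions (not confirmed in the thread): max_seq_length = 1024 and an
# effective batch of batch_size * world_size sequences per optimizer step.

corpus_tokens = 9e9        # ~9B tokens of domain data
batch_size = 24            # per-GPU batch size from the command
world_size = 8             # number of GPUs from the command
max_seq_length = 1024      # assumed sequence length
total_steps = 10_000       # from the command

tokens_per_step = batch_size * world_size * max_seq_length
tokens_seen = tokens_per_step * total_steps
steps_for_one_epoch = corpus_tokens / tokens_per_step

print(f"tokens per step:     {tokens_per_step:,.0f}")      # ~196,608
print(f"tokens seen in run:  {tokens_seen:,.0f}")          # ~1.97B, well under 9B
print(f"steps for one epoch: {steps_for_one_epoch:,.0f}")  # ~45,776
```

Under those assumptions, a single pass over the ~9B-token corpus would need roughly 45k steps, so total_steps, batch_size, or max_seq_length would have to grow accordingly to see all the data even once.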
