Hyperparameter settings for incremental pre-training #28

Open
jamestch opened this issue Apr 14, 2023 · 2 comments

Comments

@jamestch

jamestch commented Apr 14, 2023

I plan to do incremental (continued) pre-training on roughly 20 GB of domain data (about 9B tokens).
Are there any recommendations for hyperparameter settings such as:
learning_rate
max_seq_length
total_steps
save_checkpoint_steps
……
The command used for training the Chinese LLaMA large language model is as follows:
deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.json \
    --pretrained_model_path models/llama-7b.bin \
    --dataset_path dataset.pt --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
    --config_path models/llama/7b_config.json \
    --output_model_path models/output_model.bin \
    --world_size 8 --learning_rate 1e-4 \
    --data_processor lm --total_steps 10000 --save_checkpoint_steps 2000 --batch_size 24

@PL2584718785

Hello, may I ask how many GPUs you used for this amount of data, and how much total GPU memory plus RAM? Any guidance would be appreciated, thanks!

@AI-Study-Han

With total_steps 10000 and batch_size 24, you won't even get through one pass over your dataset, will you?
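
A rough back-of-the-envelope check supports this, assuming an effective batch of batch_size × world_size sequences per step and a max_seq_length of 1024 (the sequence length is an assumption, not a value stated in the thread):

```python
# Rough token-coverage estimate for the pretraining command above.
# Assumptions (not confirmed in the thread): max_seq_length = 1024 and an
# effective batch of batch_size * world_size sequences per optimizer step.

corpus_tokens = 9e9        # ~9B tokens of domain data
batch_size = 24            # per-GPU batch size from the command
world_size = 8             # number of GPUs from the command
max_seq_length = 1024      # assumed sequence length
total_steps = 10_000       # from the command

tokens_per_step = batch_size * world_size * max_seq_length
tokens_seen = tokens_per_step * total_steps
steps_for_one_epoch = corpus_tokens / tokens_per_step

print(f"tokens per step:     {tokens_per_step:,.0f}")      # ~196,608
print(f"tokens seen in run:  {tokens_seen:,.0f}")          # ~1.97B, well under 9B
print(f"steps for one epoch: {steps_for_one_epoch:,.0f}")  # ~45,776
```

Under those assumptions, a single pass over the ~9B-token corpus would need roughly 45k steps, so total_steps, batch_size, or max_seq_length would have to grow accordingly to see all the data even once.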
