New CLI #1437
Conversation
Sorry to bother you with this @awaelchli, but I've been banging my head against this for 2 hours and can't figure it out. I just want to get litgpt finetune_lora --checkpoint_dir checkpoints/EleutherAI/pythia-14m --train.max_steps 1 --train.max_seq_len 512 --optimizer SGD to run, but this is what I get:
{'checkpoint_dir': PosixPath('checkpoints/EleutherAI/pythia-14m'),
'data': None,
'devices': 1,
'eval': EvalArgs(interval=100,
max_new_tokens=100,
max_iters=100,
initial_validation=False),
'logger_name': 'csv',
'lora_alpha': 16,
'lora_dropout': 0.05,
'lora_head': False,
'lora_key': False,
'lora_mlp': False,
'lora_projection': False,
'lora_query': True,
'lora_r': 8,
'lora_value': True,
'optimizer': {'class_path': 'torch.optim.SGD',
'init_args': {'dampening': 0.0,
'differentiable': False,
'foreach': None,
'lr': 0.001,
'maximize': False,
'momentum': 0.0,
'nesterov': False,
'weight_decay': 0.01}},
'out_dir': PosixPath('out/finetune/lora'),
'precision': None,
'quantize': None,
'seed': 1337,
'train': TrainArgs(save_interval=1000,
log_interval=1,
global_batch_size=16,
micro_batch_size=1,
lr_warmup_steps=100,
lr_warmup_fraction=None,
epochs=5,
max_tokens=None,
max_steps=1,
max_seq_length=512,
tie_embeddings=None,
max_norm=None,
min_lr=6e-05)}
Using bfloat16 Automatic Mixed Precision (AMP)
Seed set to 1337
Number of trainable parameters: 24,576
Number of non-trainable parameters: 14,067,712
The longest sequence length in the train data is 512, the model's maximum sequence length is 512 and context length is 512
Validating ...
Epoch 1 | iter 1 step 0 | loss train: 6.456, val: n/a | iter time: 218.23 ms
Epoch 1 | iter 2 step 0 | loss train: 6.077, val: n/a | iter time: 42.61 ms
Epoch 1 | iter 3 step 0 | loss train: 6.065, val: n/a | iter time: 23.38 ms
Epoch 1 | iter 4 step 0 | loss train: 5.940, val: n/a | iter time: 52.27 ms
Epoch 1 | iter 5 step 0 | loss train: 5.975, val: n/a | iter time: 21.69 ms
Epoch 1 | iter 6 step 0 | loss train: 6.002, val: n/a | iter time: 21.96 ms
Epoch 1 | iter 7 step 0 | loss train: 5.824, val: n/a | iter time: 16.52 ms
Epoch 1 | iter 8 step 0 | loss train: 5.840, val: n/a | iter time: 22.35 ms
Epoch 1 | iter 9 step 0 | loss train: 5.814, val: n/a | iter time: 22.57 ms
Epoch 1 | iter 10 step 0 | loss train: 5.789, val: n/a | iter time: 22.63 ms
Epoch 1 | iter 11 step 0 | loss train: 5.842, val: n/a | iter time: 22.41 ms
Epoch 1 | iter 12 step 0 | loss train: 5.873, val: n/a | iter time: 22.48 ms
Epoch 1 | iter 13 step 0 | loss train: 5.893, val: n/a | iter time: 22.48 ms
Epoch 1 | iter 14 step 0 | loss train: 5.868, val: n/a | iter time: 23.18 ms
Epoch 1 | iter 15 step 0 | loss train: 5.879, val: n/a | iter time: 22.29 ms
Epoch 1 | iter 16 step 1 | loss train: 5.929, val: n/a | iter time: 31.64 ms (step)
Training time: 20.27s
Memory used: 0.25 GB
Validating ...
Final evaluation | val loss: 5.967 | val ppl: 390.511
Saving LoRA weights to '/teamspace/studios/this_studio/out/finetune/lora/final/lit_model.pth.lora'
usage: litgpt [-h] [--config CONFIG] [--print_config[=flags]] [--checkpoint_dir CHECKPOINT_DIR] [--out_dir OUT_DIR] [--precision PRECISION] [--quantize QUANTIZE] [--devices DEVICES]
[--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT] [--lora_query {true,false}] [--lora_key {true,false}] [--lora_value {true,false}]
[--lora_projection {true,false}] [--lora_mlp {true,false}] [--lora_head {true,false}] [--data.help CLASS_PATH_OR_NAME] [--data DATA] [--train CONFIG]
[--train.save_interval SAVE_INTERVAL] [--train.log_interval LOG_INTERVAL] [--train.global_batch_size GLOBAL_BATCH_SIZE] [--train.micro_batch_size MICRO_BATCH_SIZE]
[--train.lr_warmup_steps LR_WARMUP_STEPS] [--train.lr_warmup_fraction LR_WARMUP_FRACTION] [--train.epochs EPOCHS] [--train.max_tokens MAX_TOKENS]
[--train.max_steps MAX_STEPS] [--train.max_seq_length MAX_SEQ_LENGTH] [--train.tie_embeddings {true,false,null}] [--train.max_norm MAX_NORM] [--train.min_lr MIN_LR]
[--eval CONFIG] [--eval.interval INTERVAL] [--eval.max_new_tokens MAX_NEW_TOKENS] [--eval.max_iters MAX_ITERS] [--eval.initial_validation {true,false}]
[--optimizer OPTIMIZER] [--logger_name {wandb,tensorboard,csv}] [--seed SEED]
error: Unrecognized arguments: finetune_lora

I am not sure where this comes from. It's bizarre to me. I checked the ...
At the end of training, we save the hyperparameters to the checkpoint dir for reproducibility. This is done in this function: lines 436 to 458 in 221b7ef.

If the names of the commands change, the code there needs to be adapted a bit. I can help with this.
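For context, here is a minimal sketch of that saving pattern. It is an illustration only, not the LitGPT code referenced above: the function name, the command set, and the output file name are assumptions.

# Illustrative sketch -- not the actual LitGPT implementation.
# Idea: the hyperparameters are re-captured from the command line at save time,
# so the saver must recognize the command name that was used to launch training.
import sys
from pathlib import Path

import yaml  # PyYAML, assumed available

# Hypothetical set of CLI command names the saver knows about.
KNOWN_COMMANDS = {"finetune_lora", "finetune_full", "finetune_adapter", "pretrain"}

def save_hyperparameters_sketch(config: dict, out_dir: Path) -> None:
    """Dump the parsed config to the output dir for reproducibility."""
    command = sys.argv[1] if len(sys.argv) > 1 else None
    if command not in KNOWN_COMMANDS:
        # If a command is renamed but this set is not updated, re-parsing the
        # launch arguments fails, surfacing as
        # "error: Unrecognized arguments: finetune_lora".
        raise SystemExit(f"error: Unrecognized arguments: {command}")
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "hyperparameters.yaml", "w") as f:
        yaml.safe_dump({"command": command, **config}, f)

This is why renaming a command without updating that code only shows up at save time, after training has already finished.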
Oh thanks, this might be exactly related to the issue I was encountering! Thanks a lot, I think I should be able to fix it!
Thanks, @awaelchli! It actually didn't occur to me to check this file! I think I got it now, thanks!
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
To solve some of the most common user pain points, we want to rethink the LitGPT CLI to make it more intuitive. After the change, the CLI would work as follows:
# litgpt [action] [model]
litgpt download meta-llama/Meta-Llama-3-8B-Instruct
litgpt chat meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct
litgpt pretrain meta-llama/Meta-Llama-3-8B-Instruct
litgpt serve meta-llama/Meta-Llama-3-8B-Instruct
and for advanced users, there will be additional finetuning options:
# litgpt [action] [model]
litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune_full meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune_lora meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune_adapter meta-llama/Meta-Llama-3-8B-Instruct
litgpt finetune_adapter_v2 meta-llama/Meta-Llama-3-8B-Instruct
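A rough sketch of how such an action/model dispatch could be wired up is below. This is a hypothetical illustration using argparse subcommands; the real CLI may use a different parsing library and expose many more options.

# Hypothetical sketch of the proposed `litgpt [action] [model]` dispatch.
# Action names mirror the list above; everything else is illustrative.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(prog="litgpt")
    subparsers = parser.add_subparsers(dest="action", required=True)

    # Every action takes the model name/path as a positional argument.
    for action in ("download", "chat", "finetune", "finetune_full",
                   "finetune_lora", "finetune_adapter", "finetune_adapter_v2",
                   "pretrain", "serve"):
        sub = subparsers.add_parser(action)
        sub.add_argument("model", help="e.g. meta-llama/Meta-Llama-3-8B-Instruct")

    args = parser.parse_args()
    print(f"Would run '{args.action}' for model '{args.model}'")

if __name__ == "__main__":
    main()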
Todos
Approach
This will be tackled in 2 steps. One step is to replace the checkpoint_dir argument with a root_dir argument, using root_dir="checkpoints" as the default value.

E.g., if someone uses

litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct --root_dir="checkpoints"

it will load the checkpoints from checkpoints/meta-llama/Meta-Llama-3-8B-Instruct.

To use a custom path from a totally different location, e.g., tmp/my_model, one can do

litgpt finetune tmp/my_model --root_dir="."
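A minimal sketch of the path resolution this implies follows; the semantics are inferred from the two examples above and the function name is made up for illustration, not the final implementation.

# Sketch of the root_dir resolution described above (assumed semantics).
from pathlib import Path

def resolve_checkpoint_dir(model: str, root_dir: str = "checkpoints") -> Path:
    """Return root_dir/model, e.g. checkpoints/meta-llama/Meta-Llama-3-8B-Instruct."""
    return Path(root_dir) / model

# litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct --root_dir="checkpoints"
assert resolve_checkpoint_dir("meta-llama/Meta-Llama-3-8B-Instruct") == Path(
    "checkpoints/meta-llama/Meta-Llama-3-8B-Instruct"
)

# litgpt finetune tmp/my_model --root_dir="."
assert resolve_checkpoint_dir("tmp/my_model", root_dir=".") == Path("tmp/my_model")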