New CLI #1437

Merged 37 commits from new-cli into main on May 31, 2024
Conversation

@rasbt (Collaborator) commented May 24, 2024

To solve some of the most common user pain points, we want to rethink the LitGPT CLI to make it more intuitive. After the change, the CLI would work as follows:

# litgpt [action] [model]
litgpt  download  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  chat      meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  pretrain  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  serve     meta-llama/Meta-Llama-3-8B-Instruct

and for advanced users, there will be additional finetuning options:

# litgpt [action]            [model]

litgpt  finetune             meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune_full        meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune_lora        meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune_adapter     meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune_adapter_v2  meta-llama/Meta-Llama-3-8B-Instruct

Todos

  • Implement changes
  • Update all docs
  • Deprecate "old way"

Approach

This will be tackled in two steps:

  • introduce positional args but keep the existing usage working
  • remove the checkpoint_dir argument and introduce a root_dir argument with root_dir="checkpoints" as the default value

E.g., if someone uses

litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct --root_dir="checkpoints"

It will load the checkpoint from checkpoints/meta-llama/Meta-Llama-3-8B-Instruct.

To use a model from a custom path in a completely different location, e.g., tmp/my_model, one can run:

litgpt  finetune  tmp/my_model --root_dir="."
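
For illustration, the resolution described above boils down to joining the positional model argument onto root_dir. The following is a minimal sketch under that assumption; the helper name resolve_checkpoint_dir is hypothetical and not an actual LitGPT function:

from pathlib import Path

def resolve_checkpoint_dir(model: str, root_dir: str = "checkpoints") -> Path:
    # Hypothetical helper: the checkpoint directory is simply root_dir joined
    # with the positional model argument.
    return Path(root_dir) / model

print(resolve_checkpoint_dir("meta-llama/Meta-Llama-3-8B-Instruct"))  # checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
print(resolve_checkpoint_dir("tmp/my_model", root_dir="."))           # tmp/my_model (relative to the current directory)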

@rasbt rasbt added enhancement New feature or request breaking change labels May 24, 2024
@rasbt rasbt marked this pull request as draft May 24, 2024 16:01
@rasbt (Collaborator, Author) commented May 24, 2024

Sorry to bother you with this @awaelchli, but I've been banging my head against this for 2 hours and can't figure it out. I just want to get litgpt finetune_lora to work like litgpt finetune (and litgpt finetune lora) did before. It does work fine, but at the end, it appends the Unrecognized arguments error:

litgpt finetune_lora --checkpoint_dir checkpoints/EleutherAI/pythia-14m --train.max_steps 1 --train.max_seq_len 512  --optimizer SGD
{'checkpoint_dir': PosixPath('checkpoints/EleutherAI/pythia-14m'),
 'data': None,
 'devices': 1,
 'eval': EvalArgs(interval=100,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': False,
 'lora_key': False,
 'lora_mlp': False,
 'lora_projection': False,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'optimizer': {'class_path': 'torch.optim.SGD',
               'init_args': {'dampening': 0.0,
                             'differentiable': False,
                             'foreach': None,
                             'lr': 0.001,
                             'maximize': False,
                             'momentum': 0.0,
                             'nesterov': False,
                             'weight_decay': 0.01}},
 'out_dir': PosixPath('out/finetune/lora'),
 'precision': None,
 'quantize': None,
 'seed': 1337,
 'train': TrainArgs(save_interval=1000,
                    log_interval=1,
                    global_batch_size=16,
                    micro_batch_size=1,
                    lr_warmup_steps=100,
                    lr_warmup_fraction=None,
                    epochs=5,
                    max_tokens=None,
                    max_steps=1,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
Using bfloat16 Automatic Mixed Precision (AMP)
Seed set to 1337
Number of trainable parameters: 24,576
Number of non-trainable parameters: 14,067,712
The longest sequence length in the train data is 512, the model's maximum sequence length is 512 and context length is 512
Validating ...
Epoch 1 | iter 1 step 0 | loss train: 6.456, val: n/a | iter time: 218.23 ms
Epoch 1 | iter 2 step 0 | loss train: 6.077, val: n/a | iter time: 42.61 ms
Epoch 1 | iter 3 step 0 | loss train: 6.065, val: n/a | iter time: 23.38 ms
Epoch 1 | iter 4 step 0 | loss train: 5.940, val: n/a | iter time: 52.27 ms
Epoch 1 | iter 5 step 0 | loss train: 5.975, val: n/a | iter time: 21.69 ms
Epoch 1 | iter 6 step 0 | loss train: 6.002, val: n/a | iter time: 21.96 ms
Epoch 1 | iter 7 step 0 | loss train: 5.824, val: n/a | iter time: 16.52 ms
Epoch 1 | iter 8 step 0 | loss train: 5.840, val: n/a | iter time: 22.35 ms
Epoch 1 | iter 9 step 0 | loss train: 5.814, val: n/a | iter time: 22.57 ms
Epoch 1 | iter 10 step 0 | loss train: 5.789, val: n/a | iter time: 22.63 ms
Epoch 1 | iter 11 step 0 | loss train: 5.842, val: n/a | iter time: 22.41 ms
Epoch 1 | iter 12 step 0 | loss train: 5.873, val: n/a | iter time: 22.48 ms
Epoch 1 | iter 13 step 0 | loss train: 5.893, val: n/a | iter time: 22.48 ms
Epoch 1 | iter 14 step 0 | loss train: 5.868, val: n/a | iter time: 23.18 ms
Epoch 1 | iter 15 step 0 | loss train: 5.879, val: n/a | iter time: 22.29 ms
Epoch 1 | iter 16 step 1 | loss train: 5.929, val: n/a | iter time: 31.64 ms (step)
Training time: 20.27s
Memory used: 0.25 GB
Validating ...
Final evaluation | val loss: 5.967 | val ppl: 390.511
Saving LoRA weights to '/teamspace/studios/this_studio/out/finetune/lora/final/lit_model.pth.lora'
usage: litgpt [-h] [--config CONFIG] [--print_config[=flags]] [--checkpoint_dir CHECKPOINT_DIR] [--out_dir OUT_DIR] [--precision PRECISION] [--quantize QUANTIZE] [--devices DEVICES]
              [--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT] [--lora_query {true,false}] [--lora_key {true,false}] [--lora_value {true,false}]
              [--lora_projection {true,false}] [--lora_mlp {true,false}] [--lora_head {true,false}] [--data.help CLASS_PATH_OR_NAME] [--data DATA] [--train CONFIG]
              [--train.save_interval SAVE_INTERVAL] [--train.log_interval LOG_INTERVAL] [--train.global_batch_size GLOBAL_BATCH_SIZE] [--train.micro_batch_size MICRO_BATCH_SIZE]
              [--train.lr_warmup_steps LR_WARMUP_STEPS] [--train.lr_warmup_fraction LR_WARMUP_FRACTION] [--train.epochs EPOCHS] [--train.max_tokens MAX_TOKENS]
              [--train.max_steps MAX_STEPS] [--train.max_seq_length MAX_SEQ_LENGTH] [--train.tie_embeddings {true,false,null}] [--train.max_norm MAX_NORM] [--train.min_lr MIN_LR]
              [--eval CONFIG] [--eval.interval INTERVAL] [--eval.max_new_tokens MAX_NEW_TOKENS] [--eval.max_iters MAX_ITERS] [--eval.initial_validation {true,false}]
              [--optimizer OPTIMIZER] [--logger_name {wandb,tensorboard,csv}] [--seed SEED]
error: Unrecognized arguments: finetune_lora

I am not sure where it comes from; it's bizarre to me. I checked the kwargs and there doesn't seem to be such an argument. I must be doing something wrong with jsonargparse, but I can't figure out where the issue lies, since litgpt finetune uses exactly the same finetune_lora_fn function and doesn't have this issue.

@awaelchli (Member) commented:

At the end of training, we save the hyperparameters to the checkpoint dir for reproducibility. This is done in this function:

litgpt/litgpt/utils.py, lines 436 to 458 in 221b7ef:

def save_hyperparameters(function: callable, checkpoint_dir: Path) -> None:
    """Captures the CLI parameters passed to `function` without running `function` and saves them to the checkpoint."""
    from jsonargparse import capture_parser

    # TODO: Make this more robust
    # This hack strips away the subcommands from the top-level CLI
    # to parse the file as if it was called as a script
    known_commands = [
        ("finetune", "full"),
        ("finetune", "lora"),
        ("finetune", "adapter"),
        ("finetune", "adapter_v2"),
        ("finetune",),
        ("pretrain",),
    ]
    for known_command in known_commands:
        unwanted = slice(1, 1 + len(known_command))
        if tuple(sys.argv[unwanted]) == known_command:
            sys.argv[unwanted] = []

    parser = capture_parser(lambda: CLI(function))
    config = parser.parse_args()
    parser.save(config, checkpoint_dir / "hyperparameters.yaml", overwrite=True)

If the names of the commands change, the code there needs to be adapted a bit. I can help with this.
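
For reference, one possible adaptation is to add the new single-token subcommand names from this PR to known_commands so they are also stripped from sys.argv before re-parsing. This is a sketch only, not necessarily what the merged code does:

known_commands = [
    # new single-token subcommands introduced by this PR
    ("finetune_full",),
    ("finetune_lora",),
    ("finetune_adapter",),
    ("finetune_adapter_v2",),
    # previous two-token subcommands, kept while the old usage is being deprecated
    ("finetune", "full"),
    ("finetune", "lora"),
    ("finetune", "adapter"),
    ("finetune", "adapter_v2"),
    ("finetune",),
    ("pretrain",),
]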

@rasbt (Collaborator, Author) commented May 28, 2024

Oh, this might be exactly what's causing the issue I was encountering! Thanks a lot, I think I should be able to fix it!

@rasbt (Collaborator, Author) commented May 28, 2024

Thanks, @awaelchli! It actually didn't occur to me to check this file. I think I've got it now, thanks!

Review comments on litgpt/scripts/convert_hf_checkpoint.py and litgpt/__main__.py (resolved)
@rasbt rasbt marked this pull request as ready for review May 31, 2024 13:53
Review comments on litgpt/config.py, tests/Untitled-1.ipynb, README.md, tutorials/evaluation.md, tests/test_pretrain.py, and tests/test_utils.py (resolved)
rasbt and others added 6 commits May 31, 2024 09:10
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
@rasbt rasbt merged commit 3fa17fb into main May 31, 2024
9 checks passed
@rasbt rasbt deleted the new-cli branch May 31, 2024 16:28
@awaelchli awaelchli mentioned this pull request Jun 1, 2024