Test readme commands #1311

Merged
merged 31 commits into main from readme-tests on May 3, 2024

Conversation

@rasbt (Collaborator) commented Apr 17, 2024

Fixes #1296

@rasbt changed the title from "[WIP] test readme commands" to "Test readme commands" on Apr 22, 2024
@rasbt (Collaborator, Author) commented Apr 24, 2024

I was asked to put together a CI script that runs the CLI commands as they appear in the README, which is what this PR does. To keep it feasible, I used the smallest models. It works fine locally on GPU machines.
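
For context, a minimal sketch of what such a test can look like (the helper and test names here are illustrative assumptions, not the exact code in this PR):

```python
import shlex
import subprocess


def run_command(command: str) -> None:
    # Run one README command in a subprocess and surface its output on
    # failure, which is how the RuntimeError shown below gets produced.
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command '{command}' failed with exit status {result.returncode}\n"
            f"Output:\n{result.stdout}\nError:\n{result.stderr}"
        )


def test_pretrain_readme_command():
    # Downsized variant of the README command: smallest model, tiny token budget.
    run_command(
        "litgpt pretrain --model_name pythia-14m"
        " --train.max_tokens 100 --out_dir out/pretrain_test"
    )
```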

But in the CI, which runs on CPU, I noticed the following:

```
RuntimeError: Command 'litgpt pretrain --model_name pythia-14m --initial_checkpoint checkpoints/EleutherAI/pythia-14m --tokenizer_dir checkpoints/EleutherAI/pythia-14m --data TextFiles --data.train_data_path custom_texts --train.max_tokens 100 --out_dir out/custom_continue_pretrained' failed with exit status 1
E           Output:
E           {'data': {'batch_size': 1,
E                     'max_seq_length': -1,
E                     'num_workers': 4,
E                     'seed': 42,
E                     'tokenizer': None,
E                     'train_data_path': PosixPath('custom_texts'),
E                     'val_data_path': None},
E            'devices': 'auto',
E            'eval': {'interval': 1000, 'max_iters': 100, 'max_new_tokens': None},
E            'initial_checkpoint_dir': PosixPath('checkpoints/EleutherAI/pythia-14m'),
E            'logger_name': 'tensorboard',
E            'model_config': None,
E            'model_name': 'pythia-14m',
E            'out_dir': PosixPath('out/custom_continue_pretrained'),
E            'resume': False,
E            'seed': 42,
E            'tokenizer_dir': PosixPath('checkpoints/EleutherAI/pythia-14m'),
E            'train': {'beta1': 0.9,
E                      'beta2': 0.95,
E                      'epochs': None,
E                      'global_batch_size': 512,
E                      'learning_rate': 0.0004,
E                      'log_interval': 1,
E                      'lr_warmup_fraction': None,
E                      'lr_warmup_steps': 2000,
E                      'max_norm': 1.0,
E                      'max_seq_length': None,
E                      'max_steps': None,
E                      'max_tokens': 100,
E                      'micro_batch_size': 4,
E                      'min_lr': 4e-05,
E                      'save_interval': 1000,
E                      'tie_embeddings': False,
E                      'weight_decay': 0.1}}
E           Time to instantiate model: 0.07 seconds.
E           Total parameters: 14,067,712
E           
E           Error:
E           Using bfloat16 Automatic Mixed Precision (AMP)
E           Missing logger folder: out/custom_continue_pretrained/logs/tensorboard
E           Seed set to 42
E           Traceback (most recent call last):
E             File "/opt/hostedtoolcache/Python/3.10.14/x64/bin/litgpt", line 8, in <module>
E               sys.exit(main())
E             File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litgpt/__main__.py", line 143, in main
E               fn(**kwargs)
E             File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litgpt/pretrain.py", line 119, in setup
E               main(
E             File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litgpt/pretrain.py", line 172, in main
E               optimizer = torch.optim.AdamW(
E             File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/optim/adamw.py", line 69, in __init__
E               raise RuntimeError("`fused=True` requires all the params to be floating point Tensors of "
E           RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone'].
```

I can reproduce it on a CPU machine.

Is that intended? I thought we only run bfloat16 if the device supports it, and I thought we support CPUs. Is this a bug, and should we fix it?

On a second note, how do we make running this in CI more feasible? I think we could skip this test on all machines but one, if that's okay with you. What are your thoughts in general? @awaelchli @carmocca

@rasbt (Collaborator, Author) commented Apr 24, 2024

I saw that the precision was hardcoded in the pretraining script. I am making it consistent with the finetuning settings in #1353, which should address this issue.
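
For illustration, a sketch of the kind of device-aware selection that would avoid a hardcoded setting (the helper name and precision strings are assumptions; `fused=True` is what the traceback above rejects on CPU):

```python
import torch


def training_settings(device: torch.device) -> tuple[str, bool]:
    # Only request bf16 mixed precision and the fused AdamW on CUDA devices;
    # fused AdamW is not supported on CPU.
    if device.type == "cuda" and torch.cuda.is_bf16_supported():
        return "bf16-mixed", True
    return "32-true", False  # CPU fallback: full precision, non-fused optimizer


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
precision, fused = training_settings(device)
params = [torch.nn.Parameter(torch.randn(4, 4, device=device))]
optimizer = torch.optim.AdamW(params, lr=4e-4, fused=fused)
```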

@carmocca (Member) commented Apr 25, 2024

My personal opinion is to not do this in its current form.

As I understand it, the original proposal is about making sure that the README's commands run.

This PR accomplishes that right now; however, it doesn't address the (very common) problem of somebody modifying the README and accidentally breaking it. This is also a problem in our tutorials, of course.

I suggest parsing the README to extract the commands to run (by looking at code blocks). These commands could then be run with mocks for the underlying functionality, as we do for the config_hub tests.
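
A sketch of that parsing idea (the function name and regex are assumptions, not code from this PR):

```python
import re
from pathlib import Path


def extract_litgpt_commands(readme: Path) -> list[str]:
    # Pull the fenced shell code blocks out of the README and collect
    # every line that invokes the litgpt CLI.
    text = readme.read_text()
    blocks = re.findall(r"```(?:bash|sh|shell)?\n(.*?)```", text, flags=re.DOTALL)
    return [
        line.strip()
        for block in blocks
        for line in block.splitlines()
        if line.strip().startswith("litgpt ")
    ]
```

Each extracted command could then be invoked with the heavy entry points patched out (for example via `unittest.mock`), so that only argument parsing is exercised.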

You might then wonder, "what if the underlying functionality is broken?" But I would argue that unit tests specific to that functionality should be created, as they are easier to write and verify. We already have a lot of those, of course.

@rasbt (Collaborator, Author) commented Apr 25, 2024

I agree with the README parsing, and I think Andrei found a tool that might do that, but we decided against it since it would be over-engineered. So I think we should keep the relatively simple approach I have here; a weakness, though, is that we have to remember to update it.

@rasbt (Collaborator, Author) commented Apr 25, 2024

Oh, I just remembered another reason why this is a bit harder to automate by parsing the README: the README uses more expensive models that wouldn't run on CPU. I am cutting a few corners here by using Pythia and setting max_steps and max_tokens to small numbers.

@rasbt (Collaborator, Author) commented Apr 25, 2024

If you don't mind, could we merge this, @carmocca, to end this README-testing saga? It was requested and has been on my list for quite a while.

@rasbt (Collaborator, Author) commented Apr 25, 2024

Or maybe let me exclude the Windows tests, because they seem to time out.
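
For illustration, excluding a platform in pytest could look like this (the test name is a placeholder):

```python
import sys

import pytest


# Skip the README command tests on Windows runners, where they time out.
@pytest.mark.skipif(sys.platform == "win32", reason="times out on Windows CI")
def test_readme_commands():
    ...
```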

@carmocca (Member):

> Oh, I just remembered another reason why this is a bit harder to automate by parsing the README: the README uses more expensive models that wouldn't run on CPU. I am cutting a few corners here by using Pythia and setting max_steps and max_tokens to small numbers.

My point in my last message was that README testing should not run the actual chat / pretrain / evaluate ... functionality, but instead just make sure that the commands are correct and stop after parsing. For everything else, we have tests in other files. Checking parsing is also what we do for the config hub.

I won't oppose this, but I don't believe it's a good idea. You may merge if you think it's worth having.

@rasbt merged commit f334378 into main on May 3, 2024
9 checks passed
@rasbt deleted the readme-tests branch on May 3, 2024, 15:23