Baby Llama

Train and run a small Llama 2 model from scratch on the TinyStories dataset.

Baby Llama Code Examples:

Figures: Iters vs Val Loss; Learning Words and Grammar Visualised.

Single Char Tokens: why <0xFF> aka byte_fallback?
With byte_fallback enabled, the tokens <0x00> - <0xFF> are reserved so that any UTF-8 byte can be encoded even when its character is not in the vocabulary. Beyond those, 102 distinct characters are found in the TinyStories text (the "Alphabet size=102" in the log below). This is also why a --vocab_size of 256 is rejected: the 256 byte tokens plus the 102 text characters plus the three special tokens already require 361 slots (256 vs 361).

!cd llama2.c && python tinystories.py train_vocab --vocab_size=256
# sentencepiece log (abridged):
# trainer_interface.cc(558) LOG(INFO) Alphabet size=102
# Vocabulary size is smaller than required_chars. 256 vs 361.
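
A quick way to see byte fallback in action is to encode a string containing a character outside the 102-character alphabet. This is a minimal sketch, assuming a vocab has been trained successfully (e.g. with --vocab_size=4096) and written to data/tok4096.model, the path llama2.c's tinystories.py typically uses (adjust if your run put it elsewhere):

!cd llama2.c && python -c "import sentencepiece as spm; sp = spm.SentencePieceProcessor(model_file='data/tok4096.model'); print(sp.encode('Once upon a time 🦙', out_type=str))"

Common words come back as ordinary subword pieces, while the llama emoji, which is not in the alphabet, falls back to its UTF-8 bytes: <0xF0> <0x9F> <0xA6> <0x99>.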
# runtime ~ 2 mins
!cd llama2.c && python train.py --vocab_source=custom --vocab_size=4096 --compile=False \
  --dim=128 --n_layers=5 --n_heads=8 --n_kv_heads=4 --batch_size=32 \
  --always_save_checkpoint=True --eval_interval=500 --max_iters=1001 --init_from='resume'
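
Note that --init_from='resume' continues from an existing checkpoint in out/; a first from-scratch run would use --init_from='scratch'. Once training has saved a checkpoint, a rough sketch of sampling from the custom model with its matching tokenizer follows; the out/model.bin and data/tok4096.bin paths are assumptions based on llama2.c's defaults, the vocab is assumed to have been trained at 4096 to match the --vocab_size flag above, and the run binary must be compiled first:

!cd llama2.c && make run
!cd llama2.c && python tokenizer.py --tokenizer-model=data/tok4096.model
!cd llama2.c && ./run out/model.bin -z data/tok4096.bin -t 0.8 -n 256 -i "Once upon a time "

The -z flag points run at the custom tokenizer's .bin export, which tokenizer.py writes next to the .model file.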
!cd ./llama2.c/out && wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
!cd ./llama2.c && ./run /content/llama2.c/out/stories110M.bin -t 0.8 -n 256 -i "Once upon a time "
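
Here -t sets the sampling temperature, -n the number of tokens to generate, and -i the prompt. For a faster download, the same Hugging Face repo also hosts smaller checkpoints; a sketch using stories15M.bin, assuming it is still published there:

!cd ./llama2.c/out && wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
!cd ./llama2.c && ./run /content/llama2.c/out/stories15M.bin -t 0.8 -n 256 -i "Once upon a time "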
