**LLM Workshop 2024 by Sebastian Raschka**

<br>
<br>
<br>
<br>

# 5) Loading pretrained weights (part 2; using LitGPT)

- Now we load the pretrained weights using an open-source library called LitGPT
- LitGPT is fundamentally similar to the LLM code we implemented previously, but it is much more sophisticated and supports more than 20 different LLMs (Mistral, Gemma, Llama, Phi, and more)

# ⚡ LitGPT

**20+ high-performance LLMs with recipes to pretrain, finetune, deploy at scale.**

<pre>
✅ From scratch implementations     ✅ No abstractions    ✅ Beginner friendly   
✅ Flash attention                  ✅ FSDP               ✅ LoRA, QLoRA, Adapter
✅ Reduce GPU memory (fp4/8/16/32)  ✅ 1-1000+ GPUs/TPUs  ✅ 20+ LLMs            
</pre>

## Basic usage:

```
# litgpt [action] [model]
litgpt  download  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  chat      meta-llama/Meta-Llama-3-8B-Instruct
litgpt  evaluate  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  pretrain  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  serve     meta-llama/Meta-Llama-3-8B-Instruct
```


- You can learn more about LitGPT in the [corresponding GitHub repository](https://github.com/Lightning-AI/litgpt), which contains many tutorials, use cases, and examples


In [None]:
# pip install litgpt

In [1]:
# LitGPT lets us work with more sophisticated LLMs than GPT-2, such as Phi-3, Mistral, and Llama.
# It is not a wrapper around the Hugging Face transformers library but an independent implementation optimized for performance and scalability.
# LitGPT can still use Hugging Face models by loading their pretrained weights.
from importlib.metadata import version

pkgs = ["litgpt", 
        "torch",
       ]
for p in pkgs:
    print(f"{p} version: {version(p)}")

litgpt version: 0.5.7
torch version: 2.5.1
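
- If one of the packages is missing, `version()` raises `PackageNotFoundError`. A minimal sketch of a more forgiving check follows; the helper name `report_versions` is hypothetical and not part of LitGPT:

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(pkgs):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for p in pkgs:
        try:
            found[p] = version(p)
        except PackageNotFoundError:
            # The package is not installed in the current environment
            found[p] = None
    return found


for name, ver in report_versions(["litgpt", "torch"]).items():
    print(f"{name} version: {ver if ver is not None else 'not installed'}")
```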


- First, let's see what LLMs are supported

In [None]:
!litgpt download list
# `litgpt download list` shows the models implemented in LitGPT that are available for download

uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Please specify --repo_id <repo_id>. Available values:
allenai/OLMo-1B-hf
allenai/OLMo-7B-hf
allenai/OLMo-7B-Instruct-hf
BSC-LT/salamandra-2b
BSC-LT/salamandra-2b-instruct
BSC-LT/salamandra-7b
BSC-LT/salamandra-7b-instruct
codellama/CodeLlama-13b-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-13b-Python-hf
codellama/CodeLlama-34b-hf
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-34b-Python-hf
codellama/CodeLlama-70b-hf
codellama/CodeLlama-70b-Instruct-hf
codellama/CodeLlama-70b-Python-hf
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Instruct-hf
codellama/CodeLlama-7b-Python-hf
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
EleutherAI/pythia-1.4b
EleutherAI/pythia-1.4b-deduped
EleutherAI/pythia-12b
EleutherAI/pyth
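
- The list is long, so it can help to group the repo IDs by organization. A minimal sketch, assuming the printed output has been captured as text and that each repo ID has the form `org/model`; the helper `group_by_org` is hypothetical, and the sample is a few lines copied from the output above:

```python
from collections import defaultdict


def group_by_org(listing):
    """Group 'org/model' lines by org; lines that do not look like repo IDs are skipped."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        # Skip warnings and prompts: a repo ID has no spaces and exactly one '/'
        if " " in line or line.count("/") != 1:
            continue
        org, model = line.split("/")
        groups[org].append(model)
    return dict(groups)


sample = """\
Please specify --repo_id <repo_id>. Available values:
allenai/OLMo-1B-hf
allenai/OLMo-7B-hf
BSC-LT/salamandra-2b
EleutherAI/pythia-1.4b
"""
for org, models in group_by_org(sample).items():
    print(f"{org}: {len(models)} model(s)")
```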

- We can then download an LLM via the following command

In [6]:
# !litgpt download microsoft/phi-2
!litgpt download microsoft/phi-1_5

uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.


Initializing  0%|          | 00:00<?, ?it/s
Loading weights: model.safetensors:   0%|          | 00:00<?, ?it/s


uvloop is not installed. Falling back to the default asyncio event loop.
Setting HF_HUB_ENABLE_HF_TRANSFER=1
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': WindowsPath('checkpoints/microsoft/phi-1_5'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Saving converted checkpoint to checkpoints\microsoft\phi-1_5
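
- Before loading the model, we can verify that the converted checkpoint is on disk. A minimal sketch, assuming the `checkpoints/<repo_id>` layout shown in the log above and the `lit_model.pth` weight file name; the helper `checkpoint_ready` is hypothetical and not a LitGPT function:

```python
from pathlib import Path


def checkpoint_ready(repo_id, root="checkpoints"):
    """Return True if the converted weight file exists for repo_id under root."""
    # Path accepts the "/" in the repo ID on both Windows and POSIX systems
    weights = Path(root) / repo_id / "lit_model.pth"
    return weights.is_file()


print(checkpoint_ready("microsoft/phi-1_5"))
```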


- And there's also a Python API to use the model

In [1]:
from litgpt import LLM
# If you have downloaded a different model and want to load it, first free the current one with: del llm
llm = LLM.load("microsoft/phi-1_5")
# lit_model.pth is the LitGPT-compatible weight file for the phi-1_5 model

In [8]:
print(llm.generate("What do Llamas eat?"))

 Llamas are herbivores and primarily feed on plants, grasses, and salt cacti. They also consume some insects. Their digestive systems are adapted to process plant material, allowing them to extract nutrients efficiently.

Exercise 3


In [11]:
# Generate the text one token at a time instead of all at once
# This is useful for streaming the output to the console
result = llm.generate("What do Llamas eat?", stream=True, max_new_tokens=100)
for e in result:
    print(e, end="", flush=True)

 Llamas are herbivores that graze on grass, leaves, and other plants.

(2). Llamas want to cross a dusty path to eat grass, but they have dark fur that blends in with the dunes. How can they avoid getting dirty?

Answer: Llamas can walk towards the shade of the camels or moss-covered rocks to avoid the glare of the sun that causes the dust to reflect.

(3). Llamas
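
- When streaming, it is often handy to keep the full text as well as print it. A minimal sketch, assuming (as in the cell above) that `llm.generate(..., stream=True)` yields string chunks; `stream_and_collect` is a hypothetical helper, and `fake_stream` stands in for the model so the snippet runs without downloaded weights:

```python
def stream_and_collect(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


def fake_stream():
    # Stand-in for llm.generate("What do Llamas eat?", stream=True)
    yield from [" Llamas", " are", " herbivores", "."]


text = stream_and_collect(fake_stream())
```

With a loaded model, the same helper could be called as `stream_and_collect(llm.generate("What do Llamas eat?", stream=True, max_new_tokens=100))`.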

<br>
<br>
<br>
<br>

# Exercise 2: Download an LLM

- Download and try out an LLM of your own choice (recommendation: 7B parameters or smaller)
- We will finetune the LLM in the next notebook
- You can also try out the `litgpt chat` command from the terminal