## How to Use LitGPT for End-to-End LLM Operations
Juggling a different library for each LLM operation can be overwhelming. If your objective is to use a single library across those operations, this article is for you.

LitGPT is a handy open-source library designed to simplify training, fine-tuning, and deploying large language models. It provides tools to download, prepare, and interact with models through a command-line interface, making it easier to incorporate state-of-the-art NLP capabilities into various projects. It is licensed under Apache 2.0.

LitGPT was developed by the Lightning AI team, whose mission is to democratize and streamline the process of building and deploying advanced machine learning models.

LitGPT can be used standalone on your own GPU. However, it is also tightly integrated with Lightning Studio.

The library has a Python API and a CLI. We will explore the CLI in this article.

The CLI actions include:

* Download large language models with `litgpt download <<LLM>>`

* Finetune models on custom datasets:
   ```bash
   # --data.json_path: your custom dataset
   # --data.val_split_fraction: 0.1 splits the data into training and validation sets at a 9:1 ratio
   # --out_dir: where the custom model is stored; the directory is created if it does not exist
   litgpt finetune <<LLM>> \
     --data JSON \
     --data.json_path custom_dataset.json \
     --data.val_split_fraction 0.1 \
     --out_dir out/custom-model
   ```
   To explore the CLI options, use `litgpt finetune --help`

* Evaluate the model with `litgpt evaluate out/custom-model/final --tasks <<TASKS>>`

* Chat with the model with `litgpt chat out/custom-model/final`

* Generate text with the model using `litgpt generate out/custom-model/final --prompt ...`

* Deploy the model with `litgpt serve out/custom-model/final`

Optional:
* Pretrain your own model on a large dataset using `litgpt pretrain`
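
Although this article sticks to the CLI, the Python API mentioned above covers similar ground. The sketch below is a minimal illustration, assuming `litgpt` is installed and that `LLM.load` and `generate` behave as shown in the LitGPT README. The helper name `generate_once` is hypothetical, and the default model name is simply the one used later in this article.

```python
def generate_once(prompt: str, model_name: str = "meta-llama/Llama-3.2-1B-Instruct") -> str:
    """Load a model through LitGPT's Python API and return one completion.

    `generate_once` is a hypothetical helper for illustration. The import is
    kept inside the function so the file can be imported without litgpt.
    """
    from litgpt import LLM  # requires `pip install 'litgpt[all]'`

    llm = LLM.load(model_name)  # load the checkpoint by name
    return llm.generate(prompt)
```

Calling `generate_once("What is a dividend?")` loads the full model, so expect it to be slow on a machine without a GPU.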

### 1. Install the library

In [None]:
!pip install 'litgpt[all]'

### 2. Download the model

In [10]:
!litgpt download meta-llama/Llama-3.2-1B-Instruct


Fetching 5 files:   0%|                                   | 0/5 [00:00<?, ?it/s]
tokenizer.json:   0%|                               | 0.00/9.09M [00:00<?, ?B/s]
model.safetensors:   0%|                            | 0.00/2.47G [00:00<?, ?B/s]
tokenizer_config.json:   0%|                        | 0.00/54.5k [00:00<?, ?B/s]
config.json: 100%|█████████████████████████████| 877/877 [00:00<00:00, 5.74MB/s]
Fetching 5 files:  20%|█████▍                     | 1/5 [00:01<00:05,  1.43s/it]
generation_config.json: 100%|██████████████████| 189/189 [00:00<00:00, 1.56MB/s]
tokenizer_config.json: 100%|███████████████| 54.5k/54.5k [00:00<00:00, 1.22MB/s]
model.safetensors:   0%|                   | 10.5M/2.47G [00:02<09:50, 4.17MB/s]
tokenizer.json: 100%|██████████████████████| 9.09M/9.09M [00:03<00:00, 2.50MB/s]
model.safetensors:   1%|▏                  | 21.0M/2.47G [00:04<08:24, 4.86MB/s]
model.safetensors:   1%|▏

### 3. Download the dataset
We will download the finance_alpaca dataset and use only 100 of its records for fine-tuning.

In [5]:
!curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1186  100  1186    0     0   1691      0 --:--:-- --:--:-- --:--:--  1689
100 21.1M  100 21.1M    0     0  4752k      0  0:00:04  0:00:04 --:--:-- 6547k


To test how LitGPT fine-tuning works, 100 samples are enough, so we take the first 100 records of the finance_alpaca dataset.

In [18]:
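import json

# Load the full finance_alpaca dataset downloaded in the previous step
with open('my_custom_dataset.json') as f:
    data = json.load(f)

# Keep only the first 100 records for a quick fine-tuning test
sample_data = data[0:100]

# indent=4 writes one field per line, so head/tail can show individual records
with open('my_custom_dataset_small.json', 'w') as f:
    json.dump(sample_data, f, indent=4)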

Check the beginning and end of the sample data to confirm it is valid `JSON`.

In [21]:
!head -n 10 my_custom_dataset_small.json

[
    {
        "instruction": "How do dividend policies impact a company's financial performance?",
        "input": "",
        "output": "Dividend policies of a company can significantly impact its financial performance in several ways. Here are the steps outlining how this happens:\n\n1. Retained Earnings: When a company pays dividends, it reduces the amount of retained earnings it has. Retained earnings are a source of internal finance that a company can use to reinvest in its business or pay off its liabilities. Therefore, a high dividend payout can limit a company's financial flexibility and growth potential.\n\n2. Signal to Investors: Dividend policies can also send signals to the market about a company's future prospects. A stable or increasing dividend payout can be seen as a sign of a company's strong financial health and future profitability, which can boost investor confidence and potentially increase the company's share price. Conversely, a reduction or omission of divide

In [22]:
!tail -n 10 my_custom_dataset_small.json

        "instruction": "How does operating profit margin indicate a company's operational efficiency and profitability?",
        "input": "",
        "output": "Operating profit margin is a financial metric that indicates a company's operational efficiency and profitability by measuring the percentage of revenue that remains after deducting operating expenses. It is calculated by dividing operating profit by revenue and multiplying the result by 100.\n\nTo understand how operating profit margin indicates a company's operational efficiency and profitability, we can follow these steps:\n\nStep 1: Understand the components of operating profit: Operating profit is the profit a company generates from its core operations before interest and taxes. It includes revenue from the company's primary activities and subtracts all operating expenses, such as cost of goods sold, selling and administrative expenses, and research and development costs.\n\nStep 2: Calculate operating profit margin: To c
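
Inspecting the file with `head` and `tail` only covers its edges. A programmatic check can complement it; the sketch below assumes each record should carry non-empty `instruction` and `output` strings, with `input` optional, as in the samples shown above. The function name `find_invalid_records` is hypothetical and not part of LitGPT.

```python
def find_invalid_records(records, required=("instruction", "output")):
    """Return the indices of records that do not look like instruction-tuning samples."""
    bad = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            bad.append(i)
            continue
        # Required fields must be non-empty strings
        fields_ok = all(isinstance(rec.get(k), str) and rec[k].strip() for k in required)
        # "input" may be absent or empty, but must be a string when present
        input_ok = isinstance(rec.get("input", ""), str)
        if not (fields_ok and input_ok):
            bad.append(i)
    return bad


# Tiny in-memory example; in the notebook you would pass `sample_data` instead
example = [
    {"instruction": "Define revenue.", "input": "", "output": "Revenue is income from sales."},
    {"instruction": "This record has no output field", "input": ""},
]
print(find_invalid_records(example))  # prints [1]
```

An empty result means every record passed the check.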

### 4. Finetune the model on a custom dataset

Fine-tuning a large language model involves taking a pre-trained model and training it further on a specialized dataset. This process adjusts the model's parameters to better handle tasks or domains not covered by the original training, improving its performance in more specific or niche applications.

LitGPT currently supports the following finetuning methods:

* `litgpt finetune_full`

* `litgpt finetune_lora`

* `litgpt finetune_adapter`

* `litgpt finetune_adapter_v2`

We will only use `litgpt finetune_lora`. We could not confirm whether `litgpt finetune` is the same as `litgpt finetune_lora`.

- LoRA uses low-rank adapters to fine-tune only a small set of new parameters, keeping the main model weights frozen. This approach greatly reduces memory usage compared to full fine-tuning.

- QLoRA goes a step further by quantizing the model weights (often to 4-bit precision) before applying LoRA. This combination lowers memory costs even more while still allowing effective fine-tuning.
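
To make the memory argument concrete, the sketch below counts the trainable parameters that one LoRA adapter adds to a single linear layer and compares that with updating the full weight matrix. It uses `lora_r = 8` and `n_embd = 2048`, values that appear in the logs in this article. Treating the layer as one square 2048 x 2048 projection is a simplifying assumption; the real count depends on which projections have LoRA enabled.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d_in -> d_out linear layer.

    LoRA learns two small matrices, A (r x d_in) and B (d_out x r), instead of
    updating the full d_out x d_in weight matrix.
    """
    return r * d_in + d_out * r


n_embd = 2048  # embedding size reported in the model config
r = 8          # LoRA rank ('lora_r' in the fine-tuning log)

full = n_embd * n_embd                      # parameters updated by full fine-tuning
lora = lora_param_count(n_embd, n_embd, r)  # parameters updated by LoRA

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints: full: 4,194,304  lora: 32,768  ratio: 0.78%
```

Under these assumptions, the adapter trains under 1% of the parameters in that layer, while the original weights stay frozen.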

In [26]:
!litgpt finetune_lora meta-llama/Llama-3.2-1B-Instruct \
  --data JSON \
  --data.json_path my_custom_dataset_small.json  \
  --data.val_split_fraction 0.1 \
  --out_dir out/llama-custom-model

{'access_token': None,
 'checkpoint_dir': PosixPath('checkpoints/meta-llama/Llama-3.2-1B-Instruct'),
 'data': JSON(json_path=PosixPath('my_custom_dataset_small.json'),
              mask_prompt=False,
              val_split_fraction=0.1,
              prompt_style=<litgpt.prompts.Alpaca object at 0x33ad2e9c0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 1,
 'eval': EvalArgs(interval=100,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=True,
                  evaluate_example='first'),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': False,
 'lora_key': False,
 'lora_mlp': False,
 'lora_projection': False,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'num_nodes': 1,
 'optimizer': 'AdamW',
 'out_dir': PosixPath('out/llama-custom-model'),
 'precision': None,
 'quantize': None,
 'seed': 1337,
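
The log above shows `prompt_style=<litgpt.prompts.Alpaca ...>`, which indicates that each record is wrapped in an Alpaca-style prompt before training. The sketch below reproduces the commonly published Alpaca template as an illustration. It is an assumption that LitGPT's internal formatting matches it exactly, and `alpaca_prompt` is a hypothetical helper, not a LitGPT function.

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format one record using the widely used Alpaca instruction template."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
        )
    # Records with an empty "input" use the shorter variant
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


record = {
    "instruction": "How do dividend policies impact a company's financial performance?",
    "input": "",
}
print(alpaca_prompt(record["instruction"], record["input"]))
```

The model is then trained to continue this prompt with the record's `output` text.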
 

### 5. Evaluate the model

In [35]:
!litgpt evaluate out/llama-custom-model/final \
  --batch_size 4 \
  --tasks 'truthfulqa_mc2,mmlu' \
  --out_dir llama_custom_model_eval/

{'access_token': None,
 'batch_size': 4,
 'checkpoint_dir': PosixPath('out/llama-custom-model/final'),
 'device': None,
 'dtype': None,
 'force_conversion': False,
 'limit': None,
 'num_fewshot': None,
 'out_dir': PosixPath('llama_custom_model_eval'),
 'save_filepath': None,
 'seed': 1234,
 'tasks': 'truthfulqa_mc2,mmlu'}
{'checkpoint_dir': PosixPath('out/llama-custom-model/final'),
 'output_dir': PosixPath('llama_custom_model_eval')}
  state_dict = torch.load(out_dir / "model.pth")
2025-01-04:20:15:40,746 INFO     [huggingface.py:132] Using device 'cpu'
2025-01-04:20:15:40,973 INFO     [huggingface.py:369] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cpu'}
2025-01-04:20:15:41,260 INFO     [evaluator.py:164] Setting random seed to 1234 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-04:20:15:41,260 INFO     [evaluator.py:217] Using pre-initialized model
2025-01-04:20:17:25,771 INF

### 6. Use the model to answer a prompt

In [34]:
!litgpt generate out/llama-custom-model/final --prompt "What is the best way to invest in the stock market?"

{'checkpoint_dir': PosixPath('out/llama-custom-model/final'),
 'compile': False,
 'max_new_tokens': 50,
 'num_samples': 1,
 'precision': None,
 'prompt': 'What is the best way to invest in the stock market?',
 'quantize': None,
 'temperature': 0.8,
 'top_k': 50,
 'top_p': 1.0}
Loading model 'out/llama-custom-model/final/lit_model.pth' with {'name': 'Llama-3.2-1B-Instruct', 'hf_config': {'name': 'Llama-3.2-1B-Instruct', 'org': 'meta-llama'}, 'scale_embeddings': False, 'attention_scores_scalar': None, 'block_size': 131072, 'sliding_window_size': None, 'sliding_window_layer_placing': None, 'vocab_size': 128000, 'padding_multiple': 512, 'padded_vocab_size': 128256, 'n_layer': 16, 'n_head': 32, 'head_size': 64, 'n_embd': 2048, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'lm_head_bias': False, 'attn_bias': False, 'n_query_groups': 8, 'shared_attention_norm': False, 'norm_class_name': 'RMSNorm', 'post_attention_norm': False, 'post_mlp_norm': False, 'norm_eps': 1e-05, 

### 7. Chat with the model

In [None]:
!litgpt chat out/llama-custom-model/final

### 8. Deploy the model

In [None]:
!litgpt serve out/llama-custom-model/final
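
Once the server is running, a client can send prompts over HTTP. The sketch below uses only the standard library and assumes the server listens at `http://127.0.0.1:8000/predict`, accepts a JSON body with a `prompt` field, and replies with an `output` field. Confirm these against the address and schema printed by `litgpt serve`. The helper names `build_request` and `ask_server` are hypothetical.

```python
import json
import urllib.request

# Assumed default endpoint; confirm against the address printed by `litgpt serve`
DEFAULT_URL = "http://127.0.0.1:8000/predict"


def build_request(prompt: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the prompt (assumed field name: 'prompt')."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_server(prompt: str, url: str = DEFAULT_URL, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated text (assumed response field: 'output')."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]
```

With the server from the cell above running in another process, `ask_server("What is the best way to invest in the stock market?")` should return the generated text.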