# 🦖 X—LLM: Easy & Cutting Edge LLM Finetuning

A tutorial on how to run X—LLM in Colab

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Docs](https://github.com/BobaZooba/xllm/blob/main/DOCS.md): here, we go into detail about everything the library can
  do
- [Demo project](https://github.com/BobaZooba/xllm-demo): here's a minimal step-by-step example of how to use X—LLM and fit it
  into your own project
- [Template project](https://github.com/BobaZooba/xllm-template): here's a template, a kickoff point you can use for
  your projects

First, install the latest version of `xllm`.

# Installation

In [None]:
# default version
!pip install xllm

# version which includes deepspeed, flash-attn and auto-gptq
# !pip install xllm[train]

Collecting xllm
  Downloading xllm-0.3.11-py3-none-any.whl (60 kB)
Collecting loguru (from xllm)
  Downloading loguru-0.7.2-py3-none-any.whl (62 kB)
Collecting peft>=0.5.0 (from xllm)
  Downloading peft-0.5.0-py3-none-any.whl (85 kB)
Collecting wandb (from xllm)
  Downloading wandb-0.15.11-py3-none-any.whl (2.1 MB)
Collecting python-dotenv (from xllm)
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting optimum>=1.12.0 (from xllm)
  Downloading optimum-1.13.2.tar.gz (300 kB)

# Verify the versions and confirm whether CUDA is available

In [None]:
import torch
import xllm

cuda_is_available = torch.cuda.is_available()

print(f"X—LLM version: {xllm.__version__}\nTorch version: {torch.__version__}\nCuda is available: {cuda_is_available}")
assert cuda_is_available

X—LLM version: 0.3.11
Torch version: 2.0.1+cu118
Cuda is available: True
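
If one of the imports above fails, it can help to check which packages are installed without importing them. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of `xllm`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report a few packages that this tutorial relies on
for name in ("xllm", "torch", "peft", "optimum"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A missing package shows up as `not installed` instead of raising an `ImportError`.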


# Single cell example

In [None]:
from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.experiments import Experiment

# 1. Init Config which controls the internal logic of xllm
config = Config(model_name_or_path="facebook/opt-350m")

# 2. Prepare the data
train_data = ["Hello!"] * 100

# 3. Load the data
train_dataset = GeneralDataset.from_list(data=train_data)

# 4. Init Experiment
experiment = Experiment(config=config, train_dataset=train_dataset)

# 5. Build Experiment from Config: init tokenizer and model, apply LoRA and so on
experiment.build()

# 6. Run Experiment (training)
experiment.run()

# 7. [Optional] Fuse LoRA layers
# experiment.fuse_lora()

# 8. [Optional] Push the fused model (or just the LoRA weights) to the HuggingFace Hub
# experiment.push_to_hub(repo_id="YOUR_NAME/MODEL_NAME")

2023-09-30 14:44:45.984 | INFO     | xllm.utils.logger:info:38 - Experiment building has started
2023-09-30 14:44:45.992 | INFO     | xllm.utils.logger:info:38 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "path_to_env_file": null,
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "quantization_dataset_id": null,
  "quantization_max_samples": 100000,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_model_id": null,
  "quantized_hub_private_repo": null,
 

Downloading (…)okenizer_config.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/644 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/441 [00:00<?, ?B/s]

2023-09-30 14:44:50.439 | INFO     | xllm.utils.logger:info:38 - Tokenizer facebook/opt-350m was built
2023-09-30 14:44:50.441 | INFO     | xllm.utils.logger:info:38 - Collator LMCollator was built
2023-09-30 14:44:50.444 | INFO     | xllm.utils.logger:info:38 - Quantization config is None. Model will be loaded using torch.float32


Downloading pytorch_model.bin:   0%|          | 0.00/663M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

2023-09-30 14:45:07.174 | INFO     | xllm.utils.logger:info:38 - Model facebook/opt-350m was built
2023-09-30 14:45:17.014 | INFO     | xllm.utils.logger:info:38 - Trainer LMTrainer was built
2023-09-30 14:45:17.019 | INFO     | xllm.utils.logger:info:38 - Experiment built successfully
2023-09-30 14:45:17.023 | INFO     | xllm.utils.logger:info:38 - Training will start soon
***** Running training *****
  Num examples = 100
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 50
  Number of trainable parameters = 331,196,416


| Step | Training Loss |
|------|---------------|
| 1 | 6.0585 |
| 10 | 5.8549 |
| 20 | 4.8202 |
| 30 | 2.2326 |
| 40 | 2.1023 |
| 50 | 1.7562 |




Training completed. Do not forget to share your model on huggingface.co/models =)


2023-09-30 14:45:29.762 | INFO     | xllm.utils.logger:info:38 - Training end
2023-09-30 14:45:29.764 | INFO     | xllm.utils.logger:info:38 - Model saved to ./outputs/
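
The `Total optimization steps = 50` line in the training log follows from the dataset size and the batch size. Here is a rough sketch of that arithmetic. The function name and the rounding rules are assumptions made for illustration, and the trainer's exact behaviour may differ in edge cases.

```python
import math


def total_optimization_steps(
    num_examples: int,
    per_device_batch_size: int,
    gradient_accumulation_steps: int = 1,
    num_epochs: int = 1,
    num_devices: int = 1,
) -> int:
    """Approximate number of optimizer updates for a training run."""
    # Number of batches the data loader yields per epoch
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update happens every `gradient_accumulation_steps` batches
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return updates_per_epoch * num_epochs


# 100 examples with a batch size of 2, as in the log above
print(total_optimization_steps(num_examples=100, per_device_batch_size=2))  # 50
```

The same arithmetic gives 150 steps for a dataset of 300 examples.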


# Add LoRA

## Config

`Config` plays a crucial role in the `xllm` library. It's how we define the workings of the library components, like how to handle data, the methods for training, the type of model to train, and so forth.

In [None]:
# config with LoRA
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
)

### You can explicitly specify the values of additional parameters in LoRA

In [None]:
# # extended config with LoRA
# config = Config(
#     model_name_or_path="facebook/opt-350m",
#     stabilize=True,
#     apply_lora=True,
#     lora_rank=8,
#     lora_alpha=32,
#     lora_dropout=0.05,
#     raw_lora_target_modules="all",
# )
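
To get a feeling for what `lora_rank` and `lora_alpha` control, here is a small sketch based on the standard LoRA formulation. It assumes that these two options correspond to the rank `r` and the scaling factor `alpha` of that formulation. The function names and the 1024x1024 layer are hypothetical.

```python
def lora_extra_parameters(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by LoRA to a single linear layer.

    LoRA keeps the original d_out x d_in weight frozen and learns two small
    matrices: A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to the weight."""
    return alpha / rank


full = 1024 * 1024  # parameters of a hypothetical 1024x1024 projection
extra = lora_extra_parameters(d_in=1024, d_out=1024, rank=8)

print(extra)  # 16384
print(f"{extra / full:.2%} of the frozen layer")
print(lora_scaling(alpha=32, rank=8))  # 4.0
```

A higher rank adds more trainable parameters, and `alpha / rank` controls how strongly the learned update contributes.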

## Make training data

In [None]:
train_data = ["Hello!", "How are you?", "Are you okay?"] * 100

In [None]:
len(train_data)

300

## Make an `xllm` train dataset

In [None]:
train_dataset = GeneralDataset.from_list(data=train_data)

## Init the experiment

`Experiment` encompasses all aspects of training, such as how to load the model, whether to use LoRA or not, and how to set up the trainer, among other things.

The only required field is `config`.

You can also pass the arguments listed below. The default value for each of them is `None`.

If you do not pass a component when initializing the experiment (so it stays `None`), `Experiment` initializes it during the `.build` step from the values in `Config`. This applies to `tokenizer`, `model`, and so on.
```
training_arguments: Optional[TrainingArguments]
train_dataset: Optional[BaseDataset]
eval_dataset: Optional[BaseDataset]
tokenizer: Optional[PreTrainedTokenizer]
collator: Optional[BaseCollator]
quantization_config: Union[BitsAndBytesConfig, GPTQConfig, None]
model: Union[PreTrainedModel, PeftModel, None]
lora_config: Optional[LoraConfig]
trainer: Optional[LMTrainer]
```
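
The "use what was passed, otherwise build it from the config" behaviour can be illustrated with a toy class. This is only a sketch of the pattern, not the actual `xllm` implementation. `SketchConfig` and `SketchExperiment` are hypothetical names, and plain strings stand in for real tokenizer and model objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    model_name_or_path: str


class SketchExperiment:
    def __init__(
        self,
        config: SketchConfig,
        tokenizer: Optional[str] = None,
        model: Optional[str] = None,
    ):
        self.config = config
        self.tokenizer = tokenizer
        self.model = model

    def build(self) -> "SketchExperiment":
        # Components that were passed explicitly are kept as they are;
        # components left as None are created from the config.
        if self.tokenizer is None:
            self.tokenizer = f"tokenizer built from {self.config.model_name_or_path}"
        if self.model is None:
            self.model = f"model built from {self.config.model_name_or_path}"
        return self


config_sketch = SketchConfig(model_name_or_path="facebook/opt-350m")
sketch = SketchExperiment(config=config_sketch, tokenizer="my custom tokenizer").build()

print(sketch.tokenizer)  # my custom tokenizer
print(sketch.model)      # model built from facebook/opt-350m
```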

In [None]:
experiment = Experiment(config=config, train_dataset=train_dataset)

## 🏗 Build the experiment

At this point, we're setting up all the components needed for training.

In [None]:
experiment.build()

2023-09-30 14:45:29.834 | INFO     | xllm.utils.logger:info:38 - Experiment building has started
2023-09-30 14:45:29.837 | INFO     | xllm.utils.logger:info:38 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "path_to_env_file": null,
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "quantization_dataset_id": null,
  "quantization_max_samples": 100000,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_model_id": null,
  "quantized_hub_private_repo": null,
 

## 🚄 Run experiment

In [None]:
experiment.run()

2023-09-30 14:45:37.633 | INFO     | xllm.utils.logger:info:38 - Training will start soon
***** Running training *****
  Num examples = 300
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 150
  Number of trainable parameters = 331,196,416


| Step | Training Loss |
|------|---------------|
| 1 | 4.7182 |
| 10 | 4.9896 |
| 20 | 4.3015 |
| 30 | 3.3142 |
| 40 | 2.4888 |
| 50 | 2.1788 |
| 60 | 1.9492 |
| 70 | 1.9321 |
| 80 | 1.9812 |
| 90 | 1.9125 |


Saving model checkpoint to ./outputs/checkpoint-100
Configuration saved in ./outputs/checkpoint-100/config.json
Configuration saved in ./outputs/checkpoint-100/generation_config.json
Model weights saved in ./outputs/checkpoint-100/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


2023-09-30 14:46:18.968 | INFO     | xllm.utils.logger:info:38 - Training end
2023-09-30 14:46:18.971 | INFO     | xllm.utils.logger:info:38 - Model saved to ./outputs/


## 🎉 Done!

You have trained a model using `xllm`.

## Fuse model

In [1]:
# experiment.fuse_lora()

## Get the model

In [None]:
# experiment.model

## You can save the model

If you have not fused the model, only the LoRA weights will be saved.

In [None]:
# experiment.model.save_pretrained("./trained_model/")

## You can push the model to the HuggingFace Hub

If you have not fused the model, only the LoRA weights will be uploaded.

Make sure you are logged in to the HuggingFace Hub. You can run this command:

```python
!huggingface-cli login
```

Alternatively, you can set an environment variable with your access token. You can find your token here: https://huggingface.co/settings/tokens

```python
import os

os.environ["HUGGING_FACE_HUB_TOKEN"] = "YOUR_ACCESS_TOKEN"
```
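
Before pushing, it can be useful to confirm that the token is really set and is not still the placeholder. Here is a minimal sketch; the helper `hub_token_from_env` is hypothetical and only reads the environment variable named above.

```python
import os
from typing import Mapping, Optional

TOKEN_VARIABLE = "HUGGING_FACE_HUB_TOKEN"
PLACEHOLDER = "YOUR_ACCESS_TOKEN"


def hub_token_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the token from the environment, or None if it is missing or a placeholder."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_VARIABLE, "").strip()
    if not token or token == PLACEHOLDER:
        return None
    return token


# Report only whether a token was found, never print the token itself
print("Token is set" if hub_token_from_env() else "Token is missing")
```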

In [None]:
# push the model and the tokenizer to the HuggingFace Hub
# experiment.push_to_hub(
#     repo_id="YOUR_LOGIN_AT_HF_HUB/MODEL_NAME",
#     private=False,
#     safe_serialization=True
# )

## 🎉 Done!

You've trained the model using `xllm` and uploaded it to the hub

# Add QLoRA

To train with QLoRA, we need to load the backbone model quantized to int4 (or int8).

In [None]:
# config with QLoRA
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
)
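
To see why quantization helps, here is a rough estimate of the memory needed for the model weights alone at different precisions. It uses the 331,196,416 parameters reported in the training logs of this tutorial. The estimate is a simplification: it ignores quantization constants, layers that stay in higher precision, LoRA adapters, activations, gradients, and optimizer state. The function name is hypothetical.

```python
def weight_memory_bytes(num_parameters: int, bits_per_parameter: int) -> int:
    """Approximate memory for storing the weights at the given precision."""
    return num_parameters * bits_per_parameter // 8


NUM_PARAMETERS = 331_196_416  # parameter count from the training logs

for label, bits in (("float32", 32), ("int8", 8), ("int4", 4)):
    gib = weight_memory_bytes(NUM_PARAMETERS, bits) / 2**30
    print(f"{label}: {gib:.2f} GiB")
```

In this simplified view, int4 weights take one eighth of the memory of float32 weights.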

### You can explicitly specify the values of additional parameters in bitsandbytes quantization

In [None]:
# # extended config with QLoRA
# config = Config(
#     model_name_or_path="facebook/opt-350m",
#     stabilize=True,
#     apply_lora=True,
#     load_in_4bit=True,
#     prepare_model_for_kbit_training=True,
#     llm_int8_threshold=6.0,
#     llm_int8_has_fp16_weight=True,
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4",
# )

## All other steps are the same

In [None]:
train_data = ["Hello!", "How are you?", "Are you okay?"] * 100
train_dataset = GeneralDataset.from_list(data=train_data)
experiment = Experiment(config=config, train_dataset=train_dataset)
experiment.build()
experiment.run()
# experiment.fuse_lora()

## You can also add gradient checkpointing

Gradient checkpointing reduces GPU memory use during training, so you can train larger models than you could without it. The disadvantage is that parts of the forward pass are recomputed during the backward pass, which slows down training.

In summary: you can train larger models (for example, 7B in Colab), but at the expense of training speed.

In [None]:
# config = Config(
#     model_name_or_path="facebook/opt-350m",

#     use_gradient_checkpointing=True,

#     stabilize=True,
#     apply_lora=True,
#     load_in_4bit=True,
#     prepare_model_for_kbit_training=True,
# )
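
The memory and speed trade-off can be illustrated with a toy counting model. It assumes that every layer produces one activation of equal size and that a checkpoint is kept every `segment_size` layers. This is only an illustration of the idea and says nothing about how `xllm` or the underlying libraries place their checkpoints. The function name is hypothetical.

```python
import math
from typing import Optional


def stored_activations(num_layers: int, segment_size: Optional[int] = None) -> int:
    """Peak number of activations kept in memory in the toy model."""
    if segment_size is None:
        # Without checkpointing, every activation is kept for the backward pass
        return num_layers
    # With checkpointing, only the segment boundaries are kept...
    num_checkpoints = math.ceil(num_layers / segment_size)
    # ...plus the activations of the one segment that is being recomputed
    return num_checkpoints + segment_size


print(stored_activations(24))                  # 24
print(stored_activations(24, segment_size=5))  # 10
```

The price for the lower peak is that each segment's forward pass runs a second time during the backward pass.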

# Add eval data

## Setup config

- `do_eval`: turns on evaluation
- `eval_steps`: how often to run evaluation, in training steps

In [None]:
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
    do_eval=True,
    eval_steps=50,
)
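
With `eval_steps=50`, evaluation is expected to run after every 50th optimization step. Here is a small sketch of that schedule for the 150-step run in this tutorial, assuming a step-based evaluation strategy. The function name is hypothetical.

```python
from typing import List


def evaluation_steps(total_steps: int, eval_steps: int) -> List[int]:
    """Steps after which evaluation runs, assuming it runs every `eval_steps` steps."""
    return list(range(eval_steps, total_steps + 1, eval_steps))


# 300 examples with a batch size of 2 give 150 optimization steps
print(evaluation_steps(total_steps=150, eval_steps=50))  # [50, 100, 150]
```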

## Make a dummy eval dataset

In [2]:
eval_data = ["Hi", "Sup?"] * 10

## Make an `xllm` eval dataset

In [None]:
eval_dataset = GeneralDataset.from_list(eval_data)

## Init experiment with the `eval_dataset`

In [None]:
experiment = Experiment(config=config, train_dataset=train_dataset, eval_dataset=eval_dataset)

## Build experiment

In [None]:
experiment.build()

2023-09-30 14:46:19.040 | INFO     | xllm.utils.logger:info:38 - Experiment building has started
2023-09-30 14:46:19.043 | INFO     | xllm.utils.logger:info:38 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "path_to_env_file": null,
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "quantization_dataset_id": null,
  "quantization_max_samples": 100000,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_model_id": null,
  "quantized_hub_private_repo": null,
 

## Run experiment

In [None]:
experiment.run()

2023-09-30 14:46:29.312 | INFO     | xllm.utils.logger:info:38 - Training will start soon
***** Running training *****
  Num examples = 300
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 150
  Number of trainable parameters = 331,196,416


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50 | 2.1788 | 7.020102 |
| 100 | 1.9102 | 5.900912 |
| 150 | 1.8692 | 5.658744 |


***** Running Evaluation *****
  Num examples = 20
  Batch size = 2
***** Running Evaluation *****
  Num examples = 20
  Batch size = 2
Saving model checkpoint to ./outputs/checkpoint-100
Configuration saved in ./outputs/checkpoint-100/config.json
Configuration saved in ./outputs/checkpoint-100/generation_config.json
Model weights saved in ./outputs/checkpoint-100/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 20
  Batch size = 2


Training completed. Do not forget to share your model on huggingface.co/models =)


2023-09-30 14:47:16.717 | INFO     | xllm.utils.logger:info:38 - Training end
2023-09-30 14:47:16.719 | INFO     | xllm.utils.logger:info:38 - Model saved to ./outputs/


# 🎉 You are awesome!

## Now you know how to prototype models using `xllm`

### Explore more examples in the X—LLM repo

https://github.com/BobaZooba/xllm
