# 🦖 X—LLM: Easy & Cutting Edge LLM Finetuning

A tutorial on how to run X—LLM in Colab.

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Quickstart](https://github.com/KompleteAI/xllm/tree/docs-v1#quickstart-): basics of `xllm`
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Guide](https://github.com/BobaZooba/xllm/blob/main/GUIDE.md): here, we go into detail about everything the library can
  do
- [Demo project](https://github.com/BobaZooba/xllm-demo): here's a minimal step-by-step example of how to use X—LLM and fit it
  into your own project
- [WeatherGPT](https://github.com/BobaZooba/wgpt): this repository features an example of how to utilize the xllm library. Included is a solution for a common type of assessment given to LLM engineers, who typically earn between $120,000 and $140,000 annually
- [Shurale](https://github.com/BobaZooba/shurale): project with the finetuned 7B Mistral model


First, install the latest version of `xllm`.

# Installation

In [None]:
# default version
!pip install xllm

# version which includes deepspeed, flash-attn and auto-gptq
# !pip install "xllm[train]"



# Verify the versions and confirm whether CUDA is available

In [None]:
import torch
import xllm

cuda_is_available = torch.cuda.is_available()

print(f"X—LLM version: {xllm.__version__}\nTorch version: {torch.__version__}\nCuda is available: {cuda_is_available}")
assert cuda_is_available

X—LLM version: 0.0.10
Torch version: 2.1.0+cu118
Cuda is available: True


# Single cell example

In [None]:
from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.experiments import Experiment

# 1. Init Config which controls the internal logic of xllm
config = Config(
    model_name_or_path="facebook/opt-350m",
    force_fp32=True,  # only for colab
)

# 2. Prepare the data
train_data = ["Hello!"] * 100

# 3. Load the data
train_dataset = GeneralDataset.from_list(data=train_data)

# 4. Init Experiment
experiment = Experiment(config=config, train_dataset=train_dataset)

# 5. Build Experiment from Config: init tokenizer and model, apply LoRA and so on
experiment.build()

# 6. Run Experiment (training)
experiment.run()

# 7. [Optional] Fuse LoRA layers
# experiment.fuse_lora()

# 8. [Optional] Push fused model (or just LoRA weight) to the HuggingFace Hub
# experiment.push_to_hub(repo_id="YOUR_NAME/MODEL_NAME")

2023-11-14 15:58:32.074 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-11-14 15:58:32.080 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp32": true,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_

| Step | Training Loss |
|------|---------------|
| 1    | 5.197         |
| 10   | 4.931         |
| 20   | 3.7598        |
| 30   | 0.6934        |
| 40   | 0.1674        |
| 50   | 0.0011        |




Training completed. Do not forget to share your model on huggingface.co/models =)


2023-11-14 15:59:06.618 | INFO     | xllm.utils.logger:info:86 - Training end
2023-11-14 15:59:06.622 | INFO     | xllm.utils.logger:info:86 - Model saved to ./outputs/


# Add LoRA

## Config

`Config` plays a crucial role in the `xllm` library. It's how we define the workings of the library components, like how to handle data, the methods for training, the type of model to train, and so forth.

In [None]:
# config with LoRA
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
)

### You can explicitly specify the values of additional parameters in LoRA

In [None]:
# # extended config with LoRA
# config = Config(
#     model_name_or_path="facebook/opt-350m",
#     stabilize=True,
#     apply_lora=True,
#     lora_rank=8,
#     lora_alpha=32,
#     lora_dropout=0.05,
#     raw_lora_target_modules="all",
# )
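
To get a feel for what `lora_rank` controls, it helps to count the parameters LoRA adds. For a linear layer with `d_in` inputs and `d_out` outputs, a rank-`r` adapter adds two small matrices with `r * (d_in + d_out)` weights in total. The sketch below applies that formula to `facebook/opt-350m`. The layer shapes are assumptions taken from the public model config (hidden size 1024, feed-forward size 4096, 24 decoder layers, 512-dimensional embedding projections), and it assumes rank 8 with adapters on every linear layer except the output head. The helper names are made up for this example.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shapes for facebook/opt-350m (not read from the model itself).
HIDDEN, FFN, EMBED_PROJ, NUM_LAYERS = 1024, 4096, 512, 24


def opt_350m_lora_params(rank: int) -> int:
    """Total LoRA weights if every linear layer except the output head gets an adapter."""
    per_layer = (
        4 * lora_param_count(HIDDEN, HIDDEN, rank)  # q, k, v and output projections
        + lora_param_count(HIDDEN, FFN, rank)       # first feed-forward layer
        + lora_param_count(FFN, HIDDEN, rank)       # second feed-forward layer
    )
    embedding_projections = (
        lora_param_count(EMBED_PROJ, HIDDEN, rank)    # input embedding projection
        + lora_param_count(HIDDEN, EMBED_PROJ, rank)  # output embedding projection
    )
    return NUM_LAYERS * per_layer + embedding_projections


print(opt_350m_lora_params(rank=8))  # 3563520
```

Under these assumptions the result matches the `Number of trainable parameters = 3,563,520` line in the training log further down in this notebook. Doubling the rank doubles the count.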

## Make training data

In [None]:
train_data = ["Hello!", "How are you?", "Are you okay?"] * 100

In [None]:
len(train_data)

300

## Make an `xllm` train dataset

In [None]:
train_dataset = GeneralDataset.from_list(data=train_data)

## Init the experiment

`Experiment` encompasses all aspects of training, such as how to load the model, whether to use LoRA or not, and how to set up the trainer, among other things.

The only required argument is `config`.

You can also pass any of the arguments listed below. Each of them defaults to `None`.

For every component you leave as `None`, `Experiment` creates it during the `.build` step from the settings in `Config`. This applies to `tokenizer`, `model`, and the other components.
```
training_arguments: Optional[TrainingArguments]
train_dataset: Optional[BaseDataset]
eval_dataset: Optional[BaseDataset]
tokenizer: Optional[PreTrainedTokenizer]
collator: Optional[BaseCollator]
quantization_config: Union[BitsAndBytesConfig, GPTQConfig, None]
model: Union[PreTrainedModel, PeftModel, None]
lora_config: Optional[LoraConfig]
trainer: Optional[LMTrainer]
```
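
The "pass it or let `.build` create it" behaviour can be pictured with a small stand-in class. This is a hypothetical sketch, not the `xllm` implementation: `SketchExperiment` and the placeholder strings it builds are invented for illustration.

```python
from typing import Optional


class SketchExperiment:
    """Hypothetical stand-in that mimics the None-means-build-from-config pattern."""

    def __init__(self, config: dict, tokenizer: Optional[str] = None, model: Optional[str] = None):
        self.config = config
        self.tokenizer = tokenizer
        self.model = model

    def build(self) -> None:
        # Only components that were left as None are created from the config.
        if self.tokenizer is None:
            self.tokenizer = f"tokenizer built from {self.config['model_name_or_path']}"
        if self.model is None:
            self.model = f"model built from {self.config['model_name_or_path']}"


experiment_sketch = SketchExperiment(
    config={"model_name_or_path": "facebook/opt-350m"},
    tokenizer="my custom tokenizer",  # passed explicitly, so build() keeps it
)
experiment_sketch.build()
print(experiment_sketch.tokenizer)  # my custom tokenizer
print(experiment_sketch.model)      # model built from facebook/opt-350m
```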

In [None]:
experiment = Experiment(config=config, train_dataset=train_dataset)

## 🏗 Build the experiment

At this point, we're setting up all the components needed for training.

In [None]:
experiment.build()

2023-11-14 15:59:06.695 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-11-14 15:59:06.699 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp32": false,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": true,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_

## 🚄 Run experiment

In [None]:
experiment.run()

2023-11-14 15:59:23.895 | INFO     | xllm.utils.logger:info:86 - Training will start soon
***** Running training *****
  Num examples = 300
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 150
  Number of trainable parameters = 3,563,520


| Step | Training Loss |
|------|---------------|
| 1    | 3.6269        |
| 10   | 3.8277        |
| 20   | 3.7486        |
| 30   | 3.7977        |
| 40   | 3.3031        |
| 50   | 2.9943        |
| 60   | 2.499         |
| 70   | 2.5675        |
| 80   | 1.7883        |
| 90   | 1.1865        |


Saving model checkpoint to ./outputs/checkpoint-100


Training completed. Do not forget to share your model on huggingface.co/models =)


2023-11-14 16:00:12.883 | INFO     | xllm.utils.logger:info:86 - Training end
2023-11-14 16:00:12.888 | INFO     | xllm.utils.logger:info:86 - Model saved to ./outputs/
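
The `Total optimization steps = 150` line in the log above follows from the dataset size and the batch settings. Here is a minimal sketch of that arithmetic, assuming one device and the common convention of rounding the last partial batch up. The helper name is made up for this example, and the trainer's exact rounding may differ when gradient accumulation is used.

```python
import math


def total_optimization_steps(
    num_examples: int,
    per_device_batch_size: int,
    num_devices: int = 1,
    gradient_accumulation_steps: int = 1,
    num_epochs: int = 1,
) -> int:
    """Estimate how many optimizer updates a training run performs."""
    effective_batch_size = per_device_batch_size * num_devices * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    return steps_per_epoch * num_epochs


# Values from the log: 300 examples, batch size 2, 1 epoch, no accumulation.
print(total_optimization_steps(num_examples=300, per_device_batch_size=2))  # 150
```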


## 🎉 Done!

You have trained a model using `xllm`.

## Fuse model

In [None]:
# experiment.fuse_lora()
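
Conceptually, fusing folds the low-rank update back into the frozen weight, so the adapter is no longer needed at inference time: `W_fused = W + (alpha / rank) * B @ A`. The toy sketch below shows that formula on plain Python lists. It illustrates the general LoRA merge rule, not the internals of `experiment.fuse_lora()`; the helper names and the tiny matrices are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fuse_lora_weights(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (d_out x rank) @ (rank x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(weight_row, delta_row)]
        for weight_row, delta_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d_out=2, d_in=2)
lora_a = [[1.0, 2.0]]              # rank x d_in, with rank=1
lora_b = [[1.0], [0.0]]            # d_out x rank

fused = fuse_lora_weights(weight, lora_a, lora_b, alpha=2, rank=1)
print(fused)  # [[3.0, 4.0], [0.0, 1.0]]
```

After fusing, a single matrix multiplication with `fused` gives the same output as the base layer plus the scaled adapter path.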

## Get the model

In [None]:
# experiment.model

## You can save the model

If you have not fused the model, only the LoRA weights will be saved.

In [None]:
# experiment.model.save_pretrained("./trained_model/")

## You could push the model to the HuggingFace Hub

If you have not fused the model, only the LoRA weights will be pushed.

Make sure you are logged in to the HuggingFace Hub. You can run this command:

```python
!huggingface-cli login
```

Or you can set the environment variable with your Access token. You can find your token here: https://huggingface.co/settings/tokens

```python
import os

os.environ["HUGGING_FACE_HUB_TOKEN"] = "YOUR_ACCESS_TOKEN"
```
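
Before pushing, it can save a failed upload to check that the token is actually visible to the Python process. Here is a small sketch using only the standard library. The helper name is made up, and it only checks that the variable shown above is set, not whether the token itself is valid.

```python
import os


def hub_token_is_set(env=None) -> bool:
    """Return True if HUGGING_FACE_HUB_TOKEN holds a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("HUGGING_FACE_HUB_TOKEN", "").strip())


if not hub_token_is_set():
    print("HUGGING_FACE_HUB_TOKEN is not set; log in or set the token first.")
```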

In [None]:
# push the model and the tokenizer to the HuggingFace Hub
# experiment.push_to_hub(
#     repo_id="YOUR_LOGIN_AT_HF_HUB/MODEL_NAME",
#     private=False,
#     safe_serialization=True
# )

## 🎉 Done!

You've trained the model using `xllm` and uploaded it to the hub

# Add QLoRA

To train a `QLoRA` model, we need to load the backbone model with int4 (or int8) weights using the `bitsandbytes` library.

In [None]:
# config with QLoRA
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
)

### You can explicitly specify the values of additional parameters in bitsandbytes quantization

In [None]:
# # extended config with QLoRA
# config = Config(
#     model_name_or_path="facebook/opt-350m",
#     stabilize=True,
#     apply_lora=True,
#     load_in_4bit=True,
#     prepare_model_for_kbit_training=True,
#     llm_int8_threshold=6.0,
#     llm_int8_has_fp16_weight=True,
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4",
# )
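
The main payoff of loading in 4-bit is the memory needed just to hold the backbone weights. Here is a rough back-of-the-envelope sketch. It counts weights only and ignores quantization constants, activations, LoRA weights, and optimizer state, so real usage will be higher. The parameter counts are nominal round numbers, not measured values, and the helper name is made up for this example.

```python
def weight_memory_gb(num_parameters: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


for name, num_parameters in [("350M model", 350e6), ("7B model", 7e9)]:
    for bits in (32, 16, 8, 4):
        size = weight_memory_gb(num_parameters, bits)
        print(f"{name}: {bits:>2}-bit weights ~ {size:.2f} GB")
```

For a nominal 7B model this gives about 28 GB in fp32, 14 GB in fp16, 7 GB in int8, and 3.5 GB in int4.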

## All other steps are the same

In [None]:
train_data = ["Hello!", "How are you?", "Are you okay?"] * 100
train_dataset = GeneralDataset.from_list(data=train_data)
experiment = Experiment(config=config, train_dataset=train_dataset)
experiment.build()
experiment.run()
# experiment.fuse_lora()

2023-11-14 16:00:12.955 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-11-14 16:00:12.957 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp32": false,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": true,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_

| Step | Training Loss |
|------|---------------|
| 1    | 4.9453        |
| 10   | 4.768         |
| 20   | 4.7204        |
| 30   | 4.408         |
| 40   | 3.776         |
| 50   | 3.2167        |
| 60   | 2.6297        |
| 70   | 2.5221        |
| 80   | 1.7133        |
| 90   | 1.1688        |


Saving model checkpoint to ./outputs/checkpoint-100


Training completed. Do not forget to share your model on huggingface.co/models =)


2023-11-14 16:01:09.884 | INFO     | xllm.utils.logger:info:86 - Training end
2023-11-14 16:01:09.888 | INFO     | xllm.utils.logger:info:86 - Model saved to ./outputs/


## You can also add `Gradient Checkpointing`

This technique uses `less GPU memory` during training, so you can fit larger models than you could without it. The drawback is that activations are recomputed during the backward pass, which means `slower training`.

In short: you can train larger models (for example, 7B in Colab), but at the expense of training speed.

In [None]:
# config = Config(
#     model_name_or_path="facebook/opt-350m",

#     use_gradient_checkpointing=True,

#     stabilize=True,
#     apply_lora=True,
#     load_in_4bit=True,
#     prepare_model_for_kbit_training=True,
# )
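
The trade-off can be made concrete with a toy cost model. Assume a stack of `num_layers` identical layers, split into segments of `segment_size` layers: only each segment's input is kept during the forward pass, and a segment's activations are recomputed when the backward pass reaches it. This is a simplified illustration of the general technique, not a description of how `xllm` or PyTorch account for memory. The function and its units (one unit = one layer's activations) are made up for the example.

```python
import math


def checkpointing_cost(num_layers: int, segment_size: int = 0) -> dict:
    """Toy estimate of peak stored activations and forward-layer calls.

    segment_size=0 means no checkpointing: every layer's activations are kept.
    """
    if segment_size <= 0:
        return {"peak_stored": num_layers, "forward_layer_calls": num_layers}
    num_segments = math.ceil(num_layers / segment_size)
    return {
        # one saved input per segment, plus one segment recomputed at a time
        "peak_stored": num_segments + segment_size,
        # the normal forward pass plus one recomputation of every layer
        "forward_layer_calls": 2 * num_layers,
    }


print(checkpointing_cost(24))                  # {'peak_stored': 24, 'forward_layer_calls': 24}
print(checkpointing_cost(24, segment_size=4))  # {'peak_stored': 10, 'forward_layer_calls': 48}
```

In this toy model, stored activations drop from 24 units to 10 while the forward work doubles, which is the memory-for-speed exchange described above.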

# Add eval data

## Setup config

- `do_eval`: turns on evaluation
- `eval_steps`: how often evaluation runs, measured in training steps

In [None]:
config = Config(
    model_name_or_path="facebook/opt-350m",
    stabilize=True,
    apply_lora=True,
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
    do_eval=True,
    eval_steps=50,
)
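
With `eval_steps=50`, evaluation runs every 50 optimization steps. Here is a small sketch of the resulting schedule, assuming the 150 total steps seen in the earlier runs of this notebook and that evaluation fires whenever the step number is a multiple of `eval_steps`. The helper is illustrative and not part of `xllm`.

```python
def evaluation_steps(total_steps: int, eval_steps: int) -> list:
    """Steps at which evaluation runs, if it fires on every multiple of eval_steps."""
    return list(range(eval_steps, total_steps + 1, eval_steps))


print(evaluation_steps(total_steps=150, eval_steps=50))  # [50, 100, 150]
```

These are the three steps (50, 100 and 150) that appear in the results table of the run further down.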

## Make dummy eval dataset

In [None]:
eval_data = ["Hi", "Sup?"] * 10

## Make an `xllm` eval dataset

In [None]:
eval_dataset = GeneralDataset.from_list(eval_data)

## Init experiment with the `eval_dataset`

In [None]:
experiment = Experiment(config=config, train_dataset=train_dataset, eval_dataset=eval_dataset)

## Build experiment

In [None]:
experiment.build()

2023-11-14 16:01:09.943 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-11-14 16:01:09.947 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": false,
  "trainer_key": "lm",
  "force_fp32": false,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": true,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_

## Run experiment

In [None]:
experiment.run()

2023-11-14 16:01:12.278 | INFO     | xllm.utils.logger:info:86 - Training will start soon
***** Running training *****
  Num examples = 300
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 1
  Total optimization steps = 150
  Number of trainable parameters = 3,563,520


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50   | 3.2167        | 7.310188        |
| 100  | 0.8291        | 8.282642        |
| 150  | 0.4285        | 8.972834        |


***** Running Evaluation *****
  Num examples = 20
  Batch size = 2
***** Running Evaluation *****
  Num examples = 20
  Batch size = 2
Saving model checkpoint to ./outputs/checkpoint-100
***** Running Evaluation *****
  Num examples = 20
  Batch size = 2


Training completed. Do not forget to share your model on huggingface.co/models =)


2023-11-14 16:02:09.046 | INFO     | xllm.utils.logger:info:86 - Training end
2023-11-14 16:02:09.048 | INFO     | xllm.utils.logger:info:86 - Model saved to ./outputs/


# 🎉 You are awesome!

## Now you know how to prototype models using `xllm`

### Explore more examples at X—LLM repo

https://github.com/BobaZooba/xllm

Useful materials:

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Quickstart](https://github.com/KompleteAI/xllm/tree/docs-v1#quickstart-): basics of `xllm`
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Guide](https://github.com/BobaZooba/xllm/blob/main/GUIDE.md): here, we go into detail about everything the library can
  do
- [Demo project](https://github.com/BobaZooba/xllm-demo): here's a minimal step-by-step example of how to use X—LLM and fit it
  into your own project
- [WeatherGPT](https://github.com/BobaZooba/wgpt): this repository features an example of how to utilize the xllm library. Included is a solution for a common type of assessment given to LLM engineers, who typically earn between $120,000 and $140,000 annually
- [Shurale](https://github.com/BobaZooba/shurale): project with the finetuned 7B Mistral model


## Tale Quest

`Tale Quest` is my personal project, built using `xllm` and `Shurale`. It's an interactive text-based game
in `Telegram` with dynamic AI characters, offering infinite scenarios.

You will go on exciting journeys and complete fascinating quests. Chat
with `George Orwell`, `Tech Entrepreneur`, `Young Wizard`, `Noir Detective`, `Femme Fatale` and many more.

Try it now: [https://t.me/talequestbot](https://t.me/TaleQuestBot?start=Z2g)