# Llama2 & Mistral AI efficient fine-tuning using QLoRA, bnb int4, gradient checkpointing and X—LLM 🦖

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Quickstart](https://github.com/KompleteAI/xllm/tree/docs-v1#quickstart-): basics of `xllm`
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Guide](https://github.com/BobaZooba/xllm/blob/main/GUIDE.md): a detailed guide to everything the library can do
- [Demo project](https://github.com/BobaZooba/xllm-demo): a minimal step-by-step example of how to use X—LLM and fit it into your own project
- [WeatherGPT](https://github.com/BobaZooba/wgpt): an example of how to use the `xllm` library, including a solution to a common type of assessment given to LLM engineers, who typically earn between $120,000 and $140,000 annually
- [Shurale](https://github.com/BobaZooba/shurale): a project with the fine-tuned 7B Mistral model

# Installation

```python
!pip install --upgrade xllm
```

# Log in to Hugging Face to save the model to the Hub

```python
# !huggingface-cli login
```

# [Optional] Log in to W&B to track the training process

```python
# !wandb login
```

# Prepare

```python
import torch
import xllm

cuda_is_available = torch.cuda.is_available()

print(f"X—LLM version: {xllm.__version__}\nTorch version: {torch.__version__}\nCuda is available: {cuda_is_available}")
# 4-bit (bitsandbytes) training needs a CUDA GPU, so stop early without one
assert cuda_is_available
```

```
X—LLM version: 0.1.0
Torch version: 2.1.0+cu118
Cuda is available: True
```

```python
from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.experiments import Experiment
```

# Prepare dataset

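```python
from datasets import load_dataset

# Anthropic's helpful/harmless dialogues: each row has a "chosen" and a "rejected" conversation
dataset = load_dataset("Anthropic/hh-rlhf")
```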

```python
train_data = list()

# Keep only the preferred ("chosen") conversation of each pair
for sample in dataset["train"]:
    train_data.append({"text": sample["chosen"].strip()})
```

```python
len(train_data)
```

```
160800
```

```python
train_data[10]
```

```
{'text': 'Human: Can you provide me the home address for Senator Elizabeth Warren?\n\nAssistant: Sure, what’s your question?'}
```

```python
# Split each dialogue into turns on blank lines
train_dataset = GeneralDataset(data=train_data, separator="\n\n")
```

```python
train_dataset[10]
```

```
{'text_parts': ['Human: Can you provide me the home address for Senator Elizabeth Warren?',
  'Assistant: Sure, what’s your question?']}
```
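
`GeneralDataset` receives each sample's `text` together with `separator="\n\n"` and, as the output above shows, exposes the dialogue as a list of `text_parts`. The helper below is a hypothetical, stdlib-only sketch of that splitting step; the name `split_into_parts` is ours and not part of `xllm`, and it assumes fragments are stripped and empty ones dropped, which the real class may handle differently.

```python
def split_into_parts(text: str, separator: str = "\n\n") -> dict[str, list[str]]:
    """Hypothetical re-creation of the `text_parts` view shown above.

    Assumption: fragments are stripped and empty fragments are dropped;
    xllm's GeneralDataset may differ in these details.
    """
    parts = [part.strip() for part in text.split(separator)]
    return {"text_parts": [part for part in parts if part]}


sample = (
    "Human: Can you provide me the home address for Senator Elizabeth Warren?"
    "\n\nAssistant: Sure, what’s your question?"
)
print(split_into_parts(sample))
```

Splitting on a blank line works here because hh-rlhf separates speaker turns with `\n\n`; a dataset with a different turn delimiter would need a different `separator`.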

# Make an X—LLM config

```python
config = Config(
    collator_key="lm",
    use_gradient_checkpointing=True,
    # model_name_or_path="TinyPixel/Llama-2-7B-bf16-sharded",
    model_name_or_path="bn22/Mistral-7B-v0.1-sharded",
    use_flash_attention_2=False,  # FlashAttention 2 is not supported on Colab's T4 GPU
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
    apply_lora=True,
    warmup_steps=5,
    max_steps=25,
    logging_steps=1,
    save_steps=25,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    max_length=2048,
    # tokenizer_padding_side="right",  # good for llama2
    # ATTENTION: set your own values
    push_to_hub=False,
    hub_private_repo=True,
    hub_model_id="BobaZooba/AntModel-7B-XLLM-Demo-LoRA",
    # W&B
    report_to_wandb=False,
    wandb_project="xllm-demo",
    wandb_entity="bobazooba",
)
```
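
With `per_device_train_batch_size=2` and `gradient_accumulation_steps=2`, gradients from two micro-batches of 2 dialogues are accumulated before each optimizer step, so every step effectively sees 4 dialogues (the training log reports the same total train batch size of 4). The plain-Python arithmetic below assumes a single GPU (`num_devices = 1` is our assumption for a Colab runtime) and shows how much of the dataset this 25-step demo run covers.

```python
# Values copied from the Config above; num_devices = 1 is an assumption (one Colab GPU).
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_devices = 1
max_steps = 25
num_examples = 160_800  # len(train_data) reported earlier

# Each optimizer step accumulates gradients over several micro-batches on every device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * max_steps
fraction_of_epoch = examples_seen / num_examples

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen in {max_steps} steps: {examples_seen}")
print(f"Fraction of one epoch: {fraction_of_epoch:.4%}")
```

In other words, this run is a smoke test that touches roughly 100 of the 160,800 dialogues; raise `max_steps` for a real fine-tuning run.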

# Make an X—LLM experiment

```python
experiment = Experiment(config=config, train_dataset=train_dataset)
```

## Build experiment

```python
experiment.build()
```

```
2023-11-15 11:54:53.155 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-11-15 11:54:53.161 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": true,
  "trainer_key": "lm",
  "force_fp32": false,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_model/",
  "quantized_hub_

2023-11-15 11:56:27.100 | INFO     | xllm.utils.logger:info:86 - Model prepared for kbit training. Gradient checkpointing: True
2023-11-15 11:56:27.105 | INFO     | xllm.utils.logger:info:86 - Model bn22/Mistral-7B-v0.1-sharded was built
2023-11-15 11:56:27.807 | INFO     | xllm.utils.logger:info:86 - LoRA applied to the model bn22/Mistral-7B-v0.1-sharded
max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend
2023-11-15 11:56:27.837 | INFO     | xllm.utils.logger:info:86 - Trainer LMTrainer was built
2023-11-15 11:56:27.839 | INFO     | xllm.utils.logger:info:86 - Experiment built successfully
```

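The build step loads `bn22/Mistral-7B-v0.1-sharded` in 4-bit and attaches LoRA adapters, and the training log reports 20,971,520 trainable parameters. The stdlib-only arithmetic below shows where numbers of that size come from. It assumes Mistral-7B-v0.1's published dimensions and LoRA rank 8 on all seven linear projections of every decoder layer; we have not checked that this matches `xllm`'s defaults, but it is one configuration that reproduces the logged count exactly. The 4-bit figure is a lower bound that ignores quantization constants and any layers left unquantized.

```python
# Mistral-7B-v0.1 dimensions (from its published config.json).
VOCAB = 32_000
HIDDEN = 4_096
INTERMEDIATE = 14_336
LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)

# (in_features, out_features) of every linear projection inside one decoder layer.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def total_params() -> int:
    """Count the base model's weights (Mistral linear layers have no biases)."""
    linear = sum(d_in * d_out for d_in, d_out in LINEAR_SHAPES.values())
    norms = 2 * HIDDEN  # input and post-attention RMSNorm weights
    embeddings = VOCAB * HIDDEN
    lm_head = VOCAB * HIDDEN  # not tied to the embeddings in Mistral-7B
    final_norm = HIDDEN
    return LAYERS * (linear + norms) + embeddings + lm_head + final_norm


def lora_params(rank: int) -> int:
    """A LoRA adapter on a (d_in, d_out) layer adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in LINEAR_SHAPES.values())
    return LAYERS * per_layer


n_total = total_params()
n_lora = lora_params(rank=8)  # assumed rank; reproduces the logged trainable count
print(f"Base parameters: {n_total:,}")
print(f"bf16 weights:  ~{n_total * 2 / 1e9:.1f} GB")
print(f"4-bit weights: ~{n_total * 0.5 / 1e9:.1f} GB (lower bound)")
print(f"LoRA trainable parameters (r=8): {n_lora:,} ({n_lora / n_total:.2%})")
```

The ratio is the point of QLoRA: the frozen base weights shrink to roughly a quarter of their bf16 size, while the optimizer only has to track about 0.3% of the parameter count.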

```python
experiment.run()
```

```
2023-11-15 11:56:27.851 | INFO     | xllm.utils.logger:info:86 - Training will start soon
***** Running training *****
  Num examples = 160,800
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 2
  Total optimization steps = 25
  Number of trainable parameters = 20,971,520
```

| Step | Training Loss |
|-----:|--------------:|
| 1 | 1.9492 |
| 2 | 1.9778 |
| 3 | 2.0908 |
| 4 | 2.0456 |
| 5 | 2.2288 |
| 6 | 1.7521 |
| 7 | 1.7455 |
| 8 | 1.6458 |
| 9 | 1.6205 |
| 10 | 1.6436 |

```
Saving model checkpoint to ./outputs/checkpoint-25

Training completed. Do not forget to share your model on huggingface.co/models =)

2023-11-15 11:59:22.791 | INFO     | xllm.utils.logger:info:86 - Training end
2023-11-15 11:59:22.793 | INFO     | xllm.utils.logger:info:86 - Model saved to ./outputs/
```

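With an effective batch of only 4 dialogues, the per-step loss is noisy: it climbs to 2.2288 at step 5 before dropping to about 1.64. A trailing moving average, sketched below in plain Python over the ten logged values, makes the downward trend easier to read. The window size of 3 is an arbitrary choice for illustration.

```python
# Per-step training losses copied from the table above (steps 1-10).
losses = [1.9492, 1.9778, 2.0908, 2.0456, 2.2288, 1.7521, 1.7455, 1.6458, 1.6205, 1.6436]


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; the first window-1 points average whatever is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


for step, (raw, smooth) in enumerate(zip(losses, moving_average(losses, window=3)), start=1):
    print(f"step {step:2d}  loss {raw:.4f}  3-step avg {smooth:.4f}")
```

W&B (enabled with `report_to_wandb=True`) can apply similar smoothing in its charts; the sketch only shows what such smoothing computes.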

# After training steps

```python
# # Fuse LoRA weights
# experiment.fuse_lora()
```
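
Fusing folds the trained adapter back into the base weights, so inference no longer needs the extra adapter matmuls. In the standard LoRA formulation a layer computes `W x + (alpha / r) * B A x`, so the merged weight is `W' = W + (alpha / r) * B A`. The pure-Python sketch below checks that identity on toy 2x3 matrices with `alpha` and `rank` chosen for illustration; it is the generic math, not `xllm`'s `fuse_lora` implementation, which works on the real model's tensors.

```python
# Toy matrices stored as lists of rows; shapes follow the usual LoRA convention.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, rank: int) -> Matrix:
    """Merge a LoRA adapter into the base weight: W' = W + (alpha / rank) * B @ A.

    W: d_out x d_in, A: rank x d_in, B: d_out x rank.
    """
    return add_scaled(W, matmul(B, A), alpha / rank)


# Hypothetical toy values: d_in=3, d_out=2, rank=1, alpha=2.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 0.0]]
B = [[0.5], [-1.0]]
alpha, rank = 2.0, 1
x = [[1.0], [2.0], [3.0]]  # column vector input

# Unfused forward pass: W x + (alpha / rank) * B (A x)
unfused = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), alpha / rank)
# Fused forward pass: W' x
fused = matmul(fuse_lora(W, A, B, alpha, rank), x)

print(unfused)  # [[10.0], [-4.0]]
print(fused)    # [[10.0], [-4.0]]
```

The trade-off: a fused model is a single standalone checkpoint, whereas keeping the adapter separate (the next option) lets you push only the small LoRA weights.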

### Or push the LoRA weights to the Hugging Face Hub

```python
# # Push to hub
# experiment.push_to_hub(
#     repo_id="BobaZooba/AntModel-7B-XLLM-Demo",
#     private=True,
# )
```

# 🎉 You are awesome!

## Now you know how to prototype models using `xllm`

### Explore more examples at X—LLM repo

https://github.com/BobaZooba/xllm

## Tale Quest

`Tale Quest` is my personal project, built using `xllm` and `Shurale`. It's an interactive text-based game in `Telegram` with dynamic AI characters, offering infinite scenarios.

Embark on exciting journeys and complete fascinating quests. Chat with `George Orwell`, `Tech Entrepreneur`, `Young Wizard`, `Noir Detective`, `Femme Fatale` and many more.

Try it now: [https://t.me/talequestbot](https://t.me/TaleQuestBot?start=Z2g)