# Llama2 & Mistral AI efficient fine-tuning using QLoRA, bnb int4, gradient checkpointing and X—LLM 🦖

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Quickstart](https://github.com/KompleteAI/xllm/tree/docs-v1#quickstart-): basics of `xllm`
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Guide](https://github.com/BobaZooba/xllm/blob/main/GUIDE.md): here, we go into detail about everything the library can
  do
- [Demo project](https://github.com/BobaZooba/xllm-demo): here's a minimal step-by-step example of how to use X—LLM and fit it
  into your own project
- [WeatherGPT](https://github.com/BobaZooba/wgpt): an example of how to use the `xllm` library. It includes a solution to a common type of assessment given to LLM engineers, who typically earn between $120,000 and $140,000 annually
- [Shurale](https://github.com/BobaZooba/shurale): project with the fine-tuned 7B Mistral model

# Installation

In [1]:
# !pip install --upgrade xllm

# Login to HuggingFace to save model to the hub

In [1]:
import os

from huggingface_hub import login

# Read the token from the environment instead of hardcoding it in the notebook
login(os.environ["HF_TOKEN"])

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/louis/.cache/huggingface/token
Login successful


# [Optional] Login to W&B to save training process

In [2]:
# !wandb login
import os

import wandb

# Read the API key from the environment instead of hardcoding it in the notebook
wandb.login(key=os.environ["WANDB_API_KEY"])

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mlstam[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/louis/.netrc


True

# Prepare

In [4]:
# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [3]:
import torch
import xllm

cuda_is_available = torch.cuda.is_available()

print(f"X—LLM version: {xllm.__version__}\nTorch version: {torch.__version__}\nCuda is available: {cuda_is_available}")
assert cuda_is_available

X—LLM version: 0.1.7
Torch version: 2.1.1+cu121
Cuda is available: True


In [4]:
from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.experiments import Experiment

# Prepare dataset

In [5]:
from datasets import load_dataset

# dataset = load_dataset("Anthropic/hh-rlhf")
dataset = load_dataset("LsTam/cquae_lrec")
dataset

DatasetDict({
    train: Dataset({
        features: ['title', 'output', 'qid', 'documents', 'question'],
        num_rows: 10490
    })
    eval: Dataset({
        features: ['title', 'output', 'qid', 'documents', 'question'],
        num_rows: 407
    })
    test: Dataset({
        features: ['title', 'output', 'qid', 'documents', 'question'],
        num_rows: 558
    })
})

In [8]:
# train_data = list()

# for sample in dataset["train"]:
#     train_data.append({"text": sample["chosen"].strip()})

In [19]:
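def prepare_data(dataset):
    """Build one training text per sample: instruction, question, documents, then the answer."""
    data = list()

    for sample in dataset:
        data.append({
            "text": (
                "Réponds à la question suivante en t'appuyant exclusivement sur le document fourni:"
                f" {sample['question']} documents: {sample['title']} {' '.join(sample['documents'])}"
                # "target: " must match the separator passed to GeneralDataset
                f"target: {sample['output']}"
            )
        })
    return data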


In [12]:
# def template_mistral(data):
#     # return {
#     #     "input": (
#     #         f"<s>[INST] Réponds à la question suivante en t'appuyant exclusivement sur le document fourni:"
#     #         f" {data['question']} documents: {data['title']} {' '.join(data['documents'])}  [/INST]"
#     #         ),
#     #     "target": data['output'],
#     #         }
#     return (
#             f"<s>[INST] Réponds à la question suivante en t'appuyant exclusivement sur le document fourni:"
#             f" {data['question']} documents: {data['title']} {' '.join(data['documents'])}  [/INST]"
#     )

In [21]:
train_dataset = GeneralDataset(data=prepare_data(dataset['train']), separator="target: ")
eval_dataset = GeneralDataset(data=prepare_data(dataset['eval']), separator="target: ")
# train_dataset = GeneralDataset.from_list(data=td, separator="target: ")
# GeneralDataset.from_list(data=train_dataset)
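
The sample printed in the next cell is stored as `text_parts`, which suggests that the `separator` argument is used to split each text into a prompt part and an answer part. The sketch below reproduces that splitting in plain Python. It is an assumption about the behaviour, not `xllm`'s actual implementation; `build_text`, `split_text`, and the sample record are hypothetical names introduced for illustration.

```python
SEPARATOR = "target: "  # same string as the separator passed to GeneralDataset


def build_text(sample: dict) -> str:
    """Mirror the prompt layout used by prepare_data (hypothetical helper)."""
    return (
        "Réponds à la question suivante en t'appuyant exclusivement sur le document fourni:"
        f" {sample['question']} documents: {sample['title']} {' '.join(sample['documents'])}"
        f"{SEPARATOR}{sample['output']}"
    )


def split_text(text: str, separator: str = SEPARATOR) -> list:
    """Split a text on the separator and strip each part (assumed behaviour)."""
    return [part.strip() for part in text.split(separator)]


# Hypothetical record with the same fields as the dataset rows
sample = {
    "question": "Quelle est la capitale de la France ?",
    "title": "Géographie",
    "documents": ["Paris est la capitale de la France."],
    "output": "Paris.",
}

parts = split_text(build_text(sample))
print(parts)
```

One caveat of this layout: if a document or question itself contains the string `target: `, a plain split would produce more than two parts, so a rarer separator may be safer.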

In [14]:
train_dataset[5], eval_dataset[5]

({'text_parts': ["Réponds à la question suivante en t'appuyant exclusivement sur le document fourni: Comment est perçu The Shard dans le monde ? documents: Londres, une métropole de rang mondial « The Shard », nouveau symbole de la puissance de Londres, a été inaugurée en 2013 sur la rive Sud de la Tamise. Cette tour fait face au quartier d’affaires de la City.",
   'The Shard est perçu dans le monde comme le nouveau symbole de la puissance de Londres.']},
 {'text_parts': ['Réponds à la question suivante en t\'appuyant exclusivement sur le document fourni: Pourquoi le désert emprunté par le pèlerin bouddhiste du VIIe siècle après J.‑C., Xuanzang, est appelé "Fleuve du sable" ? documents: Les routes de la soie, une aventure !  Xuanzang, pèlerin bouddhiste du VIIe siècle après\xa0J.‑C., voyage durant dix-neuf ans sur les routes de la soie et en décrit les dangers.\n\n Il entra dans le désert que les anciens appelaient Le Fleuve de Sable\xa0: on n\'y voit ni oiseau ni animal, ni eau ni pâ

# Make an X—LLM config

In [15]:
config = Config(
    collator_key="lm",
    use_gradient_checkpointing=True,
    # model_name_or_path="TinyPixel/Llama-2-7B-bf16-sharded",
    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.1",
    use_flash_attention_2=True,  # not supported in colab
    load_in_4bit=True,
    prepare_model_for_kbit_training=True,
    apply_lora=True,
    warmup_steps=5,
    # max_steps=25,
    logging_steps=5,
    save_steps=25,
    num_train_epochs=2,

    device_map={'':0},
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    max_length=1024, #2048, #3072,

    # tokenizer_padding_side="right",  # good for llama2

    # ATTENTION: set your values
    push_to_hub=True,
    hub_private_repo=True,
    hub_model_id="LsTam/mistral-xllm-7B-LoRA",

    # W&B
    report_to_wandb=True,
    wandb_project="xllm-demo",
    wandb_entity="mistral-xllm",
)

2023-12-02 00:40:49.043 | INFO     | xllm.utils.logger:info:86 - Environment variable WANDB_PROJECT set
2023-12-02 00:40:49.044 | INFO     | xllm.utils.logger:info:86 - Environment variable WANDB_ENTITY set


# Make an X—LLM experiment

In [16]:
experiment = Experiment(config=config, train_dataset=train_dataset, eval_dataset=eval_dataset)
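
Before launching training, the effective batch size and the number of optimizer steps can be estimated from the config values and the 10,490 training rows. The sketch below assumes a single GPU and the step-count convention of the Hugging Face `Trainer` (batches per epoch, integer-divided by the accumulation steps); the variable names are illustrative.

```python
import math

num_examples = 10_490       # rows in dataset["train"]
per_device_batch_size = 2   # per_device_train_batch_size
grad_accum_steps = 32       # gradient_accumulation_steps
num_epochs = 2              # num_train_epochs
num_devices = 1             # assumption: single GPU, as device_map={'': 0} suggests

# Samples contributing to one optimizer update
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices

# Dataloader batches per epoch, then optimizer updates per epoch
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)

total_steps = math.ceil(num_epochs * updates_per_epoch)

print(effective_batch_size, updates_per_epoch, total_steps)  # 64 163 326
```

These values agree with the "Total train batch size" of 64 and the 326 optimization steps reported in the training log. With `warmup_steps=5` and `save_steps=25`, that corresponds to roughly 13 checkpoints over the run.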

## Build experiment

In [17]:
experiment.build()

2023-12-02 00:40:49.059 | INFO     | xllm.utils.logger:info:86 - Experiment building has started
2023-12-02 00:40:49.060 | INFO     | xllm.utils.logger:info:86 - Config:
{
  "experiment_key": "base",
  "save_safetensors": true,
  "max_shard_size": "10GB",
  "local_rank": 0,
  "use_gradient_checkpointing": true,
  "trainer_key": "lm",
  "force_fp32": false,
  "force_fp16": false,
  "from_gptq": false,
  "huggingface_hub_token": null,
  "deepspeed_stage": 0,
  "deepspeed_config_path": null,
  "fsdp_strategy": "",
  "fsdp_offload": true,
  "seed": 42,
  "stabilize": false,
  "norm_fp32": false,
  "path_to_env_file": "./.env",
  "prepare_dataset": true,
  "lora_hub_model_id": null,
  "lora_model_local_path": null,
  "fused_model_local_path": null,
  "fuse_after_training": false,
  "quantization_dataset_id": null,
  "quantization_max_samples": 1024,
  "quantized_model_path": "./quantized_mode

In [18]:
experiment.run()

2023-12-02 00:41:44.560 | INFO     | xllm.utils.logger:info:86 - Training will start soon
***** Running training *****
  Num examples = 10,490
  Num Epochs = 2
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 32
  Total optimization steps = 326
  Number of trainable parameters = 20,971,520
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.bfloat16.


| Step | Training Loss |
|------|---------------|
| 1    | 1.9068        |
| 5    | 1.844         |
| 10   | 1.597         |
| 15   | 1.5571        |
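
The log above reports 20,971,520 trainable parameters. The LoRA rank and target modules are not visible in the truncated config dump, so the sketch below is only one combination that reproduces this number: rank 8 applied to all seven linear projections of each decoder layer, using the published Mistral-7B dimensions. A LoRA adapter on a linear layer adds two matrices, `A` of shape `(rank, in_features)` and `B` of shape `(out_features, rank)`.

```python
# Mistral-7B architecture dimensions
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # 8 key/value heads * 128 head dim (grouped-query attention)
NUM_LAYERS = 32

RANK = 8  # assumption: not confirmed by the truncated config output

# (in_features, out_features) of each projection assumed to receive an adapter
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in projections.values())
total = per_layer * NUM_LAYERS

print(f"{total:,}")  # 20,971,520
```

Under these assumptions the adapters amount to well under 1% of the roughly 7 billion base parameters, which is why they fit alongside the 4-bit quantized weights on a single GPU.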


# After training steps

In [1]:
# # Fuse LoRA weights
# experiment.fuse_lora()

### Or push LoRA weights to HuggingFace Hub

In [None]:
# # Push to hub
# experiment.push_to_hub(
#     repo_id="BobaZooba/AntModel-7B-XLLM-Demo",
#     private=True,
# )

# 🎉 You are awesome!

## Now you know how to prototype models using `xllm`

### Explore more examples at X—LLM repo

https://github.com/BobaZooba/xllm

Useful materials:

- [X—LLM Repo](https://github.com/BobaZooba/xllm): main repo of the `xllm` library
- [Quickstart](https://github.com/KompleteAI/xllm/tree/docs-v1#quickstart-): basics of `xllm`
- [Examples](https://github.com/BobaZooba/xllm/examples): minimal examples of using `xllm`
- [Guide](https://github.com/BobaZooba/xllm/blob/main/GUIDE.md): here, we go into detail about everything the library can
  do
- [Demo project](https://github.com/BobaZooba/xllm-demo): here's a minimal step-by-step example of how to use X—LLM and fit it
  into your own project
- [WeatherGPT](https://github.com/BobaZooba/wgpt): an example of how to use the `xllm` library. It includes a solution to a common type of assessment given to LLM engineers, who typically earn between $120,000 and $140,000 annually
- [Shurale](https://github.com/BobaZooba/shurale): project with the fine-tuned 7B Mistral model



## Tale Quest

`Tale Quest` is my personal project, built using `xllm` and `Shurale`. It's an interactive text-based game
in `Telegram` with dynamic AI characters, offering infinite scenarios.

You will go on exciting journeys and complete fascinating quests. Chat
with `George Orwell`, `Tech Entrepreneur`, `Young Wizard`, `Noir Detective`, `Femme Fatale` and many more.

Try it now: [https://t.me/talequestbot](https://t.me/TaleQuestBot?start=Z2g)