### Unsloth Framework


*   https://unsloth.ai/
*   A library for fine-tuning LLMs faster and with less memory
*   Headline claim: up to 30x faster training and a 10% accuracy boost
*   How? By manually deriving the compute-heavy math steps and handwriting GPU kernels, Unsloth speeds up training without any hardware changes.
*   60% less memory usage
*   NVIDIA, AMD & Intel GPU support
*   Supports NVIDIA GPUs from 2018 onward, with a minimum CUDA Compute Capability of 7.0 (V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40, etc.)
*   Essentially rewrites the PyTorch model code to optimize fine-tuning
*   Supports 4-bit and 16-bit QLoRA / LoRA fine-tuning via bitsandbytes
*   Claim: 0% loss in accuracy
*   Goal: making AI training easier for everyone
*   Free version:
    *   2.2x faster
    *   40% less memory
    *   Supports Llama 1 and Llama 2
    *   Single GPU only
    *   Supports 4-bit and 16-bit LoRA
*   https://github.com/unslothai/unsloth
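To check the hardware requirement above on a given GPU, compare its CUDA Compute Capability with 7.0. This is a minimal sketch: `meets_unsloth_minimum` and `native_bf16` are hypothetical helpers, and the 8.0 threshold for bfloat16 is an assumption based on native bf16 arriving with Ampere GPUs. In training code, `torch.cuda.is_bf16_supported()` (used later in this notebook) is the check to rely on.

```python
# Hypothetical helpers: compare a GPU's CUDA Compute Capability (major, minor)
# against the thresholds discussed above. Tuples compare element by element.
from typing import Tuple

MIN_CAPABILITY = (7, 0)   # Unsloth's stated minimum (V100, T4, RTX 20 series, ...)
BF16_CAPABILITY = (8, 0)  # assumption: native bfloat16 starts with Ampere (A100, RTX 30 series)


def meets_unsloth_minimum(capability: Tuple[int, int]) -> bool:
    """True if the GPU's compute capability is at least 7.0."""
    return tuple(capability) >= MIN_CAPABILITY


def native_bf16(capability: Tuple[int, int]) -> bool:
    """Rough stand-in for torch.cuda.is_bf16_supported()."""
    return tuple(capability) >= BF16_CAPABILITY
```

On a real machine, pass these helpers `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple, for example `(7, 5)` on a Tesla T4. That GPU meets the 7.0 minimum but lacks native bfloat16, consistent with the `Bfloat16 = FALSE` line Unsloth prints when loading the model.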



### Our Task


*   Fine-tune a model for text generation
*   Dataset: IMDB movie reviews



In [None]:
!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth[colab]@ git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-qvrhjgox/unsloth_be5fbc89460841419b3232a612937a88
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-qvrhjgox/unsloth_be5fbc89460841419b3232a612937a88
  Resolved https://github.com/unslothai/unsloth.git to commit 3e4c5a323c16bbda2c92212b790073c4e99c2a55
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting bitsandbytes (from unsloth[colab]@ git+https://github.com/unslothai/unsloth.git)
  Downloading bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.0/105.0 MB 8.7 MB/s eta 0:00:00
Collecting xformers@ https://download.p

In [None]:
# Install Hugging Face Transformers from source

!pip install "git+https://github.com/huggingface/transformers.git"

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-h69of3tr
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-h69of3tr
  Resolved https://github.com/huggingface/transformers.git to commit c8d98405a8f7b0e5d07391b671dcc61bb9d7bad5
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.39.0.dev0-py3-none-any.whl size=8552478 sha256=c71b0959dc075d4e18e893d8506bb8298af620c5d651083e77d7ef990d6a560f
  Stored in directory: /tmp/pip-ephem-wheel-cache-rwd67a4r/wheels/e7/9c/5b/e1a9c8007c343041e61cc484433d512ea9274272e3fcbe7c16
Successfully bu

In [None]:
# TRL: train transformer language models with reinforcement learning (also provides SFTTrainer)
# https://github.com/huggingface/trl
!pip install trl



### Set up Unsloth FastLanguageModel

In [None]:
# Imports: PyTorch, Transformers training arguments, TRL's SFTTrainer and Unsloth's FastLanguageModel
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

### Load Movie Dataset

In [None]:
# Load the dataset with the Hugging Face `datasets` library

from datasets import load_dataset

In [None]:
max_seq_length = 2048

In [None]:
# Load https://huggingface.co/datasets/imdb

dataset = load_dataset("imdb", split="train")

In [None]:
dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})

### Model and Tokenizer



*   Base model: Mistral 7B
*   Quantization library: bitsandbytes (bnb)
*   4-bit quantized model
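Some back-of-the-envelope arithmetic shows why a 4-bit checkpoint is used on a T4 with about 14.7 GB of usable memory. This is a sketch, not a measurement: it assumes the architecture values from the public Mistral-7B-v0.1 `config.json` (hidden size 4096, MLP size 14336, 32 layers, 32 query heads, 8 key/value heads, vocabulary 32000), counts only the weights, and assumes that only the linear layers inside the decoder blocks are stored in 4 bits while the embeddings, `lm_head` and norm weights stay in 16 bits. Per-block quantization constants, activations and optimizer state are ignored.

```python
# Back-of-the-envelope weight count and memory for Mistral 7B.
# Assumed architecture values (Mistral-7B-v0.1 config.json).
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
HEADS = 32
KV_HEADS = 8
HEAD_DIM = HIDDEN // HEADS      # 128
VOCAB = 32000
KV_DIM = KV_HEADS * HEAD_DIM    # 1024 (grouped-query attention)

# Linear layers inside one decoder block (Mistral uses no biases).
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v
mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, down
block_linear = attention + mlp
block_norms = 2 * HIDDEN                               # two RMSNorm weight vectors per block

linear_params = LAYERS * block_linear
other_params = 2 * VOCAB * HIDDEN + LAYERS * block_norms + HIDDEN  # embed, lm_head, norms
total_params = linear_params + other_params

fp16_bytes = 2 * total_params
# Assumption: 0.5 bytes per block linear weight, 2 bytes for everything else.
nf4_bytes = linear_params // 2 + 2 * other_params

if __name__ == "__main__":
    print(f"parameters: {total_params:,}")               # parameters: 7,241,732,096
    print(f"16-bit weights: {fp16_bytes / 1e9:.2f} GB")  # 16-bit weights: 14.48 GB
    print(f"4-bit weights: {nf4_bytes / 1e9:.2f} GB")    # 4-bit weights: 4.01 GB
```

Under these assumptions the weights alone drop from about 14.5 GB to about 4 GB, which leaves room on the GPU for the LoRA adapters, activations and optimizer state.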



In [None]:
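model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # Mistral 7B, pre-quantized to 4 bits with bitsandbytes
    max_seq_length = max_seq_length,
    dtype = None,         # None = auto-detect (float16 on T4/V100, bfloat16 on Ampere and newer)
    load_in_4bit = True,  # load the 4-bit weights to cut GPU memory use
)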

==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Apache 2 free license: http://github.com/unslothai/unsloth





### Get PEFT Model


*   Model patching
*   Add fast LoRA adapter weights
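For reference, the standard LoRA formulation keeps each frozen weight matrix and learns a low-rank update next to it; the scale comes from `lora_alpha` (α) and `r` in the `get_peft_model` call. With `r = 16` and `lora_alpha = 16`, as used in this notebook, α/r = 1.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d_{out} \times d_{in}} \text{ (frozen)},\;
B \in \mathbb{R}^{d_{out} \times r},\;
A \in \mathbb{R}^{r \times d_{in}}

\text{trainable parameters per adapted layer} = r\,(d_{in} + d_{out})
```

Only A and B are trained; here the frozen W_0 stays in its 4-bit form.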




In [None]:
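model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    # "o_proj" is not listed, so the attention output projection gets no LoRA adapter
    target_modules = ["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,  # LoRA scaling: alpha / r = 1.0 here
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,  # recompute activations to save memory
    random_state = 3407,
    max_seq_length = max_seq_length,
)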

Unsloth cannot patch O projection layer with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 0 O layers and 32 MLP layers.
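The trainer output later reports 37,748,736 trainable parameters. That number can be reproduced from the per-layer LoRA cost r(d_in + d_out), a useful sanity check on `r` and `target_modules`. The sketch assumes the Mistral-7B-v0.1 layer shapes (hidden size 4096, 8 key/value heads of size 128, MLP size 14336, 32 layers), and `lora_params` is a hypothetical helper. It also shows the extra cost of adding `"o_proj"`, the attention output projection that the log above reports as unpatched ("0 O layers") because it is not in `target_modules`.

```python
# Count LoRA parameters: each adapted d_in -> d_out linear layer gains
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) trainable weights.
HIDDEN = 4096         # assumed Mistral-7B-v0.1 hidden size
KV_DIM = 8 * 128      # assumed 8 key/value heads of size 128
INTERMEDIATE = 14336  # assumed MLP intermediate size
LAYERS = 32
R = 16                # matches r = 16 in get_peft_model

# (in_features, out_features) of each projection in one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(target_modules, r=R, layers=LAYERS):
    """Total LoRA weights added across all decoder layers."""
    per_layer = sum(r * (d_in + d_out)
                    for d_in, d_out in (SHAPES[name] for name in target_modules))
    return layers * per_layer


used = ["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"]

if __name__ == "__main__":
    print(f"{lora_params(used):,}")               # 37,748,736
    print(f"{lora_params(used + ['o_proj']):,}")  # 41,943,040
```

Adding `"o_proj"` would cost about 4.2M more parameters (16 × 8192 per layer), roughly 11% more adapter weights.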


### Supervised Fine-tuning Trainer

In [None]:
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",  # train on the raw review text
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size = 2 x 4 = 8
        warmup_steps = 10,
        fp16 = not torch.cuda.is_bf16_supported(),  # use fp16 where bfloat16 is unavailable (e.g. T4)
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "unsloth-test",
        optim = "adamw_8bit",  # 8-bit AdamW (bitsandbytes) to save optimizer memory
        seed = 3407,
        max_steps = 50,  # short demo run, far less than one epoch
    ),
)

trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 25,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 50
 "-____-"     Number of trainable parameters = 37,748,736


| Step | Training Loss |
|-----:|--------------:|
| 1 | 2.474 |
| 2 | 2.2008 |
| 3 | 2.1869 |
| 4 | 2.3131 |
| 5 | 2.365 |
| 6 | 2.4827 |
| 7 | 2.2451 |
| 8 | 2.2532 |
| 9 | 2.128 |
| 10 | 2.3405 |


TrainOutput(global_step=50, training_loss=2.293449935913086, metrics={'train_runtime': 621.9741, 'train_samples_per_second': 0.643, 'train_steps_per_second': 0.08, 'total_flos': 8102006570188800.0, 'train_loss': 2.293449935913086, 'epoch': 0.02})
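The metrics in `TrainOutput` follow directly from the trainer settings. This small sketch recomputes them from the values shown above (25,000 examples, batch size 2 with 4 accumulation steps on one GPU, 50 steps, 621.9741 s runtime). The perplexity line assumes the reported loss is the mean per-token cross-entropy in nats, the usual convention for causal language models in Transformers; since it averages over only 50 noisy steps, treat it as a rough indicator.

```python
import math

# Values taken from the trainer banner and TrainOutput above.
num_examples = 25_000
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 50
runtime_s = 621.9741
train_loss = 2.293449935913086

effective_batch = per_device_batch * grad_accum * num_gpus  # 8 sequences per optimizer step
samples_seen = effective_batch * max_steps                  # 400
steps_per_epoch = num_examples // effective_batch           # 3125
epoch_fraction = samples_seen / num_examples                # 0.016, shown as 0.02 after rounding

samples_per_s = samples_seen / runtime_s                    # about 0.643
steps_per_s = max_steps / runtime_s                         # about 0.08
perplexity = math.exp(train_loss)                           # about 9.9

if __name__ == "__main__":
    print(effective_batch, samples_seen, steps_per_epoch, epoch_fraction)
    print(round(samples_per_s, 3), round(steps_per_s, 3), round(perplexity, 2))
```

In other words, this run saw only 400 of the 25,000 reviews; a full epoch at these settings would take 3,125 steps.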

### Test the model

In [None]:
# Tokenize the prompt and move the tensors to the GPU
inputs = tokenizer(
    [
        "I really like the movie because it talks about humanity and shows human emotions"
    ],
    return_tensors = "pt",
).to("cuda")

In [None]:
inputs

{'input_ids': tensor([[    1,   315,  1528,   737,   272,  5994,  1096,   378, 15066,   684,
         17676,   304,  4370,  2930, 13855]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [None]:
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [None]:
outputs

tensor([[    1,   315,  1528,   737,   272,  5994,  1096,   378, 15066,   684,
         17676,   304,  4370,  2930, 13855, 28723,   661,   349,   264,  1215,
          1179,  5994,   304,   315,  6557,   378,   298,  3376, 28723,   661,
           349,   264,  1215,  1179,  5994,   304,   315,  6557,   378,   298,
          3376, 28723,   661,   349,   264,  1215,  1179,  5994,   304,   315,
          6557,   378,   298,  3376, 28723,   661,   349,   264,  1215,  1179,
          5994,   304,   315,  6557,   378,   298,  3376, 28723,   661,   349,
           264,  1215,  1179,  5994,   304,   315,  6557,   378,   298,  3376,
         28723,   661,   349,   264,  1215,  1179,  5994,   304,   315,  6557,
           378,   298,  3376, 28723,   661,   349,   264,  1215,  1179,  5994,
           304,   315,  6557,   378,   298,  3376, 28723,   661,   349,   264,
          1215,  1179,  5994,   304,   315,  6557,   378,   298,  3376, 28723,
           661,   349,   264,  1215,  1179,  5994,  

In [None]:
# Decode the generated token IDs back into text

tokenizer.batch_decode(outputs)

['<s> I really like the movie because it talks about humanity and shows human emotions. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it']
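The continuation above degenerates into one sentence repeated until the 128-token limit cuts it off, a well-known failure mode of greedy decoding (the default when the generation config does not enable sampling). Below is a minimal sketch of a hypothetical helper, `sentence_repetition`, that quantifies this kind of looping on a decoded string. Splitting on periods is a crude assumption that happens to work here, and `decoded` is a shortened excerpt of the output above.

```python
from collections import Counter


def sentence_repetition(text: str):
    """Return (most repeated sentence, its count, fraction of all sentences).

    Hypothetical helper: sentences are naively split on '.'.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return "", 0, 0.0
    sentence, count = Counter(sentences).most_common(1)[0]
    return sentence, count, count / len(sentences)


# Shortened excerpt of the decoded output above
decoded = (
    "<s> I really like the movie because it talks about humanity and shows human "
    "emotions. It is a very good movie and I recommend it to everyone. "
    "It is a very good movie and I recommend it to everyone. "
    "It is a very good movie and I recommend it"
)

if __name__ == "__main__":
    print(sentence_repetition(decoded))
```

Standard `generate` options such as `repetition_penalty`, `no_repeat_ngram_size`, or sampling with `do_sample=True` are common ways to reduce this looping; none of them were tried in this notebook.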

### Save the model

In [None]:
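# For a PEFT model this saves the LoRA adapter weights and config, not a full copy of the base model
model.save_pretrained("my_model_via_unsloth")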