In [1]:
!pip install -qqq "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" --progress-bar off
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install -qqq --no-deps {xformers} trl peft accelerate bitsandbytes triton --progress-bar off

import torch
from trl import SFTTrainer
from datasets import load_dataset
from transformers import TrainingArguments, TextStreamer
from unsloth.chat_templates import get_chat_template
from unsloth import FastLanguageModel, is_bfloat16_supported

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for unsloth (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.
grpcio-status 1.62.3 requires protobuf>=4.21.6, but you have protobuf 3.20.3 which is incompatible.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## 1. Load model for PEFT

In [2]:
# Load model
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name= "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

# Attach LoRA adapters to the model for parameter-efficient fine-tuning (PEFT)
# Source: https://huggingface.co/docs/peft/en/index
model = FastLanguageModel.get_peft_model(
    model,
    r=8, # Hyperparameter: rank of the adapter matrices. A higher rank adds trainable parameters, so the adapter can capture more complex patterns at the cost of memory and compute.

    lora_alpha=16, # Hyperparameter: alpha is a scaling factor that controls how strongly the adapter's update is applied on top of the base model's weights.
    # A higher alpha means the LoRA layers act more strongly on the base model.

    lora_dropout=0, # Hyperparameter: dropout probability applied to the LoRA layers (0 disables it)

    # Which modules should the adapters target?
    # Check the model architecture first to confirm the module names.
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],

    # Rank-stabilized LoRA (rsLoRA): the adapters are divided by the square root of the rank
    # rather than by the rank itself.
    # According to the paper linked below, this scaling gives a fine-tuning compute/performance trade-off:
    # larger ranks trade more training compute for better fine-tuning performance,
    # with no change in inference cost.
    use_rslora=True, # https://huggingface.co/papers/2312.03732
    use_gradient_checkpointing="unsloth"
)
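print(model.print_trainable_parameters()) # print_trainable_parameters() prints the summary itself and returns None, so the outer print() adds the "None" line to the output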

==((====))==  Unsloth 2024.12.3: Fast Mistral patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/4.14G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/157 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/141k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/446 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

Unsloth 2024.12.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


trainable params: 20,971,520 || all params: 7,268,995,072 || trainable%: 0.2885
None
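As a sanity check, the reported trainable-parameter count can be reproduced by hand: a rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers). Those values are not printed anywhere in this notebook, so treat them as assumptions.

```python
# Assumed Mistral-7B layer shapes; these are NOT taken from this notebook's output.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32
RANK = 8  # matches r=8 in get_peft_model above

# (d_in, d_out) for each module listed in target_modules
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

def lora_param_count(d_in, d_out, rank):
    """LoRA adds matrix A (rank x d_in) and matrix B (d_out x rank) per targeted layer."""
    return rank * (d_in + d_out)

per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in module_shapes.values())
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,}")  # 20,971,520
```

Under these assumptions the result matches the `trainable params: 20,971,520` line printed above.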


LoRA applies an additional scaling coefficient when the adapter weights are added to the pretrained weights during the forward pass. In standard LoRA, the coefficient depends on the rank r discussed earlier and on a second hyperparameter α (alpha):

```scaling = alpha / r```

```weight += (lora_B @ lora_A) * scaling```

With ```use_rslora=True```, as set in the cell above, the rank is replaced by its square root, so the coefficient becomes ```alpha / sqrt(r)```.
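The two formulas above can be traced on a toy example. This is a minimal pure-Python sketch with made-up 2x2 weights and a rank-1 adapter; the helper names `matmul` and `merge_lora` are hypothetical and are not part of PEFT or Unsloth.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_A, lora_B, alpha, r):
    """Return weight + (lora_B @ lora_A) * (alpha / r), the standard LoRA merge."""
    scaling = alpha / r
    delta = matmul(lora_B, lora_A)
    return [
        [w + d * scaling for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy values chosen for illustration only.
weight = [[1.0, 0.0], [0.0, 1.0]]  # d_out x d_in = 2 x 2
lora_A = [[1.0, 2.0]]              # r x d_in     = 1 x 2
lora_B = [[0.5], [0.0]]            # d_out x r    = 2 x 1

merged = merge_lora(weight, lora_A, lora_B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

Doubling `alpha` doubles the whole update `lora_B @ lora_A`, which is why alpha acts as a global volume control on whatever the adapter has learned.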

Interesting read:

Alpha is a scaling parameter.

```alpha = rank``` scales the weights at 1.0.

Whatever you train in the LoRA weights is then merged into the model's main weights at a factor of 1.0.

People previously suggested alpha = 2 x rank, which is like yelling at your model really loudly in order to make the newly learned weights "louder" than the model's own. That requires a really good and large dataset; otherwise you are just amplifying nonsense.

The model already knows how to speak well, while your dataset is too small to teach (or scream at) the model any language fundamentals. Increasing alpha amplifies everything, not just the parts you want the model to learn.
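To put numbers on this advice, the sketch below compares the scaling factor for alpha = rank and alpha = 2 x rank at this notebook's rank of 8. It assumes that rsLoRA uses `alpha / sqrt(r)`, as described in the rsLoRA paper linked in the configuration cell; the function names are hypothetical.

```python
import math

def lora_scaling(alpha, r):
    """Standard LoRA scaling factor."""
    return alpha / r

def rslora_scaling(alpha, r):
    """Rank-stabilized LoRA scaling factor (assumed form: alpha / sqrt(r))."""
    return alpha / math.sqrt(r)

r = 8  # rank used in this notebook
for alpha in (8, 16):  # alpha = rank, then alpha = 2 x rank (the notebook's setting)
    print(alpha, lora_scaling(alpha, r), round(rslora_scaling(alpha, r), 3))
# 8 1.0 2.828
# 16 2.0 5.657
```

Because this notebook sets `use_rslora=True` with `r=8` and `lora_alpha=16`, the effective factor under this assumption is about 5.66 rather than 2.0, so the "alpha = rank means 1.0" rule of thumb applies to standard LoRA only.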

## 2. Prepare data and tokenizer

In [3]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []

    for instruction, input_, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input_, output) + EOS_TOKEN
        texts.append(text)

    return { "text" : texts, }

from datasets import load_dataset
# dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = load_dataset("Shaagun/Instruction_Lithuanian_English",split="train")


dataset = dataset.map(formatting_prompts_func, batched = True,)

Lithuanian_Instruc.json:   0%|          | 0.00/548k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/800 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

In [4]:
dataset

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 800
})
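To see what the new `text` column contains, the formatting step can be run on a single hand-written row without loading the tokenizer or dataset. This is a sketch: the example row is made up, and `"</s>"` is a stand-in for `tokenizer.eos_token`.

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "</s>"  # stand-in; the notebook reads this from tokenizer.eos_token

def formatting_prompts_func(examples):
    """Turn a batch (dict of column lists) into a dict with one formatted 'text' column."""
    texts = []
    for instruction, input_, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(alpaca_prompt.format(instruction, input_, output) + EOS_TOKEN)
    return {"text": texts}

# Hypothetical row, not taken from the dataset.
batch = {
    "instruction": ["Translate to English."],
    "input": ["Labas rytas"],
    "output": ["Good morning"],
}
formatted = formatting_prompts_func(batch)["text"]
print(formatted[0])
```

With `batched = True`, `dataset.map` passes columns as lists, which is why the function zips over them instead of handling one row at a time.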

## 3. Training

Create an account at https://wandb.ai/ before training; the run below logs to Weights & Biases.

In [5]:
trainer=SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=True,
    args=TrainingArguments(
        learning_rate=3e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        fp16=not is_bfloat16_supported(),
        bf16= is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=10,
        output_dir="my_lora_adapters",
        seed=0,

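        max_steps = 10, # Cap training at 10 optimizer steps; this overrides num_train_epochs above.
        # Note: with warmup_steps=10, the whole 10-step run is spent in learning-rate warmup.
        # For one full pass over the data, remove max_steps and rely on num_train_epochs = 1.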
    ),
)


Generating train split: 0 examples [00:00, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
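One consequence of combining `warmup_steps=10` with `max_steps=10` is worth making explicit. The sketch below assumes the linear scheduler multiplies the peak rate by `step / warmup_steps` during warmup and then decays linearly to zero, which is the usual definition of a linear schedule with warmup; the function is a hypothetical re-implementation, not the library's code.

```python
def linear_schedule_with_warmup(step, warmup_steps, total_steps):
    """Learning-rate multiplier at a given scheduler step (assumed formula)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

PEAK_LR, WARMUP_STEPS, MAX_STEPS = 3e-4, 10, 10  # values from TrainingArguments above

# Learning rate in effect for each of the 10 optimizer steps
lrs = [PEAK_LR * linear_schedule_with_warmup(s, WARMUP_STEPS, MAX_STEPS) for s in range(MAX_STEPS)]
for step, lr in enumerate(lrs, start=1):
    print(step, f"{lr:.2e}")
```

Under this assumption the rate climbs from 0 toward 3e-4 but never reaches it within the 10 steps, so a longer run or a shorter warmup would behave quite differently.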


In [6]:
# Finally Start your training
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 81 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 10
 "-____-"     Number of trainable parameters = 20,971,520
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Step,Training Loss
1,1.5467
2,1.5173
3,1.4106
4,1.3997
5,1.2963
6,2.0792
7,1.2217
8,1.0294
9,1.0018
10,0.9333


TrainOutput(global_step=10, training_loss=1.3435991406440735, metrics={'train_runtime': 105.3877, 'train_samples_per_second': 1.518, 'train_steps_per_second': 0.095, 'total_flos': 1.3764526608482304e+16, 'train_loss': 1.3435991406440735, 'epoch': 1.9047619047619047})
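The numbers in the training banner and in `TrainOutput` can be cross-checked with a little arithmetic. The inputs below are copied from the output above (with `packing=True`, the 800 rows were packed into 81 sequences). Reading `epoch` as micro-batches run divided by micro-batches per epoch is an assumption that happens to reproduce the logged value.

```python
import math

# Values copied from the training arguments and logs above
per_device_batch, grad_accum, num_gpus = 4, 4, 1
num_packed_examples, max_steps = 81, 10
train_runtime = 105.3877  # seconds

effective_batch = per_device_batch * grad_accum * num_gpus                   # sequences per optimizer step
micro_batches_per_epoch = math.ceil(num_packed_examples / per_device_batch)  # 21
micro_batches_run = max_steps * grad_accum                                   # 40

epochs_completed = micro_batches_run / micro_batches_per_epoch
samples_per_second = max_steps * effective_batch / train_runtime

print(effective_batch, round(epochs_completed, 4), round(samples_per_second, 3))
# 16 1.9048 1.518
```

So the 10 steps cover roughly 160 packed sequences, a little under two passes over the 81-sequence dataset, which is why the banner reports `Num Epochs = 2`.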

## 4. Inference

In [7]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13, 21, 34, 55, 89, 144</s>']
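`batch_decode` returns the full prompt along with the generated answer. A small helper can strip everything up to the response marker; this is a sketch in plain Python, and `extract_response` is a hypothetical name, not an Unsloth or transformers function. The sample string is an abbreviated copy of the output above.

```python
def extract_response(decoded, marker="### Response:\n", eos="</s>"):
    """Return only the text generated after the response marker, without the EOS token."""
    # If the marker is missing, split() returns the whole string as its only element.
    response = decoded.split(marker, 1)[-1]
    if response.endswith(eos):
        response = response[: -len(eos)]
    return response.strip()

# Abbreviated copy of the decoded output shown above
decoded = (
    "<s> Below is an instruction that describes a task...\n\n"
    "### Instruction:\nContinue the fibonnaci sequence.\n\n"
    "### Input:\n1, 1, 2, 3, 5, 8\n\n"
    "### Response:\n13, 21, 34, 55, 89, 144</s>"
)
print(extract_response(decoded))  # 13, 21, 34, 55, 89, 144
```

In the notebook this would be applied to each element of `tokenizer.batch_decode(outputs)`.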

In [8]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Santrauka", # instruction (Lithuanian: "Summary")
        "Papasakok man ką nors apie Indiją", # input (Lithuanian: "Tell me something about India")
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nSantrauka\n\n### Input:\nPapasakok man ką nors apie Indiją\n\n### Response:\nIndija yra vieta, kurioje yra dažniausiai indijų gyvenantys. Tai vieta, kurioje yra dažniausiai indijų kultūros ir tradicijos. Tai vieta, kurioje']

In [9]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Santrauka", # instruction (Lithuanian: "Summary")
        "Papasakok man apie \"Shaagun\"", # input (Lithuanian: "Tell me about 'Shaagun'")
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nSantrauka\n\n### Input:\nPapasakok man apie "Shaagun"\n\n### Response:\nShaagun yra indijos miestas, kurio namuose yra vienuolikta. Štai miesto informacija.</s>']

## 5. Save trained model

In [10]:
import os
from huggingface_hub import login
# Never hard-code an access token in a notebook; read it from an environment variable instead.
login(token=os.environ["HF_TOKEN"])

In [None]:
# You can save your model locally
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")

In [11]:
# You can push your model to huggingface
model.push_to_hub_merged("Shaagun/mistral", tokenizer, save_method="merged_16bit")

Unsloth: You are pushing to hub, but you passed your HF username = Shaagun.
We shall truncate Shaagun/mistral to mistral
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 4.1G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 53.11 out of 83.48 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:00<00:00, 53.00it/s]


Unsloth: Saving tokenizer...

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

 Done.


README.md:   0%|          | 0.00/606 [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.55G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/Shaagun/mistral
