<a href="https://colab.research.google.com/github/AliGreo/LLMs/blob/main/llama_3_finetune_unsloth_Qlora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!pip install --upgrade unsloth bitsandbytes

In [6]:
%%capture

!pip install accelerate peft

In [4]:
!nvidia-smi

Sat Feb  1 14:33:27 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   45C    P8             10W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [7]:
from unsloth import FastLanguageModel

model_sequence_length = 4090
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length=model_sequence_length,
    dtype=None,
    load_in_4bit=True
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/1.10G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.7k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

In [10]:
# parameter training with Lora

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj'],
    lora_dropout=0,
    use_rslora=True,
    use_gradient_checkpointing="unsloth",
    bias="none",
    random_state=5413
)

Unsloth 2025.1.8 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.


In [11]:
# load the dataset
!pip install -q datasets

In [13]:
from datasets import load_dataset, Dataset

dataset = load_dataset("vicgalle/alpaca-gpt4", split="train[:10000]")

README.md:   0%|          | 0.00/3.38k [00:00<?, ?B/s]

(…)-00000-of-00001-6ef3991c06080e14.parquet:   0%|          | 0.00/48.4M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/52002 [00:00<?, ? examples/s]

In [15]:
print(dataset, "\n\n")
print(dataset[0]['text'])

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 10000
}) 


Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 ho

In [16]:
!pip install trl -q

In [20]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    max_steps=100,
    learning_rate=2e-4,
    fp16= not is_bfloat16_supported(),
    bf16= is_bfloat16_supported(),
    logging_steps=20,
    optim="adamw_8bit",
    weight_decay=0.1,
    lr_scheduler_type="linear",
    seed=6513,
    output_dir="unsloth/Llama-3.2-1B-Instruct-bnb-4bit-finetune-vicgalle/alpaca-gpt4-dataset"
)

trainer= SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=model_sequence_length,
    dataset_num_proc=2,
    packing=False,
    args=args
)

trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 10,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 100
 "-____-"     Number of trainable parameters = 11,272,192


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33magreu77[0m to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


Step,Training Loss
20,1.311
40,1.1833
60,1.2313
80,1.1248
100,1.17


## inference

In [25]:
inference_model = FastLanguageModel.for_inference(model)

In [32]:
messege = [
    {"role":"user", "content":"how is lioneal messi ?"}
]

input_ids = tokenizer.apply_chat_template(
    messege,
    add_generation_propt=True,
    return_tensors="pt"
).to("cuda")

print(input_ids)


from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True)

_ = inference_model.generate(input_ids,
                             streamer=streamer,
                             max_new_tokens=4000,
                             pad_token_id=tokenizer.eos_token_id)

tensor([[128000, 128006,   9125, 128007,    271,  38766,   1303,  33025,   2696,
             25,   6790,    220,   2366,     18,    198,  15724,   2696,     25,
            220,   1721,  13806,    220,   2366,     20,    271, 128009, 128006,
            882, 128007,    271,   5269,    374,    326,   6473,    278,   9622,
             72,    949, 128009]], device='cuda:0')
<|start_header_id|>assistant<|end_header_id|>

Lionel Messi is a professional soccer player who is widely regarded as one of the greatest players of all time. Born on June 24, 1987, in Rosario, Argentina, Messi has had an incredibly successful career, winning numerous awards and accolades.

Physically, Messi is known for his exceptional speed, agility, and endurance, which allow him to play at an elite level for over two decades. He is also famous for his incredible skill and technique on the field, which has earned him the nickname "Lionel the Great".

Off the field, Messi is known for his humility, kindness, and de