## Fine-tuning the Llama 3.2 3B model

We will use the Unsloth library.

In this notebook we fine-tune the model on FineTome-100k to make it better at instruction following, multi-domain generalization, and chat-style completion. Think of it as moving the model closer to a ChatGPT-style assistant through supervised fine-tuning on high-quality, curated tasks.

We perform instruction fine-tuning of a 4-bit quantized language model with LoRA adapters on the FineTome-100k dataset, with the goals of:

1) Making the model better at responding to chat-style inputs

2) Teaching it to follow instructions

3) Aligning it with human-like reasoning and answers
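LoRA keeps the pretrained weights frozen (here additionally quantized to 4 bits) and trains two small matrices per targeted layer whose product is added to that layer's output. The pure-Python sketch below illustrates the update on made-up toy matrices; the function names are hypothetical, and libraries such as PEFT and Unsloth apply the same idea with GPU tensors inside each adapted linear layer. As in the standard LoRA setup, `B` starts at zero, so the adapted layer initially reproduces the base layer exactly.

```python
# Minimal LoRA sketch in pure Python (no PyTorch), for illustration only.
# A frozen weight W (d_out x d_in) is adapted by two small trainable
# matrices A (r x d_in) and B (d_out x r). The adapted layer computes
#   h = W x + (alpha / r) * B (A x)

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)              # frozen (pretrained) path
    delta = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]       # toy frozen weight: d_out=2, d_in=3
A = [[0.5, -0.5, 1.0]]       # rank r=1
B_init = [[0.0], [0.0]]      # B is zero-initialized in standard LoRA
B_trained = [[1.0], [2.0]]   # pretend values after some training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, alpha=1, r=1))     # → [7.0, -1.0]
print(lora_forward(W, A, B_trained, x, alpha=1, r=1))  # → [9.5, 4.0]
```

Only `A` and `B` receive gradients during training, which is why the number of trainable parameters stays small.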



In [1]:
%pip install unsloth transformers trl

Collecting unsloth
  Downloading unsloth-2025.6.2-py3-none-any.whl.metadata (47 kB)
Collecting trl
  Downloading trl-0.18.2-py3-none-any.whl.metadata (11 kB)
Collecting unsloth_zoo>=2025.6.1 (from unsloth)
  Downloading unsloth_zoo-2025.6.1-py3-none-any.whl.metadata (8.1 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.30-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.24-py3-none-any.whl.metadata (11 kB)
Collecting datasets>=3.4.1 (from unsloth)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023

In [2]:
import torch  # Tensor library; used here to check for bf16 support
from unsloth import FastLanguageModel  # Loads the pretrained model and adds PEFT (LoRA) adapters
from datasets import load_dataset  # Loads datasets from the Hugging Face Hub
from trl import SFTTrainer  # Trainer for supervised fine-tuning
from transformers import TrainingArguments  # Training hyperparameters passed to the trainer
from unsloth.chat_templates import get_chat_template, standardize_sharegpt  # Helpers that format conversations for training

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # Pretrained instruct model to adapt with LoRA/QLoRA
    max_seq_length=2048,  # Maximum number of tokens the model can handle
    load_in_4bit=True,  # Quantize the weights to 4-bit to save GPU memory
)

==((====))==  Unsloth 2025.6.2: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
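Loading with `load_in_4bit=True` is what keeps a 3B-parameter model small relative to the 14.7 GB Tesla T4 reported above. The back-of-the-envelope sketch below compares weight storage at different precisions; it assumes roughly 3.0e9 parameters (the figure Unsloth prints during training) and ignores quantization constants, activations, the KV cache, and optimizer state. In practice the footprint is typically somewhat higher, since some layers are usually kept in higher precision.

```python
# Rough weight-memory estimate for a ~3B-parameter model at several precisions.
# Assumption: ~3.0e9 parameters; overheads (quantization scales, activations,
# KV cache, optimizer state) are deliberately ignored.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 3.0e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.1f} GB")
```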




In [15]:
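model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: inner size of the low-rank update matrices
    # Attach LoRA adapters to every attention and MLP projection
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)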
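With `r=16`, each adapted linear layer of shape `in_features x out_features` gains `r * (in_features + out_features)` trainable parameters. The sketch below reproduces the "Trainable parameters = 24,313,856" figure from the training log. The layer shapes are taken from the publicly documented Llama 3.2 3B configuration, not from this notebook, so treat them as assumptions; the match with the logged count is a useful sanity check.

```python
# Reproduce the trainable-parameter count reported by Unsloth for r=16.
# Assumed Llama 3.2 3B shapes (from its published config, not this notebook):
# hidden_size=3072, intermediate_size=8192, 28 decoder layers,
# K/V projections of 8 KV heads x head_dim 128 = 1024 features.

HIDDEN, INTER, LAYERS, KV_DIM = 3072, 8192, 28, 1024
R = 16  # LoRA rank, as passed to get_peft_model

# (in_features, out_features) of each targeted linear layer
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj": (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r): r * (in + out) parameters per module
per_layer = sum(R * (d_in + d_out) for d_in, d_out in target_shapes.values())
total = per_layer * LAYERS
print(f"{total:,}")  # → 24,313,856
```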

In [12]:
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
# Configure the tokenizer to render a user/assistant conversation as a single
# string in the Llama 3.1 chat format
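To make the chat template less of a black box, here is a simplified, hypothetical re-implementation of the Llama 3 turn layout built from its special tokens. It is an illustration, not the library's template: the real "llama-3.1" template also inserts a system header (the "Cutting Knowledge Date" lines visible in the inference output at the end of this notebook) and handles tool calls, which this sketch omits.

```python
# Simplified, hypothetical sketch of the Llama 3 chat layout.
# Covers only plain user/assistant turns; the real template adds a system
# header and other features.

def format_llama3(messages, add_generation_prompt=False):
    text = "<|begin_of_text|>"
    for msg in messages:
        text += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

convo = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
print(format_llama3(convo))
```

For training, full conversations are rendered without a generation prompt; at inference time the generation prompt tells the model it is its turn to speak.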

In [8]:
dataset=load_dataset("mlabonne/FineTome-100k",split="train")


In [9]:
dataset = standardize_sharegpt(dataset)  # Convert ShareGPT "from"/"value" turns into "role"/"content" messages


In [10]:
dataset

Dataset({
    features: ['conversations', 'source', 'score'],
    num_rows: 100000
})
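The `standardize_sharegpt` step converts each conversation into the role/content message format that chat templates expect. The sketch below shows the kind of transformation involved; it is hypothetical (the function name and role map are illustrative) and assumes ShareGPT turns of the form `{"from": "human" | "gpt" | "system", "value": str}`.

```python
# Hypothetical sketch of a ShareGPT -> role/content standardization.
# Assumes turns shaped like {"from": "human" | "gpt" | "system", "value": str}.

ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def standardize_conversation(turns):
    """Rename keys and roles so chat templates can consume the turns."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in turns
    ]

sharegpt_turns = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "2 + 2 = 4."},
]
print(standardize_conversation(sharegpt_turns))
```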

In [13]:
dataset=dataset.map(
    lambda examples:{
        "text":[
            tokenizer.apply_chat_template(convo,tokenize=False)
            for convo in examples["conversations"]
        ]
    },
    batched=True,
)
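With `batched=True`, the mapped function receives a dict that maps each column name to a list of values (one per example in the batch) and must return a dict of new columns, also as lists. The sketch below mimics that contract without the `datasets` library; `to_text` and `simple_render` are hypothetical, and `simple_render` stands in for `tokenizer.apply_chat_template(convo, tokenize=False)`.

```python
# Hypothetical sketch of the batched=True contract of Dataset.map:
# input is {column: [value per example]}, output is {new_column: [value per example]}.

def to_text(examples, render):
    return {"text": [render(convo) for convo in examples["conversations"]]}

def simple_render(convo):
    # Stand-in for tokenizer.apply_chat_template(convo, tokenize=False)
    return " | ".join(f"{t['role']}: {t['content']}" for t in convo)

batch = {
    "conversations": [
        [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}],
        [{"role": "user", "content": "Bye"}],
    ],
    "score": [4.5, 3.9],
}
out = to_text(batch, simple_render)
print(out["text"])
```

Because the original call passes no `remove_columns`, `datasets` keeps the existing `conversations`, `source`, and `score` columns and adds `text` alongside them.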


In [24]:
trainer = SFTTrainer(  # Supervised fine-tuning trainer from the TRL library
    model=model,  # PEFT (LoRA) model to fine-tune
    train_dataset=dataset,  # Dataset containing the formatted "text" column
    dataset_text_field="text",  # Column holding the training text
    max_seq_length=1024,  # Maximum number of tokens per training example
    args=TrainingArguments(  # Training hyperparameters
        per_device_train_batch_size=1,  # Samples per GPU per forward pass
        gradient_accumulation_steps=4,  # Accumulate gradients over 4 passes before each update
        warmup_steps=5,  # Ramp the learning rate up over the first 5 steps
        max_steps=60,  # Total number of optimizer steps
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),  # Fall back to fp16 on GPUs without bf16 (e.g. the T4)
        bf16=torch.cuda.is_bf16_supported(),  # Use bf16 when the GPU supports it
        logging_steps=1,  # Log the training loss every step
        output_dir="outputs",  # Directory for checkpoints and logs
    ),
)
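The arguments above determine how much data the run sees and how the learning rate evolves. The sketch below works this out in plain Python. It assumes the default `lr_scheduler_type="linear"` of `TrainingArguments` (linear warmup, then linear decay to zero), since the notebook does not override it; the helper name is hypothetical.

```python
# Arithmetic implied by the TrainingArguments above.
# Assumption: default linear schedule (warmup to the peak LR, then linear decay to 0).

PER_DEVICE_BATCH, GRAD_ACCUM, NUM_GPUS = 1, 4, 1
MAX_STEPS, WARMUP_STEPS, PEAK_LR = 60, 5, 2e-4
DATASET_SIZE = 100_000

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS

def linear_warmup_lr(step):
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))

print(effective_batch, examples_seen, f"{examples_seen / DATASET_SIZE:.2%}")
for step in (0, 1, 5, 30, 60):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The 240 examples agree with the training log (1.167 samples/s over 205.7 s is about 240), so this run touches only about 0.24% of FineTome-100k. Raising `max_steps`, or setting `num_train_epochs` instead, would cover more of the dataset.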

In [25]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 24,313,856/3,000,000,000 (0.81% trained)


Unsloth: Will smartly offload gradients to save VRAM!




| Step | Training Loss |
|-----:|--------------:|
| 1 | 1.3235 |
| 2 | 1.5816 |
| 3 | 1.7049 |
| 4 | 1.9109 |
| 5 | 1.2919 |
| 6 | 1.1754 |
| 7 | 1.2353 |
| 8 | 1.2267 |
| 9 | 1.1165 |
| 10 | 1.2278 |



TrainOutput(global_step=60, training_loss=1.0516178588072458, metrics={'train_runtime': 205.6823, 'train_samples_per_second': 1.167, 'train_steps_per_second': 0.292, 'total_flos': 2188400906668032.0, 'train_loss': 1.0516178588072458})
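Per-step losses at batch size 4 are noisy, so it helps to summarize them. In a live session the records would come from `trainer.state.log_history`, a list of dicts with one entry per logging event; the sketch below hard-codes a simplified version of that list using the ten losses printed above, and `loss_curve` is a hypothetical helper.

```python
# Summarize per-step training losses. In a real session, use
# trainer.state.log_history; here a simplified copy is hard-coded.

from statistics import mean

logged_losses = [1.3235, 1.5816, 1.7049, 1.9109, 1.2919,
                 1.1754, 1.2353, 1.2267, 1.1165, 1.2278]
log_history = [{"loss": loss, "step": step}
               for step, loss in enumerate(logged_losses, start=1)]
# The final entry summarizes the run and has no per-step "loss" key
log_history.append({"train_loss": 1.0516178588072458, "step": 60})

def loss_curve(history):
    """Return (step, loss) pairs for entries that carry a per-step training loss."""
    return [(e["step"], e["loss"]) for e in history if "loss" in e]

curve = loss_curve(log_history)
first_half = mean(loss for _, loss in curve[:5])
second_half = mean(loss for _, loss in curve[5:])
print(f"steps 1-5 mean: {first_half:.4f}, steps 6-10 mean: {second_half:.4f}")
```

Assuming `training_loss` is the mean of the per-step losses (as in the Hugging Face `Trainer`), the overall 1.0516 implies the 50 steps not shown above averaged roughly 0.99, below every value in the printed table.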

In [27]:
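model.save_pretrained("final_model")  # Saves only the LoRA adapter weights, not the full model
tokenizer.save_pretrained("final_model")  # Also save the tokenizer together with its chat template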

In [28]:
inference_model,inference_tokenizer=FastLanguageModel.from_pretrained(
    model_name="final_model",
    max_seq_length=1024,
    load_in_4bit=True,
)

==((====))==  Unsloth 2025.6.2: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!




In [32]:
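FastLanguageModel.for_inference(inference_model)  # Switch Unsloth to its faster inference mode

text_prompts = [
    "What are the key principles of investment?"
]

for prompt in text_prompts:
    # add_generation_prompt=True opens an assistant turn so the model answers the prompt
    formatted_prompt = inference_tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )

    # The chat template already contains <|begin_of_text|>, so don't add a second BOS token
    model_inputs = inference_tokenizer(
        formatted_prompt, return_tensors="pt", add_special_tokens=False
    ).to("cuda")
    generated_ids = inference_model.generate(
        **model_inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=inference_tokenizer.pad_token_id,
    )
    response = inference_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)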

system

Cutting Knowledge Date: December 2023
Today Date: 20 Jun 2025

user

What are the key principles of investment?assistant

The key principles of investment include:

1. Diversification: Spreading investments across various asset classes to minimize risk and maximize returns.
2. Risk management: Understanding and managing risk to achieve investment objectives.
3. Long-term perspective: Investing for the long-term, rather than focusing on short-term gains.
4. Dollar-cost averaging: Investing a fixed amount of money at regular intervals, regardless of market conditions.
5. Diversification of income: Investing in a mix of income-generating assets to reduce reliance on any one source.
6. Active management: Regularly monitoring and adjusting investments to respond to market conditions.
7. Tax efficiency: Considering tax implications when making investment decisions.
8. Liquidity: Ensuring that investments can be easily converted into cash when needed.
9. Risk assessment: Continuously 
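Because `do_sample=True`, the answer above changes from run to run; `temperature=0.7` divides the logits before the softmax, which sharpens the distribution toward the most likely tokens. The sketch below shows that effect on four made-up logits (a real Llama 3 vocabulary has about 128k tokens). The function name is hypothetical, and the real `generate` pipeline may also apply other filters, such as top-p from the model's generation config.

```python
# Sketch of temperature scaling followed by sampling, on toy logits.

import math
import random

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
p_default = softmax_with_temperature(logits, 1.0)
p_cooler = softmax_with_temperature(logits, 0.7)

# Lower temperature concentrates probability on the top token
print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_cooler])

rng = random.Random(0)  # fixed seed so the sketch is reproducible
token = rng.choices(range(len(logits)), weights=p_cooler, k=1)[0]
print("sampled token index:", token)
```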