### Installation

**Step 1: Install Necessary Libraries**

We first install the necessary libraries:

In [1]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

In [2]:
!pip install evaluate nltk rouge_score -q -U

  Preparing metadata (setup.py) ... done
  Building wheel for rouge_score (setup.py) ... done


In [3]:
import nltk
# Download NLTK Data for Tokenization
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [4]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.6,max_split_size_mb:128"

In [5]:
!pip freeze | grep triton

triton==3.2.0


In [6]:
!pip freeze | grep torch

torch @ https://download.pytorch.org/whl/cu124/torch-2.6.0%2Bcu124-cp311-cp311-linux_x86_64.whl
torchao==0.10.0
torchaudio @ https://download.pytorch.org/whl/cu124/torchaudio-2.6.0%2Bcu124-cp311-cp311-linux_x86_64.whl
torchdata==0.11.0
torchsummary==1.5.1
torchtune==0.6.1
torchvision @ https://download.pytorch.org/whl/cu124/torchvision-0.21.0%2Bcu124-cp311-cp311-linux_x86_64.whl


In [7]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0


In [8]:
## Environment Variables and Warnings
# We configure environment variables for GPU usage and suppress tokenizer parallelism warnings.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [9]:
# Suppress Python warnings to keep the training logs readable (this hides all warnings, not only harmless ones).
import warnings
warnings.filterwarnings("ignore")

### Unsloth

https://unsloth.ai/blog/gemma3
https://docs.unsloth.ai/get-started/fine-tuning-guide
https://unsloth.ai/

According to the Unsloth team, Unsloth fine-tunes large language models such as Llama 3, Mistral, Phi-4 and Gemma about 2x faster while using about 70% less memory, with no degradation in accuracy.

Unsloth is a Python library designed to make fine-tuning Large Language Models (LLMs) such as Llama, Mistral, and Gemma (the model family used in this notebook) significantly faster and more memory-efficient.

**What is Unsloth?**

Think of Unsloth as an optimization layer that sits on top of the standard Hugging Face Transformers library. While Hugging Face provides the general framework for LLMs, Unsloth adds highly optimized low-level computations (custom GPU kernels, often written in Triton) that target the bottlenecks in LLM training, especially when fine-tuning with techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA).

**Why is Unsloth "Better"?**

Unsloth claims and demonstrates several key advantages that make it "better" for LLM fine-tuning, particularly for users with limited GPU resources (like in Google Colab):

**Speed**:
2x to 30x Faster Training: Unsloth reports significantly faster training than standard Hugging Face training (even with Flash Attention 2 enabled), which it attributes to hand-written, optimized GPU kernels that replace less efficient PyTorch operations.
This can cut fine-tuning time substantially, which speeds up experimentation and development.

**Memory Efficiency**:
70% to 90% Less GPU Memory Usage: Unsloth reports a large reduction in the VRAM (GPU memory) required for fine-tuning. This lets you:

- Fine-tune larger models (e.g., 7B or 8B parameter models) on GPUs with limited memory, such as Colab's T4.
- Use larger batch sizes, which can sometimes lead to more stable training and better performance.

This matters for Google Colab users, since it gets more out of the free or cheaper GPU tiers.
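
To see why 4-bit quantization matters on a T4, here is a back-of-the-envelope estimate of the memory taken by the model weights alone. This is a rough sketch: the 4-billion parameter count is a round illustrative number (not the exact Gemma 3 4B count), and it ignores activations, optimizer states, LoRA weights, and quantization overhead such as scaling constants.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GiB (1024**3 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


# Illustrative round number, not the exact Gemma 3 4B parameter count
num_params = 4e9

for label, bits in [("16-bit (fp16/bf16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>18}: {weight_memory_gib(num_params, bits):.2f} GiB")
```

At 16-bit precision, the weights alone would take roughly half of the 14.7 GB T4 reported in the Unsloth log further down, before activations and optimizer states are counted; 4-bit weights take a quarter of that.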

**No Accuracy Degradation**:

Unsloth states that these speed and memory improvements come without sacrificing model accuracy: its kernels perform exact computations (no approximations), just more efficiently.

**Simplicity and Integration**:

Simple API: Unsloth offers a very straightforward API that seamlessly integrates with the Hugging Face ecosystem (Transformers, PEFT, TRL's SFTTrainer). You just need to import FastLanguageModel from unsloth and use its from_pretrained method, and the rest of your training code often remains very similar.

Built-in Optimizations: It automates many optimizations (like efficient LoRA application, gradient checkpointing, paged optimizers, efficient attention mechanisms like Flash Attention 2) that would otherwise require complex manual setup or deep knowledge of the underlying libraries.

**Broad Model Support**:

Unsloth supports a wide range of popular open-source LLMs, including Llama (1, 2, 3), Mistral, Phi, and Gemma, making it a versatile tool for many projects.

In summary, Unsloth makes LLM fine-tuning accessible, faster, and more efficient, allowing more developers and researchers to experiment with and deploy custom LLMs even on consumer-grade hardware or free cloud platforms like Google Colab. It's why you often see it recommended and used in Colab notebooks for fine-tuning.


**Setting up Unsloth**

This includes loading the model and adding LoRA adapters.

**FastLanguageModel**: This class is optimized for fine-tuning and inference of text-only language models such as LLaMA, Mistral, Phi, and Gemma. It provides streamlined methods for loading models, applying Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA, and enabling faster inference.

Key Features:

- Supports loading pre-trained models with options for quantization (e.g., 4-bit) to reduce memory usage.
- Integrates with PEFT methods to facilitate efficient fine-tuning.
- Offers methods like for_inference() to enable optimized inference pathways.

**FastModel** extends the capabilities of FastLanguageModel by adding support for vision-language models, enabling fine-tuning of models that process both text and visual data.

Here we are using **FastModel**, since Gemma 3 4B is a multimodal (text and image) model; only its language layers are fine-tuned.

  

**Step 2: Loading the Model**

The code below loads the pre-trained, instruction-tuned Gemma 3 4B model (`unsloth/gemma-3-4b-it`) with Unsloth. It sets a maximum sequence length of 1024 tokens and enables 4-bit quantization to reduce memory usage. The data type (`dtype`) is auto-detected; on the T4 used here, the log shows Unsloth switching Gemma 3 from float16 to float32. The call returns the model and its tokenizer for the steps that follow.
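
The `dtype` comment in the next cell gives the rule of thumb: bfloat16 on Ampere-or-newer GPUs, float16 on older ones such as the T4 or V100. A minimal sketch of that rule, assuming bfloat16 support is keyed on CUDA compute capability 8.0 or higher. The helper `pick_dtype_name` is hypothetical, not an Unsloth function; Unsloth's real logic may differ, and for Gemma 3 it overrides float16 with float32 as the log below shows.

```python
def pick_dtype_name(compute_capability: tuple[int, int]) -> str:
    """Hypothetical helper: choose a training dtype from a GPU's CUDA compute capability.

    Assumes bfloat16 is available from compute capability 8.0 (Ampere) onwards.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Tesla T4 reports compute capability 7.5 (see "CUDA: 7.5" in the Unsloth log)
print(pick_dtype_name((7, 5)))
# An A100 reports compute capability 8.0
print(pick_dtype_name((8, 0)))
```

On a machine with a GPU, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple this sketch expects.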

In [10]:
# from unsloth import FastLanguageModel
# from unsloth import is_bfloat16_supported
from unsloth import FastModel #for both language and vision finetuning
import torch

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",

    # Other popular models!
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/Llama-3.3-70B",
    "unsloth/mistral-7b-instruct-v0.3",
    "unsloth/Phi-4",
] # More models at https://huggingface.co/unsloth

# Set Configuration Parameters
dtype = (
    None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
)
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# Load the pre-trained model and tokenizer
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # Any entry from fourbit_models above also works
    max_seq_length=1024,  # Maximum number of tokens per input sequence; Unsloth handles RoPE scaling internally
    dtype=dtype,
    load_in_4bit=load_in_4bit,  # 4-bit quantization shrinks the memory footprint so larger models fit on small GPUs
    load_in_8bit=False,  # 8-bit is a bit more accurate but uses about 2x the memory of 4-bit
    full_finetuning=False,  # Set True to update all weights instead of training LoRA adapters
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.7: Fast Gemma3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.



Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



In [11]:
# Define the prompt style for translation tasks
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the following English text to Hindi.

### Input:
{}

### Response:
<think>
"""

**Model Before Fine Tuning**

In [12]:
question = "What is the capital of India?"

FastModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question)], return_tensors="pt").to("cuda")  # prompt_style has a single {} slot

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1024,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
The question asks for the capital of India. I need to provide the answer in Hindi. The capital of India is New Delhi.
</think>
भारत की राजधानी क्या है? (Bharat ki rajdhani kya hai?)
<end_of_turn>


**Testing pre-trained model with examples before finetuning**

In [None]:
def test_model_translation(model, tokenizer, sentences, prompt_template):
    """Test model on multiple sentences and return only the Hindi translations"""
    FastModel.for_inference(model)  # Enable optimized inference
    translations = []

    for sentence in sentences:
        prompt = prompt_template.format(sentence)
        inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

        outputs = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_new_tokens=256,
            use_cache=True,
        )

        # Decode only the newly generated tokens, so the prompt (which ends in "Hindi:") is not echoed back
        new_tokens = outputs[:, inputs.input_ids.shape[1]:]
        hindi_only = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0].strip()

        translations.append(hindi_only)

    return translations

# Test sentences covering different complexity levels
test_sentences = [
    "The Constitution of India is the supreme law of the land and provides the framework for the country's legal system.",
    "The Supreme Court of India, established in 1950, serves as the highest judicial forum and final court of appeal.",
    "Filing a First Information Report (FIR) at the police station is mandatory to initiate criminal proceedings under the Code of Criminal Procedure.",
    "Section 302 of the Indian Penal Code prescribes punishment for murder, which may extend to death penalty or life imprisonment.",
    "The writ of habeas corpus can be filed in a High Court if a person is illegally detained or imprisoned.",
    "Under the Indian Evidence Act, a dying declaration is admissible as evidence in court proceedings.",
    "The Right to Information Act of 2005 empowers citizens to request information from public authorities.",
    "Anticipatory bail under Section 438 of the Criminal Procedure Code allows a person to seek bail in anticipation of arrest.",
    "The doctrine of stare decisis is followed by Indian courts, wherein precedents set by higher courts are binding on lower courts.",
    "The Advocate General is appointed by the Governor and serves as the highest law officer of a state."
]

# Define a more effective prompt template that explicitly asks for ONLY the translation
improved_prompt_template = """Translate the following English legal text to Hindi. Provide ONLY the Hindi translation, with no explanations, transliterations, or additional text:

English: {}

Hindi:"""

# Get translations with the improved prompt
print("Getting Hindi translations...")
hindi_translations = test_model_translation(model, tokenizer, test_sentences, improved_prompt_template)

# Print only the Hindi translations
print("Hindi Translations:")
for translation in hindi_translations:
    print(translation)

# Free up GPU memory
torch.cuda.empty_cache()

Getting Hindi translations...


In [None]:
hindi_translations

**Step 3: Adding LoRA Adapters for Efficient Fine-tuning**

With LoRA adapters, we only need to train a small fraction of all parameters (often quoted as 1 to 10%; `print_trainable_parameters()` below reports the exact figure). The code below uses `FastModel.get_peft_model` to add LoRA (Low-Rank Adaptation) adapters to the model. It sets the rank (`r = 16`), the target modules to adapt, and settings such as `lora_alpha`, `lora_dropout`, and `bias`.

It also enables Unsloth's memory-efficient gradient checkpointing (`use_gradient_checkpointing = "unsloth"`) and fixes `random_state` for reproducibility.

- `get_peft_model()` wraps the pre-trained model with the PEFT (Parameter-Efficient Fine-Tuning) configuration, adding LoRA adapters to the selected layers. Each adapted weight matrix gets two small low-rank matrices, here in the attention projections and the feed-forward (MLP) network.
- `r` is the rank of these matrices, set to 16. A higher rank gives the adapters more capacity to adapt, at the cost of more memory and compute.
- `target_modules` lists the layers where LoRA is applied: the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`).
- `lora_alpha` scales the low-rank update relative to the frozen pre-trained weights. Standard LoRA scales by `lora_alpha / r`; with `use_rslora = True`, the update is scaled by `lora_alpha / sqrt(r)` instead.
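
The effect of `r`, `lora_alpha`, and `use_rslora` can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 4096 × 4096 projection matrix (not Gemma 3's actual layer sizes). LoRA freezes the weight W and learns B (d_out × r) and A (r × d_in), adding scaling × (B @ A) to W; standard LoRA uses `lora_alpha / r` as the scaling, rank-stabilized LoRA uses `lora_alpha / sqrt(r)`.

```python
import math


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two LoRA matrices: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


# Hypothetical layer size, chosen for round numbers
d_in = d_out = 4096
r, lora_alpha = 16, 32  # Values used in this notebook

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)
print(f"Full matrix: {full:,} params; LoRA adapters: {lora:,} params ({lora / full:.2%})")
print(f"Standard LoRA scaling: {lora_scaling(lora_alpha, r)}")
print(f"rsLoRA scaling:        {lora_scaling(lora_alpha, r, use_rslora=True)}")
```

For this layer size the adapters add well under 1% of the matrix's parameters, and with `r = 16`, `use_rslora = True` makes the update 4x larger than standard LoRA scaling would at the same `lora_alpha`.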

**Do model patching and add fast LoRA weights**

In [None]:
#merge adapter parameters with base model (base_with_adapter_model)
# model = model.merge_and_unload() # now we just have a regular AutoModelForCausalLM Transformers model

In [None]:
# Wrap the base model with new LoRA adapters
model = FastModel.get_peft_model(
    model,
    r = 16,  # LoRA rank; suggested values are 8, 16, 32, 64, 128. Larger = more capacity, but can overfit
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,  # Scaling factor for the LoRA update; set to 2x the rank here
    lora_dropout = 0.05,  # Small dropout for regularization (0 is the Unsloth-optimized setting)
    bias = "none",  # Any value is supported, but "none" is optimized
    # Unsloth reports that "unsloth" gradient checkpointing uses ~30% less VRAM and fits ~2x larger batches
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 42,  # Seed for reproducible adapter initialization
    use_rslora = True,  # Rank-stabilized LoRA: scales the update by alpha / sqrt(r)
    loftq_config = None,  # LoftQ initialization disabled
    finetune_vision_layers     = False,  # Text-only task, so leave the vision layers frozen
    finetune_language_layers   = True,   # Should leave on
    finetune_attention_modules = True,   # Adapt the attention layers
    finetune_mlp_modules       = True,   # Should leave on
)

In [None]:
# Print the patched layers information
model.print_trainable_parameters()

In [None]:
# print(model)

**HuggingFace Login**

In [None]:
from huggingface_hub import login
from google.colab import userdata
hf_token = userdata.get("HF_TOKEN")  # Reads the token stored as a Colab secret named "HF_TOKEN"
login(hf_token)

### Data Preparation

**Loading dataset**

In [None]:
from datasets import load_dataset
dataset = load_dataset("shrimantasatpati/legal-en-hi-anuvaad")
dataset

**Step 4: Defining the Alpaca Format For Preparing the Dataset**

The code below defines a prompt template and a formatting function for preparing training data. The template (`alpaca_prompt`) has a fixed instruction, a placeholder for the input (English) and a placeholder for the response (Hindi); the response section also begins with a fixed `<think>...</think>` preamble. The `formatting_func` function takes one example, extracts the English (`english_sentence`) and Hindi (`hindi_sentence`) text, and fills them into the template. It appends the `EOS_TOKEN` (end-of-sequence token) to each formatted prompt so that the model learns where a translation ends and does not generate indefinitely. It returns a dictionary with a `text` field for each example, ready for fine-tuning.

**Step 5: Loading the Dataset in the alpaca format**

The dataset is prepared in the correct format, with each entry consisting of a properly structured instruction-input-output prompt for English-Hindi translation tasks.

In [None]:
# Define the Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the English text to Hindi accurately, preserving the meaning and style.

### Input:
{}

### Response:
<think>
Let me translate this English text to Hindi while preserving its meaning and nuances.
</think>
{}"""

# End-of-sequence token; it must come from the tokenizer of the model being fine-tuned
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

# Formatting function that processes individual examples to work with the SFTTrainer
def formatting_func(example):
    input_text = example['english_sentence']
    output = example['hindi_sentence']
    # Return the formatted text string for this example
    return {"text": alpaca_prompt.format(input_text, output) + EOS_TOKEN}

# Apply the formatting function to create a formatted dataset
formatted_dataset = dataset.map(formatting_func)

# Show formatted dataset (dataset is a DatasetDict, so sizes are reported for its "train" split)
print(f"Original dataset size: {len(dataset['train'])}")
print(f"Formatted dataset size: {len(formatted_dataset['train'])}")
print("\n\n", formatted_dataset)
print("\n")
print("\nSample formatted text:")
print(formatted_dataset["train"][0]['text'][:500] + "...")
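
`formatting_func` above processes one example at a time. `Dataset.map` can also run with `batched=True`, in which case the function receives a dict mapping each column name to a list of values and must return lists. A minimal sketch of a batched variant, assuming the same `english_sentence` / `hindi_sentence` columns; the shortened `template` and the `<eos>` string are illustrative stand-ins for `alpaca_prompt` and `tokenizer.eos_token`.

```python
# Shortened stand-in for alpaca_prompt; the real template has the same two {} slots
template = "### Input:\n{}\n\n### Response:\n{}"
eos_token = "<eos>"  # Placeholder; in the notebook this is tokenizer.eos_token


def formatting_func_batched(batch: dict) -> dict:
    """Format a batch (dict of column -> list of values) into a list of training strings."""
    texts = [
        template.format(english, hindi) + eos_token
        for english, hindi in zip(batch["english_sentence"], batch["hindi_sentence"])
    ]
    return {"text": texts}


# Toy batch in the shape Dataset.map(..., batched=True) passes to the function
batch = {"english_sentence": ["Hello."], "hindi_sentence": ["नमस्ते।"]}
print(formatting_func_batched(batch)["text"][0])
```

With the real template and tokenizer, this would be applied as `dataset.map(formatting_func_batched, batched=True)`, which is usually faster on large datasets than mapping one example at a time.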

### Train model

#### Split Dataset for Training and Evaluation

In [None]:
# Create train/test split (95% train, 5% test)
# Call train_test_split on the 'train' split of the DatasetDict
train_test_split = formatted_dataset['train'].train_test_split(test_size=0.05, seed=42)
train_dataset = train_test_split['train']
eval_dataset = train_test_split['test']
print(f"Training examples: {len(train_dataset)}")
print(f"Evaluation examples: {len(eval_dataset)}")

In [None]:
print(train_dataset)

In [None]:
print(eval_dataset)

**Step 6: Defining Huggingface TRL’s SFTTrainer for Training the Model**

The code below initializes an `SFTTrainer` for fine-tuning the model with the `trl` library. It sets training parameters such as batch size, gradient accumulation steps, and learning rate in `TrainingArguments`. It also configures logging, periodic evaluation and checkpointing, and mixed precision (fp16 or bf16) based on hardware support. Optimization uses an 8-bit AdamW optimizer (`adamw_8bit`) with a cosine learning-rate schedule and warmup.

https://github.com/huggingface/trl

In [None]:
import wandb

wb_token = userdata.get("wandb")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tuning Gemma-3-4B on English-Hindi Dataset',
    job_type="training",
    config={
    "model": "gemma-3-4b-it",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "batch_size": 2,
    "learning_rate": 5e-5,
    "epochs": 3,
},
    anonymous="allow"
)

**Train the model**

Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This run trains for `num_train_epochs=3`; for a quick smoke test you can instead set `max_steps=60`, which overrides the epoch count.

- `gradient_accumulation_steps=4`: the weights are not updated after every batch. Gradients are accumulated over 4 batches before each optimizer step, so the effective batch size is 2 × 4 = 8.
- `warmup_ratio=0.1`: the learning rate ramps up linearly from 0 to `learning_rate` over the first 10% of training steps. After the warmup, the cosine scheduler decays it toward 0.
- `weight_decay=0.01`: a small weight decay is applied to the model weights to help prevent overfitting.
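
The warmup and cosine decay described above can be sketched as a plain function of the step number. It follows the shape of the linear-warmup-plus-cosine schedule in Hugging Face Transformers (`get_cosine_schedule_with_warmup` with its default half cosine cycle), with warmup steps computed as `ceil(total_steps * warmup_ratio)`; treat it as an illustration, not the library code. The 1,000-step run length is hypothetical.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate after linear warmup followed by cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 1000  # Hypothetical run length
for step in [0, 50, 100, 550, 1000]:
    print(step, f"{lr_at_step(step, total):.2e}")
```

The learning rate peaks at `base_lr` when warmup ends, is at half of it midway through the decay, and reaches 0 at the last step. This is the rise-then-decay shape in the `train/learning_rate` chart from the W&B run later in the notebook.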

In [None]:
# Configure SFT Trainer
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

In [None]:
# Configure SFT Trainer with improved parameters
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,  # Added evaluation dataset
    dataset_text_field="text",
    max_seq_length=1024, # Max Sequence Length: Reduced to 1024 during tokenization
    dataset_num_proc=4,  # Increased from 2 to 4 for faster processing
    packing=False,  # Keep packing off for translation tasks
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Increased from 1 to 2
        per_device_eval_batch_size=4,   # Added evaluation batch size
        gradient_accumulation_steps=4,  # Reduced from 8 to 4 (effective batch size = 8)

        # Training schedule
        num_train_epochs=3,             # Train for full epochs instead of a fixed number of steps
        # max_steps = 4000,             # Alternative: a fixed step budget (overrides num_train_epochs)

        # Per-device batch size 2 x gradient accumulation 4 = 8 examples per optimizer step.
        # A 60-step run would therefore see only 480 translation examples (60 x 8).
        # For translation fine-tuning, aim for at least 1,000-3,000 steps for minimal adaptation,
        # and closer to 10,000+ steps for robust performance.

        # Learning rate schedule
        learning_rate=5e-5,             # Decreased from 2e-4 to 5e-5 for stability
        warmup_ratio=0.1,               # Warm up over the first 10% of steps (instead of warmup_steps = 5)
        # A warmup ratio scales automatically with the total training length,
        # letting the model adapt gradually to the translation task.

        # Mixed precision settings based on hardware
        bf16=is_bfloat16_supported(),
        fp16=not is_bfloat16_supported(),

        # Improved logging and evaluation
        # logging_steps=25,               # 50 # Log more frequently for small datasets
        # eval_steps=50,                 # Added regular evaluation # Evaluate frequently to catch overfitting (~3 times per epoch)
        # save_steps=500,                 # Save checkpoints more frequently
        logging_steps=50, # Can increase slightly from 25 to 50 if logs get too verbose
        eval_steps=500, # <--- Adjust eval_steps to evaluate roughly 5-10 times per epoch
        save_steps=500, # <--- Keep save_steps aligned or slightly less frequent than eval
        eval_strategy="steps",          # Explicitly set eval strategy to match save strategy
        save_strategy="steps",          # Explicitly set save strategy

        # Optimizer and weight decay
        optim="adamw_8bit",  # 8-bit optimizer states significantly reduce memory usage; alternative: "adamw_torch_fused"
        weight_decay=0.01,
        lr_scheduler_type="cosine", # "linear"

        # It gradually reduces learning rate, improving convergence
        # Translation tasks benefit from the smoother learning rate decay pattern
        # Helps prevent overfitting in the later stages of fine-tuning
        # Cosine decay is still good for longer runs

        # Output directory and reporting
        output_dir="outputs",
        report_to="wandb",

        # Seed for reproducibility
        seed=42, #3407,
        save_total_limit = 2,
        load_best_model_at_end=True, # CRUCIAL for small datasets
        metric_for_best_model="eval_loss", # Start with loss. Can switch to "eval_bleu" etc. if you set up compute_metrics
        greater_is_better=False, # For loss, False is better
        # gradient_checkpointing=True, # Only if you run into OOM
        # gradient_checkpointing_kwargs={'use_reentrant': False},
    )
)

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`

**Estimating the number of training steps**

As a worked example, take a dataset of 30,000 examples with the 95% train / 5% eval split used above:

- Training examples: 30,000 × 0.95 = 28,500
- Effective batch size: `per_device_train_batch_size` (2) × `gradient_accumulation_steps` (4) = 8 examples per optimizer step
- Steps per epoch: 28,500 / 8 = 3,562.5; the trainer counts whole optimizer steps, so this comes to about 3,562 steps per epoch

Total steps for the recommended 3 to 5 epochs:

- 3 epochs: 3,562 × 3 = 10,686 total steps
- 5 epochs: 3,562 × 5 = 17,810 total steps

Even at 3 epochs, this is far more optimizer steps than a 60-step demo run, giving the model much more opportunity to learn from the larger dataset.
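
The same arithmetic as a small Python helper, assuming a single GPU and that the trainer counts whole optimizer steps per epoch (dataloader batches divided by `gradient_accumulation_steps`, rounded down). The function name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_train_examples: int, per_device_batch_size: int,
                    grad_accum_steps: int, epochs: int) -> tuple[int, int]:
    """Return (steps_per_epoch, total_steps) for a single-GPU run."""
    # Number of dataloader batches per epoch; the last batch may be partial
    batches_per_epoch = math.ceil(num_train_examples / per_device_batch_size)
    # One optimizer update per grad_accum_steps batches, counting whole updates only
    steps_per_epoch = batches_per_epoch // grad_accum_steps
    return steps_per_epoch, steps_per_epoch * epochs


n_train = 28_500  # 30,000 examples x 0.95 train split
for epochs in (3, 5):
    per_epoch, total = optimizer_steps(n_train, per_device_batch_size=2, grad_accum_steps=4, epochs=epochs)
    print(f"{epochs} epochs: {per_epoch} steps/epoch, {total} total steps")
```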

**Step 7: Starting the Training**

In [None]:
# Begin training process and collect training statistics
print("Starting model training...")
torch.cuda.empty_cache()
trainer_stats = trainer.train()
print("Training complete!")

In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

In [None]:
# The model loaded at the end will be the best one according to eval_loss
print(f"Best eval loss: {trainer.state.best_metric}")
print(f"Best checkpoint: {trainer.state.best_model_checkpoint}")

Best eval loss: 0.7687375545501709
Best checkpoint: None


In [None]:
# Close the Weights & Biases run
wandb.finish()

**W&B run history**

| Metric | History |
|---|---|
| eval/loss | ▅▂▂▁▁▁▁▁▂▂▂▄▄▄▅▅▆▆▆▇▇▇██████ |
| eval/runtime | █▁▁▁▃▂▁▁▁▁▂▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁ |
| eval/samples_per_second | ▁███▆▇████▇█████▅███████████ |
| eval/steps_per_second | ▁███▆▇████▇█████▅███████████ |
| train/epoch | ▁▁▁▁▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇███ |
| train/global_step | ▁▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇███ |
| train/grad_norm | ▄▂▂▂▃▄▃▃▄▃▄▄▄▃▃▄▄▆▄▄▃█▅▄▃▃▃▆▁▂▂▃▂▂▂▁▂▂▂▂ |
| train/learning_rate | ▂▃▅▆▇█████▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁ |
| train/loss | █▅▄▃▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

**W&B run summary**

| Metric | Value |
|---|---|
| eval/loss | 1.29096 |
| eval/runtime | 12.2054 |
| eval/samples_per_second | 4.916 |
| eval/steps_per_second | 1.229 |
| total_flos | 6.493700991208704e+16 |
| train/epoch | 9.99467 |
| train/global_step | 1400.0 |
| train/grad_norm | 2.51137 |
| train/learning_rate | 0.0 |
| train/loss | 0.0557 |


### Inference from finetuned model

**Step 8: Inference from the Fine Tuned Model**

The code below sets up inference for the fine-tuned model using `FastModel`. It defines a prompt (`inference_prompt`) for English-to-Hindi translation that matches the training template up to the `### Response:` line, and formats it with an example input. The prompt is tokenized and moved to the GPU (`cuda`). The model then generates up to 256 new tokens, and the output is decoded back into text. Finally, the code extracts the part of the output after the `### Response:` section, which contains the generated Hindi translation.

Let's run the model via Unsloth native inference! According to the Gemma 3 team, the recommended inference settings are temperature = 1.0, top_p = 0.95, and top_k = 64. Here we want deterministic output, so we use greedy decoding (`do_sample=False`) instead of those sampling settings.

In [None]:
# Prepare Model for Inference
FastModel.for_inference(model) # Enable native 2x faster inference
# Define optimized inference prompt template
inference_prompt = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the English text to Hindi accurately, preserving the meaning and style.

### Input:
{}

### Response:
"""
print("Model prepared for inference")

def translate_to_hindi(english_text):
    """Translate English text to Hindi using fine-tuned model"""
    inputs = tokenizer(
        [inference_prompt.format(english_text)],
        return_tensors="pt"
    ).to("cuda")

    # Greedy decoding gives deterministic output; sampling settings such as temperature and top_p would be ignored
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,  # Adjusted based on expected output length
        do_sample=False,     # Deterministic for translation
        use_cache=True
    )

    # Extract translation from response
    full_response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

    # Extract the part after "### Response:"
    if "### Response:" in full_response:
        translation = full_response.split("### Response:")[1].strip()
    else:
        translation = full_response.strip()

    # Clean up any EOS tokens
    if EOS_TOKEN in translation:
        translation = translation.split(EOS_TOKEN)[0].strip()

    return translation

# Get translations from fine-tuned model
print("Getting translations from fine-tuned model...")
test_sentences = ["The Constitution of India is the supreme law of the land and provides the framework for the country's legal system."]
fine_tuned_translations = []
for sentence in test_sentences:
    translation = translate_to_hindi(sentence)
    fine_tuned_translations.append(translation)
    print(f"English: {sentence}")
    print(f"Hindi: {translation}")
    print()

Model prepared for inference
Getting translations from fine-tuned model...
English: The Constitution of India is the supreme law of the land and provides the framework for the country's legal system.
Hindi: <think>
Let me translate this English text to Hindi while preserving its meaning and nuances.
</think>
भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए एक रूपरेखा प्रदान करता है।
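The printed translation still starts with a `<think>...</think>` block, so splitting on `### Response:` alone does not isolate the Hindi text. A minimal post-processing sketch, assuming the block is always delimited by literal `<think>` and `</think>` tags as in the output above; `extract_translation` is a hypothetical helper, not part of Unsloth or transformers.

```python
import re

RESPONSE_MARKER = "### Response:"
# Matches a <think>...</think> block, non-greedy and across newlines
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_translation(decoded_text, marker=RESPONSE_MARKER):
    """Return the text after the response marker with any <think> block removed."""
    # partition() returns ('', '', '') parts when the marker is absent
    _, found, after = decoded_text.partition(marker)
    answer = after if found else decoded_text
    # Drop the reasoning block emitted before the translation
    answer = THINK_BLOCK.sub("", answer)
    return answer.strip()


example = (
    "### Input:\nThe Constitution of India is the supreme law of the land.\n\n"
    "### Response:\n<think>\nLet me translate this.\n</think>\n"
    "भारत का संविधान देश का सर्वोच्च कानून है।"
)
print(extract_translation(example))  # भारत का संविधान देश का सर्वोच्च कानून है।
```

Calling such a helper inside `translate_to_hindi` before returning would keep the reasoning text out of the stored translations.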



**Step 9: Saving the Model & Pushing to Hugging Face**

The following code saves the LoRA adapters and tokenizer locally and pushes them to the Hugging Face Hub. Pushing requires a Hugging Face token with write access.
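Pushing only makes sense if the local save succeeded. A small pre-push check, as a sketch: `missing_adapter_files` is a hypothetical helper, and the expected file names are taken from the `lora_model/` zip listing in this notebook (they may differ across Unsloth or PEFT versions).

```python
from pathlib import Path

# Assumed file names, taken from the lora_model/ zip listing in this notebook
EXPECTED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(save_dir="lora_model", expected=EXPECTED_ADAPTER_FILES):
    """Return the expected adapter files that are absent from save_dir."""
    directory = Path(save_dir)
    return [name for name in expected if not (directory / name).is_file()]


missing = missing_adapter_files("lora_model")
if missing:
    print(f"Not pushing: missing {missing}")
else:
    print("Adapter files present; safe to push to the Hub")
```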

In [None]:
# Save the LoRA adapters and tokenizer locally
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Push to the Hugging Face Hub (requires a token with write access,
# e.g. via huggingface_hub.login() or the HF_TOKEN environment variable)
model.push_to_hub("shrimantasatpati/english_to_hindi_legal_FT_gemma3_4b_it")
tokenizer.push_to_hub("shrimantasatpati/english_to_hindi_legal_FT_gemma3_4b_it")
print("Model and tokenizer pushed to Hugging Face Hub")


Saved model to https://huggingface.co/shrimantasatpati/english_to_hindi_legal_FT_gemma3_4b_it



Model and tokenizer pushed to Hugging Face Hub


**Test Individual Sentences**

In [None]:
# Test with various examples
test_sentences = [
    "The Constitution of India is the supreme law of the land and provides the framework for the country's legal system.",
    "The Supreme Court of India, established in 1950, serves as the highest judicial forum and final court of appeal.",
    "Filing a First Information Report (FIR) at the police station is mandatory to initiate criminal proceedings under the Code of Criminal Procedure.",
    "Section 302 of the Indian Penal Code prescribes punishment for murder, which may extend to death penalty or life imprisonment.",
    "The writ of habeas corpus can be filed in a High Court if a person is illegally detained or imprisoned.",
    "Under the Indian Evidence Act, a dying declaration is admissible as evidence in court proceedings.",
    "The Right to Information Act of 2005 empowers citizens to request information from public authorities.",
    "Anticipatory bail under Section 438 of the Criminal Procedure Code allows a person to seek bail in anticipation of arrest.",
    "The doctrine of stare decisis is followed by Indian courts, wherein precedents set by higher courts are binding on lower courts.",
    "The Advocate General is appointed by the Governor and serves as the highest law officer of a state."
]

# Get translations from fine-tuned model
print("Getting translations from fine-tuned model...")
fine_tuned_translations = []
for sentence in test_sentences:
    translation = translate_to_hindi(sentence)
    fine_tuned_translations.append(translation)
    print(f"English: {sentence}")
    print(f"Hindi: {translation}")
    print()

Getting translations from fine-tuned model...
English: The Constitution of India is the supreme law of the land and provides the framework for the country's legal system.
Hindi: <think>
Let me translate this English text to Hindi while preserving its meaning and nuances.
</think>
भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए एक रूपरेखा प्रदान करता है।

English: The Supreme Court of India, established in 1950, serves as the highest judicial forum and final court of appeal.
Hindi: <think>
Let me translate this English text to Hindi while preserving its meaning and nuances.
</think>
भारत का सर्वोच्च न्यायालय, 1950 में स्थापित, उच्चतम न्यायिक मंच और अंतिम अपील न्यायालय है ।

English: Filing a First Information Report (FIR) at the police station is mandatory to initiate criminal proceedings under the Code of Criminal Procedure.
Hindi: <think>
Let me translate this English text to Hindi while preserving its meaning and nuances.
</think>
प्रथम सूचना रिपोर्ट (एफआईआर)

In [None]:
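# Fine-tuned model outputs for the ten test sentences, stored as a literal list for evaluation.
# Mixed-script fragments are kept verbatim: they are part of the generated output.
fine_tuned_translations = [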
    "भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए एक रूपरेखा प्रदान करता है।",
    "भारत का सर्वोच्च न्यायालय, 1950 में स्थापित, उच्चतम न्यायिक मंच और अंतिम अपील न्यायालय है ।",
    "प्रथम सूचना रिपोर्ट (एफआईआर) को अपराध-मामलों के आरंभ के लिए संहिता के अधीन कार्यवाही शुरू करने के लिए पुलिस थाने में दाखिल करना अनिवार्य है।",
    "धारा 302, दंड संहिता, उन लोगों के विरुद्ध मृत्युदंड या जीवन पर्यंत कारावास का अर्थ है जो किसी की हत्या करते हैं।",
    "उच्च न्यायालय में किसी व्यक्ति को अवैध रूप से हिरासत या कारावास में रखा गया है या नहीं, यह जांच करने के लिए हबियस कॉर्पस का רשימה दायर किया जा सकता है।",
    "भारतीय साक्ष्य अधिनियम के तहत, मृत्यु की घोषणा को अदालत की कार्यवाही में साक्ष्य के रूप में ग्राह्म है।",
    "सूचना अधिनियम 2005 नागरिकों को सार्वजनिक प्राधिकरणों से सूचना मांगने का अधिकार देता है।",
    "भारतीय दंड प्रक्रिया संहिता की धारा 438 के अधीन अग्रिम जमानत, किसी व्यक्ति को उन किए जाने वाले संसpection की अग्रसरिता में जमानत की मांग करने की अनुमति देती है ।",
    "डॉक्टरिन ऑफ स्टेयर डिसिसस का भारतीय अदालतों द्वारा पालन किया जाता है, जहां उच्च न्यायालय द्वारा निर्धारित पूर्वधारणाएं निम्न न्यायालयों के लिए बाध्यकारी होती हैं ।",
    "राज्य सरकार के अधीन महा kuasa नियुक्त किया जाता है। महा kuasa राज्य सरकार की उच्चतम विधि संबंधी अधिकारी नियुक्त करता है और उसे सेवा प्रदान करता है।"
]

In [None]:
fine_tuned_translations

['भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए एक रूपरेखा प्रदान करता है।',
 'भारत का सर्वोच्च न्यायालय, 1950 में स्थापित, उच्चतम न्यायिक मंच और अंतिम अपील न्यायालय है ।',
 'प्रथम सूचना रिपोर्ट (एफआईआर) को अपराध-मामलों के आरंभ के लिए संहिता के अधीन कार्यवाही शुरू करने के लिए पुलिस थाने में दाखिल करना अनिवार्य है।',
 'धारा 302, दंड संहिता, उन लोगों के विरुद्ध मृत्युदंड या जीवन पर्यंत कारावास का अर्थ है जो किसी की हत्या करते हैं।',
 'उच्च न्यायालय में किसी व्यक्ति को अवैध रूप से हिरासत या कारावास में रखा गया है या नहीं, यह जांच करने के लिए हबियस कॉर्पस का רשימה दायर किया जा सकता है।',
 'भारतीय साक्ष्य अधिनियम के तहत, मृत्यु की घोषणा को अदालत की कार्यवाही में साक्ष्य के रूप में ग्राह्म है।',
 'सूचना अधिनियम 2005 नागरिकों को सार्वजनिक प्राधिकरणों से सूचना मांगने का अधिकार देता है।',
 'भारतीय दंड प्रक्रिया संहिता की धारा 438 के अधीन अग्रिम जमानत, किसी व्यक्ति को उन किए जाने वाले संसpection की अग्रसरिता में जमानत की मांग करने की अनुमति देती है ।',
 'डॉक्टरिन ऑफ स्

In [None]:
!zip -r /content/finetune_gemma_3_4b_it_en_hi.zip lora_model outputs wandb
from google.colab import files
files.download('/content/finetune_gemma_3_4b_it_en_hi.zip')

  adding: lora_model/ (stored 0%)
  adding: lora_model/preprocessor_config.json (deflated 55%)
  adding: lora_model/processor_config.json (deflated 11%)
  adding: lora_model/chat_template.json (deflated 70%)
  adding: lora_model/added_tokens.json (stored 0%)
  adding: lora_model/tokenizer.model (deflated 52%)
  adding: lora_model/adapter_model.safetensors (deflated 13%)
  adding: lora_model/tokenizer_config.json (deflated 96%)
  adding: lora_model/adapter_config.json (deflated 56%)
  adding: lora_model/tokenizer.json (deflated 83%)
  adding: lora_model/README.md (deflated 66%)
  adding: lora_model/special_tokens_map.json (deflated 77%)
  adding: outputs/ (stored 0%)
  adding: outputs/checkpoint-1400/ (stored 0%)
  adding: outputs/checkpoint-1400/preprocessor_config.json (deflated 55%)
  adding: outputs/checkpoint-1400/rng_state.pth (deflated 25%)
  adding: outputs/checkpoint-1400/processor_config.json (deflated 11%)
  adding: outputs/checkpoint-1400/chat_template.json (deflated 70%)
  


In [None]:
# Source English sentences
source_sentences = [
    "The Constitution of India is the supreme law of the land and provides the framework for the country's legal system.",
    "The Supreme Court of India, established in 1950, serves as the highest judicial forum and final court of appeal.",
    "Filing a First Information Report (FIR) at the police station is mandatory to initiate criminal proceedings under the Code of Criminal Procedure.",
    "Section 302 of the Indian Penal Code prescribes punishment for murder, which may extend to death penalty or life imprisonment.",
    "The writ of habeas corpus can be filed in a High Court if a person is illegally detained or imprisoned.",
    "Under the Indian Evidence Act, a dying declaration is admissible as evidence in court proceedings.",
    "The Right to Information Act of 2005 empowers citizens to request information from public authorities.",
    "Anticipatory bail under Section 438 of the Criminal Procedure Code allows a person to seek bail in anticipation of arrest.",
    "The doctrine of stare decisis is followed by Indian courts, wherein precedents set by higher courts are binding on lower courts.",
    "The Advocate General is appointed by the Governor and serves as the highest law officer of a state."
]

# Reference Hindi translations (Google Translate)
# Note: reference 9 renders "Indian courts" as "भारतीय दंड संहिता" (Indian Penal Code) and splits the
# sentence, so scores for sentence 9 partly reflect errors in the reference itself
reference_translations = [
    "भारत का संविधान देश का सर्वोच्च कानून है और देश की कानूनी व्यवस्था के लिए रूपरेखा प्रदान करता है।",
    "भारत का सर्वोच्च न्यायालय, 1950 में स्थापित, सर्वोच्च न्यायिक मंच और अपील की अंतिम अदालत के रूप में कार्य करता है।",
    "दंड प्रक्रिया संहिता के तहत आपराधिक कार्यवाही शुरू करने के लिए पुलिस स्टेशन में प्रथम सूचना रिपोर्ट (एफआईआर) दर्ज करना अनिवार्य है।",
    "भारतीय दंड संहिता की धारा 302 हत्या के लिए सजा निर्धारित करती है, जो मृत्युदंड या आजीवन कारावास तक हो सकती है।",
    "यदि किसी व्यक्ति को अवैध रूप से हिरासत में लिया जाता है या कैद किया जाता है तो उच्च न्यायालय में बंदी प्रत्यक्षीकरण याचिका दायर की जा सकती है।",
    "भारतीय साक्ष्य अधिनियम के तहत, अदालती कार्यवाही में मृत्युपूर्व बयान साक्ष्य के रूप में स्वीकार्य है।",
    "सूचना का अधिकार अधिनियम 2005 नागरिकों को सार्वजनिक अधिकारियों से जानकारी मांगने का अधिकार देता है।",
    "दंड प्रक्रिया संहिता की धारा 438 के तहत अग्रिम जमानत किसी व्यक्ति को गिरफ्तारी की आशंका में जमानत मांगने की अनुमति देती है।",
    "भारतीय दंड संहिता में स्टेयर डेसिसिस के सिद्धांत का पालन किया जाता है। न्यायालय, जहाँ उच्च न्यायालयों द्वारा निर्धारित मिसालें निचली अदालतों पर बाध्यकारी होती हैं।",
    "महाधिवक्ता की नियुक्ति राज्यपाल द्वारा की जाती है और वह राज्य के सर्वोच्च विधि अधिकारी के रूप में कार्य करता है।"
]

# Model outputs before fine-tuning
baseline_translations = ['भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए ढांचा प्रदान करता है।',
 'भारत की सर्वोच्च न्यायालय, जिसकी स्थापना 1950 में हुई, भारत की उच्चतम न्यायिक मंच और अंतिम अपील न्यायालय के रूप में कार्य करती है।',
 'पुलिस स्टेशन में प्रथम सूचना रिपोर्ट (एफआईआर) दर्ज कराना अनिवार्य है आपराधिक प्रक्रिया संहिता के तहत आपराधिक कार्यवाही शुरू करने के लिए।',
 'भारतीय दंड संहिता की धारा 302 हत्या के लिए दंड का प्रावधान करती है, जो मृत्युदंड या आजीवन कारावास तक हो सकता है।',
 'किसी व्यक्ति के अवैध रूप से हिरासत में लिए जाने या कैद किए जाने पर उच्च न्यायालय में हेबेरस कॉर्पस की याचिका दायर की जा सकती है।',
 'भारतीय साक्ष्य अधिनियम के तहत, एक मृत्यु घोषणा अदालत की कार्यवाही में सबूत के रूप में स्वीकार्य है।',
 'सूचना अधिकार अधिनियम 2005 नागरिकों को सार्वजनिक अधिकारियों से जानकारी मांगने का अधिकार देता है।',
 'धारा 438 के आपराधिक प्रक्रिया संहिता के तहत अग्रिम जमानत, एक व्यक्ति को गिरफ्तारी की आशंका में जमानत की मांग करने की अनुमति देती है।',
 "भारतीय न्यायालय भी 'स्टेरे डेसिस' के सिद्धांत का पालन करते हैं, जिसमें उच्च न्यायालयों द्वारा स्थापित पूर्व निर्णय निचले न्यायालयों के लिए बाध्यकारी होते हैं।",
 'महाधिवक्ता राज्यपाल द्वारा नियुक्त किए जाते हैं और वे एक राज्य के सर्वोच्च कानूनी अधिकारी के रूप में कार्य करते हैं।']

# Model outputs after fine-tuning
fine_tuned_translations = ['भारत का संविधान देश का सर्वोच्च कानून है और देश के कानूनी प्रणाली के लिए एक रूपरेखा प्रदान करता है।',
 'भारत का सर्वोच्च न्यायालय, 1950 में स्थापित, उच्चतम न्यायिक मंच और अंतिम अपील न्यायालय है ।',
 'प्रथम सूचना रिपोर्ट (एफआईआर) को अपराध-मामलों के आरंभ के लिए संहिता के अधीन कार्यवाही शुरू करने के लिए पुलिस थाने में दाखिल करना अनिवार्य है।',
 'धारा 302, दंड संहिता, उन लोगों के विरुद्ध मृत्युदंड या जीवन पर्यंत कारावास का अर्थ है जो किसी की हत्या करते हैं।',
 'उच्च न्यायालय में किसी व्यक्ति को अवैध रूप से हिरासत या कारावास में रखा गया है या नहीं, यह जांच करने के लिए हबियस कॉर्पस का רשימה दायर किया जा सकता है।',
 'भारतीय साक्ष्य अधिनियम के तहत, मृत्यु की घोषणा को अदालत की कार्यवाही में साक्ष्य के रूप में ग्राह्म है।',
 'सूचना अधिनियम 2005 नागरिकों को सार्वजनिक प्राधिकरणों से सूचना मांगने का अधिकार देता है।',
 'भारतीय दंड प्रक्रिया संहिता की धारा 438 के अधीन अग्रिम जमानत, किसी व्यक्ति को उन किए जाने वाले संसpection की अग्रसरिता में जमानत की मांग करने की अनुमति देती है ।',
 'डॉक्टरिन ऑफ स्टेयर डिसिसस का भारतीय अदालतों द्वारा पालन किया जाता है, जहां उच्च न्यायालय द्वारा निर्धारित पूर्वधारणाएं निम्न न्यायालयों के लिए बाध्यकारी होती हैं ।',
 'राज्य सरकार के अधीन महा kuasa नियुक्त किया जाता है। महा kuasa राज्य सरकार की उच्चतम विधि संबंधी अधिकारी नियुक्त करता है और उसे सेवा प्रदान करता है।']
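Several fine-tuned outputs contain tokens from other scripts (the Hebrew word "רשימה", Latin fragments such as "kuasa" and "pection"), which will lower every overlap metric. A small sketch to flag them automatically, assuming that any alphabetic character whose Unicode name does not start with `DEVANAGARI` is suspect in this English-to-Hindi setting; legitimately untransliterated Latin terms would be flagged too. `non_devanagari_words` is a hypothetical helper.

```python
import unicodedata


def non_devanagari_words(text):
    """Return words containing a letter whose Unicode name does not start with DEVANAGARI."""
    flagged = []
    for word in text.split():
        # Digits, punctuation, the danda and vowel signs are not alphabetic, so they are skipped
        if any(ch.isalpha() and not unicodedata.name(ch, "").startswith("DEVANAGARI")
               for ch in word):
            flagged.append(word)
    return flagged


# Excerpts from two fine-tuned outputs above; in the notebook, loop over fine_tuned_translations
samples = [
    "हबियस कॉर्पस का רשימה दायर किया जा सकता है।",
    "राज्य सरकार के अधीन महा kuasa नियुक्त किया जाता है।",
]
for s in samples:
    print(non_devanagari_words(s))
```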

In [None]:
!pip install sacrebleu -q -U
!pip install rouge -q -U
!pip install sacremoses -q -U


In [None]:
import numpy as np
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge import Rouge
import sacrebleu
import sacremoses
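In the next cell, `clean_translation` is applied to the baseline and fine-tuned outputs but not to `reference_translations`, and all metrics tokenize on whitespace. Whitespace tokenization is sensitive to danda spacing: several fine-tuned outputs end in "है ।" while the references end in "है।", so the final token never matches. A sketch of a normalization that could be applied identically to candidates and references before scoring; `normalize_hindi` is a hypothetical helper, and treating this spacing difference as insignificant is an assumption about what the evaluation should measure.

```python
import re


def normalize_hindi(text):
    """Hypothetical normalizer: single spaces between words, danda attached to the previous word."""
    text = re.sub(r"\s+", " ", text)  # collapse newlines and repeated spaces
    text = re.sub(r" ।", "।", text)   # "है ।" -> "है।"
    return text.strip()


# Apply the same function to both sides, e.g.
# refs_norm = [normalize_hindi(r) for r in reference_translations]
print("है ।".split(), normalize_hindi("है ।").split())  # ['है', '।'] ['है।']
```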

In [None]:
import re

# Clean translations function
def clean_translation(text):
    """Normalize a model output before scoring."""
    # Remove any parenthesized text (e.g. transliterations such as "(एफआईआर)")
    cleaned = re.sub(r'\([^)]*\)', '', text)
    # Replace line breaks with spaces
    cleaned = re.sub(r'\n', ' ', cleaned)
    return cleaned.strip()

# Clean the model outputs (the reference translations are left unchanged)
baseline_clean = [clean_translation(t) for t in baseline_translations]
finetuned_clean = [clean_translation(t) for t in fine_tuned_translations]

# Helper functions for metrics
def tokenize(sentence):
    return sentence.split()

# 1. BLEU Score Calculation
def calculate_bleu(candidates, references):
    smoothing = SmoothingFunction().method1
    bleu_scores = []

    for candidate, reference in zip(candidates, references):
        candidate_tokens = tokenize(candidate)
        reference_tokens = [tokenize(reference)]
        bleu = sentence_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)
        bleu_scores.append(bleu)

    avg_bleu = np.mean(bleu_scores)
    return bleu_scores, avg_bleu

# 2. ROUGE Score Calculation
def calculate_rouge(candidates, references):
    rouge = Rouge()
    rouge_scores = []

    for candidate, reference in zip(candidates, references):
        try:
            scores = rouge.get_scores(candidate, reference)[0]
            rouge_scores.append({
                'rouge-1': scores['rouge-1']['f'],
                'rouge-2': scores['rouge-2']['f'],
                'rouge-l': scores['rouge-l']['f']
            })
        except Exception:
            rouge_scores.append({
                'rouge-1': 0.0, 'rouge-2': 0.0, 'rouge-l': 0.0
            })

    avg_rouge_1 = np.mean([s['rouge-1'] for s in rouge_scores])
    avg_rouge_2 = np.mean([s['rouge-2'] for s in rouge_scores])
    avg_rouge_l = np.mean([s['rouge-l'] for s in rouge_scores])

    return rouge_scores, {'rouge-1': avg_rouge_1, 'rouge-2': avg_rouge_2, 'rouge-l': avg_rouge_l}

# 3. chrF Score Calculation
def calculate_chrf(candidates, references):
    chrf_scores = []

    for candidate, reference in zip(candidates, references):
        chrf = sacrebleu.corpus_chrf([candidate], [[reference]])
        chrf_scores.append(chrf.score)

    avg_chrf = np.mean(chrf_scores)
    return chrf_scores, avg_chrf

# Calculate all metrics
print("Evaluating Baseline vs. Fine-tuned Translations")
baseline_bleu_scores, baseline_avg_bleu = calculate_bleu(baseline_clean, reference_translations)
baseline_rouge_scores, baseline_avg_rouge = calculate_rouge(baseline_clean, reference_translations)
baseline_chrf_scores, baseline_avg_chrf = calculate_chrf(baseline_clean, reference_translations)

finetuned_bleu_scores, finetuned_avg_bleu = calculate_bleu(finetuned_clean, reference_translations)
finetuned_rouge_scores, finetuned_avg_rouge = calculate_rouge(finetuned_clean, reference_translations)
finetuned_chrf_scores, finetuned_avg_chrf = calculate_chrf(finetuned_clean, reference_translations)

# Print results
print("\n===== OVERALL RESULTS =====")
print(f"Baseline Average BLEU: {baseline_avg_bleu:.4f}")
print(f"Baseline Average ROUGE-1: {baseline_avg_rouge['rouge-1']:.4f}")
print(f"Baseline Average ROUGE-2: {baseline_avg_rouge['rouge-2']:.4f}")
print(f"Baseline Average ROUGE-L: {baseline_avg_rouge['rouge-l']:.4f}")
print(f"Baseline Average chrF: {baseline_avg_chrf:.4f}")

print("\n")
print(f"Fine-tuned Average BLEU: {finetuned_avg_bleu:.4f}")
print(f"Fine-tuned Average ROUGE-1: {finetuned_avg_rouge['rouge-1']:.4f}")
print(f"Fine-tuned Average ROUGE-2: {finetuned_avg_rouge['rouge-2']:.4f}")
print(f"Fine-tuned Average ROUGE-L: {finetuned_avg_rouge['rouge-l']:.4f}")
print(f"Fine-tuned Average chrF: {finetuned_avg_chrf:.4f}")

# Relative change (%) of each fine-tuned average vs. the baseline; negative values mean the score dropped
bleu_change = ((finetuned_avg_bleu - baseline_avg_bleu) / baseline_avg_bleu) * 100
rouge1_change = ((finetuned_avg_rouge['rouge-1'] - baseline_avg_rouge['rouge-1']) / baseline_avg_rouge['rouge-1']) * 100
rouge2_change = ((finetuned_avg_rouge['rouge-2'] - baseline_avg_rouge['rouge-2']) / baseline_avg_rouge['rouge-2']) * 100
rougeL_change = ((finetuned_avg_rouge['rouge-l'] - baseline_avg_rouge['rouge-l']) / baseline_avg_rouge['rouge-l']) * 100
chrf_change = ((finetuned_avg_chrf - baseline_avg_chrf) / baseline_avg_chrf) * 100

print("\n===== RELATIVE CHANGE (FINE-TUNED VS. BASELINE) =====")
print(f"BLEU Score: {bleu_change:+.2f}%")
print(f"ROUGE-1 Score: {rouge1_change:+.2f}%")
print(f"ROUGE-2 Score: {rouge2_change:+.2f}%")
print(f"ROUGE-L Score: {rougeL_change:+.2f}%")
print(f"chrF Score: {chrf_change:+.2f}%")

# Create a zip archive of the training outputs directory
import shutil
archive_name = "finetune_gemma_3_4b_it_en_hi_output_complete"
shutil.make_archive(archive_name, 'zip', "outputs")
print(f"Created archive {archive_name}.zip with all model outputs")

Evaluating Baseline vs. Fine-tuned Translations

===== OVERALL RESULTS =====
Baseline Average BLEU: 0.4610
Baseline Average ROUGE-1: 0.7414
Baseline Average ROUGE-2: 0.5523
Baseline Average ROUGE-L: 0.6818
Baseline Average chrF: 71.9751


Fine-tuned Average BLEU: 0.2789
Fine-tuned Average ROUGE-1: 0.6387
Fine-tuned Average ROUGE-2: 0.3732
Fine-tuned Average ROUGE-L: 0.5588
Fine-tuned Average chrF: 58.4382

===== RELATIVE CHANGE (FINE-TUNED VS. BASELINE) =====
BLEU Score: -39.50%
ROUGE-1 Score: -13.84%
ROUGE-2 Score: -32.42%
ROUGE-L Score: -18.04%
chrF Score: -18.81%
Created archive finetune_gemma_3_4b_it_en_hi_output_complete.zip with all model outputs
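The BLEU and ROUGE helpers above delegate to NLTK and the `rouge` package. To make the drop in scores easier to interpret, here is a minimal re-implementation of the two ideas behind them: clipped n-gram precision with a brevity penalty (BLEU) and longest-common-subsequence F1 (ROUGE-L). It is a simplified sketch and will not reproduce the library numbers exactly, since NLTK's smoothing and the `rouge` package's ROUGE-L weighting differ in details. The example pair is hypothetical (shortened from sentence 2) and shows how one space before the danda costs the candidate a token match.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_sentence_bleu(candidate, reference, max_n=4, epsilon=0.1):
    """Sentence BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngram_counts(cand, n), ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        matches = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Replace a zero numerator by a small epsilon (similar in spirit to NLTK's method1)
        log_precision_sum += math.log((matches or epsilon) / total)
    # Brevity penalty: candidates shorter than the reference are penalized
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as plain F1 of LCS-based precision and recall over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


candidate = "भारत का सर्वोच्च न्यायालय अंतिम अपील न्यायालय है ।"
reference = "भारत का सर्वोच्च न्यायालय अंतिम अपील न्यायालय है।"
print(round(simple_sentence_bleu(candidate, reference), 3))
print(round(rouge_l_f1(candidate, reference), 3))
```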
