# Fine-Tuning and Evaluating the Helsinki-NLP Model for English-to-Tigrinya Translation

This notebook fine-tunes the Helsinki-NLP `opus-mt-en-ti` model to translate text from English to Tigrinya. It covers dataset preparation, model training, and evaluation with the BLEU and chrF metrics. Below is an overview of the key steps and findings:

---

## Key Steps

### 1. **Model Setup**
- Load the pre-trained Helsinki-NLP `opus-mt-en-ti` model and tokenizer.
- Move the model to GPU for faster computation.

### 2. **Dataset Preparation**
- Load training, validation, and test datasets containing English (source) and Tigrinya (target) text pairs.
- Tokenize the datasets and prepare them for training using Hugging Face's `Dataset` API.

### 3. **Baseline Evaluation**
- Generate translations for the test dataset using the pre-trained model.
- Compute baseline metrics:
  - **BLEU Score:** 1.86
  - **chrF Score:** 17.82

### 4. **Fine-Tuning**
- Define training arguments (e.g., learning rate, batch size, epochs) and fine-tune the model using the training dataset.
- Save the fine-tuned model for evaluation and deployment.

### 5. **Post-Fine-Tuning Evaluation**
- Evaluate the fine-tuned model on the test dataset.
- Compute translation quality using metrics:
  - **BLEU Score:** Improved to **9.38**, a substantial gain over the pre-trained model's 1.86.
  - **chrF Score:** Improved to **31.70** (from 17.82), reflecting better character-level overlap with the reference translations.

  

---


## Observations
- After fine-tuning, the **BLEU Score** rises from 1.86 to **9.38** on the test set, a large gain over the pre-trained model. The absolute score is still low, however, which means many translations share few exact word sequences with the references.
- The **chrF Score** rises from 17.82 to **31.70**, reflecting better character-level overlap with the references and improved translation quality relative to the baseline.
- Despite the low BLEU score, sample outputs from the fine-tuned model (such as the example sentence translated below) look reasonable in context. The low BLEU score may stem from the model's limited grasp of Tigrinya and from Tigrinya's rich morphology: BLEU only credits exact word n-gram matches, so a correctly chosen stem with a different affix counts as a complete miss.

---

## Conclusion
The fine-tuned model shows clear gains in translation quality, as reflected in higher BLEU and chrF scores. The absolute BLEU score nevertheless remains low, likely for two reasons:
1. **Challenges in Tigrinya Language Understanding**: The model still struggles with Tigrinya's complex linguistic structure.
2. **Metric Limitations**: BLEU rewards only exact word n-gram matches, so it may understate improvements in a morphologically rich language like Tigrinya, where a small affix difference breaks an otherwise correct word.
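To see why exact-match metrics are harsh on morphology, the simplified sketch below compares clipped word n-gram precision (the per-order ingredient of BLEU, without the brevity penalty or the geometric mean) with a single-order character n-gram F-score (the ingredient of chrF). The sentence pair is a hypothetical English example that differs only in one suffix; this is an illustration, not sacrebleu's implementation, and real BLEU and chrF combine several n-gram orders.

```python
from collections import Counter


def ngrams(seq, n):
    """Count the n-grams of a sequence (a list of words or a string of characters)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def word_ngram_precision(hyp, ref, n):
    """Clipped word n-gram precision, the per-order building block of BLEU."""
    hyp_counts = ngrams(hyp.split(), n)
    ref_counts = ngrams(ref.split(), n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def char_ngram_f(hyp, ref, n, beta=2.0):
    """Character n-gram F-beta score with spaces removed, the building block of chrF."""
    hyp_counts = ngrams(hyp.replace(" ", ""), n)
    ref_counts = ngrams(ref.replace(" ", ""), n)
    matched = sum((hyp_counts & ref_counts).values())  # clipped overlap
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp_counts.values())
    recall = matched / sum(ref_counts.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# Hypothetical pair: identical except for one suffix
ref = "walked home quickly"
hyp = "walking home quickly"

for n in (1, 2, 3):
    print(f"word {n}-gram precision: {word_ngram_precision(hyp, ref, n):.3f}")
print(f"char 3-gram F2:         {char_ngram_f(hyp, ref, 3):.3f}")
# word precisions: 0.667, 0.500, 0.000; char 3-gram F2: 0.724
```

A single changed suffix removes every word n-gram that contains that word, while most character n-grams still match. In a morphologically rich language such as Tigrinya, where words commonly carry prefixes and suffixes, this gap can be larger.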




In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## **Install and Import Libraries**

In [None]:
!pip install transformers datasets evaluate sacrebleu sentencepiece

In [None]:
import pandas as pd
import torch
import evaluate
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)


In [None]:
# Check GPU availability
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"Device Name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU'}")


CUDA Available: True
Device Name: NVIDIA A100-SXM4-40GB


## **Load Data and Run the Baseline Model on the Validation Dataset**


In [None]:
# Load the dataset
train_data = pd.read_csv("/Capstone/Dataset_csv/en_to_ti_train.csv")
test_data = pd.read_csv("/Capstone/Dataset_csv/en_to_ti_test.csv")
val_data = pd.read_csv("/Capstone/Dataset_csv/en_to_ti_val.csv")


print(f"Training Set: {len(train_data)} rows")
print(f"Testing Set: {len(test_data)} rows")
print(f"Validation Set: {len(val_data)} rows")


Training Set: 286500 rows
Testing Set: 35813 rows
Validation Set: 35813 rows
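The printed row counts correspond to an approximately 80/10/10 train/validation/test split (how the split was produced is not shown in this notebook). A quick check of the proportions, using only the numbers above:

```python
# Row counts copied from the output above
split_sizes = {"train": 286500, "validation": 35813, "test": 35813}
total = sum(split_sizes.values())

for name, size in split_sizes.items():
    print(f"{name:>10}: {size:>7,} rows ({size / total:.1%})")
print(f"{'total':>10}: {total:>7,} rows")
```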


In [None]:
val_data.head()

Unnamed: 0.1,Unnamed: 0,Source,Target
0,286012,"'When he was fourteen, his family encouraged h...",'ወዲ ዓሰርተው ኣርባዕተ ዓመት ምስ ኰነ ስድራኡ መሃንድስ ክኸውን የተባብ...
1,139572,'The teacher and other teachers were suspicious.','መምህርን ካልኦት መምሃራንን ብጠርጠራ ኣብ ዓይኒ ኣተዉ።'
2,44339,"'The man is greater than he is, he is richer a...",'እቲ ሰብ ክንዲ ዝዓበየ ይዕበ፡ ክንዲ ዝሃብተመ ይሃብትም፡ ካብዚ ሓቂ’ዚ...
3,296020,"'Just keep quiet, but we hate things for the w...",'” “ስቕ ጥራይ በሊ፡ ንሕና’ኮ ንገድድ ካብአን፡ ነገር ጸሊእና’ምበር።'
4,307978,"'In the 1994 Asmara suburb of Nyala Hotel, Mr....",'ኣብ 1994 ኣብ ኣስመራ ከባቢ ንያላ ሆቴል ካብ ወላዲኣ ኣቶ ደበሳይ ኣ...


In [None]:
# Load the pre-trained model and tokenizer
model_name = "Helsinki-NLP/opus-mt-en-ti"  # Change to "ti-en" for reverse task
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)




In [None]:
# Convert training and testing data to Hugging Face Datasets
train_dataset = Dataset.from_pandas(train_data)
val_dataset = Dataset.from_pandas(val_data)
#test_dataset = Dataset.from_pandas(test_data)

def preprocess_function(examples):
    # Tokenize English sources and Tigrinya targets in one call;
    # text_target routes the targets through the target-side tokenizer
    model_inputs = tokenizer(
        examples["Source"],
        text_target=examples["Target"],
        max_length=128,
        truncation=True,
        padding="max_length",
    )
    # Replace pad ids in the labels with -100 so the loss ignores padded positions
    model_inputs["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in label]
        for label in model_inputs["labels"]
    ]
    return model_inputs

# Tokenize both datasets
tokenized_train = train_dataset.map(preprocess_function, batched=True)
tokenized_val = val_dataset.map(preprocess_function, batched=True)


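Because `padding="max_length"` pads every label sequence with the tokenizer's pad token, it matters whether those positions count toward the loss. Marian models in `transformers` compute the loss with `torch.nn.CrossEntropyLoss`, whose default `ignore_index` is -100, so the usual convention is to replace pad ids in the labels with -100. If the pad ids are left in, the padded positions are scored like real tokens; since padding after the end of a sentence is easy to predict, the averaged loss comes out lower than the loss on real tokens. The pure-Python sketch below shows the masking step and its effect on an averaged loss; the pad id and all token ids and losses are made up for illustration.

```python
PAD_ID = 99          # hypothetical pad token id, for illustration only
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to this value by default


def mask_padding(label_batch, pad_id=PAD_ID):
    """Replace pad ids in each label sequence with IGNORE_INDEX."""
    return [[tok if tok != pad_id else IGNORE_INDEX for tok in seq] for seq in label_batch]


def mean_token_loss(token_losses, labels):
    """Average per-token losses, skipping positions labelled IGNORE_INDEX."""
    kept = [loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept)


labels = [[37, 512, 9, 1, PAD_ID, PAD_ID]]  # made-up ids: 4 real tokens, 2 pads
losses = [2.0, 1.5, 1.0, 0.5, 0.01, 0.01]   # made-up per-token losses; pads are easy to predict

masked = mask_padding(labels)
print(masked)                                # [[37, 512, 9, 1, -100, -100]]
print(mean_token_loss(losses, labels[0]))    # pads included: about 0.837
print(mean_token_loss(losses, masked[0]))    # pads ignored: 1.25
```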

In [None]:
# Move the model to the GPU
model = model.to("cuda")


In [None]:
def generate_translation_in_batches(texts, batch_size=32):
    translations = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]

        # Tokenize and move inputs to GPU
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=128)
        inputs = {key: value.to("cuda") for key, value in inputs.items()}  # Move to GPU

        # Generate translations
        outputs = model.generate(**inputs, max_length=128)

        # Decode and store translations
        batch_translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        translations.extend(batch_translations)
    return translations
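The batching loop above slices the inputs into chunks of `batch_size` and pads each chunk to its own longest sentence (`padding=True`). A common refinement, not used in this notebook, is to sort inputs by length so each batch holds similarly sized sentences (less padding), then restore the original order before scoring. The sketch below assumes a hypothetical `translate_batch` callable standing in for the tokenize, `model.generate`, and `batch_decode` steps, so the control flow runs without a GPU; it uses character length as a cheap proxy for token length.

```python
import math
from typing import Callable, List


def translate_sorted(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Translate in length-sorted batches, returning outputs in the original order."""
    # Input positions ordered from shortest to longest text
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    results: List[str] = [""] * len(texts)
    for start in range(0, len(order), batch_size):
        batch_positions = order[start:start + batch_size]
        outputs = translate_batch([texts[i] for i in batch_positions])
        # Write each output back to the position of the input it came from
        for i, output in zip(batch_positions, outputs):
            results[i] = output
    return results


def fake_translate_batch(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for tokenizer(...) -> model.generate(...) -> batch_decode(...)."""
    return [sentence.upper() for sentence in batch]


texts = ["a much longer input sentence", "hi", "medium text"]
print(translate_sorted(texts, fake_translate_batch, batch_size=2))
# ['A MUCH LONGER INPUT SENTENCE', 'HI', 'MEDIUM TEXT']

# Either way, 35,813 sentences at batch_size=32 means 1,120 generate() calls,
# the last one holding 35813 - 32 * 1119 = 5 sentences
print(math.ceil(35813 / 32))  # 1120
```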

In [None]:
# Extract validation source texts
val_source_texts = val_dataset["Source"]

# Generate baseline translations
baseline_translations = generate_translation_in_batches(val_source_texts)


In [None]:
# Load BLEU metric
metric = evaluate.load("sacrebleu")

# Prepare references
references = [[text] for text in val_dataset["Target"]]

# Compute BLEU score
baseline_bleu = metric.compute(predictions=baseline_translations, references=references)
print(f"Baseline BLEU Score: {baseline_bleu['score']}")



Baseline BLEU Score: 1.8500842850902983
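For reference, the score `sacrebleu` reports is the standard corpus-level BLEU scaled to 0-100: a brevity penalty BP times the geometric mean of the clipped word n-gram precisions p_1 to p_4, where c is the total length of the system output and r the total reference length:

$$
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\left(\sum_{n=1}^{4}\frac{1}{4}\log p_n\right),
\qquad
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
$$

Because the precisions are multiplied together, a corpus with few matching 3- and 4-grams scores low overall, even when many individual words are nearly right.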


## **Fine-Tune and Evaluate the Pre-Trained Model**

In [None]:
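training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",          # named evaluation_strategy in older transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=4,
    save_total_limit=2,
    predict_with_generate=True,
    report_to="none",               # skip the Weights & Biases login prompt
)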



In [None]:
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    processing_class=tokenizer,  # passed as tokenizer= in older transformers releases
)
# Start fine-tuning
trainer.train()



| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.5608 | 0.523546 |
| 2 | 0.5051 | 0.484338 |
| 3 | 0.4817 | 0.467942 |
| 4 | 0.4685 | 0.463294 |




TrainOutput(global_step=35816, training_loss=0.5251631771903074, metrics={'train_runtime': 7369.3603, 'train_samples_per_second': 155.509, 'train_steps_per_second': 4.86, 'total_flos': 3.8847526207488e+16, 'train_loss': 0.5251631771903074, 'epoch': 4.0})
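The `TrainOutput` numbers are consistent with the training arguments. Assuming a single GPU (one A100 was reported) and no gradient accumulation (none is configured), 286,500 rows at 32 per batch give 8,954 optimizer steps per epoch and 35,816 over four epochs, matching `global_step`. A quick check:

```python
import math

train_rows = 286500        # training-set size printed earlier
batch_size = 32            # per_device_train_batch_size on a single GPU
epochs = 4                 # num_train_epochs
train_runtime = 7369.3603  # seconds, from TrainOutput

steps_per_epoch = math.ceil(train_rows / batch_size)  # the final partial batch still counts as a step
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 8954 35816

# Throughput implied by the runtime (compare with the logged 155.509 samples/s and 4.86 steps/s)
print(f"{train_rows * epochs / train_runtime:.1f} samples/s, {total_steps / train_runtime:.2f} steps/s")
```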

In [None]:
# Load the BLEU metric
metric = evaluate.load("sacrebleu")

# Reuse generate_translation_in_batches from the baseline section;
# `model` now holds the fine-tuned weights
validation_source_texts = val_dataset["Source"]
references = [[ref] for ref in val_dataset["Target"]]
predictions = generate_translation_in_batches(validation_source_texts)

# Compute BLEU score
fine_tuned_bleu = metric.compute(predictions=predictions, references=references)
print(f"Fine-Tuned BLEU Score: {fine_tuned_bleu['score']}")


Fine-Tuned BLEU Score: 9.265455970430764


## **Save the Fine-Tuned Model**

In [None]:
# Define the model name for clarity
model_name = "opus-mt-en-ti_fine_tuned"
save_path = f"/content/drive/MyDrive/Capstone/{model_name}"

# Save the fine-tuned model and tokenizer
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"Fine-tuned Helsinki-NLP model saved successfully at: {save_path}")


Fine-tuned Helsinki-NLP model saved successfully at: /content/drive/MyDrive/Capstone/opus-mt-en-ti_fine_tuned


## **Evaluate the Pre-Trained and Fine-Tuned Models on the Test Dataset**

In [None]:
# Path where the fine-tuned model was saved
load_path = f"/Capstone/{model_name}"

# Reload the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(load_path)
model = AutoModelForSeq2SeqLM.from_pretrained(load_path)

# Move the model to GPU if available
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

print(f"Fine-tuned Helsinki-NLP model '{model_name}' loaded successfully!")




Fine-tuned Helsinki-NLP model 'opus-mt-en-ti_fine_tuned' loaded successfully!


In [None]:
def translate_sentence(sentence):
    # opus-mt-en-ti is a single-direction (English -> Tigrinya) Marian model,
    # so no source/target language tokens need to be set on the tokenizer
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128).to(device)

    # Generate the translation
    outputs = model.generate(**inputs, max_length=128)

    # Decode the first (and only) sequence in the batch
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example: translate an English sentence to Tigrinya
english_sentence = " omg I am very tired today and I been working on this project all day long"
tigrinya_translation = translate_sentence(english_sentence)
print("Translated to Tigrinya:", tigrinya_translation)


Translated to Tigrinya: ሎሚ ኣዝየ ደኺመ ስለ ዘለኹ፡ ምሉእ መዓልቲ ኣብዚ ፕሮጀክት እሰርሕ ኣለኹ'


In [None]:
test_data = pd.read_csv("/Capstone/Dataset_csv/en_to_ti_test.csv")

In [None]:
test_data.head()

Unnamed: 0.1,Unnamed: 0,Source,Target
0,169388,'The women who are part of parliament are the ...,'እተን ገበርትን ሓደግትን እተን ኣባላት ባይቶ ዝኾና ደቂ ኣንስትዮ’የን።'
1,59682,"'Sometimes, it 's time to break up. '",'ሓደ ሓደ ግዜ ኣብ ግዜኡ ምፍልላይ የዋጽእ’ዩ” በለተን።'
2,144968,'It has been said that a consul was short of t...,'ቈናኖ ቀደም ኣብ ከምኡ ዝበለ እዋን ግዜ ይሓጽረን ነይሩ ይበሃል።'
3,269661,'This stunned the presiding officers.','እዚ ከኣ ነቶም ዝተኣዘዙ ሓለፍቲ ኣመና ኣደንጸዎም።'
4,338063,"'Manchester United, CHELSEA, Manchester City a...",'ማንቸስተር ዩናይትድ፡ ቸልሲ፡ ማንቸስተር ሲቲ ኣብዚ ግዜ’ዚ ኸኣ ሌስተር...


In [None]:
# Load the pre-trained model and tokenizer
model_name = "Helsinki-NLP/opus-mt-en-ti"  # Replace with your model name
baseline_model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to("cuda")
baseline_tokenizer = AutoTokenizer.from_pretrained(model_name)




In [None]:
# Convert testing data to Hugging Face Datasets
test_dataset = Dataset.from_pandas(test_data)

In [None]:
def generate_baseline_translation_in_batches(texts, batch_size=32):
    translations = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = baseline_tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=128).to("cuda")
        outputs = baseline_model.generate(**inputs, max_length=128, num_beams=5)
        batch_translations = baseline_tokenizer.batch_decode(outputs, skip_special_tokens=True)
        translations.extend(batch_translations)
    return translations

# Generate translations for the test dataset
test_source_texts = test_dataset["Source"]
baseline_translations = generate_baseline_translation_in_batches(test_source_texts)
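The test-set generation functions pass `num_beams=5`, i.e. beam search: at each step the decoder keeps the five highest-scoring partial translations instead of committing to the single most probable next token. The toy sketch below contrasts greedy decoding with a simplified beam search over a small hand-made next-token probability table (all tokens and probabilities are invented) to show how a beam can find a higher-probability sequence that greedy decoding misses. The implementation in `transformers` adds details omitted here, such as a length penalty.

```python
import math

EOS = "</s>"

# Hypothetical next-token probabilities: prefix (tuple of tokens) -> {token: probability}
NEXT_TOKEN_PROBS = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
    ("A", "x"): {EOS: 1.0},
    ("A", "y"): {EOS: 1.0},
    ("B", "z"): {EOS: 1.0},
    ("B", "w"): {EOS: 1.0},
}


def greedy_decode(max_len=5):
    """Always take the single most probable next token."""
    seq, log_prob = (), 0.0
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        token, prob = max(NEXT_TOKEN_PROBS[seq].items(), key=lambda kv: kv[1])
        seq, log_prob = seq + (token,), log_prob + math.log(prob)
    return seq, log_prob


def beam_search(beam_size=2, max_len=5):
    """Keep the beam_size best partial sequences (by summed log-probability) at each step."""
    beams, finished = [((), 0.0)], []
    for _ in range(max_len):
        candidates = [
            (seq + (token,), log_prob + math.log(prob))
            for seq, log_prob in beams
            for token, prob in NEXT_TOKEN_PROBS[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, log_prob in candidates[:beam_size]:
            # Completed hypotheses leave the beam; the rest are extended next step
            (finished if seq[-1] == EOS else beams).append((seq, log_prob))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])


greedy_seq, greedy_lp = greedy_decode()
beam_seq, beam_lp = beam_search()
print(greedy_seq, round(math.exp(greedy_lp), 2))  # ('A', 'x', '</s>') 0.3
print(beam_seq, round(math.exp(beam_lp), 2))      # ('B', 'z', '</s>') 0.36
```

Greedy decoding commits to "A" because it is the likelier first token, but every continuation of "A" is mediocre; keeping "B" alive in the beam lets the search find the overall more probable sequence.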


In [None]:
test_reference_texts = test_dataset["Target"]
# Load BLEU metric
metric = evaluate.load("sacrebleu")

# Compute BLEU score for baseline model
baseline_result = metric.compute(predictions=baseline_translations, references=test_reference_texts)
print(f"Baseline BLEU Score: {baseline_result['score']}")



Baseline BLEU Score: 1.8580033099618207


In [None]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "/Capstone/opus-mt-en-ti_fine_tuned"
fine_tuned_model = AutoModelForSeq2SeqLM.from_pretrained(fine_tuned_model_path).to("cuda")
fine_tuned_tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)


In [None]:
def generate_fine_tuned_translation_in_batches(texts, batch_size=32):
    translations = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = fine_tuned_tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=128).to("cuda")
        outputs = fine_tuned_model.generate(**inputs, max_length=128, num_beams=5)
        batch_translations = fine_tuned_tokenizer.batch_decode(outputs, skip_special_tokens=True)
        translations.extend(batch_translations)
    return translations


# Generate translations for the test dataset
test_source_texts = test_dataset["Source"]
fine_tuned_translations = generate_fine_tuned_translation_in_batches(test_source_texts)

In [None]:
# Compute BLEU score for fine-tuned model
fine_tuned_result = metric.compute(predictions=fine_tuned_translations, references=test_reference_texts)
print(f"Fine-Tuned BLEU Score: {fine_tuned_result['score']}")

Fine-Tuned BLEU Score: 9.379557784956042


In [None]:
# Load the chrF metric. evaluate's "chrf" uses word_order=0 by default, i.e. plain chrF;
# passing word_order=2 to compute() would give chrF++
chrf_metric = evaluate.load("chrf")

# Compute the chrF score for the baseline model
chrf_result = chrf_metric.compute(predictions=baseline_translations, references=test_reference_texts)
print(f"Baseline chrF Score: {chrf_result['score']}")


Baseline chrF Score: 17.817335317344963
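In its general form, chrF combines character n-gram precision chrP and recall chrR into an F-score. sacrebleu's defaults use character n-grams up to length 6 and β = 2, which treats recall as twice as important as precision; chrF++ adds word unigrams and bigrams to the same computation. Implementations differ slightly in how they average over n-gram orders.

$$
\mathrm{chrF}_\beta = (1+\beta^2)\,\frac{\mathrm{chrP}\cdot\mathrm{chrR}}{\beta^2\,\mathrm{chrP} + \mathrm{chrR}}
$$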


In [None]:
# Compute the chrF score for the fine-tuned model (same default settings as the baseline)
chrf_result = chrf_metric.compute(predictions=fine_tuned_translations, references=test_reference_texts)
print(f"Fine-Tuned chrF Score: {chrf_result['score']}")

Fine-Tuned chrF Score: 31.70257824928567
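Putting the four test-set scores side by side (values copied from the outputs above; the character-level numbers come from `evaluate`'s `chrf` metric at its default settings):

```python
# Test-set scores copied from the outputs above: (baseline, fine-tuned)
scores = {
    "BLEU": (1.8580033099618207, 9.379557784956042),
    "chrF": (17.817335317344963, 31.70257824928567),
}

for metric_name, (baseline, fine_tuned) in scores.items():
    gain = fine_tuned - baseline
    ratio = fine_tuned / baseline
    print(f"{metric_name}: {baseline:.2f} -> {fine_tuned:.2f} (+{gain:.2f} points, {ratio:.1f}x)")
# BLEU: 1.86 -> 9.38 (+7.52 points, 5.0x)
# chrF: 17.82 -> 31.70 (+13.89 points, 1.8x)
```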
