# Training an Igbo Translation Model Using NLLB-200-600M

### Why NLLB-200-600M
1. It is lightweight.
2. It was trained on 200 languages, including low-resource ones, which makes it a good choice for Igbo translation.

### Source of Data
The training data is a parallel corpus of 1500 English texts in one column and their manual Igbo translations in another column. After rows with missing values are dropped, 1401 sentence pairs remain.

## Assess and Clean the Data

In [None]:
!pip install sacrebleu rouge-score nltk

Collecting sacrebleu
  Downloading sacrebleu-2.5.1-py3-none-any.whl.metadata (51 kB)
Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting portalocker (from sacrebleu)
  Downloading portalocker-3.2.0-py3-none-any.whl.metadata (8.7 kB)
Collecting colorama (from sacrebleu)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Downloading sacrebleu-2.5.1-py3-none-any.whl (104 kB)
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Downloading portalocker-3.2.0-py3-none-any.whl (22 kB)
Building wheels for collected packages: rouge-scor

In [None]:
import pandas as pd
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import numpy as np
from collections import Counter
import re
import nltk
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from nltk.translate.chrf_score import sentence_chrf
import sacrebleu
from rouge_score import rouge_scorer

import warnings
warnings.filterwarnings("ignore")

In [None]:
df1 = pd.read_excel("/content/English T2.xlsx")
df2 = pd.read_excel("/content/English T6.xlsx")

In [None]:
df1.head()

In [None]:
df2.head()

In [None]:
df1.info()

In [None]:
df2 = df2.rename(columns={"Unnamed: 2": "Igbo"})

In [None]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 710 entries, 0 to 749
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   S/N               710 non-null    int64 
 1   English Language  710 non-null    object
 2   Igbo              710 non-null    object
dtypes: int64(1), object(2)
memory usage: 22.2+ KB


In [None]:
df1 = df1.dropna()
df2 = df2.dropna()

In [None]:
df = pd.concat([df1, df2]).reset_index(drop=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1401 entries, 0 to 1400
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0                     691 non-null    object 
 1   English Language  1401 non-null   object 
 2   Igbo              1401 non-null   object 
 3   S/N               710 non-null    float64
dtypes: float64(1), object(3)
memory usage: 43.9+ KB


In [None]:
data = df[["English Language", "Igbo"]].reset_index(drop=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1401 entries, 0 to 1400
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   English Language  1401 non-null   object
 1   Igbo              1401 non-null   object
dtypes: object(2)
memory usage: 22.0+ KB


In [None]:
# Load model
model_name="facebook/nllb-200-distilled-600M"

In [None]:
# Define the source and target language codes (Igbo is 'ibo_Latn')
source_lang = "eng_Latn"
target_lang = "ibo_Latn"


In [None]:
translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang=source_lang,
    tgt_lang=target_lang,
    device=-1,  # CPU only
    torch_dtype=torch.float32,
)

Device set to use cpu


In [None]:
# Translate each English sentence and collect the Igbo output
trans = []
for text in data["English Language"]:
    translation = translator(text, src_lang=source_lang, tgt_lang=target_lang)
    trans.append(translation[0]['translation_text'])
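
The loop above sends one sentence at a time to the translator. A minimal sketch of grouping the sentences into batches is shown here. `translate_batch` is a hypothetical stand-in for any callable that maps a list of English strings to a list of Igbo strings, `fake_translate` exists only so the sketch runs without a model, and the batch size of 16 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],  # hypothetical stand-in
    batch_size: int = 16,  # assumed value, tune for the available memory
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    outputs: List[str] = []
    for batch in batched(texts, batch_size):
        outputs.extend(translate_batch(batch))
    return outputs


def fake_translate(batch: List[str]) -> List[str]:
    """Toy replacement for a real model: upper-cases each string."""
    return [text.upper() for text in batch]


demo = translate_in_batches(["a", "b", "c"], fake_translate, batch_size=2)
print(demo)
```

With the real model, `translate_batch` could plausibly be a thin wrapper such as `lambda batch: [out['translation_text'] for out in translator(batch)]`, since the translation pipeline also accepts a list of strings.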


In [None]:
len(trans)

In [None]:
data["Igbo_nllb_trans"] = pd.Series(trans)
data.head()

| | English Language | Igbo | Igbo_nllb_trans |
|---|---|---|---|
| 0 | This option is perfect for keeping steak moist... | Mghọrọ nke a kacha mma íjí mee ka íbé anụ dee ... | Nke a bụ nhọrọ zuru okè iji mee ka steak dị mm... |
| 1 | This is the best option if youre short on time. | Nke a bụ mghọrọ kachasị mma ma ọbụrụ na ọtụtụ ... | Nke a bụ nhọrọ kasị mma ma ọ bụrụ na i nwere o... |
| 2 | This wipes away bacteria and sugars that can c... | Nke a na-ewepụ nje bakterịa na shuuga ndị na-a... | Nke a na-ehichapụ nje bacteria na shuga ndị pụ... |
| 3 | Brush your teeth twice a day and floss every day. | Saa ézé gị ugboro abụọ kwa Ụbọchị ma jiri erir... | Na-akwọcha ezé gị ugboro abụọ n'ụbọchị, na-eji... |
| 4 | Most of the violence occurred in Khyber Pakhtu... | Ụfọdụ mmesi iké mere na Khyber Pakhtunkhwa na ... | Ihe ka ọtụtụ n'ime ime ihe ike ahụ mere na Khy... |


In [None]:
def calculate_bleu_scores(references, predictions):
    """
    Calculate sentence-level and corpus-level BLEU scores using NLTK

    Args:
        references: List of reference (manual) translations
        predictions: List of predicted (NLLB) translations

    Returns:
        Dictionary with the mean sentence BLEU, the per-sentence scores,
        and the corpus-level BLEU (all on a 0-1 scale)
    """
    smoothing = SmoothingFunction().method1
    nltk_bleu_scores = []
    # Collect corpus-level inputs pair by pair so references and predictions stay aligned
    all_refs, all_preds = [], []

    for ref, pred in zip(references, predictions):
        if ref and pred:  # Skip pairs that contain an empty string
            ref_tokens = ref.split()
            pred_tokens = pred.split()

            # Sentence-level BLEU against a single reference
            bleu = sentence_bleu([ref_tokens], pred_tokens, smoothing_function=smoothing)
            nltk_bleu_scores.append(bleu)
            all_refs.append([ref_tokens])
            all_preds.append(pred_tokens)
        else:
            nltk_bleu_scores.append(0.0)

    # Corpus-level BLEU with NLTK
    if all_refs and all_preds:
        corpus_bleu_nltk = corpus_bleu(all_refs, all_preds, smoothing_function=smoothing)
    else:
        corpus_bleu_nltk = 0.0

    return {
        'sentence_bleu_avg': np.mean(nltk_bleu_scores),
        'sentence_bleu_scores': nltk_bleu_scores,
        'corpus_bleu_nltk': corpus_bleu_nltk
    }
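
What sentence-level BLEU measures can be sketched in plain Python. The version below is an unsmoothed, single-reference simplification: it combines clipped n-gram precisions for n = 1 to 4 with a brevity penalty. The helper names are made up for illustration, and because NLTK is called with smoothing here, this sketch will not reproduce its scores whenever an n-gram order has no match.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(ref_tokens: List[str], hyp_tokens: List[str], n: int) -> float:
    """Clipped n-gram precision: each hypothesis n-gram counts at most
    as often as it appears in the reference."""
    hyp_counts = ngram_counts(hyp_tokens, n)
    ref_counts = ngram_counts(ref_tokens, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def simple_bleu(ref_tokens: List[str], hyp_tokens: List[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU for one reference: geometric mean of the clipped
    precisions multiplied by the brevity penalty."""
    precisions = [modified_precision(ref_tokens, hyp_tokens, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one empty order zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(hyp_tokens) > len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return brevity_penalty * math.exp(log_mean)


reference = "Nke a so n'ihe anyị kwuru banyere WAEC na NECO.".split()
print(simple_bleu(reference, reference))      # identical sentences
print(simple_bleu(reference, reference[:5]))  # truncated hypothesis is penalized
```

An identical prediction scores 1.0, which matches the perfect scores reported for the best translations, while the truncated hypothesis keeps perfect precision but is pulled down by the brevity penalty.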

In [None]:
# Calculate chrF scores (character-level F-score)
def calculate_chrf_scores(references, predictions):

    chrf_scores = []

    for ref, pred in zip(references, predictions):
        if ref and pred:
            try:
                # NLTK returns sentence-level chrF on a 0-1 scale
                score = sentence_chrf(ref, pred)
                chrf_scores.append(score)
            except Exception:
                chrf_scores.append(0.0)
        else:
            chrf_scores.append(0.0)

    # Also calculate corpus-level chrF with SacreBLEU (reported on a 0-100 scale)
    try:
        corpus_chrf = sacrebleu.corpus_chrf(list(predictions), [list(references)])
        corpus_chrf_score = corpus_chrf.score
    except Exception:
        corpus_chrf_score = 0.0

    return {
        'chrf_avg': np.mean(chrf_scores),
        'chrf_scores': chrf_scores,
        'corpus_chrf': corpus_chrf_score
    }
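
The idea behind chrF can be illustrated with a short, dependency-free sketch: compare character n-grams instead of words, then combine precision and recall into an F-score that weights recall more heavily. This is a simplified variant that averages the per-order F-scores, drops spaces, and assumes `max_n=6` and `beta=3.0`; the function names are hypothetical, and the numbers may differ from those of NLTK or SacreBLEU.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(reference: str, hypothesis: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Average F-beta score over character n-gram orders 1..max_n."""
    f_scores = []
    for n in range(1, max_n + 1):
        ref_counts = char_ngrams(reference, n)
        hyp_counts = char_ngrams(hypothesis, n)
        if not ref_counts or not hyp_counts:
            continue  # one of the texts is shorter than n characters
        overlap = sum((ref_counts & hyp_counts).values())
        precision = overlap / sum(hyp_counts.values())
        recall = overlap / sum(ref_counts.values())
        if precision + recall == 0:
            f_scores.append(0.0)
            continue
        f_beta = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
        f_scores.append(f_beta)
    return sum(f_scores) / len(f_scores) if f_scores else 0.0


print(simple_chrf("n'akpa ume", "n'akpa ume"))  # identical strings
print(simple_chrf("n'akpa ume", "n'akpa"))      # partial overlap
```

Because the comparison works on characters, a prediction that differs from the reference only in a suffix or a single diacritic still earns partial credit, which is one reason character-level scores are often reported alongside BLEU for morphologically rich languages.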


In [None]:
# Calculate ROUGE scores (originally for summarization, but useful for translation)
def calculate_rouge_scores(references, predictions):

    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=False)

    rouge1_scores = []
    rouge2_scores = []
    rougeL_scores = []

    for ref, pred in zip(references, predictions):
        if ref and pred:
            scores = scorer.score(ref, pred)
            rouge1_scores.append(scores['rouge1'].fmeasure)
            rouge2_scores.append(scores['rouge2'].fmeasure)
            rougeL_scores.append(scores['rougeL'].fmeasure)
        else:
            rouge1_scores.append(0.0)
            rouge2_scores.append(0.0)
            rougeL_scores.append(0.0)

    return {
        'rouge1_avg': np.mean(rouge1_scores),
        'rouge2_avg': np.mean(rouge2_scores),
        'rougeL_avg': np.mean(rougeL_scores),
        'rouge1_scores': rouge1_scores,
        'rouge2_scores': rouge2_scores,
        'rougeL_scores': rougeL_scores
    }
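
ROUGE-L is based on the longest common subsequence (LCS) of the reference and the prediction. The sketch below computes it on whitespace tokens, so Igbo characters such as ọ and ụ are kept intact. The default `rouge_score` tokenizer appears to keep only lowercase ASCII letters and digits, so its scores on Igbo text may differ from this version. The function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using one rolling row
    of the standard dynamic-programming table."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(reference: str, prediction: str) -> float:
    """ROUGE-L F1 on whitespace tokens."""
    ref_tokens = reference.split()
    pred_tokens = prediction.split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("a b c d", "a c d"))  # LCS is "a c d": precision 1.0, recall 0.75
```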

In [None]:
# Calculate exact match percentage
def calculate_exact_match(references, predictions):

    exact_matches = [1 if ref == pred else 0 for ref, pred in zip(references, predictions)]

    return {
        'exact_match_percentage': np.mean(exact_matches) * 100,
        'exact_matches': exact_matches
    }
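
A strict `==` comparison treats two strings as different when the same Igbo letter is encoded in different ways, for example ọ as one code point versus o followed by a combining dot below. A minimal sketch of a more forgiving variant is given here, assuming that Unicode NFC normalization and whitespace collapsing are acceptable before comparing. `normalize_text` and `normalized_exact_match` are hypothetical helpers, not part of the pipeline above.

```python
import unicodedata
from typing import Dict, Iterable


def normalize_text(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    composed = unicodedata.normalize("NFC", text)
    return " ".join(composed.split())


def normalized_exact_match(references: Iterable[str], predictions: Iterable[str]) -> Dict[str, object]:
    """Exact match computed on normalized strings."""
    matches = [
        int(normalize_text(ref) == normalize_text(pred))
        for ref, pred in zip(references, predictions)
    ]
    percentage = 100.0 * sum(matches) / len(matches) if matches else 0.0
    return {"exact_match_percentage": percentage, "exact_matches": matches}


precomposed = "n'\u1ecdbara"   # ọ stored as a single code point
decomposed = "n'o\u0323bara"   # o followed by a combining dot below

print(precomposed == decomposed)  # raw comparison of the two encodings
print(normalized_exact_match([precomposed], [decomposed + " "]))
```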


In [None]:
def comprehensive_evaluation(english_texts, manual_igbo_translations, nllb_translations):
    """
    Comprehensive evaluation of NLLB translations against manual translations

    Returns a dictionary holding every metric plus the three input text series
    """

    print("Starting comprehensive translation evaluation...")

    print(f"Evaluating {len(english_texts)} translations...")

    # Calculate all metrics
    evaluation_results = {}

    # BLEU scores
    print("Calculating BLEU scores...")
    bleu_results = calculate_bleu_scores(manual_igbo_translations, nllb_translations)
    evaluation_results.update(bleu_results)


    # chrF scores
    print("Calculating chrF scores...")
    chrf_results = calculate_chrf_scores(manual_igbo_translations, nllb_translations)
    evaluation_results.update(chrf_results)

    # ROUGE scores
    print("Calculating ROUGE scores...")
    rouge_results = calculate_rouge_scores(manual_igbo_translations, nllb_translations)
    evaluation_results.update(rouge_results)

    # Exact match
    print("Calculating exact match...")
    exact_match_results = calculate_exact_match(manual_igbo_translations, nllb_translations)
    evaluation_results.update(exact_match_results)

    # Store translations for analysis
    evaluation_results['english_texts'] = english_texts
    evaluation_results['manual_translations'] = manual_igbo_translations
    evaluation_results['nllb_translations'] = nllb_translations

    return evaluation_results

In [None]:
# Print a comprehensive evaluation summary
def print_evaluation_summary(results):

    print("\n" + "="*60)
    print("TRANSLATION EVALUATION SUMMARY")
    print("="*60)

    print(f"\n📊 MAIN METRICS:")
    print(f"   BLEU Score (Sentence Avg):  {results['sentence_bleu_avg']:.3f}")
    print(f"   chrF Score:                 {results['chrf_avg']:.3f}")
    print(f"   ROUGE-L:                    {results['rougeL_avg']:.3f}")

    print(f"\n📈 INTERPRETATION:")
    if results['sentence_bleu_avg'] > 0.6:
        print("   Translation quality: GOOD (BLEU > 0.6)")
    else:
        print("   Translation quality: VERY POOR (BLEU <= 0.6)")

    print("="*60)

# Analyze the worst-performing translations
def analyze_worst_translations(results, n=10):

    # Get BLEU scores for each sentence
    bleu_scores = results['sentence_bleu_scores']

    # Find worst performing translations
    worst_indices = np.argsort(bleu_scores)[:n]

    print(f"\n🔍 WORST {n} TRANSLATIONS (by BLEU score):")
    print("="*80)

    for i, idx in enumerate(worst_indices, 1):
        print(f"\n{i}. BLEU Score: {bleu_scores[idx]:.3f}")
        print(f"   English:  {results['english_texts'].iloc[idx]}")
        print(f"   Manual:   {results['manual_translations'].iloc[idx]}")
        print(f"   NLLB:     {results['nllb_translations'].iloc[idx]}")
        print("-" * 80)

# Analyze the best-performing translations
def analyze_best_translations(results, n=5):

    bleu_scores = results['sentence_bleu_scores']
    best_indices = np.argsort(bleu_scores)[-n:][::-1]

    print(f"\n✅ BEST {n} TRANSLATIONS (by BLEU score):")
    print("="*80)

    for i, idx in enumerate(best_indices, 1):
        print(f"\n{i}. BLEU Score: {bleu_scores[idx]:.3f}")
        print(f"   English:  {results['english_texts'].iloc[idx]}")
        print(f"   Manual:   {results['manual_translations'].iloc[idx]}")
        print(f"   NLLB:     {results['nllb_translations'].iloc[idx]}")
        print("-" * 80)

In [None]:
# evaluate
def run_complete_evaluation(english_texts, manual_igbo_translations, nllb_translations):
    """
    Run complete evaluation pipeline
    """
    # Comprehensive evaluation
    results = comprehensive_evaluation(english_texts, manual_igbo_translations, nllb_translations)

    # Print summary
    print_evaluation_summary(results)

    # Analyze best and worst
    analyze_best_translations(results, n=3)
    analyze_worst_translations(results, n=5)
    return results

In [None]:
# Evaluate only the rows that have an NLLB translation
n = len(trans)
results = run_complete_evaluation(
    data["English Language"].head(n),
    data["Igbo"].head(n),
    data["Igbo_nllb_trans"].head(n),
)

Starting comprehensive translation evaluation...
Evaluating 938 translations...
Calculating BLEU scores...
Calculating chrF scores...
Calculating ROUGE scores...
Calculating exact match...

TRANSLATION EVALUATION SUMMARY

📊 MAIN METRICS:
   BLEU Score (Sentence Avg):  0.620
   chrF Score:                 0.779
   ROUGE-L:                    0.805

📈 INTERPRETATION:
   Translation quality: GOOD (BLEU > 0.6)

✅ BEST 3 TRANSLATIONS (by BLEU score):

1. BLEU Score: 1.000
   English:  Carbon dioxide in the blood is expelled through the lungs.
   Manual:   A na-achụpụ carbon dioxide dị n'ọbara site n'akpa ume.
   NLLB:     A na-achụpụ carbon dioxide dị n'ọbara site n'akpa ume.
--------------------------------------------------------------------------------

2. BLEU Score: 1.000
   English:  This was part of our reference to the WAEC and NECO.
   Manual:   Nke a so n'ihe anyị kwuru banyere WAEC na NECO.
   NLLB:     Nke a so n'ihe anyị kwuru banyere WAEC na NECO.
-----------------------------

In [None]:
# save results of evaluation
def save_detailed_results(results, filename="igbo_nllb_evaluation.csv"):
    """
    Save detailed results to CSV for further analysis
    """
    df = pd.DataFrame({
        'english': results['english_texts'],
        'manual_igbo': results['manual_translations'],
        'nllb_igbo': results['nllb_translations'],
        'bleu_score': results['sentence_bleu_scores'],
        'chrf_score': results['chrf_scores'],
        'rouge1_score': results['rouge1_scores'],
        'rouge2_score': results['rouge2_scores'],
        'rougeL_score': results['rougeL_scores'],
        'exact_match': results['exact_matches']
    })

    df.to_csv(filename, index=False)
    print(f"\n💾 Detailed results saved to: {filename}")



In [None]:
save_detailed_results(results)


💾 Detailed results saved to: igbo_nllb_evaluation.csv
