# Machine Translation Techniques Comparison
 This notebook compares different machine translation approaches:
 1. Statistical Machine Translation (SMT)
 2. Sequence-to-Sequence (Seq2Seq)
 3. Transformer models (T5, GPT-2, M2M-100)

STEP 1: Import libraries


In [4]:
import pandas as pd
import numpy as np
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu, SmoothingFunction
import warnings
warnings.filterwarnings('ignore')

STEP 2: Install required packages

In [5]:
!pip install -q transformers torch sentencepiece


STEP 3: Prepare Dataset

In [6]:
print("### 1. Loading and Preparing Dataset ###")

# Small parallel English-French dataset
data = {
    'english': [
        'Hello, how are you?',
        'What is your name?',
        'The weather is nice today',
        'I love machine learning',
        'This is a simple test',
        'Where is the nearest hotel?',
        'How much does this cost?',
        'Can you help me please?'
    ],
    'french': [
        'Bonjour, comment allez-vous?',
        'Comment vous appelez-vous?',
        "Le temps est beau aujourd'hui",
        "J'aime l'apprentissage automatique",
        'Ceci est un test simple',
        'Où se trouve l\'hôtel le plus proche?',
        'Combien cela coûte-t-il?',
        'Pouvez-vous m\'aider s\'il vous plaît?'
    ]
}

dataset = pd.DataFrame(data)
print("\nSample Dataset:")
display(dataset.head())

### 1. Loading and Preparing Dataset ###

Sample Dataset:


|   | english | french |
|---|---|---|
| 0 | Hello, how are you? | Bonjour, comment allez-vous? |
| 1 | What is your name? | Comment vous appelez-vous? |
| 2 | The weather is nice today | Le temps est beau aujourd'hui |
| 3 | I love machine learning | J'aime l'apprentissage automatique |
| 4 | This is a simple test | Ceci est un test simple |


STEP 4: Set up evaluation metrics to assess how well each translation model performs, using:

BLEU score: a widely used automatic metric that scores the n-gram overlap between a translation and its reference.

Readable spot checks: a simple rule-based assessment printed for the first few examples. It is a rough heuristic, not a true human evaluation.

In [18]:
## 2. Evaluation Setup
print("\n Setting Up Evaluation Metrics ")

# Smoothing function for BLEU score
smoother = SmoothingFunction().method1


 Setting Up Evaluation Metrics 


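BLEU can be harsh on short sentences: if a hypothesis has no matching n-gram of some order (for example, no 4-gram match), the unsmoothed score collapses to zero.

SmoothingFunction adjusts the n-gram precisions so that a missing match does not force a zero score.

method1 is one of the smoothing strategies from NLTK; it substitutes a small epsilon for zero match counts.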
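
To make the effect of smoothing concrete, here is a minimal pure-Python sketch of sentence-level BLEU with a single reference. It is a simplified illustration, not NLTK's implementation: the names `ngram_counts` and `simple_bleu` and the `epsilon` parameter are assumptions made for this example, and the epsilon substitution only mirrors the idea behind method1.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(reference, hypothesis, max_n=4, epsilon=None):
    """Simplified single-reference BLEU; epsilon=None means no smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens
        # Clipped matches: each n-gram counts at most as often as in the reference
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if matches == 0:
            if epsilon is None:
                return 0.0  # one empty n-gram order zeroes the whole score
            matches = epsilon  # smoothing: replace the zero count
        log_sum += math.log(matches / total)
    # Brevity penalty for hypotheses that are not longer than the reference
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_sum / max_n)

ref = "Le temps est beau aujourd'hui"
hyp = "Le temps est très beau"
print("unsmoothed:", simple_bleu(ref, hyp))
print("smoothed:  ", round(simple_bleu(ref, hyp, epsilon=0.1), 4))
```

The hypothesis shares unigrams, bigrams and one trigram with the reference but no 4-gram, so the unsmoothed score is zero while the smoothed score stays positive.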

Overview of the function below

Purpose: compute a single corpus-level BLEU score over a list of sentence pairs.

Inputs:

1) references: the correct translations (ground truth).

2) hypotheses: the model's predicted translations.

How it works:

Each reference and hypothesis sentence is split on whitespace into tokens, so punctuation stays attached to the neighbouring word.

corpus_bleu(...) then computes one BLEU score over all sentence pairs, using the smoother defined above.

In [10]:
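def calculate_bleu(references, hypotheses):
    """Calculate the corpus-level BLEU score for a list of translations"""
    # corpus_bleu expects a list of reference token lists per hypothesis,
    # so each single reference is wrapped in its own list
    refs = [[ref.split()] for ref in references]
    hyps = [hyp.split() for hyp in hypotheses]
    return corpus_bleu(refs, hyps, smoothing_function=smoother)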
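
The nesting of the two inputs is easy to get wrong, so the short sketch below shows the shapes produced by the two list comprehensions. The sample sentences are taken from the dataset and outputs in this notebook; the variable names are only illustrative.

```python
references = ["Ceci est un test simple", "Comment vous appelez-vous?"]
hypotheses = ["Ceci est un test simple", "Quel est votre nom ?"]

# One list of reference token lists per sentence (here: one reference each)
refs = [[ref.split()] for ref in references]
# One token list per hypothesis
hyps = [hyp.split() for hyp in hypotheses]

print(refs[0])  # a list containing a single token list
print(hyps[1])  # a flat token list
```

Because the split is on whitespace only, "appelez-vous?" stays one token while the hypothesis "nom ?" yields a separate "?" token, which affects n-gram matching.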

Overview of the function below

Purpose: print basic, readable feedback on the first few translations (useful for reports and demos).

Inputs:

1) model_name: the name of the model (for display).

2) examples: a list of tuples of the form (source_sentence, reference_translation, hypothesis_translation).

In [9]:

def human_evaluation_insights(model_name, examples):
    """Provide human evaluation insights"""
    print(f"\nHuman Evaluation for {model_name}:")
    for i, (src, ref, hyp) in enumerate(examples[:3], 1):  # Only show first 3 examples
        print(f"\nExample {i}:")
        print(f"Source: {src}")
        print(f"Reference: {ref}")
        print(f"Translation: {hyp}")
        # Simple quality assessment
        if hyp.lower() == ref.lower():
            assessment = "Perfect translation"
        elif hyp.split()[0].lower() == ref.split()[0].lower():
            assessment = "Good start but some errors"
        else:
            assessment = "Needs improvement"
        print(f"Assessment: {assessment}")
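
The assessment above only checks for an exact match or a matching first word. As a possible alternative (not part of the original notebook), a token-overlap F1 gives a graded score instead of three fixed labels. The function name `token_f1` is an assumption made for this sketch.

```python
from collections import Counter

def token_f1(reference, hypothesis):
    """F1 over lowercased whitespace tokens shared by reference and hypothesis."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    overlap = sum((ref & hyp).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(token_f1("Ceci est un test simple", "Ceci est un essai simple"))
print(token_f1("Comment vous appelez-vous?", "Quel est votre nom ?"))
```

A score near 1 indicates that most words agree, while 0 means no shared tokens; like BLEU, it still depends on how punctuation is tokenized.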

STEP 5: Statistical Machine Translation (SMT)

In [13]:
print("\n  Statistical Machine Translation (SMT) ")

# Simple word-to-word translation dictionary
translation_dict = {
    'Hello': 'Bonjour', 'how': 'comment', 'are': 'allez', 'you': 'vous', '?': '?',
    'What': 'Que', 'is': 'est', 'your': 'votre', 'name': 'nom',
    'the': 'le', 'weather': 'temps', 'nice': 'beau', 'today': "aujourd'hui",
    'I': 'Je', 'love': 'aime', 'machine': 'machine', 'learning': 'apprentissage',
    'this': 'ceci', 'a': 'un', 'simple': 'simple', 'test': 'test',
    'where': 'où', 'nearest': 'plus proche', 'hotel': 'hôtel',
    'how much': 'combien', 'does': 'cela', 'cost': 'coûte',
    'can': 'pouvez', 'help': 'aider', 'me': 'me', 'please': 's\'il vous plaît'
}

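def smt_translate(sentence):
    """Simple word-by-word dictionary lookup (no statistical model is trained)"""
    words = sentence.split()
    # Each word is stripped of trailing punctuation and capitalized before the
    # lookup, so only capitalized keys ('Hello', 'What', 'I') can match, and the
    # two-word key 'how much' never matches. Words that are not found are kept
    # unchanged, punctuation included.
    translated = [translation_dict.get(word.rstrip('?,!.').lower().capitalize(), word)
                 for word in words]
    return ' '.join(translated)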

# Apply SMT translation
dataset['smt_translation'] = dataset['english'].apply(smt_translate)

# Calculate BLEU score
smt_bleu = calculate_bleu(dataset['french'], dataset['smt_translation'])
print(f"\nSMT BLEU Score: {smt_bleu:.4f}")

# Human evaluation examples
human_evaluation_insights("SMT",
    list(zip(dataset['english'], dataset['french'], dataset['smt_translation'])))



  Statistical Machine Translation (SMT) 

SMT BLEU Score: 0.0091

Human Evaluation for SMT:

Example 1:
Source: Hello, how are you?
Reference: Bonjour, comment allez-vous?
Translation: Bonjour how are you?
Assessment: Needs improvement

Example 2:
Source: What is your name?
Reference: Comment vous appelez-vous?
Translation: Que is your name?
Assessment: Needs improvement

Example 3:
Source: The weather is nice today
Reference: Le temps est beau aujourd'hui
Translation: The weather is nice today
Assessment: Needs improvement
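
Most words in the outputs above are left in English because the lookup capitalizes every word while most dictionary keys are lowercase. The sketch below shows one way a case-insensitive lookup could work. It uses a small lowercase-keyed table, `lower_dict`, and a helper, `lookup_translate`, both of which are hypothetical names introduced for illustration and not part of the original notebook.

```python
# Hypothetical lowercase-keyed subset of the dictionary, for illustration only
lower_dict = {
    'hello': 'Bonjour', 'how': 'comment', 'are': 'allez', 'you': 'vous',
    'the': 'le', 'weather': 'temps', 'is': 'est', 'nice': 'beau',
    'today': "aujourd'hui",
}

def lookup_translate(sentence, table=lower_dict):
    """Word-by-word lookup that ignores case and keeps trailing punctuation."""
    out = []
    for word in sentence.split():
        core = word.rstrip('?,!.')   # the word without trailing punctuation
        punct = word[len(core):]     # the punctuation that was stripped
        out.append(table.get(core.lower(), core) + punct)
    return ' '.join(out)

print(lookup_translate('The weather is nice today'))
print(lookup_translate('Hello, how are you?'))
```

Even with the lookup fixed, a word-by-word approach cannot reorder words or produce forms such as "allez-vous", which is the main limitation of this baseline.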


STEP 6: Sequence-to-Sequence (Seq2Seq) Model

In [14]:
print("\n Sequence-to-Sequence Model ")

from transformers import MarianMTModel, MarianTokenizer

# Load pre-trained English-French Seq2Seq model
model_name = 'Helsinki-NLP/opus-mt-en-fr'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def seq2seq_translate(text):
    """Translate using Seq2Seq model"""
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Apply Seq2Seq translation (on first 5 examples for speed)
dataset['seq2seq_translation'] = dataset['english'].head(5).apply(seq2seq_translate)

# Calculate BLEU score
seq2seq_bleu = calculate_bleu(dataset['french'].head(5), dataset['seq2seq_translation'].head(5))
print(f"\nSeq2Seq BLEU Score: {seq2seq_bleu:.4f}")

human_evaluation_insights("Seq2Seq",
    list(zip(dataset['english'].head(5), dataset['french'].head(5), dataset['seq2seq_translation'].head(5))))



 Sequence-to-Sequence Model 



Seq2Seq BLEU Score: 0.2610

Human Evaluation for Seq2Seq:

Example 1:
Source: Hello, how are you?
Reference: Bonjour, comment allez-vous?
Translation: Bonjour, comment allez-vous ?
Assessment: Good start but some errors

Example 2:
Source: What is your name?
Reference: Comment vous appelez-vous?
Translation: Quel est votre nom ?
Assessment: Needs improvement

Example 3:
Source: The weather is nice today
Reference: Le temps est beau aujourd'hui
Translation: Le temps est beau aujourd'hui.
Assessment: Good start but some errors
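
Examples 1 and 3 differ from their references only by a space before the question mark and a trailing period, yet they are not counted as exact matches. A minimal sketch of normalizing both strings before comparison is shown below; the `normalize` helper and its rules are assumptions for illustration, not something the original notebook applies.

```python
import re

def normalize(text):
    """Lowercase, remove spaces before punctuation, and drop a trailing period."""
    text = text.lower().strip()
    text = re.sub(r"\s+([?!.,;:])", r"\1", text)
    return text.rstrip(".")

# (reference, translation) pairs taken from the outputs above
pairs = [
    ("Bonjour, comment allez-vous?", "Bonjour, comment allez-vous ?"),
    ("Le temps est beau aujourd'hui", "Le temps est beau aujourd'hui."),
    ("Comment vous appelez-vous?", "Quel est votre nom ?"),
]
for ref, hyp in pairs:
    print(normalize(ref) == normalize(hyp), "|", hyp)
```

With this normalization the first and third examples compare equal to their references, while "Quel est votre nom ?" remains a different (though valid) translation. The same whitespace-tokenization issue also lowers the BLEU score.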


STEP 7: Transformer Models

1. Google's T5

In [15]:
print("\n Google's T5 ")
from transformers import T5ForConditionalGeneration, T5Tokenizer

t5_tokenizer = T5Tokenizer.from_pretrained("t5-small")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-small")

def t5_translate(text):
    inputs = t5_tokenizer.encode("translate English to French: " + text, return_tensors="pt")
    outputs = t5_model.generate(inputs)
    return t5_tokenizer.decode(outputs[0], skip_special_tokens=True)

dataset['t5_translation'] = dataset['english'].head(5).apply(t5_translate)
t5_bleu = calculate_bleu(dataset['french'].head(5), dataset['t5_translation'].head(5))
print(f"T5 BLEU Score: {t5_bleu:.4f}")


 Google's T5 


T5 BLEU Score: 0.0692


2. OpenAI's GPT (using GPT-2 as example)

In [16]:
print("\n GPT Model ")
from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt_model = GPT2LMHeadModel.from_pretrained("gpt2")

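def gpt_translate(text):
    prompt = f"Translate English to French: {text} ->"
    # Passing the tokenizer's attention mask and an explicit pad token avoids
    # the generation warnings; GPT-2 has no pad token, so EOS is reused.
    inputs = gpt_tokenizer(prompt, return_tensors="pt")
    outputs = gpt_model.generate(**inputs, max_length=50,
                                 pad_token_id=gpt_tokenizer.eos_token_id)
    # Keep only the text after the last "->" in the decoded output
    return gpt_tokenizer.decode(outputs[0], skip_special_tokens=True).split("->")[-1].strip()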

dataset['gpt_translation'] = dataset['english'].head(5).apply(gpt_translate)
gpt_bleu = calculate_bleu(dataset['french'].head(5), dataset['gpt_translation'].head(5))
print(f"GPT BLEU Score: {gpt_bleu:.4f}")


 GPT Model 



GPT BLEU Score: 0.0055


3. Meta's M2M-100

In [17]:
print("\n 5.3 M2M-100 Model ")
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

m2m_tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
m2m_model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

def m2m_translate(text):
    m2m_tokenizer.src_lang = "en"
    inputs = m2m_tokenizer(text, return_tensors="pt")
    generated_tokens = m2m_model.generate(**inputs, forced_bos_token_id=m2m_tokenizer.get_lang_id("fr"))
    return m2m_tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

dataset['m2m_translation'] = dataset['english'].head(5).apply(m2m_translate)
m2m_bleu = calculate_bleu(dataset['french'].head(5), dataset['m2m_translation'].head(5))
print(f"M2M-100 BLEU Score: {m2m_bleu:.4f}")



 5.3 M2M-100 Model 



M2M-100 BLEU Score: 0.3096


## 6. Results Comparison

Performance comparison of the translation models. BLEU scores are taken from the runs above (SMT was scored on all 8 sentences, the other models on the first 5); the speed labels are qualitative and were not measured in this notebook.

**SMT (Statistical Machine Translation)**  
- BLEU Score: 0.0091  
- Speed: Fastest  
- Training Data: Small hand-written dictionary  
- Best For: Simple phrases

**Seq2Seq (Sequence-to-Sequence)**  
- BLEU Score: 0.2610  
- Speed: Fast  
- Training Data: Large parallel corpus  
- Best For: General purpose translations

**T5 (Text-To-Text Transfer Transformer)**  
- BLEU Score: 0.0692  
- Speed: Medium  
- Training Data: C4 web text plus supervised tasks, including English-French translation  
- Best For: Multitask learning and translation

**GPT (Generative Pretrained Transformer)**  
- BLEU Score: 0.0055  
- Speed: Slow  
- Training Data: General language model corpus  
- Best For: Creative or free-form translations

**M2M-100 (Multilingual-to-Multilingual)**  
- BLEU Score: 0.3096  
- Speed: Medium  
- Training Data: Massive multilingual data  
- Best For: Direct translation between 100+ languages
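
The speed entries above are qualitative. One way to back them with numbers is to time each translate function over the same sentences. The sketch below assumes a hypothetical helper `mean_latency` and uses a stand-in `dummy_translate` so it runs without downloading a model; in the notebook one would pass `smt_translate`, `seq2seq_translate`, and so on instead.

```python
import time

def mean_latency(translate_fn, sentences, repeats=3):
    """Return the mean wall-clock seconds per sentence over `repeats` passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sentence in sentences:
            translate_fn(sentence)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(sentences))

# Stand-in translator (an assumption for this sketch, not a real model)
def dummy_translate(text):
    return text.upper()

sample = ['Hello, how are you?', 'What is your name?']
latency = mean_latency(dummy_translate, sample)
print(f"{latency:.6f} s per sentence")
```

For neural models the first call often includes one-off setup cost, so discarding a warm-up pass before timing usually gives a fairer comparison.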

---

## 7. Key Advantages and Limitations

**SMT**  
Advantages: Fast, explainable  
Limitations: Performs poorly with long or complex sentences  

**Seq2Seq**  
Advantages: Better at handling context  
Limitations: Requires large amounts of parallel data for training  

**T5**  
Advantages: Can perform multiple NLP tasks, strong translation capabilities  
Limitations: Computationally expensive to train and run  

**GPT**  
Advantages: Generates fluent, free-form text  
Limitations: Not specifically built for translation tasks; GPT-2 had the lowest BLEU score in this test (0.0055)  

**M2M-100**  
Advantages: Supports direct translation between 100+ languages without relying on English  
Limitations: Very large model size, needs more resources to deploy

---

## 8. Conclusion

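In this small test, the models trained specifically for translation scored best: M2M-100 achieved the highest BLEU score (0.3096), followed by the MarianMT-based Seq2Seq model (0.2610), which is itself a transformer encoder-decoder. T5-small (0.0692) and GPT-2 (0.0055) scored far lower, as did the dictionary-based SMT baseline (0.0091). With only five to eight short sentences and a single reference each, these scores are indicative rather than conclusive.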

However, the best model depends on the use case:

- For quick and simple translations: **SMT**  
- For general-purpose translations: **Seq2Seq**  
- For multilingual translation tasks: **T5** or **M2M-100**  
- For creative and flexible translations: **GPT**

**Recommendation for beginners:**

1. Start with SMT to understand how rule-based and statistical translations work  
2. Move to Seq2Seq to explore how neural networks improve translation  
3. Eventually explore transformer-based models like T5, GPT, or M2M-100 for state-of-the-art results
