# Assignment 6: Machine Translation System
## English ↔ Hindi Translation for Public Information Content

This notebook implements a Machine Translation system to translate public information content between English and Hindi (an Indian language).

### Topics Covered:
1. Introduction to Neural Machine Translation
2. Using Pre-trained Translation Models (Hugging Face Transformers)
3. English to Hindi Translation
4. Hindi to English Translation
5. Translating Public Information Content
6. Evaluation using BLEU Score

## 1. Install Required Libraries

In [None]:
# Install required packages
!pip install transformers torch sentencepiece sacremoses sacrebleu

## 2. Import Libraries

In [None]:
import warnings
warnings.filterwarnings('ignore')

from transformers import MarianMTModel, MarianTokenizer
import torch
import sacrebleu

print("Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 3. Load Translation Models

We use Helsinki-NLP's MarianMT models from Hugging Face:
- **English → Hindi**: `Helsinki-NLP/opus-mt-en-hi`
- **Hindi → English**: `Helsinki-NLP/opus-mt-hi-en`
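
Both checkpoint names above share the pattern `Helsinki-NLP/opus-mt-{src}-{tgt}`. A minimal sketch of building the name from a language pair follows. `opus_mt_name` and `SUPPORTED_PAIRS` are hypothetical names introduced only for illustration, and the sketch accepts just the two pairs used in this notebook, since other pairs may or may not be published under the same pattern.

```python
# Hypothetical helper: build the checkpoint name for a language pair.
# Only the two pairs loaded in this notebook are accepted.
SUPPORTED_PAIRS = {("en", "hi"), ("hi", "en")}


def opus_mt_name(src, tgt):
    """Return the Helsinki-NLP checkpoint name for src -> tgt."""
    if (src, tgt) not in SUPPORTED_PAIRS:
        raise ValueError(f"Unsupported language pair: {src}-{tgt}")
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


print(opus_mt_name("en", "hi"))
print(opus_mt_name("hi", "en"))
```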

In [None]:
# English to Hindi Model
print("Loading English to Hindi translation model...")
en_hi_model_name = "Helsinki-NLP/opus-mt-en-hi"
en_hi_tokenizer = MarianTokenizer.from_pretrained(en_hi_model_name)
en_hi_model = MarianMTModel.from_pretrained(en_hi_model_name)
print("English to Hindi model loaded successfully!")

In [None]:
# Hindi to English Model
print("Loading Hindi to English translation model...")
hi_en_model_name = "Helsinki-NLP/opus-mt-hi-en"
hi_en_tokenizer = MarianTokenizer.from_pretrained(hi_en_model_name)
hi_en_model = MarianMTModel.from_pretrained(hi_en_model_name)
print("Hindi to English model loaded successfully!")

## 4. Define Translation Functions

In [None]:
def translate_en_to_hi(text):
    """
    Translate English text to Hindi
    
    Args:
        text (str): English text to translate
    
    Returns:
        str: Translated Hindi text
    """
    # Tokenize the input text
    inputs = en_hi_tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    
    # Generate translation
    with torch.no_grad():
        translated = en_hi_model.generate(**inputs)
    
    # Decode the translated text
    translated_text = en_hi_tokenizer.decode(translated[0], skip_special_tokens=True)
    
    return translated_text


def translate_hi_to_en(text):
    """
    Translate Hindi text to English
    
    Args:
        text (str): Hindi text to translate
    
    Returns:
        str: Translated English text
    """
    # Tokenize the input text
    inputs = hi_en_tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    
    # Generate translation
    with torch.no_grad():
        translated = hi_en_model.generate(**inputs)
    
    # Decode the translated text
    translated_text = hi_en_tokenizer.decode(translated[0], skip_special_tokens=True)
    
    return translated_text


print("Translation functions defined successfully!")
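
Both functions tokenize with `truncation=True, max_length=512`, so any input longer than 512 tokens is cut off rather than fully translated. One way to work around this is to split long text into sentences and translate each one separately. The sketch below assumes sentences end in `.`, `?`, `!`, or the Hindi danda `।`. `split_sentences` is a hypothetical helper, not part of this notebook or of any library, and it will split incorrectly on abbreviations such as "a.m. to".

```python
import re

# Sentence-ending punctuation: ., ?, ! and the Devanagari danda (U+0964).
_SENTENCE_END = re.compile(r"(?<=[.?!\u0964])\s+")


def split_sentences(text):
    """Split text into sentences at whitespace that follows end punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    return [p for p in parts if p]


print(split_sentences("Do not drink and drive. Stay safe on roads."))
```

Each returned sentence could then be passed to `translate_en_to_hi` or `translate_hi_to_en` one at a time.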

## 5. Test Basic Translation

In [None]:
# Test English to Hindi Translation
test_sentences_en = [
    "Hello, how are you?",
    "Welcome to India.",
    "Education is the key to success.",
    "Please wash your hands regularly.",
    "The weather is very pleasant today."
]

print("=" * 70)
print("ENGLISH TO HINDI TRANSLATION")
print("=" * 70)

for sentence in test_sentences_en:
    hindi_translation = translate_en_to_hi(sentence)
    print(f"\nEnglish: {sentence}")
    print(f"Hindi:   {hindi_translation}")
    print("-" * 50)

In [None]:
# Test Hindi to English Translation
test_sentences_hi = [
    "नमस्ते, आप कैसे हैं?",
    "भारत में आपका स्वागत है।",
    "शिक्षा सफलता की कुंजी है।",
    "कृपया अपने हाथ नियमित रूप से धोएं।",
    "आज मौसम बहुत सुहावना है।"
]

print("=" * 70)
print("HINDI TO ENGLISH TRANSLATION")
print("=" * 70)

for sentence in test_sentences_hi:
    english_translation = translate_hi_to_en(sentence)
    print(f"\nHindi:   {sentence}")
    print(f"English: {english_translation}")
    print("-" * 50)

## 6. Translating Public Information Content

Now let's translate various types of public information content including:
- Government announcements
- Health advisories
- Educational information
- Public safety messages
- Transportation information

In [None]:
# Public Information Content - English
public_info_english = {
    "Government Notice": [
        "All citizens must carry valid identification documents.",
        "The new tax policy will be effective from next month.",
        "Voter registration deadline is approaching. Register now."
    ],
    "Health Advisory": [
        "Drink plenty of water and stay hydrated during summer.",
        "Vaccination is important for preventing diseases.",
        "Consult a doctor if you experience any symptoms."
    ],
    "Public Safety": [
        "Always wear a helmet while riding a motorcycle.",
        "Do not drink and drive. Stay safe on roads.",
        "In case of emergency, call the helpline number."
    ],
    "Education": [
        "School admissions are now open for the academic year.",
        "Free textbooks will be distributed to all students.",
        "Online classes are available for remote learners."
    ],
    "Transportation": [
        "The metro service will be available from 6 AM to 11 PM.",
        "Bus fares have been revised. Please check the new rates.",
        "Traffic diversions are in place due to road construction."
    ]
}

print("=" * 80)
print("PUBLIC INFORMATION CONTENT: ENGLISH TO HINDI TRANSLATION")
print("=" * 80)

for category, messages in public_info_english.items():
    print(f"\n{'='*40}")
    print(f"Category: {category}")
    print(f"{'='*40}")
    
    for msg in messages:
        hindi_translation = translate_en_to_hi(msg)
        print(f"\n📌 English: {msg}")
        print(f"📌 Hindi:   {hindi_translation}")

In [None]:
# Public Information Content - Hindi
public_info_hindi = {
    "सरकारी सूचना": [
        "सभी नागरिकों को वैध पहचान पत्र रखना अनिवार्य है।",
        "नई कर नीति अगले महीने से लागू होगी।",
        "मतदाता पंजीकरण की अंतिम तिथि निकट है। अभी पंजीकरण करें।"
    ],
    "स्वास्थ्य सलाह": [
        "गर्मियों में खूब पानी पिएं और हाइड्रेटेड रहें।",
        "बीमारियों से बचाव के लिए टीकाकरण महत्वपूर्ण है।",
        "यदि आपको कोई लक्षण दिखे तो डॉक्टर से परामर्श करें।"
    ],
    "सार्वजनिक सुरक्षा": [
        "मोटरसाइकिल चलाते समय हमेशा हेलमेट पहनें।",
        "शराब पीकर गाड़ी न चलाएं। सड़कों पर सुरक्षित रहें।",
        "आपातकाल में हेल्पलाइन नंबर पर कॉल करें।"
    ]
}

print("=" * 80)
print("PUBLIC INFORMATION CONTENT: HINDI TO ENGLISH TRANSLATION")
print("=" * 80)

for category, messages in public_info_hindi.items():
    print(f"\n{'='*40}")
    print(f"Category: {category}")
    print(f"{'='*40}")
    
    for msg in messages:
        english_translation = translate_hi_to_en(msg)
        print(f"\n📌 Hindi:   {msg}")
        print(f"📌 English: {english_translation}")

## 7. Batch Translation Function

In [None]:
def batch_translate(texts, direction="en_to_hi"):
    """
    Translate a batch of texts
    
    Args:
        texts (list): List of texts to translate
        direction (str): Translation direction - 'en_to_hi' or 'hi_to_en'
    
    Returns:
        list: List of translated texts
    """
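    if direction == "en_to_hi":
        tokenizer = en_hi_tokenizer
        model = en_hi_model
    elif direction == "hi_to_en":
        tokenizer = hi_en_tokenizer
        model = hi_en_model
    else:
        # Reject unknown directions instead of silently falling back to hi_to_en
        raise ValueError(f"Unsupported direction: {direction!r}")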
    
    # Tokenize all texts
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    
    # Generate translations
    with torch.no_grad():
        translated = model.generate(**inputs)
    
    # Decode all translations
    translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
    
    return translated_texts


# Test batch translation
batch_texts = [
    "Government offices will remain closed on public holidays.",
    "Please stand in queue and maintain social distancing.",
    "Free medical camps will be organized next week."
]

print("Batch Translation Test:")
print("=" * 60)

hindi_translations = batch_translate(batch_texts, "en_to_hi")

for eng, hin in zip(batch_texts, hindi_translations):
    print(f"\nEnglish: {eng}")
    print(f"Hindi:   {hin}")
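
`batch_translate` pads every text to the length of the longest one and sends the whole list through the model in a single call, so memory use grows with the size of the list. A minimal sketch of splitting a long list into fixed-size chunks first is shown below. `chunked`, the placeholder messages, and the chunk size of 2 are illustrative assumptions.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder messages standing in for a long list of texts.
messages = ["msg 1", "msg 2", "msg 3", "msg 4", "msg 5"]
for chunk in chunked(messages, 2):
    print(chunk)
```

Each chunk could be passed to `batch_translate` in turn and the returned lists concatenated.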

## 8. Evaluation Using BLEU Score

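BLEU (Bilingual Evaluation Understudy) measures translation quality by counting how many n-grams in the machine output also appear in one or more reference translations, with a penalty for output that is too short. SacreBLEU reports the score on a scale from 0 to 100, where higher is better. Sentence-level scores on short sentences are noisy, so the corpus-level score is the more reliable figure.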
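
To make the metric concrete, here is a simplified sketch of the BLEU calculation: clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing. `simple_bleu` is a hypothetical teaching helper. SacreBLEU applies its own tokenization and smoothing, so its numbers will differ from this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU on whitespace tokens, scaled to 0-100."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: any empty order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))
print(simple_bleu("the cat sat on a mat", "the cat sat on the mat"))
```

Because the sketch has no smoothing, any sentence shorter than four tokens scores 0, which illustrates why sentence-level BLEU is harsh on short inputs.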

In [None]:
def calculate_bleu_score(hypothesis, reference):
    """
    Calculate BLEU score for a single translation
    
    Args:
        hypothesis (str): Machine translated text
        reference (str): Reference (human) translation
    
    Returns:
        float: BLEU score
    """
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    return bleu.score


def calculate_corpus_bleu(hypotheses, references):
    """
    Calculate BLEU score for a corpus of translations
    
    Args:
        hypotheses (list): List of machine translated texts
        references (list): List of reference translations
    
    Returns:
        float: Corpus BLEU score
    """
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    return bleu.score


print("BLEU score functions defined!")

In [None]:
# Evaluation with reference translations
# Test set with English sentences and their Hindi references

test_pairs = [
    {
        "english": "Water is essential for life.",
        "hindi_reference": "जल जीवन के लिए आवश्यक है।"
    },
    {
        "english": "Education empowers people.",
        "hindi_reference": "शिक्षा लोगों को सशक्त बनाती है।"
    },
    {
        "english": "Health is wealth.",
        "hindi_reference": "स्वास्थ्य ही धन है।"
    },
    {
        "english": "India is a democratic country.",
        "hindi_reference": "भारत एक लोकतांत्रिक देश है।"
    },
    {
        "english": "Please follow traffic rules.",
        "hindi_reference": "कृपया यातायात नियमों का पालन करें।"
    }
]

print("=" * 80)
print("TRANSLATION EVALUATION WITH BLEU SCORES")
print("=" * 80)

hypotheses = []
references = []

for pair in test_pairs:
    # Get machine translation
    machine_translation = translate_en_to_hi(pair["english"])
    
    # Calculate BLEU score
    bleu_score = calculate_bleu_score(machine_translation, pair["hindi_reference"])
    
    hypotheses.append(machine_translation)
    references.append(pair["hindi_reference"])
    
    print(f"\n📝 English:           {pair['english']}")
    print(f"🤖 Machine Translation: {machine_translation}")
    print(f"✅ Reference:          {pair['hindi_reference']}")
    print(f"📊 BLEU Score:         {bleu_score:.2f}")
    print("-" * 60)

# Calculate corpus BLEU
corpus_bleu = calculate_corpus_bleu(hypotheses, references)
print(f"\n{'='*60}")
print(f"📈 CORPUS BLEU SCORE: {corpus_bleu:.2f}")
print(f"{'='*60}")

## 9. Interactive Translation System

In [None]:
class MachineTranslationSystem:
    """
    A complete Machine Translation System for English-Hindi translation
    """
    
    def __init__(self):
        self.en_hi_tokenizer = en_hi_tokenizer
        self.en_hi_model = en_hi_model
        self.hi_en_tokenizer = hi_en_tokenizer
        self.hi_en_model = hi_en_model
        print("Machine Translation System initialized!")
    
    def translate(self, text, source_lang="en", target_lang="hi"):
        """
        Translate text between English and Hindi
        
        Args:
            text (str): Text to translate
            source_lang (str): Source language ('en' or 'hi')
            target_lang (str): Target language ('en' or 'hi')
        
        Returns:
            str: Translated text
        
        Raises:
            ValueError: If the language pair is not en→hi or hi→en
        """
        if source_lang == "en" and target_lang == "hi":
            return translate_en_to_hi(text)
        elif source_lang == "hi" and target_lang == "en":
            return translate_hi_to_en(text)
        else:
            # Raise rather than return a message, so an error string is never mistaken for a translation
            raise ValueError(
                f"Invalid language pair {source_lang}->{target_lang}. Supported: en↔hi"
            )
    
    def translate_document(self, sentences, source_lang="en", target_lang="hi"):
        """
        Translate a list of sentences
        
        Args:
            sentences (list): List of sentences to translate
            source_lang (str): Source language
            target_lang (str): Target language
        
        Returns:
            list: List of translated sentences
        """
        return [self.translate(s, source_lang, target_lang) for s in sentences]
    
    def round_trip_translation(self, text, start_lang="en"):
        """
        Perform round-trip translation (translate and back-translate)
        
        Args:
            text (str): Original text
            start_lang (str): Starting language
        
        Returns:
            dict: Original, intermediate, and back-translated text
        """
        if start_lang == "en":
            intermediate = self.translate(text, "en", "hi")
            back_translated = self.translate(intermediate, "hi", "en")
        else:
            intermediate = self.translate(text, "hi", "en")
            back_translated = self.translate(intermediate, "en", "hi")
        
        return {
            "original": text,
            "intermediate": intermediate,
            "back_translated": back_translated
        }


# Create instance
mt_system = MachineTranslationSystem()
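
The `translate` method relies on the caller to state the source language. As an optional convenience, the direction could be guessed from the script of the input, since Hindi is written in Devanagari (Unicode block U+0900 to U+097F). The sketch below is an assumption-laden heuristic, not part of the system above. `guess_source_lang` is a hypothetical helper, and it would misclassify Hindi written in Latin letters.

```python
def guess_source_lang(text):
    """Guess 'hi' if most letters are Devanagari (U+0900-U+097F), else 'en'."""
    # isalpha() keeps base letters and skips digits, punctuation and vowel signs.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    devanagari = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    return "hi" if devanagari > len(letters) / 2 else "en"


print(guess_source_lang("Welcome to India."))
print(guess_source_lang("भारत में आपका स्वागत है।"))
```

The result could be passed as `source_lang` to `mt_system.translate`, with the other language as `target_lang`.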

In [None]:
# Test the Translation System
print("=" * 70)
print("MACHINE TRANSLATION SYSTEM DEMO")
print("=" * 70)

# Single translation
test_text = "The government has launched a new scheme for farmers."
print(f"\n🔹 Single Translation:")
print(f"   Input (English):  {test_text}")
print(f"   Output (Hindi):   {mt_system.translate(test_text, 'en', 'hi')}")

# Document translation
document = [
    "The new policy aims to improve public health.",
    "Citizens are encouraged to participate in community programs.",
    "Digital services are now available online."
]

print(f"\n🔹 Document Translation:")
translations = mt_system.translate_document(document)
for orig, trans in zip(document, translations):
    print(f"   EN: {orig}")
    print(f"   HI: {trans}")
    print()

In [None]:
# Round-trip translation test
print("=" * 70)
print("ROUND-TRIP TRANSLATION TEST")
print("=" * 70)

test_sentences = [
    "Clean drinking water is a basic human right.",
    "Every child deserves quality education.",
    "Road safety saves lives."
]

for sentence in test_sentences:
    result = mt_system.round_trip_translation(sentence, "en")
    print(f"\n📌 Original (EN):        {result['original']}")
    print(f"📌 Translated (HI):      {result['intermediate']}")
    print(f"📌 Back-translated (EN): {result['back_translated']}")
    print("-" * 60)
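
Round-trip output is easier to compare across sentences when the closeness of the back-translation to the original is given as a number. A minimal sketch using the standard library's `difflib.SequenceMatcher` on lowercased words is shown below. `round_trip_similarity` is a hypothetical helper, and the example strings are illustrative, not actual model output.

```python
from difflib import SequenceMatcher


def round_trip_similarity(original, back_translated):
    """Return a 0-1 word-level similarity ratio, ignoring case."""
    a = original.lower().split()
    b = back_translated.lower().split()
    return SequenceMatcher(None, a, b).ratio()


# Illustrative strings only; these are not actual model outputs.
original = "Road safety saves lives."
back = "Road safety saves life."
print(f"{round_trip_similarity(original, back):.2f}")
```

A high ratio does not prove the intermediate Hindi is correct, because errors in one direction can cancel out in the other. It is a rough sanity check rather than a quality metric.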

## 10. Summary and Conclusion

### What We Accomplished:

1. **Built a Machine Translation System** for English ↔ Hindi translation using pre-trained transformer models.

2. **Implemented Translation Functions** for both directions:
   - English → Hindi
   - Hindi → English

3. **Translated Public Information Content** including:
   - Government notices
   - Health advisories
   - Public safety messages
   - Educational information
   - Transportation updates

4. **Evaluated Translation Quality** using BLEU scores

5. **Created a Complete Translation System Class** with features like:
   - Single text translation
   - Document translation
   - Round-trip translation

### Key Technologies Used:
- **Hugging Face Transformers** - For pre-trained translation models
- **MarianMT** - Helsinki-NLP's neural machine translation models
- **SacreBLEU** - For translation evaluation
- **PyTorch** - Deep learning framework

### Limitations:
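- Translation quality depends on the data the pre-trained models were trained on, which may not cover public information content well
- Idiomatic expressions may be translated literally or incorrectly
- Technical or domain-specific terms may be mistranslated without fine-tuning
- Inputs longer than 512 tokens are truncated before translation
- The BLEU evaluation uses only five sentence pairs with one reference each, so the scores are indicative at best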

### Future Improvements:
- Fine-tune models on domain-specific data
- Add support for more Indian languages
- Implement post-editing suggestions
- Create a web interface for easy access

In [None]:
print("="*60)
print("✅ Assignment 6: Machine Translation System - Completed!")
print("="*60)
print("\nThis system can translate public information content")
print("between English and Hindi (an Indian language).")
print("\nKey Features:")
print("• English to Hindi translation")
print("• Hindi to English translation")
print("• Batch translation support")
print("• BLEU score evaluation")
print("• Round-trip translation testing")