# Pivot Translation with M2M100

**Pivot translation** is a technique used in **low-resource language translation**, where direct translation between two languages may perform poorly because little parallel data exists for that pair. Instead, translation is performed in two stages:

* **Source → Pivot (High-resource Language)**
* **Pivot → Target**

In this example, we demonstrate **Kannada → English → Hindi** translation using Facebook’s `M2M100` model.
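
The two stages compose into a single function. Below is a minimal sketch of that composition, independent of any particular model. The names `pivot_translate`, `Translator`, and `toy_translate`, as well as the toy phrase table, are illustrative assumptions and not part of M2M100 or `transformers`.

```python
from typing import Callable, Tuple

# Hypothetical signature for any single-hop translator:
# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]


def pivot_translate(
    text: str,
    src: str,
    tgt: str,
    translate: Translator,
    pivot: str = "en",
) -> Tuple[str, str]:
    """Translate src -> pivot -> tgt and return (pivot_text, target_text)."""
    pivot_text = translate(text, src, pivot)         # stage 1: source -> pivot
    target_text = translate(pivot_text, pivot, tgt)  # stage 2: pivot -> target
    return pivot_text, target_text


# Toy stand-in for a real model: a lookup table keyed by (text, src, tgt).
_PHRASES = {
    ("hola", "es", "en"): "hello",
    ("hello", "en", "fr"): "bonjour",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    return _PHRASES[(text, src, tgt)]


pivot_text, target_text = pivot_translate("hola", "es", "fr", toy_translate)
print(pivot_text, "->", target_text)
```

Returning the intermediate text alongside the final one makes it easier to see which stage introduced an error.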

---

### Why Pivot Translation?

Some languages (like Kannada or Tamil) may not have rich parallel corpora with every target language. Instead of training a direct model, we route through a common intermediate language, typically English, which tends to have far more parallel data with both the source and the target. The trade-off is that any error made in the first step carries over into the second.
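
A quick count gives a feel for the data-coverage argument. As a rough illustration (the function names are made up for this sketch), covering n languages directly means n(n-1) directed pairs, while pairing every language only with a single pivot needs 2(n-1):

```python
def direct_pairs(n_languages: int) -> int:
    """Directed language pairs if every language is paired with every other."""
    return n_languages * (n_languages - 1)


def pivot_pairs(n_languages: int) -> int:
    """Directed pairs if each non-pivot language is paired only with the pivot."""
    return 2 * (n_languages - 1)


for n in (10, 100):
    print(f"{n} languages: direct={direct_pairs(n)}, via pivot={pivot_pairs(n)}")
```

This only counts pairs; it says nothing about how much data each pair actually has.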

---

### Setup

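We use `facebook/m2m100_418M`, the 418M-parameter checkpoint of M2M100. M2M100 is a many-to-many multilingual model that can translate directly between its supported languages without using English as a bridge. Here we route through English anyway, to illustrate pivoting with a single model.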





In [None]:
# M2M100Tokenizer depends on the sentencepiece package
!pip install transformers sentencepiece datasets torch accelerate --quiet

In [29]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Load model and tokenizer
model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)
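
Before translating, it can be worth confirming that the tokenizer knows every language code in the chain. The helper below is a small sketch: `unsupported_languages` is a hypothetical name, and it assumes that `get_lang_id` raises `KeyError` for an unknown code, which may differ between `transformers` versions.

```python
from typing import Iterable, List


def unsupported_languages(tokenizer, codes: Iterable[str]) -> List[str]:
    """Return the codes for which tokenizer.get_lang_id() fails.

    Assumption: an unknown code raises KeyError. Any object exposing a
    get_lang_id(code) method works, so this is not tied to M2M100.
    """
    missing = []
    for code in codes:
        try:
            tokenizer.get_lang_id(code)
        except KeyError:
            missing.append(code)
    return missing
```

With the tokenizer loaded above, `unsupported_languages(tokenizer, ["kn", "en", "hi"])` should return an empty list, since all three codes are used successfully later in this notebook.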

### 🔹 Step 1: Kannada → English

First, we translate the Kannada sentence `"ನಾನು ಈ ಪುಸ್ತಕವನ್ನು ಇಷ್ಟಪಟ್ಟೆನೆ"` (“I liked this book”) to English.

In [None]:
kannada_text = "ನಾನು ಈ ಪುಸ್ತಕವನ್ನು ಇಷ್ಟಪಟ್ಟೆನೆ"
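# Declare the source language so the tokenizer prepends the Kannada language token
tokenizer.src_lang = "kn"
encoded_kn = tokenizer(kannada_text, return_tensors="pt")

# Force the first generated token to be the English language token
english_tokens = model.generate(
    **encoded_kn,
    forced_bos_token_id=tokenizer.get_lang_id("en")
)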
english_translation = tokenizer.decode(english_tokens[0], skip_special_tokens=True)
print("Step 1 - Kannada to English:", english_translation)
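
Since Step 2 repeats the same pattern with different language codes, a single hop can be factored into a helper. This is a sketch, not part of the `transformers` API: `translate_hop` is a hypothetical name, and the `num_beams` and `max_new_tokens` defaults are arbitrary choices that are simply passed through to `generate`.

```python
def translate_hop(text, src_lang, tgt_lang, model, tokenizer,
                  num_beams=5, max_new_tokens=128):
    """Translate one string from src_lang to tgt_lang.

    `model` and `tokenizer` are expected to behave like
    M2M100ForConditionalGeneration and M2M100Tokenizer.
    """
    # Declare the source language before tokenizing.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        # Force the first generated token to be the target-language token.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        num_beams=num_beams,            # assumed default, tune as needed
        max_new_tokens=max_new_tokens,  # assumed cap on output length
    )
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```

Calling `translate_hop(kannada_text, "kn", "en", model, tokenizer)` mirrors the cell above, apart from the explicit generation settings.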

### 🔹 Step 2: English → Hindi

Next, we feed the English output of Step 1 back into the same model and translate it into Hindi.

In [None]:
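# Switch the source language to English for the second hop
tokenizer.src_lang = "en"
encoded_en = tokenizer(english_translation, return_tensors="pt")

# Force the first generated token to be the Hindi language token
hindi_tokens = model.generate(
    **encoded_en,
    forced_bos_token_id=tokenizer.get_lang_id("hi")
)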
hindi_translation = tokenizer.decode(hindi_tokens[0], skip_special_tokens=True)
print("Step 2 - English to Hindi:", hindi_translation)
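
The same two steps extend to several sentences at once. The sketch below batches the inputs with `padding=True` and decodes with `batch_decode`. The name `pivot_translate_batch` and its inner `hop` function are hypothetical, and the code assumes `model` and `tokenizer` behave like the M2M100 objects loaded earlier.

```python
from typing import List, Tuple


def pivot_translate_batch(texts: List[str], src: str, tgt: str, model, tokenizer,
                          pivot: str = "en") -> Tuple[List[str], List[str]]:
    """Translate each text src -> pivot -> tgt; return (pivot_texts, target_texts)."""

    def hop(batch: List[str], src_lang: str, tgt_lang: str) -> List[str]:
        tokenizer.src_lang = src_lang
        # padding=True pads shorter sentences so the batch forms one tensor.
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        )
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    pivot_texts = hop(texts, src, pivot)         # e.g. Kannada -> English
    target_texts = hop(pivot_texts, pivot, tgt)  # e.g. English -> Hindi
    return pivot_texts, target_texts
```

For the example in this notebook, the call would look like `pivot_translate_batch([kannada_text], "kn", "hi", model, tokenizer)`. Keeping the pivot texts in the return value allows the English intermediate to be checked for each sentence.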

### Summary

| Stage             | Input (Source)                      | Output (Target)                                      |
| ----------------- | ----------------------------------- | ---------------------------------------------------- |
| Kannada → English | `"ನಾನು ಈ ಪುಸ್ತಕವನ್ನು ಇಷ್ಟಪಟ್ಟೆನೆ"` | `"I liked this book"`                                |
| English → Hindi   | `"I liked this book"`               | Hindi translation (e.g., `"आपकी आंखों की चपेट में"`) |

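This pivoting approach can be especially useful for **zero-resource** or **low-resource** language pairs, since a single multilingual translation model serves both steps. It is worth inspecting the intermediate English text when the result looks wrong: the sample Hindi output in the table reads roughly as “in the grip of your eyes” rather than “I liked this book”, which shows how an error in either step surfaces in the final translation.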