# Zero-shot Multilingual Translation using Pretrained Models

Zero-shot translation here means translating between language pairs with pretrained multilingual models **without any additional fine-tuning** on our side. Both models used below were trained on large multilingual translation corpora, so they can translate directly between many language pairs out of the box.

In this section, we demonstrate **zero-shot translation** using:

* `facebook/m2m100_418M`
* `facebook/mbart-large-50-many-to-many-mmt`

In [None]:
!pip install transformers torch accelerate sentencepiece --quiet

### 🔹 M2M100: Tamil → Kannada

The **M2M100** model supports 100 languages and translates directly between them without relying on English as a pivot. Here, we translate from **Tamil to Kannada** with the pretrained checkpoint, with no extra training and no intermediate language.

#### Steps:

1. Load the M2M100 model and tokenizer.
2. Specify source language (`ta` for Tamil).
3. Encode the Tamil sentence.
4. Set the target language to Kannada (`kn`).
5. Generate and decode the translation.

In [9]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Load model and tokenizer
model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# Tamil input sentence
src_text = "இந்த மடிக்கணினி வேகமாக செயல்படுகிறது"  # "This laptop runs fast"
tokenizer.src_lang = "ta"  # Tamil; set this before tokenizing

# Tokenize input
encoded = tokenizer(src_text, return_tensors="pt")

# Translate → Kannada
generated_tokens = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("kn")  # Kannada
)

# Decode
translated = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print("Translated to Kannada:", translated)


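Translated to Kannada: ಈ ಕ್ಯಾಮೆರಾ ಅಪ್ಲಿಕೇಶನ್ ಅಪ್ಲಿಕೇಶನ್

Note that this output is not a faithful translation: the Kannada text reads roughly "this camera application application". Output from the small 418M checkpoint should be checked before it is relied on.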
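
To translate several sentences in one call, the steps above can be wrapped in a small helper. The sketch below is an illustration rather than part of the original notebook: the name `translate_batch` and its defaults (`num_beams=4`, `max_new_tokens=64`) are hypothetical choices, and it assumes `model` and `tokenizer` are M2M100 objects loaded as in the cell above.

```python
def translate_batch(texts, src_lang, tgt_lang, model, tokenizer,
                    num_beams=4, max_new_tokens=64):
    """Translate a list of sentences with an M2M100-style model and tokenizer."""
    # The source language must be set before tokenizing.
    tokenizer.src_lang = src_lang
    # Pad so that sentences of different lengths fit in one batch.
    encoded = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **encoded,
        # Force the first generated token to be the target-language tag.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        num_beams=num_beams,            # beam width (illustrative default)
        max_new_tokens=max_new_tokens,  # cap on output length (illustrative default)
    )
    # One decoded string per input sentence.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With the objects from the cell above, `translate_batch([src_text], "ta", "kn", model, tokenizer)` returns a list with one Kannada string per input sentence.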


### 🔹 mBART: English → Hindi

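The **mBART-50** many-to-many checkpoint is mBART (Multilingual BART) fine-tuned for translation between 50 languages. It uses locale-style language codes such as `en_XX` and `hi_IN`: the source code is set on the tokenizer, and the target code is passed to `generate` as the forced first token.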
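
Because M2M100 identifies languages with short codes such as `ta` and `kn`, while mBART-50 expects codes such as `en_XX` and `hi_IN`, switching between the two models means converting codes. The following is a minimal sketch of such a conversion. The table `M2M100_TO_MBART50` and the function `to_mbart50_code` are hypothetical helpers written for this example, and the table is a small hand-written subset, not the full language list. Kannada (`kn`) is deliberately absent because it is not among the 50 languages of mBART-50.

```python
# Partial, hand-written mapping for illustration; extend it from the model cards.
M2M100_TO_MBART50 = {
    "en": "en_XX",
    "hi": "hi_IN",
    "ta": "ta_IN",
    "bn": "bn_IN",
    "ml": "ml_IN",
}


def to_mbart50_code(m2m_code):
    """Return the mBART-50 code for an M2M100 code, or raise ValueError."""
    try:
        return M2M100_TO_MBART50[m2m_code]
    except KeyError:
        raise ValueError(f"No mBART-50 code known for {m2m_code!r}") from None


print(to_mbart50_code("hi"))  # hi_IN
```

Raising an error for unmapped codes, rather than guessing a suffix, avoids silently passing a code that the tokenizer does not know.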

#### Steps:

1. Load the mBART model and tokenizer.
2. Set the source language code (`en_XX`).
3. Encode the English sentence.
4. Set the target language to Hindi (`hi_IN`).
5. Generate and decode the translation.


In [6]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load model & tokenizer
model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Source sentence (English)
text = "The weather is nice today."
tokenizer.src_lang = "en_XX"

# Tokenize
encoded = tokenizer(text, return_tensors="pt")

# Translate to Hindi
generated_tokens = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"]
)

# Decode
translated_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print("Translated:", translated_text)

Translated: आज मौसम अच्छा है।
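
The lookup `tokenizer.lang_code_to_id["hi_IN"]` relies on an attribute that may not be exposed by every `transformers` release; this is an assumption worth checking against the installed version. A more defensive lookup could look like the sketch below, which falls back to `convert_tokens_to_ids`, since the language codes are entries in the tokenizer's vocabulary. The helper name `lang_token_id` is hypothetical.

```python
def lang_token_id(tokenizer, lang_code):
    """Look up the token id of a language code such as "hi_IN"."""
    # Prefer the dedicated mapping when the tokenizer provides one.
    mapping = getattr(tokenizer, "lang_code_to_id", None)
    if mapping is not None and lang_code in mapping:
        return mapping[lang_code]
    # Fallback: treat the language code as an ordinary vocabulary token.
    token_id = tokenizer.convert_tokens_to_ids(lang_code)
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"Unknown language code: {lang_code!r}")
    return token_id
```

In the cell above, `forced_bos_token_id=lang_token_id(tokenizer, "hi_IN")` would then replace the direct dictionary access.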
