#Introduction

This notebook uses Facebook’s mBART-50 checkpoint facebook/mbart-large-50-one-to-many-mmt, a multilingual model fine-tuned to translate from English into the 49 other languages that mBART-50 covers, including Indian languages such as Tamil and Hindi.

Key Features:

Supports Tamil (ta_IN), Hindi (hi_IN), and others.

Simple interface: enter English text → choose target language → get translation.
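The "choose target language" step implies some mapping from a readable language name to an mBART-50 code. A minimal sketch of such a lookup follows; `LANGUAGE_CODES` and `resolve_lang_code` are hypothetical names introduced here for illustration, and the dict lists only the four codes that appear in this notebook.

```python
# Display names mapped to mBART-50 language codes.
# Only the codes used in this notebook are listed; extend as needed.
LANGUAGE_CODES = {
    "Tamil": "ta_IN",
    "Hindi": "hi_IN",
    "French": "fr_XX",
    "Spanish": "es_XX",
}


def resolve_lang_code(name):
    """Return the language code for a display name (hypothetical helper)."""
    key = name.strip().title()  # accept "tamil", " Tamil ", "TAMIL"
    if key not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unsupported language '{name}'. Supported: {supported}")
    return LANGUAGE_CODES[key]


print(resolve_lang_code("tamil"))
```

The same dict could plausibly feed the options of a dropdown widget, so the user never has to type a raw code.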


#Setup

Install transformers and sentencepiece (the MBart-50 tokenizer relies on it), then print the installed versions to confirm the environment.

In [3]:
# Install required packages (run only once)
!pip install transformers -U -q
!pip install sentencepiece
!pip freeze | grep transformers  # Verify versions

sentence-transformers==5.1.0
transformers==4.56.2
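As the output above shows, `pip freeze | grep transformers` also matches sentence-transformers and says nothing about sentencepiece. One possible alternative is to query each package by name from Python. This is a sketch using the standard-library `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report exactly the two packages this notebook installs.
for name in ("transformers", "sentencepiece"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```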


#Import Libraries

Load the necessary libraries for model handling and translation.

In [4]:
# Import necessary libraries
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import ipywidgets as widgets  # For interactive UI
from IPython.display import display

#Load Model and Tokenizer

Initialize the MBart model and tokenizer. The checkpoint is fine-tuned for one-to-many translation from English, so the tokenizer is created with src_lang="en_XX".



In [5]:
# Load the pre-trained model and tokenizer
try:
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
    tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")
    print("Model and tokenizer loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Model and tokenizer loaded successfully!


#Translation Function

Define a reusable function that translates English text into a specified target language. It rejects empty input, reports unsupported language codes, and caps the generated output at 200 tokens (max_length=200).

In [6]:
# Define a reusable translation function
def translate_text(input_text, target_lang_code):
    """
    Translates English text to the specified target language.

    Args:
        input_text (str): English text to translate.
        target_lang_code (str): Language code (e.g., 'ta_IN' for Tamil, 'hi_IN' for Hindi).

    Returns:
        str: Translated text or error message.
    """
    if not input_text.strip():
        return "Error: Input text is empty."

    try:
        # Tokenize input
        model_inputs = tokenizer(input_text, return_tensors="pt")

        # Generate translation
        generated_tokens = model.generate(
            **model_inputs,
            forced_bos_token_id=tokenizer.lang_code_to_id[target_lang_code],
            max_length=200  # Prevent overly long outputs
        )

        # Decode and return
        translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
        return translation
    except KeyError:
        return f"Error: Unsupported language code '{target_lang_code}'. Supported examples: ta_IN (Tamil), hi_IN (Hindi), fr_XX (French), es_XX (Spanish)."
    except Exception as e:
        return f"Translation error: {e}"
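Because `max_length=200` caps the generated output, a long passage may come back cut off. One way around this, sketched below, is to split the input into sentence groups and translate each group separately. `split_into_chunks` and `translate_long_text` are hypothetical helpers, the sentence splitter is deliberately naive, and `max_words` is only a rough stand-in for a token budget. The translation function is passed in as `translate_fn` so the sketch runs without the model loaded.

```python
import re


def split_into_chunks(text, max_words=40):
    """Group sentences into chunks of at most max_words words each.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def translate_long_text(text, target_lang_code, translate_fn, max_words=40):
    """Translate text chunk by chunk and join the results with spaces."""
    return " ".join(
        translate_fn(chunk, target_lang_code)
        for chunk in split_into_chunks(text, max_words)
    )
```

In this notebook the call would look like `translate_long_text(long_text, "hi_IN", translate_text)`. Note that any error string returned by `translate_text` for one chunk would be joined into the result as if it were a translation.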

#Translation Examples

Translate sample sentences into Tamil, Hindi, French, and Spanish to show the model's multilingual support.



In [7]:
# Define example texts and target languages
examples = [
    ("Zully Broussard's gift was data processing of genetic profiles from donor-recipient pairs.", "ta_IN"),  # Tamil
    ("Zully Broussard's gift was data processing of genetic profiles from donor-recipient pairs.", "hi_IN"),  # Hindi
    ("Hello, how are you?", "fr_XX"),  # French
    ("The quick brown fox jumps over the lazy dog.", "es_XX")  # Spanish
]

# Display translations
print("Translation Examples:")
print("-" * 50)
for text, lang in examples:
    translated = translate_text(text, lang)
    print(f"Original (English): {text}")
    print(f"Translated ({lang}): {translated}")
    print("-" * 50)

Translation Examples:
--------------------------------------------------
Original (English): Zully Broussard's gift was data processing of genetic profiles from donor-recipient pairs.
Translated (ta_IN): சுல்லி புரூசார்ட் கொடுத்த நன்கொடை, நிதியளிப்பவர்-நன்கொடை பெறுபவர் ஜோடிகளில் இருந்து மரபணு விவரங்களைத் தணிக்கை செய்வதாகும்.
--------------------------------------------------
Original (English): Zully Broussard's gift was data processing of genetic profiles from donor-recipient pairs.
Translated (hi_IN): जुली ब्रूसार्ड का उपहार दानकर्ता-ग्राही जोड़ों के आनुवंशिक प्रोफाइलों का डेटा संसाधन था।
--------------------------------------------------
Original (English): Hello, how are you?
Translated (fr_XX): Bonjour, comment est-il?
--------------------------------------------------
Original (English): The quick brown fox jumps over the lazy dog.
Translated (es_XX): El caballo bruno rápido salta sobre el perro escaso.
--------------------------------------------------
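The loop above prints whatever `translate_text` returns, and that function reports failures as ordinary strings rather than raising. A caller that needs to tell the two apart could check for the message prefixes used in the function. This is a minimal sketch; `ERROR_PREFIXES` and `is_error` are hypothetical names, and a genuine translation that happens to start with one of these prefixes would be misclassified.

```python
# Prefixes of the error strings that translate_text returns.
ERROR_PREFIXES = ("Error:", "Translation error:")


def is_error(result):
    """Return True if a translate_text result is one of its error messages."""
    return result.startswith(ERROR_PREFIXES)


results = ["Bonjour", "Error: Input text is empty."]
failures = [r for r in results if is_error(r)]
print(f"{len(failures)} of {len(results)} results are errors")
```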
