<h1><u><b>AI-Powered Multi-Language Translator</b></u></h1>
This project involves building a generative AI-powered translator that enables natural and context-aware text translation across multiple languages. We will explore NLP concepts, prompt engineering, and large language model integration for seamless multilingual communication.
<hr>
The application is built using:<br>
1. Transformers library (for using pre-trained translation models)<br>
2. Gradio (for creating a simple and interactive web interface)<br>
3. Python / Google Colab (for development and deployment)<br>

<pre>
Model: <u>facebook/m2m100_418M</u>
</pre>

<pre>
supported_languages = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "hi": "Hindi",
    # and more...
}
</pre>

Step 1 :- <b>Install Required Libraries</b><br>

In [None]:
!pip install --quiet transformers sentencepiece langdetect gradio



Step 2 :- <b>Import Dependencies</b>

In [None]:
from huggingface_hub import login
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
import torch
from google.colab import userdata
from langdetect import detect
import gradio as gr

Step 3 :- Configure Hugging Face <b>Access Token</b> and <i>Supported Languages</i> for <i>Translation</i>

In [None]:
hf_token = userdata.get('HF_TOKEN')
login(token=hf_token)

# List of supported languages
supported_languages = [
    'en',  # English
    'fr',  # French
    'de',  # German
    'es',  # Spanish
    'it',  # Italian
    'ru',  # Russian
    'zh',  # Chinese
    'ar',  # Arabic
    'hi',  # Hindi
    'bn',  # Bengali
    'ja',  # Japanese
    'ko',  # Korean
    'pt',  # Portuguese
    'tr',  # Turkish
    'vi',  # Vietnamese
]

# Map language code to readable name for user
lang_names = {
    'en': 'English', 'fr': 'French', 'de': 'German', 'es': 'Spanish', 'it': 'Italian',
    'ru': 'Russian', 'zh': 'Chinese', 'ar': 'Arabic', 'hi': 'Hindi', 'bn': 'Bengali',
    'ja': 'Japanese', 'ko': 'Korean', 'pt': 'Portuguese', 'tr': 'Turkish', 'vi': 'Vietnamese'
}
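
The list and the dictionary above repeat the same fifteen codes, so they can drift apart when a language is added. A minimal sketch of one way to avoid that is to derive the list from the dictionary. The mapping below is a shortened copy used only for illustration:

```python
# Shortened copy of the mapping above, for illustration only
lang_names = {"en": "English", "fr": "French", "hi": "Hindi"}

# Dicts preserve insertion order, so the list keeps the order of the mapping
supported_languages = list(lang_names)

print(supported_languages)
```

With this approach, adding a language means editing `lang_names` only.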


Step 4 :- <b>Download Model from Hugging Face</b>

In [None]:
# Load model and tokenizer
MODEL_NAME = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)
model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME)
print(f"✅ Model & tokenizer `{MODEL_NAME}` loaded.")

✅ Model & tokenizer `facebook/m2m100_418M` loaded.


Step 5 :- <b>Load Model to GPU for <i>faster Inference</i></b>

In [None]:
# Use CUDA if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

print("Using device:", device)

Using device: cuda


Step 6 :- Accept User Input Language Code

In [None]:
# For translation, set source language.
print("Choose source language code:")
for code, name in lang_names.items():
    print(f"{code}: {name}")
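src_lang = input("Enter your source language code: ").strip()
# Stop early on an unknown code instead of failing later inside the tokenizer
if src_lang not in supported_languages:
    raise ValueError(f"Language '{src_lang}' is not supported.")
tokenizer.src_lang = src_lang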

Choose source language code:
en: English
fr: French
de: German
es: Spanish
it: Italian
ru: Russian
zh: Chinese
ar: Arabic
hi: Hindi
bn: Bengali
ja: Japanese
ko: Korean
pt: Portuguese
tr: Turkish
vi: Vietnamese
Enter your source language code: en


Step 7 :- Accept User's Input

In [None]:
# User input
input_text = input("Enter text: ").strip()

Enter text: hello my name is ansh


Step 8 :- Accept Output Language Code

In [None]:
# For translation, set target language.
print("Choose target language code:")
for code, name in lang_names.items():
    print(f"{code}: {name}")
tgt_lang = input("Enter language code from above: ").strip()

# Stop here on an unknown code; otherwise the later cells would fail
if tgt_lang not in supported_languages:
    raise ValueError(f"Language '{tgt_lang}' is not supported.")

Choose target language code:
en: English
fr: French
de: German
es: Spanish
it: Italian
ru: Russian
zh: Chinese
ar: Arabic
hi: Hindi
bn: Bengali
ja: Japanese
ko: Korean
pt: Portuguese
tr: Turkish
vi: Vietnamese
Enter language code from above: hi
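
The check above runs only once, so a mistyped code means re-running the cell. A possible alternative is a small helper that keeps asking until the code is valid. `ask_language_code` and its `read` parameter are hypothetical names introduced here, not part of the notebook:

```python
def ask_language_code(prompt, valid_codes, read=input):
    """Keep asking until the user enters one of valid_codes.

    `read` defaults to input() and is injectable so the helper can be
    exercised without a keyboard.
    """
    while True:
        code = read(prompt).strip().lower()
        if code in valid_codes:
            return code
        print(f"'{code}' is not supported. Choose one of: {', '.join(valid_codes)}")
```

In the notebook it could be called as `tgt_lang = ask_language_code("Enter language code from above: ", supported_languages)`.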


In [None]:
encoded = tokenizer(input_text, return_tensors="pt").to(device)
encoded

{'input_ids': tensor([[128022, 110013,   1949,  33969,    117,     48,   1537,      2]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [None]:
generated_tokens = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
generated_tokens

tensor([[     2, 128036,    776,  10484,  57545,  15392,   5220,   3844,   3188,
            776,      2]], device='cuda:0')

In [None]:
translation = tokenizer.batch_decode(
  generated_tokens, skip_special_tokens=True)[0]
translation

'हैलो मेरा नाम एन्श है'

In [None]:
print(f"Translation ({lang_names.get(tgt_lang, tgt_lang)}):", translation)

Translation (Hindi): हैलो मेरा नाम एन्श है
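
The tensors printed in the cells above show how the language codes travel with the text. The encoder input starts with the source-language token and ends with the end-of-sequence token (id 2). The generated sequence starts with the decoder start token (also id 2), followed by the target-language token forced through `forced_bos_token_id`. The sketch below restates that with the ids copied from those outputs as plain lists. The variable names are illustrative only:

```python
# Token ids copied from the notebook outputs above
input_ids = [128022, 110013, 1949, 33969, 117, 48, 1537, 2]
generated = [2, 128036, 776, 10484, 57545, 15392, 5220, 3844, 3188, 776, 2]

EOS_ID = 2  # end-of-sequence id, seen at the end of both sequences

src_lang_token = input_ids[0]   # source-language token ("en" in this run)
tgt_lang_token = generated[1]   # token forced by forced_bos_token_id ("hi" in this run)

# Content tokens are what remains after stripping the special tokens
source_content = input_ids[1:-1]
target_content = generated[2:-1]

print(src_lang_token, tgt_lang_token)
print(len(source_content), len(target_content))
```

Decoding with `skip_special_tokens=True` drops these language and end-of-sequence tokens, which is why only the translated text appears in the result.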


<h2>Integrating all logic into a separate function</h2>

In [None]:
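# Dropdown choices as (label, value) pairs
from_choices = [("Auto-detect", "auto")] + [
    (lang_names[code], code) for code in supported_languages
]
to_choices = [(lang_names[code], code) for code in supported_languages]

def translate(text, from_lang, to_lang):
    if not text or not text.strip():
        return ""
    # 1) Auto-detect the source language if requested
    if from_lang == "auto":
        try:
            guessed = detect(text)
        except Exception:
            # langdetect raises when the text has no detectable features
            guessed = None
        # Fall back to English when the guess is missing or unsupported
        from_lang = guessed if guessed in supported_languages else "en"
    # 2) Tell the tokenizer the source language, then encode and generate
    tokenizer.src_lang = from_lang
    inputs = tokenizer(text, return_tensors="pt", padding=True).to(device)
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.get_lang_id(to_lang)
    )
    # 3) Decode the generated ids back to text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)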
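
One detail of auto-detection is worth noting: `langdetect` reports Chinese as `zh-cn` or `zh-tw`, which does not match the `zh` entry in `supported_languages`. A minimal sketch of a normalizing step follows. `normalize_detected_code` is a hypothetical helper, and the English fallback is an assumption rather than something the notebook prescribes:

```python
def normalize_detected_code(code, supported, default="en"):
    """Map a langdetect-style code onto the supported list.

    Hypothetical helper: strips a region suffix such as "-cn" and falls
    back to `default` when the base code is still unsupported.
    """
    base = code.lower().split("-")[0]
    return base if base in supported else default


supported = ["en", "fr", "zh"]
print(normalize_detected_code("zh-cn", supported))  # zh
print(normalize_detected_code("sw", supported))     # en
```

Inside `translate`, the result of `detect(text)` could be passed through such a helper before it is compared against `supported_languages`.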

<h2>Preparing Gradio UI</h2>

In [None]:
# Launch the interface
iface = gr.Interface(
    fn=translate,
    inputs=[
        gr.Textbox(lines=3, label="Input Text", placeholder="Type here…"),
        gr.Dropdown(choices=from_choices, label="From", value="auto"),
        gr.Dropdown(choices=to_choices,   label="To",   value="en"),
    ],
    outputs=gr.Textbox(label="Translation"),
    title="🌐 Translator ",
    description="Auto-detect or pick an input language, then choose your target language. The translation appears in the output box.",
)

iface.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f0179d6def478341f4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


