## Problem Definition & Objective

Language differences often create challenges in communication across regions and cultures. Traditional manual translation methods are slow and prone to errors, making them unsuitable for real-time usage.

The objective of this project is to design an **Intelligent Multilingual Language Translation System** that automatically identifies the input language and translates it into a selected target language. The system also generates voice output to enhance accessibility and user interaction.


## Data Understanding & Preparation

This project does not use a conventional dataset. Instead, it processes user-provided text dynamically in real time.

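The input language is detected automatically with the `langdetect` library, which returns an ISO 639-1 code (for example `en` or `ta`, and `zh-cn`/`zh-tw` for Chinese). The notebook then maps that code to the corresponding NLLB language code. Because the NLLB tokenizer handles tokenization internally, no separate preprocessing step is applied.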
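NLLB's own tokenizer makes token-level preprocessing unnecessary. However, the NLLB-200 model card describes it as a single-sentence translation model, so multi-sentence paragraphs may translate better if they are split first. The sketch below shows one optional way to do that. `split_sentences` is a hypothetical helper, not part of the project. Its punctuation rules are an assumption: Latin `.`, `!` and `?` followed by whitespace, plus the Devanagari danda `।` and CJK `。！？`. These rules will still mis-split cases such as abbreviations.

```python
import re

# Assumed sentence boundaries: Latin ., !, ? only when followed by whitespace
# (so "3.14" is not split), and danda / CJK full stops with or without a space.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[।。！？])\s*")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty, stripped sentence chunks of text."""
    parts = _SENTENCE_END.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


print(split_sentences("Hello there. How are you? 你好。再见！"))
# ['Hello there.', 'How are you?', '你好。', '再见！']
```

Each chunk could then be translated on its own and the translations joined back together. This keeps every model input close to the sentence-level inputs NLLB was trained on.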


## Model / System Design

The system is built on Facebook's NLLB-200 (No Language Left Behind) model, which supports translation between 200 languages.
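NLLB identifies each language by a FLORES-200 code. Each code joins an ISO 639-3 language code and an ISO 15924 script code with an underscore, as in `tam_Taml` or `zho_Hant`. Because the script is part of the code, a single language can appear more than once; this notebook lists both `zho_Hans` and `zho_Hant`. The following illustrative parser uses the hypothetical names `parse_flores_code` and `FloresCode`:

```python
from typing import NamedTuple


class FloresCode(NamedTuple):
    language: str  # ISO 639-3 language code, e.g. "tam"
    script: str    # ISO 15924 script code, e.g. "Taml"


def parse_flores_code(code: str) -> FloresCode:
    """Split an NLLB/FLORES-200 code such as 'tam_Taml' into its two parts."""
    language, sep, script = code.partition("_")
    # Expect a 3-letter language code and a 4-letter script code.
    if not sep or len(language) != 3 or len(script) != 4:
        raise ValueError(f"Not a FLORES-200 style code: {code!r}")
    return FloresCode(language, script)


print(parse_flores_code("zho_Hant"))  # FloresCode(language='zho', script='Hant')
```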

### Components Used:
- **Language Detection:** langdetect  
- **Translation Model:** facebook/nllb-200-distilled-600M  
- **Text-to-Speech:** gTTS  
- **User Interface:** ipywidgets  
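These components form a linear pipeline: detect the source language, translate, then synthesize speech, with the widgets acting only as the front end. The minimal sketch below shows that data flow. Each stage is passed in as a function, so the flow can be exercised with stand-ins instead of the real models. `run_pipeline` and its parameter names are hypothetical and do not come from the notebook.

```python
from typing import Callable, Optional, Tuple


# Hypothetical orchestration sketch: each stage is injected as a function.
def run_pipeline(
    text: str,
    target_code: str,
    detect: Callable[[str], Optional[str]],
    translate: Callable[[str, str, str], str],
    speak: Callable[[str, str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Run detect -> translate -> speak and return (text, audio_path_or_None)."""
    source_code = detect(text)  # in the notebook: langdetect + ISO->NLLB lookup
    if source_code is None:
        return "Unsupported or undetectable source language", None
    translated = translate(text, source_code, target_code)  # in the notebook: NLLB
    audio_path = speak(translated, target_code)  # in the notebook: gTTS
    return translated, audio_path


# Usage with stand-in stages instead of the real models.
result = run_pipeline(
    "hello",
    "fra_Latn",
    detect=lambda t: "eng_Latn",
    translate=lambda t, src, tgt: f"[{src}->{tgt}] {t}",
    speak=lambda t, tgt: None,
)
print(result)  # ('[eng_Latn->fra_Latn] hello', None)
```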


```python
import torch  # PyTorch backend used by the transformers pipeline
from transformers import pipeline, AutoTokenizer
from langdetect import detect
from gtts import gTTS
import ipywidgets as widgets
from IPython.display import display, Audio
```


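```python
# Load the distilled 600M NLLB-200 checkpoint as a translation pipeline.
translator = pipeline(
    task="translation",
    model="facebook/nllb-200-distilled-600M",
    device=-1,  # -1 runs on CPU; a GPU index such as 0 would use CUDA
)

# Standalone tokenizer; the pipeline above already loads its own, so this is optional.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M"
)

print("Model loaded successfully")
```

```text
Device set to use cpu
Model loaded successfully
```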


```python
# Display name -> NLLB (FLORES-200) code for the target-language dropdown.
LANGUAGE_CODES = {
    # Indian
    "English": "eng_Latn",
    "Tamil": "tam_Taml",
    "Hindi": "hin_Deva",
    "Telugu": "tel_Telu",
    "Malayalam": "mal_Mlym",
    "Kannada": "kan_Knda",
    "Marathi": "mar_Deva",
    "Gujarati": "guj_Gujr",
    "Punjabi": "pan_Guru",
    "Bengali": "ben_Beng",
    "Urdu": "urd_Arab",
    "Odia": "ory_Orya",
    "Assamese": "asm_Beng",
    "Nepali": "npi_Deva",
    "Sinhala": "sin_Sinh",

    # Middle East
    "Arabic": "arb_Arab",
    "Persian": "pes_Arab",
    "Hebrew": "heb_Hebr",

    # Europe
    "French": "fra_Latn",
    "German": "deu_Latn",
    "Spanish": "spa_Latn",
    "Portuguese": "por_Latn",
    "Italian": "ita_Latn",
    "Dutch": "nld_Latn",
    "Russian": "rus_Cyrl",
    "Ukrainian": "ukr_Cyrl",
    "Polish": "pol_Latn",
    "Czech": "ces_Latn",
    "Hungarian": "hun_Latn",
    "Romanian": "ron_Latn",
    "Greek": "ell_Grek",

    # East Asia
    "Chinese (Simplified)": "zho_Hans",
    "Chinese (Traditional)": "zho_Hant",
    "Chinese": "zho_Hant",  # alias: same code as "Chinese (Traditional)"
    "Japanese": "jpn_Jpan",
    "Korean": "kor_Hang",

    # Others
    "Thai": "tha_Thai",
    "Vietnamese": "vie_Latn",
    "Indonesian": "ind_Latn",
    "Turkish": "tur_Latn",
    "Swahili": "swh_Latn",
    "Afrikaans": "afr_Latn"
}

print(f"Loaded {len(LANGUAGE_CODES)} target languages")
```

```text
Loaded 42 target languages
```
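The printed count is the number of dropdown entries, not the number of distinct languages. Both "Chinese" and "Chinese (Traditional)" map to `zho_Hant`, so the 42 entries cover 41 distinct NLLB target codes. A small check like the one below can flag such aliases. `shared_codes` is a hypothetical helper, and it runs here on an excerpt of the table so that the block is self-contained.

```python
from collections import defaultdict


def shared_codes(names_to_codes: dict[str, str]) -> dict[str, list[str]]:
    """Return each code that more than one display name maps to."""
    by_code: dict[str, list[str]] = defaultdict(list)
    for name, code in names_to_codes.items():
        by_code[code].append(name)
    return {code: names for code, names in by_code.items() if len(names) > 1}


# Excerpt of LANGUAGE_CODES, for illustration only.
sample = {
    "Chinese (Simplified)": "zho_Hans",
    "Chinese (Traditional)": "zho_Hant",
    "Chinese": "zho_Hant",
    "Japanese": "jpn_Jpan",
}
print(shared_codes(sample))  # {'zho_Hant': ['Chinese (Traditional)', 'Chinese']}
print(len(sample), "names,", len(set(sample.values())), "distinct codes")  # 4 names, 3 distinct codes
```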


```python
# ISO 639-1 code returned by langdetect -> NLLB (FLORES-200) code.
# langdetect has no profiles for Odia ("or"), Assamese ("as") or Sinhala ("si"),
# so those three entries are never matched by automatic detection.
ISO_TO_NLLB = {
    "en": "eng_Latn",
    "ta": "tam_Taml",
    "hi": "hin_Deva",
    "te": "tel_Telu",
    "ml": "mal_Mlym",
    "kn": "kan_Knda",
    "mr": "mar_Deva",
    "gu": "guj_Gujr",
    "pa": "pan_Guru",
    "bn": "ben_Beng",
    "ur": "urd_Arab",
    "or": "ory_Orya",
    "as": "asm_Beng",
    "ne": "npi_Deva",
    "si": "sin_Sinh",

    "ar": "arb_Arab",
    "fa": "pes_Arab",
    "he": "heb_Hebr",

    "fr": "fra_Latn",
    "de": "deu_Latn",
    "es": "spa_Latn",
    "pt": "por_Latn",
    "it": "ita_Latn",
    "nl": "nld_Latn",
    "ru": "rus_Cyrl",
    "uk": "ukr_Cyrl",
    "pl": "pol_Latn",
    "cs": "ces_Latn",
    "hu": "hun_Latn",
    "ro": "ron_Latn",
    "el": "ell_Grek",

    "zh-cn": "zho_Hans",
    "zh-tw": "zho_Hant",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",

    "th": "tha_Thai",
    "vi": "vie_Latn",
    "id": "ind_Latn",
    "tr": "tur_Latn",
    "sw": "swh_Latn",
    "af": "afr_Latn"
}
```
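`ISO_TO_NLLB` maps detected codes to NLLB codes. gTTS, by contrast, takes short language tags such as `"ta"` or `"fr"` rather than NLLB codes, so speaking the translation in the target language requires the reverse lookup. The sketch below inverts the table and refuses to do so silently if two ISO codes share an NLLB code. `invert_mapping` and `nllb_to_tts_lang` are hypothetical helpers. The `"zh-CN"`/`"zh-TW"` spellings are an assumption about the tags gTTS lists for Chinese, and gTTS may not offer a voice for every tag, so callers should still expect it to reject some languages.

```python
from typing import Optional


def invert_mapping(iso_to_nllb: dict[str, str]) -> dict[str, str]:
    """Build an NLLB -> ISO lookup; raise if two ISO codes share an NLLB code."""
    nllb_to_iso: dict[str, str] = {}
    for iso, nllb in iso_to_nllb.items():
        if nllb in nllb_to_iso:
            raise ValueError(f"{nllb} is mapped from both {nllb_to_iso[nllb]!r} and {iso!r}")
        nllb_to_iso[nllb] = iso
    return nllb_to_iso


# Assumption: gTTS spells the Chinese variants with an upper-case region.
TTS_OVERRIDES = {"zh-cn": "zh-CN", "zh-tw": "zh-TW"}


def nllb_to_tts_lang(nllb_code: str, nllb_to_iso: dict[str, str]) -> Optional[str]:
    """Return a gTTS-style language tag for an NLLB code, or None if unknown."""
    iso = nllb_to_iso.get(nllb_code)
    if iso is None:
        return None
    return TTS_OVERRIDES.get(iso, iso)


# Excerpt of ISO_TO_NLLB, for illustration only.
excerpt = {"en": "eng_Latn", "ta": "tam_Taml", "zh-cn": "zho_Hans", "zh-tw": "zho_Hant"}
reverse = invert_mapping(excerpt)
print(nllb_to_tts_lang("tam_Taml", reverse))  # ta
print(nllb_to_tts_lang("zho_Hant", reverse))  # zh-TW
print(nllb_to_tts_lang("deu_Latn", reverse))  # None (not in the excerpt)
```

In the full tables, every `LANGUAGE_CODES` value also appears in `ISO_TO_NLLB`, so the inverted table has an entry for each target language.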


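```python
from langdetect import DetectorFactory
from langdetect.lang_detect_exception import LangDetectException
from gtts import gTTSError

# langdetect is non-deterministic by default; a fixed seed makes results repeatable.
DetectorFactory.seed = 0

# gTTS expects short language tags rather than NLLB codes, so invert the ISO table.
# gTTS lists the Chinese variants as "zh-CN" and "zh-TW".
NLLB_TO_TTS = {nllb: iso for iso, nllb in ISO_TO_NLLB.items()}
NLLB_TO_TTS.update({"zho_Hans": "zh-CN", "zho_Hant": "zh-TW"})


def auto_translate_with_voice(text, target_language):
    """Detect the source language, translate with NLLB and synthesize speech.

    Returns (translated_text, audio_path). audio_path is None when no audio was
    produced; on failure, translated_text carries an error message instead.
    """
    text = text.strip()
    if not text:
        return "Please enter some text to translate.", None

    try:
        detected = detect(text).lower().strip()
    except LangDetectException:
        # Raised when the text has no usable features, e.g. only digits or symbols.
        return "Could not detect the input language.", None

    print("Detected ISO language:", detected)

    # Fall back to Simplified Chinese only for Chinese variants missing from the
    # table, so that "zh-tw" still maps to Traditional Chinese.
    if detected.startswith("zh") and detected not in ISO_TO_NLLB:
        detected = "zh-cn"

    # Fallback for unsupported detections: treat ASCII-only text as English.
    if detected not in ISO_TO_NLLB:
        if text.isascii():
            detected = "en"
        else:
            return f"Detected language not supported: {detected}", None

    src_lang = ISO_TO_NLLB[detected]
    tgt_lang = LANGUAGE_CODES[target_language]

    result = translator(text, src_lang=src_lang, tgt_lang=tgt_lang)
    translated = result[0]["translation_text"]

    # Speak in the target language; without `lang`, gTTS would use English.
    tts_lang = NLLB_TO_TTS.get(tgt_lang)
    if tts_lang is None:
        return translated, None
    audio_file = "output.mp3"
    try:
        gTTS(translated, lang=tts_lang).save(audio_file)
    except (ValueError, gTTSError):
        # ValueError: gTTS has no voice for this language; gTTSError: request failed.
        return translated, None

    return translated, audio_file
```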
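The source-language rules in `auto_translate_with_voice` can be isolated into a pure function and checked without loading the model or calling `langdetect`. These rules normalize the detected code, treat unmatched ASCII-only text as English, and otherwise give up. The sketch below does this with a hypothetical `resolve_source_code` helper and an excerpt of `ISO_TO_NLLB`. It falls back to `zh-cn` only for Chinese codes that are not already in the table, so `zh-tw` keeps its own mapping.

```python
from typing import Optional


def resolve_source_code(detected: str, text: str, iso_to_nllb: dict[str, str]) -> Optional[str]:
    """Map a langdetect-style ISO code to an NLLB code, or None if unsupported."""
    code = detected.lower().strip()
    # Fall back to Simplified Chinese only for Chinese variants not in the table.
    if code.startswith("zh") and code not in iso_to_nllb:
        code = "zh-cn"
    if code in iso_to_nllb:
        return iso_to_nllb[code]
    # Heuristic kept from the notebook: unmatched ASCII-only text is treated as English.
    if text.isascii():
        return iso_to_nllb.get("en")
    return None


# Excerpt of ISO_TO_NLLB, for illustration only.
excerpt = {"en": "eng_Latn", "zh-cn": "zho_Hans", "zh-tw": "zho_Hant", "ta": "tam_Taml"}
print(resolve_source_code("zh-tw", "你好", excerpt))  # zho_Hant
print(resolve_source_code("so", "Hello", excerpt))  # eng_Latn (ASCII fallback)
print(resolve_source_code("ko", "안녕하세요", excerpt))  # None (not in the excerpt)
```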


```python
# Input area, target-language selector, button and output panel.
text_box = widgets.Textarea(
    placeholder="📝 Enter text in any supported language",
    layout=widgets.Layout(width="100%", height="120px")
)

target_dropdown = widgets.Dropdown(
    options=sorted(LANGUAGE_CODES.keys()),
    description="Translate To:"
)

translate_button = widgets.Button(
    description="Translate",
    button_style="success",
    icon="language"
)

output = widgets.Output()


def on_translate_clicked(b):
    # Replace the previous result with the new translation and audio player.
    with output:
        output.clear_output()
        translated, audio = auto_translate_with_voice(
            text_box.value,
            target_dropdown.value
        )

        print("🖤 Translated Text:\n")
        print(translated)

        if audio:
            display(Audio(audio, autoplay=True))
        else:
            print("🔇 Audio not generated")


translate_button.on_click(on_translate_clicked)

display(text_box, target_dropdown, translate_button, output)
```



## Results & Output

The intelligent translation system successfully:
- Detects the input language automatically  
- Translates text into the selected target language  
- Generates voice output for the translated content  

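The dropdown offers 42 target-language entries covering Indian and international languages. These correspond to 41 distinct NLLB codes, because "Chinese" is an alias for Traditional Chinese. Automatic source detection covers the languages that `langdetect` recognizes and `ISO_TO_NLLB` maps; this excludes Odia, Assamese and Sinhala, which `langdetect` cannot identify.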


## Conclusion

This project demonstrates the successful implementation of an intelligent multilingual language translation system. By integrating automatic language detection, neural machine translation, and voice output, the system improves both usability and accessibility.

The solution can be further enhanced by incorporating speech-to-text functionality and deploying the system as a web-based application.


## Ethical Considerations & Responsible AI

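- User input is not stored by the application itself; only the most recent audio is written to `output.mp3`, which is overwritten on each request. However, gTTS sends the translated text to Google's text-to-speech service, so that text does leave the local machine  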
- Translation accuracy may vary, especially for low-resource languages  
- The system should not be used for sensitive or legal translations without human review  


## References

- NLLB-200 distilled 600M model card: https://huggingface.co/facebook/nllb-200-distilled-600M
- Hugging Face: https://huggingface.co
- gTTS on PyPI: https://pypi.org/project/gTTS/
- langdetect on PyPI: https://pypi.org/project/langdetect/
