# 🚉 IndicRail-Translate: Domain-Specific NMT System
**Major Project | Gati Shakti Vishwavidyalaya**

**Objective:** Neural Machine Translation for Technical Railway & Logistics Documentation (English to Hindi/Marathi).
**Model:** Meta's M2M-100 (Many-to-Many Transformer)

In [None]:
# 1. Installing Libraries
!pip install transformers sentencepiece accelerate nltk -q

# 2. Download NLTK tokenizer data (recent NLTK releases need 'punkt_tab' for sent_tokenize)
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')

import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
from nltk.tokenize import sent_tokenize

print("✅ Setup Complete: All libraries and resources loaded.")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


✅ Setup Complete: All libraries and resources loaded.


In [None]:
class IndicRailTranslator:
    def __init__(self):
        self.model_name = "facebook/m2m100_418M"
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

        print(f"🚀 Loading model on {self.device}...")
        self.tokenizer = M2M100Tokenizer.from_pretrained(self.model_name)
        self.model = M2M100ForConditionalGeneration.from_pretrained(self.model_name).to(self.device)
        print("✅ Neural Engine Loaded Successfully")

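    def translate_process(self, text, target_lang):
        """Translate English text sentence by sentence into Hindi or Marathi."""
        if not text.strip():
            return "Error: Input is empty."

        # Map UI language names to M2M-100 language codes; reject anything else
        # instead of silently falling back to Marathi.
        lang_codes = {"Hindi": "hi", "Marathi": "mr"}
        if target_lang not in lang_codes:
            raise ValueError(f"Unsupported target language: {target_lang}")
        target_lang_id = self.tokenizer.get_lang_id(lang_codes[target_lang])

        # The source language is always English, so set it once.
        self.tokenizer.src_lang = "en"
        results = []

        # Translate one sentence at a time to keep each model input short.
        for sent in sent_tokenize(text):
            inputs = self.tokenizer(sent, return_tensors="pt").to(self.device)
            output_tokens = self.model.generate(
                **inputs,
                forced_bos_token_id=target_lang_id,
                num_beams=5,
                early_stopping=True
            )
            decoded = self.tokenizer.batch_decode(output_tokens, skip_special_tokens=True)[0]
            results.append(decoded)
        return " ".join(results)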

# Initialize the translation engine (loads the tokenizer and model once)
engine = IndicRailTranslator()

🚀 Loading model on cuda...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


✅ Neural Engine Loaded Successfully
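
`translate_process` calls the model once per sentence. For longer documents, one possible refinement is to group sentences into small batches so the model is called once per batch. The sketch below shows only the batching logic and is independent of the model: `batched`, `translate_in_batches`, and `fake_translate_batch` are hypothetical names that do not appear in this notebook, and the default batch size of 8 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> str:
    """Translate sentences batch by batch and join the results with spaces."""
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        translated = translate_batch(batch)
        # Guard against a callable that drops or duplicates sentences.
        if len(translated) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        results.extend(translated)
    return " ".join(results)


def fake_translate_batch(batch: List[str]) -> List[str]:
    """Stand-in for the model call, used only to make the sketch runnable."""
    return [sentence.upper() for sentence in batch]


print(translate_in_batches(["a b.", "c d.", "e f."], fake_translate_batch, batch_size=2))
```

With the real engine, `translate_batch` would presumably tokenize the whole list with `padding=True` and pass it to a single `generate` call; that wiring is an assumption and has not been run against this notebook.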


In [None]:
# @title 🚉 INDICRAIL-TRANSLATE v1.0 { display-mode: "form" }
# @markdown Enter your technical text below and click the 'Play' button.

English_Text = "Railway signaling systems must be checked every month for passenger safety." # @param {type:"string"}
Target_Language = "Hindi" # @param ["Hindi", "Marathi"]

if English_Text:
    print("⏳ Processing Technical Translation...")
    # Call the translate_process method of the engine created above
    output = engine.translate_process(English_Text, Target_Language)

    print("\n" + "="*70)
    print(f"📂 SOURCE (EN): {English_Text}")
    print("-" * 70)
    print(f"🎯 TRANSLATED ({Target_Language}): {output}")
    print("="*70)

⏳ Processing Technical Translation...

📂 SOURCE (EN): Railway signaling systems must be checked every month for passenger safety.
----------------------------------------------------------------------
🎯 TRANSLATED (Hindi): यात्रियों की सुरक्षा के लिए रेलवे सिग्नल सिस्टम को हर महीने जांच की जानी चाहिए।


In [None]:
# Smoke-test the translation engine before building the UI
test_text = "The railway station is under renovation."
try:
    hindi_res = engine.translate_process(test_text, "Hindi")
    print(f"✅ Engine is Working!\nInput: {test_text}\nOutput: {hindi_res}")
except Exception as e:
    print(f"❌ Engine Error: {e}")

✅ Engine is Working!
Input: The railway station is under renovation.
Output: रेलवे स्टेशन की मरम्मत की जा रही है।
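
The smoke test only confirms that no exception was raised. A slightly stronger check is to confirm that the output is actually written in Devanagari, the script shared by Hindi and Marathi. The sketch below is one way to do that; `devanagari_ratio`, `looks_like_devanagari`, and the 0.8 threshold are assumptions introduced for illustration, not part of the notebook. The threshold is below 1.0 because technical text may legitimately keep digits or Latin acronyms.

```python
def devanagari_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in the Devanagari block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    # U+0900 to U+097F is the Unicode Devanagari block (letters, signs, danda).
    in_block = sum(1 for ch in chars if "\u0900" <= ch <= "\u097F")
    return in_block / len(chars)


def looks_like_devanagari(text: str, threshold: float = 0.8) -> bool:
    """Heuristic check that most of the text is written in Devanagari."""
    return devanagari_ratio(text) >= threshold


sample_hindi = "रेलवे स्टेशन की मरम्मत की जा रही है।"
sample_english = "The railway station is under renovation."

print(looks_like_devanagari(sample_hindi))
print(looks_like_devanagari(sample_english))
```

This is a script check only: it cannot tell Hindi from Marathi and says nothing about translation quality.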


In [None]:
# @title 🚉 IndicRail-Translate Dashboard
# @markdown Paste your English text and select the language. Then click the play button.

English_Text = "The Indian Railways is modernizing its infrastructure with high-speed corridors." # @param {type:"string"}
Target_Language = "Hindi" # @param ["Hindi", "Marathi"]

# 1. Download required NLTK resources
import nltk
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

# 2. Run Translation
try:
    print("🚀 Translating... Please wait.")
    output = engine.translate_process(English_Text, Target_Language)

    print("\n" + "="*50)
    print(f"SOURCE (English): {English_Text}")
    print(f"TRANSLATED ({Target_Language}): {output}")
    print("="*50)
except Exception as e:
    print(f"❌ Error: {e}")

🚀 Translating... Please wait.

SOURCE (English): The Indian Railways is modernizing its infrastructure with high-speed corridors.
TRANSLATED (Hindi): भारतीय रेलवे अपनी बुनियादी ढांचे को उच्च गति वाले गलियारे के साथ आधुनिकीकरण कर रहा है।


In [None]:
!pip uninstall gradio -y
!pip install gradio==3.50.2 --no-cache-dir

Found existing installation: gradio 4.44.1
Uninstalling gradio-4.44.1:
  Successfully uninstalled gradio-4.44.1
Collecting gradio==3.50.2
  Downloading gradio-3.50.2-py3-none-any.whl.metadata (17 kB)
Collecting gradio-client==0.6.1 (from gradio==3.50.2)
  Downloading gradio_client-0.6.1-py3-none-any.whl.metadata (7.1 kB)
Downloading gradio-3.50.2-py3-none-any.whl (20.3 MB)
Downloading gradio_client-0.6.1-py3-none-any.whl (299 kB)
Installing collected packages: gradio-client, gradio
  Attempting uninstall: gradio-client
    Found existing installation: gradio_client 1.3.0
    Uninstal
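
The cell above uninstalls and reinstalls Gradio every time it runs. A lighter alternative is to check the installed version first and reinstall only when it differs from the pinned one. The sketch below uses the standard-library `importlib.metadata`; `installed_version` and `needs_install` are hypothetical helper names, and the pin `3.50.2` is taken from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_install(package: str, wanted: str) -> bool:
    """True when the package is missing or installed at a different version."""
    return installed_version(package) != wanted


if needs_install("gradio", "3.50.2"):
    print("gradio 3.50.2 is not installed; run: pip install gradio==3.50.2")
else:
    print("gradio 3.50.2 is already installed")
```

In Colab, a package that was already imported keeps its old version in memory, so a runtime restart may still be needed after changing the pin.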

In [None]:
import gradio as gr

# 1. Wrapper function that connects the UI to the engine
def translate_wrapper(text, language):
    if not text.strip():
        return "Please enter some text."
    # 'engine' is the IndicRailTranslator instance created earlier
    return engine.translate_process(text, language)

# 2. Interface design (classic two-column layout with the Soft theme)
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🚉 IndicTranslate AI: GSV Major Project")
    gr.Markdown("Translate technical railway documents from English to Hindi/Marathi.")

    with gr.Row():
        with gr.Column():
            input_box = gr.Textbox(
                label="English Article",
                placeholder="Paste technical text here...",
                lines=10
            )
            lang_choice = gr.Radio(["Hindi", "Marathi"], label="Select Language", value="Hindi")
            btn = gr.Button("Translate ✨", variant="primary")

        with gr.Column():
            output_box = gr.Textbox(label="Translated Output", lines=12, interactive=False)

    # Wire the button to the wrapper function
    btn.click(fn=translate_wrapper, inputs=[input_box, lang_choice], outputs=output_box)

# 3. Launch with a public link
# share=True generates a public URL that can be shared with others
demo.launch(share=True, debug=True)

IMPORTANT: You are using gradio version 3.50.2, however version 4.44.1 is available, please upgrade.
--------
Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://21404fa7245338eaa4.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
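
Because the app is reachable through a public link, it may be worth making the wrapper more defensive, so that a model failure appears as a readable message in the output box rather than as a raw error. The sketch below is one way to do that. `safe_translate`, `SUPPORTED_LANGUAGES`, and the two stub functions are hypothetical names introduced here; the translation function is passed in as a parameter so the example runs without the model.

```python
from typing import Callable

SUPPORTED_LANGUAGES = ("Hindi", "Marathi")


def safe_translate(translate_fn: Callable[[str, str], str], text: str, language: str) -> str:
    """Validate the inputs, then call translate_fn and report failures as text."""
    if not text or not text.strip():
        return "Please enter some text."
    if language not in SUPPORTED_LANGUAGES:
        return f"Unsupported language: {language}"
    try:
        return translate_fn(text, language)
    except Exception as exc:
        # Surface the failure in the output box instead of raising.
        return f"Translation failed: {exc}"


def ok_stub(text: str, language: str) -> str:
    """Stand-in for a working engine."""
    return f"[{language}] {text}"


def broken_stub(text: str, language: str) -> str:
    """Stand-in for an engine that fails at runtime."""
    raise RuntimeError("model not loaded")


print(safe_translate(ok_stub, "Track inspection is due.", "Hindi"))
print(safe_translate(broken_stub, "Track inspection is due.", "Marathi"))
```

In this notebook the button callback could then be something like `lambda text, language: safe_translate(engine.translate_process, text, language)`; that wiring is an assumption and has not been run here.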
