**Public Information Machine Translation (EN ↔ HI/MR)**

**Installation & Setup**



In [4]:
# 1. Install required libraries
!pip install -q transformers sentencepiece torch pandas openpyxl

import torch
import pandas as pd
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

print(f"Setup Complete. GPU Available: {torch.cuda.is_available()}")

Setup Complete. GPU Available: False


**Load the AI Model**

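We load mBART-50 (`facebook/mbart-large-50-many-to-many-mmt`), a multilingual sequence-to-sequence transformer. A single encoder and a single decoder are shared across all 50 languages. The source language is set on the tokenizer (`src_lang`), and the target language is selected by forcing the decoder to begin with that language's token (`forced_bos_token_id`).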

In [5]:
# 2. Load the model and tokenizer
model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Use the GPU (CUDA) if one is available; otherwise fall back to the slower CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


MBartForConditionalGeneration(
  (model): MBartModel(
    (shared): MBartScaledWordEmbedding(250054, 1024, padding_idx=1)
    (encoder): MBartEncoder(
      (embed_tokens): MBartScaledWordEmbedding(250054, 1024, padding_idx=1)
      (embed_positions): MBartLearnedPositionalEmbedding(1026, 1024)
      (layers): ModuleList(
        (0-11): 12 x MBartEncoderLayer(
          (self_attn): MBartAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (activation_fn): ReLU()
          (fc1): Linear(in_features=1024, out_features=4096, bias=True)
          (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        

**Generate the Data & Logic**

We define a list of common public-information messages, then a function that translates one English message into a single target language. The function is called once per language to produce both the Hindi and the Marathi versions.


In [6]:
# 3a. Sample public-information messages (English source text)
data = {
    'English_Content': [
        "Please follow the safety guidelines issued by the health department.",
        "Heavy rainfall is expected in the coastal areas tomorrow.",
        "The public library will remain closed on Sunday for maintenance.",
        "Always wear a helmet while riding a motorcycle.",
        "New vaccination centers are opening in your local district."
    ]
}
df = pd.DataFrame(data)

# 3b. Translation function: one English string -> one string in the target language
def translate_info(text, target_lang_code):
    # mBART-50 needs the source language set on the tokenizer
    tokenizer.src_lang = "en_XX"
    encoded_en = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)

    with torch.no_grad():  # inference only, so no gradients are needed
        generated_tokens = model.generate(
            **encoded_en,
            # Force the decoder to start with the target-language token (e.g. "hi_IN")
            forced_bos_token_id=tokenizer.lang_code_to_id[target_lang_code]
        )
    # batch_decode returns a list; take the single translation
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
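mBART-50 identifies each language by a code such as `en_XX`, `hi_IN` or `mr_IN`, and `translate_info` expects that raw code. A small hypothetical helper (the names `MBART50_CODES` and `lang_code` are illustrative and not part of transformers) can map readable names to codes and fail with a clear message on a typo instead of a bare `KeyError`:

```python
# Hypothetical helper: map readable language names to mBART-50 language codes.
# Only the three codes used in this notebook are listed; extend as needed.
MBART50_CODES = {
    "english": "en_XX",
    "hindi": "hi_IN",
    "marathi": "mr_IN",
}


def lang_code(name: str) -> str:
    """Return the mBART-50 code for a language name, failing loudly on typos."""
    key = name.strip().lower()
    if key not in MBART50_CODES:
        supported = ", ".join(sorted(MBART50_CODES))
        raise ValueError(f"Unsupported language {name!r}; expected one of: {supported}")
    return MBART50_CODES[key]


print(lang_code("Hindi"))  # hi_IN
```

With the model loaded, a call would then read `translate_info("Always wear a helmet.", lang_code("Marathi"))`.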

**Execute Full Translation**

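This cell translates every English sentence and stores the results in two new columns, `Hindi_Translation` and `Marathi_Translation`. It uses pandas' `Series.apply`, which calls `translate_info` once per row. This is simple and readable, but every sentence triggers a separate `model.generate` call.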

In [7]:
# 4. Apply translation to both languages
print("Translating data to Hindi and Marathi... Please wait.")

# Generate Hindi Column
df['Hindi_Translation'] = df['English_Content'].apply(lambda x: translate_info(x, "hi_IN"))

# Generate Marathi Column
df['Marathi_Translation'] = df['English_Content'].apply(lambda x: translate_info(x, "mr_IN"))

print("Translation Complete.")

Translating data to Hindi and Marathi... Please wait.
Translation Complete.
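Because `apply` runs one `generate` call per sentence, larger datasets may benefit from translating several sentences per call. A minimal sketch, assuming a hypothetical batched function `translate_batch(list_of_texts, target_lang_code)` that returns one translation per input (it is not defined in this notebook):

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    texts: Sequence[str],
    target_lang_code: str,
    translate_batch: Callable[[List[str], str], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate texts batch by batch, preserving input order.

    translate_batch is a hypothetical callable: it takes a list of English
    sentences plus a target code and returns one translation per sentence.
    """
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        translated = translate_batch(batch, target_lang_code)
        if len(translated) != len(batch):
            raise RuntimeError("translate_batch must return one output per input")
        results.extend(translated)
    return results
```

With the notebook's objects, `translate_batch` could tokenize the whole list with `tokenizer(batch, return_tensors="pt", padding=True, truncation=True)`, call `model.generate` once with the same `forced_bos_token_id`, and return the full `tokenizer.batch_decode(..., skip_special_tokens=True)` list (without the `[0]` index). The column would then be filled with `df['Hindi_Translation'] = translate_in_batches(df['English_Content'].tolist(), "hi_IN", translate_batch)`. The batch size of 8 is an arbitrary default.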


**Display & Save Results**

This cell displays the final table in the notebook and saves it to an Excel file (`Public_Info_Multilingual.xlsx`), which you can download from the Files panel in Colab's left sidebar.



In [8]:
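# 5. Show and Save
from IPython.display import display  # built into Jupyter/Colab; imported explicitly for clarity

# Show full cell contents so long sentences are not truncated
pd.set_option('display.max_colwidth', None)
print("\n--- TRANSLATED PUBLIC INFO TABLE ---")
display(df)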

# Export to Excel
df.to_excel("Public_Info_Multilingual.xlsx", index=False)
print("\nFile 'Public_Info_Multilingual.xlsx' has been created in your files folder.")


--- TRANSLATED PUBLIC INFO TABLE ---


|   | English_Content | Hindi_Translation | Marathi_Translation |
|---|---|---|---|
| 0 | Please follow the safety guidelines issued by the health department. | कृपया स्वास्थ्य विभाग द्वारा जारी सुरक्षा दिशानिर्देशों का पालन करें। | कृपया आरोग ् य विभागाकडून निर ् माण केलेल ् या सुरक ् षिततेच ् या शिष ् टाचार ् यांचा पालन करा. |
| 1 | Heavy rainfall is expected in the coastal areas tomorrow. | कल तटीय क्षेत्रों में भारी वर्षा की संभावना है। | कल समुद ् रकिनाऱ ् यातील प ् रदेशात घनदाट पावसाचं अपेक ् षा आहे. |
| 2 | The public library will remain closed on Sunday for maintenance. | सार्वजनिक पुस्तकालय रखरखाव के लिए रविवार को बंद रहेगा। | सामान ् य ग ् रंथालय सेवा पूर ् ण करण ् यासाठी त ् याचं नियमबद ् ध सप ् ताण सुरु आहे. |
| 3 | Always wear a helmet while riding a motorcycle. | मोटरसाइकिल चलाते समय हमेशा हेमलेट पहनें। | always wear a helmet while riding a motorcycle. |
| 4 | New vaccination centers are opening in your local district. | आपके स्थानीय जिले में नए टीकाकरण केंद्र खुल रहे हैं। | तुमच ् या स ् थानिक भागात नवीन वैद ् यकीय कार ् यक ् रम सुरु झाले आहेत. |



File 'Public_Info_Multilingual.xlsx' has been created in your files folder.
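The Marathi column in the table above shows two visible problems: viramas (्) surrounded by spaces, as in "आरोग ् य", and row 3 returned in English instead of Marathi. A minimal post-processing sketch, assuming simple heuristics are acceptable (the function names and the 0.5 threshold are hypothetical choices, and this does not check translation quality):

```python
import re

VIRAMA = "\u094d"  # Devanagari sign virama (halant), "्"

# Heuristic (assumption): collapse a virama only when it has exactly one space
# on each side and non-space characters around it, e.g. "आरोग ् य" -> "आरोग्य".
_SPACED_VIRAMA = re.compile(r"(?<=\S) \u094d (?=\S)")


def fix_spaced_virama(text: str) -> str:
    """Remove the stray spaces around viramas seen in the Marathi output."""
    return _SPACED_VIRAMA.sub(VIRAMA, text)


def devanagari_ratio(text: str) -> float:
    """Share of letters (Unicode L* categories) that fall in the Devanagari block.

    Devanagari vowel signs and the virama are combining marks, not letters,
    so they are not counted on either side of the ratio.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    devanagari = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    return devanagari / len(letters)


def looks_untranslated(text: str, threshold: float = 0.5) -> bool:
    """Flag outputs that are mostly non-Devanagari (e.g. English copied through)."""
    return devanagari_ratio(text) < threshold


print(looks_untranslated("always wear a helmet while riding a motorcycle."))  # True
```

On the notebook's DataFrame this might be applied as `df['Marathi_Translation'] = df['Marathi_Translation'].map(fix_spaced_virama)`, while `df[df['Marathi_Translation'].map(looks_untranslated)]` would list rows for manual review (row 3 in the output above).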
