In [19]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
from IPython.display import Markdown

In [2]:
# Model and Tokenizer
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

config.json: 100%|██████████| 908/908 [00:00<00:00, 2.27MB/s]
pytorch_model.bin: 100%|██████████| 1.94G/1.94G [02:43<00:00, 11.8MB/s]
generation_config.json: 100%|██████████| 233/233 [00:00<00:00, 1.03MB/s]
tokenizer_config.json: 100%|██████████| 298/298 [00:00<00:00, 1.11MB/s]
vocab.json: 100%|██████████| 3.71M/3.71M [00:01<00:00, 2.89MB/s]
sentencepiece.bpe.model: 100%|██████████| 2.42M/2.42M [00:00<00:00, 15.8MB/s]
special_tokens_map.json: 100%|██████████| 1.14k/1.14k [00:00<00:00, 8.04MB/s]


In [14]:
import pandas as pd

# Original sentences
original_en = [
    "A picture is worth a thousand words.",
    "The pen is mightier than the sword.",
    "You can't judge a book by its cover.",
    "Two wrongs don't make a right.",
    "The grass is always greener on the other side.",
    "The best way to predict the future is to invent it.",
    "It's not a bug, it's a feature.",
    "Any sufficiently advanced technology is indistinguishable from magic.",
    "Technology is a useful servant but a dangerous master.",
    "The advance of technology is based on making it fit in so that you don't really even notice it, so it's part of everyday life.",
]

# Translate to Hindi
original_encoded = tokenizer(original_en, return_tensors="pt", padding=True)
generated_tokens = model.generate(
    **original_encoded, forced_bos_token_id=tokenizer.get_lang_id("hi")
)
target_hi = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

# Translate back to English
target_encoding = tokenizer(target_hi, return_tensors="pt", padding=True)
generated_tokens = model.generate(
    **target_encoding, forced_bos_token_id=tokenizer.get_lang_id("en")
)
back_translated_en = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

df = pd.DataFrame(
    {
        "Original (English)": original_en,
        "Translated (Hindi)": target_hi,
        "Back-translated (English)": back_translated_en,
    }
)

In [20]:
Markdown(df.to_markdown(index=False))

| Original (English)                                                                                                             | Translated (Hindi)                                                                     | Back-translated (English)                                                                                             |
|:-------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------|
| A picture is worth a thousand words.                                                                                           | एक तस्वीर एक हजार शब्दों के लायक है।                                                            | A picture is worth a thousand words.                                                                                  |
| The pen is mightier than the sword.                                                                                            | पेंसिल तलवार की तुलना में मजबूत है।                                                               | The pencil is stronger than the sword.                                                                                |
| You can't judge a book by its cover.                                                                                           | आप एक किताब को उसके कवर से न्याय नहीं कर सकते।                                                   | You cannot judge a book by its cover.                                                                                 |
| Two wrongs don't make a right.                                                                                                 | दो गलतियां एक सही नहीं बनाती हैं।                                                                 | Two mistakes do not make one right.                                                                                   |
| The grass is always greener on the other side.                                                                                 | घास हमेशा दूसरी तरफ ग्रीन होता है।                                                               | The grass is always green on the other side.                                                                          |
| The best way to predict the future is to invent it.                                                                            | भविष्य की भविष्यवाणी करने का सबसे अच्छा तरीका यह है कि इसे खोजें।                                          | The best way to predict the future is to find it.                                                                     |
| It's not a bug, it's a feature.                                                                                                | यह एक बग नहीं है, यह एक विशेषता है।                                                           | This is not a bug, it is a feature.                                                                                   |
| Any sufficiently advanced technology is indistinguishable from magic.                                                          | किसी भी पर्याप्त रूप से उन्नत प्रौद्योगिकी जादू से अलग नहीं है।                                             | No sufficiently advanced technology is different from magic.                                                          |
| Technology is a useful servant but a dangerous master.                                                                         | प्रौद्योगिकी एक उपयोगी दास है, लेकिन एक खतरनाक मास्टर है।                                              | Technology is a useful slave, but a dangerous master.                                                                 |
| The advance of technology is based on making it fit in so that you don't really even notice it, so it's part of everyday life. | प्रौद्योगिकी की प्रगति इसे फिट करने पर आधारित है ताकि आप वास्तव में इसे भी नोटिस न करें, इसलिए यह दैनिक जीवन का हिस्सा है। | The progress of technology is based on fiting it so that you really don’t even notice it, so it’s part of daily life. |