**Step-1 Install the libraries required for translation**

1.   transformers
2.   sentencepiece

In [8]:
!pip install transformers -U -q

In [9]:
!pip install sentencepiece
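After installing, it can help to confirm that both packages are visible to the running kernel. The cell below is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration and is not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the status of the two libraries this notebook depends on.
for name in ("transformers", "sentencepiece"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", rerun the matching `pip install` cell and restart the kernel before continuing.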



**Step-2 Pre-process the Sample dataset**

> Before text can be passed to the model, it must be tokenized with the pretrained tokenizer that matches the model checkpoint.

**Classes used:**
1.   MBartForConditionalGeneration (the translation model)
2.   MBart50TokenizerFast (the tokenizer)


In [10]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

In [11]:
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")


**Step-3 Load the tokenizer**

In [12]:
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang = "en_XX")


**Step-4 Enter Your Sample Text**

In [13]:
input_text = ['Definitely share your feedback in the comment section.']

In [20]:
input_text = ["So even if it's a big video, I will clearly mention all the products."]

In [25]:
input_text = ["I was waiting for my bag."]

**Define Model Input**


In [26]:
model_input = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
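With a single input sentence, `padding=True` changes nothing, but it matters once several sentences of different lengths are tokenized together. The following is a toy sketch of the idea in plain Python, not the tokenizer's actual implementation: the token ids are invented, and `pad_id=1` and right-side padding are assumptions made for illustration.

```python
def pad_batch(token_id_lists, pad_id=1):
    """Right-pad every sequence to the longest one and build attention masks.

    pad_id=1 is an assumed padding id used only for this illustration.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    input_ids, attention_mask = [], []
    for ids in token_id_lists:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up sequences of different lengths.
batch = pad_batch([[5, 6, 7, 2], [8, 2]])
print(batch["input_ids"])       # [[5, 6, 7, 2], [8, 2, 1, 1]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

The attention mask is what lets the model tell real tokens apart from padding, which is why the tokenizer returns it alongside `input_ids`.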

**Step-5 Translate The Sample Text**


> Translation:  English -> Hindi

In [27]:
generated_tokens = model.generate(
    **model_input,
    forced_bos_token_id = tokenizer.lang_code_to_id["hi_IN"]
)


In [28]:
translate = tokenizer.batch_decode(generated_tokens, skip_special_tokens = True)

Sample translations (one output per input sentence from Step-4)

In [19]:
translate

['निश्चित रूप से अपनी राय टिप्पणी सेक्शन में साझा करें।']

In [24]:
translate

['तो अगर यह एक बड़ा वीडियो है, मैं स्पष्ट रूप से सभी उत्पादों का उल्लेख करेंगे।']

In [29]:
translate

['मैं अपने बैग की प्रतीक्षा कर रहा था।']
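The tokenize, generate, and decode cells were rerun once per input sentence. As a rough sketch, the three steps could be wrapped in one function. The name `translate_batch` is hypothetical, and looking up the language id with `convert_tokens_to_ids` assumes that codes such as `hi_IN` are ordinary entries in the tokenizer vocabulary; the notebook itself uses `tokenizer.lang_code_to_id`.

```python
def translate_batch(texts, model, tokenizer, target_lang="hi_IN"):
    """Tokenize `texts`, generate translations, and decode them to strings.

    `model` and `tokenizer` are expected to be the objects loaded in the
    earlier cells; `target_lang` is an mBART-50 language code.
    """
    model_input = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True
    )
    # Assumption: the language code is a vocabulary token, so its id can be
    # looked up like any other token.
    target_id = tokenizer.convert_tokens_to_ids(target_lang)
    generated_tokens = model.generate(
        **model_input, forced_bos_token_id=target_id
    )
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)


# Usage, assuming `model` and `tokenizer` exist as in the cells above:
# translate_batch(["I was waiting for my bag."], model, tokenizer)
```

Passing all sentences in one list also avoids overwriting `input_text` between runs.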

**Step-6 Evaluation**


1.   ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
2.   BLEU (Bilingual Evaluation Understudy)


In [30]:
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction

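# NOTE: despite their names, both helpers below call sentence_bleu, so they
# return smoothed BLEU-style precision scores rather than true ROUGE values.
def rouge_n(hypothesis, reference, n):
    # Uniform weights over the 1..n-gram precisions.
    smoothie = SmoothingFunction().method4
    return sentence_bleu([reference], hypothesis, smoothing_function=smoothie, weights=[1/n] * n)

def rouge_l(hypothesis, reference):
    # Default weights give 4-gram BLEU, not a longest-common-subsequence score.
    smoothie = SmoothingFunction().method4
    return sentence_bleu([reference], hypothesis, smoothing_function=smoothie)

# Example usage:
# The inputs are plain strings (not token lists), so n-grams are built over
# characters. The hypothesis is the English source and the reference is the
# Hindi output, so the only characters the two strings share are spaces.
hypothesis = "Definitely share your feedback in the comment section."
reference = "निश्चित रूप से अपनी राय टिप्पणी सेक्शन में साझा करें।"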
rouge1_score = rouge_n(hypothesis, reference, 1)
rouge2_score = rouge_n(hypothesis, reference, 2)
rougeL_score = rouge_l(hypothesis, reference)

print("ROUGE-1:", rouge1_score)
print("ROUGE-2:", rouge2_score)
print("ROUGE-L:", rougeL_score)


ROUGE-1: 0.12962962962962962
ROUGE-2: 0.03123527651806008
ROUGE-L: 0.009248854568215463
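The helpers in the cell above call `sentence_bleu`, so the printed values are BLEU-style scores. For comparison, here is a minimal pure-Python sketch of ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) over whitespace tokens. The function names are invented for this sketch, and the Hindi reference sentence is a hypothetical human translation written only for illustration; it is not from the notebook's data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f1(precision, recall):
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def rouge_n_scores(hyp_tokens, ref_tokens, n):
    """Return (precision, recall, F1) of clipped n-gram overlap."""
    hyp = Counter(ngrams(hyp_tokens, n))
    ref = Counter(ngrams(ref_tokens, n))
    overlap = sum((hyp & ref).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall, f1(precision, recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_scores(hyp_tokens, ref_tokens):
    """Return (precision, recall, F1) based on the LCS length."""
    lcs = lcs_length(hyp_tokens, ref_tokens)
    precision = lcs / max(len(hyp_tokens), 1)
    recall = lcs / max(len(ref_tokens), 1)
    return precision, recall, f1(precision, recall)


# Hypothesis: the model output shown earlier in the notebook.
hypothesis = "मैं अपने बैग की प्रतीक्षा कर रहा था।".split()
# Reference: a HYPOTHETICAL human translation, used only for this example.
reference = "मैं अपने बैग का इंतज़ार कर रहा था।".split()

print("ROUGE-1:", rouge_n_scores(hypothesis, reference, 1))
print("ROUGE-2:", rouge_n_scores(hypothesis, reference, 2))
print("ROUGE-L:", rouge_l_scores(hypothesis, reference))
```

Both inputs are in the target language here. Comparing the English source with the Hindi output, as in the cell above, measures almost nothing because the two share no words.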


In [31]:
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction

def calculate_bleu(hypothesis, reference):
    smoothie = SmoothingFunction().method4
    return sentence_bleu([reference], hypothesis, smoothing_function=smoothie)

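# Example usage:
# Here the hypothesis is the English source and the reference is the Hindi
# output. The two share no word tokens, so sentence_bleu returns 0.
hypothesis = "Definitely share your feedback in the comment section."
reference = "निश्चित रूप से अपनी राय टिप्पणी सेक्शन में साझा करें।"
bleu_score = calculate_bleu(hypothesis.split(), reference.split())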

print("BLEU Score:", bleu_score)


BLEU Score: 0
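A score of 0 is what BLEU gives when the hypothesis and reference share no words, which is the case when an English sentence is compared with a Hindi one. To make the mechanics visible, below is a simplified, unsmoothed sentence-level BLEU in plain Python (clipped n-gram precision, geometric mean, brevity penalty). It is a sketch for intuition, with invented function names and toy tokens, and it is not a replacement for NLTK's `sentence_bleu`.

```python
import math
from collections import Counter


def modified_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    clipped = sum((hyp & ref).values())
    total = sum(hyp.values())
    return clipped / total if total else 0.0


def simple_bleu(hyp_tokens, ref_tokens, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n-gram precisions."""
    precisions = [modified_precision(hyp_tokens, ref_tokens, n)
                  for n in range(1, max_n + 1)]
    # Without smoothing, a single zero precision makes the whole score zero.
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    c, r = len(hyp_tokens), len(ref_tokens)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Toy tokens: 3 of 4 unigrams and 1 of 3 bigrams match the reference.
print(simple_bleu("a b c d".split(), "a b x d".split(), max_n=2))
# No shared tokens at all, as with English vs. Hindi text.
print(simple_bleu("x y".split(), "a b".split()))
```

For a meaningful BLEU score, the hypothesis should be the model's Hindi output and the reference a human Hindi translation of the same sentence.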
