**Language Translation using mBART**

For the language translation app, we use mBART (Multilingual BART).

**mBART** is a sequence-to-sequence model designed to work with multiple languages. The checkpoint used here, mBART-50, covers 50 languages and has been fine-tuned for many-to-many translation, so a single model can translate directly between any pair of them. That makes it a good fit for a translation system that must support multiple languages.

Import the necessary libraries:

In [14]:
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

- **MBartForConditionalGeneration**: The mBART model class with a language-modeling head. It is used for tasks that generate an output sequence from an input sequence, such as machine translation.

- **MBart50TokenizerFast**: The tokenizer class that converts input text into token IDs that can be fed to MBartForConditionalGeneration. Tokenization splits text into smaller units (tokens) that the model can process; MBart50TokenizerFast is the fast implementation of the mBART-50 tokenizer, backed by the Hugging Face tokenizers library.

In [20]:
# Choose the pretrained checkpoint and load its tokenizer
model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

In [21]:
# Load the pretrained model weights for the same checkpoint
model = MBartForConditionalGeneration.from_pretrained(model_name)
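To make the tokenization step described above more concrete, here is a minimal sketch that lists the tokens and IDs the tokenizer produces for a sentence. The helper name `inspect_tokens` is hypothetical (it is not part of transformers); it only relies on calling the tokenizer and on `convert_ids_to_tokens`. With the mBART-50 tokenizer, the first pair is expected to be the source language code and the last the end-of-sequence token.

```python
def inspect_tokens(text, tokenizer, source_lang="en_XX"):
    """Return (token, id) pairs showing how `tokenizer` encodes `text`.

    `inspect_tokens` is a hypothetical helper for illustration only.
    """
    # Tell the tokenizer which language the input is written in
    tokenizer.src_lang = source_lang
    # Calling the tokenizer returns a dict-like object with "input_ids"
    ids = tokenizer(text)["input_ids"]
    # Map each id back to its string token so the split is visible
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(zip(tokens, ids))
```

In the notebook this could be called as `inspect_tokens("Hello, how are you?", tokenizer)` once the tokenizer cell above has run.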

In [22]:
def translate(text, source_lang, target_lang):
    """
    Translate the given text from source language to target language using mBART.
    """
    tokenizer.src_lang = source_lang  # Set source language (e.g., 'en_XX' for English)
    model_inputs = tokenizer(text, return_tensors="pt")

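    # Generate the translation; forced_bos_token_id makes the target language
    # code the first generated token, which tells mBART which language to produce
    translated = model.generate(**model_inputs, forced_bos_token_id=tokenizer.lang_code_to_id[target_lang])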

    # Decode the translated text
    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)

    return translated_text
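Since `translate` expects mBART language codes such as `en_XX` rather than plain language names, a small lookup can make the function easier to call from an app. The sketch below is one possible approach, not part of the original notebook: the table `LANGUAGE_CODES` and the helper `resolve_lang_code` are hypothetical names, and the table covers only a handful of the 50 supported languages.

```python
# Hypothetical lookup table for illustration; the codes are mBART-50
# language codes, but only a few of the 50 languages are listed here.
LANGUAGE_CODES = {
    "english": "en_XX",
    "hindi": "hi_IN",
    "marathi": "mr_IN",
    "french": "fr_XX",
    "german": "de_DE",
}


def resolve_lang_code(language):
    """Map a language name (or an already valid code) to an mBART code."""
    key = language.strip().lower()
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    # Accept codes that are passed through unchanged, e.g. "en_XX"
    if language in LANGUAGE_CODES.values():
        return language
    raise ValueError(f"Unsupported language: {language!r}")
```

With this in place, a call might look like `translate("Hello, how are you?", resolve_lang_code("English"), resolve_lang_code("Marathi"))`. Failing early with a `ValueError` gives a clearer message than the `KeyError` that an unknown code would otherwise raise inside `translate`.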

In [23]:
# Example usage
translated_text = translate("Hello, how are you?", "en_XX", "mr_IN")
print(translated_text)

नमस्ते, आपण कसे आहात?


In [24]:
# Example usage for Hindi to English
translated_text_hindi = translate("नमस्कार, आप कैसे हैं?", "hi_IN", "en_XX")
print(translated_text_hindi)

Hello, how are you?
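The examples above translate one sentence per call. When several sentences share the same language pair, they can be sent through the model together. The following is a minimal sketch under a few assumptions: `translate_batch` is a hypothetical helper, the tokenizer and model are passed in explicitly, the target language id is looked up with `convert_tokens_to_ids` (which should give the same id as the `lang_code_to_id` lookup used in `translate`), and the `max_new_tokens` limit of 128 is an arbitrary choice.

```python
def translate_batch(texts, source_lang, target_lang, tokenizer, model,
                    max_new_tokens=128):
    """Translate a list of sentences that share one language pair.

    Hypothetical helper for illustration; it mirrors `translate` above.
    """
    tokenizer.src_lang = source_lang
    # padding=True pads shorter sentences so the batch forms one tensor;
    # truncation=True cuts inputs that exceed the model's maximum length
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **inputs,
        # Force the target language code as the first generated token
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
        max_new_tokens=max_new_tokens,
    )
    # Decode every sequence in the batch back to plain text
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A call such as `translate_batch(["Hello, how are you?", "Good morning."], "en_XX", "hi_IN", tokenizer, model)` would return a list with one translation per input sentence.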


# Save the tokenizer using pickle and the model weights using torch.save

In [25]:
import pickle

# Serialize the tokenizer object to disk
with open("tokenizer.pkl", "wb") as tokenizer_file:
    pickle.dump(tokenizer, tokenizer_file)

In [26]:
# Save only the model weights (the state_dict), the standard way to save PyTorch models
torch.save(model.state_dict(), "model_weights.pth")
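Because only the weights are saved, loading them later requires rebuilding the model architecture first. The sketch below shows one way this might be done, assuming the file names used above; `load_tokenizer` and `load_model` are hypothetical helper names. Note that unpickling executes code from the file, so `tokenizer.pkl` should only be loaded from a trusted source.

```python
import pickle


def load_tokenizer(path="tokenizer.pkl"):
    """Restore the pickled tokenizer (only unpickle files you trust)."""
    with open(path, "rb") as tokenizer_file:
        return pickle.load(tokenizer_file)


def load_model(weights_path="model_weights.pth",
               model_name="facebook/mbart-large-50-many-to-many-mmt"):
    """Rebuild the mBART architecture and load the saved weights into it."""
    # Imported here so load_tokenizer can be used without torch installed
    import torch
    from transformers import MBartConfig, MBartForConditionalGeneration

    # Build an untrained model with the same architecture as the checkpoint
    config = MBartConfig.from_pretrained(model_name)
    model = MBartForConditionalGeneration(config)
    # Fill it with the weights written by torch.save(model.state_dict(), ...)
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    # A freshly constructed model starts in training mode; switch to inference
    model.eval()
    return model
```

As an alternative, transformers models and tokenizers also provide `save_pretrained(directory)` and `from_pretrained(directory)`, which store the configuration alongside the weights and avoid pickle altogether.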