# **Task 1: Load a pre-trained LSTM-based NMT model and use it to translate a sentence**
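Load a Pre-trained NMT Model:
Use a library such as Hugging Face's transformers or fairseq to load a pre-trained NMT model.
Translate a given sentence from one language to another.
Note: the Helsinki-NLP MarianMT checkpoint loaded below is a Transformer encoder-decoder rather than an LSTM; it is used here as a readily available pre-trained NMT model.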

---



In [1]:
from transformers import MarianMTModel, MarianTokenizer

# Load the pre-trained model and tokenizer
model_name = 'Helsinki-NLP/opus-mt-en-fr'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Function to translate text
def translate(text, model, tokenizer):
    inputs = tokenizer.encode(text, return_tensors='pt')
    outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
    translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translated_text

# Translate a sentence
sentence = "Hello, how are you?"
translated_sentence = translate(sentence, model, tokenizer)
print(translated_sentence)


Bonjour, comment ça va?


# **Task 2: Implement beam search decoding for an NMT model**
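Beam Search Decoding:
Modify the translation function so that the beam width (num_beams) is a parameter. A wider beam keeps more candidate translations alive at each decoding step, which can improve quality at the cost of extra computation.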

In [2]:
def translate_with_beam_search(text, model, tokenizer, num_beams=5):
    inputs = tokenizer.encode(text, return_tensors='pt')
    outputs = model.generate(inputs, max_length=40, num_beams=num_beams, early_stopping=True)
    translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translated_text

# Translate a sentence using beam search
translated_sentence_beam_search = translate_with_beam_search(sentence, model, tokenizer)
print(translated_sentence_beam_search)


Bonjour, comment allez-vous?
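
To show what the num_beams argument changes, here is a minimal sketch of beam search over a hand-written toy model. TOY_MODEL and beam_search are hypothetical names invented for this illustration; the probabilities are made up, and the sketch omits refinements such as length penalties, so it is not how model.generate is implemented internally.

```python
import math

# Hypothetical toy "model": next-token probabilities given only the previous token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
    "w": {"</s>": 1.0},
}

def beam_search(model, num_beams, max_length=10):
    """Return (tokens, probability) of the best sequence found with this beam width."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_length):
        # Expand every live hypothesis by every possible next token
        candidates = []
        for score, tokens in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams highest-scoring hypotheses
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:num_beams]:
            if tokens[-1] == "</s>":
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    if not finished:
        finished = beams  # nothing reached </s> within max_length
    best_score, best_tokens = max(finished, key=lambda c: c[0])
    words = [t for t in best_tokens if t not in ("<s>", "</s>")]
    return words, math.exp(best_score)

greedy_tokens, greedy_prob = beam_search(TOY_MODEL, num_beams=1)
wide_tokens, wide_prob = beam_search(TOY_MODEL, num_beams=2)
print(greedy_tokens, round(greedy_prob, 2))
print(wide_tokens, round(wide_prob, 2))
```

With a beam width of 1 the search commits to the locally best first token "a" and ends with probability 0.33, while a width of 2 keeps "b" alive long enough to find the sequence with probability 0.36.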


# **Task 3: Translate French to Tamil with a 5-letter word constraint**
French to Tamil Translation with 5-letter Constraint:
Use a dictionary or pre-trained French to Tamil model.
Check if the French word has exactly five letters before translating.

In [3]:
french_tamil_dict = {
    'bonjour': 'வணக்கம்',  # Example entry; 7 letters, so the length check below rejects it
    'salut': 'வணக்கம்'     # 5 letters, so this entry can be translated
}

def translate_french_to_tamil(word):
    if len(word) == 5:
        if word in french_tamil_dict:
            return french_tamil_dict[word]
        else:
            return "Word not available"
    else:
        return "Only 5-letter words are translated"

# Translate a 5-letter French word
word = "salut"
translated_word = translate_french_to_tamil(word)
print(translated_word)


வணக்கம்
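
The length check above uses len(word), which counts code points, so input with stray whitespace, capital letters, or decomposed accents can be rejected unexpectedly. The following is a small sketch of one way to normalize input first; letter_count, normalize_key, and lookup_five_letter are hypothetical helper names, and the one-entry dictionary is only for illustration.

```python
import unicodedata

def normalize_key(word):
    """Trim whitespace, compose accents (NFC), and lowercase the word."""
    return unicodedata.normalize("NFC", word.strip()).lower()

def letter_count(word):
    """Count alphabetic characters after NFC normalization."""
    return sum(1 for ch in normalize_key(word) if ch.isalpha())

def lookup_five_letter(word, dictionary):
    """Translate only words with exactly five letters, using normalized keys."""
    key = normalize_key(word)
    if letter_count(key) != 5:
        return "Only 5-letter words are translated"
    return dictionary.get(key, "Word not available")

sample_dict = {"salut": "வணக்கம்"}  # illustrative entry taken from the cell above
print(lookup_five_letter("  Salut ", sample_dict))
print(letter_count("e\u0301le\u0300ve"))  # "élève" written with combining accents
```

Normalizing to NFC matters for French because an accented letter may arrive as a base letter plus a combining mark, which would otherwise count as two characters.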


# **Task 4: Error handling and suggestions for incorrect words**
Error Handling with Suggestions:
Implement error handling for incorrect words and provide suggestions.
Track wrong words and display them after two consecutive errors.

In [4]:
from difflib import get_close_matches

wrong_words = []

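def handle_errors(word, dictionary):
    if word in dictionary:
        # A successful lookup ends the current run of consecutive errors
        wrong_words.clear()
        return dictionary[word]
    else:
        # Suggest the dictionary keys closest to the unknown word
        suggestions = get_close_matches(word, dictionary.keys())
        wrong_words.append(word)
        error_message = f"Word '{word}' is not available. Suggestions: {suggestions}"
        # Report the tracked words once two or more errors occur in a row
        if len(wrong_words) >= 2:
            error_message += f"\nConsecutive wrong words: {wrong_words}"
        return error_message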

# Example dictionary
dictionary = {'hello': 'bonjour', 'goodbye': 'au revoir'}

# Handle errors
word = "helo"
translation_or_error = handle_errors(word, dictionary)
print(translation_or_error)


Word 'helo' is not available. Suggestions: ['hello']
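
One way to make "two consecutive errors" precise without a module-level list is to keep the run of misses on an object and clear it on every successful lookup. The sketch below assumes that reading of the task; SuggestingTranslator and its threshold parameter are hypothetical names, while get_close_matches and its n and cutoff arguments come from the standard difflib module.

```python
from difflib import get_close_matches

class SuggestingTranslator:
    """Dictionary lookup that suggests close matches and tracks consecutive misses."""

    def __init__(self, dictionary, threshold=2):
        self.dictionary = dictionary
        self.threshold = threshold      # misses in a row before the list is shown
        self.consecutive_misses = []

    def lookup(self, word):
        if word in self.dictionary:
            # A hit breaks the streak of wrong words
            self.consecutive_misses.clear()
            return self.dictionary[word]
        self.consecutive_misses.append(word)
        suggestions = get_close_matches(word, list(self.dictionary), n=3, cutoff=0.6)
        message = f"Word '{word}' is not available. Suggestions: {suggestions}"
        if len(self.consecutive_misses) >= self.threshold:
            message += f"\nConsecutive wrong words: {self.consecutive_misses}"
        return message

translator = SuggestingTranslator({"hello": "bonjour", "goodbye": "au revoir"})
print(translator.lookup("helo"))    # first miss: suggestion only
print(translator.lookup("godbye"))  # second miss in a row: list is appended
print(translator.lookup("hello"))   # hit: streak resets
```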


# **Task 5: Dual language translation for specific word lengths**
Translate English to French and Hindi for 10-letter words:
Use pre-trained models for English to French and English to Hindi translations.
Ensure the word length is exactly 10.

In [5]:
from transformers import MarianMTModel, MarianTokenizer

model_name_fr = 'Helsinki-NLP/opus-mt-en-fr'
model_name_hi = 'Helsinki-NLP/opus-mt-en-hi'
tokenizer_fr = MarianTokenizer.from_pretrained(model_name_fr)
model_fr = MarianMTModel.from_pretrained(model_name_fr)
tokenizer_hi = MarianTokenizer.from_pretrained(model_name_hi)
model_hi = MarianMTModel.from_pretrained(model_name_hi)

def dual_translate(word):
    if len(word) == 10:
        translated_fr = translate(word, model_fr, tokenizer_fr)
        translated_hi = translate(word, model_hi, tokenizer_hi)
        return translated_fr, translated_hi
    else:
        return "Word length is not 10 letters"

# Try an English word ("translation" has 11 letters, so the length check rejects it)
word = "translation"
translated_words = dual_translate(word)
print(translated_words)


Word length is not 10 letters
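
dual_translate returns a tuple on success and a string on failure, which callers have to tell apart by type. A hedged alternative is sketched below: the length check raises an error, and the translators are passed in as callables so the logic can be exercised without downloading models. dual_translate_checked and the stub translators are hypothetical; in the notebook the callables could wrap translate(word, model_fr, tokenizer_fr) and translate(word, model_hi, tokenizer_hi).

```python
def dual_translate_checked(word, translators, required_length=10):
    """Translate word with every callable in translators, keyed by language code.

    Raises ValueError when the word does not have exactly required_length letters.
    """
    if len(word) != required_length:
        raise ValueError(
            f"Expected a {required_length}-letter word, got {len(word)} letters: {word!r}"
        )
    return {lang: fn(word) for lang, fn in translators.items()}

# Stub "translators" used only to demonstrate the control flow
stubs = {"upper": str.upper, "reversed": lambda w: w[::-1]}

print(dual_translate_checked("dictionary", stubs))  # 10 letters: accepted
try:
    dual_translate_checked("translation", stubs)    # 11 letters: rejected
except ValueError as err:
    print(err)
```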


# **Task 6: Audio to Hindi translation with time and letter constraints**
Audio Translation to Hindi with Time Constraints:
Use a speech recognition library to convert audio to text.
Check the time and initial letter constraints before translation.

In [None]:
!pip install pyaudio==0.2.11
!pip install SpeechRecognition==3.10.0
!pip install transformers==4.31.0
!pip install sounddevice wavio

In [3]:
import speech_recognition as sr
from datetime import datetime
from transformers import MarianMTModel, MarianTokenizer

# Load the pre-trained English to Hindi translation model
model_name_hi = 'Helsinki-NLP/opus-mt-en-hi'
tokenizer_hi = MarianTokenizer.from_pretrained(model_name_hi)
model_hi = MarianMTModel.from_pretrained(model_name_hi)

def translate(text, model, tokenizer):
    inputs = tokenizer.encode(text, return_tensors='pt')
    outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
    translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translated_text

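from datetime import timedelta, timezone

# IST is UTC+05:30 all year (no daylight saving), so a fixed offset is sufficient
IST = timezone(timedelta(hours=5, minutes=30))

def is_valid_time():
    # datetime.now() without a timezone returns the machine's local time,
    # which is often not IST on hosted notebooks, so request IST explicitly
    current_hour = datetime.now(IST).hour
    return current_hour >= 18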

def get_audio_input():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)

    try:
        text = recognizer.recognize_google(audio)
        print(f"Recognized text: {text}")
        return text
    except sr.UnknownValueError:
        return None
    except sr.RequestError as e:
        print(f"Could not request results; {e}")
        return None

def translate_audio_to_hindi():
    if not is_valid_time():
        return "Please try after 6 PM IST"

    text = get_audio_input()

    if text is None:
        print("Could not understand the audio, please repeat")
        text = get_audio_input()
        if text is None:
            return "Could not understand the audio after second attempt, please try again later"

    if text[0].lower() in ['m', 'o']:
        return "Words starting with 'M' and 'O' are not translated"

    translated_text = translate(text, model_hi, tokenizer_hi)
    return translated_text

# Translate audio to Hindi
translated_audio_text = translate_audio_to_hindi()
print(translated_audio_text)

Please try after 6 PM IST
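
The 6 PM IST gate is easier to verify when the clock is passed in as an argument instead of being read inside the function. The sketch below assumes IST can be modelled as a fixed UTC+05:30 offset (India observes no daylight saving); is_after_6pm_ist is a hypothetical name, and the now argument is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset timezone for India Standard Time
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

def is_after_6pm_ist(now=None):
    """Return True when the given aware datetime falls at or after 18:00 IST."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(IST).hour >= 18

# 12:29 UTC is 17:59 IST, 12:30 UTC is 18:00 IST
print(is_after_6pm_ist(datetime(2024, 1, 1, 12, 29, tzinfo=timezone.utc)))
print(is_after_6pm_ist(datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)))
```

As in the original check, the gate closes again at midnight IST because the hour wraps back to 0.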


# **Task 7: English to Hindi translation with vowel constraints**
Translate English to Hindi, blocking words starting with vowels outside a specific time:
Implement the constraint check for words starting with vowels.
Allow translations only between 9 PM and 10 PM for words starting with vowels.

In [4]:
def is_valid_vowel_time():
    # True only during the 9 PM hour (21:00 to 21:59) in the machine's local time
    return datetime.now().hour == 21

def translate_english_to_hindi(word):
    # Words starting with a vowel are translated only between 9 PM and 10 PM
    starts_with_vowel = word[:1].lower() in ('a', 'e', 'i', 'o', 'u')
    if starts_with_vowel and not is_valid_vowel_time():
        return "This word starts with a vowel. Provide another word or try between 9 PM to 10 PM."
    return translate(word, model_hi, tokenizer_hi)

# Translate an English word to Hindi with constraints
word = "apple"
translated_word = translate_english_to_hindi(word)
print(translated_word)


This word starts with a vowel. Provide another word or try between 9 PM to 10 PM.
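
The vowel rule can also be written as a pure function of the word and a timestamp, which makes the boundary cases at 9 PM and 10 PM explicit. This is a sketch under the assumption that the window is half-open (21:00 inclusive, 22:00 exclusive); in_vowel_window and may_translate are hypothetical names.

```python
from datetime import datetime, time

WINDOW_START = time(21, 0)   # 9 PM, inclusive
WINDOW_END = time(22, 0)     # 10 PM, exclusive
VOWELS = frozenset("aeiou")

def in_vowel_window(now):
    """Return True when now falls within the 9 PM to 10 PM window."""
    return WINDOW_START <= now.time() < WINDOW_END

def may_translate(word, now):
    """Consonant-initial words are always allowed; vowel-initial words only in the window."""
    starts_with_vowel = word[:1].lower() in VOWELS
    return (not starts_with_vowel) or in_vowel_window(now)

print(may_translate("apple", datetime(2024, 1, 1, 20, 59)))   # before the window
print(may_translate("apple", datetime(2024, 1, 1, 21, 0)))    # window opens
print(may_translate("banana", datetime(2024, 1, 1, 10, 0)))   # consonant: always allowed
```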


# **Explanation**
The notes below describe the Task 6 code.

Load the Translation Model:
The MarianMT model for English to Hindi translation is loaded using transformers.

Define Translation Function:
The translate(text, model, tokenizer) function translates English text to Hindi.

Time Validation:
The is_valid_time() function checks whether the current time is 6 PM IST or later.

Get Audio Input:
The get_audio_input() function captures audio from the microphone and uses the Google Web Speech API to recognize the text.
If the audio is not understood, it returns None.

Translate Audio to Hindi:
The translate_audio_to_hindi() function handles the entire process.
It checks whether the current time is valid.
It captures audio input and tries to recognize it.
If recognition fails the first time, it prompts the user to repeat the audio once more.
If the second attempt also fails, it returns an error message.
If the recognized text starts with 'M' or 'O', it is not translated.
Otherwise, the recognized text is translated to Hindi.
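
The retry-once behaviour described above can be sketched as a small helper that takes the listening function as an argument, so it can be tried without a microphone. recognize_with_retry and the fake listener are hypothetical; in the notebook, get_audio_input would be passed as listen_once.

```python
def recognize_with_retry(listen_once, attempts=2):
    """Call listen_once() up to `attempts` times and return the first non-None text.

    Returns None when every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        text = listen_once()
        if text is not None:
            return text
        if attempt < attempts:
            print("Could not understand the audio, please repeat")
    return None

# Fake listener that fails once and then succeeds, standing in for the microphone
responses = iter([None, "good evening"])
print(recognize_with_retry(lambda: next(responses)))
```

Passing the listener in keeps the retry policy separate from the speech recognition call, so the number of attempts can be changed in one place.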
# **Running the Script**
Ensure a microphone is connected to your system and that you have the necessary permissions to access it. Run the script: it listens to your audio input, recognizes the text, and translates it to Hindi subject to the conditions above. If the current time is before 6 PM IST, it returns a message asking you to try again after 6 PM IST. Hosted runtimes such as Colab typically cannot reach a local microphone through sr.Microphone, so this cell is best run on a local machine.