## Task 1: Building a Question Answering and Translation System

Hey everyone! For this project, I built a system that answers questions in English and then translates those answers into Hindi. I used pre-trained models from the `transformers` library by Hugging Face.

Let's break down how we're doing this, step by step!

First, we need to import the necessary tools from the `transformers` library. We'll need `pipeline` for easily using pre-trained models, and `AutoTokenizer` and `AutoModelForSeq2SeqLM` for our translation part. `AutoModelForQuestionAnswering` is imported too, although `pipeline` ends up loading the question answering model for us.

In [20]:
# Import required libraries
from transformers import pipeline
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from transformers import AutoModelForSeq2SeqLM

Next, we're setting up our Question Answering (QA) model. We're using a pre-trained model called "deepset/roberta-base-squad2", which is good at answering questions based on a given text. It was trained on the SQuAD 2.0 dataset, which includes unanswerable questions, and it achieved high Exact Match and F1 scores on that dataset.

The `pipeline()` function is helpful here because it takes care of loading both the model and its tokenizer (which helps the model understand the text) in one go. This makes things much simpler!
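
For comparison, here is a rough sketch of what loading the two pieces by hand might look like, using the `AutoTokenizer` and `AutoModelForQuestionAnswering` classes imported earlier. The helper name `build_qa_pipeline` is my own and not part of `transformers`. The import sits inside the function only so the definition can be read and run without triggering a model download.

```python
def build_qa_pipeline(model_name):
    """Load tokenizer and model explicitly, then wrap them in a QA pipeline.

    Hypothetical helper for illustration; not part of transformers.
    """
    # Imported lazily so that defining the helper does not need the library.
    from transformers import (
        AutoModelForQuestionAnswering,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # pipeline() also accepts already-loaded objects instead of a model name.
    return pipeline("question-answering", model=model, tokenizer=tokenizer)
```

Calling `build_qa_pipeline("deepset/roberta-base-squad2")` should give an object that behaves like the one built with the one-line `pipeline()` call.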

In [21]:
# Load the Question Answering (QA) model

# We specify the task 'question-answering'
# and the model name we chose.
qa_model_name = "deepset/roberta-base-squad2"

# The pipeline() function handles loading the model and tokenizer for us.
# This is much easier than loading them separately.
qa_pipeline = pipeline('question-answering', model=qa_model_name, tokenizer=qa_model_name)

Device set to use cuda:0
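
Since the model was trained on SQuAD 2.0, it can in principle signal that a question has no answer in the context. Below is a minimal sketch of one way to use that, assuming we treat an empty answer or a low confidence score as "no answer". The helper `answer_or_none` and the threshold `min_score=0.3` are my own illustrative choices; `qa_fn` stands for any callable that is called like the QA pipeline.

```python
def answer_or_none(qa_fn, question, context, min_score=0.3):
    """Return the answer text, or None when the model looks unsure.

    qa_fn is any callable with the QA pipeline's calling convention.
    min_score=0.3 is an arbitrary illustrative threshold, not a
    recommended value.
    """
    # handle_impossible_answer=True lets the pipeline return an empty
    # answer when "no answer" scores best.
    result = qa_fn(
        question=question,
        context=context,
        handle_impossible_answer=True,
    )
    answer = result["answer"].strip()
    if not answer or result["score"] < min_score:
        return None
    return answer
```

With the objects from this notebook, the call would look like `answer_or_none(qa_pipeline, "Who is Gary?", some_context)`, where `some_context` is whatever text you want to search.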


Now, we need to load our Translation model. We want to translate from English to Hindi, so we're using a model from the Helsinki-NLP group that is trained specifically for this, called "Helsinki-NLP/opus-mt-en-hi". I also used it in Part B of this Task.

Unlike the QA model where we used `pipeline`, here we're loading the tokenizer and the model separately using `AutoTokenizer.from_pretrained()` and `AutoModelForSeq2SeqLM.from_pretrained()`.

In [22]:
# Load the Translation (English to Hindi) model

# We need an English-to-Hindi model.
# The Helsinki-NLP opus-mt models are great for this.
trans_model_name = "Helsinki-NLP/opus-mt-en-hi"

# Load the tokenizer and model separately, like we did in Part B
trans_tokenizer = AutoTokenizer.from_pretrained(trans_model_name)
trans_model = AutoModelForSeq2SeqLM.from_pretrained(trans_model_name)



This is where we put it all together! We've created a Python function called `get_hindi_answer`.

This function takes an English question and some text (the context) as input.

Inside the function:
1. It first uses our QA pipeline (`qa_pipeline`) to find the answer to the English question within the given context.
2. Then, it takes the English answer it found and uses the translation model (`trans_model`) and its tokenizer (`trans_tokenizer`) to translate it into Hindi.
3. Along the way, it prints the original question, the first 50 characters of the context, the English answer, and the translated Hindi answer. It then returns the Hindi answer.
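
Step 1 depends on the shape of what the QA pipeline returns: a dictionary with the `answer` text, a confidence `score`, and `start`/`end` character offsets into the context. The sketch below uses a hand-written dictionary in that shape (the score is made up, not real model output) to show how the offsets map back to the context. `mark_answer` is a hypothetical helper, not a library function.

```python
def mark_answer(context, qa_result):
    """Wrap the answer span in [[ ]] using the start/end character offsets."""
    start, end = qa_result["start"], qa_result["end"]
    return context[:start] + "[[" + context[start:end] + "]]" + context[end:]


context = "However, Leo had a secret: he was afraid of the dark."

# Hand-written example in the pipeline's output format; the score is invented.
start = context.index("the dark")
example_result = {
    "answer": "the dark",
    "score": 0.5,
    "start": start,
    "end": start + len("the dark"),
}

print(mark_answer(context, example_result))
# prints: However, Leo had a secret: he was afraid of [[the dark]].
```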

In [28]:
# Create a function to combine both the stages

def get_hindi_answer(question, context):
    """
    Takes an English question and context, finds the English answer,
    and translates it to Hindi.
    """

    print(f"Original Question: {question}")
    print(f"Original Context: {context[:50]}...") # Prints first 50 chars only

    # Step 1: Get the English answer
    qa_result = qa_pipeline(question=question, context=context)
    english_answer = qa_result['answer']

    print(f"Found English Answer: {english_answer}")

    # Step 2: Translate to Hindi

    # Tokenize the English answer
    # We put it in a list [english_answer] because the translation model
    # expects a batch of sentences.
    inputs = trans_tokenizer([english_answer], return_tensors="pt", padding=True, truncation=True)

    # Generate translation
    outputs = trans_model.generate(**inputs)

    # Decode the translation; we take outputs[0] because the batch has only one entry
    hindi_answer = trans_tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(f"Hindi Answer: {hindi_answer}")
    return hindi_answer
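
Because the translation model expects a batch of sentences, the same tokenize, generate, decode steps can handle several answers at once. Here is a minimal sketch of that idea. `translate_batch` is a hypothetical helper of my own; it assumes the `tokenizer` and `model` arguments behave like `trans_tokenizer` and `trans_model` above.

```python
def translate_batch(texts, tokenizer, model):
    """Translate a list of English strings with a single generate() call.

    tokenizer and model are expected to behave like trans_tokenizer and
    trans_model in this notebook; the helper itself is hypothetical.
    """
    if not texts:
        return []

    # Padding makes all sentences in the batch the same length.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)

    # batch_decode returns one decoded string per input sentence.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the objects from this notebook, the call would be `translate_batch(["the dark", "fear"], trans_tokenizer, trans_model)`.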

To see the system working, we're running a little demonstration.

We define a sample English question and context. Then, we call our `get_hindi_answer` function with these inputs.

The output shows the steps our function takes and the final Hindi answer it produces. It looks like it's working!

In [29]:
# Run a demonstration

# define our example question and context
demo_question = "What was Leo's secret fear?"
demo_context = 'Leo the lion was known throughout the savanna for his majestic roar, which could be heard for miles. However, Leo had a secret: he was afraid of the dark. Every night, as the sun dipped below the horizon, he would quietly retreat to his cave and curl up with his favorite, worn-out toy, a stuffed giraffe named Gary. The other animals never knew of his fear, and Leo hoped they never would. His reputation as the bravest animal depended on it.'

print("---- Running System Demonstration -----")
print("") # Add a blank line for spacing

#Call our function with the demo inputs
final_answer = get_hindi_answer(question=demo_question, context=demo_context)

---- Running System Demonstration -----

Original Question: What was Leo's secret fear?
Original Context: Leo the lion was known throughout the savanna for ...
Found English Answer: the dark
Hindi Answer: अंधेरे


Let's try another, similar example.


In [30]:
# Run a second demonstration

# define our example question and context
demo_question = "What was Elias's secret and what was the true source of his bravery?"
demo_context = 'The old lighthouse keeper, Elias, was known for his stoic demeanor and unwavering commitment to his post. For forty years, he had kept the light burning brightly through the most ferocious storms, never once faltering in his duty. The fishermen swore that the lighthouse was protected by Elias\'s own iron will. But in truth, Elias\'s hands shook with a terrible fear every time a storm approached. The thunder and lightning reminded him of the shipwreck that had claimed his family long ago, and he feared that one day, he would fail and another vessel would meet the same fate. His bravery was not the absence of fear, but a constant, silent battle against it.'

print("---- Running System Demonstration -----")
print("") # Add a blank line for spacing

#Call our function with the demo inputs
final_answer = get_hindi_answer(question=demo_question, context=demo_context)

---- Running System Demonstration -----

Original Question: What was Elias's secret and what was the true source of his bravery?
Original Context: The old lighthouse keeper, Elias, was known for hi...
Found English Answer: fear
Hindi Answer: डर
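
The second demo packs two questions into one, but an extractive QA model picks a single span from the context, which is probably why only `fear` comes back. One possible workaround, sketched below, is to ask each part on its own. `answer_each` is a hypothetical helper, `qa_fn` stands for any callable that is called like the QA pipeline, and the split into sub-questions is done by hand.

```python
def answer_each(qa_fn, questions, context):
    """Ask several questions against the same context, one at a time."""
    return {q: qa_fn(question=q, context=context)["answer"] for q in questions}


# Hand-written split of the compound demo question.
sub_questions = [
    "What was Elias's secret?",
    "What was the true source of his bravery?",
]

# With the objects from this notebook, the call would be:
# answers = answer_each(qa_pipeline, sub_questions, demo_context)
```

Each English answer in the resulting dictionary could then be translated the same way `get_hindi_answer` does it.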
