# **USER QUESTION AND ANSWERING** Project
CoderOne


# **INSTALL NECESSARY DEPENDENCIES**
The cell below installs the Hugging Face `transformers` and `datasets` packages, which the rest of the notebook uses to load BERT-based question-answering models and the SQuAD dataset.

In [None]:
# Install the required libraries
!pip install transformers datasets
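
Since results can differ between library releases, it can help to record which versions are actually installed. The snippet below is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and `torch` is included on the assumption that it is present because later cells import it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages this notebook relies on
for name, ver in installed_versions(["transformers", "datasets", "torch"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Pinning the printed versions in the install command (for example `transformers==<version>`) is one way to keep later runs consistent.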


### This cell sets up a question-answering system using a pre-trained DistilBERT model. It imports the necessary classes from the Hugging Face `transformers` package, loads the DistilBERT tokenizer and model fine-tuned on the SQuAD dataset, and initializes a question-answering pipeline, which combines the model and tokenizer to answer questions from a provided context.



In [None]:

# Import the necessary libraries
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

# Initialize the question-answering pipeline
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



 The AutoModelForQuestionAnswering class in Hugging Face's transformers library simplifies loading pre-trained models for question-answering tasks by automatically selecting the appropriate model architecture based on the provided model name. This allows for easy switching between different models without altering the code.

This pre-trained question-answering model provides answers based on a given context. The cell below prompts the user for a question, runs it through the QA pipeline, and displays the answer. It also includes an optional, commented-out loop that runs several predefined questions.

In [None]:

# Example context and question
context = """
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.
It is named after the engineer Gustave Eiffel, whose company designed and built the tower.
Constructed from 1887 to 1889 as the entrance arch for the 1889 World's Fair,
it was initially criticized by some of France's leading artists and intellectuals for its design,
but it has become a global cultural icon of France and one of the most recognizable structures in the world.
"""

# Optional: predefined questions for the batch loop at the end of this cell
# questions = [
#     "Who designed the Eiffel Tower?",
#     "When was it constructed?",
#     "What was it initially criticized for?"
# ]

# Get the question from the user
question = input("Please enter your question: ")

# Run the QA pipeline with the user-provided question
result = qa_pipeline(question=question, context=context)

# Display the answer
print(f"Question: {question}")
print(f"Answer: {result['answer']}")

# Optional: run the QA pipeline for each predefined question
# for q in questions:
#     result = qa_pipeline(question=q, context=context)
#     print(f"Question: {q}")
#     print(f"Answer: {result['answer']}\n")


Please enter your question: When was it constructed
Question: When was it constructed
Answer: 1887 to 1889



In [None]:
pip install transformers datasets




In [None]:
# Load the dataset
from datasets import load_dataset

dataset = load_dataset("squad")

# Load a pre-trained tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
# Tokenize the dataset
def preprocess_function(examples):
    return tokenizer(examples["question"], examples["context"], truncation=True, padding="max_length", max_length=384)

tokenized_datasets = dataset.map(preprocess_function, batched=True)

# Load the pre-trained model
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
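# Set up training arguments
import numpy as np
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Define the compute metrics function.
# The "squad_v2" metric expects decoded answer texts rather than logits, so this
# version reports how often the predicted start/end token positions match the labels.
def compute_metrics(p):
    start_logits, end_logits = p.predictions
    # For question-answering models the Trainer passes labels as (start_positions, end_positions)
    start_positions, end_positions = p.label_ids
    start_pred = np.argmax(start_logits, axis=-1)
    end_pred = np.argmax(end_logits, axis=-1)
    return {
        "start_accuracy": float((start_pred == start_positions).mean()),
        "end_accuracy": float((end_pred == end_positions).mean()),
    }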




### Fine-tuning a BERT model with the Trainer class tailors it to specific tasks like question answering by training on datasets such as SQuAD. The Trainer simplifies training, evaluation, and saving, adjusting the model's weights to boost task-specific performance while leveraging BERT's pre-existing language understanding.
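
Fine-tuning on SQuAD needs a start and end token position for each answer, while the dataset only stores the answer text and its character offset. The sketch below shows that conversion under a simplifying assumption: a toy whitespace tokenizer stands in for the real BERT tokenizer, and `tokenize_with_offsets` and `answer_token_span` are hypothetical helper names.

```python
import re


def tokenize_with_offsets(text):
    """Toy tokenizer (an assumption for illustration): split on whitespace and
    return (token, start_char, end_char) triples, with end_char exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def answer_token_span(context, answer_text, answer_start):
    """Map a character-level answer span to (start_token, end_token), inclusive.

    Returns None when the answer does not overlap any token."""
    answer_end = answer_start + len(answer_text)  # exclusive
    start_token = end_token = None
    for idx, (_, tok_start, tok_end) in enumerate(tokenize_with_offsets(context)):
        # A token belongs to the answer if its character range overlaps the answer's
        if tok_start < answer_end and tok_end > answer_start:
            if start_token is None:
                start_token = idx
            end_token = idx
    if start_token is None:
        return None
    return start_token, end_token


context = "It was constructed from 1887 to 1889 as the entrance arch."
answer_text = "1887 to 1889"
answer_start = context.index(answer_text)
print(answer_token_span(context, answer_text, answer_start))  # (4, 6)
```

With a real subword tokenizer the same idea applies, but the offsets come from the tokenizer itself and the question tokens shift every context position.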

In [None]:
from transformers import BertForQuestionAnswering, BertTokenizer

# Load a pre-trained model and tokenizer
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')



Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




In [None]:

# Initialize the Trainer
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)



In [None]:
# Fine-tune the model
trainer.train()
# Save the model
model.save_pretrained("./qa_model")
tokenizer.save_pretrained("./qa_model")

## BERT (Bidirectional Encoder Representations from Transformers) is highly effective for question answering due to its ability to understand context and extract precise answers from text. Its transformer architecture captures bidirectional context, allowing it to accurately identify answer spans. After pre-training on extensive text data, BERT is fine-tuned on question-answering datasets like SQuAD to optimize its performance for practical applications.

In [3]:
import torch
from transformers import BertForQuestionAnswering, BertTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

# Load a pre-trained BERT model for question answering
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Load a dataset (e.g., SQuAD dataset)
datasets = load_dataset("squad")



Generating train split:   0%|          | 0/87599 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/10570 [00:00<?, ? examples/s]

### The cells below first inspect and tokenize the SQuAD data, then demonstrate using a pre-trained BERT model for question answering. The demonstration initializes the BERT tokenizer and model, tokenizes the question and context, and uses the model to predict the answer span. Finally, it extracts and prints the most likely answer from the context based on the model's predictions.
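
Picking the highest start logit and the highest end logit independently can produce a start position that falls after the end position. The sketch below shows one common alternative, choosing the pair with the best combined score subject to start <= end. It assumes plain Python lists of logits; `best_span` and `max_answer_len` are hypothetical names, not part of `transformers`.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len."""
    best = None
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Independent argmax would give start=2 and end=1, which is not a valid span
start_logits = [0.1, 0.2, 4.0, 0.3, 0.1]
end_logits = [0.0, 5.0, 0.2, 3.5, 0.1]
print(best_span(start_logits, end_logits))  # (2, 3, 7.5)
```

The double loop is quadratic in sequence length, so real implementations usually restrict the search to the top few start and end candidates.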

In [5]:
print(datasets.keys())
print(datasets['train'][0])



dict_keys(['train', 'validation'])
{'id': '5733be284776f41900661182', 'title': 'University_of_Notre_Dame', 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.', 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?', 'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}}


In [None]:
# Tokenize the data
def tokenize_data(example):
    return tokenizer(example['question'], example['context'], truncation=True, padding='max_length', max_length=512)

# Apply tokenization
tokenized_dataset = datasets.map(tokenize_data, batched=True)



In [7]:
# Print a tokenized sample
print(tokenized_dataset['train'][0])

{'id': '5733be284776f41900661182', 'title': 'University_of_Notre_Dame', 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.', 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?', 'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}, 'input_ids': [101, 2000, 3183, 2106, 1996, 6261, 2984,

In [None]:
import logging
logging.getLogger("transformers").setLevel(logging.ERROR)


from transformers import BertTokenizer

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# Define your input question and context
question = "What is the largest and deepest ocean on Earth?"
context = "The Pacific Ocean is the largest and deepest of Earth's oceanic divisions. It extends from the Arctic Ocean in the north to the Southern Ocean in the south and is bounded by Asia and Australia on the west and the Americas on the east. The Pacific Ocean covers more than 63 million square miles (165 million square kilometers), making up about 46% of the Earth's water surface and about 32% of its total surface area."

# Tokenize the inputs
inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)

import torch
from transformers import BertForQuestionAnswering

# Load the model
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# Set model to evaluation mode
model.eval()

# Unpack the input tensors (the model and inputs both stay on the default device, the CPU)
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

# Perform inference
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Extract logits
start_logits = outputs.start_logits
end_logits = outputs.end_logits

# Get the most likely start and end tokens
start_index = torch.argmax(start_logits, dim=-1)
end_index = torch.argmax(end_logits, dim=-1)

# Convert the index tensors to Python integers
start_token = start_index.item()
end_token = end_index.item()

# Decode the tokens
answer_tokens = tokenizer.convert_ids_to_tokens(input_ids[0][start_token:end_token+1])

# Join the tokens to form the final answer
answer = tokenizer.convert_tokens_to_string(answer_tokens)

print(f"Question: {question}")
print(f"Answer: {answer}")


Question: What is the largest and deepest ocean on Earth?
Answer: the pacific ocean


## The code below takes a user-provided question and a context and finds an answer. The context is drawn at random from the SQuAD training set and shown to the user before they type a question. A pre-trained BERT model processes the tokenized question and context, and the answer is the text span between the highest-scoring start and end tokens.


## The context can also change between questions, so the user can pose questions against the latest context. In the first cell a new context is drawn on every iteration; in the second, the user chooses whether to keep the current context or draw a new one. If the question is unrelated to the context, the model still returns some span unless the no-answer check triggers.
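
The cells that follow draw their context at random. If the goal were to match a context to the question instead, a very simple approach is to pick the candidate that shares the most words with the question. This is only a sketch of that idea, not what the notebook does; `most_relevant_context` is a hypothetical helper, and real systems typically use TF-IDF, BM25, or embedding similarity.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def most_relevant_context(question, contexts):
    """Return the context sharing the most distinct words with the question."""
    question_words = set(tokenize(question))

    def overlap(context):
        return len(question_words & set(tokenize(context)))

    return max(contexts, key=overlap)


contexts = [
    "The Pacific Ocean is the largest and deepest of Earth's oceanic divisions.",
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
]
print(most_relevant_context("Where is the Eiffel Tower?", contexts))
```

Common words such as "the" and "is" count toward the overlap here, so removing stop words would be a natural refinement.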

In [18]:
import logging
import torch
import random
from transformers import BertTokenizer, BertForQuestionAnswering
from datasets import load_dataset

# Set logging level
logging.getLogger("transformers").setLevel(logging.ERROR)

# Initialize the tokenizer and model
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)
model.eval()

# Load SQuAD dataset
squad = load_dataset("squad")

# Function to get a random context from the SQuAD dataset
def get_relevant_context():
    # Randomly select an example from the dataset
    random_example = random.choice(squad['train'])
    context = random_example['context']
    return context

# Function to get the answer from the context
def get_answer(question, context):
    # Tokenize the inputs
    inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)

    # Unpack the input tensors (the model and inputs both stay on the default device, the CPU)
    input_ids = inputs['input_ids']
    attention_mask = inputs['attention_mask']

    # Perform inference
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)

    # Extract logits
    start_logits = outputs.start_logits
    end_logits = outputs.end_logits

    # Get the most likely start and end tokens
    start_index = torch.argmax(start_logits, dim=-1)
    end_index = torch.argmax(end_logits, dim=-1)

    # Convert the index tensors to Python integers
    start_token = start_index.item()
    end_token = end_index.item()

    # Decode the tokens
    answer_tokens = tokenizer.convert_ids_to_tokens(input_ids[0][start_token:end_token+1])

    # Join the tokens to form the final answer
    answer = tokenizer.convert_tokens_to_string(answer_tokens)

    return answer

# Interactive loop
while True:
    # Get and display the context before user input
    context = get_relevant_context()
    print(f"Context: {context}\n")

    question = input("Enter your question based on the above context (or 'exit' to quit): ")
    if question.lower() == 'exit':
        break

    answer = get_answer(question, context)

    print(f"Answer: {answer}\n")


Context: The Seljuk Empire soon started to collapse. In the early 12th century, Armenian princes of the Zakarid noble family drove out the Seljuk Turks and established a semi-independent Armenian principality in Northern and Eastern Armenia, known as Zakarid Armenia, which lasted under the patronage of the Georgian Kingdom. The noble family of Orbelians shared control with the Zakarids in various parts of the country, especially in Syunik and Vayots Dzor, while the Armenian family of Hasan-Jalalians controlled provinces of Artsakh and Utik as the Kingdom of Artsakh.

Enter your question based on the above context (or 'exit' to quit): in which does it happen
Answer: in the early 12th century

Context: Websites and online media companies in or near the city include All Media Guide, the Weather Underground, and Zattoo. Ann Arbor is the home to Internet2 and the Merit Network, a not-for-profit research and education computer network. Both are located in the South State Commons 2 building o

In [20]:
import logging
import torch
import random
from transformers import BertTokenizer, BertForQuestionAnswering
from datasets import load_dataset

# Set logging level
logging.getLogger("transformers").setLevel(logging.ERROR)

# Initialize the tokenizer and model
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)
model.eval()

# Load SQuAD dataset
squad = load_dataset("squad")

# Function to get a random context from the SQuAD dataset
def get_relevant_context():
    random_example = random.choice(squad['train'])
    return random_example['context']

# Function to get the answer from the context
def get_answer(question, context):
    # Tokenize the inputs
    inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)

    # Perform inference
    with torch.no_grad():
        outputs = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

    # Extract logits and find start and end tokens
    start_index = torch.argmax(outputs.start_logits, dim=-1).item()
    end_index = torch.argmax(outputs.end_logits, dim=-1).item()

    # If the start index is after the end index, or both point at position 0 (the [CLS] token), treat it as no answer
    if start_index > end_index or (start_index == 0 and end_index == 0):
        return "No answer"

    # Decode the tokens if an answer is found
    answer_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_index:end_index+1])
    answer = tokenizer.convert_tokens_to_string(answer_tokens)

    return answer

# Interactive loop
context = get_relevant_context()
while True:
    # Display the current context
    print(f"Context: {context}\n")

    # Ask the user if they want a new context
    change_context = input("Do you want a new context? (yes/no) ").strip().lower()
    if change_context == 'yes':
        context = get_relevant_context()
        print(f"\nNew Context: {context}\n")

    question = input("Enter your question based on the above context (or 'exit' to quit): ").strip()
    if question.lower() == 'exit':
        break

    answer = get_answer(question, context)
    print(f"Answer: {answer}\n")


Context: In the race for individual contributions, economist Lyndon LaRouche dominated the pack leading up to the primaries. According to the Federal Election Commission statistics, LaRouche had more individual contributors to his 2004 presidential campaign than any other candidate, until the final quarter of the primary season, when John Kerry surpassed him. As of the April 15 filing, LaRouche had 7834 individual contributions, of those who have given cumulatively, $200 or more, as compared to 6257 for John Kerry, 5582 for John Edwards, 4090 for Howard Dean, and 2744 for Gephardt.

Do you want a new context? (yes/no) no
Enter your question based on the above context (or 'exit' to quit): where is effiel tower
Answer: 2744 for gephardt .

Context: In the race for individual contributions, economist Lyndon LaRouche dominated the pack leading up to the primaries. According to the Federal Election Commission statistics, LaRouche had more individual contributors to his 2004 presidential cam

In [16]:
qa = pipeline("question-answering", model=model_name, tokenizer=tokenizer)

## This cell evaluates a BERT model on the SQuAD validation set by calculating Exact Match (EM) and F1 scores. It processes each example by tokenizing the question and context, running inference, and comparing the predicted answer with the ground-truth answers. Exact Match is the proportion of predictions that match a ground-truth answer exactly, while F1 measures the token overlap between the predicted and true answers. Both results are printed as percentages.
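
To make the two metrics concrete, here is a minimal standalone sketch of EM and F1 for a single prediction. The normalization (lowercasing, then removing punctuation and the articles a/an/the) is modeled on the convention of the official SQuAD v1.1 evaluation script; the function names are chosen for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss
        return float(pred_tokens == truth_tokens)
    # Counter intersection keeps the minimum count of each shared token
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the pacific ocean", "Pacific Ocean"))  # 1
print(round(f1_score("in the early 12th century", "early 12th century"), 3))  # 0.857
```

In the second call the prediction has one extra token ("in"), so precision is 3/4, recall is 1, and F1 is 6/7. When an example has several reference answers, the usual convention is to take the maximum score over them.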

In [None]:
model.eval()
def evaluate_squad():
    em, f1 = 0, 0
    total = 0

    for example in squad["validation"]:
        question = example["question"]
        context = example["context"]
        answers = example["answers"]["text"]

        # Tokenize inputs
        inputs = tokenizer(question, context, return_tensors="pt", truncation=True)

        # Perform inference
        with torch.no_grad():
            outputs = model(**inputs)
            start_logits = outputs.start_logits
            end_logits = outputs.end_logits

        # Get predicted start and end indices
        start_idx = torch.argmax(start_logits)
        end_idx = torch.argmax(end_logits) + 1

        # Convert token IDs back to string
        predicted_answer = tokenizer.convert_tokens_to_string(
            tokenizer.convert_ids_to_tokens(inputs["input_ids"][0][start_idx:end_idx])
        )

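        # Compute Exact Match (EM); compare in lowercase because the uncased
        # tokenizer lowercases the prediction while SQuAD answers keep their case
        pred_norm = predicted_answer.lower().strip()
        em += max(int(pred_norm == answer.lower().strip()) for answer in answers)

        # Compute F1 Score on the sets of whitespace-separated tokens
        predicted_tokens = set(pred_norm.split())
        f1_scores = []
        for answer in answers:
            answer_tokens = set(answer.lower().split())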
            common = predicted_tokens & answer_tokens
            if not common:
                f1_scores.append(0)
                continue
            precision = len(common) / len(predicted_tokens)
            recall = len(common) / len(answer_tokens)
            f1_scores.append(2 * precision * recall / (precision + recall))
        f1 += max(f1_scores)

        total += 1

    # Calculate average EM and F1
    em = 100 * em / total
    f1 = 100 * f1 / total

    return em, f1

# Evaluate the model
exact_match, f1 = evaluate_squad()
print(f"Exact Match (EM): {exact_match:.2f}%")
print(f"F1 Score: {f1:.2f}%")



Exact Match (EM): 84.2

F1 Score: 90.9

# The pre-trained `bert-large-uncased-whole-word-masking-finetuned-squad` model achieves about 84.2% Exact Match (EM) and 90.9% F1 on the SQuAD dataset. These metrics indicate how well the model identifies the correct answer spans. F1 is higher than EM because it gives partial credit for token overlap, so predictions that are close to a correct answer still score even when they do not match exactly.



In [17]:
!pip install rouge-score



In [14]:
from tqdm import tqdm

model.eval()

# Function to get the answer from the context
def get_answer(question, context):
    inputs = tokenizer(question, context, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    start_index = torch.argmax(outputs.start_logits, dim=-1).item()
    end_index = torch.argmax(outputs.end_logits, dim=-1).item()
    answer_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_index:end_index+1])
    return tokenizer.convert_tokens_to_string(answer_tokens)

# Evaluation loop
correct = 0
total = 0

for example in tqdm(squad['validation']):
    question = example['question']
    context = example['context']
    true_answer = example['answers']['text'][0]

    predicted_answer = get_answer(question, context)

    # Compare case-insensitively: the uncased tokenizer lowercases the prediction
    if predicted_answer.strip().lower() == true_answer.strip().lower():
        correct += 1
    total += 1

accuracy = correct / total
print(f"Accuracy: {accuracy:.2%}")




In [None]:

def preprocess_function(examples):
    # Tokenize inputs (char_to_token below needs a fast tokenizer such as BertTokenizerFast)
    inputs = tokenizer(examples["question"], examples["context"], truncation=True, padding="max_length", max_length=384)

    # Compute start and end positions
    start_positions = []
    end_positions = []

    for i, answer in enumerate(examples["answers"]):
        # Each SQuAD example stores its answer texts and character offsets in parallel lists
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])

        # sequence_index=1 maps character offsets within the context, the second sequence
        start_token = inputs.char_to_token(i, start_char, sequence_index=1)
        end_token = inputs.char_to_token(i, end_char - 1, sequence_index=1)

        # If the start or end token cannot be found (e.g. truncated away), use the first token ([CLS], index 0)
        if start_token is None or end_token is None:
            start_token, end_token = 0, 0

        start_positions.append(start_token)
        end_positions.append(end_token)

    # Add to inputs
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions

    return inputs

tokenized_squad = squad.map(preprocess_function, batched=True)

# Custom Trainer class
class CustomTrainer(Trainer):
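    # **kwargs absorbs extra arguments (such as num_items_in_batch) passed by newer transformers releases
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):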
        outputs = model(**inputs)
        start_logits = outputs.start_logits
        end_logits = outputs.end_logits

        start_positions = inputs["start_positions"]
        end_positions = inputs["end_positions"]

        # Compute the loss using CrossEntropyLoss
        loss_fct = torch.nn.CrossEntropyLoss()
        start_loss = loss_fct(start_logits, start_positions)
        end_loss = loss_fct(end_logits, end_positions)
        loss = (start_loss + end_loss) / 2

        return (loss, outputs) if return_outputs else loss

# Compute Metrics function.
# BLEU needs decoded answer text, which is not available from logits alone, so this
# version reports the share of examples whose predicted start and end positions both match the labels.
import numpy as np

def compute_metrics(pred):
    start_logits, end_logits = pred.predictions
    # For question-answering models the Trainer passes labels as (start_positions, end_positions)
    start_positions, end_positions = pred.label_ids

    start_pred = np.argmax(start_logits, axis=-1)
    end_pred = np.argmax(end_logits, axis=-1)

    # An example counts as correct only if both span boundaries match
    exact = (start_pred == start_positions) & (end_pred == end_positions)

    return {"span_accuracy": float(exact.mean())}

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # Updated based on the deprecation warning
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize the Trainer
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

# Train the model
trainer.train()

# Evaluate the model
metrics = trainer.evaluate()
print(metrics)
