<a href="https://colab.research.google.com/github/AtifQureshi110/BERT/blob/main/Question_answering_is_achieved_by_combining_a_pre_trained_model_with_a_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## "ContextualQA": including questions unrelated to the passage
- Every question in the list is processed, whether or not the passage can answer it.

- In this code, a pre-trained BERT-based question-answering model is loaded from the "deepset/bert-base-cased-squad2" checkpoint, along with its corresponding tokenizer. A context passage is provided, which contains information about the Presidential Initiative for Artificial Intelligence and Computing (PIAIC).

- A list of questions is defined, and a question-answering pipeline is initialized using the loaded model and tokenizer. The code iterates through the list of questions, uses the model to find an answer within the given context, and stores the questions and answers in lists. Answers whose score falls below a confidence threshold (0.01 here) are replaced with a placeholder.

- Tokenizer: The tokenizer converts the text (question and context) into a format the model can understand. It splits the text into WordPiece subword tokens, maps each token to an integer ID, and adds the special tokens the model expects (such as `[CLS]` and `[SEP]`). Because this checkpoint is cased, the tokenizer does not lowercase the text, and punctuation is kept as separate tokens rather than removed. It is a crucial component for preparing the input data for the model.

- Model: The model is the neural network that performs the actual question-answering task. It takes the tokenized input and predicts the start and end positions of the answer within the context passage; the text between those two positions is returned as the answer. This is extractive question answering: the model selects a span that already exists in the passage rather than generating new text.
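
To make the start and end prediction described above concrete, here is a minimal sketch of how one answer span can be picked from two score vectors. The tokens and logits are made-up toy values, `best_span` is a hypothetical helper rather than a library function, and the ranking rule (product of softmax probabilities, with the end never before the start) is an assumption about how such pipelines typically score spans.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) for the highest-scoring valid span."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, ps in enumerate(p_start):
        # The end token may not come before the start token
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy example (hypothetical values, not real model output)
tokens = ["launched", "by", "Dr", ".", "Arif", "Alvi", ","]
start_logits = [0.1, 0.2, 5.0, 0.1, 1.0, 0.3, 0.0]
end_logits = [0.0, 0.1, 0.5, 0.2, 1.0, 6.0, 0.1]

start, end, score = best_span(start_logits, end_logits)
answer_tokens = tokens[start:end + 1]
print(answer_tokens, round(score, 3))
```

The score returned alongside the span plays the same role as `result['score']` in the cells below: a low value suggests that no span in the passage is a convincing answer.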

In [1]:
!pip install transformers



In [2]:
# Load the pre-trained model for question answering
from transformers import BertForQuestionAnswering, AutoTokenizer, pipeline

model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")

# Load the corresponding tokenizer for the model
tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [3]:
# Define your context (passage)
context = """
The Presidential Initiative for Artificial Intelligence and Computing (PIAIC) was launched by the President of Pakistan, Dr. Arif Alvi,
to promote education, research and business opportunities in Artificial Intelligence, Blockchain, Internet of Things, and Cloud Native Computing.
The initiative comes in a bid to enable Pakistan in making an imprint on the world’s path towards the Fourth Industrial Revolution.
It aims to transform the fields of education, research, and business in Pakistan. President Dr. Arif Alvi had launched PIAIC to
reshape Pakistan by revolutionising education, research and businesses through introducing latest cutting-edge technologies.
PIAIC offers programs for distance learning as well as on-site learning, allowing students from across Pakistan to enrol online.
However, students need to be present for exams onsite in order to enrol into the program and for examinations throughout the course of study.
The program has an initial target to enrol as many as 100,000 students within a year.
After a successful launch in Karachi with 12,000 students enrolling, PIAIC have started registering students in other major cities like Islamabad
and Faisalabad and soon plan on offering programs in Lahore, Quetta, and Peshawar. This initiative is a privately funded not-for-profit educational program
that has partnership with non-profit and for-profit organizations like Panacloud, Saylani Welfare International Trust, and Pakistan Stock Exchange (PSX)
"""

In [4]:
# Initialize the question-answering pipeline with pretrained model and tokenizer
nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)

In [6]:
import pandas as pd

In [7]:
# List of questions
questions = [
    "What is the aim of PIAIC?",
    "How does PIAIC contribute to education in Pakistan?",
    "What are the key areas of focus for PIAIC?",
    "Who launched PIAIC?",
    "Where did PIAIC initially launch?",
    "What is the capital of France?",  # A question unrelated to the passage
    "what is England doing in AI", # A question unrelated to the passage
    "Where was there a successful launch with 12,000 students enrolling?",
    "in which other major cities has PIAIC commenced student registration?",
    "What are the cities where PIAIC plans to offer programs in the near future?",
    "how love to write the poem?" # A question unrelated to the passage
]

# Create empty lists to store questions and answers
question_list = []
answer_list = []

# Define a confidence threshold
confidence_threshold = 0.01  # You can adjust this threshold as needed

# Iterate through the list of questions
for question in questions:
    # Find the answer for each question
    result = nlp(question=question, context=context)

    # Keep the answer only if its confidence score meets the threshold.
    # The pipeline result is a dict that includes 'score' and 'answer' by default.
    if result['score'] >= confidence_threshold:
        print("Question:", question)
        print("Answer:", result['answer'])
        question_list.append(question)  # Append the question
        answer_list.append(result['answer'])  # Append the answer
    else:
        print("Question:", question)
        print("I don't have data about this question.")
        question_list.append(question)  # Append the question
        answer_list.append("I don't have data about this question.")  # Append a placeholder
    print()

# Create a DataFrame
df = pd.DataFrame({'Question': question_list, 'Answer': answer_list})

Question: What is the aim of PIAIC?
Answer: to transform the fields of education, research, and business in Pakistan

Question: How does PIAIC contribute to education in Pakistan?
Answer: It aims to transform the fields of education, research, and business in Pakistan

Question: What are the key areas of focus for PIAIC?
I don't have data about this question.

Question: Who launched PIAIC?
Answer: Dr. Arif Alvi

Question: Where did PIAIC initially launch?
Answer: Karachi

Question: What is the capital of France?
I don't have data about this question.

Question: what is England doing in AI
I don't have data about this question.

Question: Where was there a successful launch with 12,000 students enrolling?
Answer: Karachi

Question: in which other major cities has PIAIC commenced student registration?
Answer: Islamabad
and Faisalabad

Question: What are the cities where PIAIC plans to offer programs in the near future?
Answer: Lahore, Quetta, and Peshawar

Question: how love to write the

In [8]:
# Print the DataFrame
df

| | Question | Answer |
|---|---|---|
| 0 | What is the aim of PIAIC? | to transform the fields of education, research... |
| 1 | How does PIAIC contribute to education in Paki... | It aims to transform the fields of education, ... |
| 2 | What are the key areas of focus for PIAIC? | I don't have data about this question. |
| 3 | Who launched PIAIC? | Dr. Arif Alvi |
| 4 | Where did PIAIC initially launch? | Karachi |
| 5 | What is the capital of France? | I don't have data about this question. |
| 6 | what is England doing in AI | I don't have data about this question. |
| 7 | Where was there a successful launch with 12,00... | Karachi |
| 8 | in which other major cities has PIAIC commence... | Islamabad\nand Faisalabad |
| 9 | What are the cities where PIAIC plans to offer... | Lahore, Quetta, and Peshawar |


- First, a pre-trained BERT-based question-answering model and its tokenizer are loaded. Then a context passage and a list of questions are defined, and the model is used to find an answer to each question in the passage. If the model's confidence in an answer meets the specified threshold, the question and answer are recorded in a DataFrame; otherwise the question is recorded with a placeholder answer.
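
The thresholding logic summarized above can be wrapped in one small function. This is only a sketch: `answer_or_fallback` and `fake_qa` are hypothetical names, and `fake_qa` is a stand-in that merely mimics the `{'score': ..., 'answer': ...}` dictionary shape used in this notebook, so the block runs without downloading a model.

```python
FALLBACK = "I don't have data about this question."

def answer_or_fallback(qa, question, context, threshold=0.01):
    """Return the predicted answer, or a placeholder when the score is too low.

    `qa` is any callable that accepts question= and context= keyword
    arguments and returns a dict with 'score' and 'answer' keys.
    """
    result = qa(question=question, context=context)
    if result["score"] >= threshold:
        return result["answer"]
    return FALLBACK

def fake_qa(question, context):
    """Stand-in for the real pipeline (hypothetical scores, illustration only)."""
    if "launched" in question:
        return {"score": 0.9, "answer": "Dr. Arif Alvi"}
    return {"score": 0.001, "answer": "Karachi"}

questions = ["Who launched PIAIC?", "What is the capital of France?"]
answers = [answer_or_fallback(fake_qa, q, "toy context") for q in questions]
print(answers)
```

With the real pipeline the call would be `answer_or_fallback(nlp, question, context)`. Keeping the threshold as a parameter makes it easy to compare cut-offs; 0.01 is simply the value used in this notebook, not a recommended setting.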

# "InteractiveQA"

In [9]:
!pip install transformers



In [10]:
# Load the pre-trained model for question answering
from transformers import BertForQuestionAnswering, AutoTokenizer, pipeline

In [11]:
import pandas as pd

In [12]:
# Load model
model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")
# Load the corresponding tokenizer for the model
tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [13]:
# Define your context (passage)
context = """
The Presidential Initiative for Artificial Intelligence and Computing (PIAIC) was launched by the President of Pakistan, Dr. Arif Alvi,
to promote education, research and business opportunities in Artificial Intelligence, Blockchain, Internet of Things, and Cloud Native Computing.
The initiative comes in a bid to enable Pakistan in making an imprint on the world’s path towards the Fourth Industrial Revolution.
It aims to transform the fields of education, research, and business in Pakistan. President Dr. Arif Alvi had launched PIAIC to
reshape Pakistan by revolutionising education, research and businesses through introducing latest cutting-edge technologies.
PIAIC offers programs for distance learning as well as on-site learning, allowing students from across Pakistan to enrol online.
However, students need to be present for exams onsite in order to enrol into the program and for examinations throughout the course of study.
The program has an initial target to enrol as many as 100,000 students within a year.
After a successful launch in Karachi with 12,000 students enrolling, PIAIC have started registering students in other major cities like Islamabad
and Faisalabad and soon plan on offering programs in Lahore, Quetta, and Peshawar. This initiative is a privately funded not-for-profit educational program
that has partnership with non-profit and for-profit organizations like Panacloud, Saylani Welfare International Trust, and Pakistan Stock Exchange (PSX)
"""

In [14]:
# Initialize the question-answering pipeline with pretrained model and tokenizer
nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)

In [20]:
# Initialize lists to store questions and answers
question_list = []
answer_list = []

# Define the confidence threshold once, outside the loop
confidence_threshold = 0.01  # You can adjust this threshold as needed

while True:
    # Prompt the user to enter a question
    question = input("Enter a question (or type 'exit' to stop): ")

    # Check if the user wants to exit (ignoring case and surrounding whitespace)
    if question.strip().lower() == "exit":
        break

    # Find the answer for the entered question
    result = nlp(question=question, context=context)

    # Check if the answer confidence meets the threshold
    if result['score'] >= confidence_threshold:
        print("Question:", question)
        print("Answer:", result['answer'])
        question_list.append(question)  # Append the question
        answer_list.append(result['answer'])  # Append the answer
    else:
        print("Question:", question)
        print("I don't have data about this question.")
        question_list.append(question)  # Append the question
        answer_list.append("I don't have data about this question.")  # Append a placeholder
    print()

# Create a DataFrame to store the interactive QA results
df = pd.DataFrame({'Question': question_list, 'Answer': answer_list})


Enter a question (or type 'exit' to stop): president of pakistan?
Question: president of pakistan?
Answer: Dr. Arif Alvi

Enter a question (or type 'exit' to stop): organizations name which are connect to piaic?
Question: organizations name which are connect to piaic?
Answer: Panacloud

Enter a question (or type 'exit' to stop): where i can learn to swim?
Question: where i can learn to swim?
I don't have data about this question.

Enter a question (or type 'exit' to stop): exit


In [21]:
df

| | Question | Answer |
|---|---|---|
| 0 | president of pakistan? | Dr. Arif Alvi |
| 1 | organizations name which are connect to piaic? | Panacloud |
| 2 | where i can learn to swim? | I don't have data about this question. |
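
One possible refactor of the interactive loop, sketched under stated assumptions: passing the input function as a parameter lets the loop be exercised with scripted questions instead of keyboard input. `run_session`, `scripted`, and `fake_qa` are hypothetical names, and `fake_qa` only imitates the result dictionary of the real pipeline.

```python
def run_session(qa, context, read=input, threshold=0.01):
    """Collect (question, answer) pairs until the user types 'exit'."""
    fallback = "I don't have data about this question."
    rows = []
    while True:
        question = read("Enter a question (or type 'exit' to stop): ").strip()
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty input rather than querying the model
        result = qa(question=question, context=context)
        answer = result["answer"] if result["score"] >= threshold else fallback
        rows.append((question, answer))
    return rows

def fake_qa(question, context):
    """Stand-in for the real pipeline (hypothetical scores)."""
    score = 0.8 if "president" in question.lower() else 0.0
    return {"score": score, "answer": "Dr. Arif Alvi"}

# Scripted input replaces the keyboard for this demonstration
scripted = iter(["president of pakistan?", "", "where i can learn to swim?", "exit"])
rows = run_session(fake_qa, "toy context", read=lambda prompt: next(scripted))
print(rows)
```

In the notebook, `run_session(nlp, context)` would read from the keyboard as before, and `pd.DataFrame(rows, columns=['Question', 'Answer'])` would rebuild the results table.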
