# Bert Baseline Model

In [None]:
from transformers import pipeline, BertForQuestionAnswering, BertTokenizer
import textwrap
import time

I am setting up the necessary tools to build and run a question-answering system based on BERT, a powerful NLP model. The `pipeline` from the transformers library acts as a high-level interface to quickly create tasks like question answering without manually handling model details. Importing `BertForQuestionAnswering` allows me to directly access a pretrained BERT model specifically fine-tuned for finding answers within a given context. The `BertTokenizer` is crucial because BERT requires text inputs to be converted into tokens in a specific way, handling things like subwords and special tokens, so the model can understand the text properly.

I also bring in the `textwrap `module to help format the output text, making it easier to read by wrapping long lines neatly. The `time` module is included so I can track how long operations take or add pauses if needed, which can be useful for optimizing performance or debugging the flow of the code. Altogether, these imports provide both the model and the supportive utilities needed to efficiently implement a precise question-answering pipeline.

In [None]:
model = BertForQuestionAnswering.from_pretrained('deepset/bert-base-cased-squad2')
tokenizer = BertTokenizer.from_pretrained('deepset/bert-base-cased-squad2')

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


1. Loading the Pretrained Model

- `BertForQuestionAnswering` is not just a plain BERT model  it’s a version of BERT that has been adapted for extractive question answering. Instead of generating sentences, this model predicts two things:

   - the start index of the answer in the context

   - the end index of the answer in the context

- This is possible because the final output layer of BERT was fine-tuned on the **SQuAD v2 dataset** (Stanford Question Answering Dataset 2.0). That dataset doesn’t just contain questions with answers but also includes **unanswerable questions**, which makes the model smart enough to sometimes say “no answer” instead of forcing one.

- According to Hugging Face’s docs, this class is specifically designed for span-based prediction, which fits perfectly for Q&A.


2. Tokenizing the Input

- `BertTokenizer` is responsible for preprocessing the text before it reaches the model. The model can’t understand raw text, so the tokenizer converts text into input IDs and attention masks.

- Through my reading, I found out that it uses **WordPiece tokenization**. For example, if the model encounters the word “unbelievable,” it might break it into “un,” “##believ,” and “##able.” This ensures that the model can still handle words that weren’t in its original training vocabulary

- The tokenizer also takes care of adding special tokens like **[CLS](Classification token) and [SEP](Separator token)**. [CLS] is placed at the start of the input, and [SEP] separates the question from the context passage. This structure is what BERT expects when working on Q&A tasks.

3. Why Use `deepset/bert-base-cased-squad2`?

- This specific checkpoint comes from **deepset**, a team that has done a lot of work in NLP applications.

- **“cased”** means it preserves the difference between uppercase and lowercase letters. I realized this matters in cases where casing changes meaning, for example, “US” (United States) vs. “us” (pronoun).

- **“squad2”** means the model was fine-tuned using the SQuAD v2 dataset, which adds a layer of robustness because it forces the model to handle cases where there is no correct answer in the context

4. How This Fits Together

With these two lines, the system is basically ready for Q&A:

- The tokenizer breaks the question and passage into a format BERT can process.

- The model then looks at the encoded input and predicts the most likely answer span within the passage.


Source Used : (https://huggingface.co/deepset/bert-base-cased-squad2), (https://huggingface.co/docs/transformers/main_classes/tokenizer),(https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForQuestionAnswering
)

In [None]:
qna_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

Device set to use cpu


Here,creating a question answering pipeline using Hugging Face’s pipeline API.

- `pipeline('question-answering')`: Upon reading the pipeline function is like a shortcut in Hugging Face Transformers. Instead of writing all the steps manually (tokenization → feeding into model → postprocessing), the pipeline bundles everything into one callable object. I found out from the documentation that pipelines were designed to simplify the most common NLP tasks like sentiment analysis, text classification, and question answering

- `model=model, tokenizer=tokenizer`: Here I’m explicitly passing in the model and tokenizer I set up earlier `(deepset/bert-base-cased-squad2)`. This is important because if I don’t specify, the pipeline will try to pick a default model for Q&A, which might not be the exact one I want. By providing my own, I have full control and consistency in results.

- What happens under the pipeline:
    - When I call the pipeline with a question and context, the tokenizer will encode them into the `[CLS] Question [SEP] Context [SEP] format`.
    - The **model** will then predict the start and end tokens that represent the answer span.
    - The pipeline will automatically convert those tokens back into human-readable text as the final answer.

So in just one line, I have a fully working **Q&A system**. Instead of manually coding the tokenization, model prediction, and decoding steps, the pipeline makes it easy to test and experiment quickly. That’s why Hugging Face calls pipelines the “fast track” to using state-of-the-art NLP models.

Source Used: (https://huggingface.co/docs/transformers/main_classes/pipelines)

In [None]:
context = input("enter Context Article: ")
dedented_text = textwrap.dedent(context).strip()

print("Context Article:\n")
print(textwrap.fill(dedented_text, width=120))

enter Context Article: MANILA – The government is arranging chartered flights for the repatriation of more than 200 overseas Filipino workers in Beirut, Lebanon, the Department of Migrant Workers (DMW) said Wednesday.  “We are trying to provide for chartered flights. We’re talking to airline companies so that the chartered flights would be able to accommodate for example, no less than 300 overseas Filipino workers from Beirut,” DMW Undersecretary Bernard Olalia said in a Palace press briefing.  This was after the scheduled flights of around 15 OFWs on Sept. 25 were cancelled because of the recent bombings in Beirut.  Olalia said around 111 OFWs are staying in four temporary shelters in Beirut and waiting for their repatriation.  An additional 110 OFWs are applying for exit permits from the Lebanese government, Olalia said.  “Apart from the documented OFWs, we have undocumented OFWs who need to secure travel documents and once they’re given travel documents, we will help them in securin

- `context = input("enter Context Article: ")`
Here, I’m letting the user provide a** context passage **directly from the keyboard. In Q&A tasks, the context is the text where the model will search for the answer. By using `input()`, I can dynamically test different passages instead of hardcoding them.

- `textwrap.dedent(context).strip()`
When I researched this, I learned that `textwrap.dedent()` removes any common leading whitespace from the text. This is useful if the context is copied from another source (like an article or a document) where indentation might mess up the formatting.
  - The `.strip()` at the end cleans up any extra whitespace at the start or end of the text.
  - Together, this ensures the context is clean and properly formatted before printing or passing it to the model
- `print("Context Article:\n") `and `print(textwrap.fill(dedented_text, width=120))`
Instead of just printing raw text, `textwrap.fill()` formats it so the text doesn’t run off the edge of the screen. By setting `width=120`, the output is wrapped neatly at 120 characters per line. This makes the passage easier to read inside the notebook, especially for long articles.


So what’s happening here is:
- I type or paste an article/passage as input.

- The code cleans and normalizes the formatting.

- It then prints the context in a reader-friendly format, making sure long lines don’t clutter the output.

This step doesn’t directly involve BERT yet, but it ensures that the context text is properly prepared before I pair it with a question for the Q&A pipeline.

Source Used: (https://docs.python.org/3/library/textwrap.html)

In [None]:
inquiry = input("\nType your question: ")
while (inquiry != '*'):
  start_time = time.time()
  answer = qna_pipeline({ "question": inquiry, "context": context })
  end_time = time.time()

  elapsed = end_time - start_time
  print("Answer found: " + answer['answer'])
  print("At Index: ", answer['start'], " - ", answer['end'])
  print("With Probability:", answer['score'], "\n")
  print(f"Time Elapsed: {elapsed:.4f} seconds")

  inquiry = input("Enter another question (* to stop): ")


Type your question: Who is arranging chartered flights for repatriation of more than 200 OFWs in Beirut?
Answer found: (DMW)
At Index:  173  -  178
With Probability: 0.0004722768208011985 

Time Elapsed: 2.9210 seconds
Enter another question (* to stop): How many OFWs did the DMW say they are trying to accommodate with the chartered flights?
Answer found: no less than 300
At Index:  352  -  368
With Probability: 0.09249385446310043 

Time Elapsed: 2.6738 seconds
Enter another question (* to stop): Why were the scheduled flights around Sept. 25 cancelled
Answer found: the recent bombings in Beirut.
At Index:  570  -  601
With Probability: 0.3953486382961273 

Time Elapsed: 2.7410 seconds
Enter another question (* to stop): How many OFWs are staying in four temporary shelters in Beirut?
Answer found: 111
At Index:  621  -  624
With Probability: 0.48273301124572754 

Time Elapsed: 2.7256 seconds
Enter another question (* to stop): How many OFWs were applying for exit permits from the Leb

- `inquiry = input("\nType your question: ")`
This lets me type a question that I want the model to answer based on the context article I entered earlier. It’s interactive, so I can test different questions without restarting the program.

- `while (inquiry != '*'):`
The loop continues as long as I don’t type `*`. This gives me the flexibility to ask multiple questions about the same context in one run. Typing `*` is like a stop command to break out of the loop.

- `start_time = time.time() and end_time = time.time()`
These are used to measure **how long the pipeline takes** to find an answer. By subtracting the start from the end, I can calculate the execution time for each question. This is useful to evaluate the efficiency of the model, not just the accuracy.

- `answer = qna_pipeline({ "question": inquiry, "context": context })`
This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

What happens under the pipeline :  
- Tokenizes the input ([CLS] Question [SEP] Context [SEP])
- Runs it through the BERT model
- Predicts the **start** and **end** indices of the answer span in the context
- Decodes those indices back into text.

- The result is a dictionary with keys like` answer, start, end, and score`.

When the model returns its result, it provides several important pieces of information. The `answer['answer']` shows the actual text span that the model predicts as the correct answer. Along with this, `answer['start'] `and` answer['end']` indicate the exact character indices in the context where the answer is located, which makes it clear where the model “found” its response. The `answer['score']` represents the confidence level of the prediction, ranging from 0 to 1, with values closer to 1 meaning the model is more certain about its answer. I also calculate the `elapsed` time to measure how long the pipeline took to process the question, which is displayed with four decimal places for precision. Finally, after each loop, the program asks again for input allowing me to enter another question using the same context or type `*` to stop the interaction.



# DistilBERT Model

In [None]:
from transformers import pipeline, DistilBertForQuestionAnswering, DistilBertTokenizer
import textwrap
import time

I’m importing the main libraries and modules that I’ll need for building and running the question answering system.
- `from transformers import pipeline, DistilBertForQuestionAnswering, DistilBertTokenizer`
   - The Transformers library (by Hugging Face) provides state-of-the-art NLP models and tools.
   - pipeline is a high-level API that makes it easy to use pre-trained models without writing all the preprocessing and postprocessing steps manually. From my research, this is one of the reasons Hugging Face is widely used it allows fast prototyping
   - DistilBertForQuestionAnswering is a smaller, lighter version of BERT designed for Q&A tasks. It runs faster and requires fewer resources while keeping most of BERT’s accuracy. According to Hugging Face, DistilBERT retains about 95% of BERT’s performance but is 40% smaller, which makes it more efficient for experiments
   - DistilBertTokenizer handles the tokenization process specifically for DistilBERT. Like the standard BERT tokenizer, it uses WordPiece tokenization, adds special tokens ([CLS], [SEP]), and prepares the inputs in the format that DistilBERT expects.
- `import textwrap`:
This is a Python standard library that helps format long text. I’m using it to neatly display the context passage by wrapping lines at a fixed width, so the article looks clean in the output.

- `import time` :
The time module is used to measure how long it takes for the model to process each question. This helps me not only test the accuracy but also evaluate the speed and efficiency of the model’s predictions.

Source Used: (https://huggingface.co/docs/transformers/en/model_doc/distilbert)

In [None]:
# Load DistilBERT fine-tuned on SQuAD
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')

config.json:   0%|          | 0.00/451 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/265M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

In this part, I am loading `DistilBERT`, which is a smaller and lighter version of BERT that has been fine-tuned on the **SQuAD dataset**. The line `DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')` loads the actual model, while `DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')` loads the tokenizer that goes with it. Just like with the other models, the tokenizer handles splitting the text into tokens the model can understand, and the model then predicts the start and end positions of the answer in the context.

What I noticed here is that `DistilBERT` **ran faster and gave me better results** compared to BERT-Large. Since `DistilBERT` has fewer parameters, it doesn’t require as much processing time or memory, which explains why it was quicker to download and run. Despite being lighter, it still performed well in my tests, sometimes even more accurately than the heavier models. Based on my research, `DistilBERT` was designed to be a more efficient version of BERT it keeps about **97% of BERT’s performance while being 40% smaller and 60% faster**. This matches what I experienced when running it.

For my case, this model struck the right balance between speed and accuracy, which made it more practical than BERT-Large. It showed me that in real-world tasks, efficiency and responsiveness can sometimes matter more than just having the largest model available.

Source Used: (https://huggingface.co/docs/transformers/en/model_doc/distilbert)

In [None]:
qna_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

Device set to use cpu


Here,creating a question answering pipeline using Hugging Face’s pipeline API.

- `pipeline('question-answering')`: Upon reading the pipeline function is like a shortcut in Hugging Face Transformers. Instead of writing all the steps manually (tokenization → feeding into model → postprocessing), the pipeline bundles everything into one callable object. I found out from the documentation that pipelines were designed to simplify the most common NLP tasks like sentiment analysis, text classification, and question answering.

- `model=model, tokenizer=tokenizer`: Here I’m explicitly passing in the model and tokenizer I set up also `('distilbert-base-uncased-distilled-squad')`. This is important because if I don’t specify, the pipeline will try to pick a default model for Q&A, which might not be the exact one I want. By providing my own, I have full control and consistency in results.

- What happens under the pipeline:
    - When I call the pipeline with a question and context, the tokenizer will encode them into the `[CLS] Question [SEP] Context [SEP] format`.
    - The **model** will then predict the start and end tokens that represent the answer span.
    - The pipeline will automatically convert those tokens back into human-readable text as the final answer.

So in just one line, I have a fully working **Q&A system**. Instead of manually coding the tokenization, model prediction, and decoding steps, the pipeline makes it easy to test and experiment quickly. That’s why Hugging Face calls pipelines the “fast track” to using state-of-the-art NLP models.

Source used: (https://huggingface.co/docs/transformers/main_classes/pipelines)

In [None]:

# Input context
context = input("Enter Context Article: ")
dedented_text = textwrap.dedent(context).strip()

print("\nContext Article:\n")
print(textwrap.fill(dedented_text, width=120))

Enter Context Article: MANILA – The government is arranging chartered flights for the repatriation of more than 200 overseas Filipino workers in Beirut, Lebanon, the Department of Migrant Workers (DMW) said Wednesday.  “We are trying to provide for chartered flights. We’re talking to airline companies so that the chartered flights would be able to accommodate for example, no less than 300 overseas Filipino workers from Beirut,” DMW Undersecretary Bernard Olalia said in a Palace press briefing.  This was after the scheduled flights of around 15 OFWs on Sept. 25 were cancelled because of the recent bombings in Beirut.  Olalia said around 111 OFWs are staying in four temporary shelters in Beirut and waiting for their repatriation.  An additional 110 OFWs are applying for exit permits from the Lebanese government, Olalia said.  “Apart from the documented OFWs, we have undocumented OFWs who need to secure travel documents and once they’re given travel documents, we will help them in securin

- `context = input("enter Context Article: ")`
Here, I’m letting the user provide a** context passage **directly from the keyboard. In Q&A tasks, the context is the text where the model will search for the answer. By using `input()`, I can dynamically test different passages instead of hardcoding them.

- `textwrap.dedent(context).strip()`
When I researched this, I learned that `textwrap.dedent()` removes any common leading whitespace from the text. This is useful if the context is copied from another source (like an article or a document) where indentation might mess up the formatting.
  - The `.strip()` at the end cleans up any extra whitespace at the start or end of the text.
  - Together, this ensures the context is clean and properly formatted before printing or passing it to the model.
- `print("Context Article:\n") `and `print(textwrap.fill(dedented_text, width=120))`
Instead of just printing raw text, `textwrap.fill()` formats it so the text doesn’t run off the edge of the screen. By setting `width=120`, the output is wrapped neatly at 120 characters per line. This makes the passage easier to read inside the notebook, especially for long articles.


So what’s happening here is:
- I type or paste an article/passage as input.

- The code cleans and normalizes the formatting.

- It then prints the context in a reader-friendly format, making sure long lines don’t clutter the output.

This step doesn’t directly involve BERT yet, but it ensures that the context text is properly prepared before I pair it with a question for the Q&A pipeline.

Source used: (https://docs.python.org/3/library/textwrap.html)

In [None]:
# Interactive Q&A loop
inquiry = input("\nType your question: ")
while (inquiry != '*'):
    start_time = time.time()
    answer = qna_pipeline({
        "question": inquiry,
        "context": context
    })
    end_time = time.time()

    elapsed = end_time - start_time
    print("\nAnswer found: " + answer['answer'])
    print("At Index:", answer['start'], "-", answer['end'])
    print("With Probability:", answer['score'], "\n")
    print(f"Time Elapsed: {elapsed:.4f} seconds")

    inquiry = input("Enter another question (* to stop): ")


Type your question: Who is arranging chartered flights for repatriation of more than 200 OFWs in Beirut?

Answer found: MANILA – The government
At Index: 0 - 23
With Probability: 0.8498142957687378 

Time Elapsed: 1.3940 seconds
Enter another question (* to stop): How many OFWs did the DMW say they are trying to accommodate with the chartered flights?

Answer found: 110
At Index: 730 - 733
With Probability: 0.6784839034080505 

Time Elapsed: 1.3576 seconds
Enter another question (* to stop): Why were the scheduled flights around Sept. 25 cancelled?

Answer found: the recent bombings in Beirut.
At Index: 570 - 601
With Probability: 0.3191678524017334 

Time Elapsed: 1.4362 seconds
Enter another question (* to stop): How many OFWs are staying in four temporary shelters in Beirut?

Answer found: 110
At Index: 730 - 733
With Probability: 0.8639078140258789 

Time Elapsed: 1.3949 seconds
Enter another question (* to stop): How many OFWs were applying for exit permits from the Lebanese gove

- `inquiry = input("\nType your question: ")`
This lets me type a question that I want the model to answer based on the context article I entered earlier. It’s interactive, so I can test different questions without restarting the program.

- `while (inquiry != '*'):`
The loop continues as long as I don’t type `*`. This gives me the flexibility to ask multiple questions about the same context in one run. Typing `*` is like a stop command to break out of the loop.

- `start_time = time.time() and end_time = time.time()`
These are used to measure **how long the pipeline takes** to find an answer. By subtracting the start from the end, I can calculate the execution time for each question. This is useful to evaluate the efficiency of the model, not just the accuracy.

- `answer = qna_pipeline({ "question": inquiry, "context": context })`
This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

What happens under the pipeline :  
- Tokenizes the input ([CLS] Question [SEP] Context [SEP])
- Runs it through the BERT model
- Predicts the **start** and **end** indices of the answer span in the context
- Decodes those indices back into text.

- The result is a dictionary with keys like` answer, start, end, and score`.

When the model returns its result, it provides several important pieces of information. The `answer['answer']` shows the actual text span that the model predicts as the correct answer. Along with this, `answer['start'] `and` answer['end']` indicate the exact character indices in the context where the answer is located, which makes it clear where the model “found” its response. The `answer['score']` represents the confidence level of the prediction, ranging from 0 to 1, with values closer to 1 meaning the model is more certain about its answer. I also calculate the `elapsed` time to measure how long the pipeline took to process the question, which is displayed with four decimal places for precision. Finally, after each loop, the program asks again for input allowing me to enter another question using the same context or type `*` to stop the interaction.



# Bert-Large Model

In [None]:
from transformers import pipeline, BertForQuestionAnswering, BertTokenizer
import textwrap
import time

This part of the code is about importing the necessary libraries and modules we need for the Question Answering system.
- `from transformers import pipeline, BertForQuestionAnswering, BertTokenizer` → This imports the **Hugging Face Transformers** library tools.
   - `pipeline` makes it easier to quickly set up a pre-built model for specific tasks like question answering.
   - `BertForQuestionAnswering` is the actual BERT model architecture fine-tuned for finding answers in a given passage.
   - `BertTokenizer` is used to convert text into tokens (small pieces that the model can understand).
- import textwrap → This helps format long text outputs so they look cleaner and more readable when printed.

- import time → This module is used to track how long the model takes to process each question (execution time).

In short, this step brings in all the tools needed: the model and tokenizer for question answering, a formatter for better text display, and a timer for performance measurement.

In [None]:
# Load BERT-Large fine-tuned on SQuAD
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

config.json:   0%|          | 0.00/443 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.34G [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

In this part, I am specifically working with **BERT-Large**, which is one of the heaviest but also one of the most powerful versions of BERT. The line `BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')` loads the actual model, while `BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')` loads the tokenizer that pairs with it. Both of these have already been fine-tuned on the **SQuAD dataset (Stanford Question Answering Dataset)**, so the model already has a strong ability to find exact answers within a context. This is important because it saves me from having to train a model from scratch, which would take a lot more time and resources.

The way it works is that the **tokenizer** first breaks down the input text into smaller pieces that the model can actually process, including adding special tokens like `[CLS] `at the start and `[SEP]` between the context and the question. The model then takes these tokens and predicts the best possible start and end positions of the answer within the context. Since this is the Large version, it has more layers and parameters compared to smaller models, which gives it the ability to understand more complex patterns in the text.

However, when I actually tested it, I found that this model was **the slowest to download and run**, and surprisingly, it also gave me the **underperforming results** out of the three models I tried. Even though it is technically more powerful on paper, the answers it generated in my tests were less accurate and less reliable compared to` DistilBERT `and the other `BERT` variant. Based on what I’ve read from Hugging Face and community discussions, this can happen because larger models sometimes overfit on their fine-tuning data or behave inconsistently when used in smaller-scale tasks, especially if the context or questions don’t perfectly match what the model was fine-tuned on.


So in practice, even though `BERT-Large` is expected to perform better in benchmarks, my experience showed that a lighter model like DistilBERT actually gave me more useful results. This showed me that **bigger doesn’t always mean better**, and that model performance depends a lot on the actual task and data being used.

Sourced used: (https://huggingface.co/google-bert/bert-large-uncased) and (https://medium.com/data-science/bert-3d1bf880386a)

In [None]:
qna_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

Device set to use cpu


Here,creating a question answering pipeline using Hugging Face’s pipeline API.

- `pipeline('question-answering')`: Upon reading the pipeline function is like a shortcut in Hugging Face Transformers. Instead of writing all the steps manually (tokenization → feeding into model → postprocessing), the pipeline bundles everything into one callable object. I found out from the documentation that pipelines were designed to simplify the most common NLP tasks like sentiment analysis, text classification, and question answering

- `model=model, tokenizer=tokenizer`: Here I’m explicitly passing in the model and tokenizer I set up also `('bert-large-uncased-whole-word-masking-finetuned-squad')`. This is important because if I don’t specify, the pipeline will try to pick a default model for Q&A, which might not be the exact one I want. By providing my own, I have full control and consistency in results.

- What happens under the pipeline:
    - When I call the pipeline with a question and context, the tokenizer will encode them into the `[CLS] Question [SEP] Context [SEP] format`.
    - The **model** will then predict the start and end tokens that represent the answer span.
    - The pipeline will automatically convert those tokens back into human-readable text as the final answer.

So in just one line, I have a fully working **Q&A system**. Instead of manually coding the tokenization, model prediction, and decoding steps, the pipeline makes it easy to test and experiment quickly. That’s why Hugging Face calls pipelines the “fast track” to using state-of-the-art NLP models.

Sourced used : https://huggingface.co/docs/transformers/main_classes/pipelines

In [None]:
# Input context
context = input("Enter Context Article: ")
dedented_text = textwrap.dedent(context).strip()

print("\nContext Article:\n")
print(textwrap.fill(dedented_text, width=120))

Enter Context Article: MANILA – The government is arranging chartered flights for the repatriation of more than 200 overseas Filipino workers in Beirut, Lebanon, the Department of Migrant Workers (DMW) said Wednesday.  “We are trying to provide for chartered flights. We’re talking to airline companies so that the chartered flights would be able to accommodate for example, no less than 300 overseas Filipino workers from Beirut,” DMW Undersecretary Bernard Olalia said in a Palace press briefing.  This was after the scheduled flights of around 15 OFWs on Sept. 25 were cancelled because of the recent bombings in Beirut.  Olalia said around 111 OFWs are staying in four temporary shelters in Beirut and waiting for their repatriation.  An additional 110 OFWs are applying for exit permits from the Lebanese government, Olalia said.  “Apart from the documented OFWs, we have undocumented OFWs who need to secure travel documents and once they’re given travel documents, we will help them in securin

- `context = input("enter Context Article: ")`
Here, I’m letting the user provide a** context passage **directly from the keyboard. In Q&A tasks, the context is the text where the model will search for the answer. By using `input()`, I can dynamically test different passages instead of hardcoding them.

- `textwrap.dedent(context).strip()`
When I researched this, I learned that `textwrap.dedent()` removes any common leading whitespace from the text. This is useful if the context is copied from another source (like an article or a document) where indentation might mess up the formatting.
  - The `.strip()` at the end cleans up any extra whitespace at the start or end of the text.
  - Together, this ensures the context is clean and properly formatted before printing or passing it to the model
- `print("Context Article:\n") `and `print(textwrap.fill(dedented_text, width=120))`
Instead of just printing raw text, `textwrap.fill()` formats it so the text doesn’t run off the edge of the screen. By setting `width=120`, the output is wrapped neatly at 120 characters per line. This makes the passage easier to read inside the notebook, especially for long articles.


So what’s happening here is:
- I type or paste an article/passage as input.

- The code cleans and normalizes the formatting.

- It then prints the context in a reader-friendly format, making sure long lines don’t clutter the output.

This step doesn’t directly involve BERT yet, but it ensures that the context text is properly prepared before I pair it with a question for the Q&A pipeline.

Source Used: (https://docs.python.org/3/library/textwrap.html)

In [None]:
# Interactive Q&A loop
inquiry = input("\nType your question: ")
while (inquiry != '*'):
    start_time = time.time()
    answer = qna_pipeline({
        "question": inquiry,
        "context": context
    })
    end_time = time.time()

    elapsed = end_time - start_time
    print("\nAnswer found: " + answer['answer'])
    print("At Index:", answer['start'], "-", answer['end'])
    print("With Probability:", answer['score'], "\n")
    print(f"Time Elapsed: {elapsed:.4f} seconds")

    inquiry = input("Enter another question (* to stop): ")



Type your question: Who is arranging chartered flights for repatriation of more than 200 OFWs in Beirut?

Answer found: the Philippine government
At Index: 1093 - 1118
With Probability: 0.2613091766834259 

Time Elapsed: 10.0048 seconds
Enter another question (* to stop): How many OFWs did the DMW say they are trying to accommodate with the chartered flights?

Answer found: no less than 300
At Index: 352 - 368
With Probability: 0.3347918689250946 

Time Elapsed: 9.8940 seconds
Enter another question (* to stop): Why were the scheduled flights around Sept. 25 cancelled?

Answer found: the recent bombings in Beirut.
At Index: 570 - 601
With Probability: 0.32395607233047485 

Time Elapsed: 9.6469 seconds
Enter another question (* to stop): How many OFWs are staying in four temporary shelters in Beirut?

Answer found: 111
At Index: 621 - 624
With Probability: 0.6315323114395142 

Time Elapsed: 9.7508 seconds
Enter another question (* to stop): How many OFWs were applying for exit permits 

- `inquiry = input("\nType your question: ")`
This lets me type a question that I want the model to answer based on the context article I entered earlier. It’s interactive, so I can test different questions without restarting the program.

- `while (inquiry != '*'):`
The loop continues as long as I don’t type `*`. This gives me the flexibility to ask multiple questions about the same context in one run. Typing `*` is like a stop command to break out of the loop.

- `start_time = time.time() and end_time = time.time()`
These are used to measure **how long the pipeline takes** to find an answer. By subtracting the start from the end, I can calculate the execution time for each question. This is useful to evaluate the efficiency of the model, not just the accuracy.

- `answer = qna_pipeline({ "question": inquiry, "context": context })`
This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

This is the most important line. I pass a dictionary containing my question and the context passage to the Q&A pipeline.

What happens under the pipeline :  
- Tokenizes the input ([CLS] Question [SEP] Context [SEP])
- Runs it through the BERT model
- Predicts the **start** and **end** indices of the answer span in the context
- Decodes those indices back into text.

- The result is a dictionary with keys like` answer, start, end, and score`.

When the model returns its result, it provides several important pieces of information. The `answer['answer']` shows the actual text span that the model predicts as the correct answer. Along with this, `answer['start'] `and` answer['end']` indicate the exact character indices in the context where the answer is located, which makes it clear where the model “found” its response. The `answer['score']` represents the confidence level of the prediction, ranging from 0 to 1, with values closer to 1 meaning the model is more certain about its answer. I also calculate the `elapsed` time to measure how long the pipeline took to process the question, which is displayed with four decimal places for precision. Finally, after each loop, the program asks again for input allowing me to enter another question using the same context or type `*` to stop the interaction.

