A short project on the SQuAD v2.0 dataset.
- ✨ Trained on SQuAD v2.0 dataset
- 🔧 Question Answering with RoBERTa / BERT / ALBERT-v2
- 📦 Streamlit-based interactive QA demo
- About
- Demo
- Installation
- SQuAD v2.0 Statistics
- Preprocessing & Data Handling
- Model
- Experiments
- Next Steps
- References
This project demonstrates a Question Answering (QA) system trained on the SQuAD v2.0 dataset.
It can answer questions based on user-provided text and identify when no answer exists within the paragraph.
The project includes:
- A fine-tuned QA model
- A Streamlit web interface
- Easy-to-use inference pipeline (see the usage sketch below)
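Below is a minimal usage sketch of that inference pipeline, assuming a fine-tuned SQuAD v2.0 checkpoint has been saved locally (the path `models/albert-squad2` is a placeholder, not the repository's actual output path):

```python
from transformers import pipeline

# Placeholder checkpoint path; substitute the fine-tuned model produced by this project.
qa = pipeline("question-answering", model="models/albert-squad2")

context = (
    "The Normans were the people who in the 10th and 11th centuries "
    "gave their name to Normandy, a region in France."
)

# Answerable question: the answer span is extracted from the context.
print(qa(question="Which region did the Normans give their name to?", context=context))

# Unanswerable question: with handle_impossible_answer=True the pipeline can
# return an empty answer, signalling that no answer exists in the paragraph.
print(qa(
    question="What language did the Normans speak?",
    context=context,
    handle_impossible_answer=True,
))
```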
git clone https://github.com/dionvou/squad.git
cd squad
SQuAD v2.0 can be downloaded at:
🔗 https://rajpurkar.github.io/SQuAD-explorer/
Or, use the provided script included in the repository:
chmod +x download.sh
./download.sh
The SQuAD v2.0 dataset contains a mixture of answerable and unanswerable questions, which makes it more challenging than v1.1.
- Training set: 130,319 questions
- Answerable: 86,821 (~67%)
- Unanswerable: 43,498 (~33%)
- Development set: 11,873 questions
- Answerable: 5,928 (~50%)
- Unanswerable: 5,945 (~50%)
This dataset introduces unanswerable questions to train models to identify when no answer exists. To handle the varying lengths of contexts, questions, and answers during tokenization, we analyzed the distributions of token lengths using a BERT tokenizer.
The plot shows the number of BERT tokens for:
- Context: The full paragraph
- Question: Each question text
- Answer: Each answer span
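A minimal sketch of how these token-length statistics can be computed, assuming the Hugging Face `datasets` and `transformers` libraries (plotting is left out):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
squad = load_dataset("squad_v2", split="train")

# Token counts per field; unanswerable questions have an empty answer list.
context_lens = [len(tokenizer.tokenize(ex["context"])) for ex in squad]
question_lens = [len(tokenizer.tokenize(ex["question"])) for ex in squad]
answer_lens = [
    len(tokenizer.tokenize(ex["answers"]["text"][0]))
    for ex in squad
    if ex["answers"]["text"]
]

print(max(context_lens), max(question_lens), max(answer_lens))
```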
- Used Hugging Face tokenizers with `return_overflowing_tokens=True` to handle long contexts (see the sketch after this list)
- Split long paragraphs into smaller chunks to avoid truncation
- Initially, every chunk that did not contain an answer was labeled `0` (impossible)
- This created a heavy class imbalance and caused the model to overfit on predicting zeros
- To fix this:
  - Removed chunks of answerable questions that no longer contained the answer after splitting
  - Kept only the answer-containing chunks of answerable questions for training
- Result: a better balance between answerable and unanswerable examples and reduced model collapse
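A condensed sketch of this chunking and filtering step, assuming a Hugging Face fast tokenizer; the function and variable names are illustrative rather than the project's actual code:

```python
def prepare_chunks(examples, tokenizer, max_length=384, stride=128):
    # Tokenize question + context, letting long contexts overflow into several chunks.
    tokenized = tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    keep = []  # which chunks survive filtering
    sample_map = tokenized["overflow_to_sample_mapping"]
    for i, offsets in enumerate(tokenized["offset_mapping"]):
        answers = examples["answers"][sample_map[i]]
        if not answers["text"]:
            # Truly unanswerable question: keep the chunk, labeled impossible.
            keep.append(True)
            continue
        # Character span of the gold answer in the original context.
        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])
        seq_ids = tokenized.sequence_ids(i)
        ctx_start = seq_ids.index(1)
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)
        # Keep chunks of answerable questions only if their window contains
        # the answer; dropping the rest avoids flooding the data with zeros.
        keep.append(offsets[ctx_start][0] <= start_char and offsets[ctx_end][1] >= end_char)
    return keep
```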
We test the base variants of BERT, RoBERTa, ALBERT, DistilBERT, and SpanBERT.
To evaluate our models and select the best-performing architecture, we conducted a series of controlled experiments.
Due to time constraints, all evaluations were performed on a development split created from the original training set, using an 80% / 20% train–validation split.
After determining the strongest model, we retrained it on the full SQuAD training dataset.
All experiments employed early stopping with a patience of 3 epochs to prevent overfitting and reduce training time. The models were trained using a learning rate of 1e-5, a batch size of 64, a maximum sequence length of 384 tokens, and a document stride of 128.
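A minimal sketch of this setup with the Hugging Face `Trainer`; the checkpoint name, epoch cap, dataset variables, and output directory are assumptions, not the exact values used in the project:

```python
from transformers import (
    AutoModelForQuestionAnswering,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model = AutoModelForQuestionAnswering.from_pretrained("albert-base-v2")

args = TrainingArguments(
    output_dir="qa-albert",              # assumed output directory
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    num_train_epochs=10,                 # assumed upper bound; early stopping halts sooner
    load_best_model_at_end=True,         # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_features,        # chunks built with max_length=384, stride=128
    eval_dataset=val_features,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```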
The results follow the trends observed in previous literature:
ALBERT consistently achieves the highest validation EM and F1 scores among the tested models.
Based on this outcome, we select ALBERT as the architecture for full fine-tuning on the complete dataset.
In future work, we aim to further enhance the performance of our QA system by implementing techniques from the paper “Retrospective Reader for Machine Reading Comprehension” (arXiv:2001.09694). This method combines a sketchy reading stage, which makes a coarse judgment of whether a question is answerable, with an intensive reading stage that verifies that judgment while predicting the answer span, which is particularly helpful for the unanswerable questions in SQuAD v2.0. By incorporating this approach, we hope to achieve state-of-the-art performance on SQuAD v2.0.
Additionally, we plan to explore model ensembling to boost overall accuracy.
- Know What You Don't Know: Unanswerable Questions for SQuAD
  Rajpurkar et al., 2018 – Introduces unanswerable questions in SQuAD 2.0 to improve model robustness.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Devlin et al., 2019 – Introduces BERT, a deeply bidirectional transformer for language understanding tasks.
- Question Answering on SQuAD 2.0: BERT Is All You Need
  Schwager et al., 2019 – Explores using BERT for SQuAD 2.0 and shows strong QA performance.
- Really Paying Attention: A BERT + BiDAF Ensemble Model for Question Answering
  Yin et al., 2019 – Combines BERT with BiDAF in an ensemble to enhance QA accuracy on SQuAD.



