
Building a Language Understanding and Question-Answering System with BERT

This project leverages BERT to build a robust question-answering system fine-tuned on the SQuAD dataset. The system aims to improve accuracy and efficiency in understanding and responding to queries.

Table of Contents

  • Introduction
  • Problem Statement
  • Methodology
  • Implementation
  • Steps to Use
  • Results
  • Project Management
  • References

Introduction

Natural Language Processing (NLP) is evolving rapidly, and question-answering (QA) systems are a crucial component of many services. This project focuses on enhancing QA systems with transformer-based models such as BERT, fine-tuned on the SQuAD dataset.

Problem Statement

Despite advancements in NLP, current QA systems face challenges in contextual understanding, handling ambiguous queries, and domain adaptability. This project aims to address these issues using BERT.

Methodology

Workflow Diagram

(Workflow diagram image in the repository)

Architecture

(Architecture diagram image in the repository)

Dataset

The dataset used is the Stanford Question Answering Dataset (SQuAD), containing over 100,000 question-answer pairs derived from more than 500 Wikipedia articles.
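
The repository's loading code is not reproduced in this README. As a rough sketch, reading the official SQuAD v1.1 JSON and flattening it into question/context/answer records might look like this (the file path is an assumption):

```python
import json

def load_squad(path):
    """Flatten the official SQuAD v1.1 JSON into question/context/answer records."""
    with open(path, "r", encoding="utf-8") as f:
        squad = json.load(f)

    examples = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # SQuAD v1.1 provides at least one answer per question.
                answer = qa["answers"][0]
                examples.append({
                    "question": qa["question"],
                    "context": context,
                    "answer_text": answer["text"],
                    "answer_start": answer["answer_start"],
                })
    return examples

# Example usage (path is an assumption; point it at the downloaded dataset):
# train_examples = load_squad("train-v1.1.json")
```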

Data Preparation

  • Tokenization
  • Cleaning
  • Normalization
  • Answer Mapping
  • Truncation & Padding
  • Attention Masking
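
A minimal sketch of these steps, assuming the fast tokenizer variant (BertTokenizerFast) so that character-level answer offsets can be mapped to token positions; the project's exact code may differ:

```python
import torch
from transformers import BertTokenizerFast

# BertTokenizerFast is an assumption (the README lists BertTokenizer); the fast
# variant exposes offset mappings, which make answer mapping straightforward.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def encode_example(question, context, answer_text, answer_start, max_length=384):
    """Tokenize a question/context pair and map the character-level answer span to token positions."""
    enc = tokenizer(
        question,
        context,
        max_length=max_length,
        truncation="only_second",   # truncate the context, never the question
        padding="max_length",       # pad every sequence to a fixed length
        return_offsets_mapping=True,
        return_tensors="pt",
    )

    answer_end = answer_start + len(answer_text)
    offsets = enc["offset_mapping"][0]
    sequence_ids = enc.sequence_ids(0)  # 0 = question tokens, 1 = context tokens

    # Default to position 0 (the [CLS] token) if the answer was truncated away.
    start_token = end_token = 0
    for idx, (start, end) in enumerate(offsets.tolist()):
        if sequence_ids[idx] != 1:
            continue  # skip question and special tokens
        if start <= answer_start < end:
            start_token = idx
        if start < answer_end <= end:
            end_token = idx

    enc.pop("offset_mapping")
    enc["start_positions"] = torch.tensor([start_token])
    enc["end_positions"] = torch.tensor([end_token])
    return enc  # includes input_ids, attention_mask, token_type_ids
```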

Implementation

Algorithm/Pseudocode

Detailed pseudocode for loading the model, preprocessing data, fine-tuning, and inference.
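
The full pseudocode is not reproduced here; a condensed, hedged sketch of the fine-tuning loop with BertForQuestionAnswering is shown below. Batch size, learning rate, and epoch count are illustrative assumptions, and train_dataset stands in for the encoded SQuAD features produced above:

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BertForQuestionAnswering

device = "cuda" if torch.cuda.is_available() else "cpu"
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased").to(device)

# `train_dataset` is assumed to yield batched-ready encoded features
# (input_ids, attention_mask, token_type_ids, start_positions, end_positions).
loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optimizer = AdamW(model.parameters(), lr=3e-5)  # assumed learning rate

model.train()
for epoch in range(2):  # assumed epoch count
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # span-prediction loss computed by the model head
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```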

Libraries Used

  • Preprocessing: string, re, collections (defaultdict)
  • Visualizations: Seaborn, Matplotlib, WordCloud, IPython.display
  • Model and Evaluation: Transformers, Torch, BertTokenizer, BertForQuestionAnswering

Integration of NLP Techniques

Combining BERT's tokenization, attention mechanisms, and transfer learning to improve QA performance.
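
At inference time these pieces come together in a single span-extraction step: the tokenizer encodes the question/context pair, the fine-tuned model produces start and end logits, and the highest-scoring span is decoded back to text. A minimal sketch, with the checkpoint name as a placeholder for the project's fine-tuned model:

```python
import torch
from transformers import BertForQuestionAnswering, BertTokenizerFast

# Checkpoint names are placeholders; the project loads its own fine-tuned weights.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
model.eval()

def answer_question(question, context):
    """Return the highest-scoring answer span predicted by the model."""
    enc = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)
    with torch.no_grad():
        out = model(**enc)                    # attention mask is passed along with the inputs
    start = int(out.start_logits.argmax())    # most likely start token
    end = int(out.end_logits.argmax())        # most likely end token
    tokens = enc["input_ids"][0][start:end + 1]
    return tokenizer.decode(tokens, skip_special_tokens=True)
```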

Steps to Use

  1. Select a topic you need information about:

(Screenshot: topic selection)

  2. Input a question in any natural phrasing, even with typos:

(Screenshot: question input)

  3. The system then retrieves the answer along with its supporting context:

(Screenshot: answer and context)
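
A rough sketch of this three-step flow in code; get_context_for_topic and answer_question are hypothetical stand-ins for the project's own retrieval and inference functions, not actual APIs from this repository:

```python
# Hypothetical sketch of the flow shown in the screenshots above.

def ask(topic, question):
    context = get_context_for_topic(topic)       # step 1: pick a topic / passage
    answer = answer_question(question, context)   # step 2: ask in free form
    return answer, context                        # step 3: answer plus supporting context

# Example usage (topic and question are illustrative):
# answer, context = ask("Normans", "Where did the Normans come from?")
```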

Results

Performance

  • Exact Match (EM): 23.1%
  • F1 Score: 34.07%
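
These scores follow the standard SQuAD evaluation: Exact Match checks whether the normalized prediction equals the normalized ground truth, and F1 measures token overlap between the two. A minimal sketch of the per-example computation:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase and strip punctuation, articles, and extra whitespace (standard SQuAD normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```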

Interesting Results

Examples of the model's ability to handle rephrased questions effectively.

(Screenshots of example runs with rephrased questions)

Project Management

Completed Work

  • BERT Model Integration
  • Data Pre-processing Pipeline
  • Model Training and Evaluation
  • User Interface Development

Issues Faced

  • Data Quality
  • Model Overfitting
  • Computational Resources
  • User Experience

References

  1. Adversarial QA
  2. Large QA Datasets
  3. Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.
  4. Radford, A., & Narasimhan, K. (2018). Improving Language Understanding by Generative Pre-Training.
  5. Peters, M.E., Ruder, S., & Smith, N.A. (2019). To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. ArXiv, abs/1903.05987.
  6. de Wynter, A., & Perry, D.J. (2020). Optimal Subarchitecture Extraction For BERT. ArXiv, abs/2010.10499.
  7. Wang, W., Bi, B., Yan, M., Wu, C., Bao, Z., Peng, L., & Si, L. (2019). StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. ArXiv, abs/1908.04577.
  8. Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. Conference on Empirical Methods in Natural Language Processing.
  9. Liu, X., Cheng, H., He, P., Chen, W., Wang, Y., Poon, H., & Gao, J. (2020). Adversarial Training for Large Neural Language Models. ArXiv, abs/2004.08994.
