The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. You can download this dataset here
SQuAD 1.1: the previous version of the SQuAD dataset, containing 100,000+ question-answer pairs on 500+ articles.
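Since every answer is a span of the passage, each training example boils down to a context, a question, and a pair of character offsets. The snippet below is a minimal sketch of reading that layout. The field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`) follow the published SQuAD JSON files; the inline sample and the helper name `flatten_squad` are made up for illustration and are not taken from this repository.

```python
import json

# A tiny inline sample in the SQuAD JSON layout (illustrative content, not real dataset rows).
SAMPLE = json.loads("""
{
  "version": "1.1",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "The Eiffel Tower is located in Paris.",
      "qas": [{
        "id": "q1",
        "question": "Where is the Eiffel Tower located?",
        "answers": [{"text": "Paris", "answer_start": 31}]
      }]
    }]
  }]
}
""")


def flatten_squad(dataset):
    """Yield one (context, question, answer_text, start, end) tuple per question."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                if not answers:
                    # Unanswerable question: there is no span to point at.
                    yield context, qa["question"], None, None, None
                    continue
                first = answers[0]
                start = first["answer_start"]
                end = start + len(first["text"])  # exclusive character offset
                yield context, qa["question"], first["text"], start, end


rows = list(flatten_squad(SAMPLE))
for context, question, answer, start, end in rows:
    print(question, "->", answer, (start, end))
```

To use it on the real files, replace `SAMPLE` with `json.load(open(path))`, where `path` points at the downloaded dataset file.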
The objective is to predict the right answer for the given question and context.
Implemented the Stanford Attentive Reader model using Keras. Please refer to this paper.
BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
Please refer to this research paper.
- Most of the code is taken from the google-research GitHub account.
- The BERT model is fine-tuned only.
- The code was modified as necessary.
- Used the BERT base model with 110M parameters.
- All references are listed in the references section.
For the ipynb notebook, please check the bert folder.
I have written a detailed post about this on Medium. You can read it here
- Obtained a micro f1_score of 40.33% on test data.
- Aligned question embedding and f_exact match were found to be the most effective features, as mentioned in the paper.
- The f1_score can be further improved by adding the aligned question embedding feature to the context.
- Aligned question embedding was omitted due to computational power limits.
- Training for 1 epoch took around an hour without aligned question embedding.
- With aligned question embedding, training for 1 epoch took more than 5 hours, which is why it was omitted.
- Performance can be improved further by:
  - Using all data points.
  - Using 128 units and 3 layers of Bi-LSTM, as mentioned in the paper.
  - Using aligned question embedding and f_exact together.
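The two features discussed above can be sketched in plain Python. In the paper, f_exact marks whether a context token appears in the question, and the aligned question embedding gives each context token an attention-weighted sum of the question word embeddings, with attention scores computed from a shared dense layer with ReLU. This is only an illustration under assumptions: it covers just the lowercase variant of exact match, uses toy 2-d embeddings with an identity projection, and the function names are hypothetical rather than taken from the Keras code in this repository.

```python
import math


def exact_match_feature(context_tokens, question_tokens):
    """1 if the lowercased context token also appears in the question, else 0."""
    question_vocab = {tok.lower() for tok in question_tokens}
    return [1 if tok.lower() in question_vocab else 0 for tok in context_tokens]


def relu_dense(vec, weights):
    """Single dense layer with ReLU; weights is a list of rows (out_dim x in_dim)."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weights]


def aligned_question_embedding(context_emb, question_emb, weights):
    """For each context token, return an attention-weighted sum of question embeddings."""
    proj_q = [relu_dense(q, weights) for q in question_emb]
    dim = len(question_emb[0])
    aligned = []
    for p in context_emb:
        proj_p = relu_dense(p, weights)
        # Dot-product similarity between this context token and every question token.
        scores = [sum(a * b for a, b in zip(proj_p, pq)) for pq in proj_q]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
        total = sum(exps)
        attn = [e / total for e in exps]
        aligned.append(
            [sum(a * q[d] for a, q in zip(attn, question_emb)) for d in range(dim)]
        )
    return aligned


context_tokens = ["The", "tower", "is", "in", "Paris"]
question_tokens = ["Where", "is", "the", "tower", "?"]
f_exact = exact_match_feature(context_tokens, question_tokens)

# Toy 2-d embeddings and an identity projection (illustrative values only).
context_emb = [[1.0, 0.0], [0.0, 1.0]]
question_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
f_align = aligned_question_embedding(context_emb, question_emb, identity)
print(f_exact, f_align)
```

The aligned feature compares every context token with every question token, so its cost grows with the product of the two lengths, whereas f_exact is a simple set lookup per token.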
- Fine-tuned the BERT uncased state-of-the-art model to get the results.
- BERT model results were obtained using a TPU provided by Google.
- The Stanford Question Answering Dataset by Rajpurkar
- Reading Wikipedia to Answer Open-Domain Questions