- A TensorFlow implementation of R-NET: Machine Reading Comprehension with Self-Matching Networks. This project is designed specifically for the SQuAD dataset.
- If you have any questions, please contact Wenxuan Zhou (firstname.lastname@example.org).
Many known problems have been caused by mismatched software versions. Please check your versions against the requirements below before opening issues or emailing me.
- Python >= 3.4
- unzip, wget
- tensorflow-gpu >= 1.5.0
- spaCy >= 2.0.0
To download and preprocess the data, run
```bash
# download SQuAD and GloVe
sh download.sh
# preprocess the data
python config.py --mode prepro
```
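Hyperparameters are stored in config.py. To debug, train, or test the model, run config.py with the corresponding mode:
```bash
python config.py --mode debug
python config.py --mode train
python config.py --mode test
```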
To get the official score, run
```bash
python evaluate-v1.1.py ~/data/squad/dev-v1.1.json log/answer/answer.json
```
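For reference, the official SQuAD v1.1 score consists of exact match (EM) and token-level F1, both computed after normalizing the predicted and reference strings (lower-casing, removing punctuation and the articles "a", "an", "the", and collapsing whitespace), with each prediction scored against its best-matching reference answer. The sketch below re-implements that logic for illustration only; the function names are hypothetical and not imported from evaluate-v1.1.py, which remains the script to use for reported numbers.

```python
import collections
import re
import string


def normalize_answer(s: str) -> str:
    """Lower-case, drop punctuation and English articles, collapse whitespace."""
    exclude = set(string.punctuation)
    s = s.lower()
    s = "".join(ch for ch in s if ch not in exclude)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts tokens shared by prediction and reference.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_ground_truths(metric, prediction, ground_truths):
    # Dev questions can have several reference answers; keep the best score.
    return max(metric(prediction, gt) for gt in ground_truths)
```

For example, `f1_score("the cat sat", "a cat sat down")` compares "cat sat" with "cat sat down" after normalization, giving precision 1 and recall 2/3, hence F1 = 0.8.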
The default directory for the TensorBoard log files is
See the releases page for a trained model.
- The original paper uses additive attention, which consumes a lot of memory. This project instead adopts the scaled multiplicative attention presented in Attention Is All You Need.
- This project adopts the variational dropout presented in A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
- To mitigate the degradation problem in stacked RNNs, the outputs of all layers are concatenated to produce the final output.
- When the loss on the dev set increases over a certain period, the learning rate is halved.
- During prediction, the project adopts the search method presented in Machine Comprehension Using Match-LSTM and Answer Pointer.
- To improve efficiency, this implementation uses a bucketing method (contributed by xiongyifan) and CudnnGRU. Bucketing speeds up training but lowers the F1 score by 0.3%.
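Scaled multiplicative (dot-product) attention computes softmax(QKᵀ/√d)V. In a typical batched implementation, additive attention evaluates tanh(Wq + Uk) for every query/key pair and therefore materializes a [len_q, len_k, hidden] tensor, while the multiplicative form only needs a [len_q, len_k] score matrix, which is where the memory saving comes from. The pure-Python sketch below shows the generic formula from Attention Is All You Need for a single unmasked sequence pair; it is not this project's TensorFlow code, which may add projections or masking.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d)) V, one output row per query (no masking)."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # One scaled dot-product score per key position.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention output: weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```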
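Variational dropout samples a single dropout mask per sequence and reuses it at every time step, rather than drawing a fresh mask per step. In TensorFlow 1.x one common way to get this behaviour is to pass `tf.nn.dropout` a `noise_shape` whose time dimension is 1; the dependency-free sketch below shows the same idea on a [time_steps][features] list and is an illustration, not the project's code.

```python
import random
from typing import List

Steps = List[List[float]]  # [time_steps][features]


def variational_dropout(seq: Steps, keep_prob: float, rng: random.Random) -> Steps:
    """Apply one dropout mask, sampled once, to every time step of `seq`."""
    num_features = len(seq[0])
    # Inverted dropout: kept units are scaled by 1/keep_prob so expectations are unchanged.
    mask = [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for _ in range(num_features)]
    # The same mask multiplies every time step.
    return [[x * m for x, m in zip(step, mask)] for step in seq]
```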
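Concatenating the outputs of every layer gives the final representation direct access to shallow as well as deep features, instead of only the top layer's output. The sketch below shows that wiring with plain Python lists; `toy_recurrent_layer` is a hypothetical stand-in for a GRU (an elementwise h_t = tanh(x_t + h_{t-1}) recurrence), used only to make the example runnable.

```python
import math
from typing import Callable, List

Steps = List[List[float]]  # [time_steps][features]
Layer = Callable[[Steps], Steps]


def toy_recurrent_layer(seq: Steps) -> Steps:
    """Hypothetical stand-in for a GRU: h_t = tanh(x_t + h_{t-1}), elementwise."""
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        h = [math.tanh(xi + hi) for xi, hi in zip(x, h)]
        out.append(h)
    return out


def stacked_rnn_concat(inputs: Steps, layers: List[Layer]) -> Steps:
    """Feed each layer the previous layer's output, then concatenate
    all layer outputs along the feature axis at every time step."""
    outputs = []
    x = inputs
    for layer in layers:
        x = layer(x)
        outputs.append(x)
    # Step t becomes [layer1_t, layer2_t, ..., layerN_t] flattened.
    return [sum((out[t] for out in outputs), []) for t in range(len(inputs))]
```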
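The learning-rate rule can be expressed as a small scheduler that is called after each dev evaluation. This is a minimal sketch assuming a hypothetical `patience` parameter (the number of evaluations without improvement before halving); the project's own bookkeeping in config.py or its training loop may differ.

```python
class HalveOnPlateau:
    """Halve the learning rate when the dev loss has not improved
    for `patience` consecutive evaluations (hypothetical helper)."""

    def __init__(self, lr: float, patience: int = 3):
        self.lr = lr
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_evals = 0

    def step(self, dev_loss: float) -> float:
        if dev_loss < self.best_loss:
            self.best_loss = dev_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        if self.bad_evals >= self.patience:
            self.lr /= 2.0
            self.best_loss = dev_loss  # restart the comparison from the current loss
            self.bad_evals = 0
        return self.lr
```

With `patience=2` and dev losses 1.0, 1.1, 1.2, 1.0, the returned learning rates starting from 0.5 are 0.5, 0.5, 0.25, 0.25.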
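The boundary search picks the span (s, e) that maximizes p_start(s) × p_end(e) subject to s ≤ e and a maximum answer length, instead of taking the two argmaxes independently (which can yield an end before the start). The brute-force sketch below assumes a hypothetical `max_len` of 15 tokens; the limit actually used by this project is a setting in config.py.

```python
from typing import List, Tuple


def best_span(p_start: List[float], p_end: List[float], max_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing p_start[s] * p_end[e]
    subject to s <= e < s + max_len."""
    best = (0, 0, -1.0)
    for s, ps in enumerate(p_start):
        for e in range(s, min(s + max_len, len(p_end))):
            score = ps * p_end[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

With `p_start = [0.1, 0.6, 0.3]` and `p_end = [0.7, 0.1, 0.2]`, independent argmaxes would give start 1 and end 0, an invalid span, whereas the search returns start 1 and end 2. The double loop costs O(n × max_len); a batched version can compute the outer product of the two distributions and keep only its upper band.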
Training Time (s/it)

| Native | Native + Bucket | Cudnn | Cudnn + Bucket |
|---|---|---|---|
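Bucketing speeds up training by batching together examples of similar length, so less computation is spent on padding. The sketch below groups example indices into length buckets of a hypothetical `bucket_width`, batches within each bucket, and shuffles the batch order; it illustrates the general technique and is not the contributed implementation. The trade-off is that batches are no longer drawn uniformly from the whole dataset.

```python
import random
from typing import Dict, List, Sequence


def bucket_batches(lengths: Sequence[int], batch_size: int,
                   bucket_width: int, seed: int = 0) -> List[List[int]]:
    """Return batches of example indices; each batch holds examples
    whose lengths fall in the same bucket."""
    buckets: Dict[int, List[int]] = {}
    for idx, n in enumerate(lengths):
        buckets.setdefault(n // bucket_width, []).append(idx)
    rng = random.Random(seed)
    batches: List[List[int]] = []
    for key in sorted(buckets):
        ids = buckets[key]
        rng.shuffle(ids)  # shuffle within the bucket
        batches.extend(ids[i:i + batch_size] for i in range(0, len(ids), batch_size))
    rng.shuffle(batches)  # randomize batch order across buckets
    return batches
```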
These settings may increase the score but are not used in the model by default. You can turn these settings on in