TensorFlow Implementation of R-Net
Many known problems are caused by mismatched software versions. Please check your versions before opening an issue or emailing me.


Requirements

  • Python >= 3.4
  • unzip, wget

Python Packages

  • tensorflow-gpu >= 1.5.0
  • spaCy >= 2.0.0
  • tqdm
  • ujson


To download and preprocess the data, run

# download SQuAD and GloVe
sh download.sh
# preprocess the data
python config.py --mode prepro

Hyperparameters are stored in config.py. To debug/train/test the model, run

python config.py --mode debug/train/test

To get the official score, run

python evaluate-v1.1.py ~/data/squad/dev-v1.1.json log/answer/answer.json

The default directory for the TensorBoard log files is log/event.

See the release page for the trained model.

Detailed Implementation

  • The original paper uses additive attention, which consumes a lot of memory. This project instead adopts the scaled multiplicative attention presented in Attention Is All You Need.
  • This project adopts the variational dropout presented in A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
  • To mitigate the degradation problem in stacked RNNs, the outputs of all layers are concatenated to produce the final output.
  • When the loss on the dev set increases over a certain period, the learning rate is halved.
  • During prediction, the project adopts the search method presented in Machine Comprehension Using Match-LSTM and Answer Pointer.
  • To address efficiency issues, this implementation uses a bucketing method (contributed by xiongyifan) and CudnnGRU. Bucketing speeds up training but lowers the F1 score by about 0.3%.
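The scaled multiplicative attention mentioned above can be sketched in a few lines of NumPy. This is an illustrative sketch of the technique, not the repository's actual func.py code; the function name and shapes are made up for the example.

```python
import numpy as np

def scaled_dot_attention(query, keys, values):
    """Scaled multiplicative attention (Attention Is All You Need).

    query: (d,), keys: (n, d), values: (n, d).
    Returns an attention-weighted sum of the values.
    """
    d = query.shape[-1]
    # Dot-product scores, scaled by sqrt(d).
    scores = keys @ query / np.sqrt(d)
    # Numerically stable softmax over the n positions.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values
```

Scaling by sqrt(d) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients; compared with additive attention, no extra projection activations need to be materialized, which is where the memory saving comes from.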
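Variational dropout differs from standard dropout in one detail: a single mask is sampled per sequence and reused at every time step, instead of being resampled per step. A minimal NumPy sketch, assuming inputs shaped (time, batch, dim) and a hypothetical helper name:

```python
import numpy as np

def variational_dropout(x, rate, rng):
    """Apply the same dropout mask at every time step.

    x: (time, batch, dim). One (batch, dim) mask is sampled once and
    broadcast over the time axis; inverted-dropout scaling keeps the
    expected activation unchanged.
    """
    keep = 1.0 - rate
    mask = rng.binomial(1, keep, size=x.shape[1:]) / keep
    return x * mask  # mask broadcasts over the time dimension
```

Because the zeroed units stay zeroed for the whole sequence, the recurrent weights see a consistent sub-network per example, which is the theoretically grounded part of the cited paper.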
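The prediction-time span search (see the inference.py commit note about taking the upper triangle of the probability matrix) can be sketched as: take the outer product of the start and end probability vectors, keep only entries with end >= start within a span-length limit, and pick the argmax. This is an illustrative sketch; best_span and max_len are hypothetical names, not the repo's API.

```python
import numpy as np

def best_span(p_start, p_end, max_len=15):
    """Return (i, j) with i <= j < i + max_len maximizing p_start[i] * p_end[j]."""
    outer = np.outer(p_start, p_end)
    # Upper triangle keeps end >= start; subtracting the k=max_len triangle
    # removes spans longer than the limit.
    outer = np.triu(outer) - np.triu(outer, k=max_len)
    i, j = np.unravel_index(outer.argmax(), outer.shape)
    return int(i), int(j)
```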



Performance

                EM     F1
original paper  71.1   79.5
this project    71.07  79.51

Training Time (s/it)

         Native  Native + Bucket  Cudnn  Cudnn + Bucket
E5-2640  6.21    3.56             -      -
TITAN X  2.56    1.31             0.41   0.28


The following settings may increase the score but are not used in the model by default. You can turn them on in config.py.