Attention over Attention

Implementation of the paper [Attention-over-Attention Neural Networks for Reading Comprehension](https://arxiv.org/abs/1607.04423) in TensorFlow.

There is some additional context on my blog.

In cloze-style reading comprehension, a word is removed from an article summary; the model then reads the article and tries to infer the missing word. This implementation works on the CNN news dataset.
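
For illustration only (this made-up sample is not taken from the actual dataset), a CNN-style cloze example pairs an entity-anonymised document with a query containing a placeholder:

```python
# Purely illustrative cloze-style sample (entities anonymised as in the CNN dataset):
document = "@entity1 scored twice as @entity2 beat @entity3 in saturday 's final ..."
query = "@placeholder scored twice in the final"  # summary sentence with one word removed
answer = "@entity1"                               # must be inferred from the document
```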

With the same hyperparameters as reported in the paper, this implementation reached an accuracy of 74.3% on both the validation and test sets, compared with the 73.1% and 74.4% reported by the authors.

To train a new model: `python model.py --training=True --name=my_model`

To test accuracy: `python model.py --training=False --name=my_model --epochs=1 --dropout_keep_prob=1`

Note that the .tfrecords and model files are stored with Git LFS.

Raw data for producing the .tfrecords files with `reader.py` was downloaded from http://cs.nyu.edu/~kcho/DMQA/.

Interesting parts

  • Masked softmax implementation (see the first sketch after this list)
  • Example of batched sparse tensors with correct mask handling
  • Example of pointer-style ("attention sum") attention (see the second sketch below)
  • Test/validation split handled as part of the TensorFlow graph
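
Below is a minimal sketch of a masked softmax, written against current TensorFlow APIs; it illustrates the idea rather than reproducing the exact code in `model.py`. Padded positions are pushed to a large negative logit so they receive effectively zero probability mass.

```python
import tensorflow as tf

def masked_softmax(logits, mask, axis=-1):
    """Softmax over `logits` that ignores positions where `mask` is 0.

    `mask` broadcasts against `logits`: 1.0 for real tokens, 0.0 for padding.
    """
    neg_inf = -1e30
    masked_logits = logits * mask + neg_inf * (1.0 - mask)
    # Numerically stable softmax over the requested axis.
    exp = tf.exp(masked_logits - tf.reduce_max(masked_logits, axis=axis, keepdims=True))
    exp = exp * mask  # force exactly zero weight on padded positions
    return exp / (tf.reduce_sum(exp, axis=axis, keepdims=True) + 1e-13)
```

And a sketch of the attention-over-attention step followed by the pointer-style ("attention sum") readout. This is an illustration under my own naming and shape conventions, not the repository's actual implementation:

```python
def attention_over_attention(doc_enc, query_enc, doc_mask, query_mask):
    """AoA sketch. Shapes:
    doc_enc   [batch, doc_len, hidden], query_enc  [batch, q_len, hidden]
    doc_mask  [batch, doc_len],         query_mask [batch, q_len]
    Returns document-level attention s with shape [batch, doc_len].
    """
    # Pairwise matching scores M[b, i, j] = <doc_i, query_j>.
    M = tf.matmul(doc_enc, query_enc, transpose_b=True)           # [batch, doc_len, q_len]

    # Column-wise softmax over document positions (query-to-document attention).
    alpha = masked_softmax(M, doc_mask[:, :, None] * tf.ones_like(M), axis=1)
    # Row-wise softmax over query positions (document-to-query attention).
    beta = masked_softmax(M, query_mask[:, None, :] * tf.ones_like(M), axis=2)

    # Average beta over real document positions to get one query-level attention vector.
    beta = beta * doc_mask[:, :, None]
    doc_len = tf.reduce_sum(doc_mask, axis=1, keepdims=True)       # [batch, 1]
    beta_avg = tf.reduce_sum(beta, axis=1) / doc_len               # [batch, q_len]

    # Weight the columns of alpha by beta_avg: s = alpha . beta_avg.
    return tf.einsum('bij,bj->bi', alpha, beta_avg)                # [batch, doc_len]

def attention_sum(s, doc_word_ids, vocab_size):
    """Pointer-style readout for a single example: attention mass for repeated
    words is summed, giving a distribution over the vocabulary."""
    # s: [doc_len] attention weights, doc_word_ids: [doc_len] int32 vocab ids.
    return tf.math.unsorted_segment_sum(s, doc_word_ids, vocab_size)
```

For a whole batch, the attention-sum step is typically done by offsetting each example's word ids by `batch_index * vocab_size` (or with `tf.map_fn`), which is where the careful handling of batched sparse tensors and masks comes in.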