Code for the 2018 EMNLP Interpretability Workshop Paper "Interpreting Neural Networks with Nearest Neighbors"

Deep k-Nearest Neighbors and Interpretable NLP

This is the official code for the 2018 EMNLP Interpretability Workshop paper, Interpreting Neural Networks with Nearest Neighbors.

This repository contains the code for:

  • Deep k-Nearest Neighbors (DkNN) for text classification models. Supports pretrained word vectors, character-level models, etc. on a number of datasets.
  • Saliency map techniques for NLP, such as leave-one-out and gradient. Also includes our conformity leave-one-out method.
  • Visualizations like the ones on our paper's supplementary website.
  • Temperature scaling as described in On Calibration of Modern Neural Networks.
  • SNLI interpretations.


This code is written in Python using the highly underrated Chainer framework. If you know PyTorch, you will love it =).

Dependencies include:

If you want to do efficient nearest neighbor lookup:

  • Scikit-Learn (for KDTree)
  • nearpy (for locality-sensitive hashing)

If you want to visualize saliency maps:

  • matplotlib

This code is built off Chainer's text classification example. See their documentation and code to understand the basic layout of our project.


To train a model:

python --dataset stsa.binary --model cnn

The output directory result contains:

  • best_model.npz: a snapshot of the model that achieved the best validation accuracy during training
  • vocab.json: the model's vocabulary dictionary as a JSON file
  • args.json: the model's setup as a JSON file, which also contains the paths of the model and vocabulary
  • calib.json: the indices of the held-out training data that will be used to calibrate the DkNN model

To run a model with and without DkNN:

python --model-setup results/DATASET_MODEL/args.json
  • Where results/DATASET_MODEL/args.json is the argument log that is generated after training a model
  • This command will store the activations for all of the training data into a KDTree, calibrate the credibility values, and run the model with and without DkNN.
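To make the steps above concrete, here is a minimal sketch of the DkNN idea in plain Python. It is not the code in this repository: it uses brute-force search instead of a KDTree, a single toy "layer" of 2-D activations, and the nonconformity and p-value formulation from the original Deep k-Nearest Neighbors paper. All function names and the toy data are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_labels(query, train_acts, train_labels, k):
    """Labels of the k training points closest to `query` (brute force)."""
    order = sorted(range(len(train_acts)),
                   key=lambda i: sq_dist(query, train_acts[i]))
    return [train_labels[i] for i in order[:k]]


def all_neighbor_labels(query_layers, train_layers, train_labels, k):
    """Collect neighbor labels from every layer's activation space."""
    labels = []
    for query, acts in zip(query_layers, train_layers):
        labels.extend(knn_labels(query, acts, train_labels, k))
    return labels


def nonconformity(neighbor_labels, label):
    """Number of neighbors that disagree with the candidate label."""
    return sum(1 for l in neighbor_labels if l != label)


def dknn_predict(query_layers, train_layers, train_labels,
                 calib_scores, classes, k):
    """Return (predicted label, credibility) for one input."""
    neighbors = all_neighbor_labels(query_layers, train_layers, train_labels, k)
    p_values = {}
    for c in classes:
        alpha = nonconformity(neighbors, c)
        # Fraction of calibration examples at least as nonconforming.
        p_values[c] = sum(1 for s in calib_scores if s >= alpha) / len(calib_scores)
    pred = max(classes, key=lambda c: p_values[c])
    return pred, p_values[pred]


# Toy data: one layer of 2-D activations for six training examples.
train_layers = [[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                 (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]]
train_labels = [0, 0, 0, 1, 1, 1]
K = 3

# Held-out calibration examples as (per-layer activations, true label).
calib = [([(0.05, 0.05)], 0), ([(1.05, 1.05)], 1), ([(0.5, 0.4)], 1)]
calib_scores = [
    nonconformity(all_neighbor_labels(layers, train_layers, train_labels, K), y)
    for layers, y in calib
]

pred, credibility = dknn_predict([(0.02, 0.03)], train_layers, train_labels,
                                 calib_scores, [0, 1], K)
print(pred, credibility)
```

With real models, each entry of `train_layers` would hold one hidden layer's activations for the whole training set, and the brute-force search would be replaced by a KDTree or a hashing-based index.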

Word Vectors

In our paper, we used GloVe word vectors, though any pretrained vectors should work fine (word2vec, fastText, etc.). To obtain GloVe vectors, run the following commands.


Then pass the pretrained vectors in using the argument --word_vectors glove.840B.300d.txt when training a model using

Temperature Scaling contains the temperature scaling implementation.
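For readers new to the technique, the sketch below shows the core of temperature scaling: divide the logits by a single scalar T before the softmax, and choose T to minimize the negative log-likelihood on held-out data. It is an illustration only, not the implementation in this repository. The grid search is a simple stand-in for a gradient-based optimizer, and the function names and toy logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a scalar temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at this temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest held-out NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Toy held-out set: the model is very confident but wrong on one of four examples.
val_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
val_labels = [0, 1, 1, 0]

candidates = [0.5 + 0.1 * i for i in range(96)]  # 0.5, 0.6, ..., 10.0
temperature = fit_temperature(val_logits, val_labels, candidates)
print(temperature, softmax([4.0, 0.0], temperature))
```

Because every logit is divided by the same positive scalar, the predicted class never changes; only the confidence does.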

Interpretations and Visualizations

All of the code for generating interpretations using leave one out (conformity, confidence, or calibrated confidence) and first-order gradient is contained in See the code for details on running with the desired settings. You should first train a model (see above), and then pass that in.
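As a rough illustration of leave-one-out, the sketch below removes each word in turn and records how much a score drops. The score can be any scalar the model assigns to its prediction (confidence, calibrated confidence, or DkNN conformity). The `toy_confidence` scorer and its word weights are hypothetical stand-ins for a trained model, and none of these names come from this repository.

```python
import math


def leave_one_out(tokens, score_fn):
    """Saliency of each token = score(full input) - score(input without it)."""
    base = score_fn(tokens)
    saliency = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]  # drop the i-th token
        saliency.append(base - score_fn(reduced))
    return saliency


# Hypothetical stand-in for a trained classifier's positive-class confidence.
TOY_WEIGHTS = {"great": 2.0, "boring": -2.0}


def toy_confidence(tokens):
    """Sigmoid of the summed word weights; unknown words contribute 0."""
    total = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-total))


tokens = ["a", "great", "movie"]
saliency = leave_one_out(tokens, toy_confidence)
for token, value in zip(tokens, saliency):
    print(token, round(value, 3))
```

A large positive value means that removing the word hurts the score, so the word matters to the prediction.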

The code for visualization is also present in


Please consider citing [1] if you found this code or our work beneficial to your research.

Interpreting Neural Networks with Nearest Neighbors

[1] Eric Wallace, Shi Feng, and Jordan Boyd-Graber, Interpreting Neural Networks with Nearest Neighbors.

  title={Interpreting Neural Networks with Nearest Neighbors},
  author={Eric Wallace and Shi Feng and Jordan Boyd-Graber},
  journal={arXiv preprint arXiv:1809.02847},  


For issues with code or suggested improvements, feel free to open a pull request.

To contact the authors, reach out to Eric Wallace and Shi Feng.