Deep k-Nearest Neighbors and Interpretable NLP

This is the official code for the 2018 EMNLP Interpretability Workshop paper, Interpreting Neural Networks with Nearest Neighbors.

This repository contains the code for:

  • Deep k-Nearest Neighbors (DkNN) for text classification models. Supports pretrained word vectors, character-level models, etc. on a number of datasets.
  • Saliency map techniques for NLP, such as leave one out and gradient. Also includes our conformity leave one out method.
  • Code to create visualizations like the ones on our paper's supplementary website.
  • Temperature scaling, as described in "On Calibration of Modern Neural Networks".
  • SNLI interpretations


This code is written in Python using the highly underrated Chainer framework. If you know PyTorch, you will love it =).

Dependencies include:

If you want to do efficient nearest neighbor lookup:

  • Scikit-Learn (for KDTree)
  • nearpy (for locality-sensitive hashing)

If you want to visualize saliency maps:

  • matplotlib

This code is built off Chainer's text classification example. See their documentation and code to understand the basic layout of our project.


To train a model:

python --dataset stsa.binary --model cnn

The output directory result contains:

  • best_model.npz: the model snapshot that achieved the best accuracy on the validation data during training
  • vocab.json: model's vocabulary dictionary as a json file
  • args.json: model's setup as a json file, which also contains paths of the model and vocabulary
  • calib.json: The indices of the held out training data that will be used to calibrate the DkNN model

To run a model with and without DkNN:

python --model-setup results/DATASET_MODEL/args.json
  • Where results/DATASET_MODEL/args.json is the argument log that is generated after training a model
  • This command will store the activations for all of the training data into a KDTree, calibrate the credibility values, and run the model with and without DkNN.
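The steps in that command can be sketched roughly as follows. This is not the repository's code: it is a minimal, single-layer illustration that assumes scikit-learn's KDTree, uses random vectors in place of real network activations, and follows the general DkNN recipe (count neighbor labels, calibrate on held-out data, report credibility as an empirical p-value). The function names `neighbor_label_counts` and `dknn_predict` are made up for this example.

```python
import numpy as np
from sklearn.neighbors import KDTree


def neighbor_label_counts(tree, train_labels, queries, k, n_classes):
    """For each query, count how many of its k nearest training points carry each label."""
    _, idx = tree.query(queries, k=k)        # idx has shape (n_queries, k)
    neighbor_labels = train_labels[idx]      # labels of the retrieved neighbors
    counts = np.zeros((len(queries), n_classes), dtype=int)
    for c in range(n_classes):
        counts[:, c] = (neighbor_labels == c).sum(axis=1)
    return counts


def dknn_predict(tree, train_labels, calib_acts, calib_labels, test_acts, k, n_classes):
    # Nonconformity of a label = number of neighbors that disagree with it.
    calib_counts = neighbor_label_counts(tree, train_labels, calib_acts, k, n_classes)
    calib_nonconf = k - calib_counts[np.arange(len(calib_labels)), calib_labels]

    test_counts = neighbor_label_counts(tree, train_labels, test_acts, k, n_classes)
    test_nonconf = k - test_counts           # shape (n_test, n_classes)

    # Empirical p-value: fraction of calibration points at least as nonconforming.
    p_values = (calib_nonconf[None, None, :] >= test_nonconf[:, :, None]).mean(axis=2)
    predictions = p_values.argmax(axis=1)
    credibility = p_values.max(axis=1)
    return predictions, credibility


# Synthetic stand-in for activations: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)


def make_split(n):
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, 8)) + 4.0 * labels[:, None]
    return acts, labels


train_acts, train_labels = make_split(200)
calib_acts, calib_labels = make_split(50)   # plays the role of the held-out calibration data
test_acts, test_labels = make_split(20)

tree = KDTree(train_acts)                   # store the training "activations"
preds, cred = dknn_predict(tree, train_labels, calib_acts, calib_labels,
                           test_acts, k=5, n_classes=2)
accuracy = float((preds == test_labels).mean())
```

With a real model you would replace the synthetic arrays with activations extracted from one or more layers, and accumulate the neighbor counts across layers before computing nonconformity.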

Word Vectors

In our paper, we used GloVe word vectors, though any pretrained vectors should work fine (word2vec, fastText, etc.). To obtain GloVe vectors, run the following commands.


Then pass the pretrained vectors in using the argument --word_vectors glove.840B.300d.txt when training a model using

Temperature Scaling contains the temperature scaling implementation.
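As a rough illustration of the idea (not the implementation in this repository), temperature scaling divides a trained model's logits by a single scalar T chosen to minimize the negative log-likelihood on held-out data. The sketch below uses NumPy only, picks T with a simple grid search for brevity, and runs on synthetic logits. The function names and the candidate grid are assumptions made for this example.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels under softmax(logits / T)."""
    probs = softmax(logits / temperature)
    return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())


def fit_temperature(logits, labels, candidates=None):
    """Pick the temperature with the lowest held-out NLL from a grid of candidates."""
    if candidates is None:
        candidates = np.logspace(-1, 1, 200)   # 0.1 to 10, log-spaced
    losses = [nll(logits, labels, t) for t in candidates]
    return float(candidates[int(np.argmin(losses))])


# Synthetic, deliberately overconfident logits: reasonable logits multiplied by 5.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
base = rng.normal(size=(500, 3))
base[np.arange(500), labels] += 1.0
logits = 5.0 * base

temperature = fit_temperature(logits, labels)
calibrated_probs = softmax(logits / temperature)
```

Because dividing by a positive T does not change the ordering of the logits, the predicted class stays the same and only the confidence changes. For overconfident logits like these, the fitted temperature should land well above 1.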

Interpretations and Visualizations

All of the code for generating interpretations using leave one out (conformity, confidence, or calibrated confidence) and first-order gradient is contained in See the code for details on running with the desired settings. You should first train a model (see above), and then pass that in.
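To make the leave-one-out idea concrete, here is a small sketch under stated assumptions. It is not the repository's code: `score_fn` is a hypothetical callable that maps a token list to a scalar (it could be confidence, calibrated confidence, or DkNN conformity), and the toy lexicon scorer only stands in for a trained model.

```python
import math


def leave_one_out(tokens, score_fn):
    """Importance of token i = drop in score when token i is removed from the input."""
    base = score_fn(tokens)
    saliency = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]   # the input with token i left out
        saliency.append(base - score_fn(reduced))
    return saliency


# Toy stand-in for a model: a tiny sentiment lexicon fed through a sigmoid.
TOY_WEIGHTS = {"great": 2.0, "boring": -2.0, "not": -0.5}


def toy_positive_confidence(tokens):
    logit = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


tokens = "a great movie".split()
scores = leave_one_out(tokens, toy_positive_confidence)
most_important = tokens[scores.index(max(scores))]
```

Swapping `toy_positive_confidence` for a function that returns the model's confidence or its DkNN conformity for the predicted class gives the corresponding variant of the method.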

The code for visualization is also present in


Please consider citing [1] if you found this code or our work beneficial to your research.

Interpreting Neural Networks with Nearest Neighbors

[1] Eric Wallace, Shi Feng, and Jordan Boyd-Graber, Interpreting Neural Networks with Nearest Neighbors.

  title={Interpreting Neural Networks with Nearest Neighbors},
  author={Eric Wallace and Shi Feng and Jordan Boyd-Graber},
  journal={arXiv preprint arXiv:1809.02847},  


For issues with code or suggested improvements, feel free to open a pull request.

To contact the authors, reach out to Eric Wallace and Shi Feng.

