Nguyen, NAACL 2018
The paper uses two datasets:
- Movie reviews: Zaidan et al. (NAACL 2007), using the data with rationales (but rationales were not used in this paper).
- Twenty news groups: The 20news-bydate version is used with the following two categories: alt.atheism and soc.religion.christian
The processed data, including vocabulary files and trained machine learning models, can be found in the output_data folder.
These files can be downloaded here (75.1MB). The zip file contains two directories: experiments and output_data
The explanations can be generated as follows for two classifiers (LR and MLP):
experiments.explanations_rationales() generates:
- ../experiments/rationales/rationales_MLP_explanations.csv
- ../experiments/rationales/rationales_lr_explanations.csv
experiments.explanations_news() generates:
- ../experiments/news/news_MLP_explanations.csv
- ../experiments/news/news_lr_explanations.csv
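After running the two functions above, one way to confirm that all four explanation files were written is a small path check. This is a minimal sketch, not part of the repository: `missing_outputs` and `EXPECTED_OUTPUTS` are hypothetical names, and it assumes the relative paths listed above are resolved from the src directory and that both functions are module-level in src/experiments.py.

```python
from pathlib import Path

# Output files as listed in this README, keyed by the function that writes them.
EXPECTED_OUTPUTS = {
    "explanations_rationales": [
        "../experiments/rationales/rationales_MLP_explanations.csv",
        "../experiments/rationales/rationales_lr_explanations.csv",
    ],
    "explanations_news": [
        "../experiments/news/news_MLP_explanations.csv",
        "../experiments/news/news_lr_explanations.csv",
    ],
}


def missing_outputs(base_dir="."):
    """Return the expected CSV paths that do not exist relative to base_dir.

    Hypothetical helper; base_dir is assumed to be the src directory.
    """
    base = Path(base_dir)
    return [
        rel
        for paths in EXPECTED_OUTPUTS.values()
        for rel in paths
        if not (base / rel).is_file()
    ]


if __name__ == "__main__":
    # Assumed workflow (run from src/), not verified against the code:
    #   import experiments
    #   experiments.explanations_rationales()
    #   experiments.explanations_news()
    for path in missing_outputs():
        print("missing:", path)
```

An empty result means every file listed above is present; the check says nothing about the contents of the files.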
- ../experiments/rationales/rationales_cf_responses.csv (movie reviews)
- ../experiments/news/news_cf_responses.csv (20news)
- ../experiments/rationales/rationales_cf_responses_with_noise.csv (movie reviews, with noise)
- analysis.rmd
- src/analysis.rmd: R analysis code
- src/classifierwrapper.py: wrapper around keras and scikit-learn models
- src/dataset.py: reading, processing and saving the datasets
- src/experiments.py: main file to generate the explanations
- src/explanation_evaluation.py: computes the automatic evaluation metrics
- src/explanation_test.py: some tests
- src/explanation_util.py: some utility methods
- src/keras_networks.py: code to train the models using keras
- src/process_crowdflower_annotations.py: computes correlations between automatic and human evaluations and prints out the results
- src/rationales_dataset.py: processes the movie data
- src/twentynewsgroups_dataset.py: processes the 20news data
- keras
- tensorflow
- numpy
- scikit-learn
- LIME http://github.com/marcotcr/lime