This repo contains the code and data for the EMNLP 2019 paper "What Gets Echoed? Understanding the 'Pointers' in Explanations of Persuasive Arguments."
To run:

- Get the data files from the latest release and place them in the `data` directory.
- Activate the `cmv-gen` conda environment. (A copy of the environment is in `records/cmv-gen.yml`.)
- Run `python extra_scripts/create_vectors.py` to calculate and store the feature vectors.
- Find the instructions below for the figure you want to generate.

Feel free to email me at davatk@gmail.com if there's something you can't get to work.
Everything below is done in the `cmv-gen` conda environment, in the project root directory.
The random, logistic regression, and XGBoost results can be obtained with `python word_task/non_neural/predict.py --random` (or `--lr`, or `--xgb`). Models are loaded from `records/word_random/*.joblib` (or `word_lr`, or `word_xgb`); if those files aren't there, they'll be created and placed there. The scripts store their results in the same directories, in the `log.log` file, which is ultimately where Tables 4 and 5 come from. New results are appended, so make sure you're looking at the right section of the file.
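The load-or-create caching behavior described above can be sketched as follows. This is an illustrative pattern, not the repo's actual code: the model type, toy data, and file name (`word_lr_demo.joblib`) are all assumptions.

```python
# Illustrative sketch of the caching pattern: load a model from a .joblib
# file if it exists, otherwise train one and store it there. The model
# type and file name are assumptions, not the repo's actual choices.
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def load_or_train(path: Path) -> LogisticRegression:
    if path.exists():
        return joblib.load(path)  # reuse the cached model
    X, y = make_classification(n_samples=100, random_state=0)  # toy data
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, path)  # cache for the next run
    return model


model = load_or_train(Path("word_lr_demo.joblib"))
print(type(model).__name__)  # -> LogisticRegression
```

A second call with the same path skips training and loads the cached file.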
To get results for just the `IN_OP_AND_PC` feature, run `python word_task/non_neural/predict_single_feature.py`. This script just sends its results to stdout.
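The feature name suggests checking whether a word appears in both the original post (OP) and the persuasive comment (PC). A minimal sketch under that assumption — the function name, tokenization, and example text are hypothetical, not the repo's implementation:

```python
# Hypothetical sketch of a binary feature like IN_OP_AND_PC: does a word
# occur in both the original post (OP) and the persuasive comment (PC)?
# Tokenization here is naive whitespace splitting, for illustration only.
def in_op_and_pc(word, op_tokens, pc_tokens):
    op, pc = set(op_tokens), set(pc_tokens)
    return word in op and word in pc


op = "i think the argument is flawed".split()
pc = "your argument ignores the evidence".split()
print(in_op_and_pc("argument", op, pc))  # -> True  (in both OP and PC)
print(in_op_and_pc("evidence", op, pc))  # -> False (in PC only)
```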
To train the vanilla LSTM model, run `allennlp train experiments/word_glove.jsonnet -s records/word_glove --include-package word_task` (make sure `records/word_glove` is empty first). You can change hyperparameters in `experiments/word_glove.jsonnet`. Records will be stored in `records/word_task/stdout.log`.
To evaluate (on the test set, in this case), run `python word_task/embeddings/evaluate.py records/word_glove data/cmv_triples_test_token.jsonlist.gz --gpu 2`. Records are stored in `records/word_task/evaluate.log`.
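Judging by the extension, the test file is a gzipped "jsonlist" (one JSON object per line). A hedged reader sketch, demonstrated on a tiny synthetic file rather than the real data — the `id` field is illustrative, and the actual record fields would need to be checked against the release:

```python
# Sketch of reading a .jsonlist.gz file: gzip-compressed text with one
# JSON object per line. Field names below are illustrative only.
import gzip
import json


def read_jsonlist_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Round-trip demo with a tiny synthetic file:
with gzip.open("demo.jsonlist.gz", "wt", encoding="utf-8") as f:
    f.write(json.dumps({"id": 1}) + "\n")
    f.write(json.dumps({"id": 2}) + "\n")

records = list(read_jsonlist_gz("demo.jsonlist.gz"))
print(len(records))  # -> 2
```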
To train and evaluate the LSTM+features model, follow the steps above, but replace `word_glove` with `word_glove_features`.
The actual plots are done in notebooks:

- Figure 2a (overall performance): `Paper - Overall Performance.ipynb`
- Figure 2b (group 3 feature importance): `Paper - XBG Feature Importance.ipynb`
- Figure 2c (performance by word source): `Paper - OP, PC performance.ipynb`
Coming later! These are also in notebooks:
- Figure 1a (length correlations): `Paper - Length correlations (heatmap).ipynb`
- Figure 1b (echo sources): `Paper - Source of Explanation Token.ipynb`
- Figure 1c (echo prob. vs. freq.): `Paper - Echoing Probability vs. Document Frequency.ipynb`
- Table 3: `Paper - Feature Significance.ipynb`