A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling -- Text Classification

Source code for the text classification experiments from A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling (https://osf.io/e4yp6/).

Contact person: Luke Bates, bates@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Project structure

  • main.py -- main entry point; runs the text classification experiments using the other code files
  • umap_plots -- source code and data used for the UMAP plots from the paper. Please see the README.md file there.
  • umap_plots/data/ -- Study 1 and Study 2 data files.

Requirements

Our results were computed in Python 3.6.8 with a 40 GB NVIDIA A100 Tensor Core GPU. Note that running the code will write files to disk.

Installation

To set up, please follow the instructions below.

python -m venv mvenv
source mvenv/bin/activate
pip install -r requirements.txt
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
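
Before launching the experiments, it can be worth confirming that the CUDA build of PyTorch installed correctly. A quick sanity check (a minimal sketch, not part of the repository):

import torch

# Confirm the CUDA-enabled PyTorch build is active before training.
print(torch.__version__)          # expected: 1.9.0+cu111
print(torch.cuda.is_available())  # expected: True on a GPU machine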

Then, you can run the code with python main.py. You can specify which configuration to run by passing arguments to python main.py.

  • There are the following modes:
    • st_baseline (Sentence Transformer + logistic regression)
    • setfit_sft (SetFit standard fine-tuning)
    • setfit_zero_shot (SetFit zero-shot)
    • setfit_tsft (SetFit two-step fine-tuning)
    • transformer_sft (Transformer standard fine-tuning)
    • transformer_zero_shot (Transformer zero-shot)
    • transformer_tsft (Transformer two-step fine-tuning)
  • You can pass any Sentence Transformer. We used paraphrase-mpnet-base-v2.
  • You can pass any Transformer. We used roberta-base.
  • You can choose 11 or 7 roots. The code will not work if you pass a different number.
  • With the exception of the setfit_zero_shot and transformer_zero_shot modes, you need to specify a fold between 0 and 4 for five-fold cross-validation. To reproduce our results, you must run the code once per fold (0-4), changing only this argument; a loop sketch follows this list.
  • You can choose the number of epochs. This controls how many epochs the models train for in standard fine-tuning and in the second step of two-step fine-tuning.
  • You can choose the number of "pretraining" epochs. This controls how many epochs the models train for in zero-shot mode and in the first step of two-step fine-tuning.
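
Because the cross-validated modes require all five folds, you can run one configuration across every fold in sequence with a short driver script. The following is a minimal sketch, not part of the repository; it simply invokes main.py via subprocess with the same flags as the first example below:

import subprocess

# Run one configuration across all five folds; only --FOLD changes per run.
for fold in range(5):
    subprocess.run(
        ["python", "main.py",
         f"--FOLD={fold}",
         "--ST_MODEL=paraphrase-mpnet-base-v2",
         "--NUM_ROOTS=11",
         "--MODE=setfit_sft",
         "--EPOCHS=5"],
        check=True,
    )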

For example, if you wish to reproduce our SetFit standard fine-tuning results (5 epochs) on the first fold with 11 roots:

python main.py --FOLD=0\
               --ST_MODEL='paraphrase-mpnet-base-v2'\
               --NUM_ROOTS=11\
               --MODE='setfit_sft'\
               --EPOCHS=5

If you wish to reproduce our SetFit two-step fine-tuning results (10 pretraining epochs followed by 5 fine-tuning epochs) on the fifth fold with 7 roots:

python main.py --FOLD=4\
               --ST_MODEL='paraphrase-mpnet-base-v2'\
               --NUM_ROOTS=7\
               --MODE='setfit_tsft'\
               --EPOCHS=5\
               --PRETRAIN_EPOCHS=10

If you wish to reproduce our SetFit zero-shot results with 11 roots:

python main.py --ST_MODEL='paraphrase-mpnet-base-v2'\
               --NUM_ROOTS=11\
               --MODE='setfit_zero_shot'\
               --PRETRAIN_EPOCHS=10

If you wish to reproduce our roberta-base standard fine-tuning results (15 epochs) with 11 roots on the third fold:

python main.py --FOLD=2\
               --TRANSFORMER_CLF='roberta-base'\
               --NUM_ROOTS=11\
               --MODE='transformer_sft'\
               --EPOCHS=15\

Expected results

Once finished, results will be written to the "split_output" folder as JSON files. The "mac" field is the macro F1 score for a given fold, while the "ap" field is the sample-average average precision (see the Supplemental Material) for a given fold. You can see a summary by opening the get_results.ipynb notebook in Jupyter. Note that, with the exception of the zero-shot results, the code there will not work unless you have completed all five folds (0-4). This is because, for these modes, we report the average over the five folds for both macro F1 and sample-average average precision.
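
If you prefer to aggregate the per-fold scores yourself, the following is a minimal sketch. The exact file names in split_output depend on the configuration you ran, so the glob pattern here is an assumption; adjust it to match your runs:

import glob
import json
from statistics import mean

# Average the per-fold scores written to split_output/ (file naming assumed).
macs, aps = [], []
for path in glob.glob("split_output/*.json"):
    with open(path) as f:
        result = json.load(f)
    macs.append(result["mac"])  # macro F1 for this fold
    aps.append(result["ap"])    # sample-average average precision for this fold

print(f"macro F1, mean over {len(macs)} folds: {mean(macs):.4f}")
print(f"average precision, mean over {len(aps)} folds: {mean(aps):.4f}")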
