Stance Detection for COVID-19-related Health Policies

[DASFAA2023] The source code and datasets for the paper: Adversarial Learning-based Stance Classifier for COVID-19-related Health Policies.


Installation

For embedding-based methods, we apply GloVe to initialize the word embedding layer, while for BERT-based methods, we directly use the AutoTokenizer for word tokenization. The dataset preprocessing scripts are listed below (a short tokenization sketch follows the list):

  • src/datasets_glove.py
  • src/datasets.py
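As a rough illustration of the BERT-based path, the AutoTokenizer can be used along these lines. This is a minimal sketch assuming the uncased BERT-base checkpoint and the default maximum length of 100 tokens, not the repository's exact preprocessing code:

# Minimal tokenization sketch for the BERT-based models (illustrative only).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
encoded = tokenizer(
    "Wearing masks in public should be mandatory.",
    max_length=100,        # matches the default --max_len
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 100])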

To download GloVe:

cd data
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
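After unzipping, the word embedding layer for the GloVe-based models can be initialized roughly as follows. This is a simplified sketch (the actual logic lives in src/datasets_glove.py), and the choice of the 100-dimensional vector file is an assumption:

# Sketch: build an embedding matrix from glove.6B.100d.txt for a given vocabulary.
import numpy as np

def load_glove(path, vocab, dim=100):
    # vocab maps word -> row index; words not found keep a small random vector
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in vocab:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return matrix

# embedding_matrix = load_glove("data/glove.6B.100d.txt", vocab)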

For BERT-based models:

conda install -c huggingface transformers
pip install emoji
pip install -r requirements.txt

Dataset

Topic              | #Unlabeled | #Labeled (Favor/Against/None)
Stay at Home (SH)  | 778        | 420 (194/113/113)
Wear Masks (WM)    | 1030       | 756 (173/288/295)
Vaccination (VA)   | 1535       | 526 (106/194/226)

To obtain the raw tweet data, please contact: xiefeng@nudt.edu.cn

For a more detailed description of the dataset, please click here!

Quick Start

Running

  • cross-target setting:
python run.py --topic vaccination,face_masks --model mymodel --batch 16 --epoch 100 --hidden 128 --p_lambda 0.1 --alpha 0.01 --backbone bert_base
  • zero-shot setting:
python run.py --topic zeroshot,face_masks --model mymodel --batch 16 --epoch 100 --hidden 256 --p_lambda 0.1 --alpha 0.01 --backbone bert_base --lr 0.00002

Parameters

Parameter   | Description                                           | Default | Values
--model     | the model to run                                      | mymodel | bilstm, bicond, textcnn, crossnet, tan, bert_base, mymodel
--topic     | the task to run                                       | -       | cross-target setting or zero-shot setting
--batch     | batch size                                            | 16      | -
--epoch     | the number of epochs of the training process          | 100     | -
--patience  | early stopping is applied with a fixed patience       | 5       | -
--max_len   | the maximum number of tokens                          | 100     | 100, 150
--hidden    | the hidden dimension of the model, if applicable      | 128     | 128, 256
--alpha     | the trade-off parameter between objectives            | 0.01    | -
--p_lambda  | the negative constant in the Gradient Reversal Layer  | 0.1     | -
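The --p_lambda value scales the Gradient Reversal Layer (GRL) used for adversarial learning. For background, a generic GRL can be written in a few lines of PyTorch; the sketch below illustrates the idea and is not necessarily the exact implementation used in this repository:

# Generic Gradient Reversal Layer: identity in the forward pass,
# gradients multiplied by -lambda in the backward pass.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, p_lambda):
        ctx.p_lambda = p_lambda
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.p_lambda * grad_output, None

def grad_reverse(x, p_lambda=0.1):  # 0.1 matches the --p_lambda default
    return GradReverse.apply(x, p_lambda)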

Baselines

All implemented baselines are stored in the src/baselines folder.

  • BiLSTM (1997): Long short-term memory.
  • BiCond (2016): Stance Detection with Bidirectional Conditional Encoding.
  • TAN (2017): Stance Classification with Target-Specific Neural Attention Networks.
  • CrossNet (2018): Cross-Target Stance Classification with Self-Attention Networks.
  • SiamNet (2019): Can siamese networks help in stance detection?
  • BERT (2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • BERTweet (2020): BERTweet: A pre-trained language model for English Tweets.
  • COVID-Twitter-BERT (2020): COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter.
  • WS-Bert (2022): Infusing Knowledge from Wikipedia to Enhance Stance Detection.

Implementation Details

All programs are implemented using Python 3.6.13 and PyTorch 1.10.2 with CUDA 11.3 on a personal workstation with an NVIDIA GeForce RTX 3090 GPU. The reported results are the average scores of 5 runs with different random initializations.

Training settings

In the cross-target setting, the models are trained and validated on one topic and evaluated on another. This yields six source->destination tasks for cross-target evaluation: SH->WM, SH->VA, WM->SH, WM->VA, VA->SH, and VA->WM. In the zero-shot setting, the models are trained and validated on multiple topics and tested on one unseen topic. We use the unseen topic's name as the task's name; thus, the zero-shot evaluation tasks are: SH, WM, and VA. For all tasks, the batch size is set to 16, the dropout rate is set to 0.1, and the input texts are truncated or padded to a maximum of 100 tokens. We train all models using the AdamW optimizer with a weight decay of 5e-5 for a maximum of 100 epochs with a patience of 10 epochs, and the learning rate is chosen from {1e-5, 2e-5}.
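The optimization schedule above can be summarized by the following sketch: AdamW with a weight decay of 5e-5 and early stopping on the validation score. The helper callables (train_step, validate) and the checkpoint path are placeholders, not names from this repository:

# Sketch of the training schedule: AdamW + early stopping (illustrative only).
import torch

def train(model, train_step, validate, lr=2e-5, max_epochs=100, patience=10):
    # train_step(model, optimizer) runs one epoch; validate(model) returns a score
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=5e-5)
    best, bad = 0.0, 0
    for epoch in range(max_epochs):
        train_step(model, optimizer)
        score = validate(model)
        if score > best:
            best, bad = score, 0
            torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
        else:
            bad += 1
            if bad >= patience:
                break  # early stopping
    return best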

Models configuration

For BiLSTM, BiCond, TAN, CrossNet, and TextCNN, the word embeddings are initialized with pre-trained word vectors from GloVe, and the hidden dimension is tuned over {128, 256}. For BERT, we fine-tune the pre-trained language model from the Hugging Face Transformers library to predict the stance by appending a linear classification layer to the hidden representation of the [CLS] token. For WS-BERT-S and WS-BERT-D, considering computational resources and a fair comparison, the maximum length of Wikipedia summaries is set to 100 tokens, and we use the pre-trained uncased BERT-base as the encoder, in which each word is mapped to a 768-dimensional embedding. To speed up training, we only fine-tune the top layers of the Wikipedia encoder in WS-BERT-D, which is consistent with the original paper. In our model, we also adopt the pre-trained uncased BERT-base as the encoder. The maximum length of the policy description is fixed at 50, the layer number l of the GCN is set to 2, the trade-off parameter alpha is set to 0.01, the GRL's parameter lambda is set to 0.1, and the hidden dimension of the GeoEncoder is tuned over {128, 256}.
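As a rough sketch of the BERT baseline described above (a linear classification layer on the [CLS] representation), the model can look like the following; the actual implementations live in src/baselines:

# Sketch of a BERT stance classifier: linear layer over the [CLS] hidden state.
import torch.nn as nn
from transformers import AutoModel

class BertStanceClassifier(nn.Module):
    def __init__(self, num_labels=3, name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        # 768-dimensional [CLS] vector -> 3 stance logits (Favor/Against/None)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # hidden state of the [CLS] token
        return self.classifier(cls)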

Code of Conduct and Ethical Statement

The tweet set for each policy contained a good mixture of pro, con, and neutral categories, as well as tweets with implicit and explicit opinions about the target. We removed the hashtags that appeared at the end of a tweet to exclude obvious cues, without making the tweet syntactically ambiguous. Each tweet was annotated by three annotators to avoid subjective errors of judgment. At present, we only collect social content on Twitter, without considering other social platforms, such as Weibo.

Our dataset does not provide any personally identifiable information as only the tweet IDs and human-annotated stance labels will be shared. Thus, the dataset complies with Twitter’s information privacy policy.

Acknowledgement

We refer to the code of these repositories: WS-BERT, GVB, DANN. Thanks for their great contributions!
