This repository includes the code for integrating contextual information for supervised text classification tasks using a dual-encoder approach and information exchange via cross-attention.
Further details can be found in our publication Robust Integration of Contextual Information for Cross-Target Stance Detection.
Abstract: Stance detection deals with identifying an author’s stance towards a target. Most existing stance detection models are limited because they do not consider relevant contextual information which allows for inferring the stance correctly. Complementary context can be found in knowledge bases but integrating the context into pretrained language models is non-trivial due to the graph structure of standard knowledge bases. To overcome this, we explore an approach to integrate contextual information as text which allows for integrating contextual information from heterogeneous sources, such as structured knowledge sources and by prompting large language models. Our approach can outperform competitive baselines on a large and diverse stance detection benchmark in a cross-target setup, i.e. for targets unseen during training. We demonstrate that it is more robust to noisy context and can regularize for unwanted correlations between labels and target-specific vocabulary. Finally, it is independent of the pretrained language model in use.
Contact person: Tilman Beck, tilman.beck@tu-darmstadt.de
https://www.ukp.tu-darmstadt.de/
Don't hesitate to e-mail us or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
data/
-- container for the data, including the scripts for processing the benchmark datasets and retrieving tweets for the Twitter datasets
src/analysis
-- Python scripts to analyze data, attention attribution, compute significance and some visualization utils
src/model
-- contains the model files
src/retrieve
-- the code for retrieving contextual information from external knowledge sources (e.g. ConceptNet, CauseNet, T0pp)
src/train
-- utility files for training
- Python3.6 or higher
- PyTorch 1.10.2 or higher
We make use of the benchmark datasets provided by Schiller et al. 2021 and Hardalov et al. 2021. The datasets are linked in the respective repositories here and here
Once you have obtained all datasets, put them in a folder (e.g. `benchmark_original`) and run
$ preprocess_benchmark.py --root_dir /path/to/benchmark_original --output_dir /path/to/benchmark_processed
Now all datasets should be available in the same JSON format, that is
{"text":"This is sample text", "label":1, "target":"example target", "split":"train"}
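Each line of a processed file is one such JSON object, so the files can be consumed with a few lines of standard Python. A minimal sketch (the helper name is illustrative, not part of the repository):

```python
import json

def load_split(path, split):
    """Read a processed benchmark .jsonl file and keep only one split."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            if sample["split"] == split:
                samples.append(sample)
    return samples
```

For example, `load_split("benchmark_processed/argmin.jsonl", "train")` would return the training samples of a (hypothetical) processed `argmin` file.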
Our dataset numbers differ slightly from the ones reported by Hardalov et al. (2021) due to the following reasons:
- rumor: not all tweets could be downloaded
- mtsd: we were provided the full dataset by the original authors
To produce the final files for running the benchmark experiments, you can either use the pre-computed context or compute it yourself.
Note that you can also add other tasks. To do so, create a new folder within the processed benchmark folder (e.g. `benchmark_processed`) for each task and compute the context on your own. Put the samples there as `.jsonl` files in the same format as mentioned above.
The pre-computed context is available to download (TBD) and is the same as we used in our paper. It includes the following sources:
- CauseNet (Heindorf et al. 2020)
- ConceptNet (here)
- T0pp-NP (using Sanh et al. 2021)
- T0pp-NP-Targ (using Sanh et al. 2021)
- All (combination of the four sources above)
- Random* (context sampled from the Gutenberg Project here)
After downloading the pre-computed context, extract it into a folder (e.g. `benchmark_context`). The following command will then compose the final files to run the experiments.
$ finalize_sd_benchmark.py --root_dir /path/to/benchmark_processed --context_dir /path/to/benchmark_context --output_dir /path/to/benchmark_final
TODO
- download ConceptNet, extract it to a graph file, find relevant nodes
- download CauseNet, encode CauseNet and the task inputs, find the closest matches
- find noun phrases (NPs), describe them using T0pp
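The CauseNet step above amounts to a nearest-neighbour search over sentence embeddings. A minimal sketch of that selection step, assuming the embeddings have already been computed with an encoder of your choice (the function name and inputs are illustrative, not part of the repository):

```python
import numpy as np

def top_k_context(input_emb, context_embs, k=2):
    """Indices of the k context embeddings with the highest
    cosine similarity to the input embedding."""
    a = input_emb / np.linalg.norm(input_emb)
    b = context_embs / np.linalg.norm(context_embs, axis=1, keepdims=True)
    sims = b @ a                      # cosine similarity per context entry
    return np.argsort(-sims)[:k].tolist()
```

The selected indices can then be mapped back to the CauseNet statements and stored as context candidates for the corresponding sample.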
- Clone the repository
$ git clone https://github.com/UKPLab/arxiv2022-context-injection-stance
$ cd arxiv2022-context-injection-stance
- Create the environment and install dependencies
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
To run the INJECT model with `k=2` context candidates retrieved from `conceptnet` for dataset `argmin`:
$ python run.py --task argmin --variation conceptnet --setting INJECT_JOINED_TOPIC --k 2 --output_dir /path/to/results/
--setting
- The experiment setting which defines which context injection to use, from [BASELINE, BASELINE_TOPIC, BASELINE_JOINED_TOPIC, INJECT_TOPIC, INJECT_JOINED_TOPIC]
--k
- The number of context candidates to use (default 2)
--is_cross_topic
- Whether the evaluation setting should be cross-target (True) or in-target (False)
--input_encoder_layers
- The layer of the input encoder at which cross-attention should be applied (default 11)
--context_encoder_layers
- The layer of the context encoder at which cross-attention should be applied (default 11)
--task
- The task to use, chosen from [emergent, argmin, vast, poldeb, mtsd, rumor, wtwt, iac1, scd, semeval2019t7, semeval2016task6, perspectrum, ibmcs, arc, fnc1, snopes]
--variation
- The context source to use, chosen from [conceptnet, causenet, t0pp_key_np, t0pp_key_np_target]
--model_name
- The backbone transformer model used for the encoders (e.g. bert-base-uncased)
--output_dir
- The directory to write the experiment outputs to
--data_folder
- The directory where the data files are stored
--shared_model
- Use shared weights for both the input encoder and the context encoder
--frozen_model
- Only train the classification head; freeze the base model parameters of both encoders
--truncation_length
- Cut the input off at truncation_length tokens
--random_seed
- The random seed to use
--batch_size
- The batch size for training
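For intuition, the information exchange that `--input_encoder_layers` and `--context_encoder_layers` control can be sketched as one scaled dot-product attention step in which the input tokens attend over the context tokens. This is a simplified numpy illustration, not the actual model code: the real implementation uses learned query/key/value projections and multiple heads inside the transformer layers.

```python
import numpy as np

def cross_attention(input_states, context_states):
    """One cross-attention step: input tokens attend over context tokens.

    input_states:   (n_in, d)  hidden states of the input encoder layer
    context_states: (n_ctx, d) hidden states of the context encoder layer
    """
    d = input_states.shape[-1]
    # queries come from the input, keys/values from the context
    scores = input_states @ context_states.T / np.sqrt(d)
    # softmax over the context positions (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each input token becomes a convex combination of context states
    return weights @ context_states
```

Setting the layer arguments to 11 (the default) applies this exchange at the last encoder layer of a 12-layer model such as bert-base-uncased.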
Please use the following citation:
@inproceedings{beck-etal-2023-robust,
title = "Robust Integration of Contextual Information for Cross-Target Stance Detection",
author = "Beck, Tilman and
Waldis, Andreas and
Gurevych, Iryna",
booktitle = "Proceedings of the The 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.starsem-1.43",
pages = "494--511"
}