The corpus comes in two versions for each language: Spanish (CIC-ES and CIC-Random-ES) and Catalan (CIC-CA and CIC-Random-CA). Each dataset consists of annotated Twitter messages for automatic stance detection. The data was collected over 12 days in February and March 2019 from tweets posted in Barcelona, and during September 2018 from tweets posted in the town of Terrassa, Catalonia. The corpus is annotated with three classes, AGAINST, FAVOR and NONE, which express the stance towards the target: the independence of Catalonia. Each dataset is split into train, validation and test sets in a 60/20/20 ratio.
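
As an illustration of what a 60/20/20 partition looks like, here is a minimal Python sketch. It is not the authors' procedure: the source does not say whether the split was random, stratified or chronological, so the seeded random shuffle, the function name `split_60_20_20` and the toy tweets are all assumptions made for this example. The released datasets already ship with their own splits, which should be used for any comparison with published results.

```python
import random

# The three stance classes named in the corpus description.
LABELS = ("AGAINST", "FAVOR", "NONE")


def split_60_20_20(examples, seed=0):
    """Shuffle examples and cut them into train/validation/test (60/20/20).

    Hypothetical helper: a plain seeded random split, used here only to
    illustrate the proportions.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * 60 // 100
    n_val = len(items) * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy (text, label) pairs standing in for annotated tweets.
toy = [(f"tweet {i}", LABELS[i % 3]) for i in range(10)]
train, val, test = split_60_20_20(toy)
print(len(train), len(val), len(test))  # 6 2 2
```
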
-
Language models trained on:
- IberEval 2018 dataset
- SemEval Task 6A dataset
- CIC Corpus
@inproceedings{zotova-etal-2020-multilingual,
    title = "Multilingual Stance Detection in Tweets: The {C}atalonia Independence Corpus",
    author = "Zotova, Elena and Agerri, Rodrigo and Nu{\~n}ez, Manuel and Rigau, German",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.171",
    pages = "1368--1375",
}