This is the repository for the ACL 2024 paper Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques
In the paper, we explore different strategies for dealing with data scarcity in Argument Mining tasks, namely fine-tuning multilingual BERT and adapting EntLM, a template-free few-shot approach to sequence labeling. In our experiments, we generate few-shot medical data from the AbstRCT corpus in four languages (English, Spanish, Italian and French).
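The exact sampling procedure lives in this repository's scripts; purely as a hedged illustration of how a k-shot split for sequence labeling can be drawn (all names below are hypothetical, not the repo's API), one can sample sentences per entity label:

```python
import random
from collections import defaultdict

def sample_few_shot(sentences, k, seed=0):
    """Draw an approximate k-shot set for sequence labeling (illustrative sketch).

    sentences: list of (tokens, labels) pairs with BIO labels.
    A sentence counts toward every entity label it contains, so the
    result is approximately k sentences per label, as is common in
    few-shot sequence-labeling setups.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, (_, labels) in enumerate(sentences):
        # Strip the B-/I- prefix to group by entity class.
        for lab in {l.split("-")[-1] for l in labels if l != "O"}:
            by_label[lab].append(i)
    chosen = set()
    for lab, idxs in by_label.items():
        chosen.update(rng.sample(idxs, min(k, len(idxs))))
    return [sentences[i] for i in sorted(chosen)]
```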
- Run `pip install -r requirements.txt` to install the required packages.
All the data used for the experiments can be found in the `dataset` folder.
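Assuming the data files follow a CoNLL-style one-token-per-line format with the label in the last column (check the actual files before relying on this sketch), they can be read with something like:

```python
def read_conll(path):
    """Hypothetical reader for a CoNLL-style file: token ... label per line,
    sentences separated by blank lines. Adjust columns/separator to the
    actual files in the dataset folder."""
    sentences, tokens, labels = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Blank line ends the current sentence.
                if tokens:
                    sentences.append((tokens, labels))
                    tokens, labels = [], []
            else:
                cols = line.split()
                tokens.append(cols[0])
                labels.append(cols[-1])
    if tokens:
        sentences.append((tokens, labels))
    return sentences
```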
- Run `sh scripts/count_freq.sh` to generate the label words for EntLM.
- Run `sh scripts/run_fewshot.sh` to launch few-shot learning with EntLM.
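The label-word generation step above is defined by the repo's own script; as a hedged sketch of the general idea behind EntLM label words, one can pick, for each entity class, the token(s) that co-occur with that label most frequently in the training data:

```python
from collections import Counter, defaultdict

def label_words(sentences, top_n=1):
    """Illustrative label-word selection (not the repo's script).

    sentences: list of (tokens, labels) pairs with BIO labels.
    Returns, for each entity class, the top_n most frequent tokens
    observed under that class, a simple frequency-based choice of
    EntLM-style label words.
    """
    freq = defaultdict(Counter)
    for tokens, labels in sentences:
        for tok, lab in zip(tokens, labels):
            if lab != "O":
                # Group B-X and I-X under the same class X.
                freq[lab.split("-")[-1]][tok.lower()] += 1
    return {lab: [w for w, _ in c.most_common(top_n)] for lab, c in freq.items()}
```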
- Run `sh fine-tuning/finetune_fewshot.sh` to fine-tune the model with a small amount of data.
- Alternatively, run `sh fine-tuning/finetune_full.sh` to fine-tune the model on the full data.
If you find this work useful, please cite:

```bibtex
@article{yeginbergen2024argument,
  title={Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques},
  author={Yeginbergen, Anar and Oronoz, Maite and Agerri, Rodrigo},
  journal={arXiv preprint arXiv:2407.03748},
  year={2024}
}
```