PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection

This repository contains selected code and data for our paper "PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection", published at the 5th Workshop on Argument Mining (EMNLP 2018).

Citation

@inproceedings{Eger:2018:Workshop,
	title = {PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection},
	author = {Eger, Steffen and Rücklé, Andreas and Gurevych, Iryna},
	booktitle = {Proceedings of the 5th Workshop on Argument Mining},
	year = {2018},
	pages = {131--143},
	month = {November},
	location = {Brussels, Belgium},
	publisher = {Association for Computational Linguistics}
}

Abstract: We consider unsupervised cross-lingual transfer on two tasks, viz., sentence-level argumentation mining and standard POS tagging. We combine direct transfer using bilingual embeddings with annotation projection, which projects labels across unlabeled parallel data. We do so by either merging respective source and target language datasets or alternatively by using multi-task learning. Our combination strategy considerably improves upon both direct transfer and projection with few available parallel sentences, the most realistic scenario for many low-resource target languages.

Contact persons: Steffen Eger (eger@ukp.informatik.tu-darmstadt.de), Andreas Rücklé (rueckle@ukp.informatik.tu-darmstadt.de)

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

Data

POS

We provide POS-tagged data in POS/home/eger/projects/MTCODE_newcurriculum/mt-code/experiments/12_tensorflow_port/. It covers French and English. The French data was annotated automatically using annotation projection, except for the test data.
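To illustrate what annotation projection means for token-level tags, here is a minimal sketch. It assumes word alignments are available as (source index, target index) pairs; the function name `project_tags`, the alignment format, and the first-link-wins tie-breaking rule are illustrative assumptions, not the procedure used to create the data in this repository.

```python
from typing import List, Optional, Tuple


def project_tags(
    src_tags: List[str],
    tgt_len: int,
    alignment: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy each source tag to the target tokens aligned with it.

    Hypothetical helper: target tokens without an alignment link stay None.
    """
    projected: List[Optional[str]] = [None] * tgt_len
    for src_idx, tgt_idx in alignment:
        # Assumption: the first link that reaches a target token wins;
        # later links to the same token do not overwrite it.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = src_tags[src_idx]
    return projected


# Toy example: "the black cat" -> "le chat noir"
src_tags = ["DET", "ADJ", "NOUN"]
alignment = [(0, 0), (1, 2), (2, 1)]
print(project_tags(src_tags, 3, alignment))  # ['DET', 'NOUN', 'ADJ']
```

Unaligned target tokens are left as None here so that a downstream step can decide whether to drop the sentence or fill in a default tag.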

ArgMin

Code

POS

Two YAML files for a sequence tagger are provided in POS. They contain the setups for the PD3-merge and PD3-mtl paradigms. To run them, install the TensorFlow Sequence Tagging architecture from here and supply the YAML files as inputs. Note that the annotation projection step has to be done beforehand. You also need a bilingual word embedding file for English-French (some can be found here).
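The difference between the two paradigms can be sketched as follows, based on the description in the abstract: PD3-merge trains on one merged dataset, while PD3-mtl treats the labeled source data and the projected target data as separate tasks. The function names, the `Sentence` type, and the task keys `"source"` and `"target_projected"` are hypothetical and do not correspond to keys in the provided YAML files.

```python
from typing import Dict, List, Tuple

# A tagged sentence: (tokens, tags). Illustrative type only.
Sentence = Tuple[List[str], List[str]]


def build_merge_dataset(
    source: List[Sentence], projected_target: List[Sentence]
) -> List[Sentence]:
    """PD3-merge-style input: a single training set covering both languages."""
    return source + projected_target


def build_mtl_datasets(
    source: List[Sentence], projected_target: List[Sentence]
) -> Dict[str, List[Sentence]]:
    """PD3-mtl-style input: each dataset is kept as its own task."""
    return {"source": list(source), "target_projected": list(projected_target)}


english = [(["the", "cat"], ["DET", "NOUN"])]
french_projected = [(["le", "chat"], ["DET", "NOUN"])]

merged = build_merge_dataset(english, french_projected)
tasks = build_mtl_datasets(english, french_projected)
print(len(merged), sorted(tasks))
```

In both cases the tagger is assumed to read tokens through a shared bilingual embedding space, which is what allows English and French examples to be used together.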

ArgMin

The folder ArgMin contains the source code for our sentence-level PD3 argumentation mining experiments. A sample configuration is given in config.yaml. Our other configurations differ only in the size of the parallel corpus, the source of the parallel corpus (PE or TED), and the word embeddings used (e.g., the ones we trained ourselves on small parallel data for a more realistic low-resource scenario).
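For sentence-level tasks, projecting labels across an unlabeled parallel corpus can be sketched as below. This assumes that labels for the source side come from a classifier trained on labeled source-language data and are then copied to the aligned target sentence; `project_sentence_labels`, the toy classifier, and the label names are illustrative and are not taken from the code in the ArgMin folder.

```python
from typing import Callable, List, Tuple


def project_sentence_labels(
    parallel: List[Tuple[str, str]],
    classify_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Label each source sentence and copy the label to its translation.

    Returns (target_sentence, projected_label) pairs.
    """
    return [(tgt, classify_source(src)) for src, tgt in parallel]


def toy_classifier(sentence: str) -> str:
    # Stand-in for a trained source-language model (assumption).
    return "arg" if "because" in sentence else "non-arg"


parallel_corpus = [
    ("We should act because it helps.", "Nous devrions agir car cela aide."),
    ("It is Monday.", "C'est lundi."),
]
for target_sentence, label in project_sentence_labels(parallel_corpus, toy_classifier):
    print(label, "|", target_sentence)
```

The size of `parallel_corpus` is the quantity that varies across configurations in a low-resource setting: fewer parallel sentences yield fewer projected training examples in the target language.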

The code builds on the general framework introduced in our work on cQA answer selection. Please see the linked repository for more documentation on the framework, its required dependencies, and the configuration files.

You can run all ArgMin PD3 experiments in a single run:

python run_transfer.py config.yaml
