This repository contains the code used for the multi-task sequence labeling experiments in the paper SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling, partly forked from OpenNMT-py. If you use this code or our results in your research, we would appreciate it if you cite our paper as follows:
```
@inproceedings{lu2019sc,
  title={SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling},
  author={Lu, Peng and Bai, Ting and Langlais, Philippe},
  booktitle={Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
  pages={2396--2406},
  year={2019}
}
```
Python 3.5+, PyTorch 0.4, and AllenNLP (for ELMo embeddings) are required for the current repo.
- Preprocessing the data: NER (CoNLL 2003), chunking (CoNLL 2000), and POS (UD English POS). The files should be formatted the same way as the CoNLL 2000 dataset. For example:
```
Rockwell NNP B-NP
International NNP I-NP
Corp. NNP I-NP
's POS B-NP
Tulsa NNP I-NP
unit NN I-NP
said VBD B-VP
it PRP B-NP
signed VBD B-VP
a DT B-NP
tentative JJ I-NP
agreement NN I-NP
extending VBG B-VP
its PRP$ B-NP
contract NN I-NP
with IN B-PP
Boeing NNP B-NP
Co. NNP I-NP
```
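If you want to sanity-check a file in this format, here is a minimal parsing sketch; the function name `read_conll` and the three-column layout are assumptions for illustration, not part of this repo's code:

```python
def read_conll(path):
    """Parse a CoNLL-2000-style file into sentences of (token, pos, chunk)
    triples. Assumes whitespace-separated columns and blank lines between
    sentences."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # a blank line ends the current sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            token, pos, chunk = line.split()
            current.append((token, pos, chunk))
    if current:  # handle a file that does not end with a blank line
        sentences.append(current)
    return sentences
```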
Then put the CoNLL 2003 / CoNLL 2000 / UD English datasets in `./data/`, and put embedding files like `glove.6B.100d.txt` in `./data/emb`.
You can change the training task groups by deleting specific tasks from the `datasets_config` dict in the `config_uni.py` file, as sketched below.
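For illustration only, a task-selection dict of this kind might look like the following sketch; the actual keys and file paths in `config_uni.py` may differ:

```python
# Hypothetical sketch of datasets_config; check config_uni.py for the
# real keys and paths used by this repo.
datasets_config = {
    "ner":   {"train": "./data/conll2003/train.txt"},  # CoNLL 2003 NER
    "chunk": {"train": "./data/conll2000/train.txt"},  # CoNLL 2000 chunking
    "pos":   {"train": "./data/unidep/train.txt"},     # UD English POS
}

# To train on NER and chunking only, remove the POS entry:
del datasets_config["pos"]
```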
- Training the model:

```
python main_uni.py
```

The default settings are in the `config_uni.py` file.