Accompanying code for our paper at NAACL-HLT 2019, What do entity-centric models learn? Insights from entity linking in multi-party dialogue.
@inproceedings{aina-silberer-sorodoc-westera-boleda:2019:NAACL,
title = {What do entity-centric models learn? Insights from entity linking in multi-party dialogue},
author = {Aina, Laura and Silberer, Carina and Sorodoc, Ionut-Teodor and Westera, Matthijs and Boleda, Gemma},
booktitle = {Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
month = {June},
year = {2019},
address = {Minneapolis, Minnesota},
publisher = {Association for Computational Linguistics},
}
- Code for training, deploying and evaluating different types of entity-centric models and a baseline model on SemEval 2018 Task 4: Character Identification on Multiparty Dialogues.
- The trained models referenced in our paper.
- A dataset we built for probing the entity representations of trained models, which we include in the folder wikia_task_sentences. Note that in the experiment described in our paper we used the sentences of the pattern type 'I' exclusively.
- The script _fetch_data.sh for downloading the datasets for SemEval 2018 Task 4: Character Identification on Multiparty Dialogues. (Alternatively, you can download the data yourself from the organizers' GitHub repository. Store them in the folder data/friends.)
The PyTorch version used here is somewhat old, namely 0.3.0.post4, which can only be installed from a manually downloaded installer (e.g., for Python 3.6 and CUDA 8.0: https://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl ). For more instructions, see http://pytorch.org/.
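Because the code targets this specific release, it can help to check the installed version before running anything. The following is only a minimal sketch: the helper name matches_expected_version is ours and is not part of this repository.

```python
EXPECTED_TORCH_VERSION = "0.3.0.post4"  # the version stated in this README


def matches_expected_version(installed, expected=EXPECTED_TORCH_VERSION):
    """Return True if the installed version string equals the expected one."""
    return str(installed).strip() == expected


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed.")
    else:
        if matches_expected_version(torch.__version__):
            print("Found the expected PyTorch version.")
        else:
            print("Found PyTorch", torch.__version__,
                  "but this code was written for", EXPECTED_TORCH_VERSION)
```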
The folder models contains 3 types of models, each in two versions: trained with cross-validation (_5_folds, then evaluated as an ensemble) and without (_1_fold, a single model). The three model types are, as referenced in our paper:
- BILSTM: our baseline model, a plain bidirectional LSTM.
- ENTLIB: an implementation of the Entity Library from Aina et al. 2018.
- ENTNET: an implementation of Recurrent Entity Networks from Henaff et al. 2016.
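The directory names combine a model type with a version suffix. The sketch below enumerates the resulting paths, assuming that all six folders follow the pattern of models/ENTLIB_5_folds/ (the only name this README spells out in full); the helper model_dirs is hypothetical.

```python
import os
from itertools import product

MODEL_TYPES = ["BILSTM", "ENTLIB", "ENTNET"]
VERSIONS = ["_1_fold", "_5_folds"]  # single model vs. cross-validated ensemble


def model_dirs(root="models"):
    """Build candidate model directory paths such as models/ENTLIB_5_folds.

    Assumption: every folder is named <TYPE><VERSION>.
    """
    return [os.path.join(root, model + version)
            for model, version in product(MODEL_TYPES, VERSIONS)]


if __name__ == "__main__":
    # Report which of the expected folders exist in the current checkout.
    for path in model_dirs():
        status = "found" if os.path.isdir(path) else "missing"
        print(path, status)
```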
Run the main.py script with the corresponding parameters (more details on the different phases are given below):
python main.py --phase <phase> [-c <config_file>] [--model <model_path>] [--deploy_data <path_to_data>] [--no_cuda] [--no_eval]
where
- phase can be train or deploy (the deploy phase optionally runs evaluation).
- config_file specifies the hyperparameter settings. It is obligatory for training.
- model_path specifies the path to the model. It is obligatory for the deploy phase.
- deploy_data gives the path to the data for which the model has to output predictions, in CoNLL format (deploy phase only).
- no_eval applies to the deploy phase. It can be set if you do not want to evaluate the model, but just want to obtain predictions for some input data. If the input data does not contain target entity ids, no_eval is set by default.
- no_cuda is set to run the system in CPU mode.
python main.py --deploy_data test --model models/ENTLIB_5_folds/ [--no_cuda]
This will produce the following output files, saved in the directory models/ENTLIB_5_folds/answers/friends_test_scene/:
- static_0--ensemble.csv: The answer file. It has three columns (called index, prediction, target), where each row contains the index of the target mention in the test data, the predicted entity id, and the gold entity id to which the mention refers.
- static_0--ensemble_scores.txt: The evaluation results.
- static_0--ensemble_matrix.csv: A confusion matrix.
- static_0.ini: The config file that was used.
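As a quick sanity check, the answer file can be read back with Python's csv module. The sketch below assumes the file is comma-separated with a header row naming the columns index, prediction and target; the function accuracy_from_answers is hypothetical, and the official figures are those in the scores file.

```python
import csv


def accuracy_from_answers(lines):
    """Return the fraction of rows whose prediction equals the target.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the first line is the header `index,prediction,target`.
    """
    rows = list(csv.DictReader(lines))
    if not rows:
        return 0.0
    correct = sum(1 for row in rows if row["prediction"] == row["target"])
    return correct / len(rows)


if __name__ == "__main__":
    # Made-up rows for illustration only; the ids are not real entity ids.
    sample = ["index,prediction,target", "0,5,5", "1,3,7"]
    print(accuracy_from_answers(sample))
```

To apply it to a real answer file, pass an open handle, for instance `open(path, newline="")`, where path points to static_0--ensemble.csv.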
The demo describes how to train, deploy and evaluate a model from scratch using the official trial data of the SemEval task.
python main.py --phase train -c config_demo.ini [-r] [--no_cuda]
where the optional parameter r is used to activate random sampling of hyperparameters from intervals specified in the config file (see config_demo.ini for details). See above for the description of the other parameters.
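Random sampling from intervals can be pictured roughly as follows. This is not the repository's implementation: the uniform distribution, the function sample_hyperparameters, and the parameter names and ranges are all assumptions made for illustration; config_demo.ini documents the actual syntax and behaviour.

```python
import random


def sample_hyperparameters(intervals, seed=None):
    """Draw one value per hyperparameter from its (low, high) interval.

    Integer bounds give an integer (inclusive range); otherwise a float
    is drawn uniformly. Both choices are assumptions of this sketch.
    """
    rng = random.Random(seed)
    sampled = {}
    for name, (low, high) in intervals.items():
        if isinstance(low, int) and isinstance(high, int):
            sampled[name] = rng.randint(low, high)
        else:
            sampled[name] = rng.uniform(low, high)
    return sampled


if __name__ == "__main__":
    # Hypothetical parameter names and ranges, not taken from config_demo.ini.
    intervals = {"hidden_size": (50, 200), "dropout": (0.0, 0.5)}
    print(sample_hyperparameters(intervals, seed=0))
```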
The system will produce a subfolder <year_month> in the models directory, in which it will store several files:
- the config file,
- the model file (or files, if run with cross-validation, see parameter folds in the config),
- a logs subfolder with the training log (it records the loss, accuracy etc. on the training and validation data for each epoch).
The files will contain a timestamp in their name in the format <yyyy_mm_dd_hh_mm_ss>.
For example, running the command above in May 2019 will train a model with 2-fold cross-validation, and produce something like
.
|__ `models/2019_05/`
| | `fixed--2019_05_19_17_58_14.ini`
| | `fixed--2019_05_19_17_58_14--fold0.pt`
| | `fixed--2019_05_19_17_58_14--fold1.pt`
| |__`logs/`
| | `fixed--2019_05_19_17_58_14.log`
| | `fixed--2019_05_19_17_58_14.ini`
The prefix fixed means that the model was trained using fixed hyperparameters (since parameter r was not set, see above).
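The naming scheme in the example above can be reproduced with strftime. The sketch below is only an illustration of the pattern for a cross-validated run with fixed hyperparameters; expected_file_names is a hypothetical helper, and names for other settings are not covered.

```python
from datetime import datetime


def expected_file_names(when, folds, prefix="fixed"):
    """Return the subfolder name and the config/model file names for a run."""
    subfolder = when.strftime("%Y_%m")               # <year_month>
    stamp = when.strftime("%Y_%m_%d_%H_%M_%S")       # <yyyy_mm_dd_hh_mm_ss>
    base = "{}--{}".format(prefix, stamp)
    files = [base + ".ini"]
    files += ["{}--fold{}.pt".format(base, i) for i in range(folds)]
    return subfolder, files


if __name__ == "__main__":
    subfolder, files = expected_file_names(datetime(2019, 5, 19, 17, 58, 14), folds=2)
    print("models/" + subfolder + "/")
    for name in files:
        print("  " + name)
```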
Note that the model in this demo initialises the token embeddings randomly. If you want to use the pre-trained Google News skip-gram word embeddings (as we did for the paper), you first need to download the file GoogleNews-vectors-negative300.bin.gz and put it in the data/ folder. Then, in config_demo.ini, set the parameter token emb to google_news.
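To confirm that the embeddings file downloaded correctly, you can inspect its header. Files in the word2vec binary format start with a text line holding the vocabulary size and the vector dimensionality. The sketch below relies only on that format; the function names are ours, and whether the system expects the compressed or the unpacked file is not something this sketch settles.

```python
import gzip


def parse_word2vec_header(first_line):
    """Parse the header line b'<vocab_size> <dim>\\n' of a word2vec binary file."""
    vocab_size, dim = first_line.split()
    return int(vocab_size), int(dim)


def read_word2vec_header(path):
    """Read the header from a plain or gzip-compressed word2vec binary file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_word2vec_header(f.readline())


if __name__ == "__main__":
    import os
    path = "data/GoogleNews-vectors-negative300.bin.gz"
    if os.path.exists(path):
        print(read_word2vec_header(path))
    else:
        print("File not found:", path)
```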
The system was trained using 2-fold cross-validation, so for evaluation on the trial data (on which it was trained) it averages the scores that each fold's model obtained on its respective test split:
python main.py --phase deploy --deploy_data trial --model models/2019_05/fixed--2019_05_20_11_28_19 [--no_cuda]
This will produce a subfolder answers/friends_trial_scene/ in the model subfolder models/2019_05/.
See the section above for the description of the files stored therein.
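The averaging over folds used for the trial-data evaluation can be sketched as a plain unweighted mean per metric. This is an assumption about the details, not the repository's code; the metric name and the values below are made up.

```python
def average_fold_scores(fold_scores):
    """Average each metric over folds.

    `fold_scores` is a list with one dict per fold, mapping a metric name
    to the score that fold's model obtained on its own test split.
    """
    metrics = fold_scores[0].keys()
    return {m: sum(scores[m] for scores in fold_scores) / len(fold_scores)
            for m in metrics}


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(average_fold_scores([{"accuracy": 0.5}, {"accuracy": 0.75}]))
```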
python main.py --phase deploy --deploy_data test --model models/2019_05/fixed--2019_05_20_11_28_19 [--no_cuda]
See the Section above for details.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 715154), and from the Spanish Ramón y Cajal programme (grant RYC-2015-18907). We are grateful to the NVIDIA Corporation for the donation of GPUs used for this research. We are also very grateful to the PyTorch developers. This paper reflects the authors' view only, and the EU is not responsible for any use that may be made of the information it contains.