Training the Neural Network Parser

An evaluation of the REACH information extraction task indicated that much of its runtime is spent in CoreNLPProcessor's parsing of text. In an effort to improve this runtime, we considered alternatives, namely FastNLPProcessor and the neural network DependencyParser. As a sanity check and to provide concrete results, the Penn Treebank Wall Street Journal (WSJ) and GENIA corpora were used to test each parser. Both corpora were split into train, dev, and test partitions as follows (a sketch of swapping processors appears after the list):

  • For GENIA, we used the division by David McClosky; it includes a future_use partition, which we did not use.
  • Our distribution of the WSJ corpus is split into sections labeled 00 through 24. We used the standard partitioning: sections 02-21 for train, {01, 22, 24} for dev, and 23 for test; section 00 was discarded.
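Since the motivation is runtime, comparing the first two alternatives amounts to swapping one Processor implementation for another. Below is a minimal timing sketch, assuming the org.clulab.processors package layout; the object name and example text are illustrative only:

```scala
import org.clulab.processors.Processor
import org.clulab.processors.corenlp.CoreNLPProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor

object ProcessorTiming {
  // Both parsers implement the Processor trait, so swapping them is a one-line change.
  def timeMs(proc: Processor, text: String): Long = {
    val start = System.nanoTime()
    proc.annotate(text) // runs the full pipeline, including dependency parsing
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    val text = "TGF-beta induces the phosphorylation of Smad2."
    println(s"CoreNLPProcessor: ${timeMs(new CoreNLPProcessor(), text)} ms")
    println(s"FastNLPProcessor: ${timeMs(new FastNLPProcessor(), text)} ms")
  }
}
```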

The DependencyParser requires the corpora to be in the Basic Dependencies format; details on converting Penn Treebank trees to Basic Dependencies can be found on this wiki page.
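For reference, the usual route is Stanford's converter, whose command-line entry point is edu.stanford.nlp.trees.EnglishGrammaticalStructure (with the -treeFile, -basic, and -conllx flags). A minimal programmatic sketch of the same conversion, with the output columns abbreviated relative to full CoNLL-X:

```scala
import edu.stanford.nlp.trees.{MemoryTreebank, PennTreebankLanguagePack}
import scala.collection.JavaConverters._

object PtbToBasicDeps {
  def main(args: Array[String]): Unit = {
    val tlp = new PennTreebankLanguagePack
    val gsf = tlp.grammaticalStructureFactory // note: the default factory drops punctuation arcs
    val treebank = new MemoryTreebank
    treebank.loadPath(args(0)) // a .mrg file from the PTB distribution

    for (tree <- treebank.asScala) {
      val gs = gsf.newGrammaticalStructure(tree)
      // typedDependencies() yields the *basic* (uncollapsed) dependencies
      for (td <- gs.typedDependencies().asScala) {
        println(s"${td.dep.index}\t${td.dep.word}\t${td.gov.index}\t${td.reln}")
      }
      println() // blank line between sentences, as in CoNLL-X
    }
  }
}
```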

Processors represents text as Document objects, which in turn contain an array of Sentence objects. Reading corpora in the Basic Dependency (CoNLL-X) format was a task in itself; the resulting ConllxReader class is heavily adapted from the DocumentSerializer class. Additional utilities were added to CoreNLPUtils (sentenceToCoreMap, sentenceToAnnotation, and docToAnnotation), EvaluateUtils (for calculating precision, recall, etc.), and ParserUtils (for performing the training and saving the model). reference.conf includes the paths to all relevant files, most notably the train, dev, test, and model files for each model. Note that this code may still exist only in the nn-parser-training branch.
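The metric computations themselves are the standard ones over dependency-arc counts. A minimal sketch of what EvaluateUtils computes (the class and names here are illustrative, not the actual API):

```scala
// Arc-level counts: a predicted (modifier, head, label) arc that matches gold is a tp,
// a predicted arc absent from gold is an fp, and a gold arc that was never predicted is an fn.
case class ArcCounts(tp: Int, fp: Int, fn: Int) {
  def precision: Double = tp.toDouble / (tp + fp)
  def recall: Double    = tp.toDouble / (tp + fn)
  def f1: Double        = 2.0 * precision * recall / (precision + recall)
}

// e.g. the CoreNLPProcessor result on the WSJ test file below:
// ArcCounts(40867, 6421, 13401).precision ≈ 0.8642
```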

The evaluated models included CoreNLPProcessor, FastNLPProcessor, and the DependencyParser in various configurations. The neural network configurations required word embeddings; Word2Vec embeddings trained on Gigaword and on PubMed Open Access were chosen because their content is relatively similar to WSJ and GENIA, respectively. Two models were trained as a sanity check for this intuition: (1) training/testing on WSJ with Gigaword embeddings and (2) training/testing on GENIA with PubMed Open Access embeddings. The intended "best" model, which uses both corpora and both sets of embeddings, was trained with five different multiples of the GENIA corpus: because GENIA is much smaller than WSJ, the GENIA corpus was concatenated onto the WSJ corpus 1, 2, 3, 4, and 5 times, yielding five models trained on the combined WSJ+GENIA*k corpus (k being the number of copies of the GENIA corpus) with Gigaword+PubMed embeddings. The embeddings were generated with Word2Vec with dimension 200 and all other settings at their default values. (All relevant Gigaword-Pubmed files can be found here: /net/kate/storage/data/nlp/corpora/word2vec/gigaword-pubmed/. See reference.conf for more information.)
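ParserUtils presumably wraps something like the following call into the Stanford nndep trainer; the paths below are placeholders for the reference.conf entries, and a WSJ+GENIA*3 file, for instance, is plain concatenation (cat wsj.conllx genia.conllx genia.conllx genia.conllx):

```scala
import java.util.Properties
import edu.stanford.nlp.parser.nndep.DependencyParser

object TrainNNParser {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("embeddingSize", "200") // must match the Word2Vec dimension

    val parser = new DependencyParser(props)
    // train/dev files hold Basic Dependencies in CoNLL-X format; the last argument
    // is the Word2Vec text file (e.g. the combined gigaword-pubmed vectors)
    parser.train("wsj-genia-x3.conllx", "dev.conllx", "nndep.model.txt.gz", "gigaword-pubmed.txt")
  }
}
```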

The results of training and testing these models are as follows ("NN" denotes the neural network DependencyParser, "combined" means Gigaword+PubMed OA embeddings, tn was 0 in every run and is omitted, and metrics are rounded to four decimal places):

| Model | Evaluated on | tp | fp | fn | Precision | Recall | F1 |
|-------|--------------|----|----|----|-----------|--------|----|
| Vanilla CoreNLPProcessor | WSJ train | 716313 | 83544 | 193883 | 0.8956 | 0.7870 | 0.8378 |
| Vanilla CoreNLPProcessor | GENIA train | 263133 | 58407 | 96928 | 0.8184 | 0.7308 | 0.7721 |
| Vanilla CoreNLPProcessor | WSJ test | 40867 | 6421 | 13401 | 0.8642 | 0.7531 | 0.8048 |
| Vanilla CoreNLPProcessor | GENIA test | 24495 | 6047 | 9784 | 0.8020 | 0.7146 | 0.7558 |
| Vanilla FastNLPProcessor | WSJ train | 834228 | 75968 | 75968 | 0.9165 | 0.9165 | 0.9165 |
| Vanilla FastNLPProcessor | GENIA train | 322992 | 37069 | 37069 | 0.8970 | 0.8970 | 0.8970 |
| Vanilla FastNLPProcessor | WSJ test | 47476 | 6792 | 6792 | 0.8748 | 0.8748 | 0.8748 |
| Vanilla FastNLPProcessor | GENIA test | 29522 | 4757 | 4757 | 0.8612 | 0.8612 | 0.8612 |
| NN: WSJ, Gigaword embeddings | WSJ test | 6585 | 47683 | 47683 | 0.1213 | 0.1213 | 0.1213 |
| NN: WSJ, Gigaword embeddings | GENIA test | 4040 | 30239 | 30239 | 0.1179 | 0.1179 | 0.1179 |
| NN: GENIA, PubMed OA embeddings | WSJ test | 10042 | 44226 | 44226 | 0.1850 | 0.1850 | 0.1850 |
| NN: GENIA, PubMed OA embeddings | GENIA test | 6323 | 27956 | 27956 | 0.1845 | 0.1845 | 0.1845 |
| NN: WSJ+GENIA*1, combined | WSJ test | 8410 | 45858 | 45858 | 0.1550 | 0.1550 | 0.1550 |
| NN: WSJ+GENIA*1, combined | GENIA test | 5346 | 28933 | 28933 | 0.1560 | 0.1560 | 0.1560 |
| NN: WSJ+GENIA*2, combined | WSJ test | 9496 | 44772 | 44772 | 0.1750 | 0.1750 | 0.1750 |
| NN: WSJ+GENIA*2, combined | GENIA test | 5978 | 28301 | 28301 | 0.1744 | 0.1744 | 0.1744 |
| NN: WSJ+GENIA*3, combined | WSJ test | 9004 | 45264 | 45264 | 0.1659 | 0.1659 | 0.1659 |
| NN: WSJ+GENIA*3, combined | GENIA test | 5790 | 28489 | 28489 | 0.1689 | 0.1689 | 0.1689 |
| NN: WSJ+GENIA*4, combined | WSJ test | 9448 | 44820 | 44820 | 0.1741 | 0.1741 | 0.1741 |
| NN: WSJ+GENIA*4, combined | GENIA test | 5794 | 28485 | 28485 | 0.1690 | 0.1690 | 0.1690 |
| NN: WSJ+GENIA*5, combined | WSJ test | 9912 | 44356 | 44356 | 0.1826 | 0.1826 | 0.1826 |
| NN: WSJ+GENIA*5, combined | GENIA test | 5992 | 28287 | 28287 | 0.1748 | 0.1748 | 0.1748 |

Note that for every configuration except CoreNLPProcessor, fp equals fn, so precision, recall, and F1 coincide: when each token receives exactly one predicted head and has exactly one gold head, the number of predicted arcs equals the number of gold arcs, which forces fp = fn and reduces the metric to an attachment score. CoreNLPProcessor shows fn > fp, presumably because some tokens receive no predicted arc at all.

FastNLPProcessor has already been noted as having even better results than CoreNLPProcessor and a faster runtime, so switching to it is an improvement that requires little change to the existing REACH code. Strangely, the neural network models perform very poorly; whether this is an implementation error or a true reflection of their performance is still being investigated.