Training the Neural Network Parser

An evaluation of the REACH information extraction task indicated that much of its runtime is spent in CoreNLPProcessor's parsing of text. In an effort to improve this runtime, we considered alternatives, namely FastNLPProcessor and the neural network DependencyParser. As a sanity check and to provide concrete results, the Penn Treebank Wall Street Journal (WSJ) and GENIA corpora were used to test each parser. Both corpora were split into train, dev, and test partitions as follows (a sketch of swapping processors appears after the list):

  • For GENIA, we used the division by David McClosky; it includes a future_use partition, which we did not use.
  • Our distribution of the WSJ corpus is split into sections labeled 00 through 24. We used the standard partitioning: sections 02-21 for train, {01, 22, 24} for dev, and 23 for test; section 00 was discarded.
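Since the motivation is runtime, comparing the first two alternatives amounts to swapping one Processor implementation for another. Below is a minimal timing sketch, assuming the org.clulab.processors package layout; the object name and example text are illustrative only:

```scala
import org.clulab.processors.Processor
import org.clulab.processors.corenlp.CoreNLPProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor

object ProcessorTiming {
  // Both parsers implement the Processor trait, so swapping them is a one-line change.
  def timeMs(proc: Processor, text: String): Long = {
    val start = System.nanoTime()
    proc.annotate(text) // runs the full pipeline, including dependency parsing
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    val text = "TGF-beta induces the phosphorylation of Smad2."
    println(s"CoreNLPProcessor: ${timeMs(new CoreNLPProcessor(), text)} ms")
    println(s"FastNLPProcessor: ${timeMs(new FastNLPProcessor(), text)} ms")
  }
}
```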

The DependencyParser requires the corpora to be in the Basic Dependencies format; details on converting Penn Treebank trees to Basic Dependencies can be found on this wiki page.
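For reference, the usual route is Stanford's converter, whose command-line entry point is edu.stanford.nlp.trees.EnglishGrammaticalStructure (with the -treeFile, -basic, and -conllx flags). A minimal programmatic sketch of the same conversion, with the output columns abbreviated relative to full CoNLL-X:

```scala
import edu.stanford.nlp.trees.{MemoryTreebank, PennTreebankLanguagePack}
import scala.collection.JavaConverters._

object PtbToBasicDeps {
  def main(args: Array[String]): Unit = {
    val tlp = new PennTreebankLanguagePack
    val gsf = tlp.grammaticalStructureFactory // note: the default factory drops punctuation arcs
    val treebank = new MemoryTreebank
    treebank.loadPath(args(0)) // a .mrg file from the PTB distribution

    for (tree <- treebank.asScala) {
      val gs = gsf.newGrammaticalStructure(tree)
      // typedDependencies() yields the *basic* (uncollapsed) dependencies
      for (td <- gs.typedDependencies().asScala) {
        println(s"${td.dep.index}\t${td.dep.word}\t${td.gov.index}\t${td.reln}")
      }
      println() // blank line between sentences, as in CoNLL-X
    }
  }
}
```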

Processors represents text as Document objects, which in turn contain an array of Sentence objects. Reading corpora in the Basic Dependency (CoNLL-X) format was a task in itself; the resulting ConllxReader class is heavily adapted from the DocumentSerializer class. Additional utilities were added to CoreNLPUtils (sentenceToCoreMap, sentenceToAnnotation, and docToAnnotation), EvaluateUtils (for calculating precision, recall, etc.), and ParserUtils (for performing the training and saving the model). reference.conf includes the paths to all relevant files, most notably the train, dev, test, and model files for each model. Note that this code may still exist only in the nn-parser-training branch.
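The metric computations themselves are the standard ones over dependency-arc counts. A minimal sketch of what EvaluateUtils computes (the class and names here are illustrative, not the actual API):

```scala
// Arc-level counts: a predicted (modifier, head, label) arc that matches gold is a tp,
// a predicted arc absent from gold is an fp, and a gold arc that was never predicted is an fn.
case class ArcCounts(tp: Int, fp: Int, fn: Int) {
  def precision: Double = tp.toDouble / (tp + fp)
  def recall: Double    = tp.toDouble / (tp + fn)
  def f1: Double        = 2.0 * precision * recall / (precision + recall)
}

// e.g. the CoreNLPProcessor result on the WSJ test file below:
// ArcCounts(40867, 6421, 13401).precision ≈ 0.8642
```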

The evaluated models included CoreNLPProcessor, FastNLPProcessor, and the DependencyParser in various configurations. The neural network configurations required word embeddings; Word2Vec embeddings trained on Gigaword and on PubMed Open Access were chosen because their content is relatively similar to WSJ and GENIA, respectively. Two models were trained as a sanity check for this intuition: (1) training/testing on WSJ with Gigaword embeddings and (2) training/testing on GENIA with PubMed Open Access embeddings. The intended "best" model, which uses both corpora and both sets of embeddings, was trained with five different multiples of the GENIA corpus: because GENIA is much smaller than WSJ, the GENIA corpus was concatenated onto the WSJ corpus 1, 2, 3, 4, and 5 times, yielding five models trained on the combined WSJ+GENIA*k corpus (k being the number of copies of the GENIA corpus) with Gigaword+PubMed embeddings. The embeddings were generated with Word2Vec with dimension 200 and all other settings at their default values. (All relevant Gigaword-Pubmed files can be found here: /net/kate/storage/data/nlp/corpora/word2vec/gigaword-pubmed/. See reference.conf for more information.)
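ParserUtils presumably wraps something like the following call into the Stanford nndep trainer; the paths below are placeholders for the reference.conf entries, and a WSJ+GENIA*3 file, for instance, is plain concatenation (cat wsj.conllx genia.conllx genia.conllx genia.conllx):

```scala
import java.util.Properties
import edu.stanford.nlp.parser.nndep.DependencyParser

object TrainNNParser {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("embeddingSize", "200") // must match the Word2Vec dimension

    val parser = new DependencyParser(props)
    // train/dev files hold Basic Dependencies in CoNLL-X format; the last argument
    // is the Word2Vec text file (e.g. the combined gigaword-pubmed vectors)
    parser.train("wsj-genia-x3.conllx", "dev.conllx", "nndep.model.txt.gz", "gigaword-pubmed.txt")
  }
}
```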

The results of training and testing these models are as follows ("NN" denotes the neural network DependencyParser, "combined" means Gigaword+PubMed OA embeddings, tn was 0 in every run and is omitted, and metrics are rounded to four decimal places):

| Model | Evaluated on | tp | fp | fn | Precision | Recall | F1 |
|-------|--------------|----|----|----|-----------|--------|----|
| Vanilla CoreNLPProcessor | WSJ train | 716313 | 83544 | 193883 | 0.8956 | 0.7870 | 0.8378 |
| Vanilla CoreNLPProcessor | GENIA train | 263133 | 58407 | 96928 | 0.8184 | 0.7308 | 0.7721 |
| Vanilla CoreNLPProcessor | WSJ test | 40867 | 6421 | 13401 | 0.8642 | 0.7531 | 0.8048 |
| Vanilla CoreNLPProcessor | GENIA test | 24495 | 6047 | 9784 | 0.8020 | 0.7146 | 0.7558 |
| Vanilla FastNLPProcessor | WSJ train | 834228 | 75968 | 75968 | 0.9165 | 0.9165 | 0.9165 |
| Vanilla FastNLPProcessor | GENIA train | 322992 | 37069 | 37069 | 0.8970 | 0.8970 | 0.8970 |
| Vanilla FastNLPProcessor | WSJ test | 47476 | 6792 | 6792 | 0.8748 | 0.8748 | 0.8748 |
| Vanilla FastNLPProcessor | GENIA test | 29522 | 4757 | 4757 | 0.8612 | 0.8612 | 0.8612 |
| NN: WSJ, Gigaword embeddings | WSJ test | 6585 | 47683 | 47683 | 0.1213 | 0.1213 | 0.1213 |
| NN: WSJ, Gigaword embeddings | GENIA test | 4040 | 30239 | 30239 | 0.1179 | 0.1179 | 0.1179 |
| NN: GENIA, PubMed OA embeddings | WSJ test | 10042 | 44226 | 44226 | 0.1850 | 0.1850 | 0.1850 |
| NN: GENIA, PubMed OA embeddings | GENIA test | 6323 | 27956 | 27956 | 0.1845 | 0.1845 | 0.1845 |
| NN: WSJ+GENIA*1, combined | WSJ test | 8410 | 45858 | 45858 | 0.1550 | 0.1550 | 0.1550 |
| NN: WSJ+GENIA*1, combined | GENIA test | 5346 | 28933 | 28933 | 0.1560 | 0.1560 | 0.1560 |
| NN: WSJ+GENIA*2, combined | WSJ test | 9496 | 44772 | 44772 | 0.1750 | 0.1750 | 0.1750 |
| NN: WSJ+GENIA*2, combined | GENIA test | 5978 | 28301 | 28301 | 0.1744 | 0.1744 | 0.1744 |
| NN: WSJ+GENIA*3, combined | WSJ test | 9004 | 45264 | 45264 | 0.1659 | 0.1659 | 0.1659 |
| NN: WSJ+GENIA*3, combined | GENIA test | 5790 | 28489 | 28489 | 0.1689 | 0.1689 | 0.1689 |
| NN: WSJ+GENIA*4, combined | WSJ test | 9448 | 44820 | 44820 | 0.1741 | 0.1741 | 0.1741 |
| NN: WSJ+GENIA*4, combined | GENIA test | 5794 | 28485 | 28485 | 0.1690 | 0.1690 | 0.1690 |
| NN: WSJ+GENIA*5, combined | WSJ test | 9912 | 44356 | 44356 | 0.1826 | 0.1826 | 0.1826 |
| NN: WSJ+GENIA*5, combined | GENIA test | 5992 | 28287 | 28287 | 0.1748 | 0.1748 | 0.1748 |

Note that for every configuration except CoreNLPProcessor, fp equals fn, so precision, recall, and F1 coincide: when each token receives exactly one predicted head and has exactly one gold head, the number of predicted arcs equals the number of gold arcs, which forces fp = fn and reduces the metric to an attachment score. CoreNLPProcessor shows fn > fp, presumably because some tokens receive no predicted arc at all.

FastNLPProcessor has already been noted as having even better results than CoreNLPProcessor and a faster runtime, so switching to it is an improvement that requires little change to the existing REACH code. Strangely, the neural network models perform very poorly; whether this is an implementation error or a true reflection of their performance is still being investigated.