# Deep Transition Dependency Parser

Project Goals:

- Implement an arc-standard transition-based dependency parser in PyTorch
- Implement neural network components for choosing actions and combining stack elements
- Train your network to parse English and Norwegian sentences

In [76]:
! nosetests tests/test_parser.py:test_get_arc_components_d1_1a

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


In [77]:
! nosetests tests/test_parser.py:test_create_arc_d1_1b

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


In [78]:
! nosetests tests/test_parser.py:test_stack_terminating_cond_d1_2

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


In [79]:
! nosetests tests/test_parser.py:test_validate_action_d1_3

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


In [84]:
! nosetests tests/test_parser.py:test_word_embed_lookup_d2_1

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


In [83]:
! nosetests tests/test_parser.py:test_feature_extraction_d2_2

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


In [105]:
! nosetests tests/test_parser.py:test_action_chooser_d2_3

.
----------------------------------------------------------------------
Ran 1 test in 0.091s

OK


In [81]:
! nosetests tests/test_parser.py:test_combiner_d2_4

.
----------------------------------------------------------------------
Ran 1 test in 0.084s

OK


In [80]:
! nosetests tests/test_parser.py:test_parse_logic_d3_1

.
----------------------------------------------------------------------
Ran 1 test in 0.109s

OK


In [86]:
! nosetests tests/test_parser.py:test_predict_after_train_d3_1

.
----------------------------------------------------------------------
Ran 1 test in 0.792s

OK


In [85]:
! nosetests tests/test_parser.py:test_dev_d3_2_english

.
----------------------------------------------------------------------
Ran 1 test in 0.010s

OK


In [87]:
! nosetests tests/test_parser.py:test_dev_d3_3_norwegian

.
----------------------------------------------------------------------
Ran 1 test in 0.012s

OK


In [88]:
! nosetests tests/test_parser.py:test_bilstm_word_embeds_d4_1

.
----------------------------------------------------------------------
Ran 1 test in 0.101s

OK


In [89]:
! nosetests tests/test_parser.py:test_suff_word_embeds_d4_2

.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK


In [90]:
! nosetests tests/test_parser.py:test_pretrained_embeddings_d4_3

.
----------------------------------------------------------------------
Ran 1 test in 0.007s

OK


In [91]:
! nosetests tests/test_parser.py:test_lstm_combiner_d4_4

.
----------------------------------------------------------------------
Ran 1 test in 0.082s

OK


In [104]:
! nosetests tests/test_parser.py:test_lstm_action_chooser_d4_5

.
----------------------------------------------------------------------
Ran 1 test in 0.091s

OK


In [1]:
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as ag

import nose
import numpy as np
import collections

from importlib import reload  # `imp` is deprecated; importlib provides the same reload()

In [2]:
print('My library versions')

print('numpy: {}'.format(np.__version__))
print('nose: {}'.format(nose.__version__))
print('torch: {}'.format(torch.__version__))

My library versions
numpy: 1.23.3
nose: 1.3.7
torch: 1.4.0


To test whether your libraries are the right version, run:

`nosetests tests/test_environment.py`

In [8]:
# use ! to run shell commands in notebook
! nosetests tests/test_environment.py

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


In [3]:
import mynlplib.parsing as parsing
import mynlplib.data_tools as data_tools
import mynlplib.constants as consts
import mynlplib.evaluation as evaluation
import mynlplib.utils as utils
import mynlplib.feat_extractors as feat_extractors
import mynlplib.neural_net as neural_net

In [4]:
# Read in the datasets
reload(data_tools)
en_dataset = data_tools.Dataset(consts.EN_TRAIN_FILE, consts.EN_DEV_FILE, consts.EN_TEST_FILE)
nr_dataset = data_tools.Dataset(consts.NR_TRAIN_FILE, consts.NR_DEV_FILE, consts.NR_TEST_FILE)

# Assign each word a unique index, including two special tokens needed for parsing logic
word_to_ix_en = { word: i for i, word in enumerate(en_dataset.vocab) }
word_to_ix_nr = { word: i for i, word in enumerate(nr_dataset.vocab) }

In [5]:
# Some constants to keep around
LSTM_NUM_LAYERS = 1
TEST_EMBEDDING_DIM = 4
WORD_EMBEDDING_DIM = 64
STACK_EMBEDDING_DIM = 100
NUM_FEATURES = 3

# Hyperparameters
ETA_0 = 0.01
DROPOUT = 0.0

# High-Level Overview of the Parser
* Initialize your parsing stack and input buffer.
* At each step, until the parse is done:
  * Extract some features.  We will start with simple features, but these can be anything: words in the sentence, the configuration of the stack, the configuration of the input buffer, the previous action, etc.
  * Send these features through a feed-forward (FF) network to get a probability distribution over actions (`SHIFT`, `ARC_L`, `ARC_R`).  The next action you choose is the one with the highest probability.
  * If the action is an arc- operation, you use a neural network to combine the two items in the operation and get a dense output to place back on the input buffer.

**Classes of note:**
* Feature extraction in `feat_extractors.py`
* The `ParserState` class, which keeps track of the input buffer and parse stack, and offers a public interface for doing the parsing actions to update the state
* The `TransitionParser` class, which is a PyTorch module where the core parsing logic resides, in `parsing.py`.
* The neural network components in `neural_net.py`

The network components are compartmentalized as follows:
* `TransitionParser`, the base component that contains and coordinates the other substitutable components

* Embedding Lookup: These embeddings are used to initialize the input buffer, and will be shifted on the stack / serve as inputs to the combiner networks.
  - `VanillaWordEmbedding` just gets embeddings from a lookup table.
  - `BiLSTMWordEmbedding` will run a sequence model in both directions over the sentence. The hidden state at step t is the embedding for the `t`-th word of the sentence.
  - `SuffixAndWordEmbedding` gets embeddings for words as in the vanilla embeddings, and also gets embeddings for word suffixes, and concatenates them together.
* Action Choosing: 
  - `FFActionChooser` is a simple feed-forward neural network that outputs log probabilities over the three actions given the extracted features as input.
  - `LSTMActionChooser` applies a sequence model that takes the hidden state of the previous action decision as input.

* Combiners: These are the network components that take the two embeddings of the items in an arc- operation and create a single vector.
  - `FFCombiner` takes the two input embeddings and gives a dense output.
  - `LSTMCombiner` applies a sequence model, where the output embedding is the hidden state of the next timestep.

### Parsing example

The following is how the input buffer and stack look at each step of a parse, up to the first arc.  The input sentence is "the dog ran away".  Our action chooser network takes the top element of the stack, the top element of the input buffer, plus a one-token "lookahead" in the input buffer.  $C(x,y)$ refers to calling our combiner network on arguments $x, y$.  Also let $A$ be the set of actions: $\{ \text{SHIFT}, \text{ARC-L}, \text{ARC-R} \}$, and let $q_w$ be the embedding for word $w$.

1. 
  * Input Buffer: $\left[ q_\text{the}, q_\text{dog}, q_\text{ran}, q_\text{away}, q_\text{END-INPUT} \right]$
  * Stack: $\left[ q_\text{ROOT} \right]$
  * Action: $ \text{argmax}_{a \in A} \ \text{ActionChooser}(q_\text{ROOT}, q_\text{the}, \overbrace{q_\text{dog}}^\text{lookahead}) \Rightarrow \text{SHIFT}$
  
2.
  * Input Buffer: $\left[ q_\text{dog}, q_\text{ran}, q_\text{away}, q_\text{END-INPUT} \right]$
  * Stack: $\left[ q_\text{ROOT}, q_\text{the} \right]$
  * Action: $ \text{argmax}_{a \in A} \ \text{ActionChooser}(q_\text{the}, q_\text{dog}, q_\text{ran}) \Rightarrow \text{ARC-L}$
  
3.
  * Input Buffer: $\left[C(q_\text{dog}, q_\text{the}), q_\text{ran}, q_\text{away}, q_\text{END-INPUT} \right]$
  * Stack: $\left[ q_\text{ROOT} \right]$
  
This is a partial picture of parsing - we keep more than just the embedding on the stack and input buffer.  We also keep the word and its position in the sentence so that when we create an arc, we know what edge was just created.
So, for example, the initial input buffer really looks like

$$ \left[ (\text{the}, 0, q_\text{the}), (\text{dog}, 1, q_\text{dog}), (\text{ran}, 2, q_\text{ran}), (\text{away}, 3, q_\text{away}), (\text{END-INPUT}, 4, q_\text{END-INPUT}) \right] $$
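The three configurations above can be reproduced with a small pure-Python sketch. This is not the `ParserState` implementation: `Entry`, `make_state`, `shift`, and `arc_left` are hypothetical names, embeddings are left as `None`, and the combiner is assumed to simply keep the head entry.

```python
from collections import namedtuple

# Hypothetical stand-in for the (word, position, embedding) triples on the stack and buffer
Entry = namedtuple("Entry", ["word", "pos", "embedding"])

ROOT = Entry("<ROOT>", -1, None)
END = "<END-OF-INPUT>"


def make_state(sentence):
    """Return (stack, buffer) for a tokenized sentence; embeddings are omitted (None)."""
    words = sentence + [END]
    buffer = [Entry(w, i, None) for i, w in enumerate(words)]
    return [ROOT], buffer


def shift(stack, buffer):
    """Move the front of the input buffer onto the top of the stack."""
    stack.append(buffer.pop(0))


def arc_left(stack, buffer):
    """Head is the front of the buffer, modifier is the top of the stack.

    Assumption: the combined item is just the head entry; the real parser
    would call its combiner network here. The result goes back on the buffer.
    """
    modifier = stack.pop()
    head = buffer.pop(0)
    buffer.insert(0, head)
    return (head.word, head.pos), (modifier.word, modifier.pos)


stack, buffer = make_state("the dog ran away".split())
shift(stack, buffer)            # step 1 -> step 2
arc = arc_left(stack, buffer)   # step 2 -> step 3
print(arc)
print([e.word for e in stack], [e.word for e in buffer])
```

Keeping the word and position next to the embedding is what lets `arc_left` report which edge was created, independently of what the combiner returns.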


# 1. Managing and Updating the Parser State

### Implementing Arc

**Implemented:** `_get_arc_components` in `parsing.py`

Selects the head and modifier according to the action passed in. The method also removes both items from the stack and input buffer.

- **Test**: `test_parser.py:test_get_arc_components_d1_1a`

In [225]:
! nosetests tests/test_parser.py:test_get_arc_components_d1_1a

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


In [9]:
# TEST VERSION
parser_state.shift()
parser_state.shift()
print(parser_state)

head, modifier = parser_state._get_arc_components(consts.Actions.ARC_L)
print(head, modifier)

head, modifier = parser_state._get_arc_components(consts.Actions.ARC_R)
print(head, modifier)

parser_state.combiner(head, modifier)

Stack: ['<ROOT>', 'The', 'man']
Input Buffer: ['ran', 'away', '<END-OF-INPUT>']

StackEntry(headword='ran', headword_pos=2, embedding=None) StackEntry(headword='man', headword_pos=1, embedding=None)
StackEntry(headword='The', headword_pos=0, embedding=None) StackEntry(headword='away', headword_pos=3, embedding=None)


StackEntry(headword='The', headword_pos=0, embedding=None)

In [6]:
reload(parsing)
test_sentence = "The man ran away".split()
parser_state = parsing.ParserState(test_sentence + [consts.END_OF_INPUT_TOK], 
                                   [None] * (len(test_sentence)+1),
                                   utils.DummyCombiner())

In [7]:
parser_state.shift()
parser_state.shift()
print(parser_state)

head, modifier = parser_state._get_arc_components(consts.Actions.ARC_L)
print(head, modifier)

head, modifier = parser_state._get_arc_components(consts.Actions.ARC_R)
print(head, modifier)

Stack: ['<ROOT>', 'The', 'man']
Input Buffer: ['ran', 'away', '<END-OF-INPUT>']

StackEntry(headword='ran', headword_pos=2, embedding=None) StackEntry(headword='man', headword_pos=1, embedding=None)
StackEntry(headword='The', headword_pos=0, embedding=None) StackEntry(headword='away', headword_pos=3, embedding=None)


**Implemented:** `_create_arc` in `parsing.py`, which uses the `ParserState`'s `combiner` component to **combine** the passed-in head and modifier, puts the combination on the input buffer, and creates a new dependency graph edge.
- **Test**: `test_parser.py:test_create_arc_d1_1b`

In [224]:
! nosetests tests/test_parser.py:test_create_arc_d1_1b

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


In [8]:
reload(parsing)
parser_state = parsing.ParserState(test_sentence + [consts.END_OF_INPUT_TOK], 
                                   [None] * (len(test_sentence)+1),
                                   utils.DummyCombiner())

print(parser_state)

parser_state.shift()
print(parser_state)

arc = parser_state.arc_left()
print("First arc: Head: {}, Modifier: {}".format(arc[0], arc[1]), "\n")
print(parser_state)

parser_state.shift()
arc = parser_state.arc_left()
print("Second arc: Head: {}, Modifier: {}".format(arc[0], arc[1]), "\n")
print(parser_state)

Stack: ['<ROOT>']
Input Buffer: ['The', 'man', 'ran', 'away', '<END-OF-INPUT>']

Stack: ['<ROOT>', 'The']
Input Buffer: ['man', 'ran', 'away', '<END-OF-INPUT>']

First arc: Head: ('man', 1), Modifier: ('The', 0) 

Stack: ['<ROOT>']
Input Buffer: ['man', 'ran', 'away', '<END-OF-INPUT>']

Second arc: Head: ('ran', 2), Modifier: ('man', 1) 

Stack: ['<ROOT>']
Input Buffer: ['ran', 'away', '<END-OF-INPUT>']



### Parser Terminating Condition
**Implemented:** `done_parsing()` in `ParserState`
- **Test**: `test_parser.py:test_stack_terminating_cond_d1_2`
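Judging from the `True`/`False` sequence printed for the four-word test sentence, parsing appears to end once only `<ROOT>` remains on the stack and only the end-of-input token remains in the input buffer. A minimal sketch under that assumption, on plain lists of strings (the function signature and list representation are illustrative, not the `ParserState` API):

```python
ROOT_TOK = "<ROOT>"
END_TOK = "<END-OF-INPUT>"


def done_parsing(stack, buffer):
    """Assumed terminating condition: stack == [ROOT] and buffer == [END]."""
    return stack == [ROOT_TOK] and buffer == [END_TOK]


# Mid-parse: words are still waiting in the buffer
mid_parse = done_parsing([ROOT_TOK], ["ran", "away", END_TOK])
# ROOT has just been combined and sits on the buffer; one SHIFT is still needed
root_on_buffer = done_parsing([], [ROOT_TOK, END_TOK])
# Final configuration
finished = done_parsing([ROOT_TOK], [END_TOK])
print(mid_parse, root_on_buffer, finished)
```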

In [223]:
! nosetests tests/test_parser.py:test_stack_terminating_cond_d1_2

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK


In [9]:
reload(parsing)
parser_state = parsing.ParserState(test_sentence + [consts.END_OF_INPUT_TOK], 
                                   [None] * (len(test_sentence)+1),
                                   utils.DummyCombiner())

parser_state.shift()
parser_state.arc_left()
parser_state.shift()
parser_state.arc_left()

print(parser_state.done_parsing())

parser_state.shift()
parser_state.arc_right()
print(parser_state.done_parsing())

parser_state.arc_right()
print(parser_state.done_parsing())

parser_state.shift()
print(parser_state.done_parsing())

False
False
False
True


### Validating parser actions 
**Implemented:** `_validate_action` method in `parsing.ParserState`

This method is used at prediction time, when gold-standard actions are not available, to ensure that every action taken is legal.

Parser action rules:

- You cannot shift when the input buffer has <= 2 items on it (including the end of input token), UNLESS the stack is empty.
  - **In this case, do `ARC_R` by default.**
- You cannot do an arc- operation when the stack is empty (this will happen after creating an arc with ROOT).
  - **In this case, do `SHIFT` by default.**
- You cannot do an arc-left operation when the root token is on top of the stack.
  - **In this case, do `SHIFT` or `ARC_R` depending on the state of the input buffer.**
  
**Test:**
- `test_parser.py:test_validate_action_d1_3`
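The three rules can be sketched as a standalone function over plain lists. This is an illustration rather than the project's `_validate_action`: the string action names, the `validate_action` signature, and the reading of rule 3 (fall back to `SHIFT` when shifting is legal, otherwise `ARC_R`) are assumptions.

```python
SHIFT, ARC_L, ARC_R = "SHIFT", "ARC_L", "ARC_R"
ROOT_TOK = "<ROOT>"
END_TOK = "<END-OF-INPUT>"


def validate_action(action, stack, buffer):
    """Return a legal action, falling back to the defaults listed in the rules."""
    stack_empty = len(stack) == 0
    # Rule 1: no SHIFT with <= 2 buffer items unless the stack is empty
    can_shift = len(buffer) > 2 or stack_empty

    if action == SHIFT:
        return SHIFT if can_shift else ARC_R
    # Rule 2: no arc operation on an empty stack
    if stack_empty:
        return SHIFT
    # Rule 3: no ARC_L while ROOT is on top of the stack
    if action == ARC_L and stack[-1] == ROOT_TOK:
        return SHIFT if can_shift else ARC_R
    return action


a1 = validate_action(ARC_L, [ROOT_TOK], ["The", "man", "ran", "away", END_TOK])
a2 = validate_action(ARC_L, [ROOT_TOK, "The"], ["man", "ran", "away", END_TOK])
a3 = validate_action(SHIFT, [ROOT_TOK, "The", "man", "ran"], ["away", END_TOK])
a4 = validate_action(ARC_R, [], [ROOT_TOK, END_TOK])
print(a1, a2, a3, a4)
```

The first three calls mirror the three parser states exercised in the notebook cell for this section; the fourth covers the empty-stack rule.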

In [222]:
! nosetests tests/test_parser.py:test_validate_action_d1_3

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


In [35]:
inp = [1,2,3]
print(inp[-1])

3


In [10]:
reload(parsing)
parser_state = parsing.ParserState(test_sentence + [consts.END_OF_INPUT_TOK], 
                                   [None] * (len(test_sentence)+1),
                                   utils.DummyCombiner())
ix_to_action = consts.Actions.ix_to_action

In [11]:
print(parser_state)
act_to_do = consts.Actions.ARC_L
valid_action = parser_state._validate_action(act_to_do)
print("Chosen action: %s, Valid action: %s\n" % (ix_to_action[act_to_do], ix_to_action[valid_action]))

parser_state.shift()

print(parser_state)
act_to_do = consts.Actions.ARC_L
valid_action = parser_state._validate_action(act_to_do)
print("Chosen action: %s, Valid action: %s\n" % (ix_to_action[act_to_do], ix_to_action[valid_action]))

parser_state.shift()
parser_state.shift()

print(parser_state)
act_to_do = consts.Actions.SHIFT
valid_action = parser_state._validate_action(act_to_do)
print("Chosen action: %s, Valid action: %s\n" % (ix_to_action[act_to_do], ix_to_action[valid_action]))

Stack: ['<ROOT>']
Input Buffer: ['The', 'man', 'ran', 'away', '<END-OF-INPUT>']

Chosen action: ARC_L, Valid action: SHIFT

Stack: ['<ROOT>', 'The']
Input Buffer: ['man', 'ran', 'away', '<END-OF-INPUT>']

Chosen action: ARC_L, Valid action: ARC_L

Stack: ['<ROOT>', 'The', 'man', 'ran']
Input Buffer: ['away', '<END-OF-INPUT>']

Chosen action: SHIFT, Valid action: ARC_R



# 2. Neural Network for Action Decisions
This part uses PyTorch to build a neural network that examines the current state of the parse and decides whether to shift, arc left, or arc right.

In [12]:
words_to_ix = { 'hello': 0, 'world': 1}
embeds = nn.Embedding(2, 5)
# lookup_tensor is our lookup IDX for the first word, we use it to index into our embedding
lookup_tensor = torch.tensor([words_to_ix["hello"]], dtype=torch.long)
print(embeds(lookup_tensor))

tensor([[-0.3458, -1.3527, -0.8332,  0.6330,  0.5851]],
       grad_fn=<EmbeddingBackward>)


### Word Embedding Lookup
**Implemented:** `VanillaWordEmbedding` in `neural_net.py`

**Test:** `test_parser.py:test_word_embed_lookup_d2_1`
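A lookup-table embedder of this kind can be sketched in a few lines of PyTorch. `ToyWordEmbedding` is a hypothetical class written for illustration, not the project's `VanillaWordEmbedding`; it assumes every word of the sentence is in `word_to_ix` and returns one `[1 x embedding_dim]` tensor per word.

```python
import torch
import torch.nn as nn


class ToyWordEmbedding(nn.Module):
    """Hypothetical lookup-table embedder; not the project's VanillaWordEmbedding."""

    def __init__(self, word_to_ix, embedding_dim):
        super().__init__()
        self.word_to_ix = word_to_ix
        self.embeddings = nn.Embedding(len(word_to_ix), embedding_dim)

    def forward(self, sentence):
        # One [1 x embedding_dim] tensor per word, returned as a Python list
        embeds = []
        for word in sentence:
            ix = torch.tensor([self.word_to_ix[word]], dtype=torch.long)
            embeds.append(self.embeddings(ix))
        return embeds


torch.manual_seed(1)
word_to_ix = {"natural": 0, "language": 1, "processing": 2}
embedder = ToyWordEmbedding(word_to_ix, 4)
embeds = embedder("natural language processing".split())
print(len(embeds), tuple(embeds[0].shape))
```

Returning a list rather than a single matrix makes it easy to pair each embedding with its word and position when the input buffer is initialized.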

In [221]:
! nosetests tests/test_parser.py:test_word_embed_lookup_d2_1

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


In [13]:
# EMBEDDING DIM IS 4
reload(neural_net)
torch.manual_seed(1)

test_sentence = "natural language processing".split()
test_word_to_ix = { "natural": 0, "language": 1, "processing": 2 }

word_embedder = neural_net.VanillaWordEmbedding(test_word_to_ix, TEST_EMBEDDING_DIM)
embeds = word_embedder(test_sentence)
print(type(embeds))
print(len(embeds), "\n")
print("Embedding for 'natural':\n {}".format(embeds[0]))

<class 'list'>
3 

Embedding for 'natural':
 tensor([[0.6614, 0.2669, 0.0617, 0.6213]], grad_fn=<EmbeddingBackward>)


In [13]:
reload(neural_net)
torch.manual_seed(1)

test_sentence = "natural language processing".split()
test_word_to_ix = { "natural": 0, "language": 1, "processing": 2 }

word_embedder = neural_net.VanillaWordEmbedding(test_word_to_ix, TEST_EMBEDDING_DIM)
embeds = word_embedder(test_sentence)
print(type(embeds))
print(len(embeds), "\n")
print("Embedding for 'natural':\n {}".format(embeds[0]))

<class 'list'>
3 

Embedding for 'natural':
 tensor([[0.6614, 0.2669, 0.0617, 0.6213]], grad_fn=<ViewBackward>)


### Feature Extraction
**Implemented:** `SimpleFeatureExtractor` class in `feat_extractors.py` to give the following 3 features as a list **in this order**:
* The embedding of the top of the stack
* The embedding of the first token in the input buffer
* The embedding of the next token in the input buffer (one-token lookahead)

**Test:** `test_parser.py:test_feature_extraction_d2_2`
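The feature order can be illustrated on plain lists, with strings standing in for embeddings. The `get_features` function below is a sketch, not the `SimpleFeatureExtractor` code, and the `missing` placeholder for absent slots is an assumption about how edge cases might be handled.

```python
def get_features(stack, buffer, missing=None):
    """Return [top of stack, first buffer item, second buffer item].

    Assumption: `missing` is returned for any slot that does not exist
    (empty stack or a buffer shorter than two items).
    """
    top_of_stack = stack[-1] if stack else missing
    first = buffer[0] if len(buffer) > 0 else missing
    lookahead = buffer[1] if len(buffer) > 1 else missing
    return [top_of_stack, first, lookahead]


# State after one SHIFT on "The Sound and the Fury"
stack = ["<ROOT>", "The"]
buffer = ["Sound", "and", "the", "Fury"]
feats = get_features(stack, buffer)
print(feats)

# Empty stack: the first slot falls back to the placeholder
empty_stack_feats = get_features([], ["<ROOT>", "<END-OF-INPUT>"])
print(empty_stack_feats)
```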

In [220]:
! nosetests tests/test_parser.py:test_feature_extraction_d2_2

.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK


In [14]:
reload(feat_extractors)
torch.manual_seed(1)

test_sentence = "The Sound and the Fury".split()
test_word_to_ix = { word: i for i, word in enumerate(sorted(set(test_sentence))) }

embedder = neural_net.VanillaWordEmbedding(test_word_to_ix, TEST_EMBEDDING_DIM)
embeds = embedder(test_sentence)

state = parsing.ParserState(test_sentence, embeds, utils.DummyCombiner())

state.shift()
feat_extractor = feat_extractors.SimpleFeatureExtractor()
feats = feat_extractor.get_features(state)

print("Embedding for 'The':\n {}".format(feats[0]))
print("Embedding for 'Sound':\n {}".format(feats[1]))
print("Embedding for 'and' (from buffer lookahead):\n {}".format(feats[2]))

Embedding for 'The':
 tensor([[ 0.4391,  1.1712,  1.7674, -0.0954]], grad_fn=<EmbeddingBackward>)
Embedding for 'Sound':
 tensor([[ 0.8657,  0.2444, -0.6629,  0.8073]], grad_fn=<EmbeddingBackward>)
Embedding for 'and' (from buffer lookahead):
 tensor([[ 0.0612, -0.6177, -0.7981, -0.1316]], grad_fn=<EmbeddingBackward>)


### Feedforward Network for Choosing Actions
**Implemented:** `neural_net.FFActionChooser`

Takes the list of embeddings from the feature extractor and concatenates them into one long row vector (size [1 x (number of features * embedding dimension)]). This vector is run through a feedforward network, which outputs log probabilities over actions (size [1 x number of actions]).

**Test:** `test_parser.py:test_action_chooser_d2_3`
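A minimal sketch of such a network follows. `ToyActionChooser` is a hypothetical class, and the single hidden layer, its size, and the ReLU nonlinearity are assumptions; the source only states that the features are concatenated and that the output is log probabilities over the three actions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 3  # SHIFT, ARC_L, ARC_R


class ToyActionChooser(nn.Module):
    """Hypothetical two-layer network; layer sizes and the ReLU are assumptions."""

    def __init__(self, input_dim, hidden_dim=16):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, NUM_ACTIONS)

    def forward(self, feats):
        # feats: list of [1 x embedding_dim] tensors -> [1 x (num_feats * embedding_dim)]
        x = torch.cat(feats, dim=1)
        return F.log_softmax(self.output(F.relu(self.hidden(x))), dim=1)


torch.manual_seed(1)
embedding_dim, num_features = 4, 3
chooser = ToyActionChooser(embedding_dim * num_features)
feats = [torch.randn(1, embedding_dim) for _ in range(num_features)]
log_probs = chooser(feats)
prob_sum = log_probs.exp().sum().item()
print(tuple(log_probs.shape), prob_sum)
```

Exponentiating the output and summing should give approximately 1, which is a quick sanity check that the last layer really is a log-softmax over actions.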

In [219]:
! nosetests tests/test_parser.py:test_action_chooser_d2_3

.
----------------------------------------------------------------------
Ran 1 test in 0.080s

OK


In [101]:
# TEST VERSION
reload(neural_net)
torch.manual_seed(1)
act_chooser = neural_net.FFActionChooser(TEST_EMBEDDING_DIM * NUM_FEATURES)
feats = [ ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) for _ in range(NUM_FEATURES) ] # make some dummy feature embeddings
log_probs = act_chooser(feats)
print(log_probs)

torch.Size([12])
tensor([[-1.2443, -0.8323, -1.2844]], grad_fn=<LogSoftmaxBackward>)


In [16]:
reload(neural_net)
torch.manual_seed(1)
act_chooser = neural_net.FFActionChooser(TEST_EMBEDDING_DIM * NUM_FEATURES)
feats = [ ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) for _ in range(NUM_FEATURES) ] # make some dummy feature embeddings
log_probs = act_chooser(feats)
print(log_probs)

tensor([[-1.2443, -0.8323, -1.2844]], grad_fn=<LogSoftmaxBackward>)


### Network for Combining Stack Items
**Implemented:** `neural_net.FFCombiner`

This component takes two embeddings, the head and the modifier, during an arc- operation and outputs a combined embedding (of size [1 x embedding_dim]), which is then pushed back onto the input buffer during parsing.

**Test:** `test_parser.py:test_combiner_d2_4`
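One way such a combiner could look is sketched below. `ToyCombiner` is a hypothetical class; the two linear layers and the tanh nonlinearity are assumptions, since the source only fixes the input (two `[1 x embedding_dim]` tensors) and the output size.

```python
import torch
import torch.nn as nn


class ToyCombiner(nn.Module):
    """Hypothetical feed-forward combiner; architecture details are assumptions."""

    def __init__(self, embedding_dim):
        super().__init__()
        self.hidden = nn.Linear(2 * embedding_dim, embedding_dim)
        self.output = nn.Linear(embedding_dim, embedding_dim)

    def forward(self, head_embed, modifier_embed):
        # Concatenate head and modifier: [1 x 2*embedding_dim]
        x = torch.cat([head_embed, modifier_embed], dim=1)
        # Project back down so the result can replace a single buffer entry
        return self.output(torch.tanh(self.hidden(x)))


torch.manual_seed(1)
embedding_dim = 4
combiner = ToyCombiner(embedding_dim)
head = torch.randn(1, embedding_dim)
modifier = torch.randn(1, embedding_dim)
combined = combiner(head, modifier)
print(tuple(combined.shape))
```

Because the output has the same size as a word embedding, combined items and plain words can be treated uniformly by the feature extractor.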

In [218]:
! nosetests tests/test_parser.py:test_combiner_d2_4

.
----------------------------------------------------------------------
Ran 1 test in 0.095s

OK


In [17]:
reload(neural_net)
torch.manual_seed(1)
combiner = neural_net.FFCombiner(TEST_EMBEDDING_DIM)

# Again, make dummy inputs
head_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))
modifier_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))
combined = combiner(head_feat, modifier_feat)
print(combined)

tensor([[ 0.4285, -0.1363,  0.4046,  0.6006]], grad_fn=<ViewBackward>)


# 3. Return of the Parser

In [18]:
from collections import deque
from mynlplib.constants import Actions

def test(actions=None):
    # Gold actions, when given, are queued as action indices and consumed left to right
    have_gold_actions = actions is not None
    if have_gold_actions:
        action_queue = deque(Actions.action_to_ix[a] for a in actions)
        # Only pop when a queue exists; popping unconditionally fails when actions is None
        print(action_queue.popleft())
    
test(["SHIFT", "SHIFT", "ARC_L", "ARC_R"])

0


### Parser Training Code

**Implemented:** `forward()` function in `mynlplib.parsing.TransitionParser`.

Parsing logic is roughly as follows:
* Loop until parsing state is in its terminating state
* Get the features from the parsing state 
* Send them through your action chooser network to get log probabilities over actions
* If we have `gold_actions`, do them. Otherwise (when predicting), take the argmax of log probabilities, validate the action, and do that

**Tests:**
- `test_parser.py:test_parse_logic_d3_1`
- `test_parser.py:test_predict_after_train_d3_1`
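The loop described above can be sketched without any neural components. Everything here is illustrative: `parse` and `choose_scores` are hypothetical names, entries are `(word, position)` pairs with no embeddings, the combined item is assumed to be the head itself, and the terminating condition and validation fallback follow the rules described in Part 1 rather than the actual `TransitionParser.forward()` code.

```python
from collections import deque

SHIFT, ARC_L, ARC_R = 0, 1, 2
ACTION_TO_IX = {"SHIFT": SHIFT, "ARC_L": ARC_L, "ARC_R": ARC_R}
ROOT, END = ("<ROOT>", -1), "<END-OF-INPUT>"


def parse(sentence, choose_scores, gold_actions=None):
    """Toy version of the loop: entries are (word, position) pairs, no embeddings."""
    stack = [ROOT]
    buffer = [(w, i) for i, w in enumerate(sentence + [END])]
    gold = deque(ACTION_TO_IX[a] for a in gold_actions) if gold_actions else None
    edges, actions_done, outputs = set(), [], []

    # Assumed terminating condition: one item (ROOT) on the stack, one (END) in the buffer
    while not (len(stack) == 1 and len(buffer) == 1):
        scores = choose_scores(stack, buffer)  # stands in for log probabilities
        outputs.append(scores)
        if gold is not None:
            action = gold.popleft()
        else:
            action = max(range(len(scores)), key=scores.__getitem__)  # argmax
            # Fall back to a legal action, following the validation rules
            can_shift = len(buffer) > 2 or not stack
            if action == SHIFT and not can_shift:
                action = ARC_R
            elif action != SHIFT and not stack:
                action = SHIFT
            elif action == ARC_L and stack[-1] == ROOT:
                action = SHIFT if can_shift else ARC_R

        if action == SHIFT:
            stack.append(buffer.pop(0))
        else:
            top, front = stack.pop(), buffer.pop(0)
            head, modifier = (front, top) if action == ARC_L else (top, front)
            edges.add((head, modifier))
            buffer.insert(0, head)  # assumption: the combined item keeps the head
        actions_done.append(action)
    return outputs, edges, actions_done


def uniform(stack, buffer):
    """Dummy scorer that prefers no action; argmax then picks index 0 (SHIFT)."""
    return [0.0, 0.0, 0.0]


gold_actions = ["SHIFT", "ARC_L", "SHIFT", "ARC_L", "SHIFT", "ARC_R", "ARC_R", "SHIFT"]
_, edges, actions_done = parse("The man ran away".split(), uniform, gold_actions)
print(sorted(edges))
print(actions_done)

# Prediction mode: no gold actions, so every argmax is validated before use
_, pred_edges, pred_actions = parse("The man ran away".split(), uniform)
print(len(pred_edges), pred_actions[-1])
```

With gold actions the chosen scores are ignored for control flow but still collected, which is what a training loop needs in order to compute a loss against the gold sequence.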

In [234]:
! nosetests tests/test_parser.py:test_parse_logic_d3_1

.
----------------------------------------------------------------------
Ran 1 test in 0.097s

OK


In [239]:
! nosetests tests/test_parser.py:test_predict_after_train_d3_1

.
----------------------------------------------------------------------
Ran 1 test in 0.771s

OK


In [19]:
test_sentence = "The man ran away".split()
test_word_to_ix = { word: i for i, word in enumerate(sorted(set(test_sentence))) }
test_word_to_ix[consts.END_OF_INPUT_TOK] = len(test_word_to_ix)
test_sentence_vocab = set(test_sentence)
gold_actions = ["SHIFT", "ARC_L", "SHIFT", "ARC_L", "SHIFT", "ARC_R", "ARC_R", "SHIFT"]
print(test_word_to_ix['<END-OF-INPUT>'])

4


In [233]:
# TEST VERSION
reload(parsing)
torch.manual_seed(1)
feat_extractor = feat_extractors.SimpleFeatureExtractor()

word_embedding_lookup = neural_net.VanillaWordEmbedding(test_word_to_ix, STACK_EMBEDDING_DIM)

action_chooser = neural_net.FFActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES)

combiner_network = neural_net.FFCombiner(STACK_EMBEDDING_DIM)

parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,
                                     action_chooser, combiner_network)
output, depgraph, actions_done = parser(test_sentence, gold_actions)
print(depgraph)
print(actions_done)

{DepGraphEdge(head=('ran', 2), modifier=('man', 1)), DepGraphEdge(head=('<ROOT>', -1), modifier=('ran', 2)), DepGraphEdge(head=('man', 1), modifier=('The', 0)), DepGraphEdge(head=('ran', 2), modifier=('away', 3))}
[0, 1, 0, 1, 0, 2, 2, 0]


In [20]:
reload(parsing)
torch.manual_seed(1)
feat_extractor = feat_extractors.SimpleFeatureExtractor()
word_embedding_lookup = neural_net.VanillaWordEmbedding(test_word_to_ix, STACK_EMBEDDING_DIM)
action_chooser = neural_net.FFActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES)
combiner_network = neural_net.FFCombiner(STACK_EMBEDDING_DIM)
parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,
                                     action_chooser, combiner_network)
output, depgraph, actions_done = parser(test_sentence, gold_actions)
print(depgraph)
print(actions_done)

{DepGraphEdge(head=('man', 1), modifier=('The', 0)), DepGraphEdge(head=('<ROOT>', -1), modifier=('ran', 2)), DepGraphEdge(head=('ran', 2), modifier=('away', 3)), DepGraphEdge(head=('ran', 2), modifier=('man', 1))}
[0, 1, 0, 1, 0, 2, 2, 0]


### Training the Parser!

In [52]:
def train_parser(parser, optimizer, dataset, n_epochs=1, n_train_insts=1000):
    for epoch in range(n_epochs):
        print("Epoch {}".format(epoch+1))

        parser.train() # turn on dropout layers if they are there
        parsing.train(dataset.training_data[:n_train_insts], parser, optimizer, verbose=True)

        print("Dev Evaluation")
        parser.eval() # turn them off for evaluation
        parsing.evaluate(dataset.dev_data, parser, verbose=True)
        print("F-Score: {}".format(evaluation.compute_metric(parser, dataset.dev_data, evaluation.fscore)))
        print("Attachment Score: {}".format(evaluation.compute_attachment(parser, dataset.dev_data)))
        print("\n")

In [21]:
reload(parsing)
torch.manual_seed(1)
feat_extractor = feat_extractors.SimpleFeatureExtractor()
word_embedding_lookup = neural_net.VanillaWordEmbedding(word_to_ix_en, STACK_EMBEDDING_DIM)
action_chooser = neural_net.FFActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES)
combiner_network = neural_net.FFCombiner(STACK_EMBEDDING_DIM)
parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,
                                     action_chooser, combiner_network)
optimizer = optim.SGD(parser.parameters(), lr=ETA_0)

In [22]:
%%timeit
torch.manual_seed(1)
parsing.train(en_dataset.training_data[:100], parser, optimizer, verbose=True)

Number of instances: 100    Number of network actions: 4836
Acc: 0.7096774193548387  Loss: 32.758761103153226
Number of instances: 100    Number of network actions: 4836
Acc: 0.8440860215053764  Loss: 18.042054556012154
Number of instances: 100    Number of network actions: 4836
Acc: 0.9077750206782464  Loss: 11.479276912212372
Number of instances: 100    Number of network actions: 4836
Acc: 0.9421009098428453  Loss: 6.985745858773589
Number of instances: 100    Number of network actions: 4836
Acc: 0.9574028122415219  Loss: 5.298160364925861
Number of instances: 100    Number of network actions: 4836
Acc: 0.9716708023159636  Loss: 3.816723921522498
Number of instances: 100    Number of network actions: 4836
Acc: 0.9834574028122415  Loss: 2.5622586871363455
Number of instances: 100    Number of network actions: 4836
Acc: 0.9768403639371381  Loss: 3.3731359971829806
10.3 s ± 298 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
%%timeit
torch.manual_seed(1)
parsing.train(en_dataset.training_data[:100], parser, optimizer, verbose=True)

Number of instances: 100    Number of network actions: 4836
Acc: 0.7084367245657568  Loss: 32.68232423067093
Number of instances: 100    Number of network actions: 4836
Acc: 0.8442928039702233  Loss: 18.603589257597925
Number of instances: 100    Number of network actions: 4836
Acc: 0.9069478908188585  Loss: 11.905540626049042
Number of instances: 100    Number of network actions: 4836
Acc: 0.9410669975186104  Loss: 7.805608856528997
Number of instances: 100    Number of network actions: 4836
Acc: 0.9526468155500414  Loss: 6.317962162379408
Number of instances: 100    Number of network actions: 4836
Acc: 0.9627791563275434  Loss: 4.578675594367087
Number of instances: 100    Number of network actions: 4836
Acc: 0.9820099255583127  Loss: 2.974977000351646
Number of instances: 100    Number of network actions: 4836
Acc: 0.9828370554177006  Loss: 2.4389425828808453
2.61 s ± 21.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [238]:
# train the parser for a while here.
# Shouldn't take *too* long, even on a laptop
torch.manual_seed(1)
train_parser(parser, optimizer, en_dataset, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 44560
Acc: 0.8273339317773788  Loss: 20.328739062763635
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8243089738735327  Loss: 15.471425432964654
F-Score: 0.48036655636963194
Attachment Score: 0.45550927678909503




In [22]:
# train the parser for a while here.
# Shouldn't take *too* long, even on a laptop
torch.manual_seed(1)
train_parser(parser, optimizer, en_dataset, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 44560
Acc: 0.8217684021543986  Loss: 21.130796878618188
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8315032184778492  Loss: 14.812238732707907
F-Score: 0.48959454240639877
Attachment Score: 0.47444149943203334




### Dev Data Predictions
**Test**: `tests/test_parser.py:test_dev_d3_2_english`

In [23]:
! nosetests tests/test_parser.py:test_dev_d3_2_english

.
----------------------------------------------------------------------
Ran 1 test in 0.012s

OK


In [29]:
dev_sentences = [ sentence for sentence, _ in en_dataset.dev_data ]
evaluation.output_preds(consts.EN_D3_2_DEV_FILENAME, parser, dev_sentences)

In [30]:
evaluation.output_preds(consts.EN_D3_2_TEST_FILENAME, parser, en_dataset.test_data)

### Dependency parsing in Norwegian

**Test**: `tests/test_parser.py:test_dev_d3_3_norwegian`

In [31]:
! nosetests tests/test_parser.py:test_dev_d3_3_norwegian

.
----------------------------------------------------------------------
Ran 1 test in 0.008s

OK


In [32]:
reload(parsing)
torch.manual_seed(1)
feat_extractor_nr = feat_extractors.SimpleFeatureExtractor()
word_embedding_lookup_nr = neural_net.VanillaWordEmbedding(word_to_ix_nr, STACK_EMBEDDING_DIM)
action_chooser_nr = neural_net.FFActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES)
combiner_network_nr = neural_net.FFCombiner(STACK_EMBEDDING_DIM)
parser_nr = parsing.TransitionParser(feat_extractor_nr, word_embedding_lookup_nr,
                                     action_chooser_nr, combiner_network_nr)
optimizer_nr = optim.SGD(parser_nr.parameters(), lr=ETA_0)

In [34]:
reload(evaluation)
dev_sentences_nr = [ sentence for sentence, _ in nr_dataset.dev_data ]
evaluation.output_preds(consts.NR_D3_3_DEV_FILENAME, parser_nr, dev_sentences_nr)

In [35]:
evaluation.output_preds(consts.NR_D3_3_TEST_FILENAME, parser_nr, nr_dataset.test_data)

# 4. Evaluation and Training Improvements

### BiLSTM Word Embeddings 
**Implemented:** `BiLSTMWordEmbedding` in `neural_net.py`

This class implements a sequence model over the sentence: the embedding of the t-th word is the BiLSTM hidden state at timestep t.
As a result, the embeddings on the stack no longer encode only the semantics of a single word. They carry information from every part of the sentence, and the LSTM can, in principle, learn which of that information is relevant.
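
A minimal sketch of this idea, assuming one sentence at a time and one `(1, dim)` tensor per word as output. The class name `SketchBiLSTMWordEmbedding` and its internals are illustrative assumptions, not the `neural_net.py` implementation.

```python
import torch
import torch.nn as nn


class SketchBiLSTMWordEmbedding(nn.Module):
    """Hypothetical stand-in for BiLSTMWordEmbedding (not the tested class)."""

    def __init__(self, word_to_ix, word_embedding_dim, hidden_dim,
                 num_layers=1, dropout=0.0):
        super().__init__()
        assert hidden_dim % 2 == 0, "hidden_dim is split across the two directions"
        self.word_to_ix = word_to_ix
        self.word_embeddings = nn.Embedding(len(word_to_ix), word_embedding_dim)
        # Each direction gets hidden_dim // 2 units, so the concatenated
        # forward/backward output is hidden_dim wide.
        self.lstm = nn.LSTM(word_embedding_dim, hidden_dim // 2,
                            num_layers=num_layers, bidirectional=True,
                            dropout=dropout)

    def forward(self, sentence):
        ixs = torch.tensor([self.word_to_ix[w] for w in sentence], dtype=torch.long)
        embeds = self.word_embeddings(ixs).unsqueeze(1)  # (seq_len, batch=1, word_dim)
        outputs, _ = self.lstm(embeds)                   # (seq_len, 1, hidden_dim)
        # One (1, hidden_dim) tensor per word, each informed by the whole sentence.
        return [outputs[t] for t in range(len(sentence))]


torch.manual_seed(1)
embedder = SketchBiLSTMWordEmbedding({"Noam": 0, "Chomsky": 1},
                                     word_embedding_dim=8, hidden_dim=6)
embeds = embedder("Noam Chomsky".split())
print(len(embeds), embeds[0].size())
```

Splitting `hidden_dim` evenly between the two directions is one design choice that keeps the output width equal to the stack embedding width; other splits would work if the downstream components were sized to match.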

**Test**: `tests/test_parser.py:test_bilstm_word_embeds_d4_1`

In [37]:
! nosetests tests/test_parser.py:test_bilstm_word_embeds_d4_1

.
----------------------------------------------------------------------
Ran 1 test in 0.137s

OK


In [21]:
print(STACK_EMBEDDING_DIM)

100


In [38]:
reload(neural_net)
torch.manual_seed(1)
test_sentence = "Noam Chomsky".split()
test_word_to_ix = { "Noam": 0, "Chomsky": 1 }

lstm_word_embedder = neural_net.BiLSTMWordEmbedding(test_word_to_ix,
                                                    WORD_EMBEDDING_DIM,
                                                    STACK_EMBEDDING_DIM,
                                                    num_layers=LSTM_NUM_LAYERS,
                                                    dropout=DROPOUT)
    
lstm_embeds = lstm_word_embedder(test_sentence)
print(type(lstm_embeds))
print(len(lstm_embeds), "\n")
print(lstm_embeds[0].size())
print("Embedding for Noam:\n {}".format(lstm_embeds[0]))

<class 'list'>
2 

torch.Size([1, 100])
Embedding for Noam:
 tensor([[-6.3007e-02,  2.4330e-01, -7.0760e-02, -1.1852e-01,  1.8881e-01,
         -1.6543e-01, -2.1600e-02,  1.4040e-02, -6.8058e-02, -1.8666e-01,
          1.0207e-01,  2.2894e-02, -5.8540e-02, -6.3337e-02, -2.9607e-01,
         -2.0053e-02, -1.8389e-01, -9.1271e-02, -5.1386e-02, -3.4879e-01,
         -3.8826e-02,  8.8795e-02, -3.8836e-02,  1.2170e-02,  4.6013e-02,
         -1.3923e-01,  1.9091e-02,  7.1751e-02,  9.5653e-02, -3.5629e-01,
          1.9788e-01,  2.9786e-02,  6.1633e-02,  4.7286e-02, -2.9223e-01,
         -7.4602e-02,  2.4812e-01, -1.3309e-01,  4.2635e-02,  4.2023e-02,
          3.1180e-02,  5.5482e-03, -1.1297e-01,  1.4214e-02, -1.0769e-01,
         -1.4725e-01, -7.3080e-02,  2.1588e-02,  1.7645e-01,  4.3659e-02,
         -2.4069e-04,  1.1204e-02, -2.2866e-01,  1.1086e-01, -3.3928e-02,
         -1.3846e-01, -8.5202e-03,  8.6117e-02,  9.5097e-02, -1.2923e-01,
         -2.7905e-03, -6.9797e-02,  1.6902e-01, -1.

In [29]:
reload(neural_net)
torch.manual_seed(1)
test_sentence = "Noam Chomsky".split()
test_word_to_ix = { "Noam": 0, "Chomsky": 1 }

lstm_word_embedder = neural_net.BiLSTMWordEmbedding(test_word_to_ix,
                                                    WORD_EMBEDDING_DIM,
                                                    STACK_EMBEDDING_DIM,
                                                    num_layers=LSTM_NUM_LAYERS,
                                                    dropout=DROPOUT)
    
lstm_embeds = lstm_word_embedder(test_sentence)
print(type(lstm_embeds))
print(len(lstm_embeds), "\n")
print("Embedding for Noam:\n {}".format(lstm_embeds[0]))

<class 'list'>
2 

Embedding for Noam:
 tensor([[-6.3007e-02,  2.4330e-01, -7.0760e-02, -1.1852e-01,  1.8881e-01,
         -1.6543e-01, -2.1600e-02,  1.4040e-02, -6.8058e-02, -1.8666e-01,
          1.0207e-01,  2.2894e-02, -5.8540e-02, -6.3337e-02, -2.9607e-01,
         -2.0053e-02, -1.8389e-01, -9.1271e-02, -5.1386e-02, -3.4879e-01,
         -3.8826e-02,  8.8795e-02, -3.8836e-02,  1.2170e-02,  4.6013e-02,
         -1.3923e-01,  1.9091e-02,  7.1751e-02,  9.5653e-02, -3.5629e-01,
          1.9788e-01,  2.9786e-02,  6.1633e-02,  4.7286e-02, -2.9223e-01,
         -7.4602e-02,  2.4812e-01, -1.3309e-01,  4.2635e-02,  4.2023e-02,
          3.1180e-02,  5.5482e-03, -1.1297e-01,  1.4215e-02, -1.0769e-01,
         -1.4725e-01, -7.3080e-02,  2.1588e-02,  1.7645e-01,  4.3659e-02,
         -2.4070e-04,  1.1204e-02, -2.2866e-01,  1.1086e-01, -3.3928e-02,
         -1.3846e-01, -8.5202e-03,  8.6117e-02,  9.5097e-02, -1.2923e-01,
         -2.7905e-03, -6.9797e-02,  1.6902e-01, -1.0969e-01, -1.3452e-01

### Suffix Embeddings 
We can also include morphological information more explicitly by embedding a word's suffix in addition to the word itself. We approximate the "suffix" as the last two characters of the word.

**Implemented:** `build_suff_to_ix` in `utils.py`. 

It takes a `word_to_ix` lookup and returns a `suff_to_ix` lookup.
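
A rough sketch of such a lookup builder. The function name `sketch_build_suff_to_ix`, the `suffix_len` parameter, and the first-seen ordering of indices are assumptions made for illustration; the version in `utils.py` may assign indices differently.

```python
def sketch_build_suff_to_ix(word_to_ix, suffix_len=2):
    """Map each distinct word-final substring to a unique index (illustrative only)."""
    suff_to_ix = {}
    for word in word_to_ix:
        # Words shorter than suffix_len are used whole as their own suffix.
        suffix = word[-suffix_len:]
        if suffix not in suff_to_ix:
            suff_to_ix[suffix] = len(suff_to_ix)
    return suff_to_ix


test_word_to_ix = {"prefix": 0, "fixsuf": 1, "fixinfix": 2}
print(sketch_build_suff_to_ix(test_word_to_ix))  # {'ix': 0, 'uf': 1}
```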

**Implemented:** `SuffixAndWordEmbedding` in `neural_net.py`.

This class embeds the words and suffixes in a sentence and then concatenates them to form one embedding. 
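
The concatenation can be sketched as follows. This assumes the output width is split evenly between the word half and the suffix half; the class name `SketchSuffixAndWordEmbedding` and the attribute names are hypothetical, not taken from `neural_net.py`.

```python
import torch
import torch.nn as nn


class SketchSuffixAndWordEmbedding(nn.Module):
    """Hypothetical stand-in: concatenates a word embedding and a suffix embedding."""

    def __init__(self, word_to_ix, suff_to_ix, embedding_dim, suffix_len=2):
        super().__init__()
        assert embedding_dim % 2 == 0, "assumed: half the width for each part"
        self.word_to_ix = word_to_ix
        self.suff_to_ix = suff_to_ix
        self.suffix_len = suffix_len
        half = embedding_dim // 2
        self.word_embeddings = nn.Embedding(len(word_to_ix), half)
        self.suff_embeddings = nn.Embedding(len(suff_to_ix), half)

    def forward(self, sentence):
        embeds = []
        for word in sentence:
            word_ix = torch.tensor([self.word_to_ix[word]], dtype=torch.long)
            suff_ix = torch.tensor([self.suff_to_ix[word[-self.suffix_len:]]],
                                   dtype=torch.long)
            # (1, half) and (1, half) are joined into one (1, embedding_dim) row.
            embeds.append(torch.cat([self.word_embeddings(word_ix),
                                     self.suff_embeddings(suff_ix)], dim=1))
        return embeds


torch.manual_seed(1)
word_to_ix = {"prefix": 0, "fixsuf": 1, "fixinfix": 2}
suff_to_ix = {"ix": 0, "uf": 1}
embedder = SketchSuffixAndWordEmbedding(word_to_ix, suff_to_ix, embedding_dim=4)
embs = embedder("prefix fixsuf fixinfix".split())
print(len(embs), embs[0].size())
# "prefix" and "fixinfix" share the suffix "ix", so their suffix halves coincide.
print(torch.equal(embs[0][:, 2:], embs[2][:, 2:]))
```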

**Test**: `tests/test_parser.py:test_suff_word_embeds_d4_2`

In [39]:
! nosetests tests/test_parser.py:test_suff_word_embeds_d4_2

.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK


In [41]:
reload(utils)
suff_to_ix_en = utils.build_suff_to_ix(word_to_ix_en)
suff_to_ix_nr = utils.build_suff_to_ix(word_to_ix_nr)

In [42]:
len(suff_to_ix_en), len(suff_to_ix_nr)

(1145, 849)

In [43]:
reload(neural_net)
torch.manual_seed(1)
test_sentence = "prefix fixsuf fixinfix".split()
test_word_to_ix = { "prefix": 0, "fixsuf": 1, "fixinfix": 2 }
test_suff_to_ix = utils.build_suff_to_ix(test_word_to_ix)

suff_word_embedder = neural_net.SuffixAndWordEmbedding(test_word_to_ix, test_suff_to_ix, TEST_EMBEDDING_DIM)
test_embs = suff_word_embedder(test_sentence)

In [23]:
test_embs[0]

tensor([[ 0.6614,  0.2669, -0.4519, -0.1661]], grad_fn=<CatBackward>)

In [33]:
test_embs[0]

tensor([[ 0.6614,  0.2669, -1.5228,  0.3817]], grad_fn=<ViewBackward>)

### Pretrained Embeddings 

**Implemented:** `initialize_with_pretrained` in `utils.py`.

It takes a word embedding lookup component and initializes its lookup table with the provided pretrained embeddings.
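
One way this initialization could look, shown on a bare `nn.Embedding`. The function `sketch_initialize_with_pretrained` and its three-argument signature are assumptions for illustration (the real `utils.initialize_with_pretrained` takes the embedding component itself), as is the choice to leave words without a pretrained vector at their random initialization.

```python
import torch
import torch.nn as nn


def sketch_initialize_with_pretrained(pretrained_embeds, word_to_ix, embedding):
    """Copy pretrained vectors into the rows of an nn.Embedding (illustrative only).

    pretrained_embeds: dict mapping word -> list of floats
    word_to_ix:        dict mapping word -> row index in the embedding table
    embedding:         nn.Embedding whose width matches the pretrained vectors
    """
    with torch.no_grad():  # in-place initialization should not be tracked by autograd
        for word, ix in word_to_ix.items():
            if word in pretrained_embeds:
                embedding.weight[ix] = torch.tensor(pretrained_embeds[word])
            # Words without a pretrained vector keep their random initialization.


torch.manual_seed(1)
word_to_ix = {"four": 0, "unseen": 1}
pretrained = {"four": [0.1, -0.2, 0.3]}  # toy vectors, not the provided file
embedding = nn.Embedding(len(word_to_ix), 3)
sketch_initialize_with_pretrained(pretrained, word_to_ix, embedding)
print(embedding(torch.tensor([0])))
```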

**Test**: `tests/test_parser.py:test_pretrained_embeddings_d4_3`

In [44]:
# Scratch check: stacking per-word vectors gives the same matrix that
# nn.Embedding.from_pretrained accepts as its weight.
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
vals = [torch.tensor([1, 2.3, 3]), torch.tensor([4, 5.1, 6.3])]
output = torch.stack(vals)
embedding = nn.Embedding.from_pretrained(weight)
# Index to look up (named `ix` to avoid shadowing the built-in `input`)
ix = torch.LongTensor([1])
# embedding(ix)
# print(weight.size())
print(output)

tensor([[1.0000, 2.3000, 3.0000],
        [4.0000, 5.1000, 6.3000]])


In [45]:
reload(utils)
import pickle
# Load the provided pretrained vectors; the context manager closes the file.
with open(consts.PRETRAINED_EMBEDS_FILE, 'rb') as f:
    pretrained_embeds = pickle.load(f)
torch.manual_seed(1)
print(len(word_to_ix_en))
embedder = neural_net.VanillaWordEmbedding(word_to_ix_en, 64)
utils.initialize_with_pretrained(pretrained_embeds, embedder)
print(embedder.forward(['four'])[0][0, :5])

18716
tensor([ 0.1243, -0.1147, -0.5684, -0.3970,  0.2294])


In [46]:
! nosetests tests/test_parser.py:test_pretrained_embeddings_d4_3

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


In [47]:
import pickle
with open(consts.PRETRAINED_EMBEDS_FILE, 'rb') as f:
    pretrained_embeds = pickle.load(f)
print(pretrained_embeds['four'][:5])

[0.12429751455783844, -0.11472601443529129, -0.5684014558792114, -0.396965891122818, 0.22938089072704315]


In [48]:
torch.manual_seed(1)
embedder = neural_net.VanillaWordEmbedding(word_to_ix_en,64)

In [49]:
embedder.forward(['four'])[0][0,:5] # For this cell, don't worry about mismatched output. Just make sure the test passes.

tensor([ 0.4008, -0.5257, -1.0608,  0.4633, -1.4320], grad_fn=<SliceBackward>)

In [50]:
reload(utils);
utils.initialize_with_pretrained(pretrained_embeds,embedder)
print(embedder.forward(['four'])[0][0,:5])

tensor([ 0.1243, -0.1147, -0.5684, -0.3970,  0.2294])


### Better Arc Component Combination
Previously, to combine two embeddings during an arc operation (arc-left or arc-right), we passed them through a feed-forward network and took its dense output. Now we instead use a sequence model over the stack: the combined embedding from an arc operation is the next time step of an LSTM.
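
A minimal sketch of a combiner along these lines. It assumes the head and modifier embeddings are concatenated into a single LSTM input and that the recurrent state is kept on the module between calls; `SketchLSTMCombiner`, `init_hidden`, and `clear_hidden_state` are hypothetical names, not necessarily those in `neural_net.py`.

```python
import torch
import torch.nn as nn


class SketchLSTMCombiner(nn.Module):
    """Hypothetical stand-in: each arc operation is one LSTM time step."""

    def __init__(self, embedding_dim, num_layers=1, dropout=0.0):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.num_layers = num_layers
        # Input is the concatenated (head, modifier) pair; the output has
        # the same width as a single stack embedding.
        self.lstm = nn.LSTM(2 * embedding_dim, embedding_dim,
                            num_layers=num_layers, dropout=dropout)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        zeros = torch.zeros(self.num_layers, 1, self.embedding_dim)
        return (zeros, zeros.clone())

    def clear_hidden_state(self):
        """Reset the recurrent state, e.g. before parsing a new sentence."""
        self.hidden = self.init_hidden()

    def forward(self, head_embed, modifier_embed):
        pair = torch.cat([head_embed, modifier_embed], dim=1).unsqueeze(0)  # (1, 1, 2 * dim)
        output, self.hidden = self.lstm(pair, self.hidden)
        return output  # (seq_len=1, batch=1, dim)


torch.manual_seed(1)
combiner = SketchLSTMCombiner(embedding_dim=4)
head, mod = torch.randn(1, 4), torch.randn(1, 4)
first = combiner(head, mod)
second = combiner(head, mod)  # same inputs, but the hidden state has advanced
print(first.size())
```

Because the state persists across calls, the result of an arc operation depends on the arcs built earlier in the same sentence, which is why the state needs resetting between sentences.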

**Implemented:** `neural_net.LSTMCombiner`.

**Test**: `tests/test_parser.py:test_lstm_combiner_d4_4`

In [51]:
! nosetests tests/test_parser.py:test_lstm_combiner_d4_4

.
----------------------------------------------------------------------
Ran 1 test in 0.089s

OK


In [139]:
print(TEST_EMBEDDING_DIM)

4


In [52]:
reload(neural_net)
torch.manual_seed(1)
combiner = neural_net.LSTMCombiner(TEST_EMBEDDING_DIM,
                                   num_layers=LSTM_NUM_LAYERS,
                                   dropout=DROPOUT)
head_feat = ag.Variable(torch.randn(1,TEST_EMBEDDING_DIM))
mod_feat = ag.Variable(torch.randn(1,TEST_EMBEDDING_DIM))

In [244]:
combined = combiner(head_feat, mod_feat)
combined

tensor([[[ 0.0532, -0.1534,  0.1484, -0.0595]]], grad_fn=<StackBackward>)

### Better action choosing 
Instead of choosing the action from the combiner output independently at each time step, let's use an LSTM to predict the action. This way, past actions can influence the current decision directly. 

**Implemented:** `neural_net.LSTMActionChooser`. 

We use a linear layer to predict the action from the LSTM hidden state.
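
A sketch of an action chooser with this shape: an LSTM whose hidden size equals its input size, followed by a linear layer and a log-softmax over three actions. The class name `SketchLSTMActionChooser`, the `num_actions` parameter, and the state-handling helpers are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchLSTMActionChooser(nn.Module):
    """Hypothetical stand-in: the LSTM state carries the history of past decisions."""

    def __init__(self, input_dim, num_layers=1, dropout=0.0, num_actions=3):
        super().__init__()
        self.input_dim = input_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, input_dim,
                            num_layers=num_layers, dropout=dropout)
        self.linear = nn.Linear(input_dim, num_actions)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        zeros = torch.zeros(self.num_layers, 1, self.input_dim)
        return (zeros, zeros.clone())

    def clear_hidden_state(self):
        """Reset the recurrent state, e.g. before parsing a new sentence."""
        self.hidden = self.init_hidden()

    def forward(self, feats):
        # feats: list of (1, dim) tensors, concatenated into one (1, 1, input_dim) step
        step = torch.cat(feats, dim=1).unsqueeze(0)
        output, self.hidden = self.lstm(step, self.hidden)
        # Linear layer on the LSTM output, then log-probabilities over the actions.
        return F.log_softmax(self.linear(output[0]), dim=1)  # (1, num_actions)


torch.manual_seed(1)
chooser = SketchLSTMActionChooser(input_dim=4 * 3)
feats = [torch.randn(1, 4) for _ in range(3)]
log_probs = chooser(feats)
print(log_probs.size(), log_probs.exp().sum().item())
```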

**Test**: `tests/test_parser.py:test_lstm_action_chooser_d4_5`

In [53]:
! nosetests tests/test_parser.py:test_lstm_action_chooser_d4_5

.
----------------------------------------------------------------------
Ran 1 test in 0.088s

OK


In [54]:
reload(neural_net)
torch.manual_seed(1)
action_chooser = neural_net.LSTMActionChooser(TEST_EMBEDDING_DIM * NUM_FEATURES,
                                              LSTM_NUM_LAYERS,
                                              dropout=DROPOUT)
feats = [ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) for _ in range(NUM_FEATURES)]

In [248]:
print(TEST_EMBEDDING_DIM)
print(TEST_EMBEDDING_DIM * NUM_FEATURES)

4
12


In [55]:
# TEST VERSION
output = action_chooser(feats)
output

tensor([[-1.0328, -1.1798, -1.0887]], grad_fn=<LogSoftmaxBackward>)

In [58]:
def train_parser(parser, optimizer, dataset, n_epochs=1, n_train_insts=1000):
    """Train `parser` for `n_epochs` on the first `n_train_insts` training
    instances, printing dev-set metrics after each epoch."""
    for epoch in range(n_epochs):
        print("Epoch {}".format(epoch+1))

        parser.train() # enable dropout layers, if there are any
        parsing.train(dataset.training_data[:n_train_insts], parser, optimizer, verbose=True)

        print("Dev Evaluation")
        parser.eval() # disable dropout for evaluation
        parsing.evaluate(dataset.dev_data, parser, verbose=True)
        print("F-Score: {}".format(evaluation.compute_metric(parser, dataset.dev_data, evaluation.fscore)))
        print("Attachment Score: {}".format(evaluation.compute_attachment(parser, dataset.dev_data)))
        print("\n")

### Retrain with the new components

In [56]:
print(STACK_EMBEDDING_DIM * NUM_FEATURES)
print(action_chooser)

300
LSTMActionChooser(
  (lstm): LSTM(12, 12)
  (linear): Linear(in_features=12, out_features=3, bias=True)
)


In [57]:
reload(utils)
reload(neural_net)
reload(parsing)
reload(feat_extractors)
torch.manual_seed(1)
stack_dim = STACK_EMBEDDING_DIM
feat_extractor = feat_extractors.SimpleFeatureExtractor()
word_embedding_lookup = neural_net.BiLSTMWordEmbedding(word_to_ix_en,
                                                       WORD_EMBEDDING_DIM,
                                                       STACK_EMBEDDING_DIM,
                                                       num_layers=LSTM_NUM_LAYERS,
                                                       dropout=DROPOUT)

utils.initialize_with_pretrained(pretrained_embeds, word_embedding_lookup)

action_chooser = neural_net.LSTMActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES,
                                              LSTM_NUM_LAYERS,
                                              dropout=DROPOUT)

combiner = neural_net.LSTMCombiner(STACK_EMBEDDING_DIM,
                                   num_layers=LSTM_NUM_LAYERS,
                                   dropout=DROPOUT)
parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,
                                  action_chooser, combiner)
optimizer = optim.SGD(parser.parameters(), lr=ETA_0)

In [59]:
# The LSTMs will make this take longer, probably just a few minutes
train_parser(parser, optimizer, en_dataset, n_epochs=2, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 44560
Acc: 0.7938285457809695  Loss: 21.09894013696909
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8525810930203206  Loss: 11.350117835232954
F-Score: 0.5578799618676566
Attachment Score: 0.5458790862047205


Epoch 2
Number of instances: 1000    Number of network actions: 44560
Acc: 0.8786131059245961  Loss: 13.061638945944607
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8737220749716017  Loss: 9.969901558881748
F-Score: 0.6049045930885987
Attachment Score: 0.5895494131010981




In [43]:
# The LSTMs will make this take longer, probably just a few minutes
train_parser(parser, optimizer, en_dataset, n_epochs=2, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 44560
Acc: 0.7991023339317774  Loss: 20.710318389445543
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8578821153603433  Loss: 10.88978472700019
F-Score: 0.5828719165056494
Attachment Score: 0.5544616938028525


Epoch 2
Number of instances: 1000    Number of network actions: 44560
Acc: 0.8856597845601436  Loss: 12.440973245767877
Dev Evaluation
Number of instances: 501    Number of network actions: 15846
Acc: 0.8786444528587656  Loss: 9.406325502964313
F-Score: 0.6367727343343508
Attachment Score: 0.6081029912911776




### Dev Predictions: English

**Test**: `tests/test_parser.py:test_dev_preds_d4_6_english`


In [65]:
! nosetests tests/test_parser.py:test_dev_preds_d4_6_english

.
----------------------------------------------------------------------
Ran 1 test in 0.009s

OK


In [63]:
dev_sentences = [ sentence for sentence, _ in en_dataset.dev_data ]
evaluation.output_preds(consts.EN_D4_6_DEV_FILENAME, parser, dev_sentences)

In [64]:
evaluation.output_preds(consts.EN_D4_6_TEST_FILENAME, parser, en_dataset.test_data)

### Dev Predictions: Norwegian

**Test**: `tests/test_parser.py:test_dev_preds_d4_7_norwegian`


In [71]:
! nosetests tests/test_parser.py:test_dev_preds_d4_7_norwegian

.
----------------------------------------------------------------------
Ran 1 test in 0.013s

OK


In [66]:
torch.manual_seed(1)
feat_extractor_nr = feat_extractors.SimpleFeatureExtractor()
word_embedding_lookup_nr = neural_net.BiLSTMWordEmbedding(word_to_ix_nr,
                                                          WORD_EMBEDDING_DIM,
                                                          STACK_EMBEDDING_DIM,
                                                          num_layers=LSTM_NUM_LAYERS,
                                                          dropout=DROPOUT)
action_chooser_nr = neural_net.FFActionChooser(STACK_EMBEDDING_DIM * NUM_FEATURES)
combiner_nr = neural_net.LSTMCombiner(STACK_EMBEDDING_DIM,
                                      num_layers=LSTM_NUM_LAYERS,
                                      dropout=DROPOUT)
parser_nr = parsing.TransitionParser(feat_extractor_nr, word_embedding_lookup_nr,
                                     action_chooser_nr, combiner_nr)
optimizer_nr = optim.SGD(parser_nr.parameters(), lr=ETA_0)

In [67]:
train_parser(parser_nr, optimizer_nr, nr_dataset, n_epochs=3, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 30942
Acc: 0.8118415099217892  Loss: 13.315743741195648
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8389693037184927  Loss: 11.810927859165458
F-Score: 0.5180403846769993
Attachment Score: 0.48802096331420014


Epoch 2
Number of instances: 1000    Number of network actions: 30942
Acc: 0.8804537521815009  Loss: 8.607644319234415
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8583104567007737  Loss: 10.717952607695883
F-Score: 0.5564319361993936
Attachment Score: 0.5351884202645371


Epoch 3
Number of instances: 1000    Number of network actions: 30942
Acc: 0.9157779070519035  Loss: 6.328315814725589
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8616795607686548  Loss: 11.288906418453037
F-Score: 0.5700246343932462
Attachment Score: 0.5486648365360619




In [68]:
train_parser(parser_nr, optimizer_nr, nr_dataset, n_epochs=3, n_train_insts=1000)

Epoch 1
Number of instances: 1000    Number of network actions: 30942
Acc: 0.9421821472432292  Loss: 4.55632748673798
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8653606189168954  Loss: 12.705599151956273
F-Score: 0.5685171182828339
Attachment Score: 0.5471674569503369


Epoch 2
Number of instances: 1000    Number of network actions: 30942
Acc: 0.9632861482774223  Loss: 3.1026266131536904
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8607436985275767  Loss: 15.276605466688448
F-Score: 0.5674847416047215
Attachment Score: 0.5444222610431745


Epoch 3
Number of instances: 1000    Number of network actions: 30942
Acc: 0.9737896709973499  Loss: 2.1643147558134364
Dev Evaluation
Number of instances: 501    Number of network actions: 16028
Acc: 0.8654854005490392  Loss: 17.10550305387595
F-Score: 0.5663808167899317
Attachment Score: 0.5521587222360869




In [69]:
dev_sentences_nr = [ sentence for sentence, _ in nr_dataset.dev_data ]
evaluation.output_preds(consts.NR_D4_7_DEV_FILENAME, parser_nr, dev_sentences_nr)

In [70]:
evaluation.output_preds(consts.NR_D4_7_TEST_FILENAME, parser_nr, nr_dataset.test_data)