** Assignment-3 Neural Transition-Based Dependency Parsing**

Everything should be done ON MY code, no new code.

1. Read https://aclanthology.org/D14-1082.pdf and maybe just write one paragraph summary in your README.md in your github

2. Do something called ablation study (meaning try to delete something so we know the impact of that deleted thing - very common in NLP)

        Recall that we have 18 word + 18 pos + 12 dep features

            Try to delete only the 12 dep features and check UAS
            Try to delete only the 18 pos features and check UAS
            Do another comparison study testing the embedding
3. Chaky uses some embedding Try to use (1) glove embedding (smallest), (2) nn.Embedding (train from scratch) and compare with Chaky's embedding

4. Do some testing, compare 2-3 sentences with spaCy and see whether our neural network gives the same dependency.
Criteria: 0: not done 1: ok 2: with comments/explanation like how Chaky does his tutorial

1. Reading Assignment:

1. Doing same thing what we did on class

In [60]:
# Import neccessary libraries
import sys
import numpy as np
import time
import os
import logging
from collections import Counter
from datetime import datetime
import math

from tqdm import tqdm  #gimmick for progressbar when you train
import pickle #saving and loading models

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import nn, optim

**1. Parsing function**

In [61]:

#basically, it takes the current state of the buffer, stack, dependencies
#tell us how SHIFT, LA, RA changes these three objects

class Parsing(object):
    
    #init stack, buffer, dep
    def __init__(self, sentence):  
        self.sentence = sentence     #['The', 'cat', 'sat]  #conll format which is already in the tokenized form
        self.stack    = ['ROOT']
        self.buffer   = sentence[:]  #in the beginning, everything is inside the buffer
        self.dep      = []           #maintains a list of tuples of dep
    
    #parse function that tells me how shift, la, ra changes these three objects
    def parse_step(self, transition):     #transition could be either S, LA, RA
        if transition == 'S':
            #get the top guy in the buffer and put in stack
            head = self.buffer.pop(0)
            self.stack.append(head)
        elif transition == 'LA':  #stack = [ROOT, He, has] ==> append to dep (has, he) and then He is gone from the stack [ROOT, has]
            dependent = self.stack.pop(-2)  #He
            self.dep.append((self.stack[-1], dependent))  #(has, he)
        elif transition == 'RA':
            #can you guys try to this???
            dependent = self.stack.pop()  #stack = [ROOT, has, control] ==> dep (has, control), control will be gone fromt he stack [ROOT, has]
            self.dep.append((self.stack[-1], dependent))
        else:
            print(f"Bad transition: {transition}")
    
    #given some series of transition, it gonna for-loop the parse function
    def parse(self, transitions):
        for t in transitions:
            self.parse_step(t)
        return self.dep
    
    #check whether things are finished - no need to do anymore functions....
    def is_completed(self):
        return (len(self.buffer) == 0) and (len(self.stack) == 1)  #so buffer is empty and ROOT is the only guy in stack

**Minibatch parsing**

In [62]:
def minibatch_parse(sentences, model, batch_size):
    dep = []  #all the resulting dep
    
    #init Parsing instance for each sentence in the batch
    partial_parses = [Parsing(sentence) for sentence in sentences]  #in tokenized form
    #Parsing(['The', 'cat', 'sat']), Parsing(['Chaky', 'is', 'mad'])
    
    unfinished_parses = partial_parses[:]
    
    #while we still have sentence
    while unfinished_parses:  #if there are still a Parsing object
    
        #take a certain batch of sentence
        minibatch = unfinished_parses[:batch_size] #number of Parsing object
        
        #create a dummy model to tell us what's the next transition for each sentence
        transitions = model.predict(minibatch) 
        #transitions = [S, S, .....]
        #minibatch   = [Parsing(sentence1), Parsing(sentence2)]
        
                
        # for transition predicted this dummy model
        for transition, partial_parse in zip(transitions, minibatch):
            #parse step
            #transition: S
            #partial_parse: Parsing(sentence)
            partial_parse.parse_step(transition)
            
        #remove any sentence is finish
        unfinished_parses[:] = [p for p in unfinished_parses if not p.is_completed()]
    
    dep = [parse.dep for parse in partial_parses]
    
    return dep

**2. Loading dataset**

In [63]:
def read_conll(filename):
    
    examples = []
    
    with open(filename) as f:
        i = 0
        word, pos, head, dep = [], [], [], []
        for line in f.readlines():
            i = i+1
            wa = line.strip().split('\t')  #['1', 'In', '_', 'ADP', 'IN', '_', '5', 'case', '_', '_']
            #In <--------  5th guy
            #     case
            
            if len(wa) == 10:  #if all the columns are there
                word.append(wa[1].lower())
                pos.append(wa[4])
                head.append(int(wa[6]))
                dep.append(wa[7])
            
            #the row is not exactly 10, it means new sentence
            elif len(word) > 0:  #if there is somethign inside the word
                examples.append({'word': word, 'pos': pos, 'head': head, 'dep': dep})  #in the sentence level
                word, pos, head, dep = [], [], [], [] #clear word, pos, head, dep
        
        if len(word) > 0:  #if there is somethign inside the word
            examples.append({'word': word, 'pos': pos, 'head': head, 'dep': dep})  #in the sentence level

    return examples  

In [64]:
def load_data():
    print("1. Loading data")
    train_set = read_conll("D:\\data\\train.conll")
    dev_set   = read_conll("D:\\data\\dev.conll")
    test_set   = read_conll("D:\\data\\test.conll")
    
    #make my dataset smaller because my mac cannot handle it
    train_set = train_set[:1000]
    dev_set   = dev_set[:500]
    test_set  = test_set[:500]
    
    return train_set, dev_set, test_set

Parser 

In [65]:
P_PREFIX = '<p>:' #indicating pos tags
D_PREFIX = '<d>:' #indicating dependency tags
UNK      = '<UNK>'
NULL     = '<NULL>'
ROOT     = '<ROOT>'

class Parser(object):

    def __init__(self, dataset):
        
        #set the root dep
        self.root_dep = 'root'
                
        #get all the dep of the dataset as list, e.g., ['root', 'acl', 'nmod', 'nmod:npmod']
        all_dep = [self.root_dep] + list(set([w for ex in dataset
                                               for w in ex['dep']
                                               if w != self.root_dep]))
        
        #1. put dep into tok2id lookup table, with D_PREFIX so we know it is dependency
        #{'D_PREFIX:root': 0, 'D_PREFIX:acl': 1, 'D_PREFIX:nmod': 2, ..., 'D_PREFIX:<NULL>': 30}
        tok2id = {D_PREFIX + l: i for (i, l) in enumerate(all_dep)}
        tok2id[D_PREFIX + NULL] = self.D_NULL = len(tok2id)
        
        #we are using "unlabeled" where we do not label with the dependency
        #thus the number of dependency relation is 1
        trans = ['L', 'R', 'S']
        self.n_deprel = 1 #dependency relationship #because we ar not predicting the realtions, we are only predicting S, L, R
        
        #create a simple lookup table mapping action and id
        #e.g., tran2id: {'L': 0, 'R': 1, 'S': 2}
        #e.g., id2tran: {0: 'L', 1: 'R', 2: 'S'}
        self.n_trans = len(trans)
        self.tran2id = {t: i for (i, t) in enumerate(trans)} #use for easy coding
        self.id2tran = {i: t for (i, t) in enumerate(trans)} 

        #2. put pos tags into tok2id lookup table, with P_PREFIX so we know it is pos
        tok2id.update(build_dict([P_PREFIX + w for ex in dataset for w in ex['pos']],
                                  offset=len(tok2id)))
        tok2id[P_PREFIX + UNK]  = self.P_UNK  = len(tok2id)  #also remember the pos tags of unknown
        tok2id[P_PREFIX + NULL] = self.P_NULL = len(tok2id)
        tok2id[P_PREFIX + ROOT] = self.P_ROOT = len(tok2id)
        
        #now tok2id:  {'P_PREFIX:root': 0, 'P_PREFIX:acl': 1, ..., 'P_PREFIX:JJR': 62, 'P_PREFIX:<UNK>': 63, 'P_PREFIX:<NULL>': 64, 'P_PREFIX:<ROOT>': 65}
        
        #3. put word into tok2id lookup table
        tok2id.update(build_dict([w for ex in dataset for w in ex['word']],
                                  offset=len(tok2id)))
        tok2id[UNK]  = self.UNK = len(tok2id)
        tok2id[NULL] = self.NULL = len(tok2id)
        tok2id[ROOT] = self.ROOT = len(tok2id)
        
        #now tok2id: {'D_PREFIX:root': 0, 'D_PREFIX:acl': 1, 'D_PREFIX:nmod': 2, ..., 'memory': 340, 'mr.': 341, '<UNK>': 342, '<NULL>': 343, '<ROOT>': 344}
        
        #create id2tok
        self.tok2id = tok2id
        self.id2tok = {v: k for (k, v) in tok2id.items()}
        
        #why 18 normal features + 18 (pos) + 12 (dep)
        #18 features - top 3 words on buffer, top 3 words on stack, 
        # the first and second left most/rightmost children of the top two words on the stack
        # the leftmost of leftmost/rightmost of rightmost children of the top two words on the stack
        #18 pos - basically corresponding POS tags
        #12 dep - corresponding ARC, excluding 6 words on hte stack/buffer..
        self.n_features = 18 + 18 + 12
        self.n_tokens = len(tok2id)
        
    #utility function, in case we want to convert token to id
    #function to turn train set with words to train set with id instead using tok2id
    def numericalize(self, examples):
        numer_examples = []
        for ex in examples:
            word = [self.ROOT] + [self.tok2id[w] if w in self.tok2id
                                  else self.UNK for w in ex['word']]
            pos  = [self.P_ROOT] + [self.tok2id[P_PREFIX + w] if P_PREFIX + w in self.tok2id
                                   else self.P_UNK for w in ex['pos']]
            head = [-1] + ex['head']
            dep  = [-1] + [self.tok2id[D_PREFIX + w] if D_PREFIX + w in self.tok2id
                            else -1 for w in ex['dep']]
            numer_examples.append({'word': word, 'pos': pos,
                                 'head': head, 'dep': dep})
        return numer_examples

    #function to extract features to form a feature embedding matrix
    def extract_features(self, stack, buf, arcs, ex):
             
        #ex['word']:  [55, 32, 33, 34, 35, 30], i.e., ['root', 'ms.', 'haag', 'plays', 'elianti', '.']
        #ex['pos']:   [29, 14, 14, 16, 14, 17], i.e., ['NNP', 'NNP', 'VBZ', 'NNP', '.']
        #ex['head']:  [-1, 2, 3, 0, 3, 3]  or ['root', 'compound', 'nsubj', 'root', 'dobj', 'punct']}
        #ex['dep']:   [-1, 1, 2, 0, 6, 12] or ['compound', 'nsubj', 'root', 'dobj', 'punct']

        #stack     :  [0]
        #buffer    :  [1, 2, 3, 4, 5]
        
        if stack[0] == "ROOT":
            stack[0] = 0  #start the stack with [ROOT]

        #get leftmost children based on the dependency arcs
        def get_lc(k):
            return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] < k])

        #get right most children based on the dependency arcs
        def get_rc(k):
            return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] > k],
                          reverse=True)

        p_features = [] #pos features (2a, 2b, 2c) - 18
        d_features = [] #dep features (3b, 3c) - 12
        
        #last 3 things on the stack as features
        #if the stack is less than 3, then we simply append NULL from the left
        features = [self.NULL] * (3 - len(stack)) + [ex['word'][x] for x in stack[-3:]]
        
        # next 3 things on the buffer as features
        #if the buffer is less than 3, simply append NULL
        #the reason why NULL is appended on end because buffer is read left to right
        features += [ex['word'][x] for x in buf[:3]] + [self.NULL] * (3 - len(buf))
        
        #corresponding pos tags
        p_features = [self.P_NULL] * (3 - len(stack)) + [ex['pos'][x] for x in stack[-3:]]
        p_features += [ex['pos'][x] for x in buf[:3]] + [self.P_NULL] * (3 - len(buf))
        
        #get the leftmost and rightmost children of the top two words, thus we loop 2 times
        for i in range(2):
            if i < len(stack):
                k = stack[-i-1] #-1, -2 last two in the stack
                
                #the first and second lefmost/rightmost children of the top two words (i=1, 2) on the stack
                lc = get_lc(k)  
                rc = get_rc(k)
                
                #the leftmost of leftmost/rightmost of rightmost children of the top two words on the stack:
                llc = get_lc(lc[0]) if len(lc) > 0 else []
                rrc = get_rc(rc[0]) if len(rc) > 0 else []

                #(leftmost of first word on stack, rightmost of first word, 
                # leftmost of the second word on stack, rightmost of second, 
                # leftmost of leftmost, rightmost of rightmost
                features.append(ex['word'][lc[0]] if len(lc) > 0 else self.NULL)
                features.append(ex['word'][rc[0]] if len(rc) > 0 else self.NULL)
                features.append(ex['word'][lc[1]] if len(lc) > 1 else self.NULL)
                features.append(ex['word'][rc[1]] if len(rc) > 1 else self.NULL)
                features.append(ex['word'][llc[0]] if len(llc) > 0 else self.NULL)
                features.append(ex['word'][rrc[0]] if len(rrc) > 0 else self.NULL)

                #corresponding pos
                p_features.append(ex['pos'][lc[0]] if len(lc) > 0 else self.P_NULL)
                p_features.append(ex['pos'][rc[0]] if len(rc) > 0 else self.P_NULL)
                p_features.append(ex['pos'][lc[1]] if len(lc) > 1 else self.P_NULL)
                p_features.append(ex['pos'][rc[1]] if len(rc) > 1 else self.P_NULL)
                p_features.append(ex['pos'][llc[0]] if len(llc) > 0 else self.P_NULL)
                p_features.append(ex['pos'][rrc[0]] if len(rrc) > 0 else self.P_NULL)
            
                #corresponding dep
                d_features.append(ex['dep'][lc[0]] if len(lc) > 0 else self.D_NULL)
                d_features.append(ex['dep'][rc[0]] if len(rc) > 0 else self.D_NULL)
                d_features.append(ex['dep'][lc[1]] if len(lc) > 1 else self.D_NULL)
                d_features.append(ex['dep'][rc[1]] if len(rc) > 1 else self.D_NULL)
                d_features.append(ex['dep'][llc[0]] if len(llc) > 0 else self.D_NULL)
                d_features.append(ex['dep'][rrc[0]] if len(rrc) > 0 else self.D_NULL)
                
            else:
                #attach NULL when they don't exist
                features += [self.NULL] * 6
                p_features += [self.P_NULL] * 6
                d_features += [self.D_NULL] * 6

        features += p_features + d_features
        assert len(features) == self.n_features  #assert they are 18 + 18 + 12
        return features

    #decide whether to shift, leftarc, or rightarc, based on gold parse trees
    #this is needed to create training examples which contain samples and ground truth
    def get_oracle(self, stack, buf, ex):

        #leave if the stack is only 1, thus nothing to predict....
        if len(stack) < 2:
            return self.n_trans - 1

        #predict based on the last two words on the stack
        #stack (ROOT, he, has)
        i0 = stack[-1] #has
        i1 = stack[-2] #he

        #get the head and dependency
        h0 = ex['head'][i0]
        h1 = ex['head'][i1]
        d0 = ex['dep'][i0]
        d1 = ex['dep'][i1]

        #either shift, left arc or right arc
        #"Shift" = 2; "LA" = 0; "RA" = 1
        #if head of the second last word is the last word, then leftarc
        if (i1 > 0) and (h1 == i0):
            return 0
        #if head of the last word is the second last word, then rightarc
        #make sure nothing in the buffer has head with the last word on the stack
        #otherwise, we lose the last word.....
        elif (i1 >= 0) and (h0 == i1) and \
                (not any([x for x in buf if ex['head'][x] == i0])):
            return 1
        #otherwise shift, if something is left in buffer, otherwise, do nothing....
        else:
            return None if len(buf) == 0 else 2

    #generate training examples
    #from the training sentences and their gold parse trees 
    def create_instances(self, examples): #examples = word, pos, head, dep
        all_instances = []
        
        for i, ex in enumerate(examples):
            #Ms. Hang plays Elianti .
            #e.g., ex['word]: [344, 163, 99, 164, 165, 68]
            #here 344 stands for ROOT
            #Chaky - I chated and take a look
            n_words = len(ex['word']) - 1  #excluding the root

            #arcs = {(head, tail, dependency label)}
            stack = [0]
            buf = [i + 1 for i in range(n_words)]  #[1, 2, 3, 4, 5]
            arcs = []
            instances = []
            
            #because that's the maximum number of shift, leftarcs, rightarcs you can have
            #this will determine the sample size of each training example
            #if given five words, we will get a sample of (10, 48) where 10 comes from 5 * 2, and 48 is n_features
            #but this for loop can be break if there is nothing left....
            for i in range(n_words * 2): #during stack[0] buffer[1, 2, 3, 4, 5] #maximum times you can do either S, L, R

                #get the gold transition based on the parse trees
                #gold_t can be either shift(2), leftarc(0), or rightarc(1)
                gold_t = self.get_oracle(stack, buf, ex)
                
                #if gold_t is None, no need to extract features.....
                if gold_t is None:
                    break
                
                #make sure when the model predicts, we inform the current state of stack and buffer, so
                #the model is not allowed to make any illegal action, e.g., buffer is empty but trying to pop
                legal_labels = self.legal_labels(stack, buf)                
                assert legal_labels[gold_t] == 1
               
                #extract all the 48 features 
                features = self.extract_features(stack, buf, arcs, ex)
                instances.append((features, legal_labels, gold_t))
            
                #shift 
                if gold_t == 2:
                    stack.append(buf[0])
                    buf = buf[1:]
                #left arc 
                elif gold_t == 0:
                    arcs.append((stack[-1], stack[-2], gold_t))
                    stack = stack[:-2] + [stack[-1]]
                #right arc
                else:
                    arcs.append((stack[-2], stack[-1], gold_t - self.n_deprel))
                    stack = stack[:-1]
            else:
                all_instances += instances

        return all_instances

    #provide an one hot encoding of the labels
    def legal_labels(self, stack, buf):
        labels =  ([1] if len(stack) > 2  else [0]) * self.n_deprel #left arc but you cannot do ROOT <-----He
        labels += ([1] if len(stack) >= 2 else [0]) * self.n_deprel #right arc because ROOT ----> He
        labels += [1] if len(buf) > 0 else [0] #shift
        return labels
    
    #a simple function to check punctuation POS tags
    def punct(self, pos):
        return pos in ["''", ",", ".", ":", "``", "-LRB-", "-RRB-"]

    def parse(self, dataset, eval_batch_size=5000):
        sentences = []
        sentence_id_to_idx = {}
                
        for i, example in enumerate(dataset):
            
            #example['word']=[188, 186, 186, ..., 59]
            #n_words=37
            #sentence=[1, 2, 3, 4, 5,.., 37]
                        
            n_words = len(example['word']) - 1
            sentence = [j + 1 for j in range(n_words)]            
            sentences.append(sentence)
            
            #mapping the object unique id to the i            
            #The id is the object's memory address
            sentence_id_to_idx[id(sentence)] = i
            
        model = ModelWrapper(self, dataset, sentence_id_to_idx)
        dependencies = minibatch_parse(sentences, model, eval_batch_size)
                
        UAS = all_tokens = 0.0
        with tqdm(total=len(dataset)) as prog:
            for i, ex in enumerate(dataset):
                head = [-1] * len(ex['word'])
                for h, t, in dependencies[i]:
                    head[t] = h
                for pred_h, gold_h, gold_l, pos in \
                        zip(head[1:], ex['head'][1:], ex['dep'][1:], ex['pos'][1:]):
                        assert self.id2tok[pos].startswith(P_PREFIX)
                        pos_str = self.id2tok[pos][len(P_PREFIX):]
                        if (not self.punct(pos_str)):
                            UAS += 1 if pred_h == gold_h else 0
                            all_tokens += 1
                prog.update(i + 1)
        UAS /= all_tokens
        return UAS, dependencies

In [66]:
class ModelWrapper(object):
    def __init__(self, parser, dataset, sentence_id_to_idx):
        self.parser = parser
        self.dataset = dataset
        self.sentence_id_to_idx = sentence_id_to_idx

    def predict(self, partial_parses):
        mb_x = [self.parser.extract_features(p.stack, p.buffer, p.dep,
                                             self.dataset[self.sentence_id_to_idx[id(p.sentence)]])
                for p in partial_parses]
        mb_x = np.array(mb_x).astype('int32')
        mb_x = torch.from_numpy(mb_x).long()
        mb_l = [self.parser.legal_labels(p.stack, p.buffer) for p in partial_parses]

        pred = self.parser.model(mb_x)
        pred = pred.detach().numpy()
        
        #we need to multiply 10000 with legal labels, to force the model not to make any impossible prediction
        #other, when we parse sequentially, sometimes there is nothing in the buffer or stack, thus error....        
        pred = np.argmax(pred + 10000 * np.array(mb_l).astype('float32'), 1)
        pred = ["S" if p == 2 else ("LA" if p == 0 else "RA") for p in pred]
        
        return pred

In [67]:
#a simple function to create ids.....
def build_dict(keys, offset=0):
    #keys = ['P_PREFIX:IN', 'P_PREFIX:DT', 'P_PREFIX:NNP', 'P_PREFIX:CD', so on...]
    #offset is needed because this tok2id has something already inside....
    count = Counter()
    for key in keys:
        count[key] += 1
    
    #most_common =X [('P_PREFIX:NN', 70), ('P_PREFIX:IN', 57), ... , ('P_PREFIX:JJR', 1)]
    #we use most_common in case we only want some maximum pos tags....
    mc = count.most_common()
    
    #{'P_PREFIX:NN': 31, 'P_PREFIX:IN': 32, .., 'P_PREFIX:JJR': 62} 
    return {w[0]: index + offset for (index, w) in enumerate(mc)}

In [68]:
train_set, dev_set, test_set = load_data()
len(train_set), len(dev_set), len(test_set)

1. Loading data


(1000, 500, 500)

In [69]:
print('2. Building parser....')
start = time.time()
parser = Parser(train_set)
print("took {:.2f} seconds".format(time.time()-start))

2. Building parser....
took 0.05 seconds


In [70]:
#before numericalize
print('Word: ',train_set[1]['word'])
print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
print('Dep: ',train_set[1]['dep'])

Word:  ['ms.', 'haag', 'plays', 'elianti', '.']
Pos:  ['NNP', 'NNP', 'VBZ', 'NNP', '.']
Head:  [2, 3, 0, 3, 3]
Dep:  ['compound', 'nsubj', 'root', 'dobj', 'punct']


In [71]:
train_set = parser.numericalize(train_set)
dev_set = parser.numericalize(dev_set)
test_set = parser.numericalize(test_set)

In [72]:
#after numericalize (rootis added in front)
print('Word: ',train_set[1]['word'])
print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
print('Dep: ',train_set[1]['dep'])

Word:  [5156, 304, 1364, 1002, 2144, 87]
Pos:  [84, 42, 42, 55, 42, 46]
Head:  [-1, 2, 3, 0, 3, 3]
Dep:  [-1, 19, 34, 0, 35, 5]


**1.4. Word Embedding**

In [73]:
print("4. Loading pretrained embeddings...",)
#config = Config()
start = time.time()
word_vectors = {}
for line in open('D:\\data\\en-cw.txt').readlines():
    we = line.strip().split() #we = word embeddings - first column: word;  the rest is embedding
    word_vectors[we[0]] = [float(x) for x in we[1:]] #{word: [list of 50 numbers], nextword: [another list], so on...}
    
#create an empty embedding matrix holding the embedding lookup table (vocab size, embed dim)
#we use random.normal instead of zeros, to keep the embedding matrix arbitrary in case word vectors don't exist....
embeddings_matrix = np.asarray(np.random.normal(0, 0.9, (parser.n_tokens, 50)), dtype='float32')

for token in parser.tok2id:
        i = parser.tok2id[token]
        if token in word_vectors:
            embeddings_matrix[i] = word_vectors[token]
        elif token.lower() in word_vectors:
            embeddings_matrix[i] = word_vectors[token.lower()]
print("Embedding matrix shape (vocab, emb size): ", embeddings_matrix.shape)
print("took {:.2f} seconds".format(time.time() - start))

4. Loading pretrained embeddings...
Embedding matrix shape (vocab, emb size):  (5157, 50)
took 4.19 seconds


**1.5 PreProcesing**

In [74]:
print("5. Preprocessing training data...",)
start = time.time()
train_examples = parser.create_instances(train_set)
print("took {:.2f} seconds".format(time.time() - start))

5. Preprocessing training data...
took 1.82 seconds


**1.6. Minibatch loader**

In [75]:
def get_minibatches(data, minibatch_size, shuffle=True):
    data_size = len(data[0])
    indices = np.arange(data_size)
    if shuffle:
        np.random.shuffle(indices)
    for minibatch_start in np.arange(0, data_size, minibatch_size):
        minibatch_indices = indices[minibatch_start:minibatch_start + minibatch_size]
        yield [_minibatch(d, minibatch_indices) for d in data]

def _minibatch(data, minibatch_idx):
    return data[minibatch_idx] if type(data) is np.ndarray else [data[i] for i in minibatch_idx]

def minibatches(data, batch_size):
    x = np.array([d[0] for d in data])
    y = np.array([d[2] for d in data])
    one_hot = np.zeros((y.size, 3))
    one_hot[np.arange(y.size), y] = 1
    return get_minibatches([x, one_hot], batch_size)

**1.7. Neural Network**

In [76]:
class ParserModel(nn.Module):
    def __init__(self, embeddings_matrix, hidden_size = 200, embed_size = 50,n_features = 48):
        super(ParserModel, self).__init__()
        self.transition_size = 3
        self.n_features = n_features
        self.embed_size = embeddings_matrix.shape[1]
        self.pretrained_embeddings = nn.Embedding(embeddings_matrix.shape[0], self.embed_size)
        self.pretrained_embeddings.weight = nn.Parameter(torch.tensor(embeddings_matrix))
        self.hidden1 = nn.Linear(self.n_features * embed_size, hidden_size) 
        self.hidden2 = nn.Linear(hidden_size, self.transition_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def embedding_lookup(self, x):
        batch_size = x.size()[0]
        x = self.pretrained_embeddings(x)        
        x = x.reshape(-1, self.n_features * self.embed_size) # x = (1024, 48 * 50)
        return x

    def forward(self,x):
        #x: (batch_size,48)
        #goes to the embeddding layer ==> (batch_sizem, 40 * emb_size)
        input_embed = self.embedding_lookup(x)
        #goes through the linear layer ==> (batch_sizem, 40 *hid_size)
        h1 = self.dropout(self.relu(self.hidden1(input_embed)))
        #do relu then dropout
        #compute the logits (basically a linear layer that converts to (batch_size,transition_size) ==> (batch_size,3))
        logits = self.hidden2(h1)
        return logits


In [77]:
# to get the average.....
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

In [78]:
def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size, dep_tags):
    
    parser.model.train()  # Places model in "train" mode, i.e. apply dropout layer
    n_minibatches = math.ceil(len(train_data) / batch_size)
    loss_meter = AverageMeter()

    with tqdm(total=(n_minibatches)) as prog:
        for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size, dep_tags)):
            
            #train_x:  batch_size, n_features
            #train_y:  batch_size, target(=3)
            
            optimizer.zero_grad() 
            loss = 0.
            train_x = torch.from_numpy(train_x).long()  #long() for int so embedding works....
            train_y = torch.from_numpy(train_y.nonzero()[1]).long()  #get the index with 1 because torch expects label to be single integer

            # Forward pass: compute predicted logits.
            logits = parser.model(train_x)
            # Compute loss
            loss = loss_func(logits, train_y)
            # Compute gradients of the loss w.r.t model parameters.
            loss.backward()
            # Take step with optimizer.
            optimizer.step()

            prog.update(1)
            loss_meter.update(loss.item())

    print("Average Train Loss: {}".format(loss_meter.avg))
    print("Evaluating on dev set",)
    parser.model.eval()  # Places model in "eval" mode, i.e. don't apply dropout layer
    
    dev_UAS, _ = parser.parse(dev_data)
    print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
    return dev_UAS

In [79]:
def train(parser, train_data, dev_data, output_path, batch_size=1024, n_epochs=10, lr=0.0005):
    
    best_dev_UAS = 0
    
    optimizer = optim.Adam(parser.model.parameters(), lr=0.001)
    loss_func = nn.CrossEntropyLoss()

    for epoch in range(n_epochs):
        print("Epoch {:} out of {:}".format(epoch + 1, n_epochs))
        dev_UAS = train_for_epoch(
            parser, train_data, dev_data, optimizer, loss_func, batch_size)
        if dev_UAS > best_dev_UAS:
            best_dev_UAS = dev_UAS
            print("New best dev UAS! Saving model.")
            torch.save(parser.model.state_dict(), output_path)
        print("")


def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size):
    
    parser.model.train()  # Places model in "train" mode, i.e. apply dropout layer
    n_minibatches = math.ceil(len(train_data) / batch_size)
    loss_meter = AverageMeter()

    with tqdm(total=(n_minibatches)) as prog:
        for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size)):
            
            #train_x:  batch_size, n_features
            #train_y:  batch_size, target(=3)
            
            optimizer.zero_grad() 
            loss = 0.
            train_x = torch.from_numpy(train_x).long()  #long() for int so embedding works....
            train_y = torch.from_numpy(train_y.nonzero()[1]).long()  #get the index with 1 because torch expects label to be single integer

            # Forward pass: compute predicted logits.
            logits = parser.model(train_x)
            # Compute loss
            loss = loss_func(logits, train_y)
            # Compute gradients of the loss w.r.t model parameters.
            loss.backward()
            # Take step with optimizer.
            optimizer.step()

            prog.update(1)
            loss_meter.update(loss.item())

    print("Average Train Loss: {}".format(loss_meter.avg))
    print("Evaluating on dev set",)
    parser.model.eval()  # Places model in "eval" mode, i.e. don't apply dropout layer
        
    dev_UAS, _ = parser.parse(dev_data)
    print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
    return dev_UAS

**1.8. Training**

In [80]:
#create directory if it does not exist for saving the weights...
output_dir = "output/{:%Y%m%d_%H%M%S}/".format(datetime.now())
output_path = output_dir + "model.weights"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
    
print(80 * "=")
print("TRAINING")
print(80 * "=")
    
model = ParserModel(embeddings_matrix)
parser.model = model

start = time.time()
train(parser, train_examples, dev_set, output_path,
      batch_size=1024, n_epochs=10, lr=0.0005)

TRAINING
Epoch 1 out of 10


100%|██████████| 48/48 [00:05<00:00,  8.65it/s]


Average Train Loss: 0.6508162667353948
Evaluating on dev set


125250it [00:00, 4175070.34it/s]       


- dev UAS: 53.10
New best dev UAS! Saving model.

Epoch 2 out of 10


100%|██████████| 48/48 [00:03<00:00, 12.04it/s]


Average Train Loss: 0.35207876066366833
Evaluating on dev set


125250it [00:00, 6263700.68it/s]       


- dev UAS: 60.49
New best dev UAS! Saving model.

Epoch 3 out of 10


100%|██████████| 48/48 [00:03<00:00, 13.93it/s]


Average Train Loss: 0.2819244544953108
Evaluating on dev set


125250it [00:00, 7368284.44it/s]       


- dev UAS: 66.64
New best dev UAS! Saving model.

Epoch 4 out of 10


100%|██████████| 48/48 [00:03<00:00, 13.45it/s]


Average Train Loss: 0.24733912448088327
Evaluating on dev set


125250it [00:00, 5964514.87it/s]       


- dev UAS: 68.48
New best dev UAS! Saving model.

Epoch 5 out of 10


100%|██████████| 48/48 [00:05<00:00,  8.84it/s]


Average Train Loss: 0.2216050180916985
Evaluating on dev set


125250it [00:00, 4473538.52it/s]       


- dev UAS: 67.55

Epoch 6 out of 10


100%|██████████| 48/48 [00:04<00:00, 11.43it/s]


Average Train Loss: 0.2006984998782476
Evaluating on dev set


125250it [00:00, 8949210.86it/s]       


- dev UAS: 70.72
New best dev UAS! Saving model.

Epoch 7 out of 10


100%|██████████| 48/48 [00:03<00:00, 13.41it/s]


Average Train Loss: 0.18233583432932696
Evaluating on dev set


125250it [00:00, 7367251.13it/s]       


- dev UAS: 72.92
New best dev UAS! Saving model.

Epoch 8 out of 10


100%|██████████| 48/48 [00:03<00:00, 14.43it/s]


Average Train Loss: 0.16944028188784918
Evaluating on dev set


125250it [00:00, 7828225.79it/s]       


- dev UAS: 73.56
New best dev UAS! Saving model.

Epoch 9 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.32it/s]


Average Train Loss: 0.15895312807212272
Evaluating on dev set


125250it [00:00, 4319101.02it/s]       


- dev UAS: 73.91
New best dev UAS! Saving model.

Epoch 10 out of 10


100%|██████████| 48/48 [00:03<00:00, 12.90it/s]


Average Train Loss: 0.14699669275432825
Evaluating on dev set


125250it [00:00, 7367354.44it/s]       

- dev UAS: 74.20
New best dev UAS! Saving model.






**1.9. Testing model**

In [81]:
print(80 * "=")
print("TESTING")
print(80 * "=")

print("Restoring the best model weights found on the dev set")
parser.model.load_state_dict(torch.load(output_path))
print("Final evaluation on test set",)
parser.model.eval()
UAS, dependencies = parser.parse(test_set)
print("- test UAS: {:.2f}".format(UAS * 100.0))
print("Done!")

TESTING
Restoring the best model weights found on the dev set
Final evaluation on test set


125250it [00:00, 5218972.73it/s]       

- test UAS: 73.96
Done!





**Question number 2 Ablation Study**

2. Do something called ablation study (meaning try to delete something so we know the impact of that deleted thing - very common in NLP)

        Recall that we have 18 word + 18 pos + 12 dep features

            Try to delete only the 12 dep features and check UAS
            Try to delete only the 18 pos features and check UAS
            Do another comparison study testing the embedding

Just changing the condition what professor did like if pos_tags comes then excute this or if dep_tags comes then execute this and if both comes then excute this :

In [82]:
P_PREFIX = '<p>:' #indicating pos tags
D_PREFIX = '<d>:' #indicating dependency tags
UNK      = '<UNK>'
NULL     = '<NULL>'
ROOT     = '<ROOT>'

class AblationParser(object):

    def __init__(self, dataset, pos_tags = True, dep_tags = True): #here, pos is the part of speech, dep-f is the  dependency tags
        
        #set pos and dep feature
        self.pos_tags = pos_tags
        self.dep_tags = dep_tags
        tok2id = dict() 
        
       
        if self.dep_tags :  #### Dependency Features : If it is False, dependency relationship will not run or it will skip.
            #set the root dep
            self.root_dep = 'root'
                
            #get all the dep of the dataset as list, e.g., ['root', 'acl', 'nmod', 'nmod:npmod']
            all_dep = [self.root_dep] + list(set([w for ex in dataset
                                               for w in ex['dep']
                                               if w != self.root_dep]))
        
            #1. put dep into tok2id lookup table, with D_PREFIX so we know it is dependency
            #{'D_PREFIX:root': 0, 'D_PREFIX:acl': 1, 'D_PREFIX:nmod': 2, ..., 'D_PREFIX:<NULL>': 30}
            tok2id = {D_PREFIX + l: i for (i, l) in enumerate(all_dep)}
            tok2id[D_PREFIX + NULL] = self.D_NULL = len(tok2id)
        
        #we are using "unlabeled" where we do not label with the dependency
        #thus the number of dependency relation is 1
        trans = ['L', 'R', 'S']
        self.n_deprel = 1 
        
        #create a simple lookup table mapping action and id
        #e.g., tran2id: {'L': 0, 'R': 1, 'S': 2}
        #e.g., id2tran: {0: 'L', 1: 'R', 2: 'S'}
        self.n_trans = len(trans)
        self.tran2id = {t: i for (i, t) in enumerate(trans)} #use for easy coding
        self.id2tran = {i: t for (i, t) in enumerate(trans)} 
        
       
        if self.pos_tags :  #### POS tag features : If it is False, it will skip pos_featues
            #2. put pos tags into tok2id lookup table, with P_PREFIX so we know it is pos
            tok2id.update(build_dict([P_PREFIX + w for ex in dataset for w in ex['pos']],
                                    offset=len(tok2id)))
            tok2id[P_PREFIX + UNK]  = self.P_UNK  = len(tok2id)  #also remember the pos tags of unknown
            tok2id[P_PREFIX + NULL] = self.P_NULL = len(tok2id)
            tok2id[P_PREFIX + ROOT] = self.P_ROOT = len(tok2id)
        
        #now tok2id:  {'P_PREFIX:root': 0, 'P_PREFIX:acl': 1, ..., 'P_PREFIX:JJR': 62, 'P_PREFIX:<UNK>': 63, 'P_PREFIX:<NULL>': 64, 'P_PREFIX:<ROOT>': 65}
        
        #### Word Features 
        #3. put word into tok2id lookup table
        tok2id.update(build_dict([w for ex in dataset for w in ex['word']],
                                  offset=len(tok2id)))
        tok2id[UNK]  = self.UNK = len(tok2id)
        tok2id[NULL] = self.NULL = len(tok2id)
        tok2id[ROOT] = self.ROOT = len(tok2id)
        
        #now tok2id: {'D_PREFIX:root': 0, 'D_PREFIX:acl': 1, 'D_PREFIX:nmod': 2, ..., 'memory': 340, 'mr.': 341, '<UNK>': 342, '<NULL>': 343, '<ROOT>': 344}
        
        #create id2tok
        self.tok2id = tok2id
        self.id2tok = {v: k for (k, v) in tok2id.items()}
        
        #why 18 normal features + 18 (pos) + 12 (dep)
        #18 features - top 3 words on buffer, top 3 words on stack, 
        # the first and second left most/rightmost children of the top two words on the stack
        # the leftmost of leftmost/rightmost of rightmost children of the top two words on the stack
        #18 pos - basically corresponding POS tags
        #12 dep - corresponding ARC, excluding 6 words on hte stack/buffer..

        if self.dep_tags and self.pos_tags: #if we have both dep_tags and pos_tags then, we will have 18 (normal featuers) + 18 (pos_tags) + 12(dep_tags) featuers
            self.n_features = 18 + 18 + 12 
        elif self.pos_tags :
            self.n_features = 18 + 18 # 18 normal features + 18 (pos_tags) 
        elif self.dep_tags :
            self.n_features = 18 + 12 # 18 normal features + 12 (dep_tags)
        else:
            self.n_features = 18 # 18 normal features

        self.n_tokens = len(tok2id)
        
    #utility function, in case we want to convert token to id
    #function to turn train set with words to train set with id instead using tok2id
    def numericalize(self, examples):
        numer_examples = []
        for ex in examples:
            word = [self.ROOT] + [self.tok2id[w] if w in self.tok2id
                                  else self.UNK for w in ex['word']]
            if self.pos_tags:
                pos  = [self.P_ROOT] + [self.tok2id[P_PREFIX + w] if P_PREFIX + w in self.tok2id
                                   else self.P_UNK for w in ex['pos']]
            head = [-1] + ex['head']
            if self.dep_tags:
                dep  = [-1] + [self.tok2id[D_PREFIX + w] if D_PREFIX + w in self.tok2id
                            else -1 for w in ex['dep']]
            
            if self.dep_tags and self.pos_tags:
                numer_examples.append({'word': word, 'pos': pos,
                                 'head': head, 'dep': dep})
            elif self.pos_tags :
                numer_examples.append({'word': word, 'pos': pos,
                                 'head': head})
            elif self.dep_tags :
                numer_examples.append({'word': word,
                                 'head': head, 'dep': dep})
            else:
                numer_examples.append({'word': word, 
                                 'head': head})
        return numer_examples

    #function to extract features to form a feature embedding matrix
    def extract_features(self, stack, buf, arcs, ex):
             
        #ex['word']:  [55, 32, 33, 34, 35, 30], i.e., ['root', 'ms.', 'haag', 'plays', 'elianti', '.']
        #ex['pos']:   [29, 14, 14, 16, 14, 17], i.e., ['NNP', 'NNP', 'VBZ', 'NNP', '.']
        #ex['head']:  [-1, 2, 3, 0, 3, 3]  or ['root', 'compound', 'nsubj', 'root', 'dobj', 'punct']}
        #ex['dep']:   [-1, 1, 2, 0, 6, 12] or ['compound', 'nsubj', 'root', 'dobj', 'punct']

        #stack     :  [0]
        #buffer    :  [1, 2, 3, 4, 5]
        
        if stack[0] == "ROOT":
            stack[0] = 0  #start the stack with [ROOT]

        #get leftmost children based on the dependency arcs
        def get_lc(k):
            return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] < k])

        #get right most children based on the dependency arcs
        def get_rc(k):
            return sorted([arc[1] for arc in arcs if arc[0] == k and arc[1] > k],
                          reverse=True)

        p_features = [] #pos features (2a, 2b, 2c) - 18
        d_features = [] #dep features (3b, 3c) - 12
        
        #last 3 things on the stack as features
        #if the stack is less than 3, then we simply append NULL from the left
        features = [self.NULL] * (3 - len(stack)) + [ex['word'][x] for x in stack[-3:]]
        
        # next 3 things on the buffer as features
        #if the buffer is less than 3, simply append NULL
        #the reason why NULL is appended on end because buffer is read left to right
        features += [ex['word'][x] for x in buf[:3]] + [self.NULL] * (3 - len(buf))
        
        if self.pos_tags :
            #corresponding pos tags
            p_features = [self.P_NULL] * (3 - len(stack)) + [ex['pos'][x] for x in stack[-3:]]
            p_features += [ex['pos'][x] for x in buf[:3]] + [self.P_NULL] * (3 - len(buf))
        
        #get the leftmost and rightmost children of the top two words, thus we loop 2 times
        for i in range(2):
            if i < len(stack):
                k = stack[-i-1] #-1, -2 last two in the stack
                
                #the first and second lefmost/rightmost children of the top two words (i=1, 2) on the stack
                lc = get_lc(k)  
                rc = get_rc(k)
                
                #the leftmost of leftmost/rightmost of rightmost children of the top two words on the stack:
                llc = get_lc(lc[0]) if len(lc) > 0 else []
                rrc = get_rc(rc[0]) if len(rc) > 0 else []

                #(leftmost of first word on stack, rightmost of first word, 
                # leftmost of the second word on stack, rightmost of second, 
                # leftmost of leftmost, rightmost of rightmost
                features.append(ex['word'][lc[0]] if len(lc) > 0 else self.NULL)
                features.append(ex['word'][rc[0]] if len(rc) > 0 else self.NULL)
                features.append(ex['word'][lc[1]] if len(lc) > 1 else self.NULL)
                features.append(ex['word'][rc[1]] if len(rc) > 1 else self.NULL)
                features.append(ex['word'][llc[0]] if len(llc) > 0 else self.NULL)
                features.append(ex['word'][rrc[0]] if len(rrc) > 0 else self.NULL)

                if self.pos_tags :
                    #corresponding pos
                    p_features.append(ex['pos'][lc[0]] if len(lc) > 0 else self.P_NULL)
                    p_features.append(ex['pos'][rc[0]] if len(rc) > 0 else self.P_NULL)
                    p_features.append(ex['pos'][lc[1]] if len(lc) > 1 else self.P_NULL)
                    p_features.append(ex['pos'][rc[1]] if len(rc) > 1 else self.P_NULL)
                    p_features.append(ex['pos'][llc[0]] if len(llc) > 0 else self.P_NULL)
                    p_features.append(ex['pos'][rrc[0]] if len(rrc) > 0 else self.P_NULL)
                if self.dep_tags :
                    #corresponding dep
                    d_features.append(ex['dep'][lc[0]] if len(lc) > 0 else self.D_NULL)
                    d_features.append(ex['dep'][rc[0]] if len(rc) > 0 else self.D_NULL)
                    d_features.append(ex['dep'][lc[1]] if len(lc) > 1 else self.D_NULL)
                    d_features.append(ex['dep'][rc[1]] if len(rc) > 1 else self.D_NULL)
                    d_features.append(ex['dep'][llc[0]] if len(llc) > 0 else self.D_NULL)
                    d_features.append(ex['dep'][rrc[0]] if len(rrc) > 0 else self.D_NULL)
                
            else:
                #attach NULL when they don't exist
                features += [self.NULL] * 6
                if self.pos_tags :
                    p_features += [self.P_NULL] * 6
                if self.dep_tags :
                    d_features += [self.D_NULL] * 6

        if self.dep_tags and self.pos_tags:
            features += p_features + d_features
        elif self.pos_tags :
            features += p_features
        elif self.dep_tags :
            features += d_features

        assert len(features) == self.n_features  #assert they are 18 + 18 + 12 or 18 + 18 or 18 + 18 + 12
        return features

    #decide whether to shift, leftarc, or rightarc, based on gold parse trees
    #this is needed to create training examples which contain samples and ground truth
    def get_oracle(self, stack, buf, ex):

        #leave if the stack is only 1, thus nothing to predict....
        if len(stack) < 2:
            return self.n_trans - 1

        #predict based on the last two words on the stack
        #stack (ROOT, he, has)
        i0 = stack[-1] #has
        i1 = stack[-2] #he

        #get the head and dependency
        h0 = ex['head'][i0]
        h1 = ex['head'][i1]

        if self.dep_tags :
            d0 = ex['dep'][i0]
            d1 = ex['dep'][i1]

        #either shift, left arc or right arc
        #"Shift" = 2; "LA" = 0; "RA" = 1
        #if head of the second last word is the last word, then leftarc
        if (i1 > 0) and (h1 == i0):
            return 0
        #if head of the last word is the second last word, then rightarc
        #make sure nothing in the buffer has head with the last word on the stack
        #otherwise, we lose the last word.....
        elif (i1 >= 0) and (h0 == i1) and \
                (not any([x for x in buf if ex['head'][x] == i0])):
            return 1
        #otherwise shift, if something is left in buffer, otherwise, do nothing....
        else:
            return None if len(buf) == 0 else 2

    #generate training examples
    #from the training sentences and their gold parse trees 
    def create_instances(self, examples): #examples = word, pos, head, dep
        all_instances = []
        
        for i, ex in enumerate(examples):
            #Ms. Hang plays Elianti .
            #e.g., ex['word]: [344, 163, 99, 164, 165, 68]
            #here 344 stands for ROOT
            #Chaky - I chated and take a look
            n_words = len(ex['word']) - 1  #excluding the root

            #arcs = {(head, tail, dependency label)}
            stack = [0]
            buf = [i + 1 for i in range(n_words)]  #[1, 2, 3, 4, 5]
            arcs = []
            instances = []
            
            #because that's the maximum number of shift, leftarcs, rightarcs you can have
            #this will determine the sample size of each training example
            #if given five words, we will get a sample of (10, 48) where 10 comes from 5 * 2, and 48 is n_features
            #but this for loop can be break if there is nothing left....
            for i in range(n_words * 2): #during stack[0] buffer[1, 2, 3, 4, 5] #maximum times you can do either S, L, R

                #get the gold transition based on the parse trees
                #gold_t can be either shift(2), leftarc(0), or rightarc(1)
                gold_t = self.get_oracle(stack, buf, ex)
                
                #if gold_t is None, no need to extract features.....
                if gold_t is None:
                    break
                
                #make sure when the model predicts, we inform the current state of stack and buffer, so
                #the model is not allowed to make any illegal action, e.g., buffer is empty but trying to pop
                legal_labels = self.legal_labels(stack, buf)                
                assert legal_labels[gold_t] == 1
               
                #extract all the 48 features 
                features = self.extract_features(stack, buf, arcs, ex)
                instances.append((features, legal_labels, gold_t))
            
                #shift 
                if gold_t == 2:
                    stack.append(buf[0])
                    buf = buf[1:]
                #left arc 
                elif gold_t == 0:
                    arcs.append((stack[-1], stack[-2], gold_t))
                    stack = stack[:-2] + [stack[-1]]
                #right arc
                else:
                    arcs.append((stack[-2], stack[-1], gold_t - self.n_deprel))
                    stack = stack[:-1]
            else:
                all_instances += instances

        if not self.dep_tags:
            all_instances = [[instance[0], instance[2]] for instance in all_instances] #return only 'word' and 'head'

        return all_instances

    #provide an one hot encoding of the labels
    def legal_labels(self, stack, buf):
        labels =  ([1] if len(stack) > 2  else [0]) * self.n_deprel #left arc but you cannot do ROOT <-----He
        labels += ([1] if len(stack) >= 2 else [0]) * self.n_deprel #right arc because ROOT ----> He
        labels += [1] if len(buf) > 0 else [0] #shift
        return labels
    
    #a simple function to check punctuation POS tags
    def punct(self, pos):
        return pos in ["''", ",", ".", ":", "``", "-LRB-", "-RRB-"]

    def parse(self, dataset, eval_batch_size=5000):
        sentences = []
        sentence_id_to_idx = {}
                
        for i, example in enumerate(dataset):
            
            #example['word']=[188, 186, 186, ..., 59]
            #n_words=37
            #sentence=[1, 2, 3, 4, 5,.., 37]
                        
            n_words = len(example['word']) - 1
            sentence = [j + 1 for j in range(n_words)]            
            sentences.append(sentence)
            
            #mapping the object unique id to the i            
            #The id is the object's memory address
            sentence_id_to_idx[id(sentence)] = i
            
        model = ModelWrapper(self, dataset, sentence_id_to_idx)
        dependencies = minibatch_parse(sentences, model, eval_batch_size)
                
        UAS = all_tokens = 0.0
        with tqdm(total=len(dataset)) as prog:
            for i, ex in enumerate(dataset):
                head = [-1] * len(ex['word'])
                for h, t, in dependencies[i]:
                    head[t] = h
                if self.dep_tags and self.pos_tags:
                    for pred_h, gold_h, gold_l, pos in \
                            zip(head[1:], ex['head'][1:], ex['dep'][1:], ex['pos'][1:]):
                            assert self.id2tok[pos].startswith(P_PREFIX)
                            pos_str = self.id2tok[pos][len(P_PREFIX):]
                            if (not self.punct(pos_str)):
                                UAS += 1 if pred_h == gold_h else 0
                                all_tokens += 1
                elif self.pos_tags :
                    for pred_h, gold_h, pos in \
                            zip(head[1:], ex['head'][1:], ex['pos'][1:]):
                            assert self.id2tok[pos].startswith(P_PREFIX)
                            pos_str = self.id2tok[pos][len(P_PREFIX):]
                            if (not self.punct(pos_str)):
                                UAS += 1 if pred_h == gold_h else 0
                                all_tokens += 1
                elif self.dep_tags :
                    for pred_h, gold_h, gold_l in \
                            zip(head[1:], ex['head'][1:], ex['dep'][1:]):
                            UAS += 1 if pred_h == gold_h else 0
                            all_tokens += 1
                else:
                    for pred_h, gold_h in \
                            zip(head[1:], ex['head'][1:]):
                            UAS += 1 if pred_h == gold_h else 0
                            all_tokens += 1

                prog.update(i + 1)
        UAS /= all_tokens
        return UAS, dependencies

2.2. Copying the mini batch:

In [83]:
def get_minibatches(data, minibatch_size, shuffle=True):
    data_size = len(data[0])
    indices = np.arange(data_size)
    if shuffle:
        np.random.shuffle(indices)
    for minibatch_start in np.arange(0, data_size, minibatch_size):
        minibatch_indices = indices[minibatch_start:minibatch_start + minibatch_size]
        yield [_minibatch(d, minibatch_indices) for d in data]

def _minibatch(data, minibatch_idx):
    return data[minibatch_idx] if type(data) is np.ndarray else [data[i] for i in minibatch_idx]
           
def minibatches(data, batch_size, dep_tags = True):
    if dep_tags :
        x = np.array([d[0] for d in data])
        y = np.array([d[2] for d in data])
        one_hot = np.zeros((y.size, 3))
        one_hot[np.arange(y.size), y] = 1
    else :
        x = np.array([d[0] for d in data])
        y = np.array([d[1] for d in data])
        one_hot = np.zeros((y.size, 3))
        one_hot[np.arange(y.size), y] = 1

    return get_minibatches([x, one_hot], batch_size)


In [84]:
def train(parser, train_data, dev_data, output_path, batch_size=1024, n_epochs=10, lr=0.0005, dep_tags = True):
    
    best_dev_UAS = 0
    
    optimizer = optim.Adam(parser.model.parameters(), lr=0.001)
    loss_func = nn.CrossEntropyLoss()

    for epoch in range(n_epochs):
        print("Epoch {:} out of {:}".format(epoch + 1, n_epochs))
        dev_UAS = train_for_epoch(
            parser, train_data, dev_data, optimizer, loss_func, batch_size, dep_tags)
        if dev_UAS > best_dev_UAS:
            best_dev_UAS = dev_UAS
            print("New best dev UAS! Saving model.")
            torch.save(parser.model.state_dict(), output_path)
        print("")




In [85]:
def train_for_epoch(parser, train_data, dev_data, optimizer, loss_func, batch_size, dep_tags):
    
    parser.model.train()  # Places model in "train" mode, i.e. apply dropout layer
    n_minibatches = math.ceil(len(train_data) / batch_size)
    loss_meter = AverageMeter()

    with tqdm(total=(n_minibatches)) as prog:
        for i, (train_x, train_y) in enumerate(minibatches(train_data, batch_size, dep_tags)):
            
            #train_x:  batch_size, n_features
            #train_y:  batch_size, target(=3)
            
            optimizer.zero_grad() 
            loss = 0.
            train_x = torch.from_numpy(train_x).long()  #long() for int so embedding works....
            train_y = torch.from_numpy(train_y.nonzero()[1]).long()  #get the index with 1 because torch expects label to be single integer

            # Forward pass: compute predicted logits.
            logits = parser.model(train_x)
            # Compute loss
            loss = loss_func(logits, train_y)
            # Compute gradients of the loss w.r.t model parameters.
            loss.backward()
            # Take step with optimizer.
            optimizer.step()

            prog.update(1)
            loss_meter.update(loss.item())

    print("Average Train Loss: {}".format(loss_meter.avg))
    print("Evaluating on dev set",)
    parser.model.eval()  # Places model in "eval" mode, i.e. don't apply dropout layer
    
    dev_UAS, _ = parser.parse(dev_data)
    print("- dev UAS: {:.2f}".format(dev_UAS * 100.0))
    return dev_UAS


**2. Now, Conducting Experiment**

First, Try to delete only the 18 pos features and check UAS

In [86]:
train_set, dev_set, test_set = load_data()
# len(train_set), len(dev_set), len(test_set)
print('2. Building parser....')
start = time.time()
parser = AblationParser(train_set,pos_tags=False)
print("took {:.2f} seconds".format(time.time()-start))

1. Loading data
2. Building parser....
took 0.02 seconds


In [87]:
#before numericalize
print('Word: ',train_set[1]['word'])
# print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
print('Dep: ',train_set[1]['dep'])
train_set = parser.numericalize(train_set)
dev_set = parser.numericalize(dev_set)
test_set = parser.numericalize(test_set)
#after numericalize (rootis added in front)
print('Word: ',train_set[1]['word'])
# print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
print('Dep: ',train_set[1]['dep'])

Word:  ['ms.', 'haag', 'plays', 'elianti', '.']
Head:  [2, 3, 0, 3, 3]
Dep:  ['compound', 'nsubj', 'root', 'dobj', 'punct']
Word:  [5110, 258, 1318, 956, 2098, 41]
Head:  [-1, 2, 3, 0, 3, 3]
Dep:  [-1, 19, 34, 0, 35, 5]


In [89]:
print("4. Loading pretrained embeddings...",)
# config = Config()
start = time.time()
word_vectors = {}
for line in open('D:\\data\\en-cw.txt').readlines():
    we = line.strip().split() #we = word embeddings - first column: word;  the rest is embedding
    word_vectors[we[0]] = [float(x) for x in we[1:]] #{word: [list of 50 numbers], nextword: [another list], so on...}
    
#create an empty embedding matrix holding the embedding lookup table (vocab size, embed dim)
#we use random.normal instead of zeros, to keep the embedding matrix arbitrary in case word vectors don't exist....
embeddings_matrix = np.asarray(np.random.normal(0, 0.9, (parser.n_tokens, 50)), dtype='float32')

for token in parser.tok2id:
        i = parser.tok2id[token]
        if token in word_vectors:
            embeddings_matrix[i] = word_vectors[token]
        elif token.lower() in word_vectors:
            embeddings_matrix[i] = word_vectors[token.lower()]
print("Embedding matrix shape (vocab, emb size): ", embeddings_matrix.shape)
print("took {:.2f} seconds".format(time.time() - start))

print("5. Preprocessing training data...",)
start = time.time()
train_examples_without12dep = parser.create_instances(train_set)
print("took {:.2f} seconds".format(time.time() - start))

4. Loading pretrained embeddings...
Embedding matrix shape (vocab, emb size):  (5111, 50)
took 3.67 seconds
5. Preprocessing training data...
took 1.27 seconds


In [90]:
#create directory if it does not exist for saving the weights...
#create directory if it does not exist for saving the weights...
#create directory if it does not exist for saving the weights...
output_path = "./output/model_withoutpos.weights"

print(80 * "=")
print("TRAINING")
print(80 * "=")
    
model = ParserModel(embeddings_matrix, n_features = 30)
parser.model = model

start = time.time()
train(parser, train_examples_without12dep, dev_set, output_path,
      batch_size=1024, n_epochs=10, lr=0.0005, dep_tags = True)

TRAINING
Epoch 1 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.89it/s]


Average Train Loss: 0.5771081689745188
Evaluating on dev set


125250it [00:00, 31335316.19it/s]      


- dev UAS: 51.10
New best dev UAS! Saving model.

Epoch 2 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.17it/s]


Average Train Loss: 0.34286797419190407
Evaluating on dev set


125250it [00:00, 17899031.55it/s]      


- dev UAS: 55.19
New best dev UAS! Saving model.

Epoch 3 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.33it/s]


Average Train Loss: 0.2827078013991316
Evaluating on dev set


125250it [00:00, 31344664.44it/s]      


- dev UAS: 58.34
New best dev UAS! Saving model.

Epoch 4 out of 10


100%|██████████| 48/48 [00:02<00:00, 21.49it/s]


Average Train Loss: 0.24489865793536106
Evaluating on dev set


125250it [00:00, 15664854.96it/s]      


- dev UAS: 61.27
New best dev UAS! Saving model.

Epoch 5 out of 10


100%|██████████| 48/48 [00:02<00:00, 22.79it/s]


Average Train Loss: 0.21482770517468452
Evaluating on dev set


125250it [00:00, 31325973.52it/s]      


- dev UAS: 63.02
New best dev UAS! Saving model.

Epoch 6 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.18it/s]


Average Train Loss: 0.19865311899532875
Evaluating on dev set


125250it [00:00, 15658318.21it/s]      


- dev UAS: 64.68
New best dev UAS! Saving model.

Epoch 7 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.89it/s]


Average Train Loss: 0.1815325963931779
Evaluating on dev set


125250it [00:00, 31322238.02it/s]      


- dev UAS: 65.23
New best dev UAS! Saving model.

Epoch 8 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.20it/s]


Average Train Loss: 0.1624926527341207
Evaluating on dev set


125250it [00:00, 31331578.46it/s]      


- dev UAS: 67.14
New best dev UAS! Saving model.

Epoch 9 out of 10


100%|██████████| 48/48 [00:02<00:00, 22.82it/s]


Average Train Loss: 0.14721053699031472
Evaluating on dev set


125250it [00:00, 20886473.28it/s]      


- dev UAS: 67.46
New best dev UAS! Saving model.

Epoch 10 out of 10


100%|██████████| 48/48 [00:02<00:00, 23.76it/s]


Average Train Loss: 0.13614904802913466
Evaluating on dev set


125250it [00:00, 31337185.40it/s]      

- dev UAS: 66.73






In [91]:
print(80 * "=")
print("TESTING")
print(80 * "=")

print("Restoring the best model weights found on the dev set")
parser.model.load_state_dict(torch.load(output_path))
print("Final evaluation on test set",)
parser.model.eval()
UAS, dependencies = parser.parse(test_set)
print("- test UAS: {:.2f}".format(UAS * 100.0))
print("Done!")

TESTING
Restoring the best model weights found on the dev set
Final evaluation on test set


125250it [00:00, 31335316.19it/s]      

- test UAS: 68.82
Done!





**Try to delete only the 12 dep features and check UAS**

In [92]:
train_set, dev_set, test_set = load_data()
# len(train_set), len(dev_set), len(test_set)
print('2. Building parser....')
start = time.time()
parser = AblationParser(train_set,dep_tags=False)
print("took {:.2f} seconds".format(time.time()-start))

1. Loading data
2. Building parser....
took 0.03 seconds


In [93]:
#before numericalize
print('Word: ',train_set[1]['word'])
print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
# print('Dep: ',train_set[1]['dep'])
train_set = parser.numericalize(train_set)
dev_set = parser.numericalize(dev_set)
test_set = parser.numericalize(test_set)
#after numericalize (rootis added in front)
print('Word: ',train_set[1]['word'])
print('Pos: ',train_set[1]['pos'])
print('Head: ',train_set[1]['head'])
# print('Dep: ',train_set[1]['dep'])

Word:  ['ms.', 'haag', 'plays', 'elianti', '.']
Pos:  ['NNP', 'NNP', 'VBZ', 'NNP', '.']
Head:  [2, 3, 0, 3, 3]
Word:  [5117, 265, 1325, 963, 2105, 48]
Pos:  [45, 3, 3, 16, 3, 7]
Head:  [-1, 2, 3, 0, 3, 3]


In [95]:
print("4. Loading pretrained embeddings...",)
# config = Config()
start = time.time()
word_vectors = {}
for line in open('D:\\data\\en-cw.txt').readlines():
    we = line.strip().split() #we = word embeddings - first column: word;  the rest is embedding
    word_vectors[we[0]] = [float(x) for x in we[1:]] #{word: [list of 50 numbers], nextword: [another list], so on...}
    
#create an empty embedding matrix holding the embedding lookup table (vocab size, embed dim)
#we use random.normal instead of zeros, to keep the embedding matrix arbitrary in case word vectors don't exist....
embeddings_matrix = np.asarray(np.random.normal(0, 0.9, (parser.n_tokens, 50)), dtype='float32')

for token in parser.tok2id:
        i = parser.tok2id[token]
        if token in word_vectors:
            embeddings_matrix[i] = word_vectors[token]
        elif token.lower() in word_vectors:
            embeddings_matrix[i] = word_vectors[token.lower()]
print("Embedding matrix shape (vocab, emb size): ", embeddings_matrix.shape)
print("took {:.2f} seconds".format(time.time() - start))

print("5. Preprocessing training data...",)
start = time.time()
train_examples_without12dep = parser.create_instances(train_set)
print("took {:.2f} seconds".format(time.time() - start))

4. Loading pretrained embeddings...
Embedding matrix shape (vocab, emb size):  (5118, 50)
took 3.82 seconds
5. Preprocessing training data...
took 1.39 seconds


In [96]:
#create directory if it does not exist for saving the weights...
output_path = "./output/modelwithout12dep.weights"

print(80 * "=")
print("TRAINING")
print(80 * "=")
    
model = ParserModel(embeddings_matrix, n_features = 36)
parser.model = model

start = time.time()
train(parser, train_examples_without12dep, dev_set, output_path,
      batch_size=1024, n_epochs=10, lr=0.0005, dep_tags = False)


TRAINING
Epoch 1 out of 10


100%|██████████| 48/48 [00:02<00:00, 18.86it/s]


Average Train Loss: 0.5032807743797699
Evaluating on dev set


125250it [00:00, 8350605.25it/s]       


- dev UAS: 59.82
New best dev UAS! Saving model.

Epoch 2 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.78it/s]


Average Train Loss: 0.28263476956635714
Evaluating on dev set


125250it [00:00, 5964921.21it/s]       


- dev UAS: 66.47
New best dev UAS! Saving model.

Epoch 3 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.06it/s]


Average Train Loss: 0.23275735105077425
Evaluating on dev set


125250it [00:00, 6591590.45it/s]       


- dev UAS: 69.70
New best dev UAS! Saving model.

Epoch 4 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.74it/s]


Average Train Loss: 0.19962000070760647
Evaluating on dev set


125250it [00:00, 8350472.51it/s]       


- dev UAS: 71.20
New best dev UAS! Saving model.

Epoch 5 out of 10


100%|██████████| 48/48 [00:02<00:00, 20.49it/s]


Average Train Loss: 0.1749953388546904
Evaluating on dev set


125250it [00:00, 8346757.59it/s]       


- dev UAS: 71.96
New best dev UAS! Saving model.

Epoch 6 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.28it/s]


Average Train Loss: 0.15760486293584108
Evaluating on dev set


125250it [00:00, 6260490.94it/s]       


- dev UAS: 72.40
New best dev UAS! Saving model.

Epoch 7 out of 10


100%|██████████| 48/48 [00:02<00:00, 20.77it/s]


Average Train Loss: 0.1426206623824934
Evaluating on dev set


125250it [00:00, 6255943.22it/s]       


- dev UAS: 73.66
New best dev UAS! Saving model.

Epoch 8 out of 10


100%|██████████| 48/48 [00:02<00:00, 19.62it/s]


Average Train Loss: 0.12744023992369571
Evaluating on dev set


125250it [00:00, 4472929.09it/s]       


- dev UAS: 73.80
New best dev UAS! Saving model.

Epoch 9 out of 10


100%|██████████| 48/48 [00:02<00:00, 20.94it/s]


Average Train Loss: 0.11493367872511347
Evaluating on dev set


125250it [00:00, 9643272.87it/s]       


- dev UAS: 74.57
New best dev UAS! Saving model.

Epoch 10 out of 10


100%|██████████| 48/48 [00:02<00:00, 20.11it/s]


Average Train Loss: 0.10535819331804912
Evaluating on dev set


125250it [00:00, 6959851.83it/s]       

- dev UAS: 76.15
New best dev UAS! Saving model.






In [97]:
print(80 * "=")
print("TESTING")
print(80 * "=")

print("Restoring the best model weights found on the dev set")
parser.model.load_state_dict(torch.load(output_path))
print("Final evaluation on test set",)
parser.model.eval()
UAS, dependencies = parser.parse(test_set)
print("- test UAS: {:.2f}".format(UAS * 100.0))
print("Done!")

TESTING
Restoring the best model weights found on the dev set
Final evaluation on test set


125250it [00:00, 5445764.62it/s]       

- test UAS: 78.01
Done!





**Conclusion of question number 2 :**

| Ablation Study                | UAS result | test UAS |
| -----------                   | ---------  |  ------- |
| word (18)+ pos(18)+ dep(18)   |  73.87     |  74.60   |
| word (18) + pos(18)           |  76.36     |  76.74   |
| word (18)+ dep(12)            |  62.88     |  64.76   |

It seems that, Part of Speech(18) has impact on accuracy as mention in the paper but, it seems that Dependency features (12) has less impact on accuracy.


**3. Do another comparison study testing the embedding**

First Trying with Golve Model

In [98]:
from gensim.test.utils import datapath
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

In [99]:
train_set, dev_set, test_set = load_data()
# len(train_set), len(dev_set), len(test_set)
print('2. Building parser....')
start = time.time()
parser = Parser(train_set)
print("took {:.2f} seconds".format(time.time()-start))

train_set = parser.numericalize(train_set)
dev_set = parser.numericalize(dev_set)
test_set = parser.numericalize(test_set)

1. Loading data
2. Building parser....
took 0.05 seconds


In [100]:
print("4. Loading pretrained embeddings...",)

start = time.time()
glove_file = datapath('D:\\Machine Learning\\Natural-Language-Processing-2023\\Coding Assignment\\Dependency-Parsing\\glove.6B.50d.txt')
modelgesim = KeyedVectors.load_word2vec_format(glove_file, binary=False, no_header=True)
embeddings_matrix = np.asarray(np.random.normal(0, 0.9, (parser.n_tokens, 50)), dtype='float32')

for token in parser.tok2id:
        i = parser.tok2id[token]
        if token in modelgesim:
            embeddings_matrix[i] = modelgesim[token]
        elif token.lower() in modelgesim:
            embeddings_matrix[i] = modelgesim[token.lower()]
print("Embedding matrix shape (vocab, emb size): ", embeddings_matrix.shape)
print("took {:.2f} seconds".format(time.time() - start))

4. Loading pretrained embeddings...
Embedding matrix shape (vocab, emb size):  (5157, 50)
took 21.99 seconds


In [101]:
output_dir = "output/{:%Y%m%d_%H%M%S}/".format(datetime.now())
output_path = output_dir + "model.weights"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
    

print("TRAINING")
model = ParserModel(embeddings_matrix)
parser.model = model

start = time.time()

train(parser, train_examples, dev_set, output_path,
      batch_size=1024, n_epochs=10, lr=0.0005)

TRAINING
Epoch 1 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.17it/s]


Average Train Loss: 0.6760396057118973
Evaluating on dev set


125250it [00:00, 6958745.53it/s]       


- dev UAS: 48.43
New best dev UAS! Saving model.

Epoch 2 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.42it/s]


Average Train Loss: 0.37189964825908345
Evaluating on dev set


125250it [00:00, 8945705.85it/s]       


- dev UAS: 60.56
New best dev UAS! Saving model.

Epoch 3 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.64it/s]


Average Train Loss: 0.29876033154626686
Evaluating on dev set


125250it [00:00, 5444748.68it/s]       


- dev UAS: 64.57
New best dev UAS! Saving model.

Epoch 4 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.73it/s]


Average Train Loss: 0.25846554804593325
Evaluating on dev set


125250it [00:00, 6261908.79it/s]       


- dev UAS: 66.13
New best dev UAS! Saving model.

Epoch 5 out of 10


100%|██████████| 48/48 [00:03<00:00, 14.18it/s]


Average Train Loss: 0.23187416481475034
Evaluating on dev set


125250it [00:00, 7366218.10it/s]       


- dev UAS: 65.92

Epoch 6 out of 10


100%|██████████| 48/48 [00:03<00:00, 13.42it/s]


Average Train Loss: 0.2145329061895609
Evaluating on dev set


125250it [00:00, 7366734.57it/s]       


- dev UAS: 69.00
New best dev UAS! Saving model.

Epoch 7 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.74it/s]


Average Train Loss: 0.19245785847306252
Evaluating on dev set


125250it [00:00, 5965598.57it/s]       


- dev UAS: 71.03
New best dev UAS! Saving model.

Epoch 8 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.95it/s]


Average Train Loss: 0.1759161086132129
Evaluating on dev set


125250it [00:00, 5693285.97it/s]       


- dev UAS: 71.85
New best dev UAS! Saving model.

Epoch 9 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.50it/s]


Average Train Loss: 0.1586864556496342
Evaluating on dev set


125250it [00:00, 8351003.48it/s]       


- dev UAS: 70.10

Epoch 10 out of 10


100%|██████████| 48/48 [00:03<00:00, 15.63it/s]


Average Train Loss: 0.15070811100304127
Evaluating on dev set


125250it [00:00, 5218143.29it/s]       

- dev UAS: 72.49
New best dev UAS! Saving model.






In [102]:
print(80 * "=")
print("TESTING")
print(80 * "=")

print("Restoring the best model weights found on the dev set")
parser.model.load_state_dict(torch.load(output_path))
print("Final evaluation on test set",)
parser.model.eval()
UAS, dependencies = parser.parse(test_set)
print("- test UAS: {:.2f}".format(UAS * 100.0))
print("Done!")

TESTING
Restoring the best model weights found on the dev set
Final evaluation on test set


125250it [00:00, 8949210.86it/s]       

- test UAS: 74.13
Done!



