# Title

## Introduction

Let's make an API for dependency trees.

Dependency grammars describe relationships between the lexical units of a sentence.  *Unit* here means a signifier.  For many languages the units are *words*, which I take to be mostly a phonological concept.  For agglutinative languages such as Turkish, the phonological word is too large a unit, and we would want to work with *morphemes* instead.  To cover both cases we simply use the term *form*.  Forms have meaning.

Dependencies are binary asymmetric relations between pairs of forms: a *governor* or *head* and a *dependent*.  What exactly are these relations?  There are many varieties of dependency grammar, taking different sets of relations as the primitives of the grammar.  Nivre summarizes the possibilities, for a governor H and a dependent D in a construction C:

1. H determines the syntactic category of C and can often replace C.
2. H determines the semantic category of C; D gives semantic specification.
3. H is obligatory; D may be optional.
4. H selects D and determines whether D is obligatory or optional.
5. The form of D depends on H (agreement or government).
6. The linear position of D is specified with reference to H.

The simplest approach is to leave the dependency type unstated, which gives *untyped dependencies*.  In this case the analysis just yields a graph of binary relations between forms.

Typed dependencies are triples `(head, dependent, dep-relation)`.  [TODO Universal Dependencies]
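
To make the contrast concrete, here is a minimal sketch of the two representations as plain Python data.  The name `TypedDep` is hypothetical and used only for illustration; it is not part of the API built below.  The two dependencies come from the example sentence parsed later in this notebook.

```python
from collections import namedtuple

# Hypothetical helper for illustration only; the notebook's own classes differ.
TypedDep = namedtuple('TypedDep', ['head', 'dependent', 'relation'])

# Typed analysis of "My parents decided": one triple per dependency.
typed = [
    TypedDep('parents', 'My', 'nmod:poss'),
    TypedDep('decided', 'parents', 'nsubj'),
]

# Dropping the label leaves the untyped analysis: a set of directed edges.
untyped = {(d.head, d.dependent) for d in typed}

# The asymmetry of the relation shows up as edge direction.
print(('decided', 'parents') in untyped)
print(('parents', 'decided') in untyped)
```

The first check prints `True` and the second `False`, since an edge runs from head to dependent only.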

## Relations

In [17]:
class Dependency:
    def __init__(self, head, dep, relation):
        self.head = head
        self.dep = dep
        self.relation = relation
        
    def __str__(self):
        return '{0}({1}, {2})'.format(self.relation.name, self.head, self.dep)
    
    __repr__ = __str__

In [40]:
class Relation:
    def __init__(self, name):
        self.name = name

    def __call__(self, head, dep):
        # Attach dep beneath head in place.  Deep-copying a Form yields a
        # plain Element, which loses the Form methods needed later.
        dep.set('relation', self.name)
        head.append(dep)
        return Dependency(head, dep, self)
               
    def __str__(self):
        return self.name
    
    __repr__ = __str__

## Grammar

In [41]:
class Grammar:
    def __init__(self):
        self.relations = {}
        
    def add_relation(self, name):
        relation = Relation(name)
        self.relations[name] = relation
        return relation
    
    def __getitem__(self, name):
        return self.relations[name]
    
    def to_tree(self, sentence):
        # One Form per word, keyed by the word's (string) index.
        forms = {}
        for word in sentence.words:
            forms[word.index] = Form.to_form(word)

        root = None
        for word in sentence.words:
            dep = forms[word.index]
            if word.dependency_relation == 'root':
                # The root has no governor; remember it for the Tree.
                root = dep
            else:
                gov = forms[str(word.governor)]
                relation = self[word.dependency_relation]
                # Applying the relation attaches dep beneath gov.
                relation(gov, dep)

        if root is None:
            raise ValueError('sentence has no root word')

        return Tree(self, root)

In [42]:
UDGrammar = Grammar()

nsubj = UDGrammar.add_relation('nsubj')
nsubj_pass = UDGrammar.add_relation('nsubj:pass')
obj = UDGrammar.add_relation('obj')
iobj = UDGrammar.add_relation('iobj')
csubj = UDGrammar.add_relation('csubj')
ccomp = UDGrammar.add_relation('ccomp')
xcomp = UDGrammar.add_relation('xcomp')
obl = UDGrammar.add_relation('obl')
vocative = UDGrammar.add_relation('vocative')
expl = UDGrammar.add_relation('expl')
dislocated = UDGrammar.add_relation('dislocated')
advcl = UDGrammar.add_relation('advcl')
advmod = UDGrammar.add_relation('advmod')
discourse = UDGrammar.add_relation('discourse')
aux = UDGrammar.add_relation('aux')
aux_pass = UDGrammar.add_relation('aux:pass')
cop = UDGrammar.add_relation('cop')
mark = UDGrammar.add_relation('mark')
nmod = UDGrammar.add_relation('nmod')
nmod_poss = UDGrammar.add_relation('nmod:poss')
appos = UDGrammar.add_relation('appos')
nummod = UDGrammar.add_relation('nummod')
acl = UDGrammar.add_relation('acl')
amod = UDGrammar.add_relation('amod')
det = UDGrammar.add_relation('det')
clf = UDGrammar.add_relation('clf')
case = UDGrammar.add_relation('case')
conj = UDGrammar.add_relation('conj')
cc = UDGrammar.add_relation('cc')
fixed = UDGrammar.add_relation('fixed')
flat = UDGrammar.add_relation('flat')
compound = UDGrammar.add_relation('compound')
list_ = UDGrammar.add_relation('list')  # trailing underscore avoids shadowing the built-in list
parataxis = UDGrammar.add_relation('parataxis')
orphan = UDGrammar.add_relation('orphan')
goeswith = UDGrammar.add_relation('goeswith')
reparandum = UDGrammar.add_relation('reparandum')
punct = UDGrammar.add_relation('punct')
root = UDGrammar.add_relation('root')
dep = UDGrammar.add_relation('dep')

In [43]:
t = UDGrammar.to_tree(doc.sentences[0])


## Forms

A form is fundamentally just a string.  But in a grammatical analysis we'll also want morphological information tied to the form: POS and other relevant morphosyntactic features.  So instead we'll derive from Python's XML `Element` class.
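
As a rough sketch of why `Element` is a convenient base, the snippet below uses a bare `Element` (not the `Form` class defined next) to show the three things we rely on: the tag holds the string, the attributes hold the features, and child elements can hold dependents.  The feature values are taken from the example sentence used later.

```python
from xml.etree.ElementTree import Element

# The tag holds the surface string; attributes hold the features.
form = Element('parents', attrib={'id': '2'})
form.set('pos', 'NNS')
form.set('Number', 'Plur')

# A dependent can be nested as a child element.
dependent = Element('My', attrib={'id': '1', 'pos': 'PRP$'})
form.append(dependent)

print(form.tag, form.get('pos'), len(form))
print(form.get('Tense'))  # features that were never set come back as None
```

Note that `get` returns `None` for a missing feature rather than raising, so code that concatenates feature values into strings has to allow for that.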

In [21]:
from xml.etree.ElementTree import ElementTree

In [22]:
from xml.etree.ElementTree import Element
from xml.etree.ElementTree import dump



class Form(Element):
    def __init__(self, form, id = '0'):
        Element.__init__(self, form, attrib={'id' : id})
        
    def get_feature(self, feature):
        return self.get(feature)
    
    def set_feature(self, feature, value):
        self.set(feature, value)
        return self
        
    def __truediv__(self, pos_tag):
        self.set_feature('pos', pos_tag)
        return self
    
    def get_dependencies(self, relation):
        deps = self.findall('*[@relation="{}"]'.format(relation.name))
        return [Dependency(self, dep, relation) for dep in deps]
    
    def __str__(self):
        return self.tag + '_' + self('id') + '/' + self('pos')
    
    def full_str(self):
        return '{0}_{1} [{2}]'.format(self.tag, 
                                        self('id'), 
                                        ','.join([feature + '=' + self(feature) for feature in self.attrib.keys()
                                                 if feature != 'id' and feature != 'relation']))
    
    def __call__(self, feature):
        # A Relation argument selects dependencies; anything else is a feature name.
        return self.get_dependencies(feature) if isinstance(feature, Relation) else self.get_feature(feature)
    
    @staticmethod
    def to_form(word):
        form = Form(word.text, id = word.index)
        form.set_feature('lemma', word.lemma)
        form.set_feature('pos', word.pos)
        form.set_feature('upos', word.upos)
        form.set_feature('xpos', word.xpos)

        if word.feats != '_':
            for f in word.feats.split('|'):
                feature = f.split('=')
                form.set_feature(feature[0], feature[1])

        return form

## Trees

In [10]:
class Tree(ElementTree):
    def __init__(self, grammar, root):
        self.grammar = grammar
        root.set('relation', 'root')
        
        ElementTree.__init__(self, root)
        
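    def _get_deps(self, node, deps):
        # Post-order walk: a child's own dependencies come before the edge to it.
        # Iterate over the node directly; getchildren() was removed in Python 3.9.
        for child_node in node:
            deps = self._get_deps(child_node, deps)
            relation = self.grammar[child_node('relation')]
            # One Dependency per child, so siblings sharing a relation are not duplicated.
            deps.append(Dependency(node, child_node, relation))
        return deps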
        
    def get_dependencies(self):
        deps = self._get_deps(self.getroot(), [])
        deps.append(Dependency(None, self.getroot(), self.grammar['root']))
        return deps
    
    def _print_treelet(self, node, indent, all_features):
        edge = '└─ ' if indent > 0 else ''
        node_str = node.full_str() if all_features else str(node)
        print(' ' * indent + edge + node('relation') + ' | ' + node_str)
        
        # Iterate over the node directly; getchildren() was removed in Python 3.9.
        for child_node in node:
            self._print_treelet(child_node, indent + 4, all_features)
    
    def print_tree(self, all_features = True):
        self._print_treelet(self.getroot(), indent = 0, all_features = all_features) 
        

In [13]:
import stanfordnlp

In [14]:
nlp = stanfordnlp.Pipeline()

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings: 
{'model_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings: 
{'model_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/jds/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 

In [23]:
doc = nlp('My parents decided to travel to Romania.')

In [244]:
n = t.find('.//*[@relation="case"]/..')
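
The query above asks for the governor of a `case` dependent.  Here is a small self-contained sketch of the same path expression on a hand-built tree of plain `Element` objects (not the parsed `Tree`), assuming the same `relation` attribute convention.

```python
from xml.etree.ElementTree import Element, SubElement

# Toy tree mirroring part of the example sentence; built by hand, not parsed.
root = Element('decided', relation='root')
travel = SubElement(root, 'travel', relation='xcomp')
romania = SubElement(travel, 'Romania', relation='obl')
SubElement(romania, 'to', relation='case')

# './/*' searches all descendants, the predicate keeps those marked
# relation="case", and '/..' steps up to the governor of that node.
governor = root.find('.//*[@relation="case"]/..')
print(governor.tag)
```

This prints `Romania`.  The leading `.` makes the path relative to the element being searched, which is the form ElementTree expects.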

In [245]:
t.get_dependencies()

[nmod:poss(parents_2/NNS, My_1/PRP$),
 nsubj(decided_3/VBD, parents_2/NNS),
 mark(travel_5/VB, to_4/TO),
 case(Romania_7/NNP, to_6/IN),
 obl(travel_5/VB, Romania_7/NNP),
 xcomp(decided_3/VBD, travel_5/VB),
 punct(decided_3/VBD, ._8/.),
 root(None, decided_3/VBD)]

In [246]:
t.print_tree(all_features=True)

root | decided_3 [lemma=decide,pos=VBD,upos=VERB,xpos=VBD,Mood=Ind,Tense=Past,VerbForm=Fin]
    └─ nsubj | parents_2 [lemma=parent,pos=NNS,upos=NOUN,xpos=NNS,Number=Plur]
        └─ nmod:poss | My_1 [lemma=my,pos=PRP$,upos=PRON,xpos=PRP$,Number=Sing,Person=1,Poss=Yes,PronType=Prs]
    └─ xcomp | travel_5 [lemma=travel,pos=VB,upos=VERB,xpos=VB,VerbForm=Inf]
        └─ mark | to_4 [lemma=to,pos=TO,upos=PART,xpos=TO]
        └─ obl | Romania_7 [lemma=Romania,pos=NNP,upos=PROPN,xpos=NNP,Number=Sing]
            └─ case | to_6 [lemma=to,pos=IN,upos=ADP,xpos=IN]
    └─ punct | ._8 [lemma=.,pos=.,upos=PUNCT,xpos=.]


