# Assignment: Working with Dependency Graphs (Parses)

The objective of the assignment is to learn how to work with dependency graphs by defining functions.

Read the [spaCy documentation on the dependency parser](https://spacy.io/api/dependencyparser) to learn which methods it provides.

Define functions to:
- extract the path of dependency relations from the ROOT to a token
- extract the subtree of dependents of a given token
- check if a given list of tokens (segment of a sentence) forms a subtree
- identify the head of a span, given its tokens
- extract the subject, direct object, and indirect object spans of a sentence

In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")

In [10]:
example = 'I saw the man with a telescope.'
doc = nlp(example)

for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          list(token.children), sep="\t\t")

I		nsubj		saw		VERB		[]
saw		ROOT		saw		VERB		[I, man, with, .]
the		det		man		NOUN		[]
man		dobj		saw		VERB		[the]
with		prep		saw		VERB		[telescope]
a		det		telescope		NOUN		[]
telescope		pobj		with		ADP		[a]
.		punct		saw		VERB		[]


## Display dependencies

In [11]:
spacy.displacy.render(doc, style='dep')

## Extract a path of dependency relations from the ROOT to a token

In [12]:
def token_path_to_root(token):
    """Return the dependency labels on the path from ROOT down to token."""
    path = []
    current = token

    # climb from the token to its head until the root is reached
    while current.dep_ != 'ROOT':
        # prepend the label so the path reads from ROOT to the token
        path.insert(0, current.dep_)
        current = current.head

    # add the root
    path.insert(0, 'ROOT')
    return path

def sent_paths_to_root(sent):
    doc = nlp(sent)
    # map each token's text to its path from ROOT
    paths = {token.text: token_path_to_root(token) for token in doc}
    return paths

sent_paths_to_root(example)

{'I': ['ROOT', 'nsubj'],
 'saw': ['ROOT'],
 'the': ['ROOT', 'dobj', 'det'],
 'man': ['ROOT', 'dobj'],
 'with': ['ROOT', 'prep'],
 'a': ['ROOT', 'prep', 'pobj', 'det'],
 'telescope': ['ROOT', 'prep', 'pobj'],
 '.': ['ROOT', 'punct']}
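
The dictionary above is keyed by `token.text`, so a sentence that repeats a word keeps only the path of its last occurrence. A minimal library-free sketch of the same head-following walk, keyed by token position instead, might look like this. The `WORDS`, `HEADS`, and `DEPS` lists are hand-copied from the parse printed earlier, and `path_from_root` is a hypothetical helper name, not a spaCy function.

```python
# Parse of "I saw the man with a telescope.", copied by hand from the
# table printed earlier; HEADS[i] is the index of token i's head.
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 1, 6, 4, 1]
DEPS = ["nsubj", "ROOT", "det", "dobj", "prep", "det", "pobj", "punct"]


def path_from_root(i, heads, deps):
    """Return the dependency labels from the root down to token i."""
    path = []
    # The root is the only token that is its own head.
    while heads[i] != i:
        path.append(deps[i])
        i = heads[i]
    path.append(deps[i])  # label of the root itself
    # Labels were collected bottom-up, so reverse them.
    return path[::-1]


# Keyed by position, so repeated words cannot collide.
paths = {i: path_from_root(i, HEADS, DEPS) for i in range(len(WORDS))}
print(paths[5])  # path for "a": ['ROOT', 'prep', 'pobj', 'det']
```

Appending and reversing once at the end also avoids the repeated `insert(0, ...)` calls, which shift the whole list each time.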

## Extract subtree of dependents given a token

In [13]:
def extract_subtrees(sent):
    # for each token in the sentence get its subtree and convert it to a list
    return {token.text: list(token.subtree) for token in nlp(sent)}

extract_subtrees(example)

{'I': [I],
 'saw': [I, saw, the, man, with, a, telescope, .],
 'the': [the],
 'man': [the, man],
 'with': [with, a, telescope],
 'a': [a],
 'telescope': [a, telescope],
 '.': [.]}
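
To make explicit what a subtree contains, here is a rough library-free sketch that collects a token and all of its descendants from a list of head indices. It assumes the parse is stored as the hand-copied `WORDS` and `HEADS` lists below; it is an illustration only, not how spaCy implements `token.subtree`, and `subtree_indices` is a hypothetical name.

```python
# Hand-copied from the example parse; HEADS[i] is the head index of token i.
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 1, 6, 4, 1]


def subtree_indices(i, heads):
    """Return the sorted indices of token i and all of its descendants."""
    # Invert the head list into a head -> children map.
    children = {}
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-loop
            children.setdefault(head, []).append(child)

    # Depth-first traversal starting at token i.
    found, stack = [], [i]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return sorted(found)


print([WORDS[j] for j in subtree_indices(4, HEADS)])  # ['with', 'a', 'telescope']
```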

## Check if a given list of tokens (segment of a sentence) forms a subtree

In [14]:
def is_subtree(sent, words):
    doc = nlp(sent)
    
    # find the parsed tokens whose text matches one of the given words;
    # iterating over doc keeps them in sentence order
    tokens = [tk for tk in doc if tk.text in words]
    
    for tk in tokens:
        # the words form a subtree if they match the subtree of one of the tokens;
        # sorting puts the subtree in sentence order for the comparison
        if sorted(tk.subtree) == tokens:
            return True
    return False

print(is_subtree(example, ['telescope', 'a', 'with']))
print(is_subtree(example, ['telescope', 'the']))

True
False
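
Because `is_subtree` matches tokens by their text, a word that occurs twice in the sentence selects both occurrences. One possible alternative, sketched here without spaCy, takes token positions and uses a different test from the one above: a set of positions is a complete subtree when exactly one of its tokens has a head outside the set and no outside token attaches to a token inside it. `HEADS` is hand-copied from the example parse, and `forms_subtree` is a hypothetical name.

```python
# Hand-copied from the example parse; HEADS[i] is the head index of token i.
HEADS = [1, 1, 3, 1, 1, 6, 4, 1]


def forms_subtree(indices, heads):
    """Check whether the given token positions make up one complete subtree."""
    inside = set(indices)
    # Tokens whose head lies outside the set (or that are the sentence root).
    tops = [i for i in inside if heads[i] == i or heads[i] not in inside]
    # Outside tokens that attach to a token inside the set.
    leaks = [i for i, h in enumerate(heads) if i not in inside and h in inside]
    return len(tops) == 1 and not leaks


print(forms_subtree([6, 5, 4], HEADS))  # "telescope", "a", "with" -> True
print(forms_subtree([6, 2], HEADS))     # "telescope", "the" -> False
```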


## Identify head of a span, given its tokens

In [15]:
def head_of_span(words):
    # join the words into a string and parse it as a sentence of its own;
    # the input is expected to yield a single sentence, so next() takes
    # the first one and .root returns its head token
    return next(nlp(' '.join(words)).sents).root

head_of_span(['I', 'really', 'love', 'pizza'])

love
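
`head_of_span` parses the words as a new sentence, so the result may differ from the analysis the same words receive inside a longer sentence. A hedged alternative is to look for the head within an existing parse: the token of the span whose own head lies outside the span. The sketch below does this on hand-copied `WORDS` and `HEADS` lists from the example parse; `span_heads` is a hypothetical helper. spaCy's `Span.root` attribute serves a similar purpose for a slice such as `doc[4:7]`.

```python
# Hand-copied from the example parse; HEADS[i] is the head index of token i.
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 1, 6, 4, 1]


def span_heads(indices, heads):
    """Return the positions in the span whose head is outside the span."""
    inside = set(indices)
    return [i for i in sorted(inside)
            if heads[i] == i or heads[i] not in inside]


print([WORDS[i] for i in span_heads([4, 5, 6], HEADS)])  # ['with']
print([WORDS[i] for i in span_heads([2, 3, 4], HEADS)])  # ['man', 'with']
```

A single result means the span has one head. Several results, as for "the man with", mean the words do not hang together under one token in this parse.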

## Extract sentence subject, direct object and indirect object spans

In [16]:
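def extract_deps_span(sent):
    doc = nlp(sent)
    # map parser labels to output keys; spaCy's English models label
    # indirect objects 'dative', so both labels are collected under 'iobj'
    labels = {'nsubj': 'nsubj', 'dobj': 'dobj', 'iobj': 'iobj', 'dative': 'iobj'}

    # start with an empty span for each dependency
    spans = {k: [] for k in ('nsubj', 'dobj', 'iobj')}

    for token in doc:
        key = labels.get(token.dep_)
        if key is not None:
            # store the token's subtree as a list; if a relation occurs
            # more than once, the last occurrence is kept
            spans[key] = list(token.subtree)

    return spans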

extract_deps_span(example)

{'nsubj': [I], 'dobj': [the, man], 'iobj': []}