# **NLU First Assignment** 
*   **Zihadul Azam**
*   Id: 221747
*   zihadul.azam@studenti.unitn.it


### **Requirements**


*   spaCy: run *pip install spacy*, then download an English model with *python -m spacy download en_core_web_sm*

In [70]:
example_type_1 = "I saw the man with a telescope." 
example_type_2 = "I saw a man on a hill with a telescope." 
example = example_type_1

import spacy

# The 'en' shortcut was removed in spaCy v3; load the model by its full name
nlp = spacy.load('en_core_web_sm')

### Visualize dependency graph with *displacy*
(For testing only)

In [71]:
from spacy import displacy
doc = nlp(example)
displacy.render(doc, style="dep", jupyter=True, options={'distance': 90})

## **Function 1:** Extract a path of dependency relations from the ROOT to a token
##### **Input:** a sentence
##### **Output:** for each token, the path as a list of dependency relations, with ROOT as the first element

In [72]:
def extract_token_path(token):
  """
  Build the path from the ROOT down to `token` as 'dep(text)' strings.
  """
  path = ["{}({})".format(token.dep_, token.text)]
  # token.ancestors yields the closest head first, so prepending each
  # ancestor leaves the ROOT at position 0
  for ances_token in token.ancestors:
    path.insert(0, "{}({})".format(ances_token.dep_, ances_token.text))
  return path

def extract_dependency_path(sentence):
  """
  Extract the dependency path from the ROOT to each token.
  Returns a dict keyed by token index.
  """
  doc = nlp(sentence)
  paths = {}

  for sent in doc.sents:
    for token in sent:
      paths[token.i] = {'token': token.text, 'path': extract_token_path(token)}
  return paths

res = extract_dependency_path(example)

for key in res:
  print(res[key])

{'token': 'I', 'path': ['ROOT(saw)', 'nsubj(I)']}
{'token': 'saw', 'path': ['ROOT(saw)']}
{'token': 'the', 'path': ['ROOT(saw)', 'dobj(man)', 'det(the)']}
{'token': 'man', 'path': ['ROOT(saw)', 'dobj(man)']}
{'token': 'with', 'path': ['ROOT(saw)', 'dobj(man)', 'prep(with)']}
{'token': 'a', 'path': ['ROOT(saw)', 'dobj(man)', 'prep(with)', 'pobj(telescope)', 'det(a)']}
{'token': 'telescope', 'path': ['ROOT(saw)', 'dobj(man)', 'prep(with)', 'pobj(telescope)']}
{'token': '.', 'path': ['ROOT(saw)', 'punct(.)']}
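
The same walk can be reproduced without loading a model. The sketch below hard-codes the parse shown in the output above as plain lists; `WORDS`, `HEADS`, `DEPS` and `path_from_root` are illustrative names, not spaCy attributes. It follows head links from a token up to the root and then reverses the result.

```python
# Hand-coded parse of "I saw the man with a telescope.", copied from the
# output above. These lists are an assumption for illustration only.
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # index of each token's head; the root points to itself
DEPS = ["nsubj", "ROOT", "det", "dobj", "prep", "det", "pobj", "punct"]

def path_from_root(i):
    """Walk head links from token i up to the root, then reverse the path."""
    path = []
    while True:
        path.append("{}({})".format(DEPS[i], WORDS[i]))
        if HEADS[i] == i:  # reached the root
            break
        i = HEADS[i]
    return path[::-1]

print(path_from_root(5))
```

Appending and reversing once avoids the repeated `insert(0, ...)` calls, which matters only for very deep trees.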


## **Function 2:** Extract the subtree of dependents of each token
##### **Input:** a sentence
##### **Output:** for each token in the Doc, the subtree of its dependents as a list (ordered w.r.t. sentence order)

In [73]:
def get_subtree_of_a_token(token):
  # token.subtree includes the token itself, so it is filtered out
  # to keep only the dependents
  return [sub_token.text for sub_token in token.subtree if sub_token is not token]

def extract_subtree(sentence):
  doc = nlp(sentence)
  result = {}

  for sent in doc.sents:
    for token in sent:
      result[token.i] = {'token': token.text, 'subtree': get_subtree_of_a_token(token)}
  return result

res = extract_subtree(example)
for key in res:
  print(res[key])


{'token': 'I', 'subtree': []}
{'token': 'saw', 'subtree': ['I', 'the', 'man', 'with', 'a', 'telescope', '.']}
{'token': 'the', 'subtree': []}
{'token': 'man', 'subtree': ['the', 'with', 'a', 'telescope']}
{'token': 'with', 'subtree': ['a', 'telescope']}
{'token': 'a', 'subtree': []}
{'token': 'telescope', 'subtree': ['a']}
{'token': '.', 'subtree': []}
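
As a rough illustration of what the subtree traversal does, the sketch below collects all descendants of a token from a hand-coded list of head indices that mirrors the parse above. `WORDS`, `HEADS` and `dependents` are hypothetical names introduced here; no model is loaded.

```python
# Hand-coded parse of "I saw the man with a telescope." (assumption:
# copied from the output above, not produced by running a parser here).
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # the root points to itself

def dependents(i):
    """Return the words of all descendants of token i, in sentence order."""
    found = []
    stack = [i]
    while stack:
        node = stack.pop()
        for child, head in enumerate(HEADS):
            # skip the root's self-loop so it is not counted as its own child
            if head == node and child != node:
                found.append(child)
                stack.append(child)
    return [WORDS[j] for j in sorted(found)]

print(dependents(3))
```

Sorting the collected indices restores sentence order regardless of the order in which the tree was explored.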


## **Function 3:** Check if a given list of tokens (segment of a sentence) forms a subtree
##### **Input:** a sentence | ordered list of words from a sentence
##### **Output:** True/False based on the sequence forming a subtree or not

In [74]:
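def is_from_subtree(sentence, tokens=None):
  """
  Return True if `tokens` equals the list of dependents of some token
  in `sentence` (the head token itself is excluded, as in Function 2).
  """
  tokens = list(tokens or [])  # avoid a mutable default argument
  trees = extract_subtree(sentence)
  for key in trees:
    if trees[key]['subtree'] == tokens:
      return True
  return False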

print("Is <a telescope> from subtree?: ", is_from_subtree(example, ['a', 'telescope']))

Is <a telescope> from subtree?:  True
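
The check above treats a segment as a subtree when it equals the dependents of some token, with the head excluded. Under a stricter reading, a segment forms a subtree only when it consists of a token together with all of its descendants. The sketch below shows that variant on a hand-coded copy of the parse; `WORDS`, `HEADS`, `full_subtree` and `forms_full_subtree` are hypothetical names, and this is one possible interpretation rather than the assignment's required one.

```python
# Hand-coded parse of "I saw the man with a telescope." (assumption:
# copied from the earlier output, not produced by running a parser here).
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # the root points to itself

def full_subtree(i):
    """Indices of token i and all of its descendants, in sentence order."""
    nodes = [i]
    for child, head in enumerate(HEADS):
        if head == i and child != i:
            nodes.extend(full_subtree(child))
    return sorted(nodes)

def forms_full_subtree(words):
    """True if `words` is exactly some token plus all of its descendants."""
    return any(
        [WORDS[j] for j in full_subtree(i)] == words
        for i in range(len(WORDS))
    )

print(forms_full_subtree(["a", "telescope"]))
print(forms_full_subtree(["with", "a", "telescope"]))
print(forms_full_subtree(["the", "man"]))
```

With this definition `['with', 'a', 'telescope']` is accepted because it is the complete subtree rooted at `with`, while `['the', 'man']` is rejected because the subtree rooted at `man` also contains the prepositional phrase.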


## **Function 4:** Identify head of a span, given its tokens 
##### **Input:** a sequence of words (not necessarily a sentence)
##### **Output:** the head of the span (single word)

In [75]:
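def get_head_of_span(span):
  """
  Parse `span` on its own and return the root token of its first sentence.
  """
  doc = nlp(span)
  for sent in doc.sents:
    for token in sent:
      # the root is the only token that is its own head
      if token.head == token:
        return token
  return None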

span = 'a man with a telescope'
print('Head of span <', span, '> is: ', get_head_of_span(span).text)

Head of span < a man with a telescope > is:  man
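
The function above parses the span in isolation. When the span is a slice of a sentence that has already been parsed, the head can instead be read off the existing tree: it is the token whose own head lies outside the span. spaCy exposes a similar notion as `Span.root`. The sketch below illustrates the idea on a hand-coded copy of the earlier parse; `WORDS`, `HEADS` and `head_of_span` are hypothetical names.

```python
# Hand-coded parse of "I saw the man with a telescope." (assumption:
# copied from the earlier output, not produced by running a parser here).
WORDS = ["I", "saw", "the", "man", "with", "a", "telescope", "."]
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # the root points to itself

def head_of_span(start, end):
    """Return the first word in WORDS[start:end] whose head is outside the span."""
    for i in range(start, end):
        is_root = HEADS[i] == i
        head_outside = not (start <= HEADS[i] < end)
        if is_root or head_outside:
            return WORDS[i]
    return None  # empty span

# "the man with a telescope" covers token indices 2..6
print(head_of_span(2, 7))
```

Reusing the sentence parse keeps the head consistent with the attachment decisions made in context, which a standalone parse of the fragment does not guarantee.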


## **Function 5:** Extract sentence subject, direct object and indirect object spans 
##### **Input:** a sequence of words (a sentence)
##### **Output:** a dict of lists: for each of subject, direct object and indirect object, the words that form its span (not just the head word); a list is empty if that element is absent

In [79]:
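def extract_S_DO_IO(sentence):
  """
  Extract the subject, direct object and indirect object spans.
  Each span is the full subtree of the matching token, in sentence order.
  """
  doc = nlp(sentence)
  subject_tag = 'nsubj'
  direct_obj_tag = 'dobj'
  indirect_obj_tag = 'iobj'
  # spaCy's English models label indirect objects 'dative' rather than 'iobj'
  indirect_obj_labels = (indirect_obj_tag, 'dative')
  result = {
      subject_tag: [],
      direct_obj_tag: [],
      indirect_obj_tag: [],
  }

  for sent in doc.sents:
    for token in sent:
      # the span is the token plus all of its dependents
      span = [sub_token.text for sub_token in token.subtree]
      if token.dep_ == subject_tag:
        result[subject_tag].extend(span)
      elif token.dep_ == direct_obj_tag:
        result[direct_obj_tag].extend(span)
      elif token.dep_ in indirect_obj_labels:
        result[indirect_obj_tag].extend(span)
  return result

res = extract_S_DO_IO(example)

print('Subject, direct object, and indirect object of the sentence: <', example,'>')
for key in res:
  print(key, ':\t', res[key])


Subject, direct object, and indirect object of the sentence: < I saw the man with a telescope. >
nsubj :	 ['I']
dobj :	 ['the', 'man', 'with', 'a', 'telescope']
iobj :	 []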
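
The example sentence has no indirect object, so that list stays empty. To show a non-empty case, the sketch below uses a hand-written parse of "She gave him a book." in which `him` carries the `dative` label. The parse, the label choice, and the names `WORDS`, `HEADS`, `DEPS`, `ROLES`, `subtree` and `extract_spans` are all assumptions made for illustration; nothing here was produced by running a parser.

```python
# Hand-written parse of "She gave him a book." (assumption, for illustration).
WORDS = ["She", "gave", "him", "a", "book", "."]
HEADS = [1, 1, 1, 4, 1, 1]  # the root points to itself
DEPS = ["nsubj", "ROOT", "dative", "det", "dobj", "punct"]

# Map dependency labels to the roles of interest; both 'iobj' and 'dative'
# are treated as indirect-object labels.
ROLES = {
    "nsubj": "subject",
    "dobj": "direct object",
    "iobj": "indirect object",
    "dative": "indirect object",
}

def subtree(i):
    """Indices of token i and all of its descendants, in sentence order."""
    nodes = [i]
    for child, head in enumerate(HEADS):
        if head == i and child != i:
            nodes.extend(subtree(child))
    return sorted(nodes)

def extract_spans():
    """Collect the full span (token plus dependents) for each role."""
    spans = {"subject": [], "direct object": [], "indirect object": []}
    for i, dep in enumerate(DEPS):
        role = ROLES.get(dep)
        if role is not None:
            spans[role].extend(WORDS[j] for j in subtree(i))
    return spans

for role, words in extract_spans().items():
    print(role, ":", words)
```

Expanding each matched token to its subtree is what turns a single head word such as `book` into the span `a book`.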
