# Chapter 2 - spaCy Dependencies and Named Entities

## Instructions

- Run the cells containing `assert` statements to check your answers. If a cell runs without error, your answer matches the expected output. If it doesn't match, the assertion message gives you a hint.

In [1]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

_If you're getting an error about being unable to find_ `en_core_web_sm`_, uncomment and run this line first:_

In [2]:
#!python3 -m spacy download en_core_web_sm

## Exercise 1 - Dependencies

Below we have provided a sentence called `sent`. Use the provided language model, saved as `nlp`, to parse this sentence with spaCy and store the result as `doc`. Then create a Python list called `dependencies` that contains the dependency label of every token in `doc`, as a string.

In [3]:
sent = "Today we are quietly playing soothing jazz music."
dependencies = None

### BEGIN SOLUTION
doc = nlp(sent)
dependencies = [token.dep_ for token in doc]
### END SOLUTION

print(dependencies)

['npadvmod', 'nsubj', 'aux', 'advmod', 'ROOT', 'amod', 'compound', 'dobj', 'punct']
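
The labels printed above are short codes. As a minimal sketch, `spacy.explain` can look up a brief description for each one. The `labels` list below is copied from the output above, and the `"(no description)"` fallback is an assumption added for any label the glossary does not cover.

```python
import spacy

# Labels copied from the printed output of the solution cell.
labels = ['npadvmod', 'nsubj', 'aux', 'advmod', 'ROOT', 'amod', 'compound', 'dobj', 'punct']

# spacy.explain returns a short description, or None for a label it does not know.
descriptions = {label: spacy.explain(label) or "(no description)" for label in labels}

for label, description in descriptions.items():
    print(f"{label:10} {description}")
```

This lookup only reads spaCy's built-in glossary, so it does not need the `en_core_web_sm` model to be loaded.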


In [4]:
### CHECK YOUR OUTPUT WITH THE ANSWER
assert type(doc) == spacy.tokens.doc.Doc, "Be sure to create doc, a parsed spacy document."
assert doc.text == sent, "The text of doc should match the sentence provided called sent."
assert type(dependencies) == list, "dependencies should be a Python list."
assert type(dependencies[0]) == str, "dependencies should contain string values that list the dependencies of each token in doc."
assert len(dependencies) >= len(sent.split(' ')), "You should find at least as many tokens as there are words in sent."


In [5]:
### BEGIN HIDDEN TESTS
test_doc = nlp(sent)
assert dependencies == [token.dep_ for token in test_doc]
### END HIDDEN TESTS

## Exercise 2 - Traversing the Dependency Tree

The syntactic dependencies provided by spaCy form a tree. Each token's `.head` is the word it modifies (its parent in the tree), and `.children` yields the tokens that modify it.
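
To make `.head` and `.children` concrete, here is a small sketch on a toy sentence that is not part of the exercise. It assumes spaCy v3, where a `Doc` can be built by hand from `words`, `heads` (absolute token indices), and `deps`. The sentence, the variable names, and the hand-written parse are all illustrative, not model output.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated toy parse: each entry in heads is the index of that token's head.
words = ["She", "ate", "fresh", "pasta"]
heads = [1, 1, 3, 1]          # "ate" is the root, so it is its own head
deps = ["nsubj", "ROOT", "amod", "dobj"]

toy_doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# .head is a single Token: the word that "fresh" modifies.
fresh_head = toy_doc[2].head.text

# .children is a generator of Tokens, so collect it before inspecting it.
ate_children = [child.text for child in toy_doc[1].children]

print(fresh_head)
print(ate_children)
```

Because `.children` is a generator, it can only be consumed once. Converting it to a list, as above, makes it safe to index and to print.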

Consider the following dependency tree for `doc`, visualized with displaCy.


In [6]:
displacy.render(doc, jupyter=True, options={'distance': 100})

Which word is the `head` of the adverb "quietly"?  Which word or words are `children` of the noun "music"?

Now answer these questions using spaCy code and save your results.  Save the head of "quietly" as `quietly_head` and the children of "music" as `music_children`.  

_Hint_: remember that you can access a token's text string with the attribute `.text`.

In [7]:
quietly_head, music_children = None, None

### BEGIN SOLUTION
for token in doc:
    if token.text == 'quietly':
        quietly_head = token.head
    elif token.text == 'music':
        music_children = token.children
### END SOLUTION

music_list = list(music_children)

print(quietly_head)
print(music_list)

playing
[soothing, jazz]


In [8]:
### CHECK YOUR OUTPUT WITH THE ANSWER
assert type(quietly_head) == spacy.tokens.token.Token, "quietly_head should be a spacy token."
assert type(music_list[0]) == spacy.tokens.token.Token, "music_children should be a generator that you can iterate over or put into a list.  The entries should be spacy tokens."
assert len(music_list) >= 2, "You should find at least two children of the token 'music'."

In [9]:
### BEGIN HIDDEN TESTS
test_doc = nlp(sent)
for token in test_doc:
    if token.text == 'quietly':
        assert quietly_head.text == token.head.text
    elif token.text == 'music':
        # Compare whole lists so that a missing or extra child also fails the test.
        assert [child.text for child in music_list] == [child.text for child in token.children]
### END HIDDEN TESTS

## Exercise 3 - Named Entities

Now let's think about named entities.  Use spaCy to parse the sentence provided below called `travel_sent`.  

Then save the text and NER label of each named entity in a list called `named_entities`. Each entry should be a `(text, label)` tuple, so the list looks like `[(first_text, first_label), (second_text, second_label), ...]`.
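
As a sketch of what a document's entities look like, the toy example below builds a `Doc` by hand instead of running the model. It assumes spaCy v3, where `Doc` accepts an `ents` list of token-level IOB tags. The sentence, the tags, and the variable names are illustrative only.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Hand-annotated toy entities in IOB format (B = begin, I = inside, O = outside).
words = ["Ada", "Lovelace", "lived", "in", "London"]
iob_tags = ["B-PERSON", "I-PERSON", "O", "O", "B-GPE"]

toy_doc = Doc(Vocab(), words=words, ents=iob_tags)

# doc.ents holds Span objects, so one entity can cover several tokens.
first_entity = toy_doc.ents[0]
print(type(first_entity) is Span, len(first_entity))

toy_entities = [(ent.text, ent.label_) for ent in toy_doc.ents]
print(toy_entities)
```

Note that each entity is a `Span` rather than a single `Token`, which is why "Ada Lovelace" comes back as one entry.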

Finally, use displaCy to visualize the named entities of `travel_sent`.  (Note: Use the argument `jupyter=True` to display this visual the way that we did for you in the previous exercise.)

In [10]:
travel_sent = "I wish we could travel with Sarah when she goes to see the British Museum and Queen Elizabeth in London, England."
named_entities = None

### BEGIN SOLUTION
travel_doc = nlp(travel_sent)
named_entities = [(ent.text, ent.label_) for ent in travel_doc.ents]
### END SOLUTION

print(named_entities)

[('Sarah', 'PERSON'), ('the British Museum', 'ORG'), ('Queen Elizabeth', 'PERSON'), ('London', 'GPE'), ('England', 'GPE')]


In [11]:
### Put your displaCy visual here

### BEGIN SOLUTION
displacy.render(travel_doc, jupyter=True, style='ent')
### END SOLUTION

In [12]:
### CHECK YOUR OUTPUT WITH THE ANSWER
assert type(named_entities) == list, "named_entities should be a Python list."
assert len(named_entities) >= 3, "You should find at least three named entities in travel_sent, which was provided."
assert type(named_entities[0]) == tuple, "Each entry of named_entities should be a Python tuple."
assert len(named_entities[0]) == 2, "Each entry in named_entities should be a tuple with 2 elements."
assert type(named_entities[0][0]) == str, "The first element of each tuple should be a Python string containing the entity's text."
assert type(named_entities[0][1]) == str, "The second element of each tuple should be a Python string containing the entity's label."

In [13]:
### BEGIN HIDDEN TESTS
test_doc = nlp(travel_sent)
assert named_entities == [(ent.text, ent.label_) for ent in test_doc.ents]
### END HIDDEN TESTS