This homework explores coreference resolution for the task of timeline generation: given a biography on Wikipedia, can you extract all of the events associated with the people mentioned and create one timeline for each person? For this homework, be sure you are using the `anlp_spacy2` Anaconda environment that we set up in `CorefSetup.ipynb` (it has spaCy 2 installed, along with the neuralcoref package).

Within this environment, install the wikipedia package:

```pip install wikipedia```

In [1]:
import wikipedia
import spacy
import neuralcoref
import re
import operator

In [2]:
nlp = spacy.load('en_core_web_sm')
coref = neuralcoref.NeuralCoref(nlp.vocab)
nlp.add_pipe(coref, name='neuralcoref')

In [9]:
biography = wikipedia.page("Harrison Ford", auto_suggest=False)

In [40]:
doc=nlp(biography.content)

In [12]:
def print_coref_chains(text):
    doc = nlp(text)
    for chain in doc._.coref_clusters:
        for mention in chain.mentions:
            print(mention.text, mention.start, mention.end)
        print()

Q1. Create a function `extract_timeline` that constructs a timeline for the subject of the Wikipedia biography; the timeline should be composed of events (i.e., verbs) in the biography that the subject participates in and that can be grounded to a year. Your criteria:

- Only include events in sentences that contain a year.
- Only include events that the target of the Wikipedia biography is involved in.
- Only include events where that target person is the subject or direct object of the verb.

Your function should return a dict mapping each sentence that satisfies those criteria to the year mentioned in that sentence. For example, assume the following is a biography of the target *Bach*:

> Bach is a composer.  He was born in 1685.  He composed 6 suites for solo cello in 1717. Anna Magdalena Wilcke met him in 1721.  Handel was a contemporary and was also born in 1685. Bach's dog was born in 1723.  

Your function should return the following dict:

{"He was born in 1685": 1685, "He composed 6 suites for solo cello in 1717": 1717, "Anna Magdalena Wilcke met him in 1721": 1721}

We exclude the other sentences because:

- "Bach is a composer" does not mention a date.
- Bach is not mentioned in "Handel was a contemporary and was also born in 1685."
- Bach is not the subject or direct object of "Bach's dog was born in 1723" (he is only the possessor of the subject).
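
The three criteria can be sketched on the Bach example without running a parser. The snippet below is a minimal illustration, not the intended solution: the `ANNOTATED` list and its hand-written role labels are hypothetical stand-ins for what coreference resolution and dependency parsing would supply, and the year is found with a simple four-digit regex.

```python
import re

YEAR = re.compile(r"\b\d{4}\b")  # a standalone four-digit number, treated as a year

# Hypothetical hand annotations: (sentence, grammatical role of the target in it).
# In the real task these roles come from coreference chains plus a dependency parse.
ANNOTATED = [
    ("Bach is a composer", "subject"),
    ("He was born in 1685", "subject"),
    ("He composed 6 suites for solo cello in 1717", "subject"),
    ("Anna Magdalena Wilcke met him in 1721", "direct object"),
    ("Handel was a contemporary and was also born in 1685", None),  # target absent
    ("Bach's dog was born in 1723", "possessor"),
]

def toy_timeline(annotated):
    """Keep sentences with a year where the target is subject or direct object."""
    timeline = {}
    for sentence, role in annotated:
        year = YEAR.search(sentence)
        if year and role in ("subject", "direct object"):
            timeline[sentence] = int(year.group())
    return timeline

timeline = toy_timeline(ANNOTATED)
for sentence, year in sorted(timeline.items(), key=lambda item: item[1]):
    print(year, sentence, sep="\t")
```

Storing the year as an `int` matches the expected dict above and lets the timeline be sorted numerically.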

In [118]:
def extract_timeline(doc):
    events = {}
    mentions = []
    for chain in doc._.coref_clusters:
        # Keep only chains whose first mention names Harrison Ford
        if str(chain[0]) in ["Ford", "Harrison", "Harrison Ford"]:
            for mention in chain.mentions:
                mentions.append(mention.sent)  # sentence containing this mention
    p = re.compile(r'\d{4}')  # regex for a four-digit year
    for sent in mentions:
        mention_doc = nlp(str(sent))
        # DATE entities (from NER) whose text starts with four digits
        date_list = [ent.text for ent in mention_doc.ents
                     if ent.label_ == "DATE" and p.match(ent.text) is not None]
        # True if the sentence contains any subject or direct-object token;
        # the check is not restricted to the Ford mention itself
        has_subj_or_obj = any(token.dep_ in ['nsubj', 'dobj'] for token in mention_doc)
        if has_subj_or_obj and date_list:
            events[str(sent)] = date_list[0]  # pick the first date if there are several
    return events
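
One detail of the year check in the function above is worth seeing in isolation: `re.match` only succeeds when the pattern matches at the start of the string, whereas `re.search` scans the whole string. The sketch below uses made-up entity strings (not actual spaCy output) to show which ones each call would accept.

```python
import re

p = re.compile(r"\d{4}")  # same four-digit pattern as in extract_timeline

# Hypothetical DATE-like strings, chosen for illustration only
candidates = ["1964", "1970s", "July 13, 1942", "the summer of 1964", "last year"]

# match: the four digits must be at the very start of the string
starts_with_year = [text for text in candidates if p.match(text) is not None]
# search: the four digits may appear anywhere in the string
contains_year = [text for text in candidates if p.search(text) is not None]

print(starts_with_year)
print(contains_year)
```

With `match`, a date such as "July 13, 1942" is skipped; whether that is desirable depends on how strictly a sentence should be grounded to a year.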

Let's print that extracted timeline from oldest to newest events.

In [119]:
events=extract_timeline(doc)
for k,v in sorted(events.items(), key=operator.itemgetter(1)):
    print("%s\t%s" % (v, k))

1960	Ford graduated in 1960 from Maine East High School in Park Ridge, Illinois.
1964	In 1964, after a season of summer stock with the Belfry Players in Wisconsin, Ford traveled to Los Angeles to apply for a job in radio voice-overs.
1964	He was first married to Mary Marquardt from 1964 until their divorce in 1979.
1968	He appeared in the western Journey to Shiloh (1968) and had an uncredited, non-speaking role in Michelangelo Antonioni's 1970 film Zabriskie Point as an arrested student protester.
1969	French filmmaker Jacques Demy chose Ford for the lead role of his first American film, Model Shop (1969), but the head of Columbia Pictures thought Ford had "no future" in the film business and told Demy to hire a more experienced actor.
1973	Casting director and fledgling producer Fred Roos championed the young Ford and secured him an audition with George Lucas for the role of Bob Falfa, which Ford went on to play in American Graffiti (1973).
1974	The Conversation (1974) and Apocalypse 