# Topic 6: Opinion Extraction

## Preliminaries 
Run this cell.

In [1]:
import sys
sys.path.append(r'\\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources')
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import collections
from collections import defaultdict,Counter
from itertools import zip_longest
from IPython.display import display
from random import seed
import random
import math
import matplotlib.pylab as pylab
%matplotlib inline
params = {'legend.fontsize': 'large',
          'figure.figsize': (15, 5),
         'axes.labelsize': 'large',
         'axes.titlesize':'large',
         'xtick.labelsize':'large',
         'ytick.labelsize':'large'}
pylab.rcParams.update(params)
from pylab import rcParams
from operator import itemgetter, attrgetter, methodcaller
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import seaborn as sns
import csv

## Overview
In this topic you will look at ways to extract opinion-bearing words from Amazon DVD reviews. The goal is to find words that describe particular aspects of the film being reviewed. The specific aspects of films that we will consider are: the **plot**, the **characters**, the **cinematography** and the **dialogue**.

In other words, we are interested in finding all the words in a review that express the reviewer's opinion about one of these aspects of the film. The idea is that this will provide a fine-grained characterisation of the opinion being expressed by the author of the review. We will refer to the words we are looking for as **opinion words**, and to the words that name particular aspects of the film as **aspect words**.

Following on from the previous topic's material on dependency parsing, you will use spaCy's dependency parse as the basis for identifying opinion words. This rests on the assumption that the opinion words we are looking for occur in a sentence of the review in a particular (dependency) relationship to one of our aspect words (plot, characters, cinematography and dialogue).

For example, the opinion word "*amazing*" might be found because it is used in a sentence where it is an adjective modifying the aspect word "*plot*", as in the sentence "*I thought it had an amazing plot.*".
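
To make the idea concrete, here is a minimal sketch that hand-codes one plausible dependency analysis of the example sentence and looks up the adjectival modifiers of *plot*. The `Tok` class, the `attach` helper and the hand-written parse are illustrative assumptions standing in for spaCy tokens (which expose similarly named `orth_`, `pos_`, `dep_`, `head` and `children` attributes); a real parser may label the sentence slightly differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for a spaCy token (not part of spaCy)."""
    orth_: str
    pos_: str
    dep_: str
    head: Optional["Tok"] = None
    children: List["Tok"] = field(default_factory=list)


def attach(child: Tok, head: Tok) -> None:
    """Record that `child` depends on `head`."""
    child.head = head
    head.children.append(child)


# Hand-written parse of "I thought it had an amazing plot."
thought = Tok("thought", "VERB", "ROOT")
had = Tok("had", "VERB", "ccomp")
plot = Tok("plot", "NOUN", "dobj")
attach(Tok("I", "PRON", "nsubj"), thought)
attach(had, thought)
attach(Tok("it", "PRON", "nsubj"), had)
attach(plot, had)
attach(Tok("an", "DET", "det"), plot)
attach(Tok("amazing", "ADJ", "amod"), plot)

# Opinion candidates: adjectives that modify the aspect word directly.
modifiers = [t.orth_ for t in plot.children if t.dep_ == "amod"]
print(modifiers)
```

Note that the determiner *an* is also a dependent of *plot*; telling informative dependents apart from uninformative ones is the subject of the later exercises.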

### Exercise
Run the cell below to set up spaCy and load in a corpus of Amazon DVD reviews.

In [2]:
import spacy
from sussex_nltk.corpus_readers import AmazonReviewCorpusReader

nlp = spacy.load('en')
dvd_reviews = [review for review in AmazonReviewCorpusReader().category("dvd").raw()]
print("The dvd review dataset contains {} reviews".format(len(dvd_reviews)))
parsed_reviews = [nlp(review) for review in dvd_reviews]   

Sussex NLTK root directory is \\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources
The dvd review dataset contains 5491 reviews


### Exercise
We are now going to create code that finds all dependents and all heads of a list of aspect words. This code will be similar to the final exercise of the previous topic.

In the blank cell below, write code that takes a set of aspect words ("*plot*","*characters*","*cinematography*", and "*dialogue*") as an argument, and produces a dictionary that maps each aspect word to a dictionary with two keys: `deps` and `heads`, where the `deps` key maps to the list of dependents of the aspect word and `heads` maps to the list of heads of the aspect word. 

- Note that our aspect words are nouns.
- Note that our aspect words are not all lemmas.
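
The second note matters when deciding what to compare against the aspect list: "characters" is a plural surface form whose lemma is "character", so a test on lemmas would miss it. The sketch below illustrates this with hypothetical `(surface form, lemma, part of speech)` triples rather than real spaCy tokens; the variable names are made up for the illustration.

```python
aspect_words = {"plot", "characters", "cinematography", "dialogue"}

# Hypothetical (surface form, lemma, part of speech) triples.
tokens = [
    ("characters", "character", "NOUN"),
    ("plot", "plot", "NOUN"),
    ("plot", "plot", "VERB"),       # e.g. "they plot revenge": not our aspect noun
    ("Dialogue", "dialogue", "NOUN"),
]

# Three possible matching strategies, all restricted to nouns.
by_surface = [t for t in tokens if t[2] == "NOUN" and t[0] in aspect_words]
by_lemma = [t for t in tokens if t[2] == "NOUN" and t[1] in aspect_words]
by_lower = [t for t in tokens if t[2] == "NOUN" and t[0].lower() in aspect_words]

print([t[0] for t in by_surface])
print([t[0] for t in by_lemma])
print([t[0] for t in by_lower])
```

Matching on the lowercased surface form is only one option; whether capitalised occurrences such as a sentence-initial "Plot" should count is a design choice the exercise leaves open.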

Display your results for dependents and heads in separate tables. For example, suppose we have the following:
- `aspect_words` is a list of the aspect words we are interested in,
- `all_heads` is a list of lists of each aspect word's heads, and
- `all_dependents` is a list of lists of each aspect word's dependents.

Given these, the following code will display the results:

```
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
print("Dependents")
display(df_deps)
```
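
The `zip_longest(*...)` call is what turns lists of different lengths into table rows: it transposes the lists and pads the shorter ones with `None`, which the `applymap` step then replaces with empty strings. A small sketch with made-up word lists (the data and the `cleaned` name are illustrative only):

```python
from itertools import zip_longest

# Made-up head lists of unequal length, one list per aspect word.
all_heads = [["was", "holes", "is"], ["talk"], ["cramped", "is"]]

# Transpose: row i holds the i-th head of every aspect word.
rows = list(zip_longest(*all_heads))

# Replace the None padding with empty strings, as the display code does.
cleaned = [tuple("" if x is None else x for x in row) for row in rows]
for row in cleaned:
    print(row)
```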

Run your code and look at the output.

In [4]:
aspect_words = ["plot", "characters", "cinematography", "dialogue"]
final_dict = defaultdict(lambda: defaultdict(list))

for review in parsed_reviews:
    for token in review:
        if token.pos_ == "NOUN" and token.orth_ in aspect_words:
            final_dict[token.orth_]["heads"].append(token.head.orth_)
            for child in token.children:
                final_dict[token.orth_]["deps"].append(child.orth_)
                
all_heads = [final_dict[word]["heads"] for word in aspect_words]
all_dependents = [final_dict[word]["deps"] for word in aspect_words]

df_heads = pd.DataFrame(list(zip_longest(*all_heads)), columns = aspect_words)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)), columns = aspect_words)
print("Dependents")
display(df_deps)

Heads


| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | was | talk | cramped | sounds |
| 1 | to | of | is | kept |
| 2 | holes | know | is | is |
| 3 | is | are | 's | with |
| 4 | 's | lumped | despite | part |
| 5 | characters | were | was | developing |
| 6 | centers | for | did | past |
| 7 | for | of | locations | given |
| 8 | irrelevant | between | make | of |
| 9 | was | practicing | was | was |


Dependents


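| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | the | The | the | the |
| 1 | the | the | the | Her |
| 2 | the | other | the | The |
| 3 | the | or | the | the |
| 4 | usual | the | worst | and |
| 5 | predictable | other | ever | some |
| 6 | romance | two | , | the |
| 7 | the | Black | the | bad |
| 8 | the | , | beautiful | and |
| 9 | the | both | the | better |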


In [None]:
# %load solutions/all_heads_and_deps
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for parsed_review in parsed_reviews:
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append(token.head.orth_)
            for child in token.children:
                linked_words[token.orth_]["deps"].append(child.orth_)
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


Our goal here is to find out what the reviewer is saying about the different aspects we are interested in. 
As you can see, not all of the words being returned are informative. In the next two exercises we look at how to (partially) address this.

### Exercise
Make a copy of your solution to the last exercise and position it below this cell.

Then adapt the code so that, instead of creating lists containing just the tokens, it records the following additional features:
- For the heads:
 - the lemma of the head,
 - the part of speech of the head,
 - the dependency between the head and the aspect word
- For the dependents:
 - the lemma of the dependent,
 - the part of speech of the dependent,
 - the dependency between the dependent and the aspect word
 
What is the point of this?

The idea is that with this additional information, we will be in a better position to work out what makes heads and dependents uninteresting. For example, we might want to remove punctuation.

Once you have made the necessary changes to the code, run the code on the dataset and look at the outputs to see if you can figure out ways of spotting the uninteresting entries in the table.
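
One way to look for patterns, sketched here under the assumption that each dependent is stored as a `(token, lemma, part of speech, dependency)` tuple, is to count how often each dependency label and part of speech occurs. The sample tuples and variable names below are illustrative, not part of the exercise.

```python
from collections import Counter

# A few dependent entries in (orth, lemma, pos, dep) form.
deps = [
    ("the", "the", "DET", "det"),
    ("the", "the", "DET", "det"),
    ("usual", "usual", "ADJ", "amod"),
    ("predictable", "predictable", "ADJ", "amod"),
    (",", ",", "PUNCT", "punct"),
    ("or", "or", "CCONJ", "cc"),
]

# Tally the dependency labels and the parts of speech separately.
dep_counts = Counter(dep for _, _, _, dep in deps)
pos_counts = Counter(pos for _, _, pos, _ in deps)

# Frequent but uninformative labels (e.g. 'det') are candidates for filtering.
print(dep_counts.most_common())
print(pos_counts.most_common())
```

On the full dataset, a tally like this gives a quick overview of which labels dominate the tables, which can then be inspected by hand.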
 

In [16]:
aspect_words = ["plot", "characters", "cinematography", "dialogue"]
final_dict = defaultdict(lambda: defaultdict(list))

for review in parsed_reviews:
    for token in review:
        if token.pos_ == "NOUN" and token.orth_ in aspect_words:
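            # Note: token.head.dep_ is the head's relation to its own head;
            # token.dep_ is the relation between the aspect word and this head.
            final_dict[token.orth_]["heads"].append((token.head.orth_, token.head.lemma_, token.head.pos_, token.head.dep_))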
            for child in token.children:
                final_dict[token.orth_]["deps"].append((child.orth_, child.lemma_, child.pos_, child.dep_))
                
all_heads = [final_dict[word]["heads"] for word in aspect_words]
all_dependents = [final_dict[word]["deps"] for word in aspect_words]

df_heads = pd.DataFrame(list(zip_longest(*all_heads)), columns = aspect_words)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)), columns = aspect_words)
print("Dependents")
display(df_deps)

Heads


| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (was, be, VERB, ROOT) | (talk, talk, VERB, ROOT) | (cramped, cramp, VERB, ccomp) | (sounds, sound, VERB, conj) |
| 1 | (to, to, ADP, prep) | (of, of, ADP, prep) | (is, be, VERB, conj) | (kept, keep, VERB, advcl) |
| 2 | (holes, hole, NOUN, dobj) | (know, know, VERB, relcl) | (is, be, VERB, conj) | (is, be, VERB, ROOT) |
| 3 | (is, be, VERB, ROOT) | (are, be, VERB, ROOT) | ('s, be, VERB, advcl) | (with, with, ADP, prep) |
| 4 | ('s, be, VERB, case) | (lumped, lump, VERB, ccomp) | (despite, despite, ADP, prep) | (part, part, NOUN, pobj) |
| 5 | (characters, character, NOUN, pobj) | (were, be, VERB, aux) | (was, be, VERB, ccomp) | (developing, develop, VERB, pcomp) |
| 6 | (centers, center, NOUN, ROOT) | (for, for, ADP, prep) | (did, do, VERB, relcl) | (past, past, ADP, prep) |
| 7 | (for, for, ADP, prep) | (of, of, ADP, prep) | (locations, location, NOUN, appos) | (given, give, VERB, aux) |
| 8 | (irrelevant, irrelevant, ADJ, advcl) | (between, between, ADP, prep) | (make, make, VERB, ROOT) | (of, of, ADP, prep) |
| 9 | (was, be, VERB, ROOT) | (practicing, practice, VERB, ccomp) | (was, be, VERB, ROOT) | (was, be, VERB, ROOT) |


Dependents


| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (the, the, DET, det) | (The, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) |
| 1 | (the, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) | (Her, -PRON-, ADJ, poss) |
| 2 | (the, the, DET, det) | (other, other, ADJ, amod) | (the, the, DET, det) | (The, the, DET, det) |
| 3 | (the, the, DET, det) | (or, or, CCONJ, cc) | (the, the, DET, det) | (the, the, DET, det) |
| 4 | (usual, usual, ADJ, amod) | (the, the, DET, det) | (worst, bad, ADJ, amod) | (and, and, CCONJ, cc) |
| 5 | (predictable, predictable, ADJ, amod) | (other, other, ADJ, amod) | (ever, ever, ADV, advmod) | (some, some, DET, conj) |
| 6 | (romance, romance, NOUN, compound) | (two, two, NUM, nummod) | (,, ,, PUNCT, punct) | (the, the, DET, det) |
| 7 | (the, the, DET, det) | (Black, black, ADJ, amod) | (the, the, DET, det) | (bad, bad, ADJ, amod) |
| 8 | (the, the, DET, det) | (,, ,, PUNCT, punct) | (beautiful, beautiful, ADJ, amod) | (and, and, CCONJ, cc) |
| 9 | (the, the, DET, det) | (both, both, DET, appos) | (the, the, DET, det) | (better, good, ADJ, amod) |


In [None]:
# %load solutions/all_heads_deps_features

### Exercise
We are now ready to filter out some of the entries in the table that are uninteresting.
To do this, create the following sets:

```
unwanted_head_deps = set()
unwanted_head_lemmas = set()
unwanted_head_pos = set()
unwanted_dependent_deps = set()
unwanted_dependent_pos = set()
unwanted_dependent_lemmas = set()
```

The idea is that you will manually add entries to these sets as you spot ways to eliminate uninteresting entries in the table.

Undertake the following steps:
1. Define an initial version of these sets based on your observations of the tables produced in the previous exercise.
2. Adapt the code so that these sets are used to filter out unwanted tokens before they are added to your lists of heads and dependents. 
 - one option here is to define two functions: `uninteresting_head(token)` and `uninteresting_dependent(token)`.
3. Run the code to see how much the output has improved.
4. Keep adding to the sets and re-running your code until you are satisfied that you can't improve them any more.
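
A minimal sketch of the two suggested functions follows. It assumes that `uninteresting_head` receives the aspect token (so it can inspect both the head and the dependency linking the two), that `uninteresting_dependent` receives the dependent itself, and that all three sets for each side are consulted. The `Tok` class is a hypothetical stand-in for a spaCy token, and the set contents are only examples.

```python
from dataclasses import dataclass
from typing import Optional

# Example contents only; the exercise asks you to build these up yourself.
unwanted_head_deps = {"pobj"}
unwanted_head_lemmas = {"be", "have", "do"}
unwanted_head_pos = {"PROPN"}
unwanted_dependent_deps = {"det", "punct", "cc"}
unwanted_dependent_pos = {"SPACE"}
unwanted_dependent_lemmas = {"-PRON-"}


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for a spaCy token (not part of spaCy)."""
    orth_: str
    lemma_: str
    pos_: str
    dep_: str                      # relation between this token and its head
    head: Optional["Tok"] = None


def uninteresting_head(aspect_token: Tok) -> bool:
    """True if the head of `aspect_token` should be skipped."""
    head = aspect_token.head
    return (aspect_token.dep_ in unwanted_head_deps
            or head.lemma_ in unwanted_head_lemmas
            or head.pos_ in unwanted_head_pos)


def uninteresting_dependent(child: Tok) -> bool:
    """True if a dependent of the aspect word should be skipped."""
    return (child.dep_ in unwanted_dependent_deps
            or child.pos_ in unwanted_dependent_pos
            or child.lemma_ in unwanted_dependent_lemmas)


# Toy tokens: "The plot was ..." plus an adjectival modifier of 'plot'.
was = Tok("was", "be", "VERB", "ROOT")
plot = Tok("plot", "plot", "NOUN", "nsubj", head=was)
the = Tok("The", "the", "DET", "det", head=plot)
enjoyable = Tok("enjoyable", "enjoyable", "ADJ", "amod", head=plot)

print(uninteresting_head(plot))            # head lemma 'be' is in the unwanted set
print(uninteresting_dependent(the))        # 'det' is in the unwanted set
print(uninteresting_dependent(enjoyable))  # passes all three checks
```

Keeping the tests in two small functions means the main loop does not change as the sets grow.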

In [18]:
unwanted_head_deps = {'pobj', 'attr'}
unwanted_head_lemmas = {'be', 'have', 'do'}
unwanted_head_pos = {'PROPN'}
unwanted_dependent_deps = {'det', 'predet','nummod', 'cc', 'prep', 'punct', 'case'}
unwanted_dependent_pos = {'SPACE','PROPN'}
unwanted_dependent_lemmas = {'-PRON-','do','be'}

def uninteresting_head(token):
    return token.dep_ in unwanted_head_deps or token.head.lemma_ in unwanted_head_lemmas

def uninteresting_dep(token):
    return token.dep_ in unwanted_dependent_deps or token.pos_ in unwanted_dependent_pos or token.lemma_ in unwanted_dependent_lemmas

final_dict = defaultdict(lambda: defaultdict(list))
aspect_words = ["plot", "characters", "cinematography", "dialogue"]

for review in parsed_reviews:
    for token in review:
        if token.pos_ == "NOUN" and token.orth_ in aspect_words:
            if not uninteresting_head(token):
                final_dict[token.orth_]["heads"].append((token.head.orth_, token.head.lemma_, token.head.pos_, token.head.dep_))
            for child in token.children:
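                # Note: this tests the aspect token rather than `child`, so the
                # dependents are left unfiltered (see the Dependents table below);
                # `uninteresting_dep(child)` is the intended check.
                if not uninteresting_dep(token):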
                    final_dict[token.orth_]["deps"].append((child.orth_, child.lemma_, child.pos_, child.dep_))
                
all_heads = [final_dict[word]["heads"] for word in aspect_words]
all_dependents = [final_dict[word]["deps"] for word in aspect_words]

df_heads = pd.DataFrame(list(zip_longest(*all_heads)), columns = aspect_words)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)), columns = aspect_words)
print("Dependents")
display(df_deps)

Heads


| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (holes, hole, NOUN, dobj) | (talk, talk, VERB, ROOT) | (cramped, cramp, VERB, ccomp) | (sounds, sound, VERB, conj) |
| 1 | (characters, character, NOUN, pobj) | (know, know, VERB, relcl) | (locations, location, NOUN, appos) | (kept, keep, VERB, advcl) |
| 2 | (centers, center, NOUN, ROOT) | (lumped, lump, VERB, ccomp) | (make, make, VERB, ROOT) | (part, part, NOUN, pobj) |
| 3 | (irrelevant, irrelevant, ADJ, advcl) | (practicing, practice, VERB, ccomp) | (cinematography, cinematography, NOUN, ROOT) | (developing, develop, VERB, pcomp) |
| 4 | (line, line, NOUN, nsubj) | (writing, writing, NOUN, dobj) | (handle, handle, VERB, xcomp) | (given, give, VERB, aux) |
| 5 | (line, line, NOUN, pobj) | (plot, plot, NOUN, nsubj) | (seem, seem, VERB, ROOT) | (cover, cover, VERB, ROOT) |
| 6 | (point, point, NOUN, pobj) | (presented, present, VERB, ROOT) | (direction, direction, NOUN, conj) | (given, give, VERB, aux) |
| 7 | (lost, lose, VERB, ROOT) | (Charlie, charlie, PROPN, nsubj) | (cast, cast, NOUN, appos) | (heavy, heavy, ADJ, amod) |
| 8 | (summary, summary, NOUN, nsubj) | (locations, location, NOUN, conj) | (undermined, undermine, VERB, ROOT) | (heavy, heavy, ADJ, amod) |
| 9 | (got, get, VERB, ccomp) | (unlikeable, unlikeable, ADJ, ROOT) | (photography, photography, NOUN, nsubjpass) | (seems, seem, VERB, auxpass) |


Dependents


| | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (the, the, DET, det) | (The, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) |
| 1 | (the, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) | (Her, -PRON-, ADJ, poss) |
| 2 | (the, the, DET, det) | (other, other, ADJ, amod) | (the, the, DET, det) | (The, the, DET, det) |
| 3 | (the, the, DET, det) | (or, or, CCONJ, cc) | (the, the, DET, det) | (the, the, DET, det) |
| 4 | (usual, usual, ADJ, amod) | (the, the, DET, det) | (worst, bad, ADJ, amod) | (and, and, CCONJ, cc) |
| 5 | (predictable, predictable, ADJ, amod) | (other, other, ADJ, amod) | (ever, ever, ADV, advmod) | (some, some, DET, conj) |
| 6 | (romance, romance, NOUN, compound) | (two, two, NUM, nummod) | (,, ,, PUNCT, punct) | (the, the, DET, det) |
| 7 | (the, the, DET, det) | (Black, black, ADJ, amod) | (the, the, DET, det) | (bad, bad, ADJ, amod) |
| 8 | (the, the, DET, det) | (,, ,, PUNCT, punct) | (beautiful, beautiful, ADJ, amod) | (and, and, CCONJ, cc) |
| 9 | (the, the, DET, det) | (both, both, DET, appos) | (the, the, DET, det) | (better, good, ADJ, amod) |


In [None]:
# %load solutions/filtered_heads_deps
unwanted_head_deps = {'pobj', 'attr'}
unwanted_head_lemmas = {'be','have','do'}
unwanted_head_pos = {'PROPN'}
unwanted_dependent_deps = {'det', 'predet','nummod', 'cc', 'prep', 'punct', 'case'}
unwanted_dependent_pos = {'SPACE','PROPN'}
unwanted_dependent_lemmas = {'-PRON-','do','be'}

def uninteresting_head(token):
    return token.dep_ in unwanted_head_deps or token.head.lemma_ in unwanted_head_lemmas

def uninteresting_dep(token):
    return token.dep_ in unwanted_dependent_deps or token.pos_ in unwanted_dependent_pos or token.lemma_ in unwanted_dependent_lemmas

aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for parsed_review in parsed_reviews:
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            if not uninteresting_head(token): 
                linked_words[token.orth_]["heads"].append((token.head.orth_,token.head.pos_,token.dep_,token.head.lemma_))
            for child in token.children:
                if not uninteresting_dep(child):
                    linked_words[token.orth_]["deps"].append((child.orth_,child.pos_,child.dep_,child.lemma_))
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


## Opinion extractor
Over the remaining exercises in this notebook, we will create and then refine a function called `opinion_extractor`.

`opinion_extractor` takes two arguments:
- `aspect_token`: an aspect token, such as "*plot*", "*characters*", "*cinematography*" or "*dialogue*".
- `parsed_sentence`: a parsed sentence

The output should be a list of extracted opinions, i.e. a list of tokens that tell us something about what the reviewer is saying (in `parsed_sentence`) about `aspect_token`.

### Exercise
To begin, in the cell below, complete an initial version of the function `opinion_extractor` that returns a list of all dependents of the aspect token.
- This list will be empty when there are no opinions relating to `aspect_token` to extract from `parsed_sentence`.

In [26]:
def opinion_extractor(aspect_token,parsed_sentence):
    opinions = list()
    for token in parsed_sentence:
        if token.pos_ == "NOUN" and token.orth_ == aspect_token:
            for child in token.children:
                opinions.append(child.orth_)
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       
     

Results for aspect word 'plot'

Sentence:
	However, the plot was predictable.
Opinion of 'plot':
	 'the'


Sentence:
	I'm still not sure what Jody Foster's character brings to the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Furthermore, the plot is so predictable that it made the movie drag.
Opinion of 'plot':
	 'the'


Sentence:
	& then Owen Wilson's character fall in love with Rachel's McAdams character and well, that's the usual predictable romance plot.
Opinion of 'plot':
	 'the', 'usual', 'predictable', 'romance'


Sentence:
	This movie didn't do anything for me...Didn't care for the characters or the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Not enough weirdness, tried too much for the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Given my dislike of how the Japanese culture was handled in this film, it left me finding the plot irrelevant.
Opinion of 'plot':
	 'the'


Sentence:
	The plot was OK but it would have been much better if they had either stuck to a more authentic 

	 'the'


Sentence:
	But I just had to shut it off - there was nothing advancing the plot or the characters, so I just didn't care how it ended.

1 star for the plot, 1.5 stars for the screenplay, 2 stars for the singing
Opinion of 'plot':
	 'the', 'or', 'characters', 'the', ',', 'stars'


Sentence:
	The plot was butchered, the characters twisted, the story ruined, a pure agony.
Opinion of 'plot':
	 'The'


Sentence:
	The plot is rather unbelievable, naming the fact that all three main characters are leaders in their field.... successful author, high-ranking network producer, top-rate doctor.
Opinion of 'plot':
	 'The'


Sentence:
	This series might be considered a classic if watched by a 12-year-old, but adults will see though the indistinct plot and characters.
Opinion of 'plot':
	 'the', 'indistinct', 'and', 'characters'


Sentence:
	Most of the plot's famous "unpleasantness" is just tacked on to the main story, and is of no significance.
Opinion of 'plot':
	 'the', ''s'


Sentence:

	 'a', 'good', 'and', 'movie'


Sentence:
	The plot is not very intriguing, and you know that nothing like this can happen in the real life, so it lacks this sense of reality.
Opinion of 'plot':
	 'The'


Sentence:
	I'm not a movie buff, but Deep Cover has all the elements you would expect of a classic : great acting, memorable lines, interesting ( and logical ) plot and a decent ending.
Opinion of 'plot':
	 'interesting', 'and', 'ending'


Sentence:
	This is a wonderful film, with a silly but extremley funny plot, and some wonderful dialogue, particularly between Horton and Brady, who somehow manage to end up married to each other, much to their surprise.
Opinion of 'plot':
	 'a', 'silly', 'funny', ',', 'and', 'dialogue'


Sentence:
	The plot evolves as many other Coen movies (Raising Arizona comes to mind) as we are introduced to more-and-more colorful characters and the plot gets more-and-more outrageous, yet somehow still seems real.
Opinion of 'plot':
	 'The', 'more', 'the'


Sent

	 'the'


Sentence:
	The later revised version considerably weakened both characters and plot, and the "concert" version is even more woefully incomplete than the original cast album, but thank GOD they saw fit to include, "Too Many Mornings".
Opinion of 'plot':
	 ',', 'and', 'the'


Sentence:
	The plot is pretty unrealistic
Opinion of 'plot':
	 'The'


Sentence:
	As I said the plot's formulaic as you have the standard setup of the main character creating a wacky situation, one that eventually gets out of hand, ultimately crashing down around his ears, resulting in a last ditch effort to save the day.
Opinion of 'plot':
	 'the', ''s'


Sentence:
	It combines a bunch of the best French actrices, and an enjoyable plot.
Opinion of 'plot':
	 'an', 'enjoyable'


Sentence:
	the plot was dry, like the other 2 of the series, but how exciting of a plot can you have with the amount of action and skilled driving packed into the movie.
Opinion of 'plot':
	 'the', 'a'


Sentence:
	Sure the plot of 

- "Love before Breakfast" is a Universal production and has the simplest plot of the collection; in fact, no plot at all really.
Opinion of 'plot':
	 'the', 'simplest', 'of', 'is', '-', '"', 'is', 'no', 'all', 'really', '.'


Sentence:
	The film combines too many genres and the plot becomes convoluted.
Opinion of 'plot':
	 'the'


Sentence:
	That's probably more of the plot than needs to be revealed so I'll leave it at that.
Opinion of 'plot':
	 'the'


Sentence:
	In this treatment of the annoying plot, he loses his mind.
Opinion of 'plot':
	 'the', 'annoying'


Sentence:
	Anatomy of a Murder is at once both a delightful mystery and a suspenseful thiller for those would would rather think about all the many dimensions and layers of intrique going on in a movie rather than be spoon fed a formulaic plot.
Opinion of 'plot':
	 'a', 'formulaic'


Sentence:
	Rather formulaic plot; the dancing is the thing here
Opinion of 'plot':
	 'Rather', 'formulaic'


Sentence:
	The main plot is very thin

	From there, the plot goes nuts, but that's okay, because it's what makes Big Trouble such a funny and strangely smart movie.
Opinion of 'plot':
	 'the'


Sentence:
	He would have known where to go, what backdrops he wanted against the plot. 

I could relate to a few of the characters in this film.
Opinion of 'plot':
	 'the'


Sentence:
	The setting, the plot, the dialogues, the humor, and the music are all wonderful!
Opinion of 'plot':
	 'the', ',', 'are'


Sentence:
	This is a good movie with an interesting plot, good acting, decent directing and a politically relevent comment on the excesses of the US Press.
Opinion of 'plot':
	 'an', 'interesting', ',', 'good'


Sentence:
	A lot of people tend to say that this anime is confusing and doesn't have a real plot.
Opinion of 'plot':
	 'a', 'real'


Sentence:
	I think the director needs prozac to direct this movie, the whole plot is so mess up, junior high student can write better story than this..what was he thinking. don't waste your mo

	 'the', 'of'


Sentence:
	I would have loved to see the film go an hour longer and show why the Grail was vital to the plot.
Opinion of 'plot':
	 'the'


Sentence:
	With cringingly awful acting, laughable special effects (the ridiculous head in a bucket is a real guffaw moment), appalling dubbed dialogue, cheesy music, inconsistent and confusing plot and no tension or atmosphere, The Grim Reaper is a total disaster.
Opinion of 'plot':
	 'inconsistent', 'and', 'tension'


Sentence:
	The plot revolves around Crosby and Kay, old army buddies who make it big in show business after the WWII.
Opinion of 'plot':
	 'The'


Sentence:
	The character work is very good and the plot gets more and more interesting as the movie goes on.
Opinion of 'plot':
	 'the'


Sentence:
	The lead kid's grandma was needless in the plot and very annoying.
Opinion of 'plot':
	 'the', 'and'


Sentence:
	"Stargate" is still one of the science-fiction greats that successfully blends history and science-fiction in one

	 'a', 'cheesy'


Sentence:
	The setting, the plot, the dialogues, the humor, and the music are all wonderful!
Opinion of 'plot':
	 'the', ',', 'dialogues'


Sentence:
	It has comedy, fright, plot, and decent special effects all twisted together for a nice helping of...pardon the pun..."cheesy popcorn."
Opinion of 'plot':
	 ',', 'and'


Sentence:
	This is a movie made to involve the viewer in the plot and, when viewed for what it is, a refreshing change from the superficial, glossed over acting in some of today's' films.
Opinion of 'plot':
	 'the'


Sentence:
	This plot is much easier to digest on screen.
Opinion of 'plot':
	 'This'


Sentence:
	The plot is great, the acting is over the top, and the cinematography is fantastic.
Opinion of 'plot':
	 'The'


IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.


Sentence:
	It goes behind the scenes with a bunch of players including Tim Hudson, Ryan Dempster, select Phillies players, & many others, as well as looking at past characters who stand out in the game such as Tommy Lasorda and Yogi Berra.
Opinion of 'characters':
	 'past', 'stand'


Sentence:
	Hitler: the last ten days' does a bit better than the rest in portraying the supporting cast of characters in the drama but one stands out in 'The Bunker' and that is Michael Lonsdale as Martin Borman, Hitler's personal secretary and gatekeeper, a man of considerable power who plays his cards close to the vest and is always scheming to stay one step ahead of his rivals.
Opinion of 'characters':
	 'in'


Sentence:
	The struggles between the characters get played out over ten riveting scenes bookended by Marianne's opening and closing monologues.
Opinion of 'characters':
	 'the'


Sentence:
	The characters are not always likable, but they are never less than engrossing.
Opinion of 'characters':
	 

	This hubris leads to his un-doing as the world of porn coalesces with the world of drugs and LOTS of shady characters.
Opinion of 'characters':
	 'shady'


Sentence:
	The plot and dialogue is incredible because it explores all the shades of grey in the characters.
Opinion of 'characters':
	 'the'


Sentence:
	The characters (and their offscreen personalities) all are present and make for a fine children's film that both children will enjoy and parents can endure (I had to sit through Pokemon movies - most of them too, yuck!).
Opinion of 'characters':
	 'The', '(', 'and', 'personalities', 'all'


Sentence:
	i was intrigued by all of the various characters and felt that they all blended together in perfect harmony.
Opinion of 'characters':
	 'the', 'various'


Sentence:
	(now I can imagine Joan Crawford saying that to Clark Gable, but there weren't any characters with that kind of will or independence in this film).
Opinion of 'characters':
	 'n't', 'any', 'with'


Sentence:
	Lawrence W

	Something that has started to develop over the seasons has been the friendships/associations/interaction amongst the characters.
Opinion of 'characters':
	 'the'


Sentence:
	You suffer with the characters and you cringe as you know something is about to happen and you wish it not to.
Opinion of 'characters':
	 'the'


Sentence:
	Writer/Director Amy Heckerling, (who also did Fast Times At Ridgemont High - a lesser endeavor), understands and loves her characters, they are not stereotypes.
Opinion of 'characters':
	 'her'


Sentence:
	I refer you to the diner sequence in which characters portrayed by Guy Pearce and Joe Pantoliano converse over soup.
Opinion of 'characters':
	 'in', 'portrayed'


Sentence:
	The characters are so creative and the music is catchy and intelligent.
Opinion of 'characters':
	 'The'


Sentence:
	The voices are perfect for their characters.
Opinion of 'characters':
	 'their'


Sentence:
	The admiration is justified in terms of the formal qualities of the films,

	 'the', 'main'


Sentence:
	It is unclear why many of the secondary characters exist, and their interactions with the main characters are virtually inscrutable at times.
Opinion of 'characters':
	 'the', 'secondary', 'the', 'main'


Sentence:
	The disclaimer is great, saying that it's not a representation of all gay people, however most gay men and women can relate to various characters in the show either as themselves or as friends.
Opinion of 'characters':
	 'various', 'in'


Sentence:
	Truer words were never spoken as said by one of the characters two seconds before he literally loses his head.
Opinion of 'characters':
	 'the'


Sentence:
	This flick is scary with the hotties in tow...

I saw Brian kirkwood in the "real world-the movie'a cinematic take on the reality world! BTW which was a pretty well made take off/and decent horror movie take on the reality genre; and thought to myself "damn-hes hot!"

 now here he is as the hero of a first class horror movie that just happens to 

	You have Scott Campell and Kyra Sedgwick's two characters where one wants to start something, while another's unsure.
Opinion of 'characters':
	 'Sedgwick', 'two', 'wants'


Sentence:
	Another thing I liked about this movie, was the fact that the writers avoided any romantic involvement between Tom Cruise and Demi Moore's characters....something most other films on this scale would have done.
Opinion of 'characters':
	 'Moore'


Sentence:
	The actors and their characters create one of the greatest action-thirllers of all time.
Opinion of 'characters':
	 'their'


Sentence:
	Even with more colorful characters, there really is no need for this!
Opinion of 'characters':
	 'more', 'colorful'


Sentence:
	The four characters are the over the top Nico whos mother has no idea he is gay.
Opinion of 'characters':
	 'The', 'four'


Sentence:
	For a horror movie, it's not scary at all, not even a little suspense (though I kinda wanted the characters to die, so I guess that lost the suspense to m

	Although I'm sure you will want to see what happened to all of your favorite characters, your hopes will certainly be dashed and disappointed by a poor story and poor dialogue
Opinion of 'characters':
	 'your', 'favorite', ',', 'hopes'


Sentence:
	Love the movies that take time to show romantic development between the main characters and this one delivers it...from a woman's perspective.
Opinion of 'characters':
	 'the', 'main', 'and'


Sentence:
	Actually, the theme of the movie is not too far off modern times.  

Loved the striking scenes depicting the differences between North and South and its characters.
Opinion of 'characters':
	 'its'


Sentence:
	*
Devoid of empathetic characters, and bereft of any rational motivation of plot, "Hellraiser" is shocking for all the wrong reasons
Opinion of 'characters':
	 'empathetic', ','


Sentence:
	The characters stick with you and some of the phrases will be part of conversations for decades. "
Opinion of 'characters':
	 'The'


Sentence:


	Why I prefer the second interpretation in the last sentence to the first: when they recreate a key scene from the first movie, the actors portraying the characters in the first movie not only don't look like their first-movie counterparts, they couldn't possibly be more different.
Opinion of 'characters':
	 'the', 'in'


Sentence:
	You have to love a mob show that features characters doing impressions of The Godfather, Goodfellas, and Scarface.
Opinion of 'characters':
	 'doing'


Sentence:
	His woman-bashing, cat-loving, cold-blooded, bearded anti-terrorist expert is one of the better characters I have seen on film.
Opinion of 'characters':
	 'the', 'better', 'seen'


Sentence:
	But more importantly, he was fresh from his best character of his career, an incredibly well recieved heel, and in my opinion one of the best characters in WWE history.
Opinion of 'characters':
	 'the', 'best', 'in'


Sentence:
	You get to know the unique, even eccentric, characters as though they were real p

	The reflexive anti-Americanism of Spain and Europe are integral to many of the interactions experienced by Taylor and Eigeman, the two main characters.
Opinion of 'characters':
	 'the', 'two', 'main'


Sentence:
	A still shot is followed by a people shot which is followed by a still shot and so on; but they are coupled in such a fashion that the shot of the characters is the only real animation, the only real action, the only real drive-giving conversation an otherwise missing dynamic.
Opinion of 'characters':
	 'the'


Sentence:
	Then why do all the male characters still act egotistical?
Opinion of 'characters':
	 'all', 'the', 'male'


Sentence:
	If you want to watch an amazing movie get this movie and watch it because you will never see anything like this ever again russel's performance is incredible and denzel performence is also incredble over all this movie is one helleva movie because they both play very complex characters and they do very good jobs you won't see the sadistic a




Sentence:
	First, the dialogue and situations are so insipid they often inspire unintentional hilarity.
Opinion of 'dialogue':
	 'the', 'and', 'situations'


Sentence:
	Some portions of the TV dialogue are lost in thick Yorkshire accents and unintelligible dialect, interesting only to language scholars.
Opinion of 'dialogue':
	 'the', 'TV'


Sentence:
	Without contrived manipulations, you are drawn into the depths and heights of their emotions through the use of stunning visuals, dramatic portrayals, pithy dialogue, moments of quiet contemplation, and a serene musical soundtrack that is worth owning on its own merits.
Opinion of 'dialogue':
	 'pithy', ',', 'moments'


Sentence:
	It offers a winning combination of drama and comedy, a certain realism that is only acquired through good dialogue and silence.
Opinion of 'dialogue':
	 'good', 'and', 'silence'


Sentence:
	It's not above the occasional cheap joke but counters it with some of the smartest and most off-the-wall pop cultural ri

In [None]:
# %load solutions/opinion_extractor_initial

### Exercise
Look at the output that your opinion extractor is producing.

As you can see, it isn't very good. In the sections below, you will be looking at a variety of ways in which you can refine the `opinion_extractor` function to improve its performance. 

The work you will do in the remainder of this notebook forms part of the first assessed coursework for this module. This will involve developing and assessing the effectiveness of several extensions to the opinion extractor. 
- For full details of what is required for the coursework see the coursework specification document which can be downloaded from the module website.

Have a look at the section **Tips for de-bugging and exploration** (see below). 

As you are investigating how well your `opinion_extractor` is working, you will want to view your opinion extractor's output for a substantial number of sentences. You might prefer to print your output to a file. 

The code in the next cell illustrates how to do this.
- Note that you may wish to replace `"savefile.txt"` with a different path.

In [27]:
save_file_path = r"savefile.txt" # Set this to the location of the file you wish to create/overwrite with the saved output.

# This is a "with statement": it invokes a context manager, which handles the opening and closing of resources (like files).
# encoding="utf-8" is given explicitly because the platform default codec (e.g. 'charmap' on Windows) cannot encode
# some characters in the reviews, such as '\ufffd', and would raise a UnicodeEncodeError.
with open(save_file_path, "w", encoding="utf-8") as save_file:  # The "w" says that we want to write to the file
    for parsed_review in parsed_reviews:
        for sentence in parsed_review.sents:
            for aspect_token in aspect_words:
                opinions = opinion_extractor(aspect_token,sentence)
                if opinions:
                    save_file.write("--- Sentence ---\n{0}\nOpinions on {1}\n{2}\n".
                                    format(sentence,aspect_token,opinions))

## Required extensions
There are 5 required extensions:
1. Adjectival modification
2. Adjectives linked by copulae
3. Adverbial modifiers
4. Negation
5. Conjunction

In addition there are a number of optional extensions (see below).

### Example test set
In addition to testing out your opinion extractor on the DVD review dataset, we will also be looking at a very small set of examples that illustrate cases of each of the extensions that we will be making. 

Run the following cell to load up these sentences.
- Each example is a 3-tuple of the form  
`(<example_name>,<sentence>,<set of opinion words>)`

In [3]:
core = [("A.1","It has an exciting fresh plot.",set(["fresh", "exciting"])),
        ("B.1","The plot was dull.",set(["dull"])),
        ("C.1","It has an excessively dull plot.",set(["excessively-dull"])),
        ("C.2","The plot was excessively dull.",set(["excessively-dull"])),
        ("D.1","The plot wasn't dull.",set(["not-dull"])),
        ("D.2","It wasn't an exciting fresh plot.",set(["not-exciting", "not-fresh"])),
        ("D.3","The plot wasn't excessively dull.",set(["not-excessively-dull"])),
        ("E.1","The plot was cheesy, but fun and inspiring.",set(["cheesy", "fun", "inspiring"])),
        ("E.2","The plot was really cheesy and not particularly special.",set(["really-cheesy", "not-particularly-special"]))
       ]

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]
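
Since each example carries its expected set of opinion words, checking an extractor against the test set can be automated. The following is a minimal sketch, not part of the original notebook: `evaluate` and `toy_extract` are hypothetical names, and `toy_extract` is only a stand-in so that the sketch runs without a parser.

```python
def evaluate(examples, extract):
    """Compare extractor output with the expected opinion sets.

    examples: iterable of (name, sentence, expected_set) tuples, as in `core`.
    extract:  hypothetical callable mapping a sentence string to an iterable
              of opinion strings.
    Returns a list of (name, passed, missing, unexpected) tuples.
    """
    report = []
    for name, sentence, expected in examples:
        found = set(extract(sentence))
        missing = expected - found        # expected but not produced
        unexpected = found - expected     # produced but not expected
        passed = not missing and not unexpected
        report.append((name, passed, missing, unexpected))
    return report


# Toy stand-in extractor, used only to make this sketch runnable on its own.
def toy_extract(sentence):
    words = [w.strip(".") for w in sentence.split()]
    return [w for w in words if w in {"dull", "fresh"}]


toy_examples = [("A.1", "It has an exciting fresh plot.", {"fresh", "exciting"}),
                ("B.1", "The plot was dull.", {"dull"})]

for name, passed, missing, unexpected in evaluate(toy_examples, toy_extract):
    print(name, "PASS" if passed else "FAIL",
          "missing:", sorted(missing), "unexpected:", sorted(unexpected))
```

In the notebook itself, one could pass something like `lambda s: opinion_extractor("plot", nlp(s))` as `extract`, assuming `opinion_extractor` and `nlp` are defined as in the other cells.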

### Problem with displaCy

You have been encouraged to use the dependency tree visualisation tool found [here](https://demos.explosion.ai/displacy).

However, contrary to what the displaCy site says, the displaCy visualiser uses a *different* parsing model from the one that is installed by default with spaCy. As a result, there are cases where the trees differ.

It is possible to set up spaCy to use the same model as displaCy by doing the following:

```
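# First download the model by running this in a terminal (not in Python):
#     python -m spacy download en_core_web_md
import spacy
nlp = spacy.load('en')               # the default model
nlp2 = spacy.load('en_core_web_md')  # the model used by the displaCy visualiser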
```

However, I would recommend that you stick with the spaCy default model, and check for differences. In the cell below, you will find code that can be used to show the structure of the dependency tree produced by spaCy, and it shows an example where the tree differs from that shown by the displaCy visualiser.


In [6]:
def print_dep_tree(node,indent):
    print(indent,node)
    indent = indent + "  "
    for child in node.children:
        print_dep_tree(child,indent)

sent = "He is a complete idiot"   
parsed_sents = nlp(sent)
parsed_sent = next(parsed_sents.sents)
print_dep_tree(parsed_sent.root,"")


 is
   He
   idiot
     a
     complete


## Extension A: Adjectival modification
In this section, we are interested in adjectival modification. This is when we have a noun like "*dog*" or "*plot*", and there are one or more adjectives which are specifying the characteristics of that noun. E.g. "*big brown dog*" or "*exciting fresh plot*" ("*big*" and "*brown*" are both adjectivally modifying "*dog*").

The dependency relation we use to show this relationship is `amod`.

Write a version of the opinion extraction function which, when given sentences such as the example below containing an aspect token (e.g. "*plot*"), uses the `amod` relations to extract a list of the adjectival modifiers of the aspect token (e.g. the two words "*exciting*" and "*fresh*" in this case).

**Core Example A.1**: "*It has an exciting fresh plot.*" should produce "*fresh*", "*exciting*"

The dependency tree for this sentence is shown below.

![Extension 1 example](./img/amod_example.png)

### Exercise
Adapt your opinion extractor so that it just finds adjectival modifiers, and apply it to the example test set in order to check that your function is working as required.
- Note, your opinion extractor should only get the first example, A.1, correct.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [11]:
def opinion_extractor(aspect_token, parsed_sentence):
    opinions = list()
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:
                if child.pos_ == 'ADJ' and child.dep_ == 'amod':
                        opinions.append(child.orth_)
    return opinions

aspect_words = ["plot","characters","cinematography","dialogue"]
sentence = "It has an exciting fresh plot."
parsed_sent = nlp(sentence)

for word in aspect_words:
    if word in sentence:
        print("Results for word '{}'".format(word))
        print("\tSentence: {}".format(sentence))
        print("\t",*opinion_extractor(word, parsed_sent))

Results for word 'plot'
	Sentence: It has an exciting fresh plot.
	 exciting fresh


## Extension B: Adjectives linked by copulae
In this section, we are interested in adjectives which are linked to our aspect term via the copula (conjugations of "*to be*": "*is*", "*was*", "*will be*", etc.). 

Notice that if we were only looking for `amod` relations, we'd completely miss the word "*dull*" in the dependency tree shown below.

Your opinion extraction function, when given a sentence like the example below containing the aspect token "*plot*", should use appropriate dependency relations to output the opinion word "*dull*".

**Core Example B.1**: "*The plot was dull.*" should produce "*dull*"

The dependency tree for this sentence is shown below.

![Extension 2 example](./img/copula_example.png)

### Exercise
Extend the opinion extractor as described above and apply it to the example test set in order to check that your function is working as required.
- Your opinion extractor should now work on examples A.1 and B.1.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [11]:
def opinion_extractor(aspect_token, parsed_sentence):
    opinions = list()
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:
                if child.dep_ == 'amod':
                        opinions.append(child.orth_)
            if token.dep_ == 'nsubj':
                for child in token.head.children:
                    if child.dep_ == 'acomp':
                        opinions.append(child.orth_)
    return opinions

aspect_words = ["plot","characters","cinematography","dialogue"]
sentence = "The plot was dull." 
parsed_sent = nlp(sentence)

for word in aspect_words:
    if word in sentence:
        print("Results for word '{}'".format(word))
        print("\tSentence: {}".format(sentence))
        print("\t", opinion_extractor(word, parsed_sent))

Results for word 'plot'
	Sentence: The plot was dull.
	 ['dull']


## Extension C: Adverbial modifiers
If you used the extractor you have built so far on the example sentences below, it would only find the opinion "*dull*". It would not recover an indication of the strength of the opinion. Adverbs like "*excessively*" elaborate on the adjectives that they modify in adverbial modification relations.

The dependency relation we use to show this relationship is `advmod`.

Your opinion extraction function, when given a sentence like those below containing the aspect token "*plot*", should use the `advmod` relation to output features like "*excessively-dull*".
- If you have an adjective token in a variable `adj_token`, and an adverb in a variable `adv_token`, then you could create this feature like this: `adv_token.orth_ + "-" + adj_token.orth_`.
- If you have a list of strings, you can use python's `join` function to concatenate them into a single string. The following would join the strings together, placing a `"-"` between each:  
`joined_string = "-".join(listofstrings)`
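
The `join` tip above can be wrapped in a small helper so that features are assembled the same way everywhere. This is only a sketch; `make_feature` is a hypothetical name, and skipping empty parts is a design choice of this example rather than a requirement.

```python
def make_feature(words):
    """Join the parts of an opinion feature with hyphens, skipping empty parts."""
    return "-".join(w for w in words if w)


print(make_feature(["excessively", "dull"]))
print(make_feature(["not", "excessively", "dull"]))
print(make_feature(["", "dull"]))
```

Skipping empty strings means an optional prefix can be passed in as `""` without leaving a stray leading hyphen.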


**Core Example C.1**: "*It has an excessively dull plot.*" should produce "*excessively-dull*"  
**Core Example C.2**: "*The plot was excessively dull.*" should produce "*excessively-dull*"

The dependency trees for these sentences are shown below.

![Extension 3 example](./img/advmod_example_1.png)

![Extension 3 example 2](./img/advmod_example_2.png)

### Exercise
Extend the opinion extractor as described above and apply it to the example test set in order to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, and C.2.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [12]:
def opinion_extractor(aspect_token, parsed_sentence):
    opinions = list()
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            # Adjectival modifiers of the aspect word (amod)
            for child in token.children:
                if child.pos_ == 'ADJ' and child.dep_ == 'amod':
                    opinions.extend(with_adverbs(child))
            # Adjectives linked to the aspect word by a copula (acomp)
            if token.head.pos_ == 'VERB' and token.dep_ == 'nsubj':
                for child in token.head.children:
                    if child.pos_ == 'ADJ' and child.dep_ == 'acomp':
                        opinions.extend(with_adverbs(child))
    return opinions

def with_adverbs(adj):
    # One "adverb-adjective" feature per adverbial modifier of the adjective.
    # If the adjective has no adverbial modifiers (even if it has other children),
    # return the bare adjective so that it is not lost.
    adverbs = [c for c in adj.children if c.pos_ == 'ADV' and c.dep_ == 'advmod']
    if adverbs:
        return [adv.orth_ + "-" + adj.orth_ for adv in adverbs]
    return [adj.orth_]

aspect_words = ["plot","characters","cinematography","dialogue"]
sentence =  "The plot was excessively dramatically dull." 
parsed_sent = nlp(sentence)

for word in aspect_words:
    if word in sentence:
        print("Results for word '{}'".format(word))
        print("\tSentence: {}".format(sentence))
        print("\t", *opinion_extractor(word, parsed_sent))

Results for word 'plot'
	Sentence: The plot was excessively dramatically dull.
	 excessively-dull dramatically-dull


## Extension D: Negation
Look at the tree below; it is an example of an adjective linked by a copula. Your existing opinion extractor would extract "dull". However, notice that the example is saying that the plot was not dull! This is an example of the use of negation.

The dependency relation we use to show this relationship is `neg`.

Your opinion extraction function, when given sentences like those below containing the aspect token "*plot*", should use the `neg` relation to output features like "*not-dull*". If you have an adjective token in a variable `token`, then you could create this feature like this: `"not-" + token.orth_`.

**Core Example D.1**: "*The plot wasn't dull.*" should produce "*not-dull*"  
**Core Example D.2**: "*It wasn't an exciting fresh plot.*" should produce "*not-exciting*", "*not-fresh*"  
**Core Example D.3**: "*The plot wasn't excessively dull.*" should produce "*not-excessively-dull*"

The dependency trees for these sentences are shown below.

![Extension 4 example 1](./img/negation_example_1.png)

![Extension 4 example 2](./img/negation_example_2.png)

![Extension 4 example 3](./img/negation_example_3.png)
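
One way to think about the negation feature is as an optional `not` prefix placed in front of whatever adverb and adjective parts have already been found. The sketch below illustrates this on a hand-built tree. `Tok` is a hypothetical stand-in for a spaCy token, the attachments in the tree are assumptions rather than real parser output, and the root's `head` is `None` here (in spaCy the root is its own head). Checking both the adjective and its head for a `neg` child is also an assumption about where the parser attaches the negation.

```python
class Tok:
    """Hypothetical stand-in for a spaCy token (only the attributes used here)."""
    def __init__(self, orth_, dep_, head=None):
        self.orth_ = orth_
        self.dep_ = dep_
        self.head = head          # None for the root in this sketch
        self.children = []
        if head is not None:
            head.children.append(self)


def has_neg(token):
    """True if any child of `token` is attached by a `neg` arc."""
    return any(child.dep_ == "neg" for child in token.children)


def negated_feature(adj, adverbs=()):
    """Build e.g. 'not-excessively-dull' from an adjective and its adverbs."""
    negated = has_neg(adj) or (adj.head is not None and has_neg(adj.head))
    parts = (["not"] if negated else []) + [a.orth_ for a in adverbs] + [adj.orth_]
    return "-".join(parts)


# Hand-built tree for "The plot wasn't excessively dull." (attachments assumed).
was = Tok("was", "ROOT")
Tok("plot", "nsubj", was)
Tok("n't", "neg", was)
dull = Tok("dull", "acomp", was)
excessively = Tok("excessively", "advmod", dull)

print(negated_feature(dull, [excessively]))
```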

### Exercise
Extend the opinion extractor as described above and apply it to the example test set in order to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, C.2, D.1, D.2, and D.3.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [47]:
def opinion_extractor(aspect_token, parsed_sentence):
    opinions = list()
    
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:
                if child.pos_ == 'ADJ' and child.dep_ == 'amod':
                    if has_advmod(child):
                        for sub_child in child.children:                        
                            if sub_child.pos_ == 'ADV' and sub_child.dep_ == 'advmod':
                                opinions.append(sub_child.orth_ + "-" + child.orth_)
                    elif has_neg(token):
                        for sub_child in token.children:                        
                            if sub_child.pos_ == 'ADV' and sub_child.dep_ == 'neg':
                                opinions.append("not-" + child.orth_)
                    else:
                        opinions.append(child.orth_)
            if token.head.pos_ == 'VERB' and token.dep_ == 'nsubj':
                for child in token.head.children:
                    if child.pos_ == 'ADJ' and child.dep_ == 'acomp':                        
                        if has_advmod(child):
                            for sub_child in child.children:                        
                                if sub_child.pos_ == 'ADV' and sub_child.dep_ == 'advmod':
                                    if has_neg(child):
                                        for sub_child_1 in child.children:
                                            if sub_child_1.pos_ == 'ADV' and sub_child_1.dep_ == 'neg':
                                                opinions.append("not-" + sub_child.orth_ + "-" + child.orth_)
                                    else:
                                        opinions.append(sub_child.orth_ + "-" + child.orth_)
                        elif has_neg(child): 
                            for sub_child in child.children:                        
                                if sub_child.pos_ == 'ADV' and sub_child.dep_ == 'neg':
                                    opinions.append("not-" + child.orth_)
                        else:
                            opinions.append(child.orth_)
                            
    return opinions

def has_advmod(token):
    for child in token.children:
        if child.pos_ == 'ADV' and child.dep_ == 'advmod':
            return True

def has_neg(token):
    for child in token.children:
        if child.pos_ == 'ADV' and child.dep_ == 'neg':
            return True

aspect_words = ["plot","characters","cinematography","dialogue"]
for doc in core:
    for word in aspect_words:
        if word in doc[1]:  # test the example sentence itself, not a variable left over from an earlier cell
            print("Results for word '{}'".format(word))
            print("\tSentence: {}".format(doc[1]))
            print("\t", opinion_extractor(word, nlp(doc[1])))
            print("--------------------------------------------------------------------------------------------")



Results for word 'plot'
	Sentence: It has an exciting fresh plot.
	 ['exciting', 'fresh']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot was dull.
	 ['dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: It has an excessively dull plot.
	 ['excessively-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot was excessively dull.
	 ['excessively-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot wasn't dull.
	 ['not-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: It wasn't an exciting fresh plot.
	 ['not-exciting', 'not-fresh']
---------------------------

## Extension E: Conjunction
If you used your existing extractor on the tree below, it would only extract "*cheesy*". However, "*fun*" and "*inspiring*" are both conjoined with "*cheesy*"; this means that they all apply to the subject ("*plot*").

This conjunction relation is shown via the `conj` dependency. Note that words other than adjectives can be the conjuncts. You could investigate whether this is a problem.

**Core Example E.1**: "*The plot was cheesy but fun and inspiring.*" should produce "*cheesy*", "*fun*", "*inspiring*"  
**Core Example E.2**: "*The plot was really cheesy and not particularly special.*" should produce "*really-cheesy*", "*not-particularly-special*"

The dependency trees for these sentences are shown below.

![Extension 5 example](./img/conj_example_1.png)

![Extension 5 example 2](./img/conj_example_2.png)
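
Conjuncts can be chained, so following `conj` arcs recursively and filtering by part of speech is one way to collect them while ignoring non-adjective conjuncts. The sketch below is run on a hand-built tree: `Tok` is a hypothetical stand-in for a spaCy token, `conjuncts` is a hypothetical helper, and the attachments shown are an assumption, not real parser output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (only the attributes used here)."""
    orth_: str
    dep_: str
    pos_: str
    children: List["Tok"] = field(default_factory=list)


def conjuncts(token, keep_pos=("ADJ",)):
    """Return tokens reachable from `token` through chains of `conj` arcs,
    keeping only those whose POS tag is in `keep_pos`."""
    found = []
    for child in token.children:
        if child.dep_ == "conj":
            if child.pos_ in keep_pos:
                found.append(child)
            # Conjuncts may be chained, so keep following the arcs.
            found.extend(conjuncts(child, keep_pos))
    return found


# Hand-built tree for "cheesy but fun and inspiring" (attachments assumed).
inspiring = Tok("inspiring", "conj", "ADJ")
fun = Tok("fun", "conj", "ADJ", [Tok("and", "cc", "CCONJ"), inspiring])
cheesy = Tok("cheesy", "acomp", "ADJ", [Tok("but", "cc", "CCONJ"), fun])

print([t.orth_ for t in conjuncts(cheesy)])
```

Passing a wider `keep_pos` tuple is a quick way to investigate how often non-adjective conjuncts turn up in the review data.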

### Exercise
Extend the opinion extractor as described above and apply it to the example test set in order to check that your function is working as required.
- Your opinion extractor should now work on all of the required examples.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [118]:
core = [("A.1","It has an exciting fresh plot.",set(["fresh", "exciting"])),
        ("B.1","The plot was dull.",set(["dull"])),
        ("C.1","It has an excessively dull plot.",set(["excessively-dull"])),
        ("C.2","The plot was excessively dull.",set(["excessively-dull"])),
        ("D.1","The plot wasn't dull.",set(["not-dull"])),
        ("D.2","It wasn't an exciting fresh plot.",set(["not-exciting", "not-fresh"])),
        ("D.3","The plot wasn't excessively dull.",set(["not-excessively-dull"])),
        ("E.1","The plot was cheesy but fun and inspiring.",set(["cheesy", "fun", "inspiring"])),
        ("E.2","The plot was really cheesy and not particularly special.",set(["really-cheesy", "not-particularly-special"]))
       ]

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]

In [27]:
def print_dep_tree(node,indent):
    print(indent,node)
    indent = indent + "  "
    for child in node.children:
        print_dep_tree(child,indent)

sent = "It wasn't an exciting fresh plot."
parsed_sents = nlp(sent)
parsed_sent = next(parsed_sents.sents)
print_dep_tree(parsed_sent.root,"")

 was
   It
   plot
     n't
     an
     exciting
     fresh
   .


In [146]:
def opinion_extractor(aspect_token, parsed_sentence):
    opinions = list()
    for token in parsed_sentence:
        # Only consider tokens that are linked to the aspect word (see noun_relation below).
        if noun_relation(token, aspect_token):
            if token.dep_ in ('acomp', 'amod', 'conj'):
                # An adjective with an adverbial modifier is emitted by the advmod branch below,
                # so only bare adjectives are emitted here.
                if not has_advmod(token):
                    opinions.append(is_neg(token) + token.orth_)
            elif token.dep_ == 'advmod':
                opinions.append(is_neg(token) + token.orth_ + "-" + token.head.orth_)
    return opinions

def noun_relation(token, ancestor):
    for anc in token.ancestors:
        if anc.orth_ == ancestor:
            return True
        elif anc.pos_ == 'VERB':
            for kid in anc.children:
                if kid.orth_ == ancestor:
                    return True

def is_neg(token):
    if has_neg(token) or has_neg(token.head):
        return "not-"
    return ""

def has_neg(token):
    for child in token.children:
        if child.dep_ == 'neg':
            return True
        
def has_advmod(token):
    for child in token.children:
        if child.dep_ == 'advmod':
            return True

aspect_words = ["plot","characters","cinematography","dialogue"]
extensions = [doc[0] for doc in core]
sentences = [doc[1] for doc in core]
expected = [doc[2] for doc in core]
opinions = [[opinion_extractor(word, nlp(doc[1])) for word in aspect_words if word in doc[1]] for doc in core]

#df = pd.DataFrame(list(zip(extensions, sentences, expected, opinions)), columns = ["Extension", "Sentence", "Expected", "Opinion"])
#display(df)
for doc in core:
    for word in aspect_words:
        if word in doc[1]:
            print("Results for word '{}'".format(word))
            print("\tSentence: {}".format(doc[1]))
            print("\t", opinion_extractor(word, nlp(doc[1])))
            print("--------------------------------------------------------------------------------------------")


Results for word 'plot'
	Sentence: It has an exciting fresh plot.
	 ['exciting', 'fresh']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot was dull.
	 ['dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: It has an excessively dull plot.
	 ['excessively-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot was excessively dull.
	 ['excessively-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: The plot wasn't dull.
	 ['not-dull']
--------------------------------------------------------------------------------------------
Results for word 'plot'
	Sentence: It wasn't an exciting fresh plot.
	 ['not-exciting', 'not-fresh']
---------------------------

## Additional extensions

This section presents some examples on which your current opinion extractor will fail. In all of the examples below, "plot" is the aspect token.

You are not required to extend your opinion extractor to handle these cases.

**Optional Example A**: "The script and plot are utterly excellent." produces "utterly-excellent"  
**Optional Example B**: "The script and plot were unoriginal and boring." produces "unoriginal", "boring"  
**Optional Example C**: "The plot wasn't lacking." produces "not-lacking"  
**Optional Example D**: "The plot is full of holes." produces "full-of-holes"  
**Optional Example E**: "There was no logical plot to this story." produces "no-logical"  
**Optional Example F**: "I loved the plot." produces "loved"  
**Optional Example G**: "I didn't mind the plot." produces "not-mind"

The dependency trees for these sentences are shown below.

![Additional example 1](./img/additional_example_1.png)

![Additional example 2](./img/additional_example_2.png)

![Additional example 3](./img/additional_example_3.png)

![Additional example 4](./img/additional_example_4.png)

![Additional example 5](./img/additional_example_5.png)

![Additional example 6](./img/additional_example_6.png)

![Additional example 7](./img/additional_example_7.png)

## Tips for de-bugging and exploration
When assessing whether your opinion extractor has been effective on a given sentence, read the sentence carefully and determine for yourself the scope of the words before you look at what the dependency parser says. Consider the following sentence.

"This film has excellent characters and an intriguing and engaging plot."

It should be obvious to you that here the plot is described as both "intriguing" and "engaging". However, "excellent" is only used to describe the characters.

If the parser suggests a structure which implies that plot is also described by "excellent" (for example), something has gone wrong.

### Dependency tree visualisation tool
You may find it useful to use the dependency tree visualisation tool found [here](https://demos.explosion.ai/displacy).

You can copy and paste example sentences from the DVD review data to get a good sense of what the dependency parse looks like.

### Outputting results only from sentences relevant to the current task
You will find that your output is dominated by examples of adjectival modification and adjectives linked via the copula. This means that when you add a new function (extensions 3-5) it will be difficult to determine the impact of that new functionality.

One way to solve this problem is to (temporarily) output only those features produced by the new functionality.

For example, imagine you have just completed extensions 1 and 2. Next, you write code that adds the adverbial features (extension 3). When assessing how well your code is working, let your extractor only extract the "new" adverb features.

There are 2 easy ways to achieve this:

1. Comment out any extractor code that produces features that you're not currently interested in.
2. Introduce a boolean variable, which you only set to `True` when you have extracted the feature that you are interested in. Then always output an empty list if the variable is `False`, otherwise output the full opinion list.
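
The boolean-flag idea can be sketched as a wrapper around the list an extractor returns, which is a slight variation on setting the flag inside the extractor itself. The names `filter_new_features` and `looks_adverbial` are hypothetical, and treating any hyphenated feature as an adverb feature is an assumption made only for this illustration.

```python
def filter_new_features(opinions, is_new):
    """Return the full opinion list only if at least one opinion satisfies
    `is_new`; otherwise return an empty list."""
    found_new = False
    for opinion in opinions:
        if is_new(opinion):
            found_new = True
    return opinions if found_new else []


# Assumption for this sketch: adverb features are the hyphenated ones.
def looks_adverbial(opinion):
    return "-" in opinion


print(filter_new_features(["dull"], looks_adverbial))
print(filter_new_features(["exciting", "excessively-dull"], looks_adverbial))
```

Because the filter sits outside the extractor, it can be removed again without touching the extraction code once the new functionality has been checked.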