# Topic 6: Opinion Extraction

## Preliminaries 
Run this cell.

In [57]:
import sys
# sys.path.append(r'\\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources')
sys.path.append(r'/Users/warrenboult/Documents/MSC/nlp/resources/')
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import collections
from collections import defaultdict,Counter
from itertools import zip_longest
from IPython.display import display
from random import seed
get_ipython().magic('matplotlib inline')
import random
import math
import matplotlib.pylab as pylab
%matplotlib inline
params = {'legend.fontsize': 'large',
          'figure.figsize': (15, 5),
         'axes.labelsize': 'large',
         'axes.titlesize':'large',
         'xtick.labelsize':'large',
         'ytick.labelsize':'large'}
pylab.rcParams.update(params)
from pylab import rcParams
from operator import itemgetter, attrgetter, methodcaller
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import seaborn as sns
import csv

## Overview
In this topic you will be looking at ways to extract opinion-bearing words from Amazon DVD reviews. The goal is to find words that describe particular aspects of the film being reviewed. The specific aspects of films that we will be considering are: the **plot**, the **characters**, the **cinematography** and the **dialogue**. 

We are, in other words, interested in finding all of those words in a review that express the reviewer's opinion about one of these aspects of the film. The idea is that this will provide a fine-grained characterisation of the opinion being expressed by the author of the review. We will refer to the words we are looking for as **opinion words**, and refer to the words used for particular aspects of the film as **aspect words**.

Following on from the previous topic's material on dependency parsing, you will use spaCy's parser output as the basis for identifying opinion words. This is based on the assumption that the opinion words we are looking for are words that occur in a sentence in the review in a particular (dependency) relationship to one of our aspect words (plot, characters, cinematography and dialogue).

For example, the opinion word "*amazing*" might be found because it is used in a sentence where it is an adjective modifying the aspect word "*plot*", as in the sentence "*I thought it had an amazing plot.*".

### Exercise
Run the cell below to set up spaCy and load in a corpus of Amazon DVD reviews.

In [58]:
import spacy
from sussex_nltk.corpus_readers import AmazonReviewCorpusReader

nlp = spacy.load('en')
dvd_reviews = [review for review in AmazonReviewCorpusReader().category("dvd").raw()]
print("The dvd review dataset contains {} reviews".format(len(dvd_reviews)))
parsed_reviews = [nlp(review) for review in dvd_reviews]
print('done')


The dvd review dataset contains 5491 reviews
done


### Exercise
We are now going to create code that finds all dependents and all heads of a list of aspect words. This code will be similar to the final exercise of the previous topic.

In the blank cell below, write code that takes a set of aspect words ("*plot*","*characters*","*cinematography*", and "*dialogue*") as an argument, and produces a dictionary that maps each aspect word to a dictionary with two keys: `deps` and `heads`, where the `deps` key maps to the list of dependents of the aspect word and `heads` maps to the list of heads of the aspect word. 

- Note that our aspect words are nouns.
- Note that our aspect words are not all lemmas.

Display your results for dependents and heads in separate tables. For example, suppose we have the following:
- `aspect_words` is a list of the aspect words we are interested in,
- `all_heads` is a list of lists of the aspect words' heads, and
- `all_dependents` is a list of lists of the aspect words' dependents


Given these, the following code will display the results:

```
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
print("Heads")
display(df_heads)
print("Dependents")
display(df_deps)
```
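
To see why `zip_longest` is used here, the following minimal sketch may help. It uses made-up head lists (not corpus output) to show how `zip_longest` turns lists of unequal length into table rows, padding the shorter lists with `None`, which is then replaced by an empty string just as the display code does.

```python
from itertools import zip_longest

# Hypothetical head lists for two aspect words (illustrative values only)
all_heads = [["was", "holes", "is"], ["talk"]]

# Transpose: row i holds the i-th head of every aspect word;
# shorter lists are padded with None
rows = list(zip_longest(*all_heads))

# Replace the padding with empty strings, as the display code does
cleaned = [tuple("" if cell is None else cell for cell in row) for row in rows]

for row in cleaned:
    print(row)
```

Each column of the resulting table therefore belongs to one aspect word, and the number of rows equals the length of the longest list.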

Run your code and look at the output.

In [None]:
aspect_words = ["plot","characters","cinematography","dialogue"]
word_dict = defaultdict(lambda: defaultdict(list))
for review in parsed_reviews:
    for token in review:
        # The aspect word itself (not its head) must be tagged as a noun
        if token.pos_ == 'NOUN' and token.text in aspect_words:
            word_dict[token.text]["heads"].append(token.head.text)
            for child in token.children:
                word_dict[token.text]["deps"].append(child.text)
all_heads = [word_dict[asp_wd]["heads"] for asp_wd in aspect_words]
all_dependents = [word_dict[asp_wd]["deps"] for asp_wd in aspect_words]
print('Done building lists')
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
display(df_heads)
print("Dependents")
display(df_deps)
        

In [None]:
# %load solutions/all_heads_and_deps
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for parsed_review in parsed_reviews:
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append(token.head.orth_)
            for child in token.children:
                linked_words[token.orth_]["deps"].append(child.orth_)
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


Our goal here is to find out what the reviewer is saying about the different aspects we are interested in. 
As you can see, not all of the words being returned are informative. In the next two exercises we look at how to (partially) address this.

### Exercise
Make a copy of your solution to the last exercise and position it below this cell.

Then adapt the code so that instead of creating lists containing just the tokens, we record the following other features:
- For the heads:
 - the lemma of the head,
 - the part of speech of the head, 
 - the dependency between the head and the aspect word
- For the dependents:
 - the lemma of the dependent,
 - the part of speech of the dependent,
 - the dependency between the dependent and the aspect word
 
What is the point of this?

The idea is that with this additional information, we will be in a better position to work out what makes heads and dependents uninteresting. For example, we might want to remove punctuation.

Once you have made the necessary changes to the code, run the code on the dataset and look at the outputs to see if you can figure out ways of spotting the uninteresting entries in the table.
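
One way to spot uninteresting entries is to count how often each combination of part of speech and dependency relation occurs, since very frequent combinations such as determiners are good candidates for removal. The sketch below is only an illustration of that idea: the `dependents` list is a small hand-written sample in the `(word, lemma, POS, dep)` format, and `pattern_counts` is a name introduced here.

```python
from collections import Counter

# A small hand-written sample of (word, lemma, POS, dep) tuples (illustrative only)
dependents = [
    ("the", "the", "DET", "det"),
    ("the", "the", "DET", "det"),
    ("usual", "usual", "ADJ", "amod"),
    ("predictable", "predictable", "ADJ", "amod"),
    (",", ",", "PUNCT", "punct"),
    ("and", "and", "CCONJ", "cc"),
    ("two", "two", "NUM", "nummod"),
]

# Count how often each (POS, dependency) combination occurs
pattern_counts = Counter((pos, dep) for _, _, pos, dep in dependents)

for (pos, dep), count in pattern_counts.most_common():
    print(f"{pos:6} {dep:8} {count}")
```

On the real data you would build the list from your own `deps` and `heads` entries and then inspect the most common patterns by eye before deciding what to filter.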
 

In [17]:
aspect_words = ["plot","characters","cinematography","dialogue"]
word_dict = defaultdict(lambda: defaultdict(list))
for review in parsed_reviews:
    for token in review:
        if token.pos_ == 'NOUN' and token.text in aspect_words:
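            # The fourth field is the head's own relation (token.head.dep_);
            # token.dep_ would give the relation between the aspect word and its head.
            word_dict[token.text]["heads"].append((token.head.text,token.head.lemma_,token.head.pos_,token.head.dep_))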
            for child in token.children:
                word_dict[token.text]["deps"].append((child.text,child.lemma_,child.pos_,child.dep_))
all_heads = [word_dict[asp_wd]["heads"] for asp_wd in aspect_words]
all_dependents = [word_dict[asp_wd]["deps"] for asp_wd in aspect_words]
print('Done building lists')
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Tuple format: (word, lemma, POS, dep. relation)")
print("Heads")
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
display(df_heads)
print("Dependents")
display(df_deps)
        

Done building lists
Tuple format: (word, lemma, POS, dep. relation)
Heads


|  | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (was, be, VERB, ROOT) | (talk, talk, VERB, ROOT) | (cramped, cramp, VERB, conj) | (sounds, sound, VERB, conj) |
| 1 | (to, to, ADP, prep) | (of, of, ADP, prep) | (is, be, VERB, conj) | (kept, keep, VERB, ROOT) |
| 2 | (holes, hole, NOUN, dobj) | (know, know, VERB, ccomp) | (is, be, VERB, ROOT) | (is, be, VERB, ROOT) |
| 3 | (is, be, VERB, ROOT) | (are, be, VERB, ccomp) | ('s, have, VERB, advcl) | (with, with, ADP, prep) |
| 4 | ('s, have, VERB, ROOT) | (lumped, lump, VERB, ROOT) | (despite, despite, ADP, prep) | (were, be, VERB, ROOT) |
| 5 | (characters, character, NOUN, pobj) | (saying, say, VERB, ccomp) | (was, be, VERB, ROOT) | (developing, develope, VERB, pcomp) |
| 6 | (centers, center, NOUN, ROOT) | (for, for, ADP, prep) | (did, do, VERB, relcl) | (past, past, ADP, prep) |
| 7 | (for, for, ADP, prep) | (of, of, ADP, prep) | (costumes, costume, NOUN, conj) | (given, give, VERB, pcomp) |
| 8 | (irrelevant, irrelevant, ADJ, ccomp) | (between, between, ADP, prep) | (make, make, VERB, ROOT) | (of, of, ADP, prep) |
| 9 | (was, be, VERB, ROOT) | (sees, see, VERB, relcl) | (was, be, VERB, ROOT) | (was, be, VERB, ROOT) |


Dependents


|  | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (the, the, DET, det) | (The, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) |
| 1 | (the, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) | (Her, -PRON-, ADJ, poss) |
| 2 | (the, the, DET, det) | (other, other, ADJ, amod) | (the, the, DET, det) | (The, the, DET, det) |
| 3 | (the, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) | (the, the, DET, det) |
| 4 | (usual, usual, ADJ, amod) | (other, other, ADJ, amod) | (worst, wrong, ADJ, amod) | (and, and, CCONJ, cc) |
| 5 | (predictable, predictable, ADJ, amod) | (two, two, NUM, nummod) | (ever, ever, ADV, advmod) | (some, some, DET, conj) |
| 6 | (romance, romance, NOUN, compound) | (Black, black, ADJ, amod) | (the, the, DET, det) | (the, the, DET, det) |
| 7 | (the, the, DET, det) | (,, ,, PUNCT, punct) | (beautiful, beautiful, ADJ, amod) | (bad, bad, ADJ, amod) |
| 8 | (the, the, DET, det) | (loyal, loyal, ADJ, amod) | (the, the, DET, det) | (and, and, CCONJ, cc) |
| 9 | (the, the, DET, det) | (several, several, ADJ, amod) | (the, the, DET, det) | (some, some, DET, conj) |


In [None]:
# %load solutions/all_heads_deps_features
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for parsed_review in parsed_reviews:
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append((token.head.orth_,token.head.lemma_,token.head.pos_,token.dep_))
            for child in token.children:
                linked_words[token.orth_]["deps"].append((child.orth_,child.lemma_,child.pos_,child.dep_))
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Tuple format: (word, lemma, POS, dep. relation)")
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


### Exercise
We are now ready to filter out some of the entries in the table that are uninteresting.
To do this, create the following sets:

```
unwanted_head_deps = set()
unwanted_head_lemmas = set()
unwanted_head_pos = set()
unwanted_dependent_deps = set()
unwanted_dependent_pos = set()
unwanted_dependent_lemmas = set()
```

The idea is that you will manually add entries to these sets as you spot ways to eliminate uninteresting entries in the table.

Undertake the following steps:
1. Define an initial version of these sets based on your observations of the tables produced in the previous exercise.
2. Adapt the code so that these sets are used to filter out unwanted tokens before they are added to your lists of heads and dependents. 
 - one option here is to define two functions: `uninteresting_head(token)` and `uninteresting_dependent(token)`.
3. Run the code to see how well it has improved.
4. Keep adding to the sets and re-running your code until you are satisfied that you can't improve them any more.

In [23]:
unwanted_head_deps = set(['prep','advcl','relcl','xcomp','pobj','pcomp'])
unwanted_head_lemmas = set(['be','to','have','do','for','of','a','off','with'])
unwanted_head_pos = set(['ADP','PART'])
unwanted_dependent_deps = set(['det','relcl','cc','punct','prep','poss','nummod','conj'])
unwanted_dependent_pos = set(['DET','ADP','NUM','SPACE','CCONJ','PUNCT','PROPN',])
unwanted_dependent_lemmas = set(['the','other','and','some','of','or','to','be','a','this','-PRON-'])

def uninteresting_head(token):
    return token.lemma_ in unwanted_head_lemmas or token.pos_ in unwanted_head_pos or token.dep_ in unwanted_head_deps

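def uninteresting_dependent(token):
    # Note: the final test consults unwanted_head_deps, so unwanted_dependent_deps is never used;
    # swap it in if dependents should be filtered by their own set of relations.
    return token.lemma_ in unwanted_dependent_lemmas or token.pos_ in unwanted_dependent_pos or token.dep_ in unwanted_head_deps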

aspect_words = ["plot","characters","cinematography","dialogue"]
word_dict = defaultdict(lambda: defaultdict(list))
for review in parsed_reviews:
    for token in review:
        if token.pos_ == 'NOUN' and token.text in aspect_words:
            if not uninteresting_head(token.head):
                word_dict[token.text]["heads"].append((token.head.text,token.head.lemma_,token.head.pos_,token.head.dep_))
            for child in token.children:
                if not uninteresting_dependent(child):
                    word_dict[token.text]["deps"].append((child.text,child.lemma_,child.pos_,child.dep_))
all_heads = [word_dict[asp_wd]["heads"] for asp_wd in aspect_words]
all_dependents = [word_dict[asp_wd]["deps"] for asp_wd in aspect_words]
print('Done building lists')
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Tuple format: (word, lemma, POS, dep. relation)")
print("Heads")
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
display(df_heads)
print("Dependents")
display(df_deps)

Done building lists
Tuple format: (word, lemma, POS, dep. relation)
Heads


|  | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (holes, hole, NOUN, dobj) | (talk, talk, VERB, ROOT) | (cramped, cramp, VERB, conj) | (sounds, sound, VERB, conj) |
| 1 | (centers, center, NOUN, ROOT) | (know, know, VERB, ccomp) | (costumes, costume, NOUN, conj) | (kept, keep, VERB, ROOT) |
| 2 | (irrelevant, irrelevant, ADJ, ccomp) | (lumped, lump, VERB, ROOT) | (make, make, VERB, ROOT) | (cover, cover, VERB, ROOT) |
| 3 | (uninvolving, uninvolving, ADJ, ccomp) | (saying, say, VERB, ccomp) | (cinematography, cinematography, NOUN, ROOT) | (heavy, heavy, ADJ, amod) |
| 4 | (line, line, NOUN, nsubj) | (plot, plot, NOUN, nsubj) | (scenography, scenography, NOUN, nsubj) | (heavy, heavy, ADJ, amod) |
| 5 | (dialogue, dialogue, NOUN, nsubj) | (presented, present, VERB, ROOT) | (direction, direction, NOUN, conj) | (forced, force, VERB, ROOT) |
| 6 | (lost, lose, VERB, ROOT) | (locations, location, NOUN, conj) | (cast, cast, NOUN, intj) | (scenes, scene, NOUN, nsubj) |
| 7 | (summary, summary, NOUN, attr) | (getting, get, VERB, ccomp) | (undermined, undermine, VERB, ROOT) | (understand, understand, VERB, ccomp) |
| 8 | (got, get, VERB, ccomp) | (going, go, VERB, ROOT) | (photography, photography, NOUN, nsubj) | (wavers, waver, NOUN, conj) |
| 9 | (revel, revel, NOUN, ROOT) | (hit, hit, VERB, ccomp) | (acting, act, NOUN, conj) | (script, script, NOUN, nsubj) |


Dependents


|  | plot | characters | cinematography | dialogue |
|---|---|---|---|---|
| 0 | (usual, usual, ADJ, amod) | (Black, black, ADJ, amod) | (worst, wrong, ADJ, amod) | (bad, bad, ADJ, amod) |
| 1 | (predictable, predictable, ADJ, amod) | (loyal, loyal, ADJ, amod) | (ever, ever, ADV, advmod) | (better, well, ADJ, amod) |
| 2 | (romance, romance, NOUN, compound) | (several, several, ADJ, amod) | (beautiful, beautiful, ADJ, amod) | (suitable, suitable, ADJ, amod) |
| 3 | (characters, character, NOUN, conj) | (key, key, ADJ, amod) | (awful, awful, ADJ, amod) | (all, all, ADJ, predet) |
| 4 | (central, central, ADJ, amod) | (plot, plot, NOUN, conj) | (editing, edit, NOUN, conj) | (terrible, terrible, ADJ, amod) |
| 5 | (called, call, VERB, conj) | (main, main, ADJ, amod) | (good, good, ADJ, amod) | (acting, act, NOUN, amod) |
| 6 | (keep, keep, VERB, acl) | (practicing, practice, VERB, acl) | (music, music, NOUN, conj) | (cliched, cliched, ADJ, amod) |
| 7 | (movie, movie, NOUN, poss) | (lead, lead, ADJ, amod) | (Well, well, INTJ, intj) | (plot, plot, NOUN, conj) |
| 8 | (message, message, NOUN, conj) | (writing, write, NOUN, amod) | (nice, nice, ADJ, amod) | (cliched, cliched, ADJ, amod) |
| 9 | (barfs, barfs, ADJ, conj) | (All, all, ADJ, predet) | (interiors, interior, NOUN, appos) | (tortured, torture, ADJ, amod) |


In [None]:
%load solutions/filtered_heads_deps

## Opinion extractor
Over the remaining exercises in this notebook, we are now going to create and subsequently refine a function called `opinion_extractor`.

`opinion_extractor` takes two arguments:
- `aspect_token`: an aspect token, such as "*plot*", "*characters*", "*cinematography*" or "*dialogue*".
- `parsed_sentence`: a parsed sentence

The output should be a list of extracted opinions, i.e. a list of tokens that tell us something about what the reviewer is saying (in `parsed_sentence`) about `aspect_token`. 

### Exercise
To begin, in the cell below, complete an initial version of the function `opinion_extractor` that returns a list of all dependents of the aspect token. 
- This list will be empty when there are no opinions relating to `aspect_token` to extract from `parsed_sentence`.

In [69]:
def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            for child in token.children:
                opinions.append(child.text)
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                # Use .text rather than .orth_ so the sentence keeps its whitespace
                results.append((aspect_token,sentence.text,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       
     

Results for aspect word 'plot'

Sentence:
	However, the plot was predictable.
Opinion of 'plot':
	 'predictable'


Sentence:
	Furthermore, the plot is so predictable that it made the movie drag.
Opinion of 'plot':
	 'predictable'


Sentence:
	well, that's the usual predictable romance plot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	The plot was OK but it would have been much better if they had either stuck to a more authentic representation of Japanese culture or kept the story in the states.
Opinion of 'plot':
	 'OK'


Sentence:
	The performances are thin and uneven, the plot is inconsistent, and the dialogue sounds like it was pulled from fortune cookies made by the cast of Friends.
Opinion of 'plot':
	 'inconsistent'


Sentence:
	Although this movie wanted to be in the same race and quality of the movie CLUE, it fails to realize that the plot and characters are incredibly riculous and not funny haha but funny stupid.
Opinion of 'plot':
	 'riculous'


Sentence:
	The acting is horrible, the plot is horrible, the filming is horrible...
Opinion of 'plot':
	 'horrible'


Sentence:
	Its"thrill

In [None]:
%load solutions/opinion_extractor_initial

### Exercise
Look at the output that your opinion extractor is producing.

As you can see, it isn't very good. In the sections below, you will be looking at a variety of ways in which you can refine the `opinion_extractor` function to improve its performance. 

The work you will do in the remainder of this notebook forms part of the first assessed coursework for this module. This will involve developing and assessing the effectiveness of several extensions to the opinion extractor. 
- For full details of what is required for the coursework see the coursework specification document which can be downloaded from the module website.

Have a look at the section **Tips for de-bugging and exploration** (see below). 

As you are investigating how well your `opinion_extractor` is working, you will want to view your opinion extractor's output for a substantial number of sentences. You might prefer to print your output to a file. 

The code in the next cell illustrates how to do this.
- Note that you may wish to replace `"savefile.txt"` with a different path.

In [None]:
save_file_path = r"savefile.txt" # Set this to the location of the file you wish to create/overwrite with the saved output.

# This is a "with statement", it invokes a context manager, which handles the opening and closing of resources (like files)
with open(save_file_path, "w") as save_file:  # The 'w' says that we want to write to the file            
    for parsed_review in parsed_reviews:
        for sentence in parsed_review.sents:
            for aspect_token in aspect_words:
                opinions = opinion_extractor(aspect_token,sentence)
                if opinions:
                    save_file.write("--- Sentence ---\n{0}\nOpinions on {1}\n{2}\n".
                                    format(sentence,aspect_token,opinions))


## Required extensions
There are 5 required extensions:
1. Adjectival modification
2. Adjectives linked by copulae
3. Adverbial modifiers
4. Negation
5. Conjunction

In addition there are a number of optional extensions (see below).

### Example test set
In addition to testing out your opinion extractor on the DVD review dataset, we will also be looking at a very small set of examples that illustrate cases of each of the extensions that we will be making. 

Run the following cell to load up these sentences.
- Each example is a 3-tuple of the form  
`(<example_name>,<sentence>,<set of opinion words>)`

In [32]:
core = [("A.1","It has an exciting fresh plot.",set(["fresh", "exciting"])),
        ("B.1","The plot was dull.",set(["dull"])),
        ("C.1","It has an excessively dull plot.",set(["excessively-dull"])),
        ("C.2","The plot was excessively dull.",set(["excessively-dull"])),
        ("D.1","The plot wasn't dull.",set(["not-dull"])),
        ("D.2","It wasn't an exciting fresh plot.",set(["not-exciting", "not-fresh"])),
        ("D.3","The plot wasn't excessively dull.",set(["not-excessively-dull"])),
        ("E.1","The plot was cheesy, but fun and inspiring.",set(["cheesy", "fun", "inspiring"])),
        ("E.2","The plot was really cheesy and not particularly special.",set(["really-cheesy", "not-particularly-special"]))
       ]

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]
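
One way to use these tuples is a small checking loop that compares an extractor's output with the expected set for each example. The sketch below is only a suggestion and not part of the provided materials: `check_examples` and `toy_extractor` are hypothetical names, and the toy extractor simply looks up a few known adjectives in the raw sentence so that the block runs without spaCy.

```python
def check_examples(examples, extract):
    """Return {example_name: (passed, extracted_set)} for each example.

    `extract` is any callable mapping a sentence string to an iterable
    of opinion strings.
    """
    report = {}
    for name, sentence, expected in examples:
        extracted = set(extract(sentence))
        report[name] = (extracted == expected, extracted)
    return report


def toy_extractor(sentence):
    # Hypothetical stand-in: return any of a few known adjectives found in the sentence
    known = {"exciting", "fresh", "dull"}
    words = sentence.rstrip(".").lower().split()
    return [word for word in words if word in known]


examples = [
    ("A.1", "It has an exciting fresh plot.", {"fresh", "exciting"}),
    ("C.1", "It has an excessively dull plot.", {"excessively-dull"}),
]

report = check_examples(examples, toy_extractor)
for name, (passed, extracted) in report.items():
    print(name, "PASS" if passed else "FAIL", sorted(extracted))
```

In the notebook you could pass a function that parses the sentence and calls your own extractor instead, for example `lambda s: opinion_extractor("plot", nlp(s))`, assuming `nlp` and `opinion_extractor` are defined as in the cells above.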

### Problem with displaCy

You have been encouraged to use the dependency tree visualisation tool found [here](https://demos.explosion.ai/displacy).

However, the displaCy site is incorrect on this point: the displaCy visualiser uses a *different* parsing model from the one that is installed by default with spaCy, and as a result there are cases where the trees differ.

It is possible to set up spaCy to use the same model as displaCy by doing the following:

```
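# First download the model. This is a shell command, so run it once from a terminal
# (or prefix it with "!" in a notebook cell):
#   python -m spacy download en_core_web_md
import spacy
nlp = spacy.load('en')               # default model
nlp2 = spacy.load('en_core_web_md')  # the model used by displaCy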
```

However, I would recommend that you stick with the spaCy default model, and check for differences. In the cell below, you will find code that can be used to show the structure of the dependency tree produced by spaCy, and it shows an example where the tree differs from that shown by the displaCy visualiser.


In [111]:
def print_dep_tree(node,indent):
    print(indent,node)
    indent = indent + "  "
    for child in node.children:
        print_dep_tree(child,indent)

sent = "The plot was cheesy, but fun and inspiring."
parsed_sents = nlp(sent)
parsed_sent = next(parsed_sents.sents)
print_dep_tree(parsed_sent.root,"")


 was
   plot
     The
   cheesy
   ,
   but
   fun
     and
     inspiring
   .


## Extension A: Adjectival modification
In this section, we are interested in adjectival modification. This is when we have a noun like "*dog*" or "*plot*", and there are one or more adjectives which are specifying the characteristics of that noun. E.g. "*big brown dog*" or "*exciting fresh plot*" ("*big*" and "*brown*" are both adjectivally modifying "*dog*").

The dependency relation we use to show this relationship is `amod`.

Write a version of the opinion extraction function which, when given a sentence such as the example below containing an aspect token (e.g. "*plot*"), uses the `amod` relations to extract a list of the adjectival modifiers of the aspect token (e.g. the two words "*exciting*" and "*fresh*" in this case).

**Core Example A.1**: "*It has an exciting fresh plot.*" should produce "*fresh*", "*exciting*"

The dependency tree for this sentence is shown below.

![Extension 1 example](./img/amod_example.png)

In [None]:
def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            for child in token.children:
                if child.dep_ == 'amod':
                    opinions.append(child.text)
            
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                # Use .text rather than .orth_ so the sentence keeps its whitespace
                results.append((aspect_token,sentence.text,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       

### Exercise
Adapt your opinion extractor so that it finds only adjectival modifiers, and apply it to the example test set to check that your function is working as required.
- Note that at this stage your opinion extractor should only get the first example, A.1, correct.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [70]:
sents = [nlp(entry[1]) for entry in core]
# print(sents)

for index,sen in enumerate(sents):
    opinions = opinion_extractor("plot",sen)
    print(opinions)
    if set(opinions) == core[index][2]:
        print('Test set {} passed\n'.format(index+1))

['exciting', 'fresh']
Test set 1 passed

['dull']
Test set 2 passed

['dull']
['dull']
['dull']
['exciting', 'fresh']
['dull']
['cheesy']
['cheesy']


## Extension B: Adjectives linked by copulae
In this section, we are interested in adjectives which are linked to our aspect term via the copula (conjugations of "*to be*": "*is*", "*was*", "*will be*", etc.). 

Notice that if we were only looking for `amod` relations, we'd completely miss the word "*dull*" in the dependency tree shown below.

Your opinion extraction function, when given a sentence like the example below containing the aspect token "*plot*", should use appropriate dependency relations to output the opinion word "*dull*".

**Core Example B.1**: "*The plot was dull.*" should produce "*dull*"

The dependency tree for this sentence is shown below.

![Extension 2 example](./img/copula_example.png)
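
The traversal this extension needs goes up from the aspect word to its head and then back down to the head's other children. The sketch below illustrates that path on a hand-built tree for "*The plot was dull.*" so that it runs without spaCy. `Tok`, `attach` and `copula_opinions` are hypothetical names introduced here, and the `nsubj` and `acomp` labels are assumed to match what the parser produces for this sentence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for a spaCy token (only the fields used here)."""
    text: str
    dep_: str
    head: Optional["Tok"] = None
    children: List["Tok"] = field(default_factory=list)


def attach(head, child):
    """Link `child` under `head` and return the child."""
    child.head = head
    head.children.append(child)
    return child


def copula_opinions(aspect):
    # Only subjects are considered; go up to the head, then down to its acomp children
    if aspect.dep_ != "nsubj" or aspect.head is None:
        return []
    return [sibling.text for sibling in aspect.head.children if sibling.dep_ == "acomp"]


# Hand-built tree: was -> (plot -> The), dull
was = Tok("was", "ROOT")
plot = attach(was, Tok("plot", "nsubj"))
the = attach(plot, Tok("The", "det"))
dull = attach(was, Tok("dull", "acomp"))

print(copula_opinions(plot))  # ['dull']
print(copula_opinions(the))   # []
```

With real spaCy tokens the same logic would use `token.head.children` directly, and it is worth checking the parser's actual labels with `print_dep_tree` or displaCy before relying on them.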

In [71]:
def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            for child in token.children:
                if child.dep_ == 'amod':
                    opinions.append(child.text)
            if token.dep_ == 'nsubj':
                # Collect adjectival complements (acomp) of the verb that governs the aspect word
                for sibling in token.head.children:
                    if sibling.dep_ == 'acomp':
                        opinions.append(sibling.text)
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                # Use .text rather than .orth_ so the sentence keeps its whitespace
                results.append((aspect_token,sentence.text,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       

Results for aspect word 'plot'

Sentence:
	However, the plot was predictable.
Opinion of 'plot':
	 'predictable'


Sentence:
	Furthermore, the plot is so predictable that it made the movie drag.
Opinion of 'plot':
	 'predictable'


Sentence:
	well, that's the usual predictable romance plot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	The plot was OK but it would have been much better if they had either stuck to a more authentic representation of Japanese culture or kept the story in the states.
Opinion of 'plot':
	 'OK'


Sentence:
	The performances are thin and uneven, the plot is inconsistent, and the dialogue sounds like it was pulled from fortune cookies made by the cast of Friends.
Opinion of 'plot':
	 'inconsistent'


Sentence:
	Although this movie wanted to be in the same race and quality of the movie CLUE, it fails to realize that the plot and characters are incredibly riculous and not funny haha but funny stupid.
Opinion of 'plot':
	 'riculous'


Sentence:
	The acting is horrible, the plot is horrible, the filming is horrible...
Opinion of 'plot':
	 'horrible'



### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1 and B.1.

Investigate the extent to which your opinion extractor produces appropriate opinion-bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

## Extension C: Adverbial modifiers
If you used the extractor you have built so far on the example sentences below, it would only find the opinion "*dull*". It would not recover any indication of the strength of the opinion. Adverbs like "*excessively*" elaborate on the adjectives that they modify in adverbial modification relations.

The relevant dependency relation we use to show this relationship is "*advmod*".

Your opinion extraction function, when given a sentence like those below containing the aspect token "*plot*", should use the `advmod` relation to output features like "*excessively-dull*".
- If you have an adjective token in a variable `adj_token`, and an adverb in a variable `adv_token`, then you could create this feature like this: `adv_token.text + "-" + adj_token.text`.
- If you have a list of strings, you can use python's `join` function to concatenate them into a single string. The following would join the strings together, placing a `"-"` between each:  
`joined_string = "-".join(listofstrings)`
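
As a minimal sketch of the `join` tip above, the snippet below builds a hyphenated feature from plain strings. The helper name `make_feature` is hypothetical, and the strings stand in for the text of the adverb and adjective tokens.

```python
def make_feature(adverbs, adjective):
    """Join adverb strings and an adjective string into one hyphenated feature."""
    # Adverbs come first, in the order given, followed by the adjective.
    return "-".join(list(adverbs) + [adjective])


print(make_feature(["excessively"], "dull"))  # excessively-dull
print(make_feature([], "dull"))               # dull
```

Collecting the parts in a list and joining once avoids having to strip a trailing `"-"` when there are no adverbs.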


**Core Example C.1**: "*It has an excessively dull plot.*" should produce "*excessively-dull*"  
**Core Example C.2**: "*The plot was excessively dull.*" should produce "*excessively-dull*"

The dependency trees for these sentences are shown below.

![Extension 3 example](./img/advmod_example_1.png)

![Extension 3 example 2](./img/advmod_example_2.png)

In [75]:
def opinion_extractor(aspect_token,parsed_sentence):
    """Return the opinion features for `aspect_token` found in `parsed_sentence`."""
    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            # Adjectives that modify the aspect noun directly (amod)
            for child in token.children:
                if child.dep_ == 'amod':
                    advmod = build_advmod_string(child)
                    opinions.append(advmod+child.text)
            # Adjectives linked to the aspect noun through its head verb (acomp)
            if token.dep_ == 'nsubj':
                filterList = set(filter(lambda x: x.dep_ == 'acomp', token.head.children))
                for x in filterList:
                    advmod = build_advmod_string(x)
                    opinions.append(advmod+x.text)
    return opinions

def build_advmod_string(token):
    """Return the adverbial modifiers of `token` as an 'adverb-' prefix ('' if none)."""
    advmod = ''
    for child in token.children:
        if child.dep_ == 'advmod':
            advmod += child.text + '-'
    return advmod

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       

Results for aspect word 'plot'

Sentence:
	However,theplotwaspredictable.
Opinion of 'plot':
	 'predictable'


Sentence:
	Furthermore,theplotissopredictablethatitmadethemoviedrag.
Opinion of 'plot':
	 'so-predictable'


Sentence:
	well,that'stheusualpredictableromanceplot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	TheplotwasOKbutitwouldhavebeenmuchbetteriftheyhadeitherstucktoamoreauthenticrepresentationofJapanesecultureorkeptthestoryinthestates.
Opinion of 'plot':
	 'OK'


Sentence:
	Theperformancesarethinanduneven,theplotisinconsistent,andthedialoguesoundslikeitwaspulledfromfortunecookiesmadebythecastofFriends.
Opinion of 'plot':
	 'inconsistent'


Sentence:
	AlthoughthismoviewantedtobeinthesameraceandqualityofthemovieCLUE,itfailstorealizethattheplotandcharactersareincrediblyriculousandnotfunnyhahabutfunnystupid.
Opinion of 'plot':
	 'incredibly-riculous'


Sentence:
	Theactingishorrible,theplotishorrible,thefilmingishorrible...
Opinion of 'plot':
	 'horrible'



In [76]:
sents = [nlp(entry[1]) for entry in core]
# print(sents)

for index,sen in enumerate(sents):
    opinions = opinion_extractor("plot",sen)
    print(opinions)
    if set(opinions) == core[index][2]:
        print('Test set {} passed\n'.format(index+1))

['exciting', 'fresh']
Test set 1 passed

['dull']
Test set 2 passed

['excessively-dull']
Test set 3 passed

['excessively-dull']
Test set 4 passed

['dull']
['exciting', 'fresh']
['excessively-dull']
['cheesy']
['really-cheesy']


### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, and C.2.

Investigate the extent to which your opinion extractor produces appropriate opinion-bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".
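
The test loop used in this notebook prints a message only when a test passes, so failures appear as unlabelled lists. The sketch below is one possible way to report every case explicitly. It assumes each test case is a `(label, sentence, expected_set)` triple and that the extractor takes a plain string; the names `run_tests`, `cases`, and `canned` are hypothetical, and the canned lookup merely stands in for a real extractor.

```python
def run_tests(extract, cases):
    """Run `extract` on each case and return the 1-based indices of failures."""
    failed = []
    for index, (label, sentence, expected) in enumerate(cases, start=1):
        got = set(extract(sentence))
        status = "passed" if got == expected else "FAILED"
        print("Test set {} ({}) {}: got {}, expected {}".format(
            index, label, status, sorted(got), sorted(expected)))
        if got != expected:
            failed.append(index)
    return failed


# Hypothetical cases and a canned stand-in extractor, for illustration only.
cases = [
    ("C.1", "It has an excessively dull plot.", {"excessively-dull"}),
    ("D.1", "The plot wasn't dull.", {"not-dull"}),
]
canned = {
    "It has an excessively dull plot.": ["excessively-dull"],
    "The plot wasn't dull.": ["dull"],  # negation not handled yet
}
failures = run_tests(lambda sentence: canned[sentence], cases)
print(failures)  # [2]
```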

## Extension D: Negation
Look at the tree below; it is an example of an adjective linked by a copula. Your existing opinion extractor would extract "dull". However, notice that the example is saying that the plot was not dull! This is an example of the use of negation.

The dependency relation we use to show this relationship is `neg`.

Your opinion extraction function, when given sentences like those below containing the aspect token "*plot*", should use the `neg` relation to output features like "*not-dull*". If you have an adjective token in a variable `token`, then you could create this feature like this: `"not-" + token.text`.

**Core Example D.1**: "*The plot wasn't dull.*" should produce "*not-dull*"  
**Core Example D.2**: "*It wasn't an exciting fresh plot.*" should produce "*not-exciting*", "*not-fresh*"  
**Core Example D.3**: "*The plot wasn't excessively dull.*" should produce "*not-excessively-dull*"
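
To make the `neg` lookup concrete, here is a minimal sketch for Core Example D.1. It does not call spaCy: `Tok` is a hypothetical stand-in that exposes only the `text`, `dep_`, `pos_`, `children` and `head` attributes, and the hand-built tree is an assumption about what the parser returns for this sentence.

```python
class Tok:
    """Hypothetical stand-in for the token attributes used in this notebook."""

    def __init__(self, text, dep_, pos_="", children=()):
        self.text, self.dep_, self.pos_ = text, dep_, pos_
        self.children = list(children)
        self.head = self  # a root token is its own head
        for child in self.children:
            child.head = self


def copula_opinions(aspect):
    """Return adjectives linked to `aspect` via its head, with 'not-' if negated."""
    if aspect.dep_ != "nsubj":
        return []
    verb = aspect.head
    # The negation marker is attached to the verb, not to the adjective.
    negated = any(child.dep_ == "neg" for child in verb.children)
    prefix = "not-" if negated else ""
    return [prefix + child.text for child in verb.children if child.dep_ == "acomp"]


# Assumed parse of "The plot wasn't dull."
plot = Tok("plot", "nsubj", "NOUN", [Tok("The", "det", "DET")])
was = Tok("was", "ROOT", "AUX",
          [plot, Tok("n't", "neg", "PART"), Tok("dull", "acomp", "ADJ")])
print(copula_opinions(plot))  # ['not-dull']
```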

The dependency trees for these sentences are shown below.

![Extension 4 example 1](./img/negation_example_1.png)

![Extension 4 example 2](./img/negation_example_2.png)

![Extension 4 example 3](./img/negation_example_3.png)

In [84]:
def opinion_extractor(aspect_token,parsed_sentence):
    """Return opinion features for `aspect_token`, prefixed with 'not-' when negated."""
    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            # 'not-' if the head of the aspect noun has a neg child, otherwise ''
            negation = build_negation_string(token)
            
            # Adjectives that modify the aspect noun directly (amod)
            for child in token.children:
                if child.dep_ == 'amod':
                    advmod = build_advmod_string(child)
                    opinions.append(negation+advmod+child.text)
            # Adjectives linked to the aspect noun through its head verb (acomp)
            if token.dep_ == 'nsubj':
                filterList = set(filter(lambda x: x.dep_ == 'acomp', token.head.children))
                for x in filterList:
                    advmod = build_advmod_string(x)
                    opinions.append(negation+advmod+x.text)
    return opinions

def build_advmod_string(token):
    """Return the adverbial modifiers of `token` as an 'adverb-' prefix ('' if none)."""
    advmod = ''
    for child in token.children:
        if child.dep_ == 'advmod':
            advmod += child.text + '-'
    return advmod

def build_negation_string(token):
    """Return 'not-' if the head of `token` carries a neg dependent, otherwise ''."""
    negation = ''
    # Only subjects (nsubj) and attributes (attr) are checked for a negated head
    if token.dep_ == 'nsubj' or token.dep_ == 'attr':
        for child in token.head.children:
            if child.dep_ == 'neg':
                negation += 'not-'
                break
    return negation

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       

Results for aspect word 'plot'

Sentence:
	However,theplotwaspredictable.
Opinion of 'plot':
	 'predictable'


Sentence:
	Furthermore,theplotissopredictablethatitmadethemoviedrag.
Opinion of 'plot':
	 'so-predictable'


Sentence:
	well,that'stheusualpredictableromanceplot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	TheplotwasOKbutitwouldhavebeenmuchbetteriftheyhadeitherstucktoamoreauthenticrepresentationofJapanesecultureorkeptthestoryinthestates.
Opinion of 'plot':
	 'OK'


Sentence:
	Theperformancesarethinanduneven,theplotisinconsistent,andthedialoguesoundslikeitwaspulledfromfortunecookiesmadebythecastofFriends.
Opinion of 'plot':
	 'inconsistent'


Sentence:
	AlthoughthismoviewantedtobeinthesameraceandqualityofthemovieCLUE,itfailstorealizethattheplotandcharactersareincrediblyriculousandnotfunnyhahabutfunnystupid.
Opinion of 'plot':
	 'incredibly-riculous'


Sentence:
	Theactingishorrible,theplotishorrible,thefilmingishorrible...
Opinion of 'plot':
	 'horrible'



In [85]:
sents = [nlp(entry[1]) for entry in core]
# print(sents)

for index,sen in enumerate(sents):
    opinions = opinion_extractor("plot",sen)
    print(opinions)
    if set(opinions) == core[index][2]:
        print('Test set {} passed\n'.format(index+1))

['exciting', 'fresh']
Test set 1 passed

['dull']
Test set 2 passed

['excessively-dull']
Test set 3 passed

['excessively-dull']
Test set 4 passed

['not-dull']
Test set 5 passed

['not-exciting', 'not-fresh']
Test set 6 passed

['not-excessively-dull']
Test set 7 passed

['cheesy']
['really-cheesy']


### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, C.2, D.1, D.2, and D.3.

Investigate the extent to which your opinion extractor produces appropriate opinion-bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

## Extension E: Conjunction
If you used your existing extractor on the tree below, it would only extract "*cheesy*". However, "*fun*" and "*inspiring*" are both conjoined with "*cheesy*"; this means that they all apply to the subject ("*plot*").

This conjunction relation is shown via the `conj` dependency. Note that words other than adjectives can be the conjuncts. You could investigate whether this is a problem.

**Core Example E.1**: "*The plot was cheesy but fun and inspiring.*" should produce "*cheesy*", "*fun*", "*inspiring*"  
**Core Example E.2**: "*The plot was really cheesy and not particularly special.*" should produce "*really-cheesy*", "*not-particularly-special*"

The dependency trees for these sentences are shown below.

![Extension 5 example](./img/conj_example_1.png)

![Extension 5 example 2](./img/conj_example_2.png)

In [114]:
def opinion_extractor(aspect_token,parsed_sentence):
    """Return opinion features for `aspect_token`, including conjoined opinions."""
    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.text == aspect_token:
            negation = build_negation_string(token)
            
            # Adjectives that modify the aspect noun directly (amod)
            for child in token.children:
                if child.dep_ == 'amod':
                    advmod = build_advmod_string(child)
                    opinions.append(negation+advmod+child.text)
            if token.dep_ == 'nsubj':
                # acomp: adjective linked through the head verb.
                # conj: conjuncts of the head verb itself; these can be conjoined
                # clauses, so non-adjectives may be collected here as well.
                filterList = set(filter(lambda x: x.dep_ == 'acomp' or x.dep_ == 'conj', token.head.children))
                for x in filterList: 
                    advmod = build_advmod_string(x)
                    opinions.append(negation+advmod+x.text)
                    add_conj_opinions(opinions,x,negation)
    return opinions

def build_advmod_string(token):
    """Return the neg and advmod dependents of `token` as a prefix, e.g. 'not-particularly-'."""
    advmod = ''
    notString = ''
    for child in token.children:
        if child.dep_ == 'advmod':
            advmod += child.text + '-'
        if child.dep_ == 'neg':
            notString = 'not-'
    return notString + advmod

def build_negation_string(token):
    """Return 'not-' if the head of `token` carries a neg dependent, otherwise ''."""
    negation = ''
    if token.dep_ == 'nsubj' or token.dep_ == 'attr':
        for child in token.head.children:
            if child.dep_ == 'neg':
                negation += 'not-'
                break
    return negation

def add_conj_opinions(opinions, token, negation):
    """Recursively append every conjunct below `token` to `opinions`."""
    for child in token.children:
        if child.dep_ == 'conj':
            advmod = build_advmod_string(child)
            opinions.append(negation+advmod+child.text)
            add_conj_opinions(opinions, child, negation)

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
results = [] 
for parsed_review in parsed_reviews:
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       

Results for aspect word 'plot'

Sentence:
	However,theplotwaspredictable.
Opinion of 'plot':
	 'predictable'


Sentence:
	Furthermore,theplotissopredictablethatitmadethemoviedrag.
Opinion of 'plot':
	 'so-predictable'


Sentence:
	well,that'stheusualpredictableromanceplot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	TheplotwasOKbutitwouldhavebeenmuchbetteriftheyhadeitherstucktoamoreauthenticrepresentationofJapanesecultureorkeptthestoryinthestates.
Opinion of 'plot':
	 'been', 'OK'


Sentence:
	Theperformancesarethinanduneven,theplotisinconsistent,andthedialoguesoundslikeitwaspulledfromfortunecookiesmadebythecastofFriends.
Opinion of 'plot':
	 'sounds', 'inconsistent'


Sentence:
	Theplotisuninvolvingandincoherent,thecinematographyiscrampedandcompletelylackinginstyle,themusicisweakduringatimethatspawnedsomanymemorablesoundtracksandthedirectionlacksanyfocuswhatsoever.
Opinion of 'plot':
	 'incoherent', 'cramped', 'completely-lacking'



In [115]:
sents = [nlp(entry[1]) for entry in core]
# print(sents)

for index,sen in enumerate(sents):
    opinions = opinion_extractor("plot",sen)
    print(opinions)
    if set(opinions) == core[index][2]:
        print('Test set {} passed\n'.format(index+1))

['exciting', 'fresh']
Test set 1 passed

['dull']
Test set 2 passed

['excessively-dull']
Test set 3 passed

['excessively-dull']
Test set 4 passed

['not-dull']
Test set 5 passed

['not-exciting', 'not-fresh']
Test set 6 passed

['not-excessively-dull']
Test set 7 passed

['cheesy', 'fun', 'inspiring']
Test set 8 passed

['really-cheesy', 'not-particularly-special']
Test set 9 passed



### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on all of the required examples.

Investigate the extent to which your opinion extractor produces appropriate opinion-bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".
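
The review output above includes words such as "*been*" and "*sounds*", which suggests that following every `conj` dependency can pull in conjoined clauses as well as conjoined adjectives. One possible refinement, sketched below, is to follow `conj` only from the adjective and keep only conjuncts tagged `ADJ`. The `Tok` class is a hypothetical stand-in for the token attributes used in this notebook, and both the tree and the POS tags are assumptions; a real parser may tag a word like "*fun*" differently, in which case this filter would drop it.

```python
class Tok:
    """Hypothetical stand-in for the token attributes used in this notebook."""

    def __init__(self, text, dep_, pos_="", children=()):
        self.text, self.dep_, self.pos_ = text, dep_, pos_
        self.children = list(children)
        self.head = self  # a root token is its own head
        for child in self.children:
            child.head = self


def conjoined_adjectives(token):
    """Recursively collect the text of ADJ conjuncts below `token`."""
    found = []
    for child in token.children:
        if child.dep_ == "conj":
            if child.pos_ == "ADJ":
                found.append(child.text)
            found.extend(conjoined_adjectives(child))
    return found


def acomp_with_conjuncts(verb):
    """Return each acomp adjective of `verb` followed by its adjective conjuncts."""
    opinions = []
    for child in verb.children:
        if child.dep_ == "acomp":
            opinions.append(child.text)
            opinions.extend(conjoined_adjectives(child))
    return opinions


# Assumed parse of "The plot was cheesy but fun and inspiring.", with an extra
# verb conjunct ("been") attached to the head to mimic a conjoined clause.
inspiring = Tok("inspiring", "conj", "ADJ")
fun = Tok("fun", "conj", "ADJ", [Tok("and", "cc", "CCONJ"), inspiring])
cheesy = Tok("cheesy", "acomp", "ADJ", [Tok("but", "cc", "CCONJ"), fun])
was = Tok("was", "ROOT", "AUX",
          [Tok("plot", "nsubj", "NOUN"), cheesy, Tok("been", "conj", "AUX")])
print(acomp_with_conjuncts(was))  # ['cheesy', 'fun', 'inspiring']
```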

## Additional extensions

This section presents some examples on which your current opinion extractor will fail. In all of the examples below, "plot" is the aspect token.

You are not required to extend your opinion extractor to handle these cases.

**Optional Example A**: "The script and plot are utterly excellent." produces "utterly-excellent"  
**Optional Example B**: "The script and plot were unoriginal and boring." produces "unoriginal", "boring"  
**Optional Example C**: "The plot wasn't lacking." produces "not-lacking"  
**Optional Example D**: "The plot is full of holes." produces "full-of-holes"  
**Optional Example E**: "There was no logical plot to this story." produces "no-logical"  
**Optional Example F**: "I loved the plot." produces "loved"  
**Optional Example G**: "I didn't mind the plot." produces "not-mind"

The dependency trees for these sentences are shown below.

![Additional example 1](./img/additional_example_1.png)

![Additional example 2](./img/additional_example_2.png)

![Additional example 3](./img/additional_example_3.png)

![Additional example 4](./img/additional_example_4.png)

![Additional example 5](./img/additional_example_5.png)

![Additional example 6](./img/additional_example_6.png)

![Additional example 7](./img/additional_example_7.png)
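
As an illustration of how Optional Examples F and G might be approached, the sketch below treats the verb governing the aspect noun as the opinion when the noun is its direct object. This is only one possible approach: `Tok` is a hypothetical stand-in for the token attributes used in this notebook, and the `dobj` and `neg` labels in the hand-built trees are assumptions about the parser's output.

```python
class Tok:
    """Hypothetical stand-in for the token attributes used in this notebook."""

    def __init__(self, text, dep_, pos_="", children=()):
        self.text, self.dep_, self.pos_ = text, dep_, pos_
        self.children = list(children)
        self.head = self  # a root token is its own head
        for child in self.children:
            child.head = self


def verb_opinion(aspect):
    """Return the governing verb as an opinion when `aspect` is its direct object."""
    if aspect.dep_ != "dobj":
        return []
    verb = aspect.head
    negated = any(child.dep_ == "neg" for child in verb.children)
    return [("not-" if negated else "") + verb.text]


# Assumed parse of "I loved the plot."
plot_f = Tok("plot", "dobj", "NOUN", [Tok("the", "det", "DET")])
Tok("loved", "ROOT", "VERB", [Tok("I", "nsubj", "PRON"), plot_f])
print(verb_opinion(plot_f))  # ['loved']

# Assumed parse of "I didn't mind the plot."
plot_g = Tok("plot", "dobj", "NOUN", [Tok("the", "det", "DET")])
Tok("mind", "ROOT", "VERB",
    [Tok("I", "nsubj", "PRON"), Tok("did", "aux", "AUX"),
     Tok("n't", "neg", "PART"), plot_g])
print(verb_opinion(plot_g))  # ['not-mind']
```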

## Tips for debugging and exploration
When assessing whether your opinion extractor has handled a given sentence correctly, read the sentence carefully before you look at what the dependency parser says, and determine for yourself the scope of the words. Consider the following sentence.

"This film has excellent characters and an intriguing and engaging plot."

It should be obvious to you that here the plot is described as both "intriguing" and "engaging". However, "excellent" is only used to describe the characters.

If the parser suggests a structure which implies that plot is also described by "excellent" (for example), something has gone wrong.

### Dependency tree visualisation tool
You may find it useful to use the dependency tree visualisation tool found [here](https://demos.explosion.ai/displacy).

You can copy and paste example sentences from the DVD review data to get a good sense of what the dependency parse looks like.

### Outputting results only from sentences relevant to the current task
You will find that your output is dominated by examples of adjectival modification and adjectives linked via the copula. This means that when you add new functionality (Extensions C to E), it will be difficult to determine its impact.

One way to solve this problem is to (temporarily) output only those features produced by the new functionality.

For example, imagine you have just completed Extensions A and B. Next, you write code that adds the adverbial features (Extension C). When assessing how well your code is working, let your extractor only extract the "new" adverb features.

There are two easy ways to achieve this:

1. Comment out any extractor code that produces features that you're not currently interested in, or
2. Introduce a boolean variable, which you only set to `True` when you have extracted the feature that you are interested in. Then always output an empty list if the variable is `False`, otherwise output the full opinion list.
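
The second option could be sketched as follows. To keep the example short it works on plain strings rather than parsed tokens; the function name `extract` and the `only_new` parameter are hypothetical.

```python
def extract(adjective, adverbs, only_new=False):
    """Return opinion features; with only_new=True, suppress output that has no adverb feature."""
    opinions = []
    found_new_feature = False  # set to True only when the new functionality fires
    if adverbs:
        opinions.append("-".join(adverbs + [adjective]))
        found_new_feature = True
    else:
        opinions.append(adjective)
    if only_new and not found_new_feature:
        return []
    return opinions


print(extract("dull", ["excessively"], only_new=True))  # ['excessively-dull']
print(extract("dull", [], only_new=True))               # []
print(extract("dull", []))                              # ['dull']
```

Because sentences that yield an empty list are already skipped when results are collected, switching the flag on leaves only the sentences that exercise the new feature.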