# Topic 6: Opinion Extraction

## Preliminaries 
Run this cell.

In [1]:
import sys
sys.path.append(r'\\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources')
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import collections
from collections import defaultdict,Counter
from itertools import zip_longest
from IPython.display import display
from random import seed
import random
import math
import matplotlib.pylab as pylab
%matplotlib inline
params = {'legend.fontsize': 'large',
          'figure.figsize': (15, 5),
         'axes.labelsize': 'large',
         'axes.titlesize':'large',
         'xtick.labelsize':'large',
         'ytick.labelsize':'large'}
pylab.rcParams.update(params)
from pylab import rcParams
from operator import itemgetter, attrgetter, methodcaller
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import seaborn as sns
import csv

## Overview
In this topic you will look at ways of extracting opinion-bearing words from Amazon DVD reviews. The goal is to find words that describe particular aspects of the film being reviewed. The specific aspects of films that we will consider are the **plot**, the **characters**, the **cinematography** and the **dialogue**.

In other words, we are interested in finding all of the words in a review that express the reviewer's opinion about one of these aspects of the film. The idea is that this will provide a fine-grained characterisation of the opinion being expressed by the author of the review. We will refer to the words we are looking for as **opinion words**, and to the words that name the aspects we are interested in as **aspect words**.

Following on from the previous topic's material on dependency parsing, you will use spaCy's output as the basis for identifying opinion words. This rests on the assumption that the opinion words we are looking for occur in a sentence of the review in a particular (dependency) relationship to one of our aspect words (plot, characters, cinematography and dialogue).

For example, the opinion word "*amazing*" might be found because it is used in a sentence where it is an adjective modifying the aspect word "*plot*", as in the sentence "*I thought it had an amazing plot.*".
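To make the head/dependent idea concrete, here is a minimal sketch that hand-builds a parse of the example sentence. The `Tok` class is a hypothetical stand-in that exposes only a few of the attribute names used on spaCy tokens (`orth_`, `pos_`, `dep_`, `head`, `children`), and the dependency labels are written by hand rather than produced by a parser, so treat the tree as illustrative only.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (not a spaCy class)."""
    def __init__(self, orth, pos, dep, head=None):
        self.orth_, self.pos_, self.dep_ = orth, pos, dep
        # The root token is treated as its own head.
        self.head = head if head is not None else self
        self.children = []
        if head is not None:
            head.children.append(self)

# Hand-built tree for "I thought it had an amazing plot." (labels are assumptions).
thought = Tok("thought", "VERB", "ROOT")
Tok("I", "PRON", "nsubj", thought)
had = Tok("had", "VERB", "ccomp", thought)
Tok("it", "PRON", "nsubj", had)
plot = Tok("plot", "NOUN", "dobj", had)
Tok("an", "DET", "det", plot)
Tok("amazing", "ADJ", "amod", plot)

# The head of the aspect word, and its dependents with their relations.
head_of_plot = plot.head.orth_
dependents_of_plot = [(child.orth_, child.dep_) for child in plot.children]
print(head_of_plot)
print(dependents_of_plot)
```

In this toy tree the opinion word "*amazing*" is reached as a dependent of "*plot*" through the `amod` relation, while the determiner "*an*" is a dependent that carries no opinion.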

### Exercise
Run the cell below to set up spaCy and load in a corpus of Amazon DVD reviews.

In [2]:
import spacy
from sussex_nltk.corpus_readers import AmazonReviewCorpusReader

nlp = spacy.load('en')
dvd_reviews = [review for review in AmazonReviewCorpusReader().category("dvd").raw()]
print("The dvd review dataset contains {} reviews".format(len(dvd_reviews)))

Sussex NLTK root directory is \\ad.susx.ac.uk\ITS\TeachingResources\Departments\Informatics\LanguageEngineering\resources
The dvd review dataset contains 5491 reviews


### Exercise
We are now going to create code that finds all dependents and all heads of a list of aspect words. This code will be similar to the final exercise of the previous topic.

In the blank cell below, write code that takes a set of aspect words ("*plot*","*characters*","*cinematography*", and "*dialogue*") as an argument, and produces a dictionary that maps each aspect word to a dictionary with two keys: `deps` and `heads`, where the `deps` key maps to the list of dependents of the aspect word and `heads` maps to the list of heads of the aspect word. 

- Note that our aspect words are nouns.
- Note that our aspect words are not all lemmas.
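The second note matters when deciding what to compare against the aspect word list. A small sketch, using invented (surface form, lemma) pairs rather than real parser output, shows how matching on the surface form and matching on the lemma select different tokens:

```python
aspect_words = ["plot", "characters", "cinematography", "dialogue"]

# Invented (surface form, lemma) pairs; a lemmatiser would normally supply the lemma.
toy_tokens = [("characters", "character"), ("plots", "plot"), ("plot", "plot")]

# Comparing surface forms finds "characters" but misses "plots".
matched_on_form = [form for form, lemma in toy_tokens if form in aspect_words]

# Comparing lemmas finds "plots" but misses "characters", whose lemma is "character".
matched_on_lemma = [form for form, lemma in toy_tokens if lemma in aspect_words]

print(matched_on_form)
print(matched_on_lemma)
```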

Display your results for dependents and heads in separate tables. For example, suppose we have the following:
- `aspect_words` is a list of the aspect words we are interested in,
- `all_heads` is a list of lists of the aspect words' heads, and
- `all_dependents` is a list of lists of the aspect words' dependents.


Given these, the following code will display the results:

```
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x is None else x)
print("Heads")
display(df_heads)
print("Dependents")
display(df_deps)
```
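The `zip_longest(*lists)` idiom in the display code is what lets columns of different lengths share one table: it transposes the per-aspect lists into rows and pads the shorter lists with `None`. A small sketch with made-up word lists (the data is invented purely for illustration):

```python
from itertools import zip_longest

# Invented example data: one list of heads per aspect word, of unequal length.
toy_heads = [["had", "is", "follows"], ["are"]]

# Transpose the per-aspect lists into table rows; short lists are padded with None.
rows = list(zip_longest(*toy_heads))

# Replace the None padding with empty strings, as the display code does.
clean_rows = [tuple("" if cell is None else cell for cell in row) for row in rows]
print(rows)
print(clean_rows)
```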

Run your code and look at the output.

In [3]:
aspect_words = ["plot", "characters", "cinematography", "dialogue"]
# Build all_heads and all_dependents here, then display them using the code above.

In [None]:
# %load solutions/all_heads_and_deps
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for review in dvd_reviews:
    parsed_review = nlp(review)
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append(token.head.orth_)
            for child in token.children:
                linked_words[token.orth_]["deps"].append(child.orth_)
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print()
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


Our goal here is to find out what the reviewer is saying about the different aspects we are interested in. 
As you can see, not all of the words being returned are informative. In the next two exercises we look at how to (partially) address this.

### Exercise
Make a copy of your solution to the last exercise and position it below this cell.

Then adapt the code so that instead of creating lists containing just the tokens, we record the following other features:
- For the heads:
 - the lemma of the head,
 - the part of speech of the head, 
 - the dependency between the head and the aspect word
- For the dependents:
 - the lemma of the dependent,
 - the part of speech of the dependent,
 - the dependency between the dependent and the aspect word
 
What is the point of this?

The idea is that with this additional information, we will be in a better position to work out what makes heads and dependents uninteresting. For example, we might want to remove punctuation.

Once you have made the necessary changes to the code, run the code on the dataset and look at the outputs to see if you can figure out ways of spotting the uninteresting entries in the table.
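One way to spot uninteresting entries is to tally how often each dependency label or lemma occurs among the collected tuples. A minimal sketch, assuming tuples of the form `(orth, lemma, pos, dep)`; the sample tuples below are invented for illustration and are not output from the corpus:

```python
from collections import Counter

# Invented dependents of an aspect word, as (orth, lemma, pos, dep) tuples.
toy_deps = [
    ("the", "the", "DET", "det"),
    ("The", "the", "DET", "det"),
    ("predictable", "predictable", "ADJ", "amod"),
    (",", ",", "PUNCT", "punct"),
    ("the", "the", "DET", "det"),
]

# Count how often each dependency label and each lemma occurs.
dep_counts = Counter(dep for _, _, _, dep in toy_deps)
lemma_counts = Counter(lemma for _, lemma, _, _ in toy_deps)

print(dep_counts.most_common())
print(lemma_counts.most_common(1))
```

Labels or lemmas that dominate the counts but carry no opinion (determiners and punctuation in this toy data) are natural candidates for filtering.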
 

In [None]:
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for review in dvd_reviews:
    parsed_review = nlp(review)
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append(token.head.orth_)
            for child in token.children:
                linked_words[token.orth_]["deps"].append(child.orth_)
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print()
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)

In [None]:
# %load solutions/all_heads_deps_features
aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for review in dvd_reviews:
    parsed_review = nlp(review)
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            linked_words[token.orth_]["heads"].append((token.head.orth_,token.head.lemma_,token.head.pos_,token.dep_))
            for child in token.children:
                linked_words[token.orth_]["deps"].append((child.orth_,child.lemma_,child.pos_,child.dep_))
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


### Exercise
We are now ready to filter out some of the entries in the table that are uninteresting.
To do this, create the following lists:

```
unwanted_head_deps = []
unwanted_head_lemmas = []
unwanted_head_pos = []
unwanted_dependent_deps =[]
unwanted_dependent_pos = []
unwanted_dependent_lemmas = []
```

The idea is that you will manually add entries to these lists as you spot ways to eliminate uninteresting entries in the table.

Undertake the following steps:
1. Define an initial version of these lists based on your observations of the tables produced in the previous exercise.
2. Adapt the code so that these lists are used to filter out unwanted tokens before they are added to your lists of heads and dependents. 
 - One option here is to define two functions: `uninteresting_head(token)` and `uninteresting_dependent(token)`.
3. Run the code to see how much the output has improved.
4. Keep adding to the lists and re-running your code until you are satisfied that you can't improve them any more.

In [None]:
unwanted_head_deps = []
unwanted_head_lemmas = []
unwanted_head_pos = []
unwanted_dependent_deps =[]
unwanted_dependent_pos = []
unwanted_dependent_lemmas = []

In [None]:
# %load solutions/filtered_heads_deps
unwanted_head_deps = ['pobj', 'attr']
unwanted_head_lemmas = ['be','have','do']
unwanted_head_pos = ['PROPN']
unwanted_dependent_deps = ['det', 'predet','nummod', 'cc', 'prep', 'punct', 'case']
unwanted_dependent_pos = ['SPACE','PROPN']
unwanted_dependent_lemmas = ['-PRON-','do','be']

def uninteresting_head(token):
    # token is the aspect word; token.dep_ is its relation to its head
    return (token.dep_ in unwanted_head_deps
            or token.head.lemma_ in unwanted_head_lemmas
            or token.head.pos_ in unwanted_head_pos)

def uninteresting_dep(token):
    # token is a dependent (child) of the aspect word
    return (token.dep_ in unwanted_dependent_deps
            or token.pos_ in unwanted_dependent_pos
            or token.lemma_ in unwanted_dependent_lemmas)

aspect_words = ["plot","characters","cinematography","dialogue"]
linked_words = defaultdict(lambda: defaultdict(list)) 
for review in dvd_reviews:
    parsed_review = nlp(review)
    for token in parsed_review:
        if token.pos_ == 'NOUN' and token.orth_ in aspect_words:
            if not uninteresting_head(token): 
                linked_words[token.orth_]["heads"].append((token.head.orth_,token.head.pos_,token.dep_,token.head.lemma_))
            for child in token.children:
                if not uninteresting_dep(child):
                    linked_words[token.orth_]["deps"].append((child.orth_,child.pos_,child.dep_,child.lemma_))
all_heads = [linked_words[word]["heads"] for word in aspect_words]
all_dependents = [linked_words[word]["deps"] for word in aspect_words]
df_heads = pd.DataFrame(list(zip_longest(*all_heads)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Heads")
display(df_heads)
df_deps = pd.DataFrame(list(zip_longest(*all_dependents)),
                columns = aspect_words).applymap(lambda x: '' if x == None else x)
print("Dependents")
display(df_deps)


## Opinion extractor
Over the remaining exercises in this notebook, we are now going to create and subsequently refine a function called `opinion_extractor`.

`opinion_extractor` takes two arguments:
- `aspect_token`: an aspect token, such as "*plot*", "*characters*", "*cinematography*" or "*dialogue*".
- `parsed_sentence`: a parsed sentence.

The output should be a list of extracted opinions, i.e. a list of tokens that tell us something about what the reviewer is saying (in `parsed_sentence`) about `aspect_token`. 

### Exercise
To begin, in the cell below, complete an initial version of the function `opinion_extractor` that returns a list of all dependents of the aspect token. 
- This list will be empty when there are no opinions relating to `aspect_token` to extract in `parsed_sentence`.

In [9]:
def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:
                if child.pos_ == 'ADJ':
                    opinions.append(child.orth_)

    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   
results = [] 
for review in dvd_reviews[:200]:  # while we are developing our code we will just look at the first 200 reviews.
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))

show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       
     

Results for aspect word 'plot'

Sentence:
	& then Owen Wilson's character fall in love with Rachel's McAdams character and well, that's the usual predictable romance plot.
Opinion of 'plot':
	 'usual', 'predictable'


Sentence:
	The main plot is very thin (a CIA agent is ordered to kill an oil prince, gets caught and then warns the prince (why?)) and therefore some elements were added to make the movie more interesting.
Opinion of 'plot':
	 'main'


Sentence:
	With their childest voices, they shouldn't be used as a main plot of the story
Opinion of 'plot':
	 'main'


Results for aspect word 'characters'

Sentence:
	The characters talk about the relative intelligence of the other characters or how the other characters can't possibly know what is going on.
Opinion of 'characters':
	 'other', 'other'


Sentence:
	There are only two Black characters, both sentimentally loyal to their Southernness and their masters; otherwise, slavery is beside the point.
Opinion of 'characters':
	 'Black'


In [10]:
# %load solutions/opinion_extractor_initial
def opinion_extractor(aspect_token,parsed_sentence):
    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:
                opinions.append(child.orth_)
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")
                
aspect_words = ["plot","characters","cinematography","dialogue"]
   
results = [] 
for review in dvd_reviews[:200]:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
                
show_results(results,"plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")       


Results for aspect word 'plot'

Sentence:
	However, the plot was predictable.
Opinion of 'plot':
	 'the'


Sentence:
	I'm still not sure what Jody Foster's character brings to the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Furthermore, the plot is so predictable that it made the movie drag.
Opinion of 'plot':
	 'the'


Sentence:
	& then Owen Wilson's character fall in love with Rachel's McAdams character and well, that's the usual predictable romance plot.
Opinion of 'plot':
	 'the', 'usual', 'predictable', 'romance'


Sentence:
	This movie didn't do anything for me...Didn't care for the characters or the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Not enough weirdness, tried too much for the plot.
Opinion of 'plot':
	 'the'


Sentence:
	Given my dislike of how the Japanese culture was handled in this film, it left me finding the plot irrelevant.
Opinion of 'plot':
	 'the'


Sentence:
	The plot was OK but it would have been much better if they had either stuck to a more authentic 

'


Sentence:
	It just added to the disbelief of the characters (stay Goldblum), the storyline, and the ultimate letdown of this much anticipated sequel.
Opinion of 'characters':
	 'the', '(', 'stay', ')', ',', 'storyline'


Sentence:
	All the characters were severely and tragically 2-dimensional with none of the depth you get from reading the graphic novel and very little attention was paid to developing better dialogue suitable for a big screen experience.
Opinion of 'characters':
	 'All', 'the'


Sentence:
	The movie gets boring rather fast as Bette and Woody are the only ones we get to see as there are very few supporting characters.
Opinion of 'characters':
	 'few', 'supporting'


Sentence:
	The characters are presented in a confusing fashion with only snippets and glimpses into each person's life.
Opinion of 'characters':
	 'The'


Sentence:
	These disjointed glimpses don't give the viewer time or depth to care what happens to any of the spoiled rotten characters.
Opinion of 'cha

In [13]:

core = ["It has an exciting fresh plot.",
        "The plot was dull.",
        "It has an excessively dull plot.",
        "The plot was excessively dull.",
        "The plot wasn't dull.",
        "It wasn't an exciting fresh plot.",
        "The plot wasn't excessively dull.",
        "The plot was cheesy, but fun and inspiring.",
        "The plot was really cheesy and not particularly special."
       ]


def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:            
#                if child.dep_== 'amod':                        #Extension A
                if child.pos_ == 'ADJ' and child.dep_ == 'amod':#Extension B
                    opinions.append(child.orth_)

    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")


### Exercise
Look at the output that your opinion extractor is producing.

As you can see, it isn't very good. In the sections below, you will be looking at a variety of ways in which you can refine the `opinion_extractor` function to improve its performance. 

The work you will do in the remainder of this notebook forms part of the first assessed coursework for this module. This will involve developing and assessing the effectiveness of several extensions to the opinion extractor. 
- For full details of what is required for the coursework see the coursework specification document which can be downloaded from the module website.

Have a look at the section **Tips for de-bugging and exploration** (see below). 

As you are investigating how well your `opinion_extractor` is working, you will want to view your opinion extractor's output for a substantial number of sentences. You might prefer to print your output to a file. 

The code in the next cell illustrates how to do this.
- Note that you may wish to replace `"savefile.txt"` with a different path.

In [14]:
save_file_path = r"savefile.txt" # Set this to the location of the file you wish to create/overwrite with the saved output.

# This is a "with statement", it invokes a context manager, which handles the opening and closing of resources (like files)
with open(save_file_path, "w") as save_file:  # The 'w' says that we want to write to the file            
    for review in dvd_reviews[:200]:
        parsed_review = nlp(review)
        for sentence in parsed_review.sents:
            for aspect_token in aspect_words:
                opinions = opinion_extractor(aspect_token,sentence)
                if opinions:
                    save_file.write("--- Sentence ---\n{0}\nOpinions on {1}\n{2}\n".
                                    format(sentence,aspect_token,opinions))



## Required extensions
There are 5 required extensions:
1. Adjectival modification
2. Adjectives linked by copulae
3. Adverbial modifiers
4. Negation
5. Conjunction

In addition there are a number of optional extensions (see below).

### Example test set
In addition to testing out your opinion extractor on the DVD review dataset, we will also be looking at a very small set of examples that illustrate cases of each of the extensions that we will be making. 

Run the following cell to load up these sentences.
- Each example is a three-tuple of the form  
`(<example_name>,<sentence>,<set of opinion words>)`

In [None]:
core = [("A.1","It has an exciting fresh plot.",set(["fresh", "exciting"])),
        ("B.1","The plot was dull.",set(["dull"])),
        ("C.1","It has an excessively dull plot.",set(["excessively-dull"])),
        ("C.2","The plot was excessively dull.",set(["excessively-dull"])),
        ("D.1","The plot wasn't dull.",set(["not-dull"])),
        ("D.2","It wasn't an exciting fresh plot.",set(["not-exciting", "not-fresh"])),
        ("D.3","The plot wasn't excessively dull.",set(["not-excessively-dull"])),
        ("E.1","The plot was cheesy, but fun and inspiring.",set(["cheesy", "fun", "inspiring"])),
        ("E.2","The plot was really cheesy and not particularly special.",set(["really-cheesy", "not-particularly-special"]))
       ]

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]
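Because each example pairs a sentence with its expected set of opinion words, the test set can be checked automatically. The sketch below is one possible harness, not part of the coursework specification: `evaluate` and `toy_extractor` are hypothetical names, and `toy_extractor` is a deliberately naive stand-in that matches known adjectives by string rather than parsing the sentence.

```python
def evaluate(examples, extractor, aspect="plot"):
    """Return a list of (name, passed, predicted) triples, one per test example."""
    report = []
    for name, sentence, gold in examples:
        predicted = set(extractor(aspect, sentence))
        report.append((name, predicted == gold, predicted))
    return report

# Hypothetical stand-in extractor: a real one would work on a parsed sentence.
def toy_extractor(aspect_token, sentence):
    known_adjectives = {"exciting", "fresh", "dull"}
    words = sentence.rstrip(".").split()
    return [word for word in words if word in known_adjectives]

# Two examples in the same three-tuple format as the test set.
toy_examples = [("A.1", "It has an exciting fresh plot.", {"fresh", "exciting"}),
                ("D.1", "The plot wasn't dull.", {"not-dull"})]

report = evaluate(toy_examples, toy_extractor)
for name, passed, predicted in report:
    print(name, "PASS" if passed else "FAIL", sorted(predicted))
```

To use a harness like this with the real extractor, the `extractor` argument would need to parse the sentence first, for example `lambda aspect, s: opinion_extractor(aspect, nlp(s))`, assuming `nlp` and `opinion_extractor` are defined as in the cells above.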

## Extension A: Adjectival modification
In this section, we are interested in adjectival modification. This is when we have a noun like "*dog*" or "*plot*", and there are one or more adjectives which are specifying the characteristics of that noun. E.g. "*big brown dog*" or "*exciting fresh plot*" ("*big*" and "*brown*" are both adjectivally modifying "*dog*").

The dependency relation we use to show this relationship is `amod`.

Write a version of the opinion extraction function which, when given a sentence such as the example below containing an aspect token (e.g. "*plot*"), uses the `amod` relations to extract a list of the adjectival modifiers of the aspect token (e.g. the two words "*exciting*" and "*fresh*" in this case).

**Core Example A.1**: "*It has an exciting fresh plot.*" should produce "*fresh*", "*exciting*"

The dependency tree for this sentence is shown below.

![Extension 1 example](./img/amod_example.png)

### Exercise
Adapt your opinion extractor so that it finds only adjectival modifiers, and apply it to the example test set to check that your function is working as required.
- Note, your opinion extractor should only get the first example, A.1, correct.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [4]:

core = ["It has an exciting fresh plot."]


def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:            
                if child.dep_== 'amod':             
                    opinions.append(child.orth_)

    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	It has an exciting fresh plot.
Opinion of 'plot':
	 'exciting', 'fresh'




## Extension B: Adjectives linked by copulae
In this section, we are interested in adjectives which are linked to our aspect term via the copula (conjugations of "*to be*": "*is*", "*was*", "*will be*", etc.). 

Notice that if we were only looking for `amod` relations, we'd completely miss the word "*dull*" in the dependency tree shown below.

When given a sentence like the example below containing the aspect token "*plot*", your opinion extraction function should use appropriate dependency relations to output the opinion word "*dull*".

**Core Example B.1**: "*The plot was dull.*" should produce "*dull*"

The dependency tree for this sentence is shown below.

![Extension 2 example](./img/copula_example.png)
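The key difference from adjectival modification is the direction of travel in the tree: the adjective is not a dependent of the aspect word, so it has to be reached by going up to the aspect word's head and back down to that head's other dependents. A minimal sketch on a hand-built tree follows. `Tok` is a hypothetical stand-in for a parsed token, and the labels `nsubj` and `acomp` are written by hand as assumptions, so check them against what your parser actually produces.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (not a spaCy class)."""
    def __init__(self, orth, pos, dep, head=None):
        self.orth_, self.pos_, self.dep_ = orth, pos, dep
        self.head = head if head is not None else self
        self.children = []
        if head is not None:
            head.children.append(self)

# Hand-built tree for "The plot was dull." (labels are assumptions).
was = Tok("was", "VERB", "ROOT")
plot = Tok("plot", "NOUN", "nsubj", was)
Tok("The", "DET", "det", plot)
Tok("dull", "ADJ", "acomp", was)

def copula_adjectives(aspect):
    """Adjectives that share a head with an aspect token acting as that head's subject."""
    if aspect.dep_ != "nsubj":
        return []
    return [sibling.orth_ for sibling in aspect.head.children if sibling.pos_ == "ADJ"]

found = copula_adjectives(plot)
print(found)
```

Requiring the aspect word to be the subject of the verb keeps adjectives from unrelated clauses from being attributed to it.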

### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1 and B.1.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [71]:

core = ["The plot was dull.",
        "It has an exciting fresh plot.",
       ]


def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        #Extension A
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:            
                if child.dep_== 'amod':             
                    opinions.append(child.orth_)
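        #Extension B: adjectives linked to the aspect token through a copula.
        #Only look at the verb that has the aspect token as its subject, so that
        #adjectives elsewhere in the sentence are not attributed to it.
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token and token.dep_ == 'nsubj':
            for sibling in token.head.children:
                if sibling.pos_ == 'ADJ':
                    opinions.append(sibling.orth_)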

    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	The plot was dull.
Opinion of 'plot':
	 'dull'


Sentence:
	It has an exciting fresh plot.
Opinion of 'plot':
	 'exciting', 'fresh'




## Extension C: Adverbial modifiers
If you ran the extractor you have built so far on the example sentences below, it would find only the opinion "*dull*". It would not recover any indication of the strength of the opinion. Adverbs like "*excessively*" elaborate on the adjectives that they modify, in adverbial modification relations.

The dependency relation we use to show this relationship is `advmod`.

When given a sentence like those below containing the aspect token "*plot*", your opinion extraction function should use the `advmod` relation to output features like "*excessively-dull*".
- If you have an adjective token in a variable `adj_token`, and an adverb token in a variable `adv_token`, then you could create this feature like this: `adv_token.orth_ + "-" + adj_token.orth_`.
- If you have a list of strings, you can use Python's `join` method to concatenate them into a single string. The following would join the strings together, placing a `"-"` between each:  
`joined_string = "-".join(listofstrings)`
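As a quick illustration of the `join` approach, the sketch below builds a hyphenated feature from modifier strings collected in the order they appear in the sentence. The variable names are illustrative, and the strings are typed in by hand rather than taken from a parse.

```python
# Modifier strings in sentence order, followed by the adjective they modify.
modifiers = ["not", "particularly"]
adjective = "special"

# Join everything with hyphens to form a single opinion feature.
feature = "-".join(modifiers + [adjective])
print(feature)
```

Collecting the parts in a list first means the same line works whether an adjective has no modifiers, one, or several.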


**Core Example C.1**: "*It has an excessively dull plot.*" should produce "*excessively-dull*"  
**Core Example C.2**: "*The plot was excessively dull.*" should produce "*excessively-dull*"

The dependency trees for these sentences are shown below.

![Extension 3 example](./img/advmod_example_1.png)

![Extension 3 example 2](./img/advmod_example_2.png)

### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, and C.2.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [5]:
#Extension C
core = ["It has an exciting fresh plot.",
        "The plot was dull.",
        "It has an excessively dull plot",
        "The plot was excessively dull."
       ]


def opinion_extractor(aspect_token,parsed_sentence):

    opinions = []
    for token in parsed_sentence:
        #Extension A
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:            
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    for adv_token in child.children:
                        if adv_token.dep_ == 'advmod':
                            adj_token = adv_token.orth_ + "-" + adj_token
                    opinions.append(adj_token)
        #Extension B
        if token.pos_ == 'VERB':
            for child in token.children:
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    for adv_token in child.children:
                            if adv_token.dep_ == 'advmod':
                                adj_token = adv_token.orth_ + "-" + adj_token
                    opinions.append(adj_token)
    
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	It has an exciting fresh plot.
Opinion of 'plot':
	 'exciting', 'fresh'


Sentence:
	The plot was dull.
Opinion of 'plot':
	 'dull'


Sentence:
	It has an excessively dull plot
Opinion of 'plot':
	 'excessively-dull'


Sentence:
	The plot was excessively dull.
Opinion of 'plot':
	 'excessively-dull'




## Extension D: Negation
Look at the tree below; it is an example of an adjective linked by a copula. Your existing opinion extractor would extract "dull". However, notice that the example is saying that the plot was not dull! This is an example of the use of negation.

The dependency relation we use to show this relationship is `neg`.

When given sentences like those below containing the aspect token "*plot*", your opinion extraction function should use the `neg` relation to output features like "*not-dull*". If you have an adjective token in a variable `token`, then you could create this feature like this: `"not-" + token.orth_`.

**Core Example D.1**: "*The plot wasn't dull.*" should produce "*not-dull*"  
**Core Example D.2**: "*It wasn't an exciting fresh plot.*" should produce "*not-exciting*", "*not-fresh*"  
**Core Example D.3**: "*The plot wasn't excessively dull.*" should produce "*not-excessively-dull*"
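
The negation check can be sketched on a hand-built tree rather than on real parser output. In the sketch below, `Tok` is a hypothetical stand-in that carries only the token attributes used here, and the place where the `neg` arc attaches (the verb or the adjective) is an assumption, so the helper looks in both places. Check the trees your parser actually produces before relying on either.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (only the attributes used here)."""
    def __init__(self, orth_, pos_, dep_, children=()):
        self.orth_ = orth_
        self.pos_ = pos_
        self.dep_ = dep_
        self.children = list(children)

def is_negated(adj, head):
    """True if a `neg` child hangs off the adjective or off its head."""
    candidates = list(adj.children) + list(head.children)
    return any(child.dep_ == "neg" for child in candidates)

def describe(adj, head):
    """Return the adjective, prefixed with "not-" when it is negated."""
    return ("not-" if is_negated(adj, head) else "") + adj.orth_

# Hand-built tree for "The plot wasn't dull." (attachment assumed, not parser output).
dull = Tok("dull", "ADJ", "acomp")
was = Tok("was", "VERB", "ROOT",
          [Tok("plot", "NOUN", "nsubj"), Tok("n't", "ADV", "neg"), dull])

feature = describe(dull, was)
print(feature)
```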

The dependency trees for these sentences are shown below.

![Extension 4 example 1](./img/negation_example_1.png)

![Extension 4 example 2](./img/negation_example_2.png)

![Extension 4 example 3](./img/negation_example_3.png)

### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on examples A.1, B.1, C.1, C.2, D.1, D.2, and D.3.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [13]:
#Extension D

core = ["It has an exciting fresh plot.",
        "The plot was dull.",
        "It has an excessively dull plot",
        "The plot was excessively dull.",
        "The plot wasn't dull.",
        "It wasn't an exciting fresh plot.",
        "The plot wasn't excessively dull."
       ]

def opinion_extractor(aspect_token,parsed_sentence):
    opinions = []
    for token in parsed_sentence:
        #Extension A
        neg = ""
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:    
                if child.dep_ == 'neg':
                    neg = "not-"
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    for adv_token in child.children:
                        if adv_token.dep_ == 'advmod':
                            adj_token = adv_token.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
        #Extension B
        if token.pos_ == 'VERB':
            for child in token.children:
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    for adv_token in child.children:
                        if adv_token.dep_ == 'neg':
                            neg = "not-"
                        if adv_token.dep_ == 'advmod':
                            adj_token = adv_token.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
    
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	It has an exciting fresh plot.
Opinion of 'plot':
	 'exciting', 'fresh'


Sentence:
	The plot was dull.
Opinion of 'plot':
	 'dull'


Sentence:
	It has an excessively dull plot
Opinion of 'plot':
	 'excessively-dull'


Sentence:
	The plot was excessively dull.
Opinion of 'plot':
	 'excessively-dull'


Sentence:
	The plot wasn't dull.
Opinion of 'plot':
	 'not-dull'


Sentence:
	It wasn't an exciting fresh plot.
Opinion of 'plot':
	 'not-exciting', 'not-fresh'


Sentence:
	The plot wasn't excessively dull.
Opinion of 'plot':
	 'not-excessively-dull'




## Extension E: Conjunction
If you used your existing extractor on the tree below, it would only extract "*cheesy*". However, "*fun*" and "*inspiring*" are both conjoined with "*cheesy*"; this means that they all apply to the subject ("*plot*").

This conjunction relation is shown via the `conj` dependency. Note that words other than adjectives can be the conjuncts. You could investigate whether this is a problem.

**Core Example E.1**: "*The plot was cheesy but fun and inspiring.*" should produce "*cheesy*", "*fun*", "*inspiring*"  
**Core Example E.2**: "*The plot was really cheesy and not particularly special.*" should produce "*really-cheesy*", "*not-particularly-special*"
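
One possible way to follow `conj` arcs is a small recursive helper, sketched here on a hand-built tree. `Tok` is a hypothetical stand-in for a parsed token, and the chained attachment of the conjuncts (each conjunct hanging off the previous one) is an assumption about the parse, not something guaranteed by the parser.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (only the attributes used here)."""
    def __init__(self, orth_, pos_, dep_, children=()):
        self.orth_ = orth_
        self.pos_ = pos_
        self.dep_ = dep_
        self.children = list(children)

def collect_conjuncts(token):
    """Return `token` followed by every token reachable through `conj` arcs."""
    found = [token]
    for child in token.children:
        if child.dep_ == "conj":
            found.extend(collect_conjuncts(child))
    return found

# Hand-built fragment for "... cheesy but fun and inspiring" (structure assumed).
inspiring = Tok("inspiring", "ADJ", "conj")
fun = Tok("fun", "ADJ", "conj", [Tok("and", "CCONJ", "cc"), inspiring])
cheesy = Tok("cheesy", "ADJ", "acomp", [Tok("but", "CCONJ", "cc"), fun])

# Keep only adjectives, since other parts of speech can also be conjuncts.
words = [t.orth_ for t in collect_conjuncts(cheesy) if t.pos_ == "ADJ"]
print(words)
```

Because each token is added before its conjuncts are visited, the words come out in sentence order.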

The dependency trees for these sentences are shown below.

![Extension 5 example](./img/conj_example_1.png)

![Extension 5 example 2](./img/conj_example_2.png)

### Exercise
Extend the opinion extractor as described above and apply it to the example test set to check that your function is working as required.
- Your opinion extractor should now work on all of the required examples.

Investigate the extent to which your opinion extractor produces appropriate opinion bearing words by applying it to the full set of parsed DVD reviews. Consider all four aspects: "*plot*", "*characters*", "*cinematography*", and "*dialogue*".

In [14]:
#Extension E
core = ["It has an exciting fresh plot.",
        "The plot was dull.",
        "It has an excessively dull plot",
        "The plot was excessively dull.",
        "The plot wasn't dull.",
        "It wasn't an exciting fresh plot.",
        "The plot wasn't excessively dull.",
        "The plot was cheesy but fun and inspiring.",
        "The plot was really cheesy and not particularly special."
       ]

def opinion_extractor(aspect_token,parsed_sentence):
    opinions = []
    
    #loop to iterate through all tokens in sentence
    for token in parsed_sentence:
        neg = ""
        #Adds words relevant to aspect words, along with relevant descriptions.
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:  
                #If "n't" is present, add "not-" to beginning.
                if child.dep_ == 'neg':
                    neg = "not-"
                #If 'amod' dependency is present, add to string.
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    #Search for 'advmod' dependencies in children of 'amod' token. Add to same string.
                    for adv_child in child.children:
                        if adv_child.dep_ == 'advmod':
                            adj_token = adv_child.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
        #Adds Verbs describing aspect words.         
        if token.pos_ == 'VERB':
            for child in token.children:
                #If an adjective is present, add to string.
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    conj_test(child, opinions)
#                    for adv_child in child.children:
#                        adv_pass = adv_child
#                        #If "n't" is present, add "not-" to beginning.
#                        if adv_child.dep_ == 'neg':
#                            neg = "not-"
#                        #Search for 'advmod' dependancies in children of 'amod' token. Add to same string.
#                        if adv_child.dep_ == 'advmod':
#                            adj_token = adv_child.orth_ + "-" + adj_token
#                        #If conjuctive dependency is present, call a method which will repeat the above in a recursive fashion.
#                        if adv_child.dep_ == "conj":
#                            conj_test(adv_child, opinions)
#                    opinions.append(neg + adj_token)                                           
    return opinions

#Function which adds an adjective and, recursively, its conjuncts to the 'opinions' list.
def conj_test(token, opinions):
    neg = ""
    conj_token = token.orth_
    for child in token.children:
        if child.dep_ == 'neg':
            neg = "not-"
        if child.dep_ == 'advmod':
            conj_token = child.orth_ + "-" + conj_token
        if child.dep_ == 'conj':
            conj_test(child, opinions)
    opinions.append(neg + conj_token)
    return 

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")
#show_results(results,"characters")
#show_results(results,"cinematography")
#show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	It has an exciting fresh plot.
Opinion of 'plot':
	 'exciting', 'fresh'


Sentence:
	The plot was dull.
Opinion of 'plot':
	 'dull'


Sentence:
	It has an excessively dull plot
Opinion of 'plot':
	 'excessively-dull'


Sentence:
	The plot was excessively dull.
Opinion of 'plot':
	 'excessively-dull'


Sentence:
	The plot wasn't dull.
Opinion of 'plot':
	 'not-dull'


Sentence:
	It wasn't an exciting fresh plot.
Opinion of 'plot':
	 'not-exciting', 'not-fresh'


Sentence:
	The plot wasn't excessively dull.
Opinion of 'plot':
	 'not-excessively-dull'


Sentence:
	The plot was cheesy but fun and inspiring.
Opinion of 'plot':
	 'inspiring', 'fun', 'cheesy'


Sentence:
	The plot was really cheesy and not particularly special.
Opinion of 'plot':
	 'not-particularly-special', 'really-cheesy'




## Additional extensions

This section presents some examples on which your current opinion extractor will fail. In all of the examples below, "plot" is the aspect token.

You are not required to extend your opinion extractor to handle these cases.

**Optional Example A**: "The script and plot are utterly excellent." produces "utterly-excellent"  
**Optional Example B**: "The script and plot were unoriginal and boring." produces "unoriginal", "boring"  
**Optional Example C**: "The plot wasn't lacking." produces "not-lacking"  
**Optional Example D**: "The plot is full of holes." produces "full-of-holes"  
**Optional Example E**: "There was no logical plot to this story." produces "no-logical"  
**Optional Example F**: "I loved the plot." produces "loved"  
**Optional Example G**: "I didn't mind the plot." produces "not-mind"

The dependency trees for these sentences are shown below.
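
If you do attempt these cases, one way to compare an extractor's output with the expected features is a small checking harness. This is a sketch only: `check_examples` and `toy_extract` are hypothetical names, and `toy_extract` is a placeholder used to exercise the harness, not a parser-based extractor.

```python
def check_examples(extract, examples):
    """Return (label, expected, got) for every example whose output differs."""
    failures = []
    for label, sentence, expected in examples:
        got = set(extract(sentence))
        if got != expected:
            failures.append((label, expected, got))
    return failures

# Two of the optional examples, with their expected feature sets.
examples = [("C", "The plot wasn't lacking.", {"not-lacking"}),
            ("D", "The plot is full of holes.", {"full-of-holes"})]

# Placeholder extractor (an assumption for this sketch, not a real extractor).
def toy_extract(sentence):
    return ["not-lacking"] if "lacking" in sentence else ["full"]

failures = check_examples(toy_extract, examples)
for label, expected, got in failures:
    print("Example {}: expected {}, got {}".format(label, sorted(expected), sorted(got)))
```

In the notebook, `extract` could be a function that parses the sentence and calls your `opinion_extractor` with the aspect word "plot"; that wiring is left out here because it depends on your own setup.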

In [5]:
def print_dep_tree(node,indent):
    print(indent,node)
    indent = indent + "  "
    for child in node.children:
        print_dep_tree(child,indent)

sent = "As one can imagine, thses characters are almost always pathetic"
parsed_sents = nlp(sent)
parsed_sent = next(parsed_sents.sents)
print_dep_tree(parsed_sent.root,"")


 thses
   imagine
     As
     one
     can
     ,
   are
     characters
     pathetic
       always
         almost


In [13]:
#Works for A, B and C.
#Use cell below to adapt, as results will start to vary from now on.

core = ["The script and plot are utterly excellent.",
        "The script and plot were unoriginal and boring.",
        "The plot wasn't lacking.",
        "The plot is full of holes.",
        "There was no logical plot to this story.",
        "I loved the plot.",
        "I didn't mind the plot."
       ]


def opinion_extractor(aspect_token,parsed_sentence):
    opinions = []
    for token in parsed_sentence:

        neg = ""
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:  
                if child.dep_ == 'neg':
                    neg = "not-"
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    for adv_child in child.children:
                        if adv_child.dep_ == 'advmod':
                            adj_token = adv_child.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
                    
        if token.pos_ == 'VERB':
            for child in token.children:
                #If an adjective is present, add to string.
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    recursive_test(child, opinions)
#                    for adv_child in child.children:
#                        adv_pass = adv_child
#                        #If "n't" is present, add "not-" to beginning.
#                        if adv_child.dep_ == 'neg':
#                            neg = "not-"
#                        #Search for 'advmod' dependancies in children of 'amod' token. Add to same string.
#                        if adv_child.dep_ == 'advmod':
#                            adj_token = adv_child.orth_ + "-" + adj_token
#                        #If conjuctive dependency is present, call a method which will repeat the above in a recursive fashion.
#                        if adv_child.dep_ == "conj":
#                            conj_test(adv_child, opinions)
#                    opinions.append(neg + adj_token)                                           
    return opinions

#Adds an adjective and, recursively, its conjuncts to the 'opinions' list.
def recursive_test(token, opinions):
    neg = ""
    conj_token = token.orth_
    for child in token.children:
        if child.dep_ == 'neg':
            neg = "not-"
        if child.dep_ == 'advmod':
            conj_token = child.orth_ + "-" + conj_token
        if child.dep_ == 'conj':
            recursive_test(child, opinions)
    opinions.append(neg + conj_token)
    return opinions

def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

aspect_words = ["plot","characters","cinematography","dialogue"]
   


results = [] 

for review in core:
    parsed_review = nlp(review)
    for sentence in parsed_review.sents:
        for aspect_token in aspect_words:
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
show_results(results, "plot")


Results for aspect word 'plot'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'plot':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'plot':
	 'boring', 'unoriginal'


Sentence:
	The plot wasn't lacking.
Opinion of 'plot':
	 'not-lacking'


Sentence:
	The plot is full of holes.
Opinion of 'plot':
	 'full'


Sentence:
	There was no logical plot to this story.
Opinion of 'plot':
	 'logical'


Sentence:
	I didn't mind the plot.
Opinion of 'plot':
	 'not-mind'




In [8]:
#TESTING

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]
#Method for the opinion extractor:
#Pass an aspect word (e.g. 'plot', 'characters') and a sentence.
#Iterate through each word in the sentence, analysing the dependencies and part of speech.
#If words are found expressing the reviewer's opinion, add them to a list.
#Param: aspect_token    - the aspect word of interest (a string).
#       parsed_sentence - sentence from the review which has been processed by the nlp toolkit.
#Return: opinions       - list of strings describing the opinion.
def opinion_extractor(aspect_token,parsed_sentence):
    #list to contain the opinions.
    opinions = []
    
    #loop to iterate through all tokens in sentence.
    for token in parsed_sentence:
        neg = ""
        #Adds words relevant to aspect words, along with relevant descriptions.
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:  
                #If "n't" is present, add "not-" to beginning.
                if child.dep_ == 'neg':
                    neg = "not-"
                #If 'amod' dependency is present, add to string.
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    #Search for 'advmod' dependencies in children of 'amod' token. Add to same string.
                    for adv_child in child.children:
                        if adv_child.dep_ == 'advmod':
                            adj_token = adv_child.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
        #Adds Verbs describing aspect words.         
        if token.pos_ == 'VERB':
            for child in token.children:
                #If an adjective is present, add to string.
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    for adv_child in child.children:
                        adv_pass = adv_child
                        #If "n't" is present, add "not-" to beginning.
                        if adv_child.dep_ == 'neg':
                            neg = "not-"
                        #Search for 'advmod' dependencies in children of the adjective. Add to same string.
                        if adv_child.dep_ == 'advmod':
                            adj_token = adv_child.orth_ + "-" + adj_token
                        #If a conj dependency is present, call a function which repeats the above recursively.
                        if adv_child.dep_ == "conj":
                            conj_test(adv_child, opinions)
                    opinions.append(neg + adj_token)                                           
    return opinions

#Method for handling conjuncts:
#Adds a word and, recursively, its conjuncts to the 'opinions' list.
#Param: token    - word with the conj dependency.
#       opinions - list containing the opinions.
def conj_test(token, opinions):
    neg = ""
    conj_token = token.orth_
    for child in token.children:
        #If "n't" is present, add "not-" to beginning.
        if child.dep_ == 'neg':
            neg = "not-"
        #Search for 'advmod' dependencies.
        if child.dep_ == 'advmod':
            conj_token = child.orth_ + "-" + conj_token
        #If a conj dependency is present, call itself with the current child.
        if child.dep_ == 'conj':
            conj_test(child, opinions)
    opinions.append(neg + conj_token)
    return 

#Method for output display.
#Iterate through results (which is a 3 tuple including the aspect token, the sentence, and the opinions) 
#and print them out.
def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

#Aspect words to focus opinion on.
aspect_words = ["plot","characters","cinematography","dialogue"]
   

#set to contain results.
results = [] 

#Main feature to pass sentences through nlp toolkit and the opinion extractor. 
for s in core:
    #Store the review after processing it via nlp toolkit.
    parsed_review = nlp(s)
    #Go through every sentence in the review.
    for sentence in parsed_review.sents:
        #Pass sentence through opinion extractor with aspect token.
        for aspect_token in aspect_words:
            #Add to sets. 
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
#Run test for various aspects.
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'plot':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'plot':
	 'boring', 'unoriginal'


Sentence:
	The plot is full of holes.
Opinion of 'plot':
	 'full'


Sentence:
	There was no logical plot to this story.
Opinion of 'plot':
	 'logical'


Results for aspect word 'characters'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'characters':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'characters':
	 'boring', 'unoriginal'


Sentence:
	The plot is full of holes.
Opinion of 'characters':
	 'full'


Results for aspect word 'cinematography'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'cinematography':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'cinematography':
	 'boring', 'unoriginal'


Sentence:
	The plot is 

In [11]:
#TESTING

optional = [("A","The script and plot are utterly excellent.",set(["utterly-excellent"])),
            ("B","The script and plot were unoriginal and boring.",set(["unoriginal", "boring"])),
            ("C","The plot wasn't lacking.",set(["not-lacking"])),
            ("D","The plot is full of holes.",set(["full-of-holes"])),
            ("E","There was no logical plot to this story.",set(["no-logical"])),
            ("F","I loved the plot.",set(["loved"])),
            ("G","I didn't mind the plot.",set(["not-mind"]))
           ]
#Method for the opinion extractor:
#Pass an aspect word (e.g. 'plot', 'characters') and a sentence.
#Iterate through each word in the sentence, analysing the dependencies and part of speech.
#If words are found expressing the reviewer's opinion, add them to a list.
#Param: aspect_token    - the aspect word of interest (a string).
#       parsed_sentence - sentence from the review which has been processed by the nlp toolkit.
#Return: opinions       - list of strings describing the opinion.
def opinion_extractor(aspect_token,parsed_sentence):
    #list to contain the opinions.
    opinions = []
    
    #loop to iterate through all tokens in sentence.
    for token in parsed_sentence:
        neg = ""
        #Adds words relevant to aspect words, along with relevant descriptions.
        if token.pos_ == 'NOUN' and token.orth_ == aspect_token:
            for child in token.children:  
                #If "n't" is present, add "not-" to beginning.
                if child.dep_ == 'neg':
                    neg = "not-"
                #If 'amod' dependency is present, add to string.
                if child.dep_== 'amod': 
                    adj_token = child.orth_
                    #Search for 'advmod' dependencies in children of 'amod' token. Add to same string.
                    for adv_child in child.children:
                        if adv_child.dep_ == 'advmod':
                            adj_token = adv_child.orth_ + "-" + adj_token
                    opinions.append(neg + adj_token)
        #Adds Verbs describing aspect words.         
        if token.pos_ == 'VERB':
            for child in token.children:
                #If an adjective is present, add to string.
                if child.pos_ == 'ADJ':
                    adj_token = child.orth_
                    conj_test(child, opinions)
#                    for adv_child in child.children:
#                        adv_pass = adv_child
#                        #If "n't" is present, add "not-" to beginning.
#                        if adv_child.dep_ == 'neg':
#                            neg = "not-"
#                        #Search for 'advmod' dependancies in children of 'amod' token. Add to same string.
#                        if adv_child.dep_ == 'advmod':
#                            adj_token = adv_child.orth_ + "-" + adj_token
#                        #If conjuctive dependency is present, call a method which will repeat the above in a recursive fashion.
#                        if adv_child.dep_ == "conj":
#                            conj_test(adv_child, opinions)
#                    opinions.append(neg + adj_token)                                           
    return opinions

#Method for handling conjuncts:
#Adds a word and, recursively, its conjuncts to the 'opinions' list.
#Param: token    - word with the conj dependency.
#       opinions - list containing the opinions.
def conj_test(token, opinions):
    neg = ""
    conj_token = token.orth_
    for child in token.children:
        #If "n't" is present, add "not-" to beginning.
        if child.dep_ == 'neg':
            neg = "not-"
        #Search for 'advmod' dependencies.
        if child.dep_ == 'advmod':
            conj_token = child.orth_ + "-" + conj_token
        #If a conj dependency is present, call itself with the current child.
        if child.dep_ == 'conj':
            conj_test(child, opinions)
    opinions.append(neg + conj_token)
    return 

#Method for output display.
#Iterate through results (which is a 3 tuple including the aspect token, the sentence, and the opinions) 
#and print them out.
def show_results(results,aspect_word):
    print("Results for aspect word '{}'\n".format(aspect_word))
    for word,sent,opinions in results:
        if word == aspect_word:
            print("Sentence:\n\t{}".format(sent))
            print("Opinion of '{0}':\n\t '{1}'".format(aspect_word,"', '".join(opinions)))
            print("\n")

#Aspect words to focus opinion on.
aspect_words = ["plot","characters","cinematography","dialogue"]
   

#set to contain results.
results = [] 

#Main feature to pass sentences through nlp toolkit and the opinion extractor. 
for s in core:
    #Store the review after processing it via nlp toolkit.
    parsed_review = nlp(s)
    #Go through every sentence in the review.
    for sentence in parsed_review.sents:
        #Pass sentence through opinion extractor with aspect token.
        for aspect_token in aspect_words:
            #Add to sets. 
            opinions = opinion_extractor(aspect_token,sentence)
            if opinions:
                results.append((aspect_token,sentence.orth_,opinions))
        
#Run test for various aspects.
show_results(results, "plot")
show_results(results,"characters")
show_results(results,"cinematography")
show_results(results,"dialogue")

Results for aspect word 'plot'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'plot':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'plot':
	 'boring', 'unoriginal'


Sentence:
	The plot is full of holes.
Opinion of 'plot':
	 'full'


Sentence:
	There was no logical plot to this story.
Opinion of 'plot':
	 'logical'


Results for aspect word 'characters'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'characters':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'characters':
	 'boring', 'unoriginal'


Sentence:
	The plot is full of holes.
Opinion of 'characters':
	 'full'


Results for aspect word 'cinematography'

Sentence:
	The script and plot are utterly excellent.
Opinion of 'cinematography':
	 'utterly-excellent'


Sentence:
	The script and plot were unoriginal and boring.
Opinion of 'cinematography':
	 'boring', 'unoriginal'


Sentence:
	The plot is 

![Extension example 2](./img/additional_example_1.png)

![Extension example 2](./img/additional_example_2.png)

![Extension example 2](./img/additional_example_3.png)

![Extension example 2](./img/additional_example_4.png)

![Extension example 2](./img/additional_example_5.png)

![Extension example 2](./img/additional_example_6.png)

![Extension example 2](./img/additional_example_7.png)

## Tips for de-bugging and exploration
You will need to assess whether your opinion extractor has been effective when analysing a given sentence. Before you look at what the dependency parser says, read the sentence carefully and determine for yourself the scope of the words. Consider the following sentence.

"This film has excellent characters and an intriguing and engaging plot."

It should be obvious to you that here the plot is described as both "intriguing" and "engaging". However, "excellent" is only used to describe the characters.

If the parser suggests a structure which implies that plot is also described by "excellent" (for example), something has gone wrong.

### Dependency tree visualisation tool
You may find it useful to use the dependency tree visualisation tool found [here](https://demos.explosion.ai/displacy).

You can copy and paste example sentences from the DVD review data to get a good sense of what the dependency parse looks like.
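
If you would rather stay inside the notebook, a recursive tree printer that also shows each token's part of speech and dependency label can serve a similar purpose. The sketch below uses `Tok`, a hypothetical stand-in for a parsed token, and a hand-built tree; the labels in it are assumptions, not parser output.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (only the attributes used here)."""
    def __init__(self, orth_, pos_, dep_, children=()):
        self.orth_ = orth_
        self.pos_ = pos_
        self.dep_ = dep_
        self.children = list(children)

def format_dep_tree(node, indent=""):
    """Return one line per token: word, part of speech and dependency label."""
    lines = ["{}{} ({}, {})".format(indent, node.orth_, node.pos_, node.dep_)]
    for child in node.children:
        lines.extend(format_dep_tree(child, indent + "  "))
    return lines

# Hand-built tree for "plot was dull" (labels assumed for illustration).
root = Tok("was", "VERB", "ROOT",
           [Tok("plot", "NOUN", "nsubj"), Tok("dull", "ADJ", "acomp")])

lines = format_dep_tree(root)
print("\n".join(lines))
```

Seeing the labels next to the words makes it easier to spot which relation your extractor should be testing for.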

### Outputting results only from sentences relevant to the current task
You will find that your output is dominated by examples of adjectival modification and adjectives linked via the copula. This means that when you add new functionality (Extensions C to E), it will be difficult to determine its impact.

One way to solve this problem is to (temporarily) output only those features produced by the new functionality.

For example, imagine you have just completed Extensions A and B. Next, you write code that adds the adverbial features (Extension C). When assessing how well your code is working, let your extractor only extract the "new" adverb features.

There are 2 easy ways to achieve this:

1. Comment out any extractor code that produces features that you're not currently interested in. Or
2. Introduce a boolean variable, which you only set to `True` when you have extracted the feature that you are interested in. Then always output an empty list if the variable is `False`, otherwise output the full opinion list.
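
The second approach might look like the following sketch, which only reports opinions when an adverb feature was actually produced. `Tok` is a hypothetical stand-in for a parsed token, `extract_adverbial_only` is an illustrative name, and the trees are hand-built under assumed attachments.

```python
class Tok:
    """Hypothetical stand-in for a parsed token (only the attributes used here)."""
    def __init__(self, orth_, pos_, dep_, children=()):
        self.orth_ = orth_
        self.pos_ = pos_
        self.dep_ = dep_
        self.children = list(children)

def extract_adverbial_only(noun):
    """Return the opinions for `noun`, but only if an adverb feature was built."""
    opinions = []
    found_new_feature = False  # set to True only when the new code fires
    for child in noun.children:
        if child.dep_ == "amod":
            feature = child.orth_
            for grandchild in child.children:
                if grandchild.dep_ == "advmod":
                    feature = grandchild.orth_ + "-" + feature
                    found_new_feature = True
            opinions.append(feature)
    return opinions if found_new_feature else []

# "an excessively dull plot": the adverb modifies the adjective.
with_adverb = Tok("plot", "NOUN", "dobj",
                  [Tok("dull", "ADJ", "amod", [Tok("excessively", "ADV", "advmod")])])
# "an exciting fresh plot": no adverb, so nothing should be reported.
without_adverb = Tok("plot", "NOUN", "dobj",
                     [Tok("exciting", "ADJ", "amod"), Tok("fresh", "ADJ", "amod")])

print(extract_adverbial_only(with_adverb))
print(extract_adverbial_only(without_adverb))
```

Once you are satisfied with the new features, remove the flag (or always return `opinions`) to restore the full output.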