# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy

nlp = spacy.load('en_core_web_lg')


In [2]:
# Choose the words you wish to compare, and obtain their vectors
# We will compare the words 'house', 'building', and 'structure'

tokens = nlp('house building structure')

for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))


house house 1.0
house building 0.6041158
house structure 0.30534264
building house 0.6041158
building building 1.0
building structure 0.59896547
structure house 0.30534264
structure building 0.59896547
structure structure 1.0


In [3]:
# Import spatial and define a cosine_similarity function

from scipy import spatial

def cosine_similarity(x, y):
    # SciPy returns the cosine distance, so subtract it from 1 to get the similarity
    return 1 - spatial.distance.cosine(x, y)
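
For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. A minimal dependency-free sketch of that formula follows; the name `cosine_similarity_plain` and the sample vectors are illustrative and not part of the assessment.

```python
import math

def cosine_similarity_plain(x, y):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Illustrative vectors (hypothetical values, not spaCy output)
print(cosine_similarity_plain([1.0, 0.0], [1.0, 0.0]))  # same direction → 1.0
print(cosine_similarity_plain([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```

Values close to 1 mean the vectors point in nearly the same direction, which is how the similarity scores in the table above should be read.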

In [4]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3

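def vec_arith(w1, w2, w3):
    word1 = nlp.vocab[w1].vector
    word2 = nlp.vocab[w2].vector
    word3 = nlp.vocab[w3].vector

    # Note: this solution adds all three vectors, and the outputs below reflect that.
    # For an analogy such as king - man + woman, use word1 - word2 + word3 instead.
    new_vector = word1 + word2 + word3

    computed_similarities = []

    for word in nlp.vocab:
        # Only compare against lowercase alphabetic words that have a vector
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(new_vector, word.vector)
            computed_similarities.append((word, similarity))

    return computed_similarities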


In [5]:
# List the top ten closest vectors in the vocabulary to the result of the expression above

computed_similarities = vec_arith('protein', 'run', 'exercise')

computed_similarities = sorted(computed_similarities, key = lambda item: -item[1])



In [6]:
print([w[0].text for w in computed_similarities][:10])

['protein', 'exercise', 'diet', 'workout', 'muscle', 'carbohydrate', 'proteins', 'metabolism', 'healthy', 'routine']
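
Sorting the whole list works, but only the ten best matches are needed. As an alternative sketch, `heapq.nlargest` from the standard library picks the top entries without a full sort. The `(word, score)` pairs below are made-up stand-ins for the output of `vec_arith`, and only three entries are kept to keep the example short.

```python
import heapq

# Hypothetical (word, similarity) pairs standing in for vec_arith output
scored = [('diet', 0.71), ('car', 0.12), ('workout', 0.69), ('muscle', 0.66), ('tree', 0.08)]

# Keep only the three highest-scoring pairs, ordered from best to worst
top3 = heapq.nlargest(3, scored, key=lambda item: item[1])
print([word for word, _ in top3])  # → ['diet', 'workout', 'muscle']
```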


In [7]:
computed_similarities = vec_arith('king','man','woman')

computed_similarities = sorted(computed_similarities, key = lambda item: -item[1])

In [8]:
print([w[0].text for w in computed_similarities][:10])

['man', 'woman', 'lady', 'king', 'boy', 'girl', 'father', 'men', 'he', 'guy']
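
The classic analogy form subtracts the second vector (`king - man + woman`) and usually skips the input words when ranking, since they tend to dominate the top of the list, as seen above. The sketch below illustrates both ideas on hand-made 2-D vectors. The vocabulary, its values, and the helper names `cosine` and `analogy` are invented for illustration and say nothing about what `en_core_web_lg` would return.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hand-made 2-D vectors (hypothetical values chosen for illustration only)
toy_vocab = {
    'king':  [1.0, 1.0],
    'queen': [1.0, -1.0],
    'man':   [0.1, 1.0],
    'woman': [0.1, -1.0],
    'apple': [-1.0, 0.1],
}

def analogy(w1, w2, w3, vocab):
    """Rank vocab words by similarity to w1 - w2 + w3, skipping the inputs."""
    target = [a - b + c for a, b, c in zip(vocab[w1], vocab[w2], vocab[w3])]
    scores = [(word, cosine(target, vec))
              for word, vec in vocab.items() if word not in (w1, w2, w3)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

print(analogy('king', 'man', 'woman', toy_vocab)[0][0])  # → queen
```

With the real model, the same two changes would go into `vec_arith`: a minus sign for the second word and a check that skips `w1`, `w2`, and `w3`.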


## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [9]:
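# Import SentimentIntensityAnalyzer and create an sid object

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# The VADER lexicon must be available locally; if it is missing, run this once:
# nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()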



In [10]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'The movie had predictable ending. The comedians tried their best to make viewers laugh, but they could only do so to some extent. The main actors were given the roles they were born to play.'

# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.764, 'pos': 0.236, 'compound': 0.8225}

In [11]:
# Write a second, more enthusiastic review as one continuous string to compare the scores
review = 'Must see movie of the year! Cant beat anything that is out there.The movie had predictable ending. The comedians tried their best to make viewers laugh, did it well. The main actors were given the roles they were born to play.'

# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.709, 'pos': 0.291, 'compound': 0.9299}

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [12]:
def review_rating(string):
    sid = SentimentIntensityAnalyzer()

    # The compound score summarizes the overall sentiment in the range [-1, 1]
    compound = sid.polarity_scores(string)['compound']

    if compound > 0:
        return 'Positive Review'
    elif compound < 0:
        return 'Negative Review'
    else:
        return 'Neutral Review'
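
The function above treats any nonzero compound score as positive or negative. The VADER authors suggest a small dead zone instead: a compound score of at least 0.05 counts as positive, at most -0.05 as negative, and anything in between as neutral. Here is a sketch of that rule, applied to a compound score that has already been computed. The function name `label_from_compound` and the `threshold` parameter are illustrative names, not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return 'Positive Review'
    if compound <= -threshold:
        return 'Negative Review'
    return 'Neutral Review'

# The first two scores come from the reviews scored earlier in this notebook;
# 0.01 is a made-up value that falls inside the neutral band
print(label_from_compound(0.8225))  # → Positive Review
print(label_from_compound(0.9299))  # → Positive Review
print(label_from_compound(0.01))    # → Neutral Review
```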

## Applying the sentiment analysis to Parasite (2019) movie reviews

In [13]:
review = '''A family of incompetents who struggle to fold a pizza carton miraculously transform to ingenious con artists and infiltrate themselves into a rich home where they successfully pretend to deliver services they have no clue of. The inhabitants of the house, on the other hand, are so incredibly stupid that they don't notice anything is wrong except for some wierd smell. But the impostors refuse to be judged by their smell! A bloodbath follows. The end. Standing ovation at Cannes, Palme d'Or, 8,6-star rating on IMDB... It's not just a bad story, no, it's actually the perfect story based on the wrong belief that if you are poor, you cannot be expected to be good, and if you are rich, you are never good enough. Naturally, stereotypical characters follow and I couldn't make myself care about any of them. The only moments when the poor father showed some deepness were during his monologue about living with no plans and when he checked his smell after he heard the rich father complaining about it. Otherwise, all family members were interested only in filling their bellies with junk and had no reservations in causing any kind of harm to that end. Now, I know I'm supposed to dislike the rich family, but although I didn't like them either, I don't see what's so wrong about them. But the urge in society to favor the have-nots, no matter what their actual traits might be, is so strong that most of the reviewers take their side, while some even find them "lovable". And that I find even more disgusting than actually being disgusting.'''


# Test the function on your review above:
review_rating(review)

'Negative Review'

In [14]:
review = '''It makes no sense of the situations. Some scenes were good but the movie is too long for the story. Also, the plot is very predecible in a lot of things. I don't recommend this movie.'''

review_rating(review)

'Negative Review'

In [15]:
review = '''A story there was. Thrilling scenes there were. Some pretty cinematography yes. Solid character build-up, hell no. Logical plot, no. Entertainment, not much. Consistency of characters, missing. Petty resentful ignorance towards the rich, yes. Narrow and shallow depiction of the poor, very much.

So not convincing. Every stretch of the movie i thought I started understanding the characters a bit more, but no, next stretch would make me scratch that understanding and try to rebuild again.. In the end, it made me fail to attach with any of the characters.

Without enough entertainment, what were the film makers trying to tell the audience? I completely don't get it.

Really doesn't match the hype. Wasted two hours of my life.'''

review_rating(review)

'Negative Review'

In [16]:
review = '''I can't remember the last time I saw a movie that contained as many genres as 'Parasite'. The movie starts out almost like an 'Ocean's Eleven' heist film and then expands into a comedy, mystery, thriller, drama, romance, crime and even horror film. It really did have everything and it was strikingly good at all of them too.

I love a film that respects its audience. There are so many details in this movie that are crucially important and yet the film trusts its audience to notice them and acknowledge them without ramming them down our throats. There are a lot of layers to this film and I suspect for this reason its rewatch-ability factor will be very high.

The film was incredibly entertaining too. I can't think of a boring scene in this movie and yet on the surface for large parts of the film you would say not a lot is happening, at least in terms of action. Fascinating characters and brilliant dialogue are what create this. I had a great time with 'Parasite' and I think most that give it a chance will too.'''

review_rating(review)

'Positive Review'