# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_lg')

In [2]:
# Choose the words you wish to compare, and obtain their vectors
water = nlp.vocab['water'].vector
salt = nlp.vocab['salt'].vector
fish = nlp.vocab['fish'].vector

In [3]:
# Import spatial and define a cosine_similarity function
from scipy import spatial
cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)
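
To make the `cosine_similarity` lambda above less of a black box, here is a minimal pure-Python sketch of the quantity it returns: the dot product of two vectors divided by the product of their lengths. The name `cosine_similarity_py` and the toy 2-D vectors are illustrative assumptions, not part of the original solution.

```python
import math

def cosine_similarity_py(x, y):
    """Cosine similarity: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy 2-D vectors (hypothetical, for illustration only)
print(cosine_similarity_py([1.0, 0.0], [1.0, 0.0]))   # same direction     -> 1.0
print(cosine_similarity_py([1.0, 0.0], [0.0, 1.0]))   # orthogonal         -> 0.0
print(cosine_similarity_py([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> -1.0
```

Because the score depends only on direction, scaling either vector leaves it unchanged, which is why it is a common choice for comparing word vectors.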

In [4]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
new_vector = water + salt + fish

In [5]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
computed_similarities = []
for word in nlp.vocab:
    # Keep only lowercase, alphabetic words that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

# Sort from most to least similar
computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['water', 'fish', 'salt', 'somethin', 'there', 'where', 'it', 'cause', 'that', 'and']
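
The list above is dominated by function words such as 'there' and 'that'. One likely cause, depending on the installed spaCy version, is that iterating over `nlp.vocab` only visits lexemes that are already cached rather than every entry in the vector table. The sketch below queries the vector table directly through `Vectors.most_similar`. The helper name `most_similar_words` is hypothetical, and the sketch assumes a pipeline with static vectors (such as `en_core_web_lg`) and an installed NumPy.

```python
def most_similar_words(nlp, query_vector, n=10):
    """Return (text, score) pairs for the n vectors closest to query_vector.

    Assumes `nlp` is a loaded spaCy pipeline with a static vector table
    and that NumPy is installed.
    """
    import numpy as np  # imported lazily so the helper can be defined without NumPy

    # most_similar expects a 2-D array of queries: one row per query vector.
    queries = np.asarray([query_vector])
    keys, _rows, scores = nlp.vocab.vectors.most_similar(queries, n=n)

    # keys[0] holds the hash IDs for the single query; map them back to text.
    return [(nlp.vocab.strings[int(key)], float(score))
            for key, score in zip(keys[0], scores[0])]
```

Calling `most_similar_words(nlp, new_vector)` should then rank against the whole vector table. Unlike the loop above, this sketch does not filter on `is_lower` or `is_alpha`, so mixed-case or non-alphabetic entries may appear in the result.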


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [6]:
def vector_math(a, b, c):
    """Return the ten vocabulary words closest to vector(a) - vector(b) + vector(c)."""
    a_vector = nlp.vocab[a].vector
    b_vector = nlp.vocab[b].vector
    c_vector = nlp.vocab[c].vector

    new_vector = a_vector - b_vector + c_vector

    computed_similarities = []
    for word in nlp.vocab:
        # Keep only lowercase, alphabetic words that have a vector
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(new_vector, word.vector)
            computed_similarities.append((word, similarity))

    # Sort from most to least similar
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

    return [w[0].text for w in computed_similarities[:10]]

In [7]:
# Test the function on known words:
vector_math('king','man','woman')

['king',
 'and',
 'that',
 'where',
 'she',
 'they',
 'woman',
 'there',
 'should',
 'these']
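
In the result above, the input word 'king' ranks first, which is common with a-b+c queries because the result vector usually stays close to its inputs. The following sketch shows the same arithmetic on a tiny hand-made vocabulary and excludes the three input words from the ranking. The 2-D vectors, `toy_vectors`, and `toy_vector_math` are all hypothetical and exist only for illustration; they are not real spaCy vectors.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical 2-D embeddings with axes (royalty, gender)
toy_vectors = {
    'king':   (1.0, 1.0),
    'queen':  (1.0, -1.0),
    'man':    (0.0, 1.0),
    'woman':  (0.0, -1.0),
    'prince': (1.0, 0.5),
    'apple':  (-1.0, 0.0),
}

def toy_vector_math(a, b, c, vectors, n=3):
    """Rank words by similarity to vectors[a] - vectors[b] + vectors[c]."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    # Skip the query words themselves so they cannot crowd out the answer
    candidates = [(word, cosine(target, vec))
                  for word, vec in vectors.items() if word not in (a, b, c)]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in candidates[:n]]

print(toy_vector_math('king', 'man', 'woman', toy_vectors))
# ['queen', 'prince', 'apple']
```

The same exclusion could be added to `vector_math` by skipping any `word.text` equal to `a`, `b`, or `c` inside the loop.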

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [8]:
# Import SentimentIntensityAnalyzer and create an sid object
# (if the VADER lexicon is missing, run nltk.download('vader_lexicon') once first)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [9]:
# Write a review as one continuous string (multiple sentences are ok)
# https://www.tripadvisor.com.ph/ShowUserReviews-g659555-d1745258-r944724419-Evolution_Diving-Malapascua_Island_Cebu_Island_Visayas.html
review_excellent = 'Always a very pleasant time at Evolution! Everything is perfect, the team is top, beach is magnificent, food is quite good!'
# https://www.tripadvisor.com.ph/ShowUserReviews-g659555-d1745258-r746729136-Evolution_Diving-Malapascua_Island_Cebu_Island_Visayas.html
review_average = """I dove with Evolution on 3 different days during my stay. It's an efficiently run dive centre with quite a few DMs and lots of guys who help with the equipment and boats. All the equipment gets assembled and taken to the big boat, which is parked further out due to the low tide. Divers are then taken there with a smaller boat.
I had 2 different dive dive guides which were nice but were doing a job. I didn't particularly feel looked after. The dives were very good and they pointed things out. After the dives everyone went on about their own thing. No one sat together, filled out log books or talked about the dives.
They charge a 200 peso marine fee for Monad Shoal where the Thresher sharks are and also for other dive sites in that same area. On their big board it says 50 Pesos of this fee are voluntary, which isn't correct. The normal marine fee per day is 150 Pesos."""


In [10]:
# Obtain the sid scores for your review
print('Excellent Review:', sid.polarity_scores(review_excellent))
print('Average Review:', sid.polarity_scores(review_average))

Excellent Review: {'neg': 0.0, 'neu': 0.472, 'pos': 0.528, 'compound': 0.9499}
Average Review: {'neg': 0.026, 'neu': 0.914, 'pos': 0.059, 'compound': 0.6746}


### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [11]:
def review_rating(review):
    # Label the review by whichever proportion (neg, neu, pos) is largest;
    # the compound score is deliberately left out of the comparison.
    scores = sid.polarity_scores(review)
    labels = {'neg': 'Negative', 'neu': 'Neutral', 'pos': 'Positive'}
    rating = max(labels, key=scores.get)
    return labels[rating]

In [12]:
# Test the function on your review above:
review_rating(review_excellent), review_rating(review_average)

('Positive', 'Neutral')
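
`review_rating` picks the largest of the `neg`, `neu`, and `pos` proportions, so longer reviews with many unscored words tend to come out as 'Neutral'. A common alternative is to threshold the `compound` score instead. The sketch below uses the ±0.05 cutoffs suggested in the VADER documentation. The function name `rating_from_compound` is hypothetical, and it works on a scores dictionary so it can be tried without NLTK installed.

```python
def rating_from_compound(scores, threshold=0.05):
    """Map a VADER polarity_scores dict to a label using the compound score."""
    compound = scores['compound']
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# Scores copied from the output of the two reviews above
excellent_scores = {'neg': 0.0, 'neu': 0.472, 'pos': 0.528, 'compound': 0.9499}
average_scores = {'neg': 0.026, 'neu': 0.914, 'pos': 0.059, 'compound': 0.6746}

print(rating_from_compound(excellent_scores), rating_from_compound(average_scores))
# Positive Positive
```

Under this rule the second review is labeled 'Positive' rather than 'Neutral', since its compound score of 0.6746 is well above the cutoff. Which behavior is preferable depends on how the labels will be used.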