# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [2]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy

nlp = spacy.load('en_core_web_lg')

In [3]:
# Choose the words you wish to compare, and obtain their vectors
a = 'car'
b = 'boat'
c = 'airplane'

av = nlp.vocab[a].vector
bv = nlp.vocab[b].vector
cv = nlp.vocab[c].vector

In [4]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

def cosine_similarity(v1, v2):
    return 1 - spatial.distance.cosine(v1, v2)
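
`spatial.distance.cosine` returns the cosine *distance*, so subtracting it from 1 gives the cosine similarity. As a rough illustration of what that quantity is, here is a dependency-free sketch; the name `cosine_similarity_plain` is made up for this example and is not part of SciPy or spaCy.

```python
import math

def cosine_similarity_plain(v1, v2):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)

print(cosine_similarity_plain([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity_plain([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity_plain([1.0, 0.0], [-1.0, 0.0]))  # opposite: -1.0
```

The score depends only on the angle between the vectors, not on their lengths, which is why it is a common choice for comparing word vectors.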

In [5]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
water = nlp.vocab['water'].vector
air = nlp.vocab['air'].vector

nv = bv - water + air  # Expect a result close to 'airplane'

In [6]:
# List the top ten closest vectors in the vocabulary to the result of the expression above

# Compute the cosine similarity between the new vector and every lowercase, alphabetic word that has a vector
vocabulary = {word.text: cosine_similarity(word.vector, nv) for word in nlp.vocab
              if word.has_vector and word.is_lower and word.is_alpha}

# Sort the resulting vocabulary by the similarity in descending order
sorted_vocab = sorted(vocabulary.items(), key=lambda v: v[1], reverse=True)
sorted_vocab[:10]

[('boat', 0.6587398052215576),
 ('air', 0.6086523532867432),
 ('airplane', 0.5957858562469482),
 ('boats', 0.589165985584259),
 ('flight', 0.5855278372764587),
 ('aircraft', 0.5835665464401245),
 ('cruise', 0.5752798318862915),
 ('sail', 0.5558519959449768),
 ('plane', 0.5459532737731934),
 ('aboard', 0.5436309576034546)]

Naturally, the input word `boat` ranks first. The other results include `air`, `airplane`, `flight`, `aircraft` and `plane`, which are close to what the expression was meant to produce.
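
Since only the ten best matches are needed, sorting the whole vocabulary is optional. One possible alternative is `heapq.nlargest` from the standard library, sketched here on a small made-up dictionary of scores (`toy_scores` is hypothetical and only stands in for the real similarity dictionary).

```python
import heapq

# Hypothetical similarity scores, invented for illustration only.
toy_scores = {'boat': 0.66, 'air': 0.61, 'airplane': 0.60, 'car': 0.31, 'tree': 0.12}

# Keep the three highest-scoring (word, score) pairs, best first.
top_three = heapq.nlargest(3, toy_scores.items(), key=lambda item: item[1])
print(top_three)
```

The result has the same shape as a sorted-and-sliced list: `(word, score)` tuples in descending order of score.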

### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [7]:
def vector_math(a, b, c):
    # Look up the vector for each input word
    av = nlp.vocab[a].vector
    bv = nlp.vocab[b].vector
    cv = nlp.vocab[c].vector

    # Compute a - b + c and score every candidate word against it
    nv = av - bv + cv
    vocabulary = {word.text: cosine_similarity(word.vector, nv) for word in nlp.vocab
                  if word.has_vector and word.is_lower and word.is_alpha}
    # Return the ten most similar words
    sorted_vocab = sorted(vocabulary.items(), key=lambda v: v[1], reverse=True)
    return sorted_vocab[:10]

In [8]:
# Test the function on known words:
vector_math('king','man','woman')

[('king', 0.8024259805679321),
 ('queen', 0.7880843877792358),
 ('prince', 0.6401076912879944),
 ('kings', 0.6208544373512268),
 ('princess', 0.6125636100769043),
 ('royal', 0.5800970792770386),
 ('throne', 0.5787012577056885),
 ('queens', 0.5743793845176697),
 ('monarch', 0.563362181186676),
 ('kingdom', 0.5520980954170227)]
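
In both experiments an input word (`boat`, `king`) comes out on top. A common remedy is to leave the input words out of the ranking. The sketch below shows the idea on made-up 3-D toy vectors rather than spaCy vectors; `toy_vectors`, `cosine` and `vector_math_excluding` are hypothetical names introduced only for this illustration.

```python
import math

# Hypothetical toy vectors with dimensions (male, female, royal).
toy_vectors = {
    'king':   [1.0, 0.0, 1.0],
    'queen':  [0.0, 1.0, 1.0],
    'man':    [1.0, 0.0, 0.0],
    'woman':  [0.0, 1.0, 0.0],
    'prince': [1.0, 0.0, 0.8],
    'apple':  [0.1, 0.1, 0.0],
}

def cosine(v1, v2):
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / (math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2)))

def vector_math_excluding(a, b, c, vectors, top_n=10):
    # a - b + c, computed element by element
    nv = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    inputs = {a, b, c}
    # Score every word except the three inputs
    scores = {word: cosine(vec, nv) for word, vec in vectors.items() if word not in inputs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

result = vector_math_excluding('king', 'man', 'woman', toy_vectors)
print(result[0][0])  # queen
```

With real embeddings the same filter could be added to the dictionary comprehension in `vector_math`, at the cost of never returning an input word even when it is a legitimate answer.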

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [10]:
# Import SentimentIntensityAnalyzer and create an sid object
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [11]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'This course is very nice, though I would expect more details in the embedding parts. Though I do not know if I will encounter that in the ANN section :)'

In [12]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.804, 'pos': 0.196, 'compound': 0.7264}

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [14]:
def review_rating(string, neutral_thresholds=(-0.2, 0.2)):
    sid = SentimentIntensityAnalyzer()
    scores = sid.polarity_scores(string)
    compound = scores['compound']
    # A compound score of exactly zero is rare, so treat a band
    # around zero (by default -0.2 to 0.2) as neutral
    if compound < neutral_thresholds[0]:
        return 'Negative'
    elif compound < neutral_thresholds[1]:
        return 'Neutral'
    else:
        return 'Positive'

In [15]:
# Test the function on your review above:
review_rating(review)

'Positive'
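
The threshold logic can be checked without running VADER by classifying raw compound scores directly. In the sketch below, `label_compound` is a hypothetical helper (not part of NLTK) that treats scores from the lower threshold up to, but not including, the upper threshold as neutral.

```python
def label_compound(compound, neutral_thresholds=(-0.2, 0.2)):
    """Map a compound score in [-1, 1] to a sentiment label."""
    low, high = neutral_thresholds
    if compound < low:
        return 'Negative'
    if compound < high:
        return 'Neutral'
    return 'Positive'

# Probe both sides of each boundary, plus the score obtained for the review.
for score in (-0.5, -0.2, 0.0, 0.2, 0.7264):
    print(score, label_compound(score))
```

The ±0.2 band is this notebook's choice. The VADER documentation suggests ±0.05 as typical cutoffs, which could be tried by passing `neutral_thresholds=(-0.05, 0.05)`.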

## Great job!