# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_md')
from spacy import displacy

In [2]:
# Choose the words you wish to compare, and obtain their vectors
words = ['ecstatic', 'joyful', 'miserable']
word_vectors = {word: nlp.vocab[word].vector for word in words}

In [3]:
word_vectors

{'ecstatic': array([-0.67068 ,  0.27548 , -1.8172  , -0.5289  ,  2.5144  , -1.1588  ,
         2.3219  ,  0.53299 , -0.37279 , -0.81931 ,  1.4773  , -0.66501 ,
         0.23891 ,  1.772   ,  2.0204  , -0.70456 ,  1.0632  , -0.28494 ,
        -2.7905  , -1.0509  ,  1.3569  ,  0.99753 , -1.7222  , -1.4504  ,
        -1.9008  , -1.2981  , -1.4197  , -0.50681 , -0.38561 ,  2.5661  ,
         1.5151  ,  0.45034 , -0.21167 ,  1.3173  , -1.9054  , -2.9211  ,
         0.47291 ,  0.025939,  2.4379  ,  2.3975  , -1.3418  , -0.053997,
        -0.63691 ,  0.21353 , -2.0662  ,  2.7349  ,  1.7883  , -1.0111  ,
         0.10086 ,  2.331   , -0.89693 , -0.76109 , -1.2632  , -0.12785 ,
        -3.1367  ,  0.99889 ,  1.871   ,  0.57415 ,  0.57486 ,  0.054386,
         0.84079 ,  0.13604 , -1.8042  ,  0.34943 ,  1.674   ,  1.8322  ,
        -2.664   , -2.8388  ,  0.59243 ,  2.2413  , -1.0008  ,  0.12001 ,
        -1.8362  ,  0.54675 , -1.991   ,  1.5513  , -1.992   , -0.44796 ,
        -1.8171  ,  1.0354

In [4]:
from scipy import spatial

cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)
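
For reference, the value this helper returns is the dot product of the two vectors divided by the product of their lengths. The sketch below spells that formula out in plain Python; the name `cosine_sim` and the toy vectors are illustrative assumptions, not part of the assessment.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy vectors, made up for illustration
print(cosine_sim([1.0, 0.0], [1.0, 0.0]))   # same direction -> 1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine_sim([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> -1.0
```

Because the score depends only on direction, scaling either vector leaves it unchanged.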

In [5]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
ecstatic_vector = nlp.vocab['ecstatic'].vector
joyful_vector = nlp.vocab['joyful'].vector
miserable_vector = nlp.vocab['miserable'].vector

new_vector = ecstatic_vector - joyful_vector + miserable_vector

computed_similarities = []

In [6]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
for word in nlp.vocab:
    # Keep only lowercase alphabetic entries that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['miserable', 'cause', 'and', 'or', 'havin', 'were', 'that', 'ecstatic', 'where', 'these']


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [7]:
import spacy
from scipy.spatial.distance import cosine

# Load spaCy's medium-sized English model
nlp = spacy.load('en_core_web_md')

def vector_math(a, b, c):
    """Return the ten vocabulary entries closest to vector(a) - vector(b) + vector(c)."""
    result_vector = nlp.vocab[a].vector - nlp.vocab[b].vector + nlp.vocab[c].vector

    # Score every string in the vocabulary that has a vector
    all_words = [word for word in nlp.vocab.strings if nlp.vocab[word].has_vector]
    word_similarities = [(word, 1 - cosine(result_vector, nlp.vocab[word].vector)) for word in all_words]

    # Highest cosine similarity first
    word_similarities = sorted(word_similarities, key=lambda item: item[1], reverse=True)

    return word_similarities[:10]

In [8]:
# Test the function on known words:
vector_math('king','man','woman')

[('Ariarathes', 0.8489541040036218),
 ('king', 0.8489541040036218),
 ('kingi', 0.8489541040036218),
 ('kingii', 0.8489541040036218),
 ('kingly', 0.8489541040036218),
 ('kingmaker', 0.8489541040036218),
 ('overlord', 0.8489541040036218),
 ('Æthelstan', 0.8489541040036218),
 ('Arsaces', 0.7189059645361134),
 ('Ennals', 0.7189059645361134)]
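
The list above still contains the operand `'king'`, which is common for a-b+c queries because the result vector tends to stay close to its inputs. One possible refinement is to skip the three input words when ranking. The sketch below shows the idea on a hypothetical two-dimensional toy vocabulary; `analogy`, `cosine_sim`, and `toy_vectors` are made-up names, and the vectors are not spaCy data.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def analogy(vectors, a, b, c, top_n=3):
    """Rank words by similarity to vectors[a] - vectors[b] + vectors[c], skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    scored = [
        (word, cosine_sim(target, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)  # leave the operands out of the ranking
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]

# Hypothetical vectors: axis 0 stands for "royalty", axis 1 for "gender"
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.8, 0.9],
}
print(analogy(toy_vectors, "king", "man", "woman"))
```

With these toy values, `queen` ranks first. The same `word not in (a, b, c)` filter could be added to the list comprehension in `vector_math`.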

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [10]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...


True

In [11]:
# Import SentimentIntensityAnalyzer and create an sid object
from nltk.sentiment import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

In [12]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'I really love Apex Legends. The gameplay is exciting, the gunplay and differing legend compositions and abilities makes it better than most battle royales in the current market.'

In [13]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.077, 'neu': 0.587, 'pos': 0.336, 'compound': 0.8658}
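
To help read this output: `neg`, `neu` and `pos` are proportions that add up to roughly 1, while `compound` is a single score squashed into the range -1 to 1. The sketch below checks the first point against the scores above and illustrates the squashing step. The `normalize` function and its `alpha=15` default reflect my understanding of the VADER implementation and should be treated as an assumption to verify against the `nltk.sentiment.vader` source.

```python
import math

def normalize(raw_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

# Scores copied from the output above
scores = {'neg': 0.077, 'neu': 0.587, 'pos': 0.336, 'compound': 0.8658}
proportions = scores['neg'] + scores['neu'] + scores['pos']
print(round(proportions, 3))  # 1.0

# Larger raw sums move toward +/-1 but never reach it (raw values are made up)
for raw in (-4.0, 0.0, 4.0, 40.0):
    print(raw, round(normalize(raw), 4))
```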

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"


In [14]:
def review_rating(string):
    sid = SentimentIntensityAnalyzer()
    # Score the text passed in as the argument, not the global `review`
    scores = sid.polarity_scores(string)

    compound_score = scores['compound']
    if compound_score >= 0.05:
        return "Positive"
    elif compound_score <= -0.05:
        return "Negative"
    else:
        return "Neutral"

In [15]:
# Test the function on your review above:
review_rating(review)

'Positive'
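
The thresholding step can also be checked on its own, without running VADER, by feeding it compound scores directly. This is a minimal sketch that reuses the ±0.05 cut-offs from the function above; `label_from_compound` is a hypothetical helper name, and apart from 0.8658 the test values are made up.

```python
def label_from_compound(compound_score):
    """Map a compound score to a label using the +/-0.05 cut-offs."""
    if compound_score >= 0.05:
        return "Positive"
    elif compound_score <= -0.05:
        return "Negative"
    return "Neutral"

# 0.8658 is the compound score of the review above; the rest probe the boundaries
for value in (0.8658, 0.05, 0.049, 0.0, -0.05):
    print(value, label_from_compound(value))
```

Both boundary values are inclusive, so only scores strictly between -0.05 and 0.05 come back as "Neutral".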