# Sentiment Analysis

Sentiment analysis can be defined as the use of natural language processing to systematically identify, extract, quantify, and study affective states and subjective information. Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event.


Sentiment analysis and other complex NLP tasks often benefit from numerical representations of words, produced by algorithms like word2vec. The idea is to create numerical arrays, or word embeddings, for every word in a large corpus. Each word is assigned its own vector in such a way that words that frequently appear in the same contexts are given vectors that are close together. The result is a model that may not know that a "lion" is an animal, but does know that "lion" is closer in context to "cat" than to "paper".
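
To make "close together" concrete, here is a minimal sketch of cosine similarity, the measure used later in this notebook. The three-dimensional vectors are invented for illustration and do not come from any trained model; real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical toy embeddings, made up for this example only.
toy_vectors = {
    "lion":  [0.9, 0.8, 0.1],
    "cat":   [0.8, 0.9, 0.2],
    "paper": [0.1, 0.2, 0.9],
}

lion_cat = cosine_similarity(toy_vectors["lion"], toy_vectors["cat"])
lion_paper = cosine_similarity(toy_vectors["lion"], toy_vectors["paper"])

print(f"lion vs cat:   {lion_cat:.3f}")
print(f"lion vs paper: {lion_paper:.3f}")
```

With these made-up values, "lion" scores higher against "cat" than against "paper", which is the behavior a trained embedding model is expected to show.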

___
## Installing Larger spaCy Models
> [**en_core_web_sm**](https://spacy.io/models/en#en_core_web_sm) (35MB) Vectors: 0 keys, 0 unique vectors (0 dimensions). Provides vocabulary, syntax and entities, but not vectors.
> <br>or<br>
> [**en_core_web_md**](https://spacy.io/models/en#en_core_web_md) (116MB) Vectors: 685k keys, 20k unique vectors (300 dimensions)
> <br>or<br>
> [**en_core_web_lg**](https://spacy.io/models/en#en_core_web_lg) (812MB) Vectors: 685k keys, 685k unique vectors (300 dimensions)

### From the command line (you may need to run this as admin or use sudo):

> `conda activate spacyenv`&emsp;*if using a conda environment*   
> 
> `python -m spacy download en_core_web_md`  
> `python -m spacy download en_core_web_lg`&emsp;&emsp;&ensp;*optional model*  
> `python -m spacy download en_vectors_web_lg`&emsp;*optional model (spaCy 2.x only)*  

## Vector arithmetic

In [22]:
import spacy
nlp = spacy.load('en_core_web_lg')

In [23]:
print(nlp.vocab['dog'].vector)

[-4.0176e-01  3.7057e-01  2.1281e-02 -3.4125e-01  4.9538e-02  2.9440e-01
 -1.7376e-01 -2.7982e-01  6.7622e-02  2.1693e+00 -6.2691e-01  2.9106e-01
 -6.7270e-01  2.3319e-01 -3.4264e-01  1.8311e-01  5.0226e-01  1.0689e+00
  1.4698e-01 -4.5230e-01 -4.1827e-01 -1.5967e-01  2.6748e-01 -4.8867e-01
  3.6462e-01 -4.3403e-02 -2.4474e-01 -4.1752e-01  8.9088e-02 -2.5552e-01
 -5.5695e-01  1.2243e-01 -8.3526e-02  5.5095e-01  3.6410e-01  1.5361e-01
  5.5738e-01 -9.0702e-01 -4.9098e-02  3.8580e-01  3.8000e-01  1.4425e-01
 -2.7221e-01 -3.7016e-01 -1.2904e-01 -1.5085e-01 -3.8076e-01  4.9583e-02
  1.2755e-01 -8.2788e-02  1.4339e-01  3.2537e-01  2.7226e-01  4.3632e-01
 -3.1769e-01  7.9405e-01  2.6529e-01  1.0135e-01 -3.3279e-01  4.3117e-01
  1.6687e-01  1.0729e-01  8.9418e-02  2.8635e-01  4.0117e-01 -3.9222e-01
  4.5217e-01  1.3521e-01 -2.8878e-01 -2.2819e-02 -3.4975e-01 -2.2996e-01
  2.0224e-01 -2.1177e-01  2.7184e-01  9.1703e-02 -2.0610e-01 -6.5758e-01
  1.8949e-01 -2.6756e-01  9.2639e-02  4.3316e-01 -4

In [45]:
a = nlp.vocab['france'].vector
b = nlp.vocab['paris'].vector
c = nlp.vocab['japan'].vector

In [50]:
from scipy import spatial

def cosine_similarity(x, y):
    # scipy returns cosine *distance*, so subtract from 1 to get similarity
    return 1 - spatial.distance.cosine(x, y)

# france - paris + japan
new_vec = a - b + c

# Score every lowercase, alphabetic vocabulary entry that has a vector
computed_similarities = []
for word in nlp.vocab:
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vec, word.vector)
        computed_similarities.append((word, similarity))

# Most similar first
computed_similarities.sort(key=lambda item: item[1], reverse=True)

In [51]:
print([w[0].text for w in computed_similarities[:10]])

['japan', 'france', 'tokyo', 'i', 'co', 'paris', 'coz', 'moon', 'u', 'how']
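
The ranking above includes the input words themselves ('japan', 'france', 'paris'), which tend to sit close to any combination of their own vectors. A common refinement is to exclude the query words before ranking. The sketch below illustrates this with invented four-dimensional vectors (not spaCy's) and a hypothetical helper named `most_similar`. Note that it uses the ordering capital - country + country (paris - france + japan), which is the usual form of this analogy and differs from the `a - b + c` cell above.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def most_similar(target, vectors, exclude=(), top_n=3):
    """Rank words by similarity to `target`, skipping words in `exclude`.

    Hypothetical helper written for this example; not part of spaCy.
    """
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]

# Invented toy embeddings; the dimensions loosely stand for
# (is a country, is a capital, French, Japanese).
toy_vectors = {
    "france": [1.0, 0.0, 1.0, 0.0],
    "paris":  [0.0, 1.0, 1.0, 0.0],
    "japan":  [1.0, 0.0, 0.0, 1.0],
    "tokyo":  [0.0, 1.0, 0.0, 1.0],
    "paper":  [0.2, 0.1, 0.0, 0.0],
}

# paris - france + japan, computed element by element
target = [
    p - f + j
    for p, f, j in zip(toy_vectors["paris"], toy_vectors["france"], toy_vectors["japan"])
]

query_words = {"paris", "france", "japan"}
ranked = most_similar(target, toy_vectors, exclude=query_words)
print([word for word, _ in ranked])
```

With these hand-built vectors "tokyo" ranks first once the query words are filtered out. Real embeddings are noisier, so the same filter applied to `nlp.vocab` may still surface unrelated tokens.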


## VADER - Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment model available in NLTK. It scores text based on the words used ("completely" boosts a score, while "slightly" reduces it), on capitalization and punctuation ("GREAT!!!" is stronger than "great."), and on negations (words like "isn't" and "doesn't" affect the outcome).
<br>To view the source code, visit https://www.nltk.org/_modules/nltk/sentiment/vader.html

In [52]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/arthur/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [53]:
from nltk.sentiment import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [60]:
review = 'The last spider man movie was amazing!! It was one of the best hero movies of all time.'

In [61]:
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.552, 'pos': 0.448, 'compound': 0.9214}
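
In the dictionary above, `neg`, `neu` and `pos` are the proportions of the text that fall into each category, and `compound` is a single normalized score between -1 (most negative) and +1 (most positive). A minimal sketch of turning the compound score into a label follows. The function name `label_from_compound` is hypothetical, and the 0.05 cutoff is the conventional threshold suggested by VADER's authors, which you may want to tune for your data.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER score dictionary to 'positive', 'negative' or 'neutral'.

    Hypothetical helper; `threshold` is an assumed default, not an NLTK setting.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the review output above
scores = {'neg': 0.0, 'neu': 0.552, 'pos': 0.448, 'compound': 0.9214}
label = label_from_compound(scores)
print(label)
```

In practice the dictionary would come straight from `sid.polarity_scores(review)` rather than being typed in by hand.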