# Sentiment Analysis
<hr style="border:2px solid black">

<img src="https://blog.breos.com/hubfs/undefined-1.jpeg" align="center" width=600/>

## 1. Warmup

***What is Sentiment Analysis?***
>- a natural language processing (NLP) technique 
>- identifies emotional tone or *valence* (positive/negative/neutral) of text data

***Why Sentiment Analysis?***
>- social media filtering: racism, sexism, hate speech, violence
>- monitoring & censorship
>- market research
>- fake news detection
>- product reviews

***What are the challenges?***
>- sarcasm, irony, paradox, humor
>- multipolarity
>- typos, spelling/grammatical mistakes
>- slang words, dialects
>- emojis

**Approaches to Sentiment Analysis**
>- Classical Supervised Learning: Naive Bayes, Bag of Words
>- Rule-Based Systems: VADER
>- Deep Learning: LSTM

<hr style="border:2px solid black">

## 2. [VADER](https://github.com/cjhutto/vaderSentiment)

**Valence Aware Dictionary and sEntiment Reasoner** 
>- lexicon- and rule-based approach to sentiment analysis of social-media texts
>- takes into account polarity (positive/negative) & intensity

In [None]:
#!conda install -c conda-forge vaderSentiment -y
# or
#!pip install vaderSentiment

In [1]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

The `polarity_scores` method of `SentimentIntensityAnalyzer` takes in a string and returns a dictionary of scores in each of four categories:
- negative
- neutral
- positive
- compound 

In [2]:
analyzer = SentimentIntensityAnalyzer()

**Warmup Exercise**

Let's choose a positive word and analyze it:

In [3]:
analyzer.polarity_scores('good')

{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4404}

>- the `neg`, `neu` and `pos` scores are the proportions of the text that fall into each category (they add up to roughly 1)
>- the compound score is calculated from the valence scores as:
$$
\text{compound score} = \frac{\text{sum of valence scores}}{\sqrt{(\text{sum of valence scores})^2+15}}
$$
>- positive sentiment: compound score >= 0.05 (extreme positive: +1)
>- neutral sentiment: -0.05 < compound score < 0.05
>- negative sentiment: compound score <= -0.05 (extreme negative: -1)
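
The thresholds above can be turned into a small helper. This is a minimal sketch; the function name `classify_compound` is hypothetical and not part of the `vaderSentiment` package.

```python
def classify_compound(compound: float) -> str:
    """Map a compound score to a sentiment label using the thresholds listed above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Compound scores taken from the examples in this notebook, plus a neutral value
for score in (0.4404, -0.5574, 0.0):
    print(score, classify_compound(score))
```

In practice the argument would be `analyzer.polarity_scores(text)['compound']`.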

In [4]:
import math

def compound_score(*valence_scores):
    """Return the compound score for the given component valence scores."""
    total_valence = sum(valence_scores)
    # Normalize the summed valence into the range (-1, 1); 15 is the constant from the formula above
    cs = total_valence / math.sqrt(total_valence**2 + 15)
    return round(cs, 4)

In [6]:
compound_score(1.9)

0.4404

In [7]:
analyzer.polarity_scores('shit')

{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5574}

In [12]:
analyzer.polarity_scores('good shit')

{'neg': 0.554, 'neu': 0.0, 'pos': 0.446, 'compound': -0.1779}

In [13]:
compound_score(1.9,-2.6)

-0.1779
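
The formula can also be read in reverse: solving it for the sum of valence scores gives $x = c\,\sqrt{15/(1-c^2)}$ for a compound score $c$. The sketch below (the helper name `total_valence_from_compound` is hypothetical) recovers the summed valence from a reported compound score, which is handy for checking what the rules in the following sections add or subtract.

```python
import math

ALPHA = 15  # normalization constant from the compound-score formula


def total_valence_from_compound(compound: float) -> float:
    """Invert compound = x / sqrt(x**2 + ALPHA) to recover the summed valence x."""
    if not -1 < compound < 1:
        raise ValueError("compound must lie strictly between -1 and 1")
    return compound * math.sqrt(ALPHA / (1 - compound**2))


print(round(total_valence_from_compound(0.4404), 2))   # 'good'      -> 1.9
print(round(total_valence_from_compound(-0.1779), 2))  # 'good shit' -> -0.7
```

Because the reported compound scores are rounded to four decimals, the recovered sums are only approximate.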

<br>

### 2.0 Sentence Example

In [14]:
text0 = "The party was great."
analyzer.polarity_scores(text0)

{'neg': 0.0, 'neu': 0.227, 'pos': 0.773, 'compound': 0.7783}

<br>

### 2.1 Negation

In [19]:
text1 = "The party was not so great."
analyzer.polarity_scores(text1)

{'neg': 0.382, 'neu': 0.369, 'pos': 0.249, 'compound': -0.3482}

<br>

### 2.2 Punctuation Marks

In [23]:
text2 = "The party was great!!"
analyzer.polarity_scores(text2)

{'neg': 0.0, 'neu': 0.213, 'pos': 0.787, 'compound': 0.8118}
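
The jump from 0.7783 to 0.8118 can be reproduced by hand. The sketch below assumes that each exclamation mark (up to four) moves the summed valence 0.292 further from zero; this constant is an assumption to be verified against `vaderSentiment.py`, and the helper names are hypothetical. The base sum of 4.8 is obtained by solving the compound formula for a compound score of 0.7783.

```python
import math

EXCLAMATION_INCREMENT = 0.292  # assumed boost per "!", verify in vaderSentiment.py
MAX_EXCLAMATIONS = 4           # assumed cap on the number of "!" that count


def compound_score(total_valence: float) -> float:
    """Normalize a summed valence into (-1, 1), as in the formula of section 2."""
    return round(total_valence / math.sqrt(total_valence**2 + 15), 4)


def add_exclamation_emphasis(total_valence: float, text: str) -> float:
    """Move the summed valence away from zero for each '!' in the text."""
    boost = min(text.count("!"), MAX_EXCLAMATIONS) * EXCLAMATION_INCREMENT
    if total_valence > 0:
        return total_valence + boost
    if total_valence < 0:
        return total_valence - boost
    return total_valence


base_total = 4.8  # approximate summed valence of "The party was great."
print(compound_score(add_exclamation_emphasis(base_total, "The party was great!!")))  # 0.8118
```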

<br>

### 2.3 Capitalization

In [22]:
text3 = "The party was GREAT!!"
analyzer.polarity_scores(text3)

{'neg': 0.0, 'neu': 0.198, 'pos': 0.802, 'compound': 0.8449}

<br>

### 2.4 Intensifiers (Boosters) and Dampeners

In [25]:
text4 = "The party was kinda GREAT!!"
analyzer.polarity_scores(text4)

{'neg': 0.0, 'neu': 0.277, 'pos': 0.723, 'compound': 0.8327}

<br>

### 2.5 Negative Adverbs

In [26]:
text5 = "The party was hardly great."
analyzer.polarity_scores(text5)

{'neg': 0.0, 'neu': 0.316, 'pos': 0.684, 'compound': 0.7584}
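
Both `kinda` and `hardly` lower the score compared with the unmodified sentence, so they act as dampeners rather than boosters. A minimal sketch, assuming a dampener shifts the valence of the following word by 0.293 toward zero (a value to be verified in `vaderSentiment.py`; the names below are hypothetical). The base sums 4.8 and 6.117 are recovered from the compound scores 0.7783 and 0.8449 shown above.

```python
import math

DAMPENER_SCALAR = 0.293  # assumed size of the dampening step, verify in vaderSentiment.py


def compound_score(total_valence: float) -> float:
    """Normalize a summed valence into (-1, 1), as in the formula of section 2."""
    return round(total_valence / math.sqrt(total_valence**2 + 15), 4)


def dampen(valence: float) -> float:
    """Shift a valence toward zero by the assumed dampener step."""
    if valence > 0:
        return valence - DAMPENER_SCALAR
    if valence < 0:
        return valence + DAMPENER_SCALAR
    return valence


# The adjustment is additive, so applying it to the sentence sum
# is equivalent to applying it to the word "great" alone.
print(compound_score(dampen(4.8)))    # "The party was hardly great."  -> 0.7584
print(compound_score(dampen(6.117)))  # "The party was kinda GREAT!!"  -> 0.8327
```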

<br>

### 2.6 Conjunctions

In [27]:
text6 = "The party was great, but very small."
analyzer.polarity_scores(text6)

{'neg': 0.0, 'neu': 0.532, 'pos': 0.468, 'compound': 0.5267}
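
The drop to 0.5267 is consistent with a rule that down-weights everything before `but` and up-weights everything after it. The sketch below assumes weights of 0.5 and 1.5, and per-word valences of 1.7 for `party` and 3.1 for `great` with all other words neutral; all of these are assumptions to be checked against `vaderSentiment.py` and the lexicon file, and the function name is hypothetical.

```python
import math

BEFORE_BUT_WEIGHT = 0.5  # assumed weight for words preceding "but"
AFTER_BUT_WEIGHT = 1.5   # assumed weight for words following "but"


def compound_score(total_valence: float) -> float:
    """Normalize a summed valence into (-1, 1), as in the formula of section 2."""
    return round(total_valence / math.sqrt(total_valence**2 + 15), 4)


def reweight_around_but(words, valences):
    """Scale per-word valences depending on their position relative to the first 'but'."""
    lowered = [w.lower() for w in words]
    if "but" not in lowered:
        return list(valences)
    pivot = lowered.index("but")
    adjusted = []
    for i, valence in enumerate(valences):
        if i < pivot:
            adjusted.append(valence * BEFORE_BUT_WEIGHT)
        elif i > pivot:
            adjusted.append(valence * AFTER_BUT_WEIGHT)
        else:
            adjusted.append(valence)
    return adjusted


words = ["The", "party", "was", "great", "but", "very", "small"]
valences = [0.0, 1.7, 0.0, 3.1, 0.0, 0.0, 0.0]  # assumed lexicon values
print(compound_score(sum(reweight_around_but(words, valences))))  # 0.5267
```

The sentiment after `but` dominates under this rule, which is why a sentence with a positive first clause and a neutral second clause ends up less positive overall.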

<br>

### 2.7 Emoticons

In [42]:
text7 = "The party was great."
analyzer.polarity_scores(text7)

{'neg': 0.0, 'neu': 0.227, 'pos': 0.773, 'compound': 0.7783}

<br>

### 2.8 Repetition

In [37]:
text8 = "The party was not really really GREAT!!"
analyzer.polarity_scores(text8)

{'neg': 0.386, 'neu': 0.399, 'pos': 0.215, 'compound': -0.4842}

<br>

### 2.9 Caps Lock On

In [41]:
text9 = "THE PARTY WAS GREAT!!"
analyzer.polarity_scores(text9)

{'neg': 0.0, 'neu': 0.213, 'pos': 0.787, 'compound': 0.8118}
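
With caps lock on, the score equals that of section 2.2 (0.8118), not that of section 2.3 (0.8449): capitalization only seems to count when some, but not all, words are in ALL CAPS. The sketch below illustrates this under stated assumptions: a caps increment of 0.733, an exclamation increment of 0.292 per mark (capped at four), and valences of 1.7 for `party` and 3.1 for `great`. None of these values are read from the library here; they should be checked against `vaderSentiment.py` and the lexicon file. The function `sketch_total_valence` is hypothetical and handles only positive words.

```python
import math

# Assumed values, to be verified against the VADER source and lexicon
ASSUMED_VALENCE = {"party": 1.7, "great": 3.1}
CAPS_INCREMENT = 0.733
EXCLAMATION_INCREMENT = 0.292


def compound_score(total_valence: float) -> float:
    """Normalize a summed valence into (-1, 1), as in the formula of section 2."""
    return round(total_valence / math.sqrt(total_valence**2 + 15), 4)


def sketch_total_valence(text: str) -> float:
    """Sum assumed word valences with caps and exclamation emphasis (positive words only)."""
    words = [w.strip(".,!?") for w in text.split()]
    n_caps = sum(w.isupper() for w in words)
    mixed_case = 0 < n_caps < len(words)  # some, but not all, words are ALL CAPS
    total = 0.0
    for word in words:
        valence = ASSUMED_VALENCE.get(word.lower(), 0.0)
        if valence > 0 and word.isupper() and mixed_case:
            valence += CAPS_INCREMENT
        total += valence
    # Exclamation emphasis, added as-is because the totals in these examples are positive
    total += min(text.count("!"), 4) * EXCLAMATION_INCREMENT
    return total


print(compound_score(sketch_total_valence("The party was GREAT!!")))  # 0.8449
print(compound_score(sketch_total_valence("THE PARTY WAS GREAT!!")))  # 0.8118
```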

<br>

### 2.10 Slang Words

In [43]:
text10 = "The party was fucking great."
analyzer.polarity_scores(text10)

{'neg': 0.0, 'neu': 0.297, 'pos': 0.703, 'compound': 0.796}

<br>

## 3. GitHub Exercise

**Take a look at the [VADER GitHub repo](https://github.com/cjhutto/vaderSentiment) and try to answer these questions:**

1. Which file contains the lexicon, and what do the values represent? (Hint: check out the README)



2. Does VADER take punctuation into account? (Try to find out what exactly happens in the code)


3. Which words intensify a sentiment? (Hint: check out the vaderSentiment.py file)


4. What happens if one word is in ALL CAPS? What if the whole text is in ALL CAPS? (Hint: check out the vaderSentiment.py file)

<hr style="border:2px solid black">

## References

- [VADER Sentiment Analysis Explained](https://medium.com/@piocalderon/vader-sentiment-analysis-explained-f1c4f9101cd9)