```
#############################################
##                                         ##
##  Natural Language Processing in Python  ##
##                                         ##
#############################################

§2 Sentiment Analysis in Python

§2.1 Sentiment analysis nuts and bolts
```

# Introduction to sentiment analysis

## What is sentiment analysis?

* Sentiment analysis is the process of understanding the opinion of an author about a subject.


## What goes into a sentiment analysis system?

* First element: opinion/emotion

    * opinion (polarity): positive, neutral, or negative

    * emotion
    
* Second element: subject
	
    * subject of discussion $\rightarrow$ **What is being talked about?**

* Third element: opinion holder
	
    * opinion holder (entity) $\rightarrow$ **By whom?**

![Opinion and emotion](ref1.%20Opinion%20and%20emotion.jpg)
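
The three elements listed above can be pictured as the fields of a single record. The sketch below is only an illustration: the class name `SentimentRecord`, its field names, and the example values are hypothetical and do not come from any library or from the dataset used later.

```python
from dataclasses import dataclass


@dataclass
class SentimentRecord:
    """Hypothetical container for the three elements of a sentiment analysis system."""
    opinion: str         # polarity: 'positive', 'neutral', or 'negative'
    subject: str         # what is being talked about
    opinion_holder: str  # by whom the opinion is expressed


# Made-up example values, for illustration only
record = SentimentRecord(
    opinion='positive',
    subject='the camera of the phone',
    opinion_holder='the reviewer',
)
print(record)
```

In the movie reviews data used below, only the opinion (the `label` column) is given explicitly; the subject and the opinion holder are implicit in the review text.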

## Why sentiment analysis?

* Social media monitoring:

    * not only what people are talking about but how they are talking about it

    * sentiment can also be found in forums, blogs, news

* Brand monitoring.

* Customer service.

* Product analytics.

* Market research and analysis.

## Exploring the movie reviews data:

In [1]:
import pandas as pd

data = pd.read_csv('ref2. IMDB movie reviews sample.csv')
data = data[['review', 'label']]

data.head()

|   | review | label |
|---|--------|-------|
| 0 | This short spoof can be found on Elite's Mille... | 0 |
| 1 | A singularly unfunny musical comedy that artif... | 0 |
| 2 | An excellent series, masterfully acted and dir... | 1 |
| 3 | The master of movie spectacle Cecil B. De Mill... | 1 |
| 4 | I was gifted with this movie as it had such a ... | 0 |


In [2]:
data.label.value_counts()

0    3782
1    3719
Name: label, dtype: int64

In [3]:
data.label.value_counts() / len(data)

0    0.504199
1    0.495801
Name: label, dtype: float64
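
Dividing the counts by `len(data)` gives the class proportions. pandas can also compute them directly through the `normalize=True` argument of `value_counts`. The sketch below uses a small made-up `labels` Series rather than the real file, and assumes that 1 marks a positive review and 0 a negative one, as the sample rows suggest.

```python
import pandas as pd

# Toy labels standing in for the `label` column (assumed: 1 = positive, 0 = negative)
labels = pd.Series([0, 1, 0, 1, 0], name='label')

# Absolute number of reviews per class
counts = labels.value_counts()

# Share of reviews per class; equivalent to counts / len(labels)
proportions = labels.value_counts(normalize=True)

print(counts)
print(proportions)
```

Both approaches return a Series indexed by label, so either one can be used to check how balanced the classes are.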

In [4]:
length_reviews = data.review.str.len()

type(length_reviews)

pandas.core.series.Series

In [5]:
# Length (in characters) of the longest review
max(length_reviews)

10321

In [6]:
# Length (in characters) of the shortest review
min(length_reviews)

52
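
`max` and `min` return only the lengths, not the reviews themselves. One way to retrieve the corresponding rows is `Series.idxmax` and `Series.idxmin`, which return the index label of the extreme value. The sketch below runs on a made-up three-row frame named `toy`, not on the IMDB sample.

```python
import pandas as pd

# Made-up reviews, used only to keep the example self-contained
toy = pd.DataFrame({
    'review': ['Great.',
               'A long, slow and thoroughly tedious film.',
               'Not bad at all.'],
    'label': [1, 0, 1],
})

# Number of characters in each review
lengths = toy.review.str.len()

# idxmax / idxmin give the index label of the longest / shortest review
longest = toy.loc[lengths.idxmax()]
shortest = toy.loc[lengths.idxmin()]

print(lengths.max(), longest['review'])
print(lengths.min(), shortest['review'])
```

If several reviews share the extreme length, `idxmax` and `idxmin` return the first occurrence.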

## Practice question for elements of a sentiment analysis problem:

* What are the three typical elements of a sentiment analysis system?

    $\Box$ Opinion, emotion, and subject.

    $\boxtimes$ Opinion, subject, and opinion holder.
    
    $\Box$ Emotion, polarity, and opinion.

    $\Box$ Opinion, subject, and polarity.

## Practice exercises for introduction to sentiment analysis:

$\blacktriangleright$ **Package pre-loading:**

In [7]:
import pandas as pd

$\blacktriangleright$ **Data pre-loading:**

In [8]:
movies = pd.read_csv('ref2. IMDB movie reviews sample.csv')
movies = movies[:1000]

$\blacktriangleright$ **Positive and negative reviews practice:**

In [9]:
# Find the number of positive and negative reviews
print('Number of positive and negative reviews: ', movies.label.value_counts())

# Find the proportion of positive and negative reviews
print('Proportion of positive and negative reviews: ',
      movies.label.value_counts() / len(movies))

Number of positive and negative reviews:  0    530
1    470
Name: label, dtype: int64
Proportion of positive and negative reviews:  0    0.53
1    0.47
Name: label, dtype: float64


$\blacktriangleright$ **Longest and shortest reviews practice:**

In [10]:
length_reviews = movies.review.str.len()

# How long is the longest review
print(max(length_reviews))

5992


In [11]:
length_reviews = movies.review.str.len()

# How long is the shortest review
print(min(length_reviews))

53


## Version checking:

In [12]:
import sys

print('The Python version is {}.'.format(sys.version.split()[0]))
print('The pandas version is {}.'.format(pd.__version__))

The Python version is 3.7.9.
The pandas version is 1.2.1.
