# NLP - Reading-Summarizing Data

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('amazon_alexa.tsv', delimiter = '\t')

data.shape

(3150, 5)

In [3]:
data.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1


In [4]:
data.isnull().sum()

rating              0
date                0
variation           0
verified_reviews    0
feedback            0
dtype: int64

In [5]:
data.describe()

Unnamed: 0,rating,feedback
count,3150.0,3150.0
mean,4.463175,0.918413
std,1.068506,0.273778
min,1.0,0.0
25%,4.0,1.0
50%,5.0,1.0
75%,5.0,1.0
max,5.0,1.0


In [6]:
data.describe(include = 'object')

Unnamed: 0,date,variation,verified_reviews
count,3150,3150,3150.0
unique,77,16,2301.0
top,30-Jul-18,Black Dot,
freq,1603,516,79.0


In [7]:
data['variation'].value_counts()

Black  Dot                      516
Charcoal Fabric                 430
Configuration: Fire TV Stick    350
Black  Plus                     270
Black  Show                     265
Black                           261
Black  Spot                     241
White  Dot                      184
Heather Gray Fabric             157
White  Spot                     109
White                            91
Sandstone Fabric                 90
White  Show                      85
White  Plus                      78
Oak Finish                       14
Walnut Finish                     9
Name: variation, dtype: int64

# Length, Polarity and Subjectivity

In [8]:
!pip install textblob



In [9]:
#storing length for each review
data['length'] = data['verified_reviews'].apply(len)

# Polarity and Subjectivity

Text Polarity is the expression that determins the sentimental aspect of an opinion. In textual data, the result of sentiment analysis can be determined for each entity in the sentence, document or sentence. The sentiment can be determined as positive, negative or neutral.

In [10]:
from textblob import TextBlob

In [11]:
# Calculate Polarity

def get_polarity(text):
    textblob = TextBlob(str(text.encode('utf-8')))
    pol = textblob.sentiment.polarity
    return pol

# Apply function
data['polarity'] = data['verified_reviews'].apply(get_polarity)

### Text Subjectivity

In natural language, subjectivity refers to the expression of opinions, feelings, and speculations and thus incorporates sentiment. Subjective text is further classified with sentiment or polarity.

Subjective sentences refer to the personal opinion, feelings or judgement of the people.

Objective sentences refer to the factual information.

Subjectivity/Objectivity lies between [0-1]

In [12]:
# Calculate Subjectivity of the Reviews

def get_subjectivity(text):
    textblob = TextBlob(str(text.encode('utf-8')))
    subj = textblob.sentiment.subjectivity
    return subj

# Apply function
data['subjectivity'] = data['verified_reviews'].apply(get_subjectivity)

In [13]:
data[['length', 'polarity', 'subjectivity']].describe()

# length min. = 1 ;; max. = 2851.000000
# polarity min. = -1 ;; max. = 1
# subjectivity mean = 0.528922

Unnamed: 0,length,polarity,subjectivity
count,3150.0,3150.0,3150.0
mean,132.049524,0.349792,0.528922
std,182.099952,0.303362,0.256324
min,1.0,-1.0,0.0
25%,30.0,0.123852,0.419196
50%,74.0,0.35,0.585
75%,165.0,0.533333,0.695486
max,2851.0,1.0,1.0
