<a href="https://colab.research.google.com/github/christophermalone/stat360/blob/main/NLP_DataPrep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Natural Language Processing - Data Prep

Natural language processing (NLP) is an area of data science in which one writes code to process and analyze large amounts of language data.  The goal is to have code extract the information and insights contained in written language.

In our case study here, natural language processing will be used to evaluate the language contained in Amazon product reviews.  Why process language?


*   Automatically identify "negative" reviews so that an employee can address the review directly
*   To better understand the nuances in language that make a review helpful to consumers



## Getting a Set of Reviews

There will be three different Amazon products under consideration here -- a waterproof mattress protector that has been identified as a "good" product, liquid grass seed that has been identified as a "bad" product, and a book titled "I Wish My Kid Had Cancer" whose reviews are considered "emotional".

*   Amazon Product - Good:  https://www.amazon.com/SafeRest-Hypoallergenic-Waterproof-Mattress-Protector/dp/B003PWNGQU
*   Amazon Product - Bad:  https://www.amazon.com/Hydro-Mousse-Liquid-Fescue-Hydroseeding/dp/B00LMFJ8KA/?th=1
*   Amazon Product - Emotional: https://www.amazon.com/Wish-Kids-Had-Cancer-Surviving/dp/1606720708/ 





There is a Google Chrome extension, the <strong>Amazon Reviews Exporter</strong>, that can be used to easily extract Amazon reviews into a CSV file.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1qOoO-ZDNGsvcAjFYEidVTM-IVjjwhF0S" width="75%" height="75%"></img></p>



To use the Amazon Reviews Exporter, simply navigate to a product page on Amazon and open the extension.  The extension has several options that can be specified, e.g. limiting the number of reviews, filtering by keywords, downloading images, etc.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=17VxR-0KgwVaeDZtg6gcOGnxtdK0siFzW" width="25%" height="25%"></img></p>


The following snippet is an example of the information captured by the Amazon Reviews Exporter.


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1K7oDtidJNUcq6rXpOf9lfsyTOEGTYUBu" width="100%" height="100%"></img></p>


## Reading a CSV File into Python

The <strong>pandas</strong> library will be used for data prep tasks in Python.  The following code will import the pandas package and <strong>pd</strong> will be the name given to this library within our code. 

In [None]:
# import the pandas library
import pandas as pd

The read_csv() function from the pandas library will be used to read in the CSV file that contains the reviews.  

In [None]:
# Creating a pandas dataframe from the CSV file containing the reviews to be analyzed
Reviews = pd.read_csv('/content/sample_data/AmazonReviews_Good.csv', sep=',')
#Reviews = pd.read_csv('/content/sample_data/AmazonReviews_Poor.csv', sep=',')
#Reviews = pd.read_csv('/content/sample_data/AmazonReviews_Emotional.csv', sep=',')

#Taking a look at the first five reviews
Reviews.head()

Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size
0,R24BGSL5VYNKCP,P G,\n Got this for my 3 year old's bed. She's po...,"Reviewed in the United States on October 23, 2018",Wasn't urine-proof,1,,3,Twin
1,R2L7FX82QZOWCB,klmvi,\n I spent many hours reviewing water proof p...,"Reviewed in the United States on March 10, 2018",I read all of the 1 and 2 star reviews for a v...,5,,3,King
2,R2QUPFXE8FUZ3L,J. Clark,"\n Seemed to work great for two years, but yo...","Reviewed in the United States on October 5, 2018",DISINTEGRATED. MATTRESS RUINED.,1,https://images-na.ssl-images-amazon.com/images...,2,King
3,R1GJUJSPBI0OVM,Kevin C.,\n My wife went into labor a year and a half ...,"Reviewed in the United States on September 2, ...",Protected mattress through birth!,5,,12,Queen
4,RB220J2C8V7MU,Nikki Trost,\n Our dog has an incontinence issue and has ...,"Reviewed in the United States on October 6, 2017",Best of several we've tried by far,5,,1,King
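
Before moving on, it can help to confirm that a file loaded as expected. The sketch below is only an illustration: it reads a tiny in-memory CSV with made-up rows (not the real review file) so that it runs without the download, and its column names mirror a few of those shown above.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for AmazonReviews_Good.csv
sample_csv = io.StringIO(
    "id,text,rating,helpful\n"
    "R1,Great protector,5,3\n"
    "R2,,1,0\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)             # (rows, columns)
print(sample.columns.tolist())  # column names as read from the header
print(sample.isna().sum())      # missing values per column
```

Checking for missing review text early is useful because later steps convert every value to a string before cleaning it.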


The print() function can be used to print out an entire review.  Here the text from the $27^{th}$ review is printed; its position is 26 because Python starts counting at 0.

In [None]:
print(Reviews.iloc[26]['text'])


  Nice protector! Does just what I needed. Deep pockets for the thicker mattresses, and it stays on well.I really like this product. Reasonably priced, too.
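
As a small aside on indexing, .iloc selects rows by position while .loc selects by index label. The frame below is made up for illustration and deliberately uses a non-default index so the two differ.

```python
import pandas as pd

# Made-up frame (assumption) whose index labels are not 0, 1, 2
df = pd.DataFrame({"text": ["first", "second", "third"]}, index=[10, 20, 30])

third_by_position = df.iloc[2]["text"]  # position 2 is the 3rd row
third_by_label = df.loc[30, "text"]     # label 30 names that same row
print(third_by_position, third_by_label)
```

With the default index that read_csv() creates, positions and labels coincide, which is why iloc[26] picks out the 27th review here.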



## Getting rid of all non-alphabetic characters

The first data processing step will be to get rid of all non-alphabetic characters.  The regular expression library (re) will be used to accomplish this task.



*   A custom function called clean() is created to facilitate this task
*   The function is called using the .apply() method
*   The reviews that have the non-alphabetic characters removed are put into variable called CleanedReviews



In [None]:
import re

# Define a function to clean the text
def clean(text):
    # Replace each run of non-alphabetic characters (digits, punctuation, whitespace) with a single space
    text = re.sub('[^A-Za-z]+', ' ', str(text))
    return text

# Removing the non-alphabetic characters
Reviews['CleanedReviews'] = Reviews['text'].apply(clean)
Reviews.head()

Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size,CleanedReviews
0,R24BGSL5VYNKCP,P G,\n Got this for my 3 year old's bed. She's po...,"Reviewed in the United States on October 23, 2018",Wasn't urine-proof,1,,3,Twin,Got this for my year old s bed She s potty tr...
1,R2L7FX82QZOWCB,klmvi,\n I spent many hours reviewing water proof p...,"Reviewed in the United States on March 10, 2018",I read all of the 1 and 2 star reviews for a v...,5,,3,King,I spent many hours reviewing water proof pads...
2,R2QUPFXE8FUZ3L,J. Clark,"\n Seemed to work great for two years, but yo...","Reviewed in the United States on October 5, 2018",DISINTEGRATED. MATTRESS RUINED.,1,https://images-na.ssl-images-amazon.com/images...,2,King,Seemed to work great for two years but you do...
3,R1GJUJSPBI0OVM,Kevin C.,\n My wife went into labor a year and a half ...,"Reviewed in the United States on September 2, ...",Protected mattress through birth!,5,,12,Queen,My wife went into labor a year and a half ago...
4,RB220J2C8V7MU,Nikki Trost,\n Our dog has an incontinence issue and has ...,"Reviewed in the United States on October 6, 2017",Best of several we've tried by far,5,,1,King,Our dog has an incontinence issue and has lea...


Comparing a cleaned review against its original.

In [None]:
print(Reviews.iloc[26]['text'])
print(Reviews.iloc[26]['CleanedReviews'])


  Nice protector! Does just what I needed. Deep pockets for the thicker mattresses, and it stays on well.I really like this product. Reasonably priced, too.

 Nice protector Does just what I needed Deep pockets for the thicker mattresses and it stays on well I really like this product Reasonably priced too 
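
To see exactly what the pattern does, the sketch below runs the same substitution on one short string. The clean_normalized() helper is a hypothetical variant, not part of the notebook, that also trims the ends and lowercases the result.

```python
import re

def clean(text):
    # Same pattern as above: any run of non-letters becomes one space
    return re.sub(r"[^A-Za-z]+", " ", str(text))

def clean_normalized(text):
    # Hypothetical variant: also strip the ends and lowercase
    return clean(text).strip().lower()

print(repr(clean("She's 3 years old!")))
print(repr(clean_normalized("She's 3 years old!")))
```

Note that removing apostrophes splits contractions, which is why stray single letters such as "s" appear in the CleanedReviews column.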


## Tokenization, Parts of Speech tagging, Removal of Stopwords

<strong>Tokenization</strong> is the process of separating language into individual words, which is common in practice.  Tokenization into phrases can also be used to split language into smaller pieces for analysis.

<strong>Stopwords</strong> are commonly used words that carry little meaning on their own.  For example, the, and, and as are considered stopwords.  Stopwords are most often removed before text is analyzed.

 

The natural language toolkit (NLTK) is used for tokenization, parts of speech tagging, and removal of stop words. Also, a custom function called token_stop_pos() is written here to assist with the necessary data processing steps.

In [None]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('stopwords')
from nltk.corpus import stopwords
nltk.download('wordnet')
from nltk.corpus import wordnet

nltk.download('averaged_perceptron_tagger')

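# POS tagger dictionary: first letter of the Penn Treebank tag -> WordNet POS
pos_dict = {'J':wordnet.ADJ, 'V':wordnet.VERB, 'N':wordnet.NOUN, 'R':wordnet.ADV}

# Build the stopword set once rather than once per word
stop_words = set(stopwords.words('english'))

def token_stop_pos(text):
    tags = pos_tag(word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in stop_words:
            # Tags outside pos_dict (e.g. prepositions) map to None
            newlist.append((word, pos_dict.get(tag[0])))
    return newlist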


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


The following code uses the custom token_stop_pos() function defined above.  This custom function tokenizes the text, removes any stop words, and identifies the parts of speech.  The POS dictionary labels adjectives, verbs, nouns, and adverbs; words with any other tag are kept but labeled None.

In [None]:
Reviews['POS_Tagged'] = Reviews['CleanedReviews'].apply(token_stop_pos)
Reviews.head()

Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size,CleanedReviews,POS_Tagged
0,R24BGSL5VYNKCP,P G,\n Got this for my 3 year old's bed. She's po...,"Reviewed in the United States on October 23, 2018",Wasn't urine-proof,1,,3,Twin,Got this for my year old s bed She s potty tr...,"[(Got, n), (year, n), (old, a), (bed, v), (pot..."
1,R2L7FX82QZOWCB,klmvi,\n I spent many hours reviewing water proof p...,"Reviewed in the United States on March 10, 2018",I read all of the 1 and 2 star reviews for a v...,5,,3,King,I spent many hours reviewing water proof pads...,"[(spent, v), (many, a), (hours, n), (reviewing..."
2,R2QUPFXE8FUZ3L,J. Clark,"\n Seemed to work great for two years, but yo...","Reviewed in the United States on October 5, 2018",DISINTEGRATED. MATTRESS RUINED.,1,https://images-na.ssl-images-amazon.com/images...,2,King,Seemed to work great for two years but you do...,"[(Seemed, v), (work, v), (great, a), (two, Non..."
3,R1GJUJSPBI0OVM,Kevin C.,\n My wife went into labor a year and a half ...,"Reviewed in the United States on September 2, ...",Protected mattress through birth!,5,,12,Queen,My wife went into labor a year and a half ago...,"[(wife, n), (went, v), (labor, n), (year, n), ..."
4,RB220J2C8V7MU,Nikki Trost,\n Our dog has an incontinence issue and has ...,"Reviewed in the United States on October 6, 2017",Best of several we've tried by far,5,,1,King,Our dog has an incontinence issue and has lea...,"[(dog, n), (incontinence, n), (issue, n), (lea..."


Once again, taking a look at an individual review and its processing.




In [None]:
print(Reviews.iloc[26]['text'])
print(Reviews.iloc[26]['CleanedReviews'])
print(Reviews.iloc[26]['POS_Tagged'])


  Nice protector! Does just what I needed. Deep pockets for the thicker mattresses, and it stays on well.I really like this product. Reasonably priced, too.

 Nice protector Does just what I needed Deep pockets for the thicker mattresses and it stays on well I really like this product Reasonably priced too 
[('Nice', 'a'), ('protector', 'n'), ('needed', 'v'), ('Deep', 'n'), ('pockets', 'n'), ('thicker', 'n'), ('mattresses', 'n'), ('stays', 'v'), ('well', 'r'), ('really', 'r'), ('like', None), ('product', 'n'), ('Reasonably', 'r'), ('priced', 'v')]
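
The filtering and tag mapping can be illustrated without NLTK. In the sketch below the stopword set and the Penn Treebank tags are hand-written stand-ins (assumptions) for what stopwords.words('english') and pos_tag() would supply; the one-letter values are the strings behind wordnet.ADJ, wordnet.VERB, wordnet.NOUN, and wordnet.ADV.

```python
# WordNet's POS constants are plain one-letter strings
pos_dict = {"J": "a", "V": "v", "N": "n", "R": "r"}

# Hand-written stand-ins (assumptions): a tiny stopword set and
# tags of the kind pos_tag() returns
stop_words = {"i", "this", "the"}
tagged = [("I", "PRP"), ("really", "RB"), ("like", "IN"),
          ("this", "DT"), ("product", "NN")]

# Drop stopwords, then map the first letter of each tag
result = [(word, pos_dict.get(tag[0]))
          for word, tag in tagged
          if word.lower() not in stop_words]
print(result)
```

A tag whose first letter is not in pos_dict, such as IN here, yields None, which matches the ('like', None) entry in the output above.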


## Obtain the root of each word

<strong>Lemmatization</strong> is the process of converting a word to its base form. An alternative process is <strong>stemming</strong>, which simply removes the last few characters of a word, e.g. stemming would remove the s from studies or the ied from studied.  Lemmatization is more sophisticated: it considers the context and converts the word to its meaningful base form. Lemmatization tends to lead to more meaningful conversions than stemming.
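
The contrast can be sketched with two toy functions. Neither is a real algorithm: toy_stem() is a crude suffix stripper and toy_lemmas is a hypothetical three-entry dictionary standing in for WordNet, both written only for this illustration.

```python
def toy_stem(word):
    # Crude suffix stripping; real stemmers apply many more rules
    for suffix in ("ied", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary standing in for WordNet
toy_lemmas = {"studied": "study", "studies": "study", "went": "go"}

def toy_lemmatize(word):
    # Look the word up; fall back to the word itself
    return toy_lemmas.get(word, word)

for w in ["studied", "studies", "went"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer produces fragments such as stud and cannot relate went to go, while the dictionary lookup returns a real base word in each case.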

Once again the natural language toolkit will be used for the lemmatization of our reviews.  The custom lemmatize() function takes the parts of speech list, applies the lemmatization to this list, and then creates a string that contains the base form of each word.

In [37]:
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

def lemmatize(pos_data):
    lemma_rew = " "
    for word, pos in pos_data:
        if not pos: 
            lemma = word
            lemma_rew = lemma_rew + " " + lemma
        else:  
            lemma = wordnet_lemmatizer.lemmatize(word, pos=pos)
            lemma_rew = lemma_rew + " " + lemma
    return lemma_rew

Apply the lemmatize() function to the parts of speech list.  The outcome from this function will be put into a variable called Lemma.


In [38]:
Reviews['Lemma'] = Reviews['POS_Tagged'].apply(lemmatize)
Reviews.head()

Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size,CleanedReviews,POS_Tagged,Lemma
0,R24BGSL5VYNKCP,P G,\n Got this for my 3 year old's bed. She's po...,"Reviewed in the United States on October 23, 2018",Wasn't urine-proof,1,,3,Twin,Got this for my year old s bed She s potty tr...,"[(Got, n), (year, n), (old, a), (bed, v), (pot...",Got year old bed potty train still occasiona...
1,R2L7FX82QZOWCB,klmvi,\n I spent many hours reviewing water proof p...,"Reviewed in the United States on March 10, 2018",I read all of the 1 and 2 star reviews for a v...,5,,3,King,I spent many hours reviewing water proof pads...,"[(spent, v), (many, a), (hours, n), (reviewing...",spend many hour review water proof pad king ...
2,R2QUPFXE8FUZ3L,J. Clark,"\n Seemed to work great for two years, but yo...","Reviewed in the United States on October 5, 2018",DISINTEGRATED. MATTRESS RUINED.,1,https://images-na.ssl-images-amazon.com/images...,2,King,Seemed to work great for two years but you do...,"[(Seemed, v), (work, v), (great, a), (two, Non...",Seemed work great two year really know test ...
3,R1GJUJSPBI0OVM,Kevin C.,\n My wife went into labor a year and a half ...,"Reviewed in the United States on September 2, ...",Protected mattress through birth!,5,,12,Queen,My wife went into labor a year and a half ago...,"[(wife, n), (went, v), (labor, n), (year, n), ...",wife go labor year half ago labor go quickly...
4,RB220J2C8V7MU,Nikki Trost,\n Our dog has an incontinence issue and has ...,"Reviewed in the United States on October 6, 2017",Best of several we've tried by far,5,,1,King,Our dog has an incontinence issue and has lea...,"[(dog, n), (incontinence, n), (issue, n), (lea...",dog incontinence issue leak bed take long ve...


Once again, taking a look at an individual review and its processing.

In [40]:
print(Reviews.iloc[26]['text'])
print(Reviews.iloc[26]['CleanedReviews'])
print(Reviews.iloc[26]['POS_Tagged'])
print(Reviews.iloc[26]['Lemma'])


  Nice protector! Does just what I needed. Deep pockets for the thicker mattresses, and it stays on well.I really like this product. Reasonably priced, too.

 Nice protector Does just what I needed Deep pockets for the thicker mattresses and it stays on well I really like this product Reasonably priced too 
[('Nice', 'a'), ('protector', 'n'), ('needed', 'v'), ('Deep', 'n'), ('pockets', 'n'), ('thicker', 'n'), ('mattresses', 'n'), ('stays', 'v'), ('well', 'r'), ('really', 'r'), ('like', None), ('product', 'n'), ('Reasonably', 'r'), ('priced', 'v')]
  Nice protector need Deep pocket thicker mattress stay well really like product Reasonably price


## Getting a Sentiment Score via TextBlob

The TextBlob library will be used to obtain a <strong>sentiment score</strong> for each review.  A sentiment score is simply a metric used to measure a thought, view, or attitude that is based mostly on a person's emotion.

There are a wide variety of sentiments that can be measured.  Here, a <strong>polarity</strong> sentiment score will be obtained from the TextBlob library. The polarity score measures the positivity / negativity of a word.

An example list of positive / negative words -- this list is *not* associated with the TextBlob library and is only provided as an example.


*   Positive Words: https://ptrckprry.com/course/ssd/data/positive-words.txt 
*   Negative Words: https://ptrckprry.com/course/ssd/data/negative-words.txt
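
To give a feel for how word lists can turn into a score, here is a toy lexicon-averaging sketch. It is not TextBlob's algorithm, and the four scores in toy_lexicon are invented for illustration; TextBlob ships its own lexicon and reports polarity on a scale from -1 to 1.

```python
# Hypothetical scores (assumption), chosen only for this example
toy_lexicon = {"nice": 0.6, "great": 0.8, "waste": -0.2, "ruined": -0.7}

def toy_polarity(text):
    # Average the scores of the words that appear in the lexicon
    scores = [toy_lexicon[w] for w in text.lower().split() if w in toy_lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("Nice protector"))
print(toy_polarity("Great mattress ruined"))
print(toy_polarity("Arrived on Tuesday"))
```

Text with no scored words comes out as exactly 0, and a review that mixes positive and negative words can land close to 0 even when it is strongly worded.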



The code below defines three custom functions that 1) get the subjectivity score, 2) get the polarity score, and 3) identify a review as positive/negative/neutral. The TextBlob library is used for this analysis.

In [41]:
from textblob import TextBlob

# function to calculate subjectivity 
def getSubjectivity(review):
    return TextBlob(review).sentiment.subjectivity

# function to calculate polarity
def getPolarity(review):
    return TextBlob(review).sentiment.polarity

# function to analyze the reviews
def analysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
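
The thresholds in analysis() treat any score above 0 as Positive, however small. The sketch below repeats that function and adds analysis_with_band(), a hypothetical variant (not part of the notebook) that labels scores near 0 as Neutral; the band width of 0.1 is an arbitrary assumption.

```python
def analysis(score):
    # Same rule as above: the sign of the score decides the label
    if score < 0:
        return "Negative"
    elif score == 0:
        return "Neutral"
    else:
        return "Positive"

def analysis_with_band(score, band=0.1):
    # Hypothetical variant: scores within +/- band count as Neutral
    if score < -band:
        return "Negative"
    if score > band:
        return "Positive"
    return "Neutral"

# Polarity values of the kind this notebook produces
for score in [-0.186234, 0.083333, 0.25]:
    print(score, analysis(score), analysis_with_band(score))
```

Whether a neutral band is appropriate depends on the use case; it trades fewer weakly supported labels for a larger Neutral group.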



Apply the getPolarity() function to the Lemma variable and the analysis() function to the resulting Polarity score.  The getSubjectivity() call is commented out in this cell.

In [44]:
#Reviews['Subjectivity'] = Reviews['Lemma'].apply(getSubjectivity) 
Reviews['Polarity'] = Reviews['Lemma'].apply(getPolarity) 
Reviews['Analysis'] = Reviews['Polarity'].apply(analysis)
Reviews.head()

Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size,CleanedReviews,POS_Tagged,Lemma,Polarity,Analysis,Subjectivity
0,R24BGSL5VYNKCP,P G,\n Got this for my 3 year old's bed. She's po...,"Reviewed in the United States on October 23, 2018",Wasn't urine-proof,1,,3,Twin,Got this for my year old s bed She s potty tr...,"[(Got, n), (year, n), (old, a), (bed, v), (pot...",Got year old bed potty train still occasiona...,-0.186234,Negative,0.513506
1,R2L7FX82QZOWCB,klmvi,\n I spent many hours reviewing water proof p...,"Reviewed in the United States on March 10, 2018",I read all of the 1 and 2 star reviews for a v...,5,,3,King,I spent many hours reviewing water proof pads...,"[(spent, v), (many, a), (hours, n), (reviewing...",spend many hour review water proof pad king ...,0.201571,Positive,0.55581
2,R2QUPFXE8FUZ3L,J. Clark,"\n Seemed to work great for two years, but yo...","Reviewed in the United States on October 5, 2018",DISINTEGRATED. MATTRESS RUINED.,1,https://images-na.ssl-images-amazon.com/images...,2,King,Seemed to work great for two years but you do...,"[(Seemed, v), (work, v), (great, a), (two, Non...",Seemed work great two year really know test ...,0.083333,Positive,0.566667
3,R1GJUJSPBI0OVM,Kevin C.,\n My wife went into labor a year and a half ...,"Reviewed in the United States on September 2, ...",Protected mattress through birth!,5,,12,Queen,My wife went into labor a year and a half ago...,"[(wife, n), (went, v), (labor, n), (year, n), ...",wife go labor year half ago labor go quickly...,0.123214,Positive,0.364015
4,RB220J2C8V7MU,Nikki Trost,\n Our dog has an incontinence issue and has ...,"Reviewed in the United States on October 6, 2017",Best of several we've tried by far,5,,1,King,Our dog has an incontinence issue and has lea...,"[(dog, n), (incontinence, n), (issue, n), (lea...",dog incontinence issue leak bed take long ve...,0.129982,Positive,0.511586


Once again, taking a look at an individual review and its processing.

In [46]:
print(Reviews.iloc[26]['text'])
print(Reviews.iloc[26]['CleanedReviews'])
print(Reviews.iloc[26]['POS_Tagged'])
print(Reviews.iloc[26]['Lemma'])
print(Reviews.iloc[26]['Polarity'])
print(Reviews.iloc[26]['Analysis'])


  Nice protector! Does just what I needed. Deep pockets for the thicker mattresses, and it stays on well.I really like this product. Reasonably priced, too.

 Nice protector Does just what I needed Deep pockets for the thicker mattresses and it stays on well I really like this product Reasonably priced too 
[('Nice', 'a'), ('protector', 'n'), ('needed', 'v'), ('Deep', 'n'), ('pockets', 'n'), ('thicker', 'n'), ('mattresses', 'n'), ('stays', 'v'), ('well', 'r'), ('really', 'r'), ('like', None), ('product', 'n'), ('Reasonably', 'r'), ('priced', 'v')]
  Nice protector need Deep pocket thicker mattress stay well really like product Reasonably price
0.25
Positive


## Simple Summaries of the Sentiment Scores

In [49]:
#Getting a subset of the columns 
Reviews_Subset = pd.DataFrame(Reviews[['id', 'date', 'text', 'helpful', 'Lemma', 'Analysis', 'Polarity']])
Reviews_Subset.head()

Unnamed: 0,id,date,text,helpful,Lemma,Analysis,Polarity
0,R24BGSL5VYNKCP,"Reviewed in the United States on October 23, 2018",\n Got this for my 3 year old's bed. She's po...,3,Got year old bed potty train still occasiona...,Negative,-0.186234
1,R2L7FX82QZOWCB,"Reviewed in the United States on March 10, 2018",\n I spent many hours reviewing water proof p...,3,spend many hour review water proof pad king ...,Positive,0.201571
2,R2QUPFXE8FUZ3L,"Reviewed in the United States on October 5, 2018","\n Seemed to work great for two years, but yo...",2,Seemed work great two year really know test ...,Positive,0.083333
3,R1GJUJSPBI0OVM,"Reviewed in the United States on September 2, ...",\n My wife went into labor a year and a half ...,12,wife go labor year half ago labor go quickly...,Positive,0.123214
4,RB220J2C8V7MU,"Reviewed in the United States on October 6, 2017",\n Our dog has an incontinence issue and has ...,1,dog incontinence issue leak bed take long ve...,Positive,0.129982


Identify the number of positive vs negative reviews...

In [47]:
tb_counts = Reviews.Analysis.value_counts()
tb_counts

Positive    4234
Negative     641
Neutral      125
Name: Analysis, dtype: int64
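
The counts are easier to compare as shares of the total. The sketch below rebuilds the counts from the output above by hand, so it runs without the review data, and divides by their sum; on the real column, value_counts(normalize=True) returns the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above
tb_counts = pd.Series({"Positive": 4234, "Negative": 641, "Neutral": 125},
                      name="Analysis")

# Share of reviews in each category
tb_share = tb_counts / tb_counts.sum()
print(tb_share.round(4))
```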

In [53]:
#Install dfply for dplyr like functionality in Python
!pip install dfply
from dfply import *
import numpy as np  # used by the np.int64 cast in the pipeline below



In [55]:
#Piping in dfply and using filter_by() to grab requested rows.
Reviews_Negative = (
                      Reviews
                      >> arrange(X.Polarity)
                      >> filter_by(X.Polarity < -0.50)
                      >> filter_by(X.helpful > 25)
                      >> mutate(Year = X.date.str.split(',').str[-1].astype(np.int64))
                      >> filter_by(X.Year > 2014)
                      
                      >> mutate(WebLink = '<ul><li>Please address the following review: https://www.amazon.com/gp/customer-reviews/' + X.id + '</li></ul>')
                    )
Reviews_Negative.shape
Reviews_Negative.head()



Unnamed: 0,id,profileName,text,date,title,rating,images,helpful,Size,CleanedReviews,POS_Tagged,Lemma,Polarity,Analysis,Subjectivity,Year,WebLink
21,RTRWS0P8LMHGF,Busymom,\n Don’t understand what the fuss is about ...,"Reviewed in the United States on August 14, 2018",Don’t waste your money!,1,,112,King,Don t understand what the fuss is about these...,"[(understand, v), (fuss, n), (things, n), (wat...",understand fuss thing waterproof kid pee bed...,-0.7,Negative,0.666667,2018,<ul><li>Please address the following review: h...
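
For readers who prefer to stay within pandas, the same kind of filter can be sketched with assign(), query(), and sort_values(). The three rows below are made up for illustration, and this is one possible equivalent rather than the notebook's method.

```python
import pandas as pd

# Made-up rows (assumption) with the columns the filter relies on
reviews = pd.DataFrame({
    "id": ["RA", "RB", "RC"],
    "date": ["Reviewed in the United States on August 14, 2018",
             "Reviewed in the United States on May 2, 2013",
             "Reviewed in the United States on June 1, 2019"],
    "helpful": [112, 40, 3],
    "Polarity": [-0.7, -0.8, -0.9],
})

negative = (
    reviews
    # The year is the text after the last comma in the date string
    .assign(Year=lambda d: d["date"].str.split(",").str[-1].str.strip().astype(int))
    .query("Polarity < -0.50 and helpful > 25 and Year > 2014")
    .sort_values("Polarity")
)
print(negative[["id", "Year"]])
```

Only the first row survives: the second is too old and the third has too few helpful votes.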


## Sending an email via Python

In [57]:
#Source:  https://towardsdatascience.com/automate-sending-emails-with-gmail-in-python-449cc0c3c317
!pip install yagmail
import yagmail

user = '<yourgmail>@gmail.com'
app_password = '<token for external email sending via gmail>' # a token for gmail
to = '<sent info to email>'

subject = 'Test Email - Python'
content = 'Here are the Amazon reviews that require your attention:<br><br>' + ''.join(map(str, Reviews_Negative.WebLink.tolist()))

with yagmail.SMTP(user, app_password) as yag:
    yag.send(to, subject, content)
    print('Sent email successfully')

Sent email successfully
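
It can be worth building and inspecting the message body before anything is sent. The sketch below assembles the same HTML from the review ID shown in the output above and reads the app password from an environment variable; the variable name GMAIL_APP_PASSWORD is an assumption for this example, not something yagmail requires.

```python
import os

# Review ID taken from the filtered output above
review_ids = ["RTRWS0P8LMHGF"]
links = [
    "<ul><li>Please address the following review: "
    "https://www.amazon.com/gp/customer-reviews/" + review_id + "</li></ul>"
    for review_id in review_ids
]
content = ("Here are the Amazon reviews that require your attention:<br><br>"
           + "".join(links))

# Assumed variable name; keeps the token out of the notebook itself
app_password = os.environ.get("GMAIL_APP_PASSWORD", "")
print(content)
```

Keeping the token outside the notebook avoids accidentally sharing it when the notebook is published.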




---




Not complete...

## Emotion Analysis

Source:  https://towardsdatascience.com/text2emotion-python-package-to-detect-emotions-from-textual-data-b2e7b7ce1153



In [None]:
#Install package using pip
!pip install text2emotion

#Import the modules
import text2emotion as te

# function to get the emotion scores for a review
def getEmotion(review):
    return te.get_emotion(review)

 



In [None]:
#Get the emotion scores
fin_data['Emotion'] = fin_data['Lemma'].apply(getEmotion)


In [None]:
#Split the emotion dictionaries up into separate columns
fin_data[['Happy','Angry', 'Surprise', 'Sad', 'Fear']] = pd.DataFrame(fin_data.Emotion.values.tolist(), index= fin_data.index)

fin_data.head()

Unnamed: 0,id,date,review,helpful,Lemma,Polarity,Analysis,Emotion,Happy,Angry,Surprise,Sad,Fear
0,R3MZRW67QAA2ZG,"Reviewed in the United States on May 9, 2018",it’s sprays green however it’s green liquid no...,609,spray green however green liquid foam Mousse...,-0.1625,Negative,"{'Happy': 0.0, 'Angry': 0.0, 'Surprise': 1.0, ...",0.0,0.0,1.0,0.0,0.0
1,R21KMZ4ZOZHSSA,"Reviewed in the United States on April 30, 2016",Complete waste of time and money,1,Complete waste time money,-0.05,Negative,"{'Happy': 0, 'Angry': 0, 'Surprise': 0, 'Sad':...",0.0,0.0,0.0,0.0,0.0
2,R1GYHC9PRPS419,"Reviewed in the United States on July 24, 2016",Garbage. Do not buy! Doesn't work and complete...,816,Garbage buy work completely break apart st use,0.1,Positive,"{'Happy': 0.0, 'Angry': 0.0, 'Surprise': 0.5, ...",0.0,0.0,0.5,0.0,0.5
3,RM6O1US7MGXUC,"Reviewed in the United States on May 30, 2017",One Star,469,One Star,0.0,Neutral,"{'Happy': 0, 'Angry': 0, 'Surprise': 0, 'Sad':...",0.0,0.0,0.0,0.0,0.0
4,R2WYXNIEZHZN9,"Reviewed in the United States on May 28, 2018","Waste of money, The commercial is false advert...",323,Waste money commercial false advertising opi...,-0.2,Negative,"{'Happy': 0.0, 'Angry': 0.0, 'Surprise': 0.0, ...",0.0,0.0,0.0,1.0,0.0
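
The column-splitting step can be tried on its own. The sketch below uses two hand-written dictionaries in the shape shown in the Emotion column, so it runs without text2emotion; the frame name data is made up for this example.

```python
import pandas as pd

# Hand-written dictionaries (assumption) shaped like the Emotion column
data = pd.DataFrame({
    "Emotion": [
        {"Happy": 0.0, "Angry": 0.0, "Surprise": 1.0, "Sad": 0.0, "Fear": 0.0},
        {"Happy": 0.0, "Angry": 0.0, "Surprise": 0.5, "Sad": 0.0, "Fear": 0.5},
    ]
})

# One column per dictionary key, aligned on the original index
scores = pd.DataFrame(data["Emotion"].tolist(), index=data.index)
data = data.join(scores)
print(data[["Happy", "Angry", "Surprise", "Sad", "Fear"]])
```

Using join() lets the dictionary keys name the new columns, so the result does not depend on listing the emotions in a particular order.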


In [None]:
#Piping in dfply and using summarize() to average each emotion score.
(fin_data
    >>summarize(
                  Avg_Happy=X.Happy.mean(),
                  Avg_Angry=X.Angry.mean(),
                  Avg_Surprise=X.Surprise.mean(),
                  Avg_Sad=X.Sad.mean(),
                  Avg_Fear=X.Fear.mean()
                )
)

Unnamed: 0,Avg_Happy,Avg_Angry,Avg_Surprise,Avg_Sad,Avg_Fear
0,0.06774,0.016091,0.051438,0.07931,0.241178
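
The same averages can be obtained in plain pandas with mean(). The scores below are made up for illustration and the Avg_ prefix simply mirrors the column names used above.

```python
import pandas as pd

# Made-up scores (assumption) in the same five columns
scores = pd.DataFrame({
    "Happy": [0.0, 0.5], "Angry": [0.0, 0.0], "Surprise": [1.0, 0.0],
    "Sad": [0.0, 0.5], "Fear": [0.0, 0.0],
})

# Column means, relabeled to match the summarize() output
avg = scores.mean().add_prefix("Avg_")
print(avg)
```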
