## Sentiment Analysis for Whispr

1. Import data from Google Sheets
2. Clean the dataset and create synthetic variables
3. Summarize the dataset: number of records per category, reviews over time
4. Evaluate the sentiment of each review and give a confidence interval
5. Calculate summary insights: average sentiment and subjectivity per item, reviews per item
6. Compare against the manual evaluation
7. Export data to Google Sheets

In [1]:
import pandas as pd
import numpy as np
import os
from textblob import TextBlob
import gspread
from datetime import datetime
from oauth2client.service_account import ServiceAccountCredentials
from matplotlib import pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set_style('darkgrid')
pd.options.display.max_rows = 100


In [2]:
pwd

'/Users/christinejiang/Documents/Python/Sentiment_Analysis'

### 1. Import data from Google Sheets
- Connect to the Google Sheets API
- Create spreadsheet and worksheet objects, explore the gspread library
- Create a dataframe of reviews

In [3]:
# 1. Define the scopes the access token will be granted
scope = ['https://www.googleapis.com/auth/drive','https://spreadsheets.google.com/feeds']

# 2. Build service-account credentials from the OAuth2 JSON key file.
#    The scope controls which resources and operations the access token permits.
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)

# 3. Log in to the Google API with the OAuth2 credentials;
#    returns a gspread.Client instance
c = gspread.authorize(creds)

In [12]:
spreadsheet = c.open('UK Sentiment')

worksheet = spreadsheet.worksheet('WHotel_Sentiment')

records = worksheet.get_all_records()
df = pd.DataFrame(records)
df = df[['Contents','Sentiment','Topic','Location','Comment']]

In [16]:
df.head()

| | Contents | Sentiment | Topic | Location | Comment |
|---|---|---|---|---|---|
| 0 | What I thought was the weirdest design choice ... | 1 | Design | Washington DC | |
| 1 | New day, new sunset 🌅 #wkohsamui #beachlife #h... | 1 | Location&View | Koh Samui | |
| 2 | #amsterdam #wamsterdam #finertravel #travelpho... | 1 | Location&View | Amsterdam | |
| 3 | Best breakfast ever whotels at #whoteldubai 🤩 ... | 1 | Restaurant | Dubai | Breakfast |
| 4 | #그립다😢 #bali #wbali #seminyak | 1 | Guest Experience | Bali | |


### 2. Data preprocessing

In [17]:
df['Sentiment_Category'] = df['Sentiment'].map({1: 'Positive',2:'Neutral',3:'Negative'})

In [18]:
df.head()

| | Contents | Sentiment | Topic | Location | Comment | Sentiment_Category |
|---|---|---|---|---|---|---|
| 0 | What I thought was the weirdest design choice ... | 1 | Design | Washington DC | | Positive |
| 1 | New day, new sunset 🌅 #wkohsamui #beachlife #h... | 1 | Location&View | Koh Samui | | Positive |
| 2 | #amsterdam #wamsterdam #finertravel #travelpho... | 1 | Location&View | Amsterdam | | Positive |
| 3 | Best breakfast ever whotels at #whoteldubai 🤩 ... | 1 | Restaurant | Dubai | Breakfast | Positive |
| 4 | #그립다😢 #bali #wbali #seminyak | 1 | Guest Experience | Bali | | Positive |


In [24]:
for i, x in df.iterrows():
    print(x)

Contents              What I thought was the weirdest design choice ...
Sentiment                                                             1
Topic                                                            Design
Location                                                  Washington DC
Comment                                                                
Sentiment_Category                                             Positive
Name: 0, dtype: object
Contents              New day, new sunset 🌅 #wkohsamui #beachlife #h...
Sentiment                                                             1
Topic                                                     Location&View
Location                                                      Koh Samui
Comment                                                                
Sentiment_Category                                             Positive
Name: 1, dtype: object
Contents              #amsterdam #wamsterdam #finertravel #travelpho...
Sentiment         

In [36]:
def pos_neg(polarity):
    """Bucket a TextBlob polarity score into a sentiment label."""
    if polarity >= 0.1:
        return 'Positive'
    elif polarity >= 0:
        # 0 <= polarity < 0.1
        return 'Neutral'
    else:
        return 'Negative'

# Score each review once and reuse the result for both columns
sentiments = [TextBlob(text).sentiment for text in df['Contents']]
df['Polarity'] = [s.polarity for s in sentiments]
df['Subjectivity'] = [s.subjectivity for s in sentiments]
df['Textblob_Score'] = df['Polarity'].apply(pos_neg)
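
The same thresholds can also be applied to a whole column at once. The sketch below is one possible vectorized variant, not part of the original notebook: `label_polarity` and its `threshold` parameter are hypothetical names, and the input values are illustrative.

```python
import numpy as np
import pandas as pd

def label_polarity(polarity, threshold=0.1):
    """Vectorized equivalent of the pos_neg thresholds (hypothetical helper).

    polarity >= threshold      -> 'Positive'
    0 <= polarity < threshold  -> 'Neutral'
    anything else              -> 'Negative'
    """
    polarity = pd.Series(polarity, dtype=float)
    # np.select picks the first condition that holds for each element
    labels = np.select(
        [polarity >= threshold, polarity >= 0],
        ['Positive', 'Neutral'],
        default='Negative',
    )
    return pd.Series(labels, index=polarity.index)

# Illustrative scores only
print(label_polarity([0.766667, 0.0, -0.4, 0.05]).tolist())
```

With the notebook's dataframe this would be called as `label_polarity(df['Polarity'])`.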

In [37]:
df.head()

| | Contents | Sentiment | Topic | Location | Comment | Sentiment_Category | Polarity | Subjectivity | Textblob_Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What I thought was the weirdest design choice ... | 1 | Design | Washington DC | | Positive | 0.229401 | 0.546681 | Positive |
| 1 | New day, new sunset 🌅 #wkohsamui #beachlife #h... | 1 | Location&View | Koh Samui | | Positive | 0.344156 | 0.515584 | Positive |
| 2 | #amsterdam #wamsterdam #finertravel #travelpho... | 1 | Location&View | Amsterdam | | Positive | 0.0 | 0.0 | Neutral |
| 3 | Best breakfast ever whotels at #whoteldubai 🤩 ... | 1 | Restaurant | Dubai | Breakfast | Positive | 0.766667 | 0.416667 | Positive |
| 4 | #그립다😢 #bali #wbali #seminyak | 1 | Guest Experience | Bali | | Positive | 0.0 | 0.0 | Neutral |
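
Step 5 of the outline (average sentiment and subjectivity per item, reviews per item) could be sketched as a grouped summary. This is an illustration only: `summarize_by` and the output column names are hypothetical, and the toy frame reuses just the five rows shown above rather than the full dataset.

```python
import pandas as pd

# Toy frame built from the five rows displayed above (not the full dataset)
sample = pd.DataFrame({
    'Topic': ['Design', 'Location&View', 'Location&View',
              'Restaurant', 'Guest Experience'],
    'Polarity': [0.229401, 0.344156, 0.0, 0.766667, 0.0],
    'Subjectivity': [0.546681, 0.515584, 0.0, 0.416667, 0.0],
})

def summarize_by(frame, column):
    """Average polarity/subjectivity and review count per group (hypothetical helper)."""
    return (
        frame.groupby(column)
             .agg(avg_polarity=('Polarity', 'mean'),
                  avg_subjectivity=('Subjectivity', 'mean'),
                  reviews=('Polarity', 'size'))
             .sort_values('reviews', ascending=False)
    )

summary = summarize_by(sample, 'Topic')
print(summary)
```

The same helper could be pointed at `'Location'` instead of `'Topic'` to summarize per property.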


In [41]:
# Named aggregation replaces the deprecated dict-of-functions form
df.groupby(['Sentiment_Category','Textblob_Score'])['Polarity'].agg(mean='mean', count='count')


| Sentiment_Category | Textblob_Score | mean | count |
|---|---|---|---|
| Negative | Negative | -0.229419 | 11.0 |
| Negative | Neutral | 0.003046 | 72.0 |
| Negative | Positive | 0.379133 | 55.0 |
| Neutral | Neutral | 0.028125 | 1.0 |
| Positive | Negative | -0.4 | 1.0 |
| Positive | Neutral | 0.001145 | 14.0 |
| Positive | Positive | 0.425419 | 20.0 |
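
To make the comparison against the manual evaluation explicit, the grouped counts above can be rearranged into a confusion matrix and an overall agreement rate. The sketch below only reuses the counts from that table; the names `counts`, `confusion`, and `agreement` are introduced here for illustration.

```python
import pandas as pd

# (manual label, TextBlob label, count) copied from the grouped table above
counts = [
    ('Negative', 'Negative', 11), ('Negative', 'Neutral', 72), ('Negative', 'Positive', 55),
    ('Neutral', 'Neutral', 1),
    ('Positive', 'Negative', 1), ('Positive', 'Neutral', 14), ('Positive', 'Positive', 20),
]
table = pd.DataFrame(counts, columns=['manual', 'textblob', 'n'])

# Rows: manual label, columns: TextBlob label; missing pairs become 0
confusion = (table.pivot(index='manual', columns='textblob', values='n')
                  .fillna(0)
                  .astype(int))

# Share of reviews where both labels match
matches = table.loc[table['manual'] == table['textblob'], 'n'].sum()
agreement = matches / table['n'].sum()

print(confusion)
print(f'agreement: {agreement:.3f}')  # agreement: 0.184
```

On the full dataframe, `pd.crosstab(df['Sentiment_Category'], df['Textblob_Score'])` should produce the same matrix directly.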


In [42]:
df.shape

(174, 9)

In [46]:
df['Contents']

0      What I thought was the weirdest design choice ...
1      New day, new sunset 🌅 #wkohsamui #beachlife #h...
2      #amsterdam #wamsterdam #finertravel #travelpho...
3      Best breakfast ever whotels at #whoteldubai 🤩 ...
4                           #그립다😢 #bali #wbali #seminyak
                             ...                        
169    ...but really you can! 👙 Thank you to dukespir...
170    #Goa #Wgoa #VagatorBeach #nature #photography ...
171    Rule #01- Be healthy . . . #whotel #singapore ...
172    Movida night #eventdinner #wbarcelonahotel #ba...
173        Bliss Spa, W Hotel DC https://t.co/9jr01gmK1X
Name: Contents, Length: 174, dtype: object

In [68]:
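# Note: str() on the values array produces the array's printed form, so brackets
# and quote characters end up in the text that TextBlob parses
contents = str(df['Contents'].values)
contents_blob = TextBlob(contents)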
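
One thing to keep in mind about the cell above: calling `str()` on a NumPy array returns its printed representation, including the surrounding brackets and the quotes around each element. A small sketch of the difference against joining the strings, using two illustrative texts rather than the real dataset:

```python
import numpy as np

# Two illustrative review texts (object array, like df['Contents'].values)
texts = np.array(['New day, new sunset', '#bali #wbali'], dtype=object)

as_repr = str(texts)       # printed form of the array: brackets and quotes included
as_text = ' '.join(texts)  # plain concatenation of the review texts

print(as_repr)
print(as_text)
```

Joining with `' '.join(df['Contents'])` is one possible way to build the corpus without those extra characters, assuming every entry is a string.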

In [76]:
# np_counts is a collections.defaultdict mapping noun phrase -> count;
# it has no get_keys attribute, so use the standard dict method keys()
noun_phrases = list(contents_blob.np_counts.keys())
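
Because `np_counts` is a `collections.defaultdict` of noun phrase frequencies, the usual dictionary tools apply to it. The sketch below uses a small stand-in mapping with made-up counts (it does not call TextBlob) to show listing the keys and ranking phrases by frequency.

```python
from collections import Counter, defaultdict

# Stand-in for contents_blob.np_counts; the counts here are made up
np_counts = defaultdict(int, {'great time': 3, 'hotel lobby': 2, 'summer fun': 1})

# Dictionary keys are the noun phrases
phrases = list(np_counts.keys())

# Most frequent phrases first
top = Counter(np_counts).most_common(2)
ranked = sorted(np_counts, key=np_counts.get, reverse=True)

print(phrases)
print(top)  # [('great time', 3), ('hotel lobby', 2)]
```

Note that `sorted(..., key=np_counts.get)` without `reverse=True` lists the least frequent phrases first.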

In [78]:
sorted(contents_blob.np_counts, key = contents_blob.np_counts.get)

["[ 'what",
 'weirdest design choice',
 'favorite detail',
 'who',
 'scroll',
 'major lobbyists',
 'hotel lobby',
 'vices ...',
 'coincidence',
 'either',
 'great design detail',
 "'new day",
 'new sunset 🌅 # wkohsamui # beachlife #',
 'happy # thailand #',
 '# laugh #',
 "# behappy # relaxation # meme'",
 "# amsterdam # wamsterdam # finertravel # travelphotography # rooftops # travelstylemag # rooftoppool # travelshots # mytravelgram # whotels' 'best breakfast",
 '# whoteldubai 🤩',
 'summer fun',
 '# dogsindubai ❤️',
 'great time',
 "# doggram # puppies # rescuedog # puppylove # dog # dogsofinstagram'",
 "# 그립다😢 # bali # wbali # seminyak' 'thanks room",
 'awesome 🙂 bye',
 "😀 # whotel # shanghai # worktrip # sgfoodie # tgif' '이제 좀 휴가 같아요 # 휴가 # w # 스미냑 # 발리 # woobar # 신나 # 좋아요 # 술 # 야경'",
 'highly',
 'whilst',
 'short review',
 'fyi',
 'elites',
 'wine',
 'somalia',
 'very',
 'arrival dock',
 'general manager',
 'fantastic guy',
 'backroom staff',
 'multiple gifts',
 'brownies',
 'swee