Resources:
- https://docs.python.org/3/tutorial/datastructures.html#dictionaries
- https://apple.github.io/turicreate/docs/api/generated/turicreate.SArray.apply.html?highlight=apply#turicreate.SArray.apply
- https://scikit-learn.org/stable/

# Analyze Product Sentiment

In [1]:
import turicreate
import pandas as pd

# Read product review data

In [2]:
# products = turicreate.SFrame('../input/amazon_baby.sframe')

# Explore data

In [3]:
# products.head(3)

# Task 1
Use .apply() to build a new feature with the counts for each of the selected_words. In the notebook above, we created a column ‘word_count’ with the word counts for each review. The first task is to add one new column per selected word to the products SFrame (a pandas DataFrame in this notebook), holding that word's count in each review. Along the way we see how .apply() creates new columns (our features) from a Python function, which is an extremely useful concept to grasp.
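
A minimal sketch of the idea, on a made-up two-review frame rather than the real data: `.apply()` runs a Python function on every element of a column and returns a new column. The `toy` frame and the `count_words` helper are hypothetical names used only for illustration.

```python
from collections import Counter

import pandas as pd

# Toy data standing in for the real reviews (an assumption, not the Amazon dataset).
toy = pd.DataFrame({'review': ['great toy great price', 'I love it']})

def count_words(text):
    """Return a dict mapping each whitespace-separated token to its count."""
    return dict(Counter(text.split()))

# .apply() calls count_words once per review and collects the results.
toy['word_count'] = toy['review'].apply(count_words)

# A dictionary lookup with a default of 0 gives the count for one selected word.
toy['great'] = toy['word_count'].apply(lambda counts: counts.get('great', 0))
toy['love'] = toy['word_count'].apply(lambda counts: counts.get('love', 0))
print(toy[['great', 'love']])
```

The notebook itself skips the intermediate dictionary and counts each word directly from the review text, which gives the same numbers for whitespace-separated tokens.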

In [4]:
# this is much easier with turicreate
# products.groupby('name',operations={'count':turicreate.aggregate.COUNT()}).sort('count',ascending=False)

In [5]:
# build the column word_count
# products['word_count'] = turicreate.text_analytics.count_words(products['review'])

In [6]:
products = pd.read_csv('../input/amazon_baby.csv')
products.head()

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5


In [7]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [8]:
# Fill missing reviews (NaN) with empty strings. Important step: text.split() fails on non-string values
products = products.fillna({'review':''})

In [9]:
for word in selected_words:
    products[word] = products['review'].apply(lambda text: text.split().count(word))
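
One caveat: `text.split()` splits on whitespace only and is case-sensitive, so `'Love'` and `'love,'` are not counted as `'love'`. Whether that matters depends on the numbers the assignment expects; the totals in this notebook come from the plain `split()` version. A hedged sketch of a case-insensitive, punctuation-tolerant variant, where `count_word` is a hypothetical helper and not part of the notebook:

```python
import re

def count_word(text, word):
    """Count occurrences of `word` in `text`, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens.count(word)

sample = "Love it! I love this and my son will love, LOVE it."
print(sample.split().count('love'))   # plain split: only the bare lowercase token matches
print(count_word(sample, 'love'))     # normalized tokens
```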

In [10]:
products.head()

Unnamed: 0,name,review,rating,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,0,0,0,0,0,0,0,0,0,0,0
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,0,0,0,0,1,0,0,0,0,0,0
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,0,0,0,0,0,0,0,0,0,0,0
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,0,0,0,0,2,0,0,0,0,0,0
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,0,1,0,0,0,0,0,0,0,0,0


- question 1 and 2

In [11]:
# Total number of occurrences of each selected word across all reviews
for w in selected_words:
    print(w,products[w].sum())

('awesome', 1683)
('great', 37056)
('fantastic', 807)
('amazing', 1164)
('love', 33667)
('horrible', 637)
('bad', 3599)
('terrible', 659)
('awful', 337)
('wow', 54)
('hate', 1089)
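
The most and least used words can be read off the totals above. As a small sketch, the printed counts are copied into a plain dictionary (`totals` is a name introduced here) and ranked with `max`, `min` and `sorted`:

```python
# Totals copied from the printed output above.
totals = {
    'awesome': 1683, 'great': 37056, 'fantastic': 807, 'amazing': 1164,
    'love': 33667, 'horrible': 637, 'bad': 3599, 'terrible': 659,
    'awful': 337, 'wow': 54, 'hate': 1089,
}

most_used = max(totals, key=totals.get)
least_used = min(totals, key=totals.get)
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(most_used, least_used)
print(ranked[:3])
```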


# Examining reviews of the giraffe toy

In [12]:
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']
giraffe_reviews.head()

Unnamed: 0,name,review,rating,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
34313,Vulli Sophie the Giraffe Teether,He likes chewing on all the parts especially t...,5,0,0,0,0,0,0,0,0,0,0,0
34314,Vulli Sophie the Giraffe Teether,My son loves this toy and fits great in the di...,5,0,1,0,0,0,0,0,0,0,0,0
34315,Vulli Sophie the Giraffe Teether,There really should be a large warning on the ...,1,0,0,0,0,0,0,0,0,0,0,0
34316,Vulli Sophie the Giraffe Teether,All the moms in my moms\' group got Sophie for...,5,0,0,0,0,1,0,0,0,0,0,0
34317,Vulli Sophie the Giraffe Teether,I was a little skeptical on whether Sophie was...,5,0,0,0,0,0,0,0,0,0,0,0


# Define what is positive and negative sentiment

In [13]:
# Ignore all 3-star reviews, since they tend to have a neutral sentiment.
# Reviews with a rating of 4 or higher are considered positive, and those with
# a rating of 2 or lower negative.

products = products[products['rating'] != 3].copy()  # .copy() avoids SettingWithCopyWarning when columns are added later

In [14]:
products.rating.describe()

count    166752.000000
mean          4.233191
std           1.295527
min           1.000000
25%           4.000000
50%           5.000000
75%           5.000000
max           5.000000
Name: rating, dtype: float64

In [17]:
# Positive sentiment = 4- or 5-star reviews; this turns the task into binary classification
products['sentiment'] = products['rating'].apply(lambda rating : +1 if rating > 3 else -1)

In [20]:
# Check the class balance
products.sentiment.value_counts()

 1    140259
-1     26493
Name: sentiment, dtype: int64

In [None]:
# highly skewed: most of the remaining reviews are positive
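
With classes this imbalanced, accuracy needs a reference point: a classifier that always predicts +1 is already right for every positive review. A quick calculation from the counts printed above (variable names are illustrative, and the figure is for the full filtered dataset, not the test split):

```python
positive = 140259   # reviews with sentiment +1
negative = 26493    # reviews with sentiment -1

total = positive + negative
majority_baseline = positive / total   # accuracy of always predicting +1

print(total)
print(round(majority_baseline, 3))
```

Any trained model should be compared against this baseline rather than against 50%.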

In [None]:
# products['sentiment'] = products['rating'].apply(augmentation)  # commented out: `augmentation` is not defined anywhere in this notebook

In [None]:
products

In [None]:
# Vectorized equivalent of the lambda above: +1 for ratings above 3, -1 otherwise
products['sentiment'] = (products['rating'] > 3).map({True: 1, False: -1})

In [None]:
products.head()

# Train our sentiment classifier

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(products, test_size=0.2, random_state=42)  # train_test_split shuffles the rows before splitting

In [None]:
train_data.shape

In [None]:
test_data.shape

In [None]:
#Step 1. Import the model I want to use
from sklearn.linear_model import LogisticRegression

#Step 2. Make an instance of the Model
logisticRegr = LogisticRegression()
#logisticRegr2 = linear_model.LogisticRegression()

#Step 3. Training the model on the data, storing the information learned from the data
selected_words_model = logisticRegr.fit(train_data[selected_words], train_data['sentiment'])
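
Once fitted, the model can be scored and its learned weights inspected. The sketch below uses a tiny made-up dataset with two word-count columns so that it runs on its own; in the notebook the same calls would take `test_data[selected_words]` and `test_data['sentiment']`. All data values here are invented.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented word counts: 'love' appears in positive reviews, 'hate' in negative ones.
toy = pd.DataFrame({
    'love':      [2, 1, 3, 0, 0, 0, 1, 0],
    'hate':      [0, 0, 0, 1, 2, 1, 0, 3],
    'sentiment': [1, 1, 1, -1, -1, -1, 1, -1],
})
features = ['love', 'hate']

model = LogisticRegression()
model.fit(toy[features], toy['sentiment'])

# Mean accuracy on the given rows (here the training rows, for brevity).
accuracy = model.score(toy[features], toy['sentiment'])

# One coefficient per feature; the sign shows which class the word pushes toward.
weights = pd.Series(model.coef_[0], index=features)
print(accuracy)
print(weights.sort_values(ascending=False))
```

Sorting the coefficients is a quick way to see which selected words carry the most positive and most negative weight.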

In [None]:
# TURICREATE

In [None]:
#train_data,test_data = products.random_split(.8,seed=0)

In [None]:
#sentiment_model = turicreate.logistic_classifier.create(train_data,target='sentiment', features=['word_count'], validation_set=test_data)

# Apply the sentiment classifier to better understand the Giraffe reviews

In [None]:
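# Probability of the positive class from the scikit-learn model trained above (column 1 matches classes_[1] == 1)
products['predicted_sentiment'] = selected_words_model.predict_proba(products[selected_words])[:, 1]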
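
For a scikit-learn classifier, class probabilities come from `predict_proba`, which returns one column per class in the order given by `classes_`. A minimal sketch on invented data (the `toy` frame and its column names are assumptions) that looks up the +1 column by label instead of hard-coding its position:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented word counts and labels, only to make the example self-contained.
toy = pd.DataFrame({
    'love':      [2, 1, 3, 0, 0, 0],
    'hate':      [0, 0, 0, 1, 2, 1],
    'sentiment': [1, 1, 1, -1, -1, -1],
})
features = ['love', 'hate']
model = LogisticRegression().fit(toy[features], toy['sentiment'])

# Find which column of predict_proba corresponds to the +1 class.
positive_col = int(np.where(model.classes_ == 1)[0][0])
proba = model.predict_proba(toy[features])

toy['predicted_sentiment'] = proba[:, positive_col]
print(toy[['sentiment', 'predicted_sentiment']])
```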

In [None]:
products

In [None]:
giraffe_reviews = products[products['name']== 'Vulli Sophie the Giraffe Teether']

In [None]:
giraffe_reviews

# Sort the Giraffe reviews according to predicted sentiment

In [None]:
giraffe_reviews = giraffe_reviews.sort_values('predicted_sentiment', ascending=False)
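
In pandas the sorting method is `sort_values`, and rows are picked by position with `.iloc` (a plain `df[0]` looks for a column named `0`). A small sketch with invented reviews and scores:

```python
import pandas as pd

# Invented reviews and probabilities, for illustration only.
toy = pd.DataFrame({
    'review': ['meh', 'best toy ever', 'broke in a day'],
    'predicted_sentiment': [0.55, 0.99, 0.03],
})

ranked = toy.sort_values('predicted_sentiment', ascending=False)

most_positive = ranked.iloc[0]['review']    # first row after sorting
most_negative = ranked.iloc[-1]['review']   # last row after sorting
print(most_positive)
print(most_negative)
```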

In [None]:
giraffe_reviews

In [None]:
giraffe_reviews.tail()

## Show the most positive reviews

In [None]:
giraffe_reviews.iloc[0]['review']

In [None]:
giraffe_reviews.iloc[1]['review']

## Show the most negative reviews

In [None]:
giraffe_reviews.iloc[-1]['review']

In [None]:
giraffe_reviews.iloc[-2]['review']