# Predicting sentiment from product reviews

# Fire up GraphLab Create

In [1]:
import graphlab

# Read some product review data

Load the reviews for a set of baby products.

In [2]:
products = graphlab.SFrame('amazon_baby.gl/')


# Let's explore this data together

The data includes the product name, the review text, and the rating given in the review.

In [3]:
products.head()

name,review,rating
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0


# Build the word count vector for each review

In [4]:
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
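For intuition, the dictionary stored in `word_count` can be approximated in plain Python. This is a minimal sketch, not GraphLab's implementation: it assumes lowercasing and whitespace-only tokenization, which is consistent with keys such as `'parents!!'` and `'puppet.'` that appear in the `word_count` output of this notebook. The helper name `count_words_sketch` is hypothetical.

```python
from collections import Counter

def count_words_sketch(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return dict(Counter(text.lower().split()))

counts = count_words_sketch("I LOVE LOVE LOVE this product! It works great again!")
print(counts["love"])            # 3
print(counts.get("great", 0))    # 1
print(counts.get("product", 0))  # 0, because the token is "product!"
```

Because punctuation stays attached to tokens under this assumption, a word followed directly by `!` or `.` is counted under a different key than the bare word.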

In [5]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [6]:
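def word_occurrences(word_counts, word):
    # Return how often `word` appears in a review's word-count dictionary,
    # or 0 if the word is absent.
    return word_counts.get(word, 0)

# Add one count column per selected word.
for word in selected_words:
    # Bind the current word as a default argument so each lambda keeps its own word.
    products[word] = products['word_count'].apply(lambda counts, w=word: word_occurrences(counts, w))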

In [7]:
products.head()

name,review,rating,word_count,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, 'stink': 1, 'because': 1, 'ordered': ...",0,0,0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",0,0,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",0,0,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",0,1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ...",0,1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'all': 1, 'standarad': 1, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,2,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0


In [9]:
sum_word = {}
for word in selected_words:
    sum_word[word] = products[word].sum()
    

In [10]:
print(sum_word)

{'fantastic': 932, 'love': 42065, 'bad': 3724, 'awesome': 2090, 'great': 45206, 'terrible': 748, 'amazing': 1363, 'horrible': 734, 'awful': 383, 'hate': 1220, 'wow': 144}
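To read the totals more easily, the dictionary can be ranked by count. The sketch below copies the printed totals into a literal so it runs without GraphLab; the variable names `ranked`, `most_used`, and `least_used` are illustrative.

```python
# Totals copied from the printed output above.
sum_word = {'fantastic': 932, 'love': 42065, 'bad': 3724, 'awesome': 2090,
            'great': 45206, 'terrible': 748, 'amazing': 1363, 'horrible': 734,
            'awful': 383, 'hate': 1220, 'wow': 144}

# Sort (word, total) pairs from most to least frequent.
ranked = sorted(sum_word.items(), key=lambda pair: pair[1], reverse=True)

most_used = ranked[0]
least_used = ranked[-1]
print(most_used)   # ('great', 45206)
print(least_used)  # ('wow', 144)
```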


In [11]:
graphlab.canvas.set_target('ipynb')

# Build a sentiment classifier

In [11]:
products['rating'].show(view='Categorical')

## Define what's a positive and a negative sentiment

We will ignore all reviews with a rating of 3, since they tend to express neutral sentiment. Reviews with a rating of 4 or higher are treated as positive, and reviews with a rating of 2 or lower are treated as negative.

In [13]:
# Ignore all 3-star reviews
products = products[products['rating'] != 3]

In [14]:
# Positive sentiment = 4-star or 5-star reviews
products['sentiment'] = products['rating'] >=4
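The same labeling rule can be written out on a plain list of ratings. This is only an illustrative sketch; `label_sentiment` and the sample ratings are made up for the example and are not part of the notebook's data.

```python
def label_sentiment(rating):
    """Return 1 for positive (>= 4), 0 for negative (<= 2), None for neutral (3)."""
    if rating == 3:
        return None
    return 1 if rating >= 4 else 0

# Hypothetical ratings, chosen to cover every case.
ratings = [3.0, 5.0, 5.0, 4.0, 1.0, 2.0]

# Drop neutral reviews, then keep the binary label.
labels = [label_sentiment(r) for r in ratings if label_sentiment(r) is not None]
print(labels)  # [1, 1, 1, 0, 0]
```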

In [15]:
products.head()

name,review,rating,word_count,awesome,great,fantastic
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",0,0,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",0,0,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",0,1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ...",0,1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'all': 1, 'standarad': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 1, 'forget': 1, 'just': 1, ""daughter's"": ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0,1,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,2,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,2,0,0,0,0,0,0,1


## Let's train the sentiment classifier

In [17]:
train_data,test_data = products.random_split(.8, seed=0)
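A seeded random split is reproducible: the same seed yields the same partition. The sketch below illustrates the idea in plain Python by assigning each row to the first part with probability 0.8. It is an assumption about how such a split can work, not GraphLab's actual algorithm, and `random_split_sketch` is a hypothetical helper.

```python
import random

def random_split_sketch(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the split
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train, test = random_split_sketch(rows, 0.8, seed=0)
train_again, test_again = random_split_sketch(rows, 0.8, seed=0)

print(len(train) + len(test))  # 1000
print(train == train_again)    # True
```

With this kind of split the first part holds only approximately 80% of the rows, so the exact sizes vary with the seed.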

In [18]:
selected_words_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=selected_words,
                                                     validation_set=test_data)

PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 11
PROGRESS: Number of unpacked features : 11
PROGRESS: Number of coefficients    : 12
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2        | 1.141166     | 0.844299          | 0.842842            |
PROGRESS: | 2         | 3        | 1.242803     | 0.844186          | 0.842842            |
PROGRESS: | 3         | 4        | 1.341236     | 0.844276          | 0.843142            |
PROGRESS: | 4         | 5        |

In [39]:
selected_words_model

Class                         : LogisticClassifier

Schema
------
Number of coefficients        : 12
Number of examples            : 133448
Number of classes             : 2
Number of feature columns     : 11
Number of unpacked features   : 11

Hyperparameters
---------------
L1 penalty                    : 0.0
L2 penalty                    : 0.01

Training Summary
----------------
Solver                        : auto
Solver iterations             : 6
Solver status                 : SUCCESS: Optimal solution found.
Training time (sec)           : 1.687

Settings
--------
Log-likelihood                : 54057.6401

Highest Positive Coefficients
-----------------------------
love                          : 1.3999
(intercept)                   : 1.3673
awesome                       : 1.058
amazing                       : 0.8928
fantastic                     : 0.8913

Lowest Negative Coefficients
----------------------------
terrible                      : -2.0905
horrible                 

# Evaluate the sentiment model

In [19]:
selected_words_model.evaluate(test_data, metric='roc_curve')

{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+-------+------+
 | threshold | fpr | tpr |   p   |  n   |
 +-----------+-----+-----+-------+------+
 |    0.0    | 1.0 | 1.0 | 27976 | 5328 |
 |   1e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   2e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   3e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   4e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   5e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   6e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   7e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   8e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   9e-05   | 1.0 | 1.0 | 27976 | 5328 |
 +-----------+-----+-----+-------+------+
 [100001 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
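The `p` and `n` columns give the number of positive and negative reviews in `test_data`, which allows a quick sanity check: the accuracy of a classifier that always predicts "positive". The sketch below uses only the counts printed above.

```python
# Counts taken from the `p` and `n` columns of the ROC output above.
positives = 27976
negatives = 5328

# Accuracy of always predicting the majority (positive) class.
baseline_accuracy = positives / float(positives + negatives)
print(round(baseline_accuracy, 4))  # 0.84
```

The validation accuracy in the training log (0.843142 at the last fully printed iteration) is only slightly above this majority-class baseline, so accuracy alone says little about how well the selected words separate the two classes.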

In [40]:
selected_words_model.show(view='Evaluation')

# Examining the reviews for 'Baby Trend Diaper Champ'

In [26]:
diaper_view = products[products['name'] == 'Baby Trend Diaper Champ']

In [27]:
len(diaper_view)

298

# Applying the learned model to understand sentiment for the Diaper Champ

In [30]:
diaper_view['predicted_sentiment'] = selected_words_model.predict(diaper_view, output_type='probability')

In [31]:
diaper_view.head()

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'just': 2, 'less': 1, '-': 3, 'smell- ...",0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'just': 1, 'less': 1, 'when': 3, 'over': 1, ...",0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'control': 1, 'am': 1, 'it': 1, 'used': 1, ' ...",0,0,0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'and': 3, 'over.': 1, 'all': 1, 'love': 1, ...",0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'just': 1, 'when': 1, 'both': 1, 'results': 1, ...",0,0,0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,"{'lysol': 1, 'all': 1, 'mom.': 1, 'busy': 1, ...",0,0,0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,"{'all': 1, 'bags.': 1, 'just': 1, ""don't"": 2, ...",0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...",0,0,0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,"{'and': 2, 'all': 1, 'just': 1, 'is': 2, ' ...",0,0,0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,"{'hand.': 1, '(required': 1, 'before': 1, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,0,0,0,0,0,0,0,1,0.796940851291
0,0,0,0,0,0,0,0,0,0.796940851291
0,0,0,0,0,0,0,0,1,0.796940851291
0,1,0,0,0,0,0,0,1,0.940876393428
0,0,1,0,0,0,0,0,1,0.347684052736
0,0,0,1,0,0,0,0,1,0.5942241719
0,0,0,0,0,0,0,0,0,0.796940851291
0,0,0,0,0,0,0,0,1,0.796940851291
0,0,0,0,0,0,0,0,1,0.796940851291
0,2,0,0,0,0,0,0,1,0.984739056527
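The repeated probabilities in this table can be reproduced by hand. A logistic classifier passes a weighted sum of the features through the sigmoid function, so a review containing none of the selected words gets a score equal to the intercept alone. The sketch below uses the rounded intercept and `love` coefficient from the model summary and assumes all other selected-word counts are zero, so the results match the table only to about three decimal places.

```python
import math

# Rounded values from the model summary above.
intercept = 1.3673
love_coef = 1.3999

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probability of positive sentiment when 'love' appears 0, 1, or 2 times
# and no other selected word appears.
probabilities = [sigmoid(intercept + love_coef * n) for n in (0, 1, 2)]
for n, p in enumerate(probabilities):
    print("%d %.3f" % (n, p))
# 0 0.797
# 1 0.941
# 2 0.985
```

This also explains why so many rows share the value 0.796940851291: every review without a selected word receives the same intercept-only prediction.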


## Sort the reviews based on the predicted sentiment and explore

In [33]:
diaper_view = diaper_view.sort('predicted_sentiment', ascending=False)

In [34]:
diaper_view.head()

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,I LOVE LOVE LOVE this product! It is SO much ...,4.0,"{'rating': 1, 'contacted': 1, 'over': ...",0,1,0
Baby Trend Diaper Champ,I received my Diaper Champ at my baby shower ...,5.0,"{'bags.': 1, ""don't"": 1, 'son.': 1, 'of,': 1, ...",0,0,0
Baby Trend Diaper Champ,"Love it, love it, love it! This lives up to ...",5.0,"{'all': 1, 'already': 1, 'love': 3, 'have': 4, ...",0,0,0
Baby Trend Diaper Champ,Works great - no smells. LOVE that it uses reg ...,5.0,"{'and': 2, 'love': 1, 'garbage': 1, 'wastef ...",0,2,0
Baby Trend Diaper Champ,I love this diaper pale and wouldn't dream of ...,5.0,"{'and': 3, 'love': 1, 'use.': 1, 'is': 2, ' ...",0,2,0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,"{'hand.': 1, '(required': 1, 'before': 1, ...",0,0,0
Baby Trend Diaper Champ,I've worked with kids more than half my life. ...,5.0,"{'and': 4, 'genies': 1, 'now': 1, 'because': 1, ...",0,0,0
Baby Trend Diaper Champ,"This is absolutely, by far, the best diaper ...",5.0,"{'just': 3, 'money': 1, 'not': 2, 'mechanism' ...",0,0,0
Baby Trend Diaper Champ,I have a two-year-old son and I love the Diaper ...,5.0,"{'and': 6, 'two-year- old': 1, ""toddler's"": 1, ...",0,0,0
Baby Trend Diaper Champ,I love this diaper pail! It's so easy to use a ...,5.0,"{'and': 3, 'this': 1, 'love': 2, 'being': 1, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,3,0,0,0,0,0,0,1,0.998423414594
0,3,0,0,0,0,0,0,1,0.996192539732
0,3,0,0,0,0,0,0,1,0.996192539732
0,1,0,0,0,0,0,0,1,0.989387539605
0,1,0,0,0,0,0,0,1,0.989387539605
0,2,0,0,0,0,0,0,1,0.984739056527
0,2,0,0,0,0,0,0,1,0.984739056527
0,2,0,0,0,0,0,0,1,0.984739056527
0,2,0,0,0,0,0,0,1,0.984739056527
0,2,0,0,0,0,0,0,1,0.984739056527


## Most positive reviews for the Diaper Champ

In [35]:
diaper_view[0]['review']

'I LOVE LOVE LOVE this product! It is SO much easier to use than the Diaper Genie, (you need a PHD in poopy to figure out how to use the darn thing!) and it even takes the same bags as my kitchen trash can, shich is super convenient, and cost efficient as I can buy them in bulk.The only reason for not rating it a 5 star was that I did have one small problem with it. The foam gasket in the barrell which keeps the poopy smell inside the unit ripped somehow, and it got VERY stinky. HOWEVER, I contacted the manufacturer though their website, and received an email back the same day stating that this was unusual, and that replacement gaskets were on their way to me. They arrived inside of a week and after replacing, it works great again! (They even sent me extras should it happen again)I HIGHLY reccomend this diaper pail over ANY competitors, you will not be sorry!'

In [36]:
diaper_view[1]['review']

"I received my Diaper Champ at my baby shower for the birth of my first son 11 months ago. I use it faithfully every day and love the ease and convenience of only having to change the bag once a week! I love that you can use regular kitchen-size trash bags and don't need to purchase any special expensive bags. One thing you might want to be careful of, however...make sure you do not throw loose baby wipes into the Diaper Champ or else the flip mechanism can become jammed and after time will not seal properly due to having to pull out wipes that are stuck. I love my diaper champ so much, I have asked for a second one for my upcoming baby shower for my second son."

## Most negative reviews for the Diaper Champ

In [37]:
diaper_view[-1]['review']

"The Diaper Champ is TERRIBLE at keeping the smelly diapers from only smelling in the container.  Our baby's room was constantly stinky (due to the Diaper Champ, not the baby!), and we were having to empty the container almost daily.  What's the point of having a diaper disposal system if you can't dispose of diapers efficiently?  Please don't buy this product unless you enjoy smelling those dirty diapers.  The Diaper Champ just doesn't work."

In [38]:
diaper_view[-2]['review']

"Two girlfriends and two family members put me onto this diaper pail.  They each had tried the Diaper Genie and had horrible results with eventual smells, and costliness of buying proprietary DG cartridges.  My family members eventually started bringing every dirty diaper out to the trash and leaving just wet diapers in the DG, that is until they found out about the Diaper Champ!Wow, what a difference, it seals in orders very well and using normal 8 - 13 gallon trash bags makes it economical.  The ease of use factor is amazing, drop a dirty diaper in the hole, grab the handle, give it a flip, the plunger pushes the diaper down and it drops into the can with a little gravitic help.No wrenching, turning, fighting a cartridge bag system.Opening it for the first time was a little hard, but look at that from a child's point of view, a toddler is not going to get into it, and neither is a dog.  Also, it needs to be away from the wall a little bit, so that it's flip top access lid locks into 