# Analyzing product sentiment
In this module, we focused on classifiers, applying them to analyzing product sentiment, and understanding the types of errors a classifier makes. We also built an exciting Jupyter notebook for analyzing the sentiment of real product reviews.

In this assignment, we explore this application further: we train a sentiment analysis model using a set of key polarizing words, verify the weights learned for each of these words, and compare the results of this simpler classifier with those of the one using all of the words. These techniques will be a core component in your capstone project.

Follow the rest of the instructions on this page to complete your program. When you are done, instead of uploading your code, you will answer a series of quiz questions (see the quiz after this reading) to document your completion of this assignment. The instructions will indicate what data to collect for answering the quiz.

Learning outcomes: 
- Execute sentiment analysis code with the Jupyter notebook
- Load and transform real text data
- Use the .apply() function to create new columns (features) for our model
- Compare results of two models, one using all words and the other using a subset of the words
- Compare learned models with majority class prediction
- Examine the predictions of a sentiment model
- Build a sentiment analysis model using a classifier

In [3]:
import turicreate

In [4]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

Often, ML practitioners will throw out words they consider “unimportant” before training their model. This procedure can often be helpful in terms of accuracy. Here, we are going to throw out all words except for the very few above. Using so few words in our model will hurt our accuracy, but help us interpret what our classifier is doing.

1. Use .apply() to build a new feature with the counts for each of the selected_words: In the notebook above, we created a column ‘word_count’ with the word counts for each review. Our first task is to add one new column to the product SFrame for each selected word above, holding the number of times that word appears in each review. In the process, we will see how the .apply() method creates new columns in our data (our features) by running a Python function on every row, which is an extremely useful concept to grasp!
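
To make the idea concrete without the dataset, here is a minimal sketch in plain Python. A list comprehension stands in for .apply(), and the names `reviews` and `count_word` are hypothetical, invented for this illustration; they are not part of the notebook.

```python
# Toy stand-in for a review column; these strings are made up for illustration.
reviews = ["love it love it", "great value", "not great not bad"]

def count_word(text, word):
    """Return how many whitespace-separated tokens in text equal word."""
    return text.split().count(word)

# .apply(f) calls f on every element of a column and returns a new column.
# The list comprehensions below play that role for an ordinary Python list.
love_counts = [count_word(text, "love") for text in reviews]
great_counts = [count_word(text, "great") for text in reviews]

print(love_counts)   # [2, 0, 0]
print(great_counts)  # [0, 1, 1]
```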

In [5]:
product = turicreate.SFrame('amazon_baby.sframe')

In [9]:
selected_words = [str(s) for s in selected_words]

import string
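def remove_punctuation(text):
    """Return text with every character in string.punctuation removed."""
    try:  # Python 2.x: str.translate(table, deletechars)
        text = text.translate(None, string.punctuation)
    except TypeError:  # Python 3.x: translate() accepts only a mapping table
        translator = str.maketrans('', '', string.punctuation)
        text = text.translate(translator)
    return text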

# Remove punctuation
product['review_clean'] = product['review'].apply(remove_punctuation)

# Split out the words into individual columns
for word in selected_words:
    product[word] = product['review_clean'].apply(lambda x: x.split().count(word))
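
The counts above are exact, case-sensitive token matches, which explains rows such as "Lovely book ..." having a love count of 0. The following sketch shows that behaviour on a made-up sentence. It is Python 3 only, and `clean` and `sample` are hypothetical names used just for this illustration.

```python
import string

def clean(text):
    """Python 3 only: delete every character listed in string.punctuation."""
    return text.translate(str.maketrans('', '', string.punctuation))

sample = "I love it. Love, love! My son loves it; it's great."
tokens = clean(sample).split()

print(tokens.count('love'))   # 2: 'Love' and 'loves' are different tokens
print(tokens.count('great'))  # 1: the trailing period was stripped first
print(clean("it's"))          # its: the apostrophe is deleted, not replaced by a space
```

Lowercasing the text before splitting would also count "Love", but that is a different feature from the one this notebook builds and would change the totals reported below.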

In [10]:
product

name,review,rating,review_clean,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,These flannel wipes are OK but in my opinion not ...,0,0,0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,it came early and was not disappointed i love ...,0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,Very soft and comfortable and warmer than it ...,0,0,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,This is a product well worth the purchase I ...,0,0,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,All of my kids have cried nonstop when I tried to ...,0,1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,When the Binky Fairy came to our house we didnt ...,0,1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,Lovely book its bound tightly so you may no ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,Perfect for new parents We were able to keep ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,A friend of mine pinned this product on Pinte ...,0,0,1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,This has been an easy way for my nanny to record ...,0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,2,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0


In [18]:
for word in selected_words:
    print('Sum of %s' % word, ': %s' % sum(product[word]))

Sum of awesome : 3234
Sum of great : 49419
Sum of fantastic : 1506
Sum of amazing : 2233
Sum of love : 34584
Sum of horrible : 1057
Sum of bad : 4602
Sum of terrible : 1092
Sum of awful : 626
Sum of wow : 111
Sum of hate : 1118


# Create a new sentiment analysis model using only the selected_words as features 

In [22]:
product[product['rating'] == 3]

name,review,rating,review_clean,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,These flannel wipes are OK but in my opinion not ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I thought keeping a simple handwritten ...,3.0,I thought keeping a simple handwritten ...,0,0,0
Nature's Lullabies Second Year Sticker Calendar ...,"Calendar is exactly as described, but I find ...",3.0,Calendar is exactly as described but I find the ...,0,0,0
"Lamaze Peekaboo, I Love You ...",My son loves peek a boo at this age of 9 months ...,3.0,My son loves peek a boo at this age of 9 months ...,0,0,0
"Lamaze Peekaboo, I Love You ...","The book is cute, and we are huge fans of Lamaze ...",3.0,The book is cute and we are huge fans of Lamaze ...,0,0,0
SoftPlay Baby Animals of the World Soft Cloth ...,not bad but not as interesting to my 1-year ...,3.0,not bad but not as interesting to my 1year ...,0,0,0
Our Baby Girl Memory Book,"I didn't realize this was a religious product, ...",3.0,I didnt realize this was a religious product so I ...,0,0,0
Cloth Diaper Pins Stainless Steel ...,"These are the right color, so I am happy, ...",3.0,These are the right color so I am happy but the ...,0,0,0
Wall-stickers Wall Decor Removable Decal Stick ...,I am pleased with the design I just wish the ...,3.0,I am pleased with the design I just wish the ...,0,0,0
Musical Christmas Nativity Scene Angel ...,I was so excited to get this nativity. I have ...,3.0,I was so excited to get this nativity I have ...,0,1,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0


In [23]:
product = product[product['rating'] != 3]

In [24]:
product[product['rating'] == 3]

name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible

bad,terrible,awful,wow,hate


In [26]:
product['sentiment'] = product['rating'] >= 4
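
As a small illustration of the labelling rule, the sketch below applies the same two steps to a hypothetical list of ratings: drop the neutral 3-star reviews, then mark 4 and 5 stars as positive. The list `ratings` is invented for the example.

```python
# Hypothetical ratings on the 1-5 star scale used by the reviews.
ratings = [5.0, 3.0, 4.0, 2.0, 1.0]

# Drop the neutral 3-star reviews, as the notebook does before labelling.
kept = [r for r in ratings if r != 3]

# A rating of 4 or more becomes the positive class (1); 1 or 2 stars become 0.
sentiment = [int(r >= 4) for r in kept]

print(kept)       # [5.0, 4.0, 2.0, 1.0]
print(sentiment)  # [1, 1, 0, 0]
```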

In [27]:
product

name,review,rating,review_clean,awesome,great,fantastic
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,it came early and was not disappointed i love ...,0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,Very soft and comfortable and warmer than it ...,0,0,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,This is a product well worth the purchase I ...,0,0,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,All of my kids have cried nonstop when I tried to ...,0,1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,When the Binky Fairy came to our house we didnt ...,0,1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,Lovely book its bound tightly so you may no ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,Perfect for new parents We were able to keep ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,A friend of mine pinned this product on Pinte ...,0,0,1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,This has been an easy way for my nanny to record ...,0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,I love this journal and our nanny uses it ...,0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0,1,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,2,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,2,0,0,0,0,0,0,1


In [28]:
product[product['rating'] <= 2]

name,review,rating,review_clean,awesome,great
Nature's Lullabies Second Year Sticker Calendar ...,I only purchased a second-year calendar for ...,2.0,I only purchased a secondyear calendar for ...,0,0
"SoftPlay Giggle Jiggle Funbook, Happy Bear ...",This bear is absolutely adorable and I would ...,2.0,This bear is absolutely adorable and I would ...,0,0
"SoftPlay Cloth Book, Love",This book is boring. Nothing to stimulate my ...,1.0,This book is boring Nothing to stimulate my ...,0,0
Hunnt&reg; Falling Flowers and Birds Kids ...,The reason:Small sizeHard to apply on the wall ...,1.0,The reasonSmall sizeHard to apply on the wall ...,0,0
Wall Decor Removable Decal Sticker - Colorful ...,Would not purchase again or recommend. The decals ...,2.0,Would not purchase again or recommend The decals ...,0,0
Cloth Diaper Pins Stainless Steel ...,These were good quality --worked fine--heavy ...,2.0,These were good qualityworked fineheavy ...,0,0
Cloth Diaper Pins Stainless Steel ...,"While the diaper pins are attractive, the metal in ...",2.0,While the diaper pins are attractive the metal in ...,0,0
Cloth Diaper Pins Stainless Steel ...,"The steel part is not strong at all, unlike ...",1.0,The steel part is not strong at all unlike the ...,0,0
Cloth Diaper Pins Stainless Steel ...,I really thought I was getting a dozen ...,2.0,I really thought I was getting a dozen pinst ...,0,0
Super Mario Game Nintendo Wall Sticker and Decal ...,These do not stick to the wall. They start to peel ...,1.0,These do not stick to the wall They start to peel ...,0,0

fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0


In [29]:
train_data, split_data = product.random_split(.8, seed = 0)
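
The role of the seed can be sketched in plain Python. This is one common way to write a seeded random split, given here only as an analogy; it is not Turi Create's implementation, and the function below is a hypothetical helper that merely shares the name.

```python
import random

def random_split(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))
train, test = random_split(rows, 0.8, seed=0)
train_again, test_again = random_split(rows, 0.8, seed=0)

print(len(train) + len(test) == len(rows))  # True: every row lands in exactly one part
print(train == train_again)                 # True: same seed, same split
```

In a scheme like this the part sizes are close to 80/20 rather than exactly 80/20, and reusing the same seed is what lets results be compared across runs.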

In [30]:
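# Assumed definition: the cell that set `features` is not shown,
# but the output below matches selected_words exactly.
features = selected_words
features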

['awesome',
 'great',
 'fantastic',
 'amazing',
 'love',
 'horrible',
 'bad',
 'terrible',
 'awful',
 'wow',
 'hate']

In [33]:
selected_words_model = turicreate.logistic_classifier.create(train_data, target='sentiment', features=features)

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [68]:
coef_table = selected_words_model.coefficients

In [69]:
coef_table_sorted = coef_table.sort('value', ascending = False)

In [72]:
coef_table_sorted[1]

{'name': 'love',
 'index': None,
 'class': 1,
 'value': 1.297443787062286,
 'stderr': 0.03120449986536209}

# Comparing the accuracy of different sentiment analysis models

In [48]:
selected_words_model.evaluate(split_data)

{'accuracy': 0.8441628633197213,
 'auc': 0.6598929532102161,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  238  |
 |      0       |        1        |  5090 |
 |      1       |        1        | 27876 |
 |      1       |        0        |  100  |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9148370581864723,
 'log_loss': 0.40696202706070683,
 'precision': 0.8455984954195231,
 'recall': 0.9964255075779239,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+-------+------+
 | threshold | fpr | tpr |   p   |  n   |
 +-----------+-----+-----+-------+------+
 |    0.0    | 1.0 | 1.0 | 27976 | 5328 |
 |   1e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   2e-05   
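
The headline metrics in this output can be recomputed by hand from the four confusion matrix counts, which helps when reading what kind of errors the classifier makes. A minimal sketch using the standard definitions, with the counts copied from the table above:

```python
# Counts copied from the confusion matrix above (test set).
tn = 238     # target 0, predicted 0
fp = 5090    # target 0, predicted 1
tp = 27876   # target 1, predicted 1
fn = 100     # target 1, predicted 0

total = tn + fp + tp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(total)                # 33304
print(round(accuracy, 4))   # 0.8442
print(round(precision, 4))  # 0.8456
print(round(recall, 4))     # 0.9964
print(round(f1, 4))         # 0.9148
```

The recall is very high because the model predicts positive for almost every review: only 238 of the 5328 negative reviews are labelled negative.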

In [57]:
# Baseline: count the positive and negative reviews in the test set
num_positive = (split_data['sentiment'] == 1).sum()
num_negative = (split_data['sentiment'] == 0).sum()

In [58]:
print(num_positive)
print(num_negative)

27976
5328


In [59]:
print(num_positive/len(split_data))

0.8400192169108815
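
This fraction is the accuracy of a majority class classifier, one that predicts positive for every review. The sketch below puts it side by side with the selected-words model, using only numbers copied from the outputs above.

```python
# Class counts and model accuracy copied from the outputs above.
num_positive = 27976
num_negative = 5328
model_accuracy = 0.8441628633197213

# A majority class classifier always predicts the more frequent label.
total = num_positive + num_negative
baseline_accuracy = max(num_positive, num_negative) / total

print(round(baseline_accuracy, 4))                   # 0.84
print(round(model_accuracy - baseline_accuracy, 4))  # 0.0041
```

On this test set the selected-words model is therefore less than half a percentage point more accurate than always predicting positive.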


In [60]:
champ_reviews = product[product['name'] == 'Baby Trend Diaper Champ']

In [61]:
champ_reviews

name,review,rating,review_clean,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,Ok newsflash Diapers are just smelly Weve ...,0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,My husband and I selected the Diaper Champ mainly ...,0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,Excellent diaper disposal unit I used it in ...,0,0,0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,We love our diaper champ It is very easy to use ...,0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,Two girlfriends and two family members put me ...,0,0,0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,I waited to review this until I saw how it ...,0,0,0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,I have had a diaper genie for almost 4 years since ...,0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,I originally put this item on my baby registry ...,0,0,0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,I am so glad I got the Diaper Champ instead of ...,0,0,0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,We had 2 diaper Genies both given to us as a ...,0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,1
0,1,0,0,0,0,0,0,1
1,0,1,0,0,0,0,0,1
0,0,0,1,0,0,0,0,1
0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,1
0,2,0,0,0,0,0,0,1


In [62]:
champ_reviews['predicted_sentiment'] = selected_words_model.predict(champ_reviews, output_type = 'probability')

In [63]:
champ_reviews

name,review,rating,review_clean,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,Ok newsflash Diapers are just smelly Weve ...,0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,My husband and I selected the Diaper Champ mainly ...,0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,Excellent diaper disposal unit I used it in ...,0,0,0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,We love our diaper champ It is very easy to use ...,0,0,0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,Two girlfriends and two family members put me ...,0,0,0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,I waited to review this until I saw how it ...,0,0,0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,I have had a diaper genie for almost 4 years since ...,0,0,0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,I originally put this item on my baby registry ...,0,0,0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,I am so glad I got the Diaper Champ instead of ...,0,0,0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,We had 2 diaper Genies both given to us as a ...,0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,0,0,0,0,0,0,0,1,0.8064762214284156
0,0,0,0,0,0,0,0,0,0.8064762214284156
0,0,0,0,0,0,0,0,1,0.8064762214284156
0,1,0,0,0,0,0,0,1,0.9384695253466392
1,0,1,0,0,0,0,0,1,0.5820546459368815
0,0,0,1,0,0,0,0,1,0.6087179156049227
0,0,0,0,0,0,0,0,0,0.8064762214284156
0,0,0,0,0,0,0,0,1,0.8064762214284156
0,0,0,0,0,0,0,0,1,0.8064762214284156
0,2,0,0,0,0,0,0,1,0.9824010709367376
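
These probabilities can be related back to the learned weights. In a logistic classifier the predicted probability is the logistic function of the intercept plus the weighted word counts. The sketch below is a hand check under one assumption: the intercept is not shown in this notebook, so it is inferred from the prediction for reviews in which every selected-word count is 0. The 'love' coefficient is copied from the coefficient output above.

```python
import math

def sigmoid(score):
    """Logistic function: map a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

# Values copied from the outputs above.
p_no_selected_words = 0.8064762214284156  # prediction when every count is 0
w_love = 1.297443787062286                # learned coefficient for 'love'

# Assumption: with all counts at 0 the score equals the intercept,
# so the intercept is the log-odds of that prediction.
intercept = math.log(p_no_selected_words / (1.0 - p_no_selected_words))

p_one_love = sigmoid(intercept + 1 * w_love)
p_two_love = sigmoid(intercept + 2 * w_love)

print(round(p_one_love, 4))  # 0.9385
print(round(p_two_love, 4))  # 0.9824
```

Both values agree with the rows above whose only nonzero count is love equal to 1 and love equal to 2, which is a quick way to confirm how a single positive weight moves the prediction.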


In [64]:
champ_reviews = champ_reviews.sort('predicted_sentiment', ascending = False)

In [67]:
champ_reviews[champ_reviews['review'] == "I read a review below that can explain exactly what we experienced. We've had it for 16 months and it has worked wonderful for us. No smells, change it out once a week, easy to clean. Then a diaper snagged this foam material in the head part, so I pulled the rest of the foam out. Big mistake!!! Now it can no loner retain the stinkiness and we're looking for a replacement. Be careful of overloading and never take out that foam piece that is cushioned between pieces. I have figured out that it is key to keeping the stink out."]

name,review,rating,review_clean,awesome,great,fantastic
Baby Trend Diaper Champ,I read a review below that can explain exactly ...,4.0,I read a review below that can explain exactly ...,0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,0,0,0,0,0,0,0,1,0.8064762214284156
