## Predicting sentiment from product reviews

### Fire up Turicreate

In [1]:
import turicreate

## Read some product review data

Loading reviews for a set of baby products. 

In [2]:
products = turicreate.SFrame('./amazon_baby.gl/')

## Let's explore this data together

The data includes the product name, the review text, and the rating for each review.

In [3]:
products.head()

name,review,rating
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0


## Build the word count vector for each review

In [4]:
products['word_count'] = turicreate.text_analytics.count_words(products['review'])
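
To make the idea of a word count vector concrete, here is a minimal plain-Python sketch that builds the same kind of word-to-count dictionary for a single review. It is only an illustration: the helper `simple_word_count` is hypothetical, and its tokenization (lowercasing, then taking runs of letters, digits, or apostrophes) is an assumption that may differ from what `turicreate.text_analytics.count_words` does.

```python
import re
from collections import Counter

def simple_word_count(text):
    """Return a dict mapping each lowercase word in `text` to its count.

    Hypothetical helper for illustration; the tokenization rule
    (runs of letters, digits, or apostrophes) is an assumption.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(words))

# Made-up review text, loosely based on the second row shown above.
review = "it came early and was not disappointed. i love it"
counts = simple_word_count(review)
print(counts)
```

Each key is a word and each value is how often it occurs, which is the same shape as the dictionaries stored in the 'word_count' column.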

In [5]:
products.head()

name,review,rating,word_count
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'handles': 1.0, 'stripping': 1.0, ..."
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'recommend': 1.0, 'disappointed': 1.0, ..."
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'quilt': 1.0, 'the': 1.0, 'than': 1.0, 'fu ..."
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'tool': 1.0, 'clever': 1.0, 'binky': 2.0, ..."
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ..."
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'thumb': 1.0, 'or': 1.0, 'break': 1.0, 'trying': ..."
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'for': 1.0, 'barnes': 1.0, 'at': 1.0, 'is': ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'right': 1.0, 'because': 1.0, 'questions': 1.0, ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'like': 1.0, 'and': 1.0, 'changes': 1.0, 'the': ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'in': 1.0, 'pages': 1.0, 'out': 1.0, 'run': 1.0, ..."


In [6]:
products['name'].show()

## Examining the reviews for the most-sold product: 'Vulli Sophie the Giraffe Teether'

In [11]:
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']

In [12]:
len(giraffe_reviews)

785

In [13]:
giraffe_reviews['rating'].show()

## Build a sentiment classifier

In [14]:
products['rating'].show()

## Define what's a positive and a negative sentiment

We will ignore all reviews with rating = 3, since they tend to have a neutral sentiment. Reviews with a rating of 4 or higher are considered positive, while those with a rating of 2 or lower are considered negative.

In [15]:
#ignore all 3* reviews
products = products[products['rating'] != 3]

In [18]:
#positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >= 4
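
The labeling rule used in the two cells above can be sketched on a few made-up ratings in plain Python. The list `ratings` and the helper `label_sentiment` are hypothetical names used only for illustration.

```python
def label_sentiment(rating):
    """Return 1 for a positive review (4 or 5 stars), 0 for a negative one (1 or 2 stars)."""
    return 1 if rating >= 4 else 0

ratings = [5.0, 3.0, 1.0, 4.0, 2.0]        # made-up example ratings
kept = [r for r in ratings if r != 3]       # drop neutral 3-star reviews first
labels = [label_sentiment(r) for r in kept]
print(list(zip(kept, labels)))
```

Dropping the 3-star reviews before labeling matters: otherwise `rating >= 4` would silently label them as negative.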

In [19]:
products.head()

name,review,rating,word_count,sentiment
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'recommend': 1.0, 'disappointed': 1.0, ...",1
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'quilt': 1.0, 'the': 1.0, 'than': 1.0, 'fu ...",1
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'tool': 1.0, 'clever': 1.0, 'binky': 2.0, ...",1
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ...",1
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'thumb': 1.0, 'or': 1.0, 'break': 1.0, 'trying': ...",1
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'for': 1.0, 'barnes': 1.0, 'at': 1.0, 'is': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'right': 1.0, 'because': 1.0, 'questions': 1.0, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'like': 1.0, 'and': 1.0, 'changes': 1.0, 'the': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'in': 1.0, 'pages': 1.0, 'out': 1.0, 'run': 1.0, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'tracker': 1.0, 'now': 1.0, 'its': 1.0, 'sti ...",1


## Let's train the sentiment classifier

In [20]:
train_data, test_data = products.random_split(0.8, seed = 0)

In [21]:
sentiment_model = turicreate.logistic_classifier.create(train_data,
                                                     target = 'sentiment',
                                                     features = ['word_count'],
                                                     validation_set = test_data)

## Evaluate the sentiment model

In [22]:
sentiment_model.evaluate(test_data)

{'accuracy': 0.9176975738650012,
 'auc': 0.9342357833151299,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |  1397 |
 |      1       |        0        |  1344 |
 |      0       |        0        |  3931 |
 |      1       |        1        | 26632 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.951057941255245,
 'log_loss': 0.33047871872321327,
 'precision': 0.9501587641371436,
 'recall': 0.9519588218472976,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+--------------------+--------------------+-------+------+
 | threshold |        fpr         |        tpr         |   p   |  n   |
 +-----------+--------------------+--------------------+-------+------+
 |    0.

### ROC curve

A ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied.
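
As a rough sketch of what one row of a ROC table means, the snippet below computes the false positive rate and true positive rate at a single threshold. The scores and labels are made up, and `roc_point` is a hypothetical helper, not part of Turi Create. It assumes an example is predicted positive when its score is at or above the threshold.

```python
def roc_point(scores, labels, threshold):
    """Return (fpr, tpr) when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

scores = [0.95, 0.80, 0.60, 0.40, 0.10]   # made-up predicted probabilities
labels = [1, 1, 0, 1, 0]                  # made-up true sentiments

for threshold in (0.0, 0.5, 1.0):
    print(threshold, roc_point(scores, labels, threshold))
```

At threshold 0.0 every example is predicted positive, so both rates are 1.0, which is what the first row of the `roc_curve` output in this notebook shows. Raising the threshold trades true positives for fewer false positives.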

In [23]:
sentiment_model.evaluate(test_data, metric = 'roc_curve')

{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+--------------------+--------------------+-------+------+
 | threshold |        fpr         |        tpr         |   p   |  n   |
 +-----------+--------------------+--------------------+-------+------+
 |    0.0    |        1.0         |        1.0         | 27976 | 5328 |
 |   1e-05   | 0.847972972972973  | 0.9975693451529882 | 27976 | 5328 |
 |   2e-05   | 0.829954954954955  | 0.9971761509865599 | 27976 | 5328 |
 |   3e-05   | 0.818506006006006  | 0.9969616814412353 | 27976 | 5328 |
 |   4e-05   | 0.8109984984984985 | 0.9967472118959108 | 27976 | 5328 |
 |   5e-05   | 0.8057432432432432 | 0.9966042321990277 | 27976 | 5328 |
 |   6e-05   | 0.7991741741741741 | 0.9962825278810409 | 27976 | 5328 |
 |   7e-05   | 0.7952327327327328 | 0.9961752931083786 | 27976 | 5328 |
 |   8e-05   | 0.7920420420420421 | 0.9961038032599371 | 27976 | 5328 |
 |   9e-05   | 0.7882882882882

## Applying the learned model to understand sentiment for Giraffe

In [25]:
giraffe_reviews.head(4)

name,review,rating,word_count
Vulli Sophie the Giraffe Teether ...,He likes chewing on all the parts especially the ...,5.0,"{'purchase': 1.0, 'teething': 1.0, ..."
Vulli Sophie the Giraffe Teether ...,My son loves this toy and fits great in the diaper ...,5.0,"{'a': 1.0, 'is': 1.0, 'when': 1.0, 'him': 1.0, ..."
Vulli Sophie the Giraffe Teether ...,There really should be a large warning on the ...,1.0,"{'made': 1.0, 'of': 1.0, 'packaging': 1.0, 'no': ..."
Vulli Sophie the Giraffe Teether ...,All the moms in my moms' group got Sophie for ...,5.0,"{'another': 1.0, 'out': 1.0, 'run': 1.0, 'lost': ..."


In [34]:
giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type = 'probability')

In [35]:
giraffe_reviews

name,review,rating,word_count,predicted_sentiment
Vulli Sophie the Giraffe Teether ...,He likes chewing on all the parts especially the ...,5.0,"{'purchase': 1.0, 'teething': 1.0, ...",0.9993655365682312
Vulli Sophie the Giraffe Teether ...,My son loves this toy and fits great in the diaper ...,5.0,"{'a': 1.0, 'is': 1.0, 'when': 1.0, 'him': 1.0, ...",0.9998633791689632
Vulli Sophie the Giraffe Teether ...,There really should be a large warning on the ...,1.0,"{'made': 1.0, 'of': 1.0, 'packaging': 1.0, 'no': ...",0.2545268197807981
Vulli Sophie the Giraffe Teether ...,All the moms in my moms' group got Sophie for ...,5.0,"{'another': 1.0, 'out': 1.0, 'run': 1.0, 'lost': ...",0.9165688083914976
Vulli Sophie the Giraffe Teether ...,I was a little skeptical on whether Sophie was ...,5.0,"{'disappointed': 1.0, 'will': 1.0, 'take': ...",0.6855768205884997
Vulli Sophie the Giraffe Teether ...,I have been reading about Sophie and was going ...,5.0,"{'late': 1.0, 'perfect': 1.0, 'pack': 1.0, 'on ...",0.99999994452112
Vulli Sophie the Giraffe Teether ...,My neice loves her sophie and has spent hours ...,5.0,"{'delight': 1.0, 'in': 1.0, 'other': 1.0, ...",0.9979351181093516
Vulli Sophie the Giraffe Teether ...,What a friendly face! And those mesmerizing ...,5.0,"{'inside': 1.0, 'water': 1.0, 'don': 1.0, 'up': ...",0.9999745004834384
Vulli Sophie the Giraffe Teether ...,We got this just for my son to chew on instea ...,5.0,"{'its': 1.0, 'fine': 1.0, 'is': 1.0, 'which': 1.0, ...",0.9460144428356756
Vulli Sophie the Giraffe Teether ...,"My baby seems to like this toy, but I could ...",3.0,"{'off': 1.0, 'have': 2.0, 'of': 1.0, 'some': 1.0, ...",0.3830113614211259


## Sort the reviews based on the predicted sentiment and explore

In [36]:
giraffe_reviews[2]

{'name': 'Vulli Sophie the Giraffe Teether',
 'rating': 1.0,
 'word_count': {'made': 1.0,
  'of': 1.0,
  'packaging': 1.0,
  'no': 1.0,
  'mommy': 1.0,
  'and': 2.0,
  'world': 1.0,
  'anaphylactic': 1.0,
  'so': 1.0,
  'really': 1.0,
  'many': 1.0,
  'is': 2.0,
  'should': 1.0,
  'repeated': 1.0,
  'allergy': 1.0,
  'the': 3.0,
  '2011': 1.0,
  'an': 2.0,
  'being': 1.0,
  'be': 1.0,
  'latex': 4.0,
  'in': 2.0,
  'with': 1.0,
  'sheesh': 1.0,
  'allergies': 1.0,
  'this': 1.0,
  'box': 1.0,
  'that': 2.0,
  'caused': 1.0,
  'large': 1.0,
  'by': 1.0,
  'a': 3.0,
  'could': 1.0,
  'exposure': 1.0,
  'why': 1.0,
  'would': 1.0,
  'on': 2.0,
  'think': 1.0,
  'good': 1.0,
  'quite': 1.0,
  'all': 1.0,
  'teether': 2.0,
  'have': 1.0,
  'was': 2.0,
  'easily': 1.0,
  'someone': 1.0,
  'idea': 1.0,
  'for': 1.0,
  'baby': 2.0,
  'there': 2.0,
  'killed': 1.0},
 'predicted_sentiment': 0.2545268197807981}

In [37]:
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending = False)

In [38]:
giraffe_reviews.head()

name,review,rating,word_count,predicted_sentiment
Vulli Sophie the Giraffe Teether ...,I'll be honest...I bought this toy because all the ...,4.0,"{'around': 1.0, 'explore': 1.0, 'they': ...",1.0
Vulli Sophie the Giraffe Teether ...,As a mother of 16month old twins; I bought ...,5.0,"{'will': 1.0, '15months': 1.0, 'would': 2.0, ...",1.0
Vulli Sophie the Giraffe Teether ...,"Sophie, oh Sophie, your time has come. My ...",5.0,"{'11': 1.0, 'prisrob': 1.0, '12': 1.0, 'who': ...",1.0
Vulli Sophie the Giraffe Teether ...,We got this little giraffe as a gift from a ...,5.0,"{'out': 1.0, 've': 1.0, 'would': 1.0, 'enough': ...",0.9999999999998376
Vulli Sophie the Giraffe Teether ...,"As every mom knows, you always want to give your ...",5.0,"{'whether': 1.0, 'neutral': 1.0, 'gend ...",0.9999999999998284
Vulli Sophie the Giraffe Teether ...,My Mom-in-Law bought Sophie for my son whe ...,5.0,"{'penny': 1.0, 'little': 1.0, 'perfect': 1.0, ...",0.9999999999997958
Vulli Sophie the Giraffe Teether ...,"My 4 month old son is teething, and I've tried ...",4.0,"{'worth': 1.0, 'works': 1.0, 'teether': 1.0, ...",0.9999999999994914
Vulli Sophie the Giraffe Teether ...,Let me just start off by addressing the choking ...,5.0,"{'question': 1.0, 'must': 1.0, 'overall': 1.0, ...",0.9999999999941254
Vulli Sophie the Giraffe Teether ...,I'm not sure why Sophie is such a hit with the ...,4.0,"{'makers': 1.0, 'or': 1.0, 'take': 1.0, 'can': ...",0.999999999987423
Vulli Sophie the Giraffe Teether ...,"I admit, I didn't get Sophie the Giraffe at ...",4.0,"{'dye': 1.0, 'of': 1.0, 'cause': 1.0, 'fade': ...",0.9999999999829476


## Most positive reviews for the giraffe

In [39]:
giraffe_reviews[0]['review']

'I\'ll be honest...I bought this toy because all the hip parents seem to have one too and I wanted to be a part of the "hip parent" crowd. The price-tag was somewhat of a deterent but I prevailed and purchased this teether for my daughter.At first, Lily didn\'t know what to make of of Sophie and showed little interest in the polka-dotted creature. I continued to introduce Lily to Sophie and kept the toy in the carrier so that it was on-hand during transitions. Eventually, Lily discovered what a wonderful experience it was to gnaw on the hooves and ears and these two have never been far apart since.Lily really enjoys gumming all the different parts of Sophie like no other teether we have. The size of the toy is great as it is somewhat substantial and so easy for a little one to grasp and hold onto. Lily really enjoys hearing Sophie squeak and will smile whenever Sophie makes a noise or pops her head up from Mommy\'s lap to say hello.People have stopped and commented on Sophie and to the

In [40]:
giraffe_reviews[1]['review']

"As a mother of 16month old twins; I bought Sophie [1 for each, of course] when they were 4 months old after careful reading of all reviews. I heard great things about Sophie and wanted to give her a try. At 4 months babies can't do much more than grasp and semi gnaw on Sophie. For many months I had to squeeze Sophie myself [which I personally enjoyed] and set it on their laps. They LOVED Sophie. The squeak is LOUD and sounds exactly like a dog's squeaky chew toy, just for those who are wondering.As they grew and their motor skills developed to each milestone, Sophie gained more and more individual babytime. The twins were able to squeeze her themselves and chew on her around the clock. They love to throw her, stretch her, squeeze her, chew on her, drool on her... you name it, they have done it. One of the two Sophie's took an extended vacation out in the back yard [unbeknownst to me] and once found, a little water had her looking like a champ again... ready to face another day of play

## Show most negative reviews for giraffe

In [41]:
giraffe_reviews[-1]['review']

"This children's toy is nostalgic and very cute. However, there is a distinct rubber smell and a very odd taste, yes I tried it, that my baby did not enjoy. Also, if it is soiled it is extremely difficult to clean as the rubber is a kind of porus material and does not clean well. The final thing is the squeaking device inside which stopped working after the first couple of days. I returned this item feeling I had overpaid for a toy that was defective and did not meet my expectations. Please do not be swayed by the cute packaging and hype surounding it as I was. One more thing, I was given a full refund from Amazon without any problem."

In [44]:
giraffe_reviews['review'][-2]

'I wanted to love this product and was excited to buy it when I became pregnant but am now hesitant to let my baby use it after reading about the recall in Europe. Apparently, as I understand it, their toxin standards of measurement are lower than ours so they have not been recalled here (apparently we are OK with low levels of nitrates in the toys our children put in their mouths, but Europeans are not...hmmm)...Be that as it may, toxins registering even CLOSE to a dangerous level made me nervous about using. After digging around online I did discover that the company claims to have changed the product after a certain date and lists manufacturing codes so you can check yours (those listed were made after a certain date and are said to be safer). Sadly mine was not made after the &#34;improved&#34; date but I could not return it because there was no formal recall in our country. I considered returning it and hunting for one with an approved manufacturing date but man that was just too 

# Exercise

## 1. Use .apply() to build a new feature with the counts for each of the selected_words:

In the notebook above, we created a column ‘word_count’ with the word counts for each review. Our first task is to create a new column in the products SFrame for each word in selected_words (defined below), holding the number of times that word appears in each review. In the process, we will see how the .apply() method can be used to create new columns in our data (our features) from a Python function, which is an extremely useful concept to grasp!

In [45]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate', 'crazy', 'weird', 'poor']

In [50]:
def awesome_count(word_count):
    if 'awesome' in word_count:
        return word_count['awesome']
    return 0
products['awesome'] = products['word_count'].apply(awesome_count)


def great_count(word_count):
    if 'great' in word_count:
        return word_count['great']
    return 0
products['great'] = products['word_count'].apply(great_count)


def fantastic_count(word_count):
    if 'fantastic' in word_count:
        return word_count['fantastic']
    return 0
products['fantastic'] = products['word_count'].apply(fantastic_count)


def amazing_count(word_count):
    if 'amazing' in word_count:
        return word_count['amazing']
    return 0
products['amazing'] = products['word_count'].apply(amazing_count)


def love_count(word_count):
    if 'love' in word_count:
        return word_count['love']
    return 0
products['love'] = products['word_count'].apply(love_count)


def horrible_count(word_count):
    if 'horrible' in word_count:
        return word_count['horrible']
    return 0
products['horrible'] = products['word_count'].apply(horrible_count)


def bad_count(word_count):
    if 'bad' in word_count:
        return word_count['bad']
    return 0
products['bad'] = products['word_count'].apply(bad_count)


def terrible_count(word_count):
    if 'terrible' in word_count:
        return word_count['terrible']
    return 0
products['terrible'] = products['word_count'].apply(terrible_count)


def awful_count(word_count):
    if 'awful' in word_count:
        return word_count['awful']
    return 0
products['awful'] = products['word_count'].apply(awful_count)


def wow_count(word_count):
    if 'wow' in word_count:
        return word_count['wow']
    return 0
products['wow'] = products['word_count'].apply(wow_count)


def hate_count(word_count):
    if 'hate' in word_count:
        return word_count['hate']
    return 0
products['hate'] = products['word_count'].apply(hate_count)


def crazy_count(word_count):
    if 'crazy' in word_count:
        return word_count['crazy']
    return 0
products['crazy'] = products['word_count'].apply(crazy_count)


def weird_count(word_count):
    if 'weird' in word_count:
        return word_count['weird']
    return 0
products['weird'] = products['word_count'].apply(weird_count)


def poor_count(word_count):
    if 'poor' in word_count:
        return word_count['poor']
    return 0
products['poor'] = products['word_count'].apply(poor_count)
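
The fourteen functions above differ only in the word they look up, so the same features can be built with one parameterized function and a loop. The sketch below shows the pattern on plain Python dictionaries so that it runs without Turi Create; `make_counter` and the sample `word_counts` list are hypothetical. With an SFrame, the analogous loop body would presumably be `products[word] = products['word_count'].apply(make_counter(word))`, which is an assumption not run here.

```python
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow',
                  'hate', 'crazy', 'weird', 'poor']

def make_counter(word):
    """Return a function that looks up `word` in a word-count dict (0 if absent)."""
    def count(word_count):
        return word_count.get(word, 0)
    return count

# Made-up word-count dictionaries standing in for products['word_count'].
word_counts = [
    {'i': 1.0, 'love': 2.0, 'this': 1.0},
    {'great': 1.0, 'but': 1.0, 'weird': 1.0},
]

# One column (here: a plain list) per selected word.
features = {word: [make_counter(word)(wc) for wc in word_counts]
            for word in selected_words}
print(features['love'], features['great'], features['poor'])
```

Building the lookup function inside `make_counter` binds each word at creation time, which avoids the common pitfall of a lambda in a loop capturing only the last value of the loop variable.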

In [51]:
products.head()

name,review,rating,word_count,sentiment,awesome
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'recommend': 1.0, 'disappointed': 1.0, ...",1,0.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'quilt': 1.0, 'the': 1.0, 'than': 1.0, 'fu ...",1,0.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'tool': 1.0, 'clever': 1.0, 'binky': 2.0, ...",1,0.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'rock': 1.0, 'many': 1.0, 'headaches': 1.0, ...",1,0.0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'thumb': 1.0, 'or': 1.0, 'break': 1.0, 'trying': ...",1,0.0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'for': 1.0, 'barnes': 1.0, 'at': 1.0, 'is': ...",1,0.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'right': 1.0, 'because': 1.0, 'questions': 1.0, ...",1,0.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'like': 1.0, 'and': 1.0, 'changes': 1.0, 'the': ...",1,0.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'in': 1.0, 'pages': 1.0, 'out': 1.0, 'run': 1.0, ...",1,0.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'tracker': 1.0, 'now': 1.0, 'its': 1.0, 'sti ...",1,0.0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,crazy,weird,poor
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,2.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,1.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,2.0,0,0,0.0,0,0,0,0,0,0


- Using the .sum() method on each of the new columns you created, answer the following questions: Out of the selected_words, which one is most used in the dataset? Which one is least used? Save these results to answer the quiz at the end.

In [52]:
print('Word count value:')
print('\n')
for word in selected_words:
    print('{0} ---> {1}'.format(word, products[word].sum()))

Word count value:


awesome ---> 3892.0
great ---> 55791.0
fantastic ---> 1664.0
amazing ---> 2628.0
love ---> 41994.0
horrible ---> 1110
bad ---> 4183
terrible ---> 1146.0
awful ---> 687
wow ---> 425
hate ---> 1107
crazy ---> 1122
weird ---> 490
poor ---> 1455


## 2. Create a new sentiment analysis model using only the selected_words as features:

In the IPython Notebook above, we used word counts for all words as features for our sentiment classifier. Now, you will use only the selected_words:

In [53]:
train_data, test_data = products.random_split(0.8, seed = 0)

In [54]:
selected_words_model = turicreate.logistic_classifier.create(train_data,
                                                     target = 'sentiment',
                                                     features = selected_words,
                                                     validation_set = test_data)

- You will now examine the weights the learned classifier assigned to each of the 14 words in selected_words and gain intuition as to what the ML algorithm did for your data using these features. In turicreate, a learned model, such as the selected_words_model, has a field 'coefficients', which lets you look at the learned coefficients. You can access it by using:

In [55]:
coeff = selected_words_model.coefficients

In [56]:
coeff

name,index,class,value,stderr
(intercept),,1,1.3690433239069115,0.0090451299821596
awesome,,1,1.1658183662530524,0.0848777695556236
great,,1,0.8758010279663601,0.0191283829206492
fantastic,,1,0.9115419286539496,0.111947115279167
amazing,,1,1.1660107751951434,0.101918543706549
love,,1,1.360311550657392,0.0282619257823735
horrible,,1,-2.2309398991439084,0.0807874499016518
bad,,1,-0.95332281087476,0.0389759579882749
terrible,,1,-2.2067095196063136,0.0777068952398687
awful,,1,-2.010244686277285,0.101690583266167


- Using this approach, sort the learned coefficients according to the ‘value’ column using .sort(). Out of the 14 words in selected_words, which one got the most positive weight? Which one got the most negative weight? Do these values make sense to you? Save these results to answer the quiz at the end.

In [57]:
coeff.sort('value', ascending = False)   # most positive coeff; sort() returns a new SFrame and leaves coeff unchanged

name,index,class,value,stderr
(intercept),,1,1.3690433239069115,0.0090451299821596
love,,1,1.360311550657392,0.0282619257823735
amazing,,1,1.1660107751951434,0.101918543706549
awesome,,1,1.1658183662530524,0.0848777695556236
fantastic,,1,0.9115419286539496,0.111947115279167
great,,1,0.8758010279663601,0.0191283829206492
wow,,1,-0.0059807260743115,0.1613633980596307
crazy,,1,-0.5276388735155451,0.084531852141003
weird,,1,-0.7740245415549182,0.1195344677350031
bad,,1,-0.95332281087476,0.0389759579882749


In [58]:
coeff.sort('value', ascending = True)   # most negative coeff

name,index,class,value,stderr
poor,,1,-2.4168515893597635,0.0726003893744658
horrible,,1,-2.2309398991439084,0.0807874499016518
terrible,,1,-2.2067095196063136,0.0777068952398687
awful,,1,-2.010244686277285,0.101690583266167
hate,,1,-1.334571592762697,0.0775151904400961
bad,,1,-0.95332281087476,0.0389759579882749
weird,,1,-0.7740245415549182,0.1195344677350031
crazy,,1,-0.5276388735155451,0.084531852141003
wow,,1,-0.0059807260743115,0.1613633980596307
great,,1,0.8758010279663601,0.0191283829206492
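
One way to build intuition for these weights is to plug them into the logistic function by hand. The sketch below assumes the model scores a review as the intercept plus the sum of each coefficient times its word count, then applies the sigmoid. This is the standard logistic regression form; that Turi Create computes its predictions exactly this way is an assumption. The coefficient values are copied from the tables above and rounded to four decimals, and `predict_probability` is a hypothetical helper.

```python
import math

# Coefficients copied from the tables above, rounded to four decimals.
intercept = 1.3690
coefficients = {
    'love': 1.3603, 'amazing': 1.1660, 'awesome': 1.1658,
    'fantastic': 0.9115, 'great': 0.8758, 'wow': -0.0060,
    'crazy': -0.5276, 'weird': -0.7740, 'bad': -0.9533,
    'hate': -1.3346, 'awful': -2.0102, 'terrible': -2.2067,
    'horrible': -2.2309, 'poor': -2.4169,
}

def predict_probability(word_count):
    """Return P(positive) under a standard logistic model (an assumption)."""
    score = intercept + sum(coefficients[w] * n
                            for w, n in word_count.items()
                            if w in coefficients)
    return 1.0 / (1.0 + math.exp(-score))

print(predict_probability({}))                          # none of the selected words
print(predict_probability({'love': 2.0}))               # pushed toward positive
print(predict_probability({'poor': 1.0, 'bad': 1.0}))   # pushed toward negative
```

Under this assumption, a review containing none of the selected words gets its probability from the intercept alone, roughly 0.8, no matter what else it says. That limitation is worth keeping in mind when comparing this model with the one trained on all word counts.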


## 3. Comparing the accuracy of different sentiment analysis models:

- What is the accuracy of the selected_words_model on the test_data? What was the accuracy of the sentiment_model that we learned using all the word counts in the IPython Notebook above from the lectures? What is the accuracy of the majority class classifier on this task? How do the learned models compare with the baseline approach where we just predict the majority class? Save these results to answer the quiz at the end.

Hint: we discussed the majority class classifier in lecture, which simply predicts that every data point is from the most common class. This baseline is something we definitely want to beat with models we learn from data.
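
The majority class baseline can be worked out from the confusion matrix counts that `evaluate` reports for `test_data`. The sketch below copies the four counts for `sentiment_model` from this notebook's output; the variable names are hypothetical.

```python
# Counts copied from the sentiment_model confusion matrix on test_data.
true_pos = 26632    # target 1, predicted 1
false_neg = 1344    # target 1, predicted 0
true_neg = 3931     # target 0, predicted 0
false_pos = 1397    # target 0, predicted 1

total = true_pos + false_neg + true_neg + false_pos
positives = true_pos + false_neg
negatives = true_neg + false_pos

# The majority class classifier always predicts the more common label.
majority_accuracy = max(positives, negatives) / total
model_accuracy = (true_pos + true_neg) / total

print('majority class accuracy:', round(majority_accuracy, 4))
print('sentiment_model accuracy:', round(model_accuracy, 4))
```

The class totals agree with the `p` and `n` columns of the `roc_curve` output (27976 positives, 5328 negatives), which is a quick sanity check that the counts were copied correctly.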

In [59]:
sentiment_model.evaluate(test_data)

{'accuracy': 0.9176975738650012,
 'auc': 0.9342357833151299,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |  1397 |
 |      1       |        0        |  1344 |
 |      0       |        0        |  3931 |
 |      1       |        1        | 26632 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.951057941255245,
 'log_loss': 0.33047871872321327,
 'precision': 0.9501587641371436,
 'recall': 0.9519588218472976,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+--------------------+--------------------+-------+------+
 | threshold |        fpr         |        tpr         |   p   |  n   |
 +-----------+--------------------+--------------------+-------+------+
 |    0.

In [60]:
selected_words_model.evaluate(test_data)

{'accuracy': 0.8487869325006006,
 'auc': 0.7027940206524079,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      1       |        0        |  167  |
 |      0       |        0        |  459  |
 |      0       |        1        |  4869 |
 |      1       |        1        | 27809 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.9169716754047549,
 'log_loss': 0.39132671854266027,
 'precision': 0.8510006732358162,
 'recall': 0.994030597655133,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+--------------------+-----+-------+------+
 | threshold |        fpr         | tpr |   p   |  n   |
 +-----------+--------------------+-----+-------+------+
 |    0.0    |        1.0         | 1.0 | 27976 | 532

In [61]:
selected_words_model.evaluate(test_data, metric = 'roc_curve')

{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+--------------------+-----+-------+------+
 | threshold |        fpr         | tpr |   p   |  n   |
 +-----------+--------------------+-----+-------+------+
 |    0.0    |        1.0         | 1.0 | 27976 | 5328 |
 |   1e-05   |        1.0         | 1.0 | 27976 | 5328 |
 |   2e-05   |        1.0         | 1.0 | 27976 | 5328 |
 |   3e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   4e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   5e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   6e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   7e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   8e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 |   9e-05   | 0.9998123123123123 | 1.0 | 27976 | 5328 |
 +-----------+--------------------+-----+-------+------+
 [100001 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_row

## 4. Interpreting the difference in performance between the models:

To understand why the model with all word counts performs better than the one with only the selected_words, we will now examine the reviews for a particular product.

- We will investigate a product named ‘Baby Trend Diaper Champ’. (This is a trash can for soiled baby diapers, which keeps the smell contained.)

- Just like we did for the reviews for the giraffe toy in the IPython Notebook in the lecture video, before we start our analysis you should select all reviews where the product name is ‘Baby Trend Diaper Champ’. Let’s call this table diaper_champ_reviews.

- Again, just as in the video, use the sentiment_model to predict the sentiment of each review in diaper_champ_reviews and sort the results according to their ‘predicted_sentiment’.

- What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture? Save this result to answer the quiz at the end.

- Now use the selected_words_model you learned using just the selected_words to predict the sentiment of the most positive review you found above. Hint: if you sorted the diaper_champ_reviews in descending order (from most positive to most negative), this command will be helpful to make the prediction you need:

In [62]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

In [63]:
diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'convenient': 1.0, 'more': 1.0, 'trash': ...",1,0.0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'system': 1.0, 'try': 1.0, 're': 1.0, 'still': ...",0,0.0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'nose': 1.0, 'for': 2.0, 'investment': 1.0, ...",1,0.0
Baby Trend Diaper Champ,We love our diaper champ. It is very easy to use ...,5.0,"{'out': 1.0, 'pull': 1.0, 'open': 1.0, 'pail': ...",1,0.0
Baby Trend Diaper Champ,Two girlfriends and two family members put me ...,5.0,"{'winter': 1.0, 'outside': 1.0, 'day': ...",1,0.0
Baby Trend Diaper Champ,I waited to review this until I saw how it ...,4.0,"{'mom': 1.0, 'huge': 1.0, 'special': 1.0, 'good': ...",1,0.0
Baby Trend Diaper Champ,I have had a diaper genie for almost 4 years since ...,1.0,"{'yuck': 1.0, 'clean': 1.0, 'trash': 3.0, 'is': ...",0,0.0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'price': 1.0, 'suggestions': 1.0, ...",1,0.0
Baby Trend Diaper Champ,I am so glad I got the Diaper Champ instead of ...,5.0,"{'best': 1.0, 'that': 1.0, 'will': 1.0, ...",1,0.0
Baby Trend Diaper Champ,We had 2 diaper Genie's both given to us as a ...,4.0,"{'no': 1.0, 'regular': 1.0, 'part': 1.0, ...",1,0.0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,crazy,weird,poor
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,1.0,0.0,1,0,0.0,0,1,0,0,0,0
0.0,0.0,0.0,0.0,0,1,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,2.0,0,0,0.0,0,0,0,0,0,0


In [64]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type='probability')

In [65]:
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)

In [66]:
diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome
Baby Trend Diaper Champ,I read a review below that can explain exactly ...,4.0,"{'key': 1.0, 'have': 1.0, 'pieces': 1.0, 'betwe ...",1,0.0
Baby Trend Diaper Champ,I have never written a review for Amazon but I ...,5.0,"{'priceless': 1.0, 'knows': 1.0, 'parent': ...",1,0.0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'price': 1.0, 'suggestions': 1.0, ...",1,0.0
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'around': 1.0, 'any': 1.0, 't': 1.0, 'isn': ...",1,0.0
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'either': 1.0, 'be': 1.0, 't': 1.0, 'not': ...",1,0.0
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'hope': 1.0, 'make': 1.0, 'slower': 1.0, ...",1,0.0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'buy': 1.0, 'product': 1.0, 'recommend': 1.0, ...",1,0.0
Baby Trend Diaper Champ,"As a first time mother, I wanted to get the best ...",5.0,"{'ll': 1.0, 'baby': 1.0, 'recommended': 1.0, ' ...",1,0.0
Baby Trend Diaper Champ,I see that there are complaints of stinkiness ...,5.0,"{'very': 1.0, 'told': 1.0, 'all': 1.0, ...",1,0.0
Baby Trend Diaper Champ,I have a 10 year old daughter and an 8 month ...,5.0,"{'sorry': 1.0, 'be': 1.0, 'you': 2.0, 'sell': 1.0, ...",1,0.0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,crazy,weird,poor
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,1
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,1
0.0,0.0,0.0,2.0,0,0,0.0,0,0,0,0,0,0

predicted_sentiment
0.999999999989594
0.9999999999868132
0.9999999999465672
0.9999999999302822
0.9999999999174132
0.9999999998430964
0.9999999997360196
0.9999999995664316
0.9999999985015902
0.999999998056851


In [67]:
diaper_champ_reviews['predicted_sentiment'].max()

0.9999999999895941

In [69]:
selected_words_model.predict(diaper_champ_reviews[0], output_type='probability')

dtype: float
Rows: 1
[0.7972255441795084]
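
One way to compare the two predictions for this review is to convert each probability back into a linear score. The sketch below assumes both models are logistic classifiers, which this section does not state, so treat it as an illustration rather than a description of the models' internals. The helper names `log_odds` and `logistic` are hypothetical; the two probabilities are copied from the outputs above.

```python
import math


def log_odds(p):
    """Inverse of the logistic function: the linear score that yields probability p."""
    return math.log(p / (1.0 - p))


def logistic(score):
    """Map a linear score back to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Probabilities copied from the notebook outputs above.
p_all_words = 0.9999999999895941  # sentiment_model, most positive review
p_selected = 0.7972255441795084   # selected_words_model, same review

# Assumption: both models are logistic classifiers, so each probability
# corresponds to exactly one linear score.
score_all_words = log_odds(p_all_words)
score_selected = log_odds(p_selected)

print(f"sentiment_model score:      {score_all_words:.2f}")
print(f"selected_words_model score: {score_selected:.2f}")
```

Under that assumption, the sentiment_model assigns this review a score of roughly 25, while the selected_words_model assigns roughly 1.4. The first row of the sorted table has a count of 0 in every selected-word column, so if those counts are the model's only features, its prediction for this review would come from the intercept alone.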

In [70]:
# diaper_champ_reviews['predicted_sentiment_2']  = selected_words_model.predict(diaper_champ_reviews, output_type='probability')
diaper_champ_reviews.head()

name,review,rating,word_count,sentiment,awesome
Baby Trend Diaper Champ,I read a review below that can explain exactly ...,4.0,"{'key': 1.0, 'have': 1.0, 'pieces': 1.0, 'betwe ...",1,0.0
Baby Trend Diaper Champ,I have never written a review for Amazon but I ...,5.0,"{'priceless': 1.0, 'knows': 1.0, 'parent': ...",1,0.0
Baby Trend Diaper Champ,I originally put this item on my baby registry ...,5.0,"{'price': 1.0, 'suggestions': 1.0, ...",1,0.0
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'around': 1.0, 'any': 1.0, 't': 1.0, 'isn': ...",1,0.0
Baby Trend Diaper Champ,Diaper Champ or Diaper Genie? That was my ...,5.0,"{'either': 1.0, 'be': 1.0, 't': 1.0, 'not': ...",1,0.0
Baby Trend Diaper Champ,I am one of those super- critical shoppers who ...,5.0,"{'hope': 1.0, 'make': 1.0, 'slower': 1.0, ...",1,0.0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'buy': 1.0, 'product': 1.0, 'recommend': 1.0, ...",1,0.0
Baby Trend Diaper Champ,"As a first time mother, I wanted to get the best ...",5.0,"{'ll': 1.0, 'baby': 1.0, 'recommended': 1.0, ' ...",1,0.0
Baby Trend Diaper Champ,I see that there are complaints of stinkiness ...,5.0,"{'very': 1.0, 'told': 1.0, 'all': 1.0, ...",1,0.0
Baby Trend Diaper Champ,I have a 10 year old daughter and an 8 month ...,5.0,"{'sorry': 1.0, 'be': 1.0, 'you': 2.0, 'sell': 1.0, ...",1,0.0

great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,crazy,weird,poor
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,1
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
1.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,1.0,0,0,0.0,0,0,0,0,0,0
0.0,0.0,0.0,0.0,0,0,0.0,0,0,0,0,0,1
0.0,0.0,0.0,2.0,0,0,0.0,0,0,0,0,0,0

predicted_sentiment
0.999999999989594
0.9999999999868132
0.9999999999465672
0.9999999999302822
0.9999999999174132
0.9999999998430964
0.9999999997360196
0.9999999995664316
0.9999999985015902
0.999999998056851


In [71]:
diaper_champ_reviews[0]['review']

"I read a review below that can explain exactly what we experienced. We've had it for 16 months and it has worked wonderful for us. No smells, change it out once a week, easy to clean. Then a diaper snagged this foam material in the head part, so I pulled the rest of the foam out. Big mistake!!! Now it can no loner retain the stinkiness and we're looking for a replacement. Be careful of overloading and never take out that foam piece that is cushioned between pieces. I have figured out that it is key to keeping the stink out."

In [72]:
diaper_champ_reviews[0]['word_count']

{'key': 1.0,
 'have': 1.0,
 'pieces': 1.0,
 'between': 1.0,
 'cushioned': 1.0,
 'piece': 1.0,
 'take': 1.0,
 'overloading': 1.0,
 'be': 1.0,
 'looking': 1.0,
 're': 1.0,
 'stinkiness': 1.0,
 'retain': 1.0,
 'now': 1.0,
 'wonderful': 1.0,
 'worked': 1.0,
 '16': 1.0,
 'and': 3.0,
 'months': 1.0,
 've': 1.0,
 'in': 1.0,
 'us': 1.0,
 'i': 3.0,
 'experienced': 1.0,
 'read': 1.0,
 'easy': 1.0,
 'for': 3.0,
 'to': 2.0,
 'has': 1.0,
 'review': 1.0,
 'keeping': 1.0,
 'replacement': 1.0,
 'out': 5.0,
 'loner': 1.0,
 'clean': 1.0,
 'mistake': 1.0,
 'big': 1.0,
 'pulled': 1.0,
 'it': 5.0,
 'this': 1.0,
 'is': 2.0,
 'explain': 1.0,
 'material': 1.0,
 'exactly': 1.0,
 'a': 4.0,
 'we': 3.0,
 'that': 4.0,
 'had': 1.0,
 'what': 1.0,
 'part': 1.0,
 'no': 2.0,
 'smells': 1.0,
 'can': 2.0,
 'change': 1.0,
 'figured': 1.0,
 'week': 1.0,
 'then': 1.0,
 'snagged': 1.0,
 'diaper': 1.0,
 'careful': 1.0,
 'the': 5.0,
 'never': 1.0,
 'foam': 3.0,
 'head': 1.0,
 'so': 1.0,
 'below': 1.0,
 'rest': 1.0,
 'stink': 1.0}
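
The word counts above suggest why the selected_words_model is less confident about this review. The sketch below looks up each selected word in a small excerpt of the dictionary. The `selected_words` list is an assumption taken from the per-word column names in the tables above, and `word_count_excerpt` holds only a few entries copied from the output, not the full dictionary.

```python
# Assumption: the selected words match the per-word columns shown in the tables above.
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

# A few entries copied from the word_count output above (not the full dictionary).
word_count_excerpt = {'wonderful': 1.0, 'easy': 1.0, 'clean': 1.0,
                      'mistake': 1.0, 'stinkiness': 1.0, 'no': 2.0, 'foam': 3.0}

# Count of each selected word in the review; words that are absent count as 0.
selected_counts = {word: word_count_excerpt.get(word, 0.0) for word in selected_words}

# Words in the excerpt that a selected-words model would never see.
ignored_words = sorted(w for w in word_count_excerpt if w not in selected_words)

print(selected_counts)
print(ignored_words)
```

Every selected word has a count of 0 here, so words such as 'wonderful' and 'easy' are available only to the model trained on all word counts.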

In [73]:
diaper_champ_reviews[1]['review']

'I have never written a review for Amazon but I saw some of the poor reviews on the Diaper Champ and had to add my two cents . . .When my sister got pregnant she could not decide if she should register for the Diaper Champ or the Diaper Genie.  She left it up to me and after reviewing both items I went with the Diaper Champ . . .  The idea of having to continually purchase bags for the Diaper Genie seemed like a waste when the Diaper Champ lets you use any type of bag . . .  My nephew is now over two years old and while he has begun potty training, my sister still uses the Diaper Champ I gave her . . .When it came time for me to choose a diaper disposal system, I put the Diaper Champ on my registry.  Yes, it can be a bit tricky to open (I found using two hands works best) but then again do I want something that is holding wet and otherwise dirty diapers to open easily?  True, the mechanism sometimes won\'t "swing" the dirty diaper into the pail but HELLO that is a sign that it is proba

In [74]:
diaper_champ_reviews[-1]['review']

'My husband and I selected the Diaper "Champ" mainly because you can use ordinary trash bags and not be roped into buying the specialty refill bags, and it was moderately priced (a little less than the Diaper Dekor). It also seemed that the reviews of this product were generally more positive...The positives are:1. You can use any trash bag2. Easy to use and refillThe negatives are:1. The bag doesn\'t seal around the dirty diapers, so when it comes time to refill the bag, it\'s just like opening a regular trash can. Smells like the Champ is trying to knock YOU out with odor!2. The plastic seems to smell, ie. You put a dirty diaper in the hole, and flip the handle to dump the diaper into the champ. That "side" of the plastic dumper-thingie is in contact with the air inside the dirty diaper changer, so when you flip it over the next time to dispose of another diaper, you smell the last 8 diapers you put in there...pretty gross.3. The "odor seal" (some soft material) really seems to retai