#Predicting sentiment from product reviews

#Fire up pandas

In [150]:
import pandas as pd

#Read some product review data

Loading reviews for a set of baby products. 

In [151]:
products = pd.read_csv('amazon_baby.csv')

#Let's explore this data together

Data includes the product name, the review text and the rating of the review. 

In [152]:
products.head()

,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5


In [153]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [154]:
products = products.fillna({'review':''})  # fill in N/A's in the review column

In [155]:
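import string

# Translation table that deletes every character in string.punctuation (built once, reused per review)
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation(text):
    return text.translate(PUNCTUATION_TABLE)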

In [156]:
products['review_clean'] = products['review'].astype(str).apply(remove_punctuation) #astype(str) makes sure all reviews are strings
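`str.maketrans('', '', string.punctuation)` builds a translation table that deletes every character in `string.punctuation` (ASCII punctuation only). Because punctuation is deleted rather than replaced with a space, hyphenated or comma-joined words get fused, which is why "non-stop" shows up as "nonstop" in review_clean. A quick check on made-up strings (not from the dataset):

```python
import string

# Same kind of table as remove_punctuation uses: delete all ASCII punctuation
table = str.maketrans('', '', string.punctuation)

print("non-stop, i love it!".translate(table))       # nonstop i love it
print("organizer,several pockets".translate(table))  # organizerseveral pockets
```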

In [157]:
for word in selected_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))
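Each selected-word column is filled by splitting the cleaned review on whitespace and counting exact token matches with `list.count`, so matching is whole-word and case-sensitive. A small sketch on hypothetical strings (not taken from the dataset):

```python
import pandas as pd

# Hypothetical cleaned reviews, for illustration only
reviews = pd.Series([
    "great product great price",
    "Great value but the lid is bad",
    "greatest purchase ever",
])

# Same counting rule as the loop above: exact, case-sensitive token matches
great_counts = reviews.apply(lambda s: s.split().count("great"))
print(great_counts.tolist())  # [2, 0, 0]

# Lowercasing first is one option if case-insensitive counts are wanted
great_counts_ci = reviews.apply(lambda s: s.lower().split().count("great"))
print(great_counts_ci.tolist())  # [2, 1, 0]
```

The notebook's loop does not lowercase, so a capitalized "Great" at the start of a sentence is not counted.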

In [158]:
products.head()

,name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,These flannel wipes are OK but in my opinion n...,0,0,0,0,0,0,0,0,0,0,0
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,0,0,0,0,1,0,0,0,0,0,0
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,0,0,0,0,0,0,0,0,0,0,0
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,0,0,0,0,2,0,0,0,0,0,0
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,0,1,0,0,0,0,0,0,0,0,0


Using the .sum() method on each of the new columns you created, answer the following questions: Out of the selected_words, which one is most used in the dataset? Which one is least used? Save these results to answer the quiz at the end.

In [159]:
for w in selected_words:
    print(w, products[w].sum())

awesome 3234
great 49419
fantastic 1506
amazing 2233
love 34584
horrible 1057
bad 4602
terrible 1092
awful 626
wow 111
hate 1118
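Instead of reading the printed totals by eye, `Series.idxmax()` and `Series.idxmin()` return the labels of the largest and smallest sums. In the notebook itself, `products[selected_words].sum()` would produce this Series directly; the sketch below hard-codes the totals printed above so that it runs on its own.

```python
import pandas as pd

# Totals copied from the output above (sum of each selected-word column)
word_totals = pd.Series({
    'awesome': 3234, 'great': 49419, 'fantastic': 1506, 'amazing': 2233,
    'love': 34584, 'horrible': 1057, 'bad': 4602, 'terrible': 1092,
    'awful': 626, 'wow': 111, 'hate': 1118,
})

print(word_totals.idxmax())  # great
print(word_totals.idxmin())  # wow
```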


In [160]:
print('answer: great is used the most and wow is used the least')

answer: great is used the most and wow is used the least


#Examining the reviews for the most-sold product: 'Vulli Sophie the Giraffe Teether'

In [161]:
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']
giraffe_reviews

,name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
34313,Vulli Sophie the Giraffe Teether,He likes chewing on all the parts especially t...,5,He likes chewing on all the parts especially t...,0,0,0,0,0,0,0,0,0,0,0
34314,Vulli Sophie the Giraffe Teether,My son loves this toy and fits great in the di...,5,My son loves this toy and fits great in the di...,0,1,0,0,0,0,0,0,0,0,0
34315,Vulli Sophie the Giraffe Teether,There really should be a large warning on the ...,1,There really should be a large warning on the ...,0,0,0,0,0,0,0,0,0,0,0
34316,Vulli Sophie the Giraffe Teether,All the moms in my moms\' group got Sophie for...,5,All the moms in my moms group got Sophie for t...,0,0,0,0,1,0,0,0,0,0,0
34317,Vulli Sophie the Giraffe Teether,I was a little skeptical on whether Sophie was...,5,I was a little skeptical on whether Sophie was...,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
159649,Vulli Sophie the Giraffe Teether,My baby loves her Sophie Chew Toy. She can che...,5,My baby loves her Sophie Chew Toy She can chew...,0,0,0,0,0,0,0,0,0,0,0
159650,Vulli Sophie the Giraffe Teether,Sophie the Giraffe was a big hit at the baby s...,5,Sophie the Giraffe was a big hit at the baby s...,0,0,0,0,0,0,0,0,0,0,0
159651,Vulli Sophie the Giraffe Teether,quick shipping and perfect product. I would pu...,5,quick shipping and perfect product I would pur...,0,0,0,0,0,0,0,0,0,0,0
159652,Vulli Sophie the Giraffe Teether,My baby who is currently teething love his Sop...,5,My baby who is currently teething love his Sop...,0,0,0,0,1,0,0,0,0,0,0


##Define what's a positive and a negative sentiment

We will ignore all reviews with rating = 3, since they tend to have a neutral sentiment. Reviews with a rating of 4 or higher will be considered positive, while those with a rating of 2 or lower will be considered negative.
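The labeling rule above can be checked on a small hypothetical frame before applying it to the real data. This sketch drops the 3-star rows, takes an explicit copy so that adding a column does not trigger pandas' SettingWithCopyWarning, and maps the remaining ratings to +1/-1:

```python
import pandas as pd

# Hypothetical ratings, for illustration only
toy = pd.DataFrame({'rating': [1, 2, 3, 4, 5]})

# Drop the neutral 3-star rows; .copy() makes toy an independent frame
toy = toy[toy['rating'] != 3].copy()

# 4 or 5 stars -> +1 (positive); 1 or 2 stars -> -1 (negative)
toy['sentiment'] = toy['rating'].apply(lambda r: +1 if r > 3 else -1)
print(toy['sentiment'].tolist())  # [-1, -1, 1, 1]
```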

In [162]:
#ignore all 3* reviews (.copy() avoids a SettingWithCopyWarning when the sentiment column is added next)
products = products[products['rating'] != 3].copy()

In [163]:
#positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'].apply(lambda rating : +1 if rating > 3 else -1)

In [164]:
products.head()

,name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,0,0,0,0,1,0,0,0,0,0,0,1
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,0,0,0,0,0,0,0,0,0,0,0,1
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,0,0,0,0,2,0,0,0,0,0,0,1
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,0,1,0,0,0,0,0,0,0,0,0,1
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...,0,1,0,0,0,0,0,0,0,0,0,1


In [165]:
products[selected_words]

,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate
1,0,0,0,0,1,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,2,0,0,0,0,0,0
4,0,1,0,0,0,0,0,0,0,0,0
5,0,1,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
183526,0,2,0,0,0,0,0,0,0,0,0
183527,0,1,0,0,0,0,0,0,0,0,0
183528,0,2,0,0,0,0,0,0,0,0,0
183529,0,0,0,0,0,0,0,0,0,0,0


##Let's train the sentiment classifier

In [166]:
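from sklearn.model_selection import train_test_split
# hold out 20% of the reviews for testing; the split is random on every run
# (pass random_state=<int> to make it reproducible)
train_data, test_data = train_test_split(products, test_size=0.2)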
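Because `train_test_split` draws a new random split on each run, the accuracies reported below will vary slightly between runs. As a sketch of a common alternative (not what this notebook used), passing `random_state` fixes the split and `stratify` keeps the positive/negative ratio identical in both halves, which is relevant here because positive reviews heavily outnumber negative ones. The data below is a hypothetical stand-in.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 positive, 20 negative
toy = pd.DataFrame({'sentiment': [1] * 80 + [-1] * 20})

train, test = train_test_split(
    toy, test_size=0.2, random_state=0, stratify=toy['sentiment']
)

# Both halves keep the 80/20 class ratio
print((train['sentiment'] == 1).mean())  # 0.8
print((test['sentiment'] == 1).mean())   # 0.8
```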

In [167]:
#Step 1. Import the model
from sklearn.linear_model import LogisticRegression

#Step 2. Make an instance of the model
# all parameters not specified are set to their defaults
logisticRegr = LogisticRegression()

#Step 3. Train the model on the selected-word counts; fit() returns the fitted estimator
selected_words_model = logisticRegr.fit(train_data[selected_words], train_data['sentiment'])

In [168]:
selected_words_model.coef_

array([[ 1.05701403,  0.77854065,  0.98276089,  1.04077432,  1.30107877,
        -2.09126824, -0.96040222, -2.12706595, -1.86106079, -0.68326587,
        -1.32283892]])
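For a binary `LogisticRegression`, the probability of the positive class is the logistic (sigmoid) function of a weighted sum, P(y = +1 | x) = 1 / (1 + exp(-(w·x + b))), with w stored in `coef_` and b in `intercept_`. Words with positive weights therefore push a review toward +1 and words with negative weights toward -1. The sketch below fits a model on hypothetical word-count data and checks that this formula reproduces `predict_proba`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical counts for two words ("love", "awful") and their labels
X = np.array([[2, 0], [1, 0], [0, 1], [0, 2], [1, 1], [0, 0]])
y = np.array([1, 1, -1, -1, 1, -1])

model = LogisticRegression().fit(X, y)

# Score = w . x + b; the sigmoid turns the score into P(y = +1 | x)
scores = X @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# classes_ is sorted ([-1, 1]), so column 1 of predict_proba is P(y = +1)
print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # True
```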

In [169]:
print(selected_words)
weight=selected_words_model.coef_
weight.reshape(-1,1).tolist()

['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']


[[1.0570140315780299],
 [0.7785406485551589],
 [0.9827608881400005],
 [1.0407743177067377],
 [1.3010787672591055],
 [-2.0912682443887527],
 [-0.9604022190885717],
 [-2.127065953537484],
 [-1.8610607949866766],
 [-0.6832658702770313],
 [-1.322838918361984]]

In [170]:
flat_list = weight.ravel().tolist()  # flatten the (1, 11) coefficient array into a plain list
flat_list

[1.0570140315780299,
 0.7785406485551589,
 0.9827608881400005,
 1.0407743177067377,
 1.3010787672591055,
 -2.0912682443887527,
 -0.9604022190885717,
 -2.127065953537484,
 -1.8610607949866766,
 -0.6832658702770313,
 -1.322838918361984]

In [171]:
feature_weight = pd.DataFrame(
    {'feature': selected_words,
     'weight': flat_list
    })
feature_weight

,feature,weight
0,awesome,1.057014
1,great,0.778541
2,fantastic,0.982761
3,amazing,1.040774
4,love,1.301079
5,horrible,-2.091268
6,bad,-0.960402
7,terrible,-2.127066
8,awful,-1.861061
9,wow,-0.683266
10,hate,-1.322839


Using this approach, sort the learned coefficients by the ‘weight’ column using .sort_values(). Out of the 11 words in selected_words, which one got the most positive weight? Which one got the most negative weight? Do these values make sense to you? Save these results to answer the quiz at the end.
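A sketch of that sort with `DataFrame.sort_values` (the older `DataFrame.sort()` was removed from pandas). To keep the block self-contained it rebuilds feature_weight from the rounded coefficients printed from coef_ above; inside the notebook, `feature_weight.sort_values(by='weight', ascending=False)` is the equivalent one-liner.

```python
import pandas as pd

# Coefficients copied (rounded) from selected_words_model.coef_ above
feature_weight = pd.DataFrame({
    'feature': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible',
                'bad', 'terrible', 'awful', 'wow', 'hate'],
    'weight': [1.057014, 0.778541, 0.982761, 1.040774, 1.301079, -2.091268,
               -0.960402, -2.127066, -1.861061, -0.683266, -1.322839],
})

# Largest weight first, most negative weight last
ranked = feature_weight.sort_values(by='weight', ascending=False)
print(ranked.iloc[0]['feature'])   # love      (most positive weight)
print(ranked.iloc[-1]['feature'])  # terrible  (most negative weight)
```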

In [270]:
print('answer: love has the most positive weight; terrible has the most negative weight')

answer: love has the most positive weight; terrible has the most negative weight


Comparing the accuracy of the different sentiment analysis models on the test data, using accuracy_score from sklearn.metrics:


In [173]:
test_data.head()

,name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
98165,Replacement Tubing (Retail Pack of 2) for Mede...,Tubing works just like it should in my Medela ...,4,Tubing works just like it should in my Medela ...,0,0,0,0,1,0,0,0,0,0,0,1
147643,"Evenflo Compact Fold High Chair, Marianna","Shipment was delayed thanks to UPS, other than...",5,Shipment was delayed thanks to UPS other than ...,0,1,0,0,0,0,0,0,0,0,0,1
28869,"Fantasy Furniture Homey VIP Chair, Blue",This chair is very nice looking and isn\'t too...,5,This chair is very nice looking and isnt too k...,0,0,0,0,0,0,0,0,0,0,0,1
26164,Sunshine Kids Stow \'N Go Car Seatback Organiz...,"Great back seat organizer,several handy pocket...",4,Great back seat organizerseveral handy pockets...,0,0,0,0,0,0,0,0,0,0,0,1
31429,American Baby Company Percale 3 Piece Toddler ...,"Do not like it, same objections-bad quality of...",1,Do not like it same objectionsbad quality of f...,0,0,0,0,0,0,0,0,0,0,0,-1


In [175]:
from sklearn.metrics import accuracy_score
y_pred_selected_words = selected_words_model.predict(test_data[selected_words])
y_true_selected_words = test_data['sentiment']
# accuracy_score expects (y_true, y_pred); compute it once and reuse it
accuracy_selected_words = accuracy_score(y_true_selected_words, y_pred_selected_words)
print('answer: the accuracy of selected words model is', accuracy_selected_words)

answer: the accuracy of selected words model is 0.847890617972


We will now compute the word count for each word that appears in the reviews. A vector of word counts is often referred to as bag-of-words features. Since most words occur in only a few reviews, word count vectors are sparse. For this reason, scikit-learn and many other tools use sparse matrices to store a collection of word count vectors. Refer to the scikit-learn documentation for CountVectorizer to produce sparse word count vectors. The general steps for extracting word count vectors are as follows:

In [176]:
from sklearn.feature_extraction.text import CountVectorizer

# Use this token pattern to keep single-letter words
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# First, learn vocabulary from the training data and assign columns to words
# Then convert the training data into a sparse matrix
train_matrix = vectorizer.fit_transform(train_data['review_clean'])

# Second, convert the test data into a sparse matrix, using the same word-column mapping
test_matrix = vectorizer.transform(test_data['review_clean'])

In [177]:
print(train_matrix.shape)

(133401, 121546)


In [178]:
# Training the model on the data, storing the information learned from the data
sentiment_model = LogisticRegression().fit(train_matrix, train_data['sentiment'])

In [180]:
y_pred_sentiment_model = sentiment_model.predict(test_matrix)
y_true_sentiment_model = test_data['sentiment']
# accuracy_score expects (y_true, y_pred); compute it once and reuse it
accuracy_sentiment_model = accuracy_score(y_true_sentiment_model, y_pred_sentiment_model)
print('answer: the accuracy of sentiment model is', accuracy_sentiment_model)

answer: the accuracy of sentiment model is 0.928427933195


How do the learned models compare with the baseline approach of always predicting the majority class?

In [182]:
# count positive and negative reviews in the training data to find the majority class
num_positive_train = (train_data['sentiment'] == +1).sum()
num_negative_train = (train_data['sentiment'] == -1).sum()
print(num_positive_train)
print(num_negative_train)

112117
21284


In [184]:
num_positive_test  = (test_data['sentiment'] == +1).sum()
num_negative_test = (test_data['sentiment'] == -1).sum()

In [185]:
accuracy_majority_class_classifier = num_positive_test/(num_negative_test+num_positive_test)
accuracy_majority_class_classifier

0.8438127792270097
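scikit-learn also provides a ready-made majority-class baseline, `DummyClassifier(strategy='most_frequent')`, which learns the most common label from the training labels and predicts it for every input; its test accuracy is the share of that label in the test set, as computed by hand above. A sketch on hypothetical labels (not the review data):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: the training data is mostly positive
y_train = np.array([1, 1, 1, -1])
y_test = np.array([1, 1, -1, 1, 1])

# This baseline ignores the features, so a placeholder column is enough
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(np.zeros((len(y_train), 1)), y_train)

acc = baseline.score(np.zeros((len(y_test), 1)), y_test)
print(acc)  # 0.8, the share of positives in y_test
```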

In [187]:
print('answer: both models are better than the baseline approach')

answer: both models are better than the baseline approach


In [241]:
diaper_champ_reviews = test_data[test_data['name'] == 'Baby Trend Diaper Champ']
diaper_champ_reviews['review_clean']

640    Its a good product because its easy to use and...
527    I bought this trying to save a few dollars and...
411    We are first time parents but have friends who...
602    If you wrap up the diaper with the tape it kee...
384    you can use any ol bag with this go to the dol...
347    Ive been using this diaper pail for 2 and 12 y...
493    My husband and I registered for the Diaper Cha...
349    Diaper Champ is great  But just to let you kno...
331    Granted our 3month old isnt producing really s...
336    My husband  I had received a tip from another ...
642    This truly is the champ of diapers This produc...
369    We wanted to have a convenient locking recepta...
421    I find this product easy to operate  With any ...
501    Its so much better then the diaper genie No sp...
603    I was a diaper genie user until my son was abo...
507    If you really want to save money but still get...
586    I love this product It is so easy to use and t...
435    It is so easy to use You

Again, just as in the video, use the sentiment_model to predict the sentiment of each review in diaper_champ_reviews and sort the results according to their ‘predicted_sentiment’.

In [242]:
test_matrix_diaper_champ = vectorizer.transform(diaper_champ_reviews['review_clean'])
test_matrix_diaper_champ

<57x121546 sparse matrix of type '<class 'numpy.int64'>'
	with 4041 stored elements in Compressed Sparse Row format>

In [243]:
diaper_sentiment_model_pred_proba = sentiment_model.predict_proba(test_matrix_diaper_champ)
diaper_sentiment_model_pred = sentiment_model.predict(test_matrix_diaper_champ)
len(diaper_sentiment_model_pred_proba[:,1])

57

In [244]:
diaper_sentiment_model_pred_proba[:,1].tolist()

[0.9996912699271776,
 0.9904745148636306,
 0.992983598443178,
 0.9867987480331346,
 0.9888706474915948,
 0.9990269161452062,
 0.9590746915014893,
 0.9985295385049727,
 0.9878729838555992,
 0.9476120327855299,
 0.9926501485796079,
 0.9919940571348792,
 0.9877176298073338,
 0.9496841461384783,
 0.996348283734769,
 0.9060805881802583,
 0.9993660767486636,
 0.9931922632899021,
 0.9999191793666939,
 0.9997175088603375,
 0.04303525917102579,
 0.9954442487792847,
 0.9919999904404853,
 0.9999943340363152,
 0.999964674875064,
 0.14682161458152407,
 0.9999993557820135,
 0.9997594922087505,
 0.9995877791823967,
 0.9973824336182506,
 0.9338013128218321,
 0.9880654419506433,
 0.9999991797579868,
 0.6408462791764689,
 0.9999999999612095,
 0.050717778166811646,
 0.7369597217350143,
 0.9503867893605001,
 0.9975269996358064,
 0.9952821257351729,
 0.9999717188101253,
 0.9909014722385561,
 0.9998655974246947,
 0.96879628435153,
 0.9403444187216434,
 0.008923506007457182,
 0.9998740742405557,
 0.771264093

In [245]:
len(diaper_sentiment_model_pred.tolist())

57

In [246]:
diaper_review_sentiment_pred_proba = pd.DataFrame(
    {'review_clean': diaper_champ_reviews['review_clean'],
     'predicted_sentiment_prob': diaper_sentiment_model_pred_proba[:,1].tolist(),
     'predicted_sentiment': diaper_sentiment_model_pred.tolist()
    })

In [247]:
diaper_review_sentiment_pred_proba.sort_values(by='predicted_sentiment_prob', ascending=False)

,review_clean,predicted_sentiment_prob,predicted_sentiment
376,This is absolutely by far the best diaper pail...,1.0,1
414,We have been using our Diaper Champ for almost...,1.0,1
571,We did alot of research on diaper pails before...,0.999999,1
420,Baby Luke can turn a clean diaper to a dirty d...,0.999999,1
604,I have been using this diaper pail for 412 mon...,0.999998,1
458,Im SO glad that we asked the sales associate a...,0.999994,1
486,This is my second child With my first I went t...,0.999992,1
610,This is the best diaper pail I was a little s...,0.999972,1
549,Ive tried the two other most popular diaper pa...,0.999965,1
499,This is a great product and a good value for t...,0.999919,1


What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture? Save this result to answer the quiz at the end.

In [229]:
print('answer: the predicted_sentiment_prob for the most positive review is approximately 1.0')

answer: the predicted_sentiment_prob for the most positive review is approximately 1.0


In [264]:
diaper_review_sentiment_pred_proba.sort_values(by='predicted_sentiment_prob', ascending=False).iloc[0]['review_clean']

'This is absolutely by far the best diaper pail money can buy  Never do we detect a diaper odor and my husband has a very sensitive sense of smell and is usually very quick to complain about such things  For those who say they have a problem with the Diaper Champ getting stuckthe ONLY time this ever happens to us is when the bag is full and needs to be changed  We love that it uses regular kitchen trash bags makes it much more economical  We have not found that we need to worry about frequent emptying or cleaning  We just leave the Champ to do its job until the mechanism begins to feel like its getting stuckthen we change the bag  For us this means about once a week  Not only is the Champ EASY to use its kind of fun  Before our daughter was born we really worried about whether the diaper pail we chose would be effective enough for us because my husband is so sensitive to smells  But shes two months old now and we still just cant say enough good things about itUPDATE  My daughter is now

Now use the selected_words_model you learned using just the selected_words to predict the sentiment of the most positive review you found above. Hint: if you sorted the diaper_champ_reviews in descending order (from most positive to most negative), this command will be helpful to make the prediction you need:


In [248]:
diaper_champ_reviews_test = test_data[test_data['name'] == 'Baby Trend Diaper Champ']
diaper_champ_reviews_test.shape

(57, 16)
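To score a single review, such as the one at index 376, the model still needs 2-D input: selecting the row with a list of labels (double brackets) keeps a one-row DataFrame, whereas a scalar label returns a 1-D Series that scikit-learn rejects. A sketch on a hypothetical frame and model; in the notebook the analogous input would be `diaper_champ_reviews_test.loc[[376], selected_words]`.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical word-count features and labels
df = pd.DataFrame({'love': [2, 0, 1, 0], 'awful': [0, 2, 0, 1]},
                  index=[376, 414, 500, 501])
labels = [1, -1, 1, -1]
model = LogisticRegression().fit(df, labels)

one_row = df.loc[[376], ['love', 'awful']]  # list of labels -> one-row DataFrame
print(one_row.shape)                        # (1, 2)
print(model.predict(one_row))               # [1]

# df.loc[376, ['love', 'awful']] would be a 1-D Series, and predict() would
# raise a ValueError asking for a 2-D array
```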

In [249]:
diaper_champ_reviews_test

,name,review,rating,review_clean,awesome,great,fantastic,amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
640,Baby Trend Diaper Champ,"Its a good product because its easy to use, an...",4,Its a good product because its easy to use and...,0,0,0,0,0,0,0,0,0,0,0,1
527,Baby Trend Diaper Champ,I bought this trying to save a few dollars and...,5,I bought this trying to save a few dollars and...,0,1,0,0,0,0,0,0,0,0,0,1
411,Baby Trend Diaper Champ,We are first time parents but have friends who...,5,We are first time parents but have friends who...,0,0,0,0,0,0,1,0,0,0,0,1
602,Baby Trend Diaper Champ,"If you wrap up the diaper with the tape, it ke...",5,If you wrap up the diaper with the tape it kee...,0,0,0,0,0,0,0,0,0,0,0,1
384,Baby Trend Diaper Champ,you can use any ol\' bag with this. go to the ...,5,you can use any ol bag with this go to the dol...,0,0,0,0,0,0,0,0,0,0,0,1
347,Baby Trend Diaper Champ,I\'ve been using this diaper pail for 2 and 1/...,5,Ive been using this diaper pail for 2 and 12 y...,0,0,0,0,0,0,0,0,0,0,0,1
493,Baby Trend Diaper Champ,My husband and I registered for the Diaper Cha...,5,My husband and I registered for the Diaper Cha...,0,0,0,0,0,0,0,0,0,0,0,1
349,Baby Trend Diaper Champ,Diaper Champ is great. But just to let you kn...,5,Diaper Champ is great But just to let you kno...,0,1,0,0,0,0,0,0,0,0,0,1
331,Baby Trend Diaper Champ,Granted our 3-month old isn\'t producing reall...,4,Granted our 3month old isnt producing really s...,0,0,0,0,0,0,0,0,0,0,0,1
336,Baby Trend Diaper Champ,My husband & I had received a tip from another...,5,My husband I had received a tip from another ...,0,0,0,0,1,0,0,0,0,0,0,1


In [250]:
diaper_selected_words_model_pred_proba = selected_words_model.predict_proba(diaper_champ_reviews_test[selected_words])
diaper_selected_words_model_pred = selected_words_model.predict(diaper_champ_reviews_test[selected_words])
len(diaper_selected_words_model_pred_proba[:,1])

57

In [251]:
diaper_selected_words_model_pred_proba[:,1].tolist()

[0.8041966277270779,
 0.8994631263346075,
 0.6111932187856536,
 0.8041966277270779,
 0.8041966277270779,
 0.8041966277270779,
 0.8041966277270779,
 0.8994631263346075,
 0.8041966277270779,
 0.9378367331610821,
 0.8994631263346075,
 0.8041966277270779,
 0.8041966277270779,
 0.6111932187856536,
 0.9511916492139302,
 0.8994631263346075,
 0.9378367331610821,
 0.9378367331610821,
 0.8994631263346075,
 0.8041966277270779,
 0.8041966277270779,
 0.6111932187856536,
 0.9378367331610821,
 0.8041966277270779,
 0.8041966277270779,
 0.6111932187856536,
 0.8041966277270779,
 0.8994631263346075,
 0.9378367331610821,
 0.9862231488089293,
 0.8041966277270779,
 0.8994631263346075,
 0.8041966277270779,
 0.8041966277270779,
 0.9822749154734584,
 0.8041966277270779,
 0.6111932187856536,
 0.8041966277270779,
 0.8041966277270779,
 0.8994631263346075,
 0.8041966277270779,
 0.8994631263346075,
 0.9378367331610821,
 0.8041966277270779,
 0.8041966277270779,
 0.8041966277270779,
 0.8041966277270779,
 0.8041966277

In [252]:
diaper_selected_words_model_pred_proba = pd.DataFrame(
    {'review_clean': diaper_champ_reviews['review_clean'],
     'predicted_sentiment_prob': diaper_selected_words_model_pred_proba[:,1].tolist(),
     'predicted_sentiment': diaper_selected_words_model_pred.tolist()
    })

In [254]:
diaper_selected_words_model_pred_proba.shape

(57, 3)

In [268]:
diaper_selected_words_model_pred_proba.loc[376]['review_clean']

'This is absolutely by far the best diaper pail money can buy  Never do we detect a diaper odor and my husband has a very sensitive sense of smell and is usually very quick to complain about such things  For those who say they have a problem with the Diaper Champ getting stuckthe ONLY time this ever happens to us is when the bag is full and needs to be changed  We love that it uses regular kitchen trash bags makes it much more economical  We have not found that we need to worry about frequent emptying or cleaning  We just leave the Champ to do its job until the mechanism begins to feel like its getting stuckthen we change the bag  For us this means about once a week  Not only is the Champ EASY to use its kind of fun  Before our daughter was born we really worried about whether the diaper pail we chose would be effective enough for us because my husband is so sensitive to smells  But shes two months old now and we still just cant say enough good things about itUPDATE  My daughter is now

In [269]:
diaper_selected_words_model_pred_proba.loc[414]['review_clean']

'We have been using our Diaper Champ for almost 14 months now and we are very happy with it  It sits in the corner of our bathroom and we have never had any problems with odors at all and we live in the damp and humid South where odors of all kinds are generally rife  It is easy to install and change the ordinary trash bags easy to clean and most important easy to use one handed  just insert folded dirty diaper and flip the handle  As the bag fills you may need to flip twice to ensure that the weighted cylinder has completely pushed the diaper into the bag  For the first 9 months or so we only needed to empty this once a week  Now that our sons diapers are a little larger they take up more space and we empty twice a week  We have never had any problems with anything like loose wipes getting stuck we always wrap them in the dirty diaper  It is a bit difficult to get the unit open the advice to take it slowly is sound but now that our son is walking and exploring I consider this a positi

In [259]:
diaper_selected_words_sorted=diaper_selected_words_model_pred_proba.sort_values(by='predicted_sentiment_prob', ascending=False)

Why is the predicted_sentiment for the most positive review found using the model with all word counts (sentiment_model) much more positive than the one using only the selected_words (selected_words_model)? Hint: examine the text of this review, the extracted word counts for all words, and the word counts for each of the selected_words, and you will see what each model used to make its prediction. Save this result to answer the quiz at the end.
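One way to follow the hint is to list which of the selected_words actually occur in a review, since those counts are the only input selected_words_model receives, whereas sentiment_model sees every word in the review. A sketch with a hypothetical helper `selected_word_hits` and a made-up review text:

```python
from collections import Counter

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible',
                  'bad', 'terrible', 'awful', 'wow', 'hate']

def selected_word_hits(review_clean, words):
    """Hypothetical helper: counts of the words in `words` that appear in the review,
    using the same whitespace split as the notebook's selected-word columns."""
    counts = Counter(review_clean.split())
    return {w: counts[w] for w in words if counts[w] > 0}

# Made-up review: clearly positive, but with only one selected word
review = ("This is absolutely the best diaper pail we love that it uses "
          "regular bags and we never detect any odor")
print(selected_word_hits(review, selected_words))  # {'love': 1}
```

In a review like this, most of the positive signal ("best", "never detect any odor") lies outside selected_words, so the restricted model has little evidence to work with while the full bag-of-words model can use all of it.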

In [262]:
diaper_selected_words_sorted

,review_clean,predicted_sentiment_prob,predicted_sentiment
604,I have been using this diaper pail for 412 mon...,0.986223,1
426,I love this diaper pale and wouldnt dream of t...,0.986223,1
376,This is absolutely by far the best diaper pail...,0.982275,1
603,I was a diaper genie user until my son was abo...,0.951192,1
336,My husband I had received a tip from another ...,0.937837,1
344,This is on my list of must haves The thing ha...,0.937837,1
518,I LOOOVE this diaper pail Its the easiest to ...,0.937837,1
435,It is so easy to use You never have to change ...,0.937837,1
628,We love our diaper champ and are so glad that ...,0.937837,1
586,I love this product It is so easy to use and t...,0.937837,1


In [263]:
diaper_selected_words_sorted.iloc[0]['review_clean']

'I have been using this diaper pail for 412 months now and just love it It is taller than other diaper pails so you dont have to bend so far down to dispose of the diaper AND you can use regular kitchen garbage bags  I didnt want to have to buy special bags which are more expensive and just one more thing you can run out of  My son is still exclusively breastfed so I dont know if the Diaper Champ will continue doing such a great job once he is on solids but so far it has been great even in the VERY hot weather it has contained all diaper odors  I am very happy with it and would gladly recommend it especially if you are the least bit tall'