## Analyzing product sentiment

### What you will do

Now you are ready! We are going to do four tasks in this assignment. Along the way, you will gather several results to enter into the quiz after this reading.

In the IPython notebook above, we used the word counts for all words in the reviews to train the sentiment classifier model. Now, we are going to follow a similar path, but only use this subset of the words:

In [None]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

Often, ML practitioners will throw out words they consider “unimportant” before training their model. This procedure can often be helpful in terms of accuracy. Here, we are going to throw out all words except for the very few above. Using so few words in our model will hurt our accuracy, but help us interpret what our classifier is doing.

**Use .apply() to build a new feature with the counts for each of the selected_words:** In the notebook above, we created a column ‘word_count’ with the word counts for each review. Our first task is to create a new column in the products SFrame for each word in selected_words, holding the count of that word in each review. In the process, we will see how the .apply() method can create new columns in our data (our features) from a Python function, which is an extremely useful concept to grasp!

Our first goal is to create a column products[‘awesome’] where each row contains the number of times the word ‘awesome’ showed up in the review for the corresponding product, and 0 if the word didn’t show up. One way to do this is to look at each row of the ‘word_count’ column and follow this logic:

If ‘awesome’ shows up in the word counts for a particular product (row of the products SFrame), then we know how often ‘awesome’ appeared in the review. If ‘awesome’ doesn’t appear in the word counts, then it didn’t appear in the review, and we should set the count for ‘awesome’ to 0 in this review.

We could use a for loop to apply this logic to each row of the products SFrame, but this approach would be really slow, because the SFrame is not optimized for being accessed with a for loop. Instead, we will use the .apply() method to apply the logic above to each row of the products[‘word_count’] column (which, since it’s a single column, has type SArray). [Read about using the .apply() method on an SArray here.](https://dato.com/products/create/docs/generated/graphlab.SArray.apply.html)
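Conceptually, .apply() runs one function on every element of the column and collects the results into a new column. The sketch below imitates that behavior in plain Python, with a list of dictionaries standing in for the SArray. The names `apply_to_each` and `total_words` and the sample rows are hypothetical and are not part of GraphLab Create.

```python
# Plain-Python stand-in for SArray.apply(); the names and data here are
# made up for illustration and are not part of the GraphLab Create API.

def apply_to_each(column, fn):
    """Return a new list holding fn(element) for every element of column."""
    return [fn(element) for element in column]


def total_words(word_counts):
    """Add up the counts in one review's word-count dictionary."""
    return sum(word_counts.values())


# Toy 'word_count' column: one dictionary per review.
word_count_column = [
    {'and': 5, 'stink': 1},
    {'love': 1, 'it': 2},
    {},
]

review_lengths = apply_to_each(word_count_column, total_words)
print(review_lengths)
```

The function passed in only ever sees one row at a time, which is the same contract the function you write for .apply() has to satisfy.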

We are now ready to create our new columns:

First, you will use a Python function to define the logic above. You will write a function called awesome_count which takes in the word counts for one review and returns the number of times ‘awesome’ appears in that review.

A few tips:

i. Each entry of the ‘word_count’ column is of Python type dictionary.

ii. If you have a dictionary called dict, you can access a field in the dictionary using:
`dict['awesome']`


but only if ‘awesome’ is one of the fields in the dictionary, otherwise you will get a nasty error.

iii. In Python, to test if a dictionary has a particular field, you can simply write:
`if 'awesome' in dict`


In our case, if this condition doesn’t hold, the count of ‘awesome’ should be 0.

Using these tips, you can now write the awesome_count function.
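The tips above can be tried out on a small hand-made dictionary before touching the SFrame. In this sketch, `word_counts` is a made-up entry, named that way rather than `dict` so that it does not hide Python's built-in dict type. `dict.get` is a standard Python method that combines the membership test and the default of 0 in one call.

```python
# A made-up word-count dictionary for one review (illustrative only).
word_counts = {'and': 3, 'love': 1, 'it': 2}

# Tip ii: direct lookup works only when the key exists.
love_count = word_counts['love']

# Looking up a missing key raises KeyError (the "nasty error").
try:
    word_counts['awesome']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Tip iii: test for the key first, and fall back to 0.
if 'awesome' in word_counts:
    awesome = word_counts['awesome']
else:
    awesome = 0

# dict.get does the same check-and-default in a single call.
awesome_via_get = word_counts.get('awesome', 0)

print(love_count, lookup_failed, awesome, awesome_via_get)
```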

Next, you will use .apply() to apply awesome_count to each row of products[‘word_count’] and create a new column called ‘awesome’ with the resulting counts. Here is what that looks like:
`products['awesome'] = products['word_count'].apply(awesome_count)`


And you are done! Check the products SFrame and you should see the new column you just created.

### 1. Repeat this process for the other 10 words in selected_words.

(Here, we described a simple procedure to obtain the counts for each selected_word. There are other more efficient ways of doing this, and we encourage you to explore this further.)

Using the .sum() method on each of the new columns you created, answer the following questions: Out of the selected_words, which one is most used in the dataset? Which one is least used? Save these results to answer the quiz at the end.
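As one possible alternative to writing a separate function per word, a single helper can take the word as a parameter, and the totals can be accumulated in one pass. This is a plain-Python sketch over a toy list of word-count dictionaries. The names `count_word` and `total_counts` and the sample data are hypothetical; the assignment itself still goes through the SFrame columns.

```python
from collections import Counter

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']


def count_word(word_counts, word):
    """Count of `word` in one review's word-count dictionary (0 if absent)."""
    return word_counts.get(word, 0)


def total_counts(word_count_column, words):
    """Sum the counts of each word in `words` over all reviews."""
    totals = Counter({word: 0 for word in words})
    for word_counts in word_count_column:
        for word in words:
            totals[word] += count_word(word_counts, word)
    return totals


# Toy data standing in for products['word_count'] (illustrative only).
toy_column = [
    {'great': 2, 'love': 1, 'and': 4},
    {'bad': 1, 'great': 1},
    {'the': 3},
]

totals = total_counts(toy_column, selected_words)
most_used = max(selected_words, key=lambda w: totals[w])
print(most_used, totals[most_used])
```

Passing the word as an argument keeps the helper independent of any loop variable defined elsewhere in the notebook.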

In [1]:
import graphlab

products = graphlab.SFrame('../week3/amazon_baby.gl/')



In [2]:
products.head(3)

name,review,rating
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0


In [3]:
# create 'word_count' column
products['word_count'] = graphlab.text_analytics.count_words(products['review'])

In [4]:
products.head(3)

name,review,rating,word_count
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ..."
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ..."
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ..."


In [5]:
# awesome_count function

def awesome_count(x):
    if 'awesome' in x:
        return x['awesome']
    else:
        return 0

In [6]:
# create 'awesome' column

products['awesome'] = products['word_count'].apply(awesome_count)

In [7]:
products.head(3)

name,review,rating,word_count,awesome
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ...",0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",0


In [8]:
# sum of awesome
products['awesome'].sum()

2090

In [9]:
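# selected_words_count function

def selected_words_count(x, word):
    # x is the word-count dictionary for one review
    if word in x:
        return x[word]
    else:
        return 0

In [10]:
# create selected_words columns
selected_words = ['great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

for word in selected_words:
    # bind the current word as a default argument so the function does not
    # depend on the global loop variable
    products[word] = products['word_count'].apply(lambda x, w=word: selected_words_count(x, w))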

In [11]:
products.head(3)

name,review,rating,word_count,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ...",0,0,0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0


In [12]:
# sum of each column
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

for word in selected_words:
    print 'sum of {}: {}'.format(word, products[word].sum())

sum of awesome: 2090
sum of great: 45206
sum of fantastic: 932
sum of amazing: 1363
sum of love: 42065
sum of horrible: 734
sum of bad: 3724
sum of terrible: 748
sum of awful: 383
sum of wow: 144
sum of hate: 1220


### 2. Create a new sentiment analysis model using only the selected_words as features: 

In the IPython Notebook above, we used word counts for all words as features for our sentiment classifier. Now, you are just going to use the selected_words:

Use the same train/test split as in the IPython Notebook from lecture:

In [13]:
# original data
len(products)

183531

In [14]:
# create sentiment column

products = products[products['rating'] != 3]
products['sentiment'] = products['rating'] >= 4

In [15]:
# reduced data
len(products)

166752

In [16]:
# train, test split
train_data, test_data = products.random_split(.8, seed=0)

Train a logistic regression classifier (use graphlab.logistic_classifier.create) using just the selected_words. Hint: you can use this parameter in the .create() call to specify the features used to be exactly the new columns you just created:

In [17]:
# features=selected_words
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

selected_words_model = graphlab.logistic_classifier.create(train_data,
                                                           target='sentiment',
                                                           features=selected_words,
                                                           validation_set=test_data)

Call your new model: selected_words_model.

You will now examine the weights the learned classifier assigned to each of the 11 words in selected_words and gain intuition as to what the ML algorithm did for your data using these features. In GraphLab Create, a learned model, such as the selected_words_model, has a field 'coefficients', which lets you look at the learned coefficients. You can access it by using:

In [19]:
selected_words_model['coefficients']

name,index,class,value,stderr
(intercept),,1,1.36728315229,0.00861805467824
awesome,,1,1.05800888878,0.110865296265
great,,1,0.883937894898,0.0217379527921
fantastic,,1,0.891303090304,0.154532343591
amazing,,1,0.892802422508,0.127989503231
love,,1,1.39989834302,0.0287147460124
horrible,,1,-1.99651800559,0.0973584169028
bad,,1,-0.985827369929,0.0433603009142
terrible,,1,-2.09049998487,0.0967241912229
awful,,1,-1.76469955631,0.134679803365


The result has a column called ‘value’, which contains the weight learned for each feature.
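For intuition about what these weights mean, the standard logistic regression form can be written out. This assumes the model follows the usual textbook definition, with $w_0$ the intercept, $w_j$ the weight learned for the $j$-th selected word, and $x_j$ the count of that word in the review:

```latex
P(\text{sentiment} = 1 \mid x) = \frac{1}{1 + \exp\!\left(-\left(w_0 + \sum_{j=1}^{11} w_j x_j\right)\right)}
```

Under this form, each occurrence of a word with a positive weight pushes the predicted probability toward 1, and each occurrence of a word with a negative weight pushes it toward 0.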

Using this approach, sort the learned coefficients according to the ‘value’ column using .sort(). Out of the 11 words in selected_words, which one got the most positive weight? Which one got the most negative weight? Do these values make sense to you? **Save these results to answer the quiz at the end.**

In [45]:
selected_words_model['coefficients'].sort('value', ascending=False)

name,index,class,value,stderr
love,,1,1.39989834302,0.0287147460124
(intercept),,1,1.36728315229,0.00861805467824
awesome,,1,1.05800888878,0.110865296265
amazing,,1,0.892802422508,0.127989503231
fantastic,,1,0.891303090304,0.154532343591
great,,1,0.883937894898,0.0217379527921
wow,,1,-0.0541450123333,0.275616449416
bad,,1,-0.985827369929,0.0433603009142
hate,,1,-1.40916406276,0.0771983993506
awful,,1,-1.76469955631,0.134679803365


In [51]:
selected_words_model['coefficients'].sort('value', ascending=False).print_rows(12)

+-------------+-------+-------+------------------+------------------+
|     name    | index | class |      value       |      stderr      |
+-------------+-------+-------+------------------+------------------+
|     love    |  None |   1   |  1.39989834302   | 0.0287147460124  |
| (intercept) |  None |   1   |  1.36728315229   | 0.00861805467824 |
|   awesome   |  None |   1   |  1.05800888878   |  0.110865296265  |
|   amazing   |  None |   1   |  0.892802422508  |  0.127989503231  |
|  fantastic  |  None |   1   |  0.891303090304  |  0.154532343591  |
|    great    |  None |   1   |  0.883937894898  | 0.0217379527921  |
|     wow     |  None |   1   | -0.0541450123333 |  0.275616449416  |
|     bad     |  None |   1   | -0.985827369929  | 0.0433603009142  |
|     hate    |  None |   1   |  -1.40916406276  | 0.0771983993506  |
|    awful    |  None |   1   |  -1.76469955631  |  0.134679803365  |
|   horrible  |  None |   1   |  -1.99651800559  | 0.0973584169028  |
|   terrible  |  Non

### 3. Comparing the accuracy of different sentiment analysis models:
Use the .evaluate() method on the test_data to answer the following questions.

What is the accuracy of the selected_words_model on the test_data? What was the accuracy of the sentiment_model that we learned using all the word counts in the IPython Notebook above from the lectures? What is the accuracy of the majority class classifier on this task? How do the different learned models compare with the baseline approach where we are just predicting the majority class? **Save these results to answer the quiz at the end.**

Hint: we discussed the majority class classifier in lecture, which simply predicts that every data point is from the most common class. This baseline is something we definitely want to beat with models we learn from data.
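A quick way to get the majority class baseline is from the class counts alone. The sketch below uses the positive and negative counts that the evaluate output reports for test_data (the p and n columns of the ROC table, 27976 and 5328). The helper name `majority_class_accuracy` is hypothetical.

```python
def majority_class_accuracy(num_positive, num_negative):
    """Accuracy of always predicting whichever class is more common."""
    total = num_positive + num_negative
    return max(num_positive, num_negative) / float(total)


# Class counts for test_data, taken from the p and n columns of the
# roc_curve table printed by evaluate().
baseline = majority_class_accuracy(27976, 5328)
print(round(baseline, 4))
```

Any learned model should be judged against this number rather than against 0.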

In [23]:
# evaluation of selected_words_model

selected_words_model.evaluate(test_data)

{'accuracy': 0.8431419649291376,
 'auc': 0.6648096413721418,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  234  |
 |      0       |        1        |  5094 |
 |      1       |        1        | 27846 |
 |      1       |        0        |  130  |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.914242563530107,
 'log_loss': 0.4054747110365649,
 'precision': 0.8453551912568306,
 'recall': 0.9953531598513011,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+-------+------+
 | threshold | fpr | tpr |   p   |  n   |
 +-----------+-----+-----+-------+------+
 |    0.0    | 1.0 | 1.0 | 27976 | 5328 |
 |   1e-05   | 1.0 | 1.0 | 27976 | 5328 |
 |   2e-05   | 

In [27]:
# evaluation of sentiment_model

sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                      target='sentiment',
                                                      features=['word_count'],
                                                      validation_set=test_data)

sentiment_model.evaluate(test_data, metric='accuracy')

{'accuracy': 0.916256305548883}

In [28]:
# majority class ( 1 ) & ratio in full data

products['sentiment'].sum() / len(products)

0.8411233448474381

In [43]:
# majority class ( 1 ) & ratio in test data

test_data['sentiment'].sum() / len(test_data)

0.8400192169108815
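The summary metrics printed by evaluate can be re-derived from the four confusion matrix counts, which is a useful check on how to read them. This sketch uses the standard definitions of accuracy, precision, recall, and F1 with class 1 as the positive class; the variable names are illustrative.

```python
# Counts copied from the confusion matrix of selected_words_model above.
true_neg = 234     # target 0, predicted 0
false_pos = 5094   # target 0, predicted 1
true_pos = 27846   # target 1, predicted 1
false_neg = 130    # target 1, predicted 0

total = float(true_neg + false_pos + true_pos + false_neg)

accuracy = (true_pos + true_neg) / total
precision = true_pos / float(true_pos + false_pos)
recall = true_pos / float(true_pos + false_neg)
f1_score = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1_score)
```

The counts also show that 5094 of the 5328 negative reviews are predicted positive, which is why the accuracy of selected_words_model sits so close to the majority class ratio.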

### 4. Interpreting the difference in performance between the models:
To understand why the model with all word counts performs better than the one with only the selected_words, we will now examine the reviews for a particular product.

We will investigate a product named ‘Baby Trend Diaper Champ’. (This is a trash can for soiled baby diapers, which keeps the smell contained.)

Just like we did for the reviews for the giraffe toy in the IPython Notebook in the lecture video, before we start our analysis you should select all reviews where the product name is ‘Baby Trend Diaper Champ’. Let’s call this table diaper_champ_reviews.

Again, just as in the video, use the sentiment_model to predict the sentiment of each review in diaper_champ_reviews and sort the results according to their ‘predicted_sentiment’.

What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture? **Save this result to answer the quiz at the end.**

Now use the selected_words_model you learned using just the selected_words to predict the sentiment of the most positive review you found above. Hint: if you sorted the diaper_champ_reviews in descending order (from most positive to most negative), the review you need is the first row, so you can pass diaper_champ_reviews[0:1] to the model’s .predict() method.

In [29]:
# create diaper_champ_reviews table with product named 'Baby Trend Diaper Champ'
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

In [30]:
diaper_champ_reviews.head(3)

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'son': 1, 'just': 2, 'less': 1, '-': 3, ...",0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'material)': 1, 'bags,': 1, 'less': 1, 'when': 3, ...",0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'control': 1, 'am': 1, 'it': 1, 'used': 1, ' ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment
0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,1


In [31]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews,
                                                                     output_type='probability')

In [32]:
diaper_champ_reviews.head(3)

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Ok - newsflash. Diapers are just smelly. We've ...,4.0,"{'son': 1, 'just': 2, 'less': 1, '-': 3, ...",0,0,0
Baby Trend Diaper Champ,"My husband and I selected the Diaper ""Champ"" ma ...",1.0,"{'material)': 1, 'bags,': 1, 'less': 1, 'when': 3, ...",0,0,0
Baby Trend Diaper Champ,Excellent diaper disposal unit. I used it in ...,5.0,"{'control': 1, 'am': 1, 'it': 1, 'used': 1, ' ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,0,0,0,0,0,0,0,1,0.958443580893
0,0,0,0,0,0,0,0,0,2.47155884995e-12
0,0,0,0,0,0,0,0,1,0.999994864775


In [34]:
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)

In [35]:
diaper_champ_reviews.head(3)

name,review,rating,word_count,awesome,great,fantastic
Baby Trend Diaper Champ,Baby Luke can turn a clean diaper to a dirty ...,5.0,"{'all': 1, 'less': 1, ""friend's"": 1, '(which': ...",0,0,0
Baby Trend Diaper Champ,I LOOOVE this diaper pail! Its the easies ...,5.0,"{'just': 1, 'over': 1, 'rweek': 1, 'sooo': 1, ...",0,0,0
Baby Trend Diaper Champ,We researched all of the different types of di ...,4.0,"{'all': 2, 'just': 4, ""don't"": 2, 'one,': 1, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate,sentiment,predicted_sentiment
0,0,0,0,0,0,0,0,1,0.999999937267
0,1,0,0,0,0,0,0,1,0.999999917406
0,0,0,1,0,0,0,0,1,0.999999899509


In [36]:
# the most positive review for sentiment_model
diaper_champ_reviews['review'][0]

'Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend\'s house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn\'t any added bulk around the sealing edge of the champ.'

In [42]:
# predict probability of above review using selected_words_model
selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')

dtype: float
Rows: 1
[0.7969408512906712]

Why is the predicted_sentiment for the most positive review found using the model with all word counts (sentiment_model) much more positive than the one using only the selected_words (selected_words_model)? Hint: examine the text of this review, the extracted word counts for all words, and the word counts for each of the selected_words, and you will see what each model used to make its prediction. **Save this result to answer the quiz at the end.**
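One way to check the selected_words_model prediction by hand: every selected-word column for this review is 0 in the table above, so, assuming the standard logistic regression form, the predicted probability depends only on the intercept. The sketch below plugs the intercept from the coefficients table into the sigmoid function. Note that `sigmoid` is a helper defined here, not a GraphLab Create call.

```python
import math


def sigmoid(score):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Intercept copied from selected_words_model['coefficients'].
intercept = 1.36728315229

# All 11 selected-word counts are 0 for this review, so the score is
# just the intercept.
probability = sigmoid(intercept)
print(round(probability, 3))
```

The result agrees with the probability the model returned for this review. The review expresses its enthusiasm with words such as ‘fabulous’ and ‘loved’, which are not in selected_words, so the smaller model has nothing to go on beyond its default prediction.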