In [53]:
import graphlab as gl

# 1. Task: Predicting sentiment from product reviews


The **goal** of this task is to determine whether a particular review expresses positive or negative sentiment.

**Input**: Raw text blob of review data
```
My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure. 
```

**Output**: Positive!

# 2. Getting access to data

In [None]:
!head -n 2 ../data/yelp/yelp_training_set_review.json
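The review file stores one JSON object per line. As a rough illustration of that layout, independent of GraphLab, here is a minimal sketch that parses such lines with the standard library. The helper name `read_json_lines` and the two sample records are made up for this example.

```python
import io
import json


def read_json_lines(handle):
    """Yield one dict per non-empty line of a JSON-lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two invented records shaped like the Yelp reviews (values are illustrative only).
sample = io.StringIO(
    u'{"stars": 5, "text": "Great breakfast.", "votes": {"funny": 0, "useful": 5, "cool": 2}}\n'
    u'{"stars": 2, "text": "Soggy pizza.", "votes": {"funny": 1, "useful": 3, "cool": 4}}\n'
)

records = list(read_json_lines(sample))
print(records[0]["votes"]["useful"])
```

Each record comes back as a nested dictionary, which is why the SFrame loaded in the next cell starts out as a single dict-typed column.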

In [33]:
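# Each line of the file is one JSON object; with header=False it is read into a single dict column named X1.
reviews = gl.SFrame.read_csv('../data/yelp/yelp_training_set_review.json', header=False)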
reviews

------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[dict]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


X1
"{'votes': {'funny': 0, 'useful': 5, 'cool': 2}, ..."
"{'votes': {'funny': 0, 'useful': 0, 'cool': 0}, ..."
"{'votes': {'funny': 0, 'useful': 1, 'cool': 0}, ..."
"{'votes': {'funny': 0, 'useful': 2, 'cool': 1}, ..."
"{'votes': {'funny': 0, 'useful': 0, 'cool': 0}, ..."
"{'votes': {'funny': 1, 'useful': 3, 'cool': 4}, ..."
"{'votes': {'funny': 4, 'useful': 7, 'cool': 7}, ..."
"{'votes': {'funny': 0, 'useful': 1, 'cool': 0}, ..."
"{'votes': {'funny': 0, 'useful': 0, 'cool': 0}, ..."
"{'votes': {'funny': 0, 'useful': 1, 'cool': 0}, ..."


In [34]:
reviews[0]

{'X1': {'business_id': '9yKzy9PApeiPPOUJEtnvkg',
  'date': '2011-01-26',
  'review_id': 'fWKvX83p0-ka4JS3dc6E5A',
  'stars': 5,
  'text': 'My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure.  Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning.  It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary.  It was phenomenal and simply the best I\'ve ever had.  I\'m pretty sure they only use ingredients from their garden and blend them fresh when you order it.  It was amazing.\n\nWhile EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious.  It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete.  It was the best "toast" I\'ve ever had

### Unpack to extract structure

In [35]:
# Expand the dict column X1 into one column per key; '' means no column-name prefix.
reviews = reviews.unpack('X1', '')
reviews

business_id,date,review_id,stars,text,type
9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for break ...,review
ZRJwVLyzEJq1VAihDhYiow,2011-07-27,IjZ33sJrzXqU-0X6U8NwyA,5,I have no idea why some people give bad reviews ...,review
6oRAC4uyJCsJl1X0WZpVSA,2012-06-14,IESLBzqUCLdSzSqm0eCSxQ,4,love the gyro plate. Rice is so good and I also ...,review
_1QQZuf4zZOyFCvXc0o6Vg,2010-05-27,G-WvGaISbqqaMHlNnByodA,5,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!! ...",review
6ozycU1RpktNG2-1BroVtw,2012-01-05,1uJFq2r5QfJG_6ExMRCaGw,5,General Manager Scott Petello is a good egg!!! ...,review
-yxfBYGB6SEqszmxJxd97A,2007-12-13,m2CKSsepBCoRYWxiRUsxAg,4,"Quiessence is, simply put, beautiful. Full ...",review
zp713qNhx8d9KCJJnrw1xA,2010-02-12,riFQ3vxNpP4rWLk_CSri2A,5,Drop what you're doing and drive here. After I ...,review
hW0Ne_HTHEAgGF1rAdmR-g,2012-07-12,JL7GXJ9u4YMx7Rzs05NfiQ,4,"Luckily, I didn't have to travel far to make my ...",review
wNUea3IXZWD63bbOQaOH-g,2012-08-17,XtnfnYmnJYi71yIuGsXIUA,4,Definitely come for Happy hour! Prices are amaz ...,review
nMHhuYan8e3cONo3PornJA,2010-08-11,jJAIXA46pU1swYyRCdfXtQ,5,Nobuo shows his unique talents with everything ...,review

user_id,votes
rLtl8ZkDX5vH5nAx9C3q5Q,"{'funny': 0, 'useful': 5, 'cool': 2} ..."
0a2KyEL0d3Yb1V6aivbIuQ,"{'funny': 0, 'useful': 0, 'cool': 0} ..."
0hT2KtfLiobPvh6cDC8JQg,"{'funny': 0, 'useful': 1, 'cool': 0} ..."
uZetl9T0NcROGOyFfughhg,"{'funny': 0, 'useful': 2, 'cool': 1} ..."
vYmM4KTsC8ZfQBg-j5MWkw,"{'funny': 0, 'useful': 0, 'cool': 0} ..."
sqYN3lNgvPbPCTRsMFu27g,"{'funny': 1, 'useful': 3, 'cool': 4} ..."
wFweIWhv2fREZV_dYkz_1g,"{'funny': 4, 'useful': 7, 'cool': 7} ..."
1ieuYcKS7zeAv_U15AB13A,"{'funny': 0, 'useful': 1, 'cool': 0} ..."
Vh_DlizgGhSqQh4qfZ2h6A,"{'funny': 0, 'useful': 0, 'cool': 0} ..."
sUNkXg8-KFtCMQDV6zRzQg,"{'funny': 0, 'useful': 1, 'cool': 0} ..."


### Votes are still crammed in a dictionary. Let's unpack it.

In [36]:
reviews = reviews.unpack('votes', '')
reviews

business_id,date,review_id,stars,text,type
9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for break ...,review
ZRJwVLyzEJq1VAihDhYiow,2011-07-27,IjZ33sJrzXqU-0X6U8NwyA,5,I have no idea why some people give bad reviews ...,review
6oRAC4uyJCsJl1X0WZpVSA,2012-06-14,IESLBzqUCLdSzSqm0eCSxQ,4,love the gyro plate. Rice is so good and I also ...,review
_1QQZuf4zZOyFCvXc0o6Vg,2010-05-27,G-WvGaISbqqaMHlNnByodA,5,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!! ...",review
6ozycU1RpktNG2-1BroVtw,2012-01-05,1uJFq2r5QfJG_6ExMRCaGw,5,General Manager Scott Petello is a good egg!!! ...,review
-yxfBYGB6SEqszmxJxd97A,2007-12-13,m2CKSsepBCoRYWxiRUsxAg,4,"Quiessence is, simply put, beautiful. Full ...",review
zp713qNhx8d9KCJJnrw1xA,2010-02-12,riFQ3vxNpP4rWLk_CSri2A,5,Drop what you're doing and drive here. After I ...,review
hW0Ne_HTHEAgGF1rAdmR-g,2012-07-12,JL7GXJ9u4YMx7Rzs05NfiQ,4,"Luckily, I didn't have to travel far to make my ...",review
wNUea3IXZWD63bbOQaOH-g,2012-08-17,XtnfnYmnJYi71yIuGsXIUA,4,Definitely come for Happy hour! Prices are amaz ...,review
nMHhuYan8e3cONo3PornJA,2010-08-11,jJAIXA46pU1swYyRCdfXtQ,5,Nobuo shows his unique talents with everything ...,review

user_id,cool,funny,useful
rLtl8ZkDX5vH5nAx9C3q5Q,2,0,5
0a2KyEL0d3Yb1V6aivbIuQ,0,0,0
0hT2KtfLiobPvh6cDC8JQg,0,0,1
uZetl9T0NcROGOyFfughhg,1,0,2
vYmM4KTsC8ZfQBg-j5MWkw,0,0,0
sqYN3lNgvPbPCTRsMFu27g,4,1,3
wFweIWhv2fREZV_dYkz_1g,7,4,7
1ieuYcKS7zeAv_U15AB13A,0,0,1
Vh_DlizgGhSqQh4qfZ2h6A,0,0,0
sUNkXg8-KFtCMQDV6zRzQg,0,0,1
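To make the two unpacking steps concrete, here is a plain-Python sketch of the same idea applied to a single row. It is not GraphLab's implementation; the function `unpack` below is a hypothetical stand-in that only mimics the effect of `unpack(column, '')` on one dictionary.

```python
def unpack(row, column):
    """Return a copy of `row` with the dict stored under `column`
    expanded into top-level keys (no prefix added to the key names)."""
    flat = {key: value for key, value in row.items() if key != column}
    flat.update(row[column])
    return flat


# One invented row in the same nested shape as the raw data.
row = {"X1": {"stars": 5, "votes": {"funny": 0, "useful": 5, "cool": 2}}}

row = unpack(row, "X1")      # first pass: X1 becomes stars, votes
row = unpack(row, "votes")   # second pass: votes becomes funny, useful, cool
print(sorted(row.items()))
```

After both passes every value sits in its own flat column, which is the shape shown in the table above.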


### Quick data visualization

In [37]:
reviews.show()

Canvas is accessible via web browser at the URL: http://localhost:52845/index.html
Opening Canvas in default web browser.


In [None]:
gl.canvas.set_target('ipynb')

# 3. Problem formulation

### Define what's a positive and a negative sentiment

We ignore all reviews with a rating of 3, since they tend to express neutral sentiment. Reviews rated 4 or higher are treated as positive, and reviews rated 2 or lower as negative.

In [57]:
reviews['stars'].show(view = 'Categorical')

In [42]:
# Ignore all 3-star reviews
reviews = reviews[reviews['stars'] != 3]

In [44]:
# Positive sentiment = 4-star or 5-star reviews
reviews['sentiment'] = reviews['stars'] >= 4

In [56]:
reviews['sentiment'].show(view = 'Categorical')
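The labeling rule can also be written as a small standalone function. This is only a sketch of the rule described above; the name `label_sentiment` and the sample ratings are invented for illustration.

```python
def label_sentiment(stars):
    """Return 1 for 4-5 stars, 0 for 1-2 stars, and None for neutral 3-star reviews."""
    if stars == 3:
        return None
    return 1 if stars >= 4 else 0


ratings = [5, 3, 1, 4, 2]  # invented star ratings

# Drop the neutral reviews, then keep the binary label for the rest.
labels = [label_sentiment(s) for s in ratings if label_sentiment(s) is not None]
print(labels)
```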

# 4. Feature engineering

The goal is to convert raw text of the following form into numeric features that a machine learning model can use.

```
'My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure.  Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning.  It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary.  It was phenomenal and simply the best I\'ve ever had.  I\'m pretty sure they only use ingredients from their garden and blend them fresh when you order it.  It was amazing.\n\nWhile EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious.  It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete.  It was the best "toast" I\'ve ever had.\n\nAnyway, I can\'t wait to go back!',
```
  

In [40]:
reviews['word_count'] = gl.text_analytics.count_words(reviews['text'])

In [46]:
reviews['word_count']

dtype: dict
Rows: 194544
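Each entry of `word_count` is a dictionary mapping a word to the number of times it occurs in the review. Judging by the keys that appear later in this notebook (for example `'nope.'` and `"don't"`), the text seems to be lowercased and split on whitespace, with punctuation left attached. Under that assumption, a rough plain-Python equivalent looks like this. It is a sketch, not GraphLab's actual code.

```python
from collections import Counter


def count_words(text):
    """Bag-of-words: lowercase the text, split on whitespace, count each token."""
    return dict(Counter(text.lower().split()))


counts = count_words("Great pizza! Great food!")
print(counts)
```

Note that `pizza!` and `pizza` would be counted as different words under this scheme.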

# 5. Model/Algorithm selection & training

Finally, we are ready to train a model.

In [47]:
train_data, test_data = reviews.random_split(.8, seed=0)
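The split is random rather than exact: the evaluation output further down reports 31225 + 7605 = 38830 test rows out of 194544, which is close to but not exactly 20%. One way to get that behavior is to assign each row independently with a seeded random draw. The sketch below assumes that mechanism; it is not taken from GraphLab's source, and the function name simply mirrors the call above.

```python
import random


def random_split(rows, fraction, seed=0):
    """Send each row to the first list with probability `fraction`, else to the second."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))  # stand-in for the review rows
train, test = random_split(rows, 0.8, seed=0)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so reruns of the notebook train and test on the same rows.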

In [49]:
# test_data is also passed as validation_set, so training progress is reported on it.
sentiment_model = gl.logistic_classifier.create(train_data,
                                                target='sentiment',
                                                features=['word_count'],
                                                validation_set=test_data)
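Conceptually, a logistic classifier on word counts learns one weight per word, adds up weight times count for the words in a review, and squashes the total into a probability with the sigmoid function. The sketch below shows only that prediction step. The weights are invented for illustration and are not the ones the trained model learns.

```python
import math


def predict_probability(word_count, weights, intercept=0.0):
    """Probability of positive sentiment under a logistic model on word counts."""
    score = intercept + sum(weights.get(word, 0.0) * count
                            for word, count in word_count.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: positive words push the score up, negative words down.
weights = {"great": 1.2, "excellent": 1.5, "disappointed": -2.0}

p_good = predict_probability({"great": 2, "pizza!": 1}, weights)
p_bad = predict_probability({"disappointed": 1, "was": 1}, weights)
print(p_good, p_bad)
```

Words without a weight contribute nothing to the score, so a review made up entirely of unseen words gets a probability of 0.5 when the intercept is zero.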

# 6a. Evaluate the model (Quantitatively)

In [50]:
sentiment_model.evaluate(test_data, metric='roc_curve')

{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+----------------+----------------+-------+------+
 | threshold |      fpr       |      tpr       |   p   |  n   |
 +-----------+----------------+----------------+-------+------+
 |    0.0    |      1.0       |      1.0       | 31225 | 7605 |
 |   1e-05   | 0.804996712689 | 0.999007205765 | 31225 | 7605 |
 |   2e-05   | 0.786982248521 | 0.99871897518  | 31225 | 7605 |
 |   3e-05   | 0.776199868508 | 0.998622898319 | 31225 | 7605 |
 |   4e-05   | 0.768047337278 | 0.998558847078 | 31225 | 7605 |
 |   5e-05   | 0.758579881657 | 0.998462770216 | 31225 | 7605 |
 |   6e-05   | 0.753320184089 | 0.998366693355 | 31225 | 7605 |
 |   7e-05   | 0.747797501644 | 0.998302642114 | 31225 | 7605 |
 |   8e-05   | 0.742932281394 | 0.998238590873 | 31225 | 7605 |
 |   9e-05   | 0.738067061144 | 0.998174539632 | 31225 | 7605 |
 +-----------+----------------+----------------+-------+------+

In [54]:
sentiment_model.show(view='Evaluation')
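Each row of the ROC table corresponds to one probability threshold: reviews scoring at or above the threshold are predicted positive, `tpr` is the fraction of the `p` truly positive reviews that are caught, and `fpr` is the fraction of the `n` truly negative reviews that are wrongly flagged. Here is a small sketch of that calculation on toy data. The use of `>=` for the comparison is an assumption, chosen because it reproduces the first row of the table (threshold 0.0 gives fpr = tpr = 1.0).

```python
def roc_point(labels, scores, threshold):
    """Return (fpr, tpr) when scores >= threshold are predicted positive."""
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return fp / float(n), tp / float(p)


# Invented labels and predicted probabilities for five reviews.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.1]

at_zero = roc_point(labels, scores, 0.0)
at_half = roc_point(labels, scores, 0.5)
print(at_zero, at_half)
```

Sweeping the threshold from 0 to 1 and plotting the resulting (fpr, tpr) pairs traces out the ROC curve.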

# 6b. Evaluate the model (Qualitatively)

Let us start by picking the most popular restaurant.

In [63]:
most_popular_business = 'VVeogjZya58oiTxK7qUjAQ'

In [65]:
most_popular_business_data = test_data[test_data['business_id'] == most_popular_business]
most_popular_business_data

business_id,date,review_id,stars,text,type
VVeogjZya58oiTxK7qUjAQ,2009-12-06,DtgJ5aFeUSk7bNGY408WPA,4,"Great Pizza! Great food! Great service! Still, ...",review
VVeogjZya58oiTxK7qUjAQ,2011-09-06,YvQe-wF-QA7Db9jyOrHuKg,4,Over Hyped ??... Yeah a little but its defini ...,review
VVeogjZya58oiTxK7qUjAQ,2011-10-09,M02b5MTdhiOneijWtSOFdA,5,Taste buds went CRRRAAZZYYY!!\n\nFinally ...,review
VVeogjZya58oiTxK7qUjAQ,2011-10-03,m3gwH2G0YpYUQG25D5yM7g,5,This place was great. When one of my coworkers ...,review
VVeogjZya58oiTxK7qUjAQ,2012-05-10,83WQ3yROY6olQen0wP718Q,2,Abridged: \nI'm gonna be real with you. I am not ...,review
VVeogjZya58oiTxK7qUjAQ,2012-09-17,J2AWK69TkltyPCZPbVyhfQ,4,I had no idea they were open for lunch! Perfect ...,review
VVeogjZya58oiTxK7qUjAQ,2012-11-18,7bo4oUOM6JjTabn1ly_5wA,5,We went to Pizzeria Bianco expecting a mega ...,review
VVeogjZya58oiTxK7qUjAQ,2008-07-26,RD80sDmOHHKXasKbhZlqpg,4,An excellent pizza and well worth its place on ...,review
VVeogjZya58oiTxK7qUjAQ,2009-02-23,ynv0auoM3nhbWhGqUrQ9Gg,4,I had a trip to Phoenix a few month ago. I did ...,review
VVeogjZya58oiTxK7qUjAQ,2010-04-05,mw2JUSGuEgDinvk6OsZXJg,2,I was incredibly excited about eating here. I ...,review

user_id,cool,funny,useful,word_count,sentiment
UuwjD6MZf6Z6QlNphiXRjA,3,1,6,"{'earlier),': 1, 'nope.': 1, 'just': 2, ""don't"" ...",1
M3R4oIrJaHDDHbhbzwCLrA,0,0,1,"{'and': 3, 'right': 1, 'particularly': 1, ...",1
Xz1w0h7wDI22IZKi-CnrHA,0,0,0,"{'and': 4, 'the': 3, 'all': 1, 'ordered': 1, ...",1
_RN13rQ1c77hAi9LU7nwnA,2,0,0,"{'saying': 1, 'all': 1, ""don't"": 2, '-': 9, ...",1
7zDqr2I0-xpw9HF5Ha54cA,1,1,5,"{'opinions': 1, 'least': 1, 'cried': 1, 'rating': ...",0
_uL7OiQSfNsCd60DrAf7qQ,1,0,1,"{'just': 1, 'less': 1, 'sooo': 1, 'still': 1, ...",1
eyAXtSO0ECNBC5VGuoigaA,0,0,0,"{'all': 1, 'combination.': 1, ...",1
jJOeu8snPhHtz8jdFZ62VQ,1,0,1,"{'and': 2, 'irish': 1, 'do': 2, 'lists.': 1, ...",1
SIJ237q2EZE-1Z8zFQYJgQ,0,0,0,"{'and': 1, 'all': 1, 'is': 1, 'some': 1, ' ...",1
GQmw4SHwhKm8-F-zx804NA,0,0,2,"{'atmosphere': 1, 'just': 1, 'being': 1, 'all': 2, ...",0


## Sort the reviews based on the predicted sentiment and explore

In [71]:
most_popular_business_data['predictions'] = sentiment_model.predict(most_popular_business_data,
                                               output_type = 'probability')

In [72]:
most_popular_business_data = most_popular_business_data.sort('predictions')

## Explore some reviews with very negative predicted sentiment

In [77]:
print(most_popular_business_data['text'][1])

We finally bit the bullet and decided to wait for dinner at Pizzeria Bianco. I was disappointed. No, I was angry at how bad an experience it was.

I won't be so naive as to complain about the wait, we went knowing it would be a long wait, and arriving at 4:45, we were told it would be 3.5 to 4 hours. No big surprise there. Luckily we were actually sat at 7:45, so that was nice. 

The market salad was great, a super light vinaigrette, super fresh escarole greens and nicely shaved parmigiano reggiano. The Antipasto was amazing as well, with roasted vegetables and olives and sopressata. So far I was very pleased with my meal.

Then came the pizza. We ordered the Wiseguy, and added mushrooms and garlic.

Now I've had thin crust pizza cooked in a wood fired oven before, but I have NEVER had it be not only undercooked in the center but completely sopping wet. We're talking grab the crust and pull a slice to your place and only get 2 inches of actual pizza, with the rest of the mush still in 

## Explore some reviews with very positive predicted sentiment

In [81]:
print(most_popular_business_data['text'][-2])

I can now say I have reached the pinnacle of pizza perfection.

Pizza has always been one of my favorite foods and has recently become one of my obsessions after visiting Naples this summer. I have found some amazing people within the states most notably in the Bay Area, Southern California and Chicago, but was simply blown away by the perfection that is Pizzeria Bianco.

I came here on a Friday afternoon and there was no wait for a table of five. We sat down and were super excited to sample some of the best pizza in America.

I started off with a salad consisting of fresh mozzarella, basil, olive oil and tomatoes. It is probably impossible to make that list of ingredients taste bad and this salad did not disappoint. Everything worked perfectly together. The salad was light, so fresh and so damn good.

I had to order the margherita pizza because that's the best way to judge a place I feel. The pizza came out piping hot and it was absolutely beautiful. The crust was for sure a clear hig

In [None]:
# 7. Deployment

