# 4. Classification: Predicting the sentiment of product reviews

## Dataset
`amazon_baby.csv` contains reviews of baby products sold on Amazon. The goal of this case study is to build a sentiment classifier that predicts whether a product review is positive or negative.

The dataset contains the following fields:
1. `review`: Text of the review written by a user
2. `name`: Product name
3. `rating`: Rating from 1 to 5

Many techniques exist today for sentiment analysis on text, but for this case study we will base the analysis on word counts.

For example, the review "the sushi was good and the service was excellent" would produce the following word counts:
"the": 2
"sushi": 1
"was": 2
"good": 1
"and": 1
"service": 1
"excellent": 1

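The word count above can be reproduced with the standard library alone. This is a minimal sketch that splits on whitespace; it is only meant to illustrate the idea, not the tokenization that scikit-learn applies.

```python
from collections import Counter

review = "the sushi was good and the service was excellent"

# Split on whitespace and count how often each word appears.
word_counts = Counter(review.split())

for word, count in word_counts.items():
    print(f'"{word}": {count}')
```

Each distinct word becomes one feature, and its count is the feature value for this review.
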
1. Use `CountVectorizer`, found in `sklearn.feature_extraction.text`, to obtain the features for your model:
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

2. Now use `TfidfVectorizer` to obtain the features for your model and compare it against the previous one:
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
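
To see how TF-IDF weights differ from raw counts, the sketch below computes them by hand on a toy corpus. The corpus and the helper name `tfidf` are hypothetical. The formula is the smoothed one described in the scikit-learn documentation, `idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization. Tokenization here is a plain whitespace split, so this is an illustration of the weighting rather than a replacement for `TfidfVectorizer`.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, not taken from amazon_baby.csv).
docs = [
    "the stroller is great",
    "the stroller is terrible",
    "great great bottle",
]

def tfidf(docs):
    """Return one {word: weight} dict per document (smoothed idf, L2-normalized)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw count times smoothed inverse document frequency.
        raw = {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in counts.items()}
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(v * v for v in raw.values()))
        rows.append({w: v / norm for w, v in raw.items()})
    return rows

weights = tfidf(docs)
for doc, row in zip(docs, weights):
    print(doc, {w: round(v, 3) for w, v in row.items()})
```

Words that appear in few documents, such as "terrible", receive a larger weight than words shared across documents, such as "the", even when both occur once in a review.
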

In [61]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import string
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

In [43]:
data = pd.read_csv('amazon_baby.csv')
data.head()

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5


In [44]:
data.describe()

Unnamed: 0,rating
count,183531.0
mean,4.120448
std,1.285017
min,1.0
25%,4.0
50%,5.0
75%,5.0
max,5.0


In [45]:
data['review'][3] 

'This is a product well worth the purchase.  I have not found anything else like this, and it is a positive, ingenious approach to losing the binky.  What I love most about this product is how much ownership my daughter has in getting rid of the binky.  She is so proud of herself, and loves her little fairy.  I love the artwork, the chart in the back, and the clever approach of this tool.'

In [46]:
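import re
# regex=True is required in pandas >= 2.0, where str.replace treats the pattern literally by default
data['review1'] = data['review'].str.replace('[{}]'.format(re.escape(string.punctuation)), '', regex=True)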
data.head()

Unnamed: 0,name,review,rating,review1
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,These flannel wipes are OK but in my opinion n...
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...


In [47]:
data[['review', 'review1']].head()

Unnamed: 0,review,review1
0,"These flannel wipes are OK, but in my opinion ...",These flannel wipes are OK but in my opinion n...
1,it came early and was not disappointed. i love...,it came early and was not disappointed i love ...
2,Very soft and comfortable and warmer than it l...,Very soft and comfortable and warmer than it l...
3,This is a product well worth the purchase. I ...,This is a product well worth the purchase I h...
4,All of my kids have cried non-stop when I trie...,All of my kids have cried nonstop when I tried...


In [48]:
data['review'][3]

'This is a product well worth the purchase.  I have not found anything else like this, and it is a positive, ingenious approach to losing the binky.  What I love most about this product is how much ownership my daughter has in getting rid of the binky.  She is so proud of herself, and loves her little fairy.  I love the artwork, the chart in the back, and the clever approach of this tool.'

In [49]:
data['review1'][3]

'This is a product well worth the purchase  I have not found anything else like this and it is a positive ingenious approach to losing the binky  What I love most about this product is how much ownership my daughter has in getting rid of the binky  She is so proud of herself and loves her little fairy  I love the artwork the chart in the back and the clever approach of this tool'

In [50]:
data.isnull().sum()

name       318
review     829
rating       0
review1    829
dtype: int64

In [51]:
data = data[data['rating'] != 3]
data.head()

Unnamed: 0,name,review,rating,review1
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...


In [52]:
data.isnull().sum()

name       296
review     777
rating       0
review1    777
dtype: int64

Fill missing reviews with an empty string

In [53]:
data = data.fillna({'review':''})  
data = data.fillna({'review1':''})

In [54]:
data.isnull().sum()

name       296
review       0
rating       0
review1      0
dtype: int64

In [55]:
# Label is 1 (positive) if rating > 3, else 0 (negative); 3-star reviews were removed earlier
data['a_sent'] = data['rating'].apply(lambda rating : +1 if rating > 3 else 0)

In [56]:
data.sample(5)

Unnamed: 0,name,review,rating,review1,a_sent
21654,Nautica Kids William 4 Piece Crib Set,When I saw this online I loved it. The colors ...,2,When I saw this online I loved it The colors a...,0
37260,"The First Years Spinning Drying Rack, White",Love the new bigger tabs on the ends of the bo...,5,Love the new bigger tabs on the ends of the bo...,1
94204,"Boppy Newborn Lounger, Geo","The boppy lounger has helped with twins, expec...",5,The boppy lounger has helped with twins expeci...,1
133441,"Boon Catch Bowl with Spill Catcher,Blue/Orange",Bought for my grand daughter and it does what ...,4,Bought for my grand daughter and it does what ...,1
156003,"Similac SimplySmart Bottle, 4 Ounce",After sterilizing for a few times in an electr...,1,After sterilizing for a few times in an electr...,0


In [57]:
data.sample(n=10, replace = False, random_state=5)

Unnamed: 0,name,review,rating,review1,a_sent
80203,"Simple Wishes Hands-Free Breastpump Bra, Pink,...","When I first saw this, I thought it was a gimm...",5,When I first saw this I thought it was a gimmi...,1
55086,"Safety 1st Grip N' Twist Door Knob Cover, 4-C...",Child will figure this out sooner than those c...,1,Child will figure this out sooner than those c...,0
65317,Peekaru Original Fleece Baby Carrier Cover Med...,I hoped to use this from birth to take my wint...,2,I hoped to use this from birth to take my wint...,0
93245,Evenflo Splash Mega Exersaucer,,5,,1
32059,"Sassy Baby 3 Piece Electronic Toy Set, Multipl...",,4,,1
5475,Dr. Brown's Natural Flow Level 2 Wide Neck Ni...,Both of my girls have used Dr Browns bottles a...,4,Both of my girls have used Dr Browns bottles a...,1
121418,Ameda Purely Yours Breast Pump,I received this pump through my insurance. It ...,1,I received this pump through my insurance It i...,0
105613,Fisher-Price Luv U Zoo Crib 'N Go Projector S...,This item is awesome. It definitely gets the ...,5,This item is awesome It definitely gets the a...,1
152568,Munchkin XTRAGUARD 2 Count Dual Action Multi U...,After a long long time.. Kiddo is now &#34;fru...,5,After a long long time Kiddo is now 34frustrat...,1
78352,Recaro Signo Convertible Car Seat Midnight Desert,I have used this with both my kids. It is the ...,5,I have used this with both my kids It is the b...,1
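
One row in the sample above shows `&#34;fru...` turning into `34frustrat...` in `review1`: the review contains an HTML entity, and stripping punctuation leaves its digits behind. A possible fix, sketched below on a hypothetical string, is to decode entities with `html.unescape` before removing punctuation. Whether this matters for the model is untested here.

```python
import html
import string

# Hypothetical text containing an HTML entity (not a row from the dataset).
raw = "He said &#34;wow&#34; non-stop."

# Translation table that deletes every ASCII punctuation character.
table = str.maketrans("", "", string.punctuation)

stripped_only = raw.translate(table)
decoded_then_stripped = html.unescape(raw).translate(table)

print(stripped_only)          # He said 34wow34 nonstop
print(decoded_then_stripped)  # He said wow nonstop
```
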


In [58]:
from sklearn.model_selection import train_test_split

In [60]:
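# Note: this 80/20 split of the full DataFrame is not used below; the models use the split in In [63]
train, test = train_test_split(data, test_size=0.20)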

In [62]:
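# Features: raw review text (the vectorizers' token_pattern ignores punctuation, so review1 is not needed here)
x = data['review']
# Target: binary sentiment label
y = data['a_sent']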

In [63]:
# Default split is 75% train / 25% test; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

In [64]:
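# TF-IDF features: tokens of one or more word characters, English stop words removed
Tfid_vectorizer = TfidfVectorizer(token_pattern=r'\b\w+\b', stop_words='english')

# Fit the vocabulary on the training set only, then reuse it for the test set
X_train = Tfid_vectorizer.fit_transform(X_train)
X_test = Tfid_vectorizer.transform(X_test)

# A large max_iter gives the solver room to converge
clf = LogisticRegression(max_iter=10000000)
sentiment_model = clf.fit(X_train, y_train)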

In [69]:
acc = round(sentiment_model.score(X_train , y_train) * 100, 2)
acc

93.75

In [70]:
acc1 = round(sentiment_model.score(X_test , y_test) * 100, 2)
acc1

92.68

In [71]:
# Raw word-count features, same tokenizer and stop-word settings as the TF-IDF model
count_vectorizer = CountVectorizer(token_pattern=r'\b\w+\b', stop_words='english')

# Re-split with the same random_state so every model sees the same rows
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

X_train = count_vectorizer.fit_transform(X_train)
X_test = count_vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=10000000)
sentiment_model2 = clf.fit(X_train, y_train)

In [72]:
acc2 = round(sentiment_model2.score(X_train, y_train) * 100, 2)
acc2

95.96

In [73]:
acc3 = round(sentiment_model2.score(X_test , y_test) * 100, 2)
acc3

92.42

In [74]:
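# HashingVectorizer is stateless: it hashes tokens to column indices, so fit learns nothing.
# Unlike the two models above, no stop_words are removed here.
hashing_vectorizer = HashingVectorizer(token_pattern=r'\b\w+\b')

X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

X_train = hashing_vectorizer.fit_transform(X_train)
X_test = hashing_vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=10000000)
# Note: this reassigns sentiment_model, replacing the TF-IDF model
sentiment_model = clf.fit(X_train, y_train)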
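
The idea behind feature hashing can be sketched in a few lines. The helper `hashed_counts` and the bucket count of 16 are hypothetical choices for illustration. According to its documentation, `HashingVectorizer` uses MurmurHash3 with 2**20 features by default and also applies sign alternation and L2 normalization. This sketch substitutes MD5 from the standard library and omits those extra steps.

```python
import hashlib
import re

def hashed_counts(text, n_buckets=16):
    """Map each token to one of n_buckets columns without storing a vocabulary."""
    vector = [0] * n_buckets
    for token in re.findall(r"\b\w+\b", text.lower()):
        # Stand-in hash (MD5); the first 4 bytes pick the column index.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_buckets
        vector[index] += 1
    return vector

vec = hashed_counts("the sushi was good and the service was excellent")
print(vec)
```

Because the column index depends only on the token, no vocabulary has to be learned or kept in memory. The trade-off is that different words can collide in the same column, and columns cannot be mapped back to words.
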


In [75]:
acc4 = round(sentiment_model.score(X_train , y_train) * 100, 2)
acc4

92.98

In [76]:
acc5 = round(sentiment_model.score(X_test , y_test) * 100, 2)
acc5

92.5
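
To compare the three feature extractors side by side, the accuracies reported above can be tabulated together with the train/test gap. The numbers are copied from the cell outputs; the dictionary layout is only an illustrative choice.

```python
# Accuracies (%) reported in the cells above: (train, test).
results = {
    "TfidfVectorizer": (93.75, 92.68),
    "CountVectorizer": (95.96, 92.42),
    "HashingVectorizer": (92.98, 92.50),
}

print(f"{'features':<18}{'train':>8}{'test':>8}{'gap':>8}")
for name, (train_acc, test_acc) in results.items():
    gap = train_acc - test_acc
    print(f"{name:<18}{train_acc:>8.2f}{test_acc:>8.2f}{gap:>8.2f}")

# Feature extractor with the highest test accuracy.
best = max(results, key=lambda name: results[name][1])
print("best on test:", best)
```

The test accuracies are within about 0.3 points of each other, while the count-based model shows the largest train/test gap, which may indicate slightly more overfitting. Two caveats apply: the hashing model was trained without stop-word removal, so the comparison is not fully like-for-like, and `data.describe()` shows that at least 75% of ratings are 4 or higher, so a majority-class baseline would probably score high on accuracy as well.
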