# Analyzing product sentiment using Pandas

### What is Sentiment Analysis?

Sentiment analysis is a field within Natural Language Processing (NLP) concerned with identifying and classifying subjective opinions in text. Its tasks range from detecting emotions (e.g., anger, happiness, fear) to recognizing sarcasm and intent (e.g., complaints, feedback, opinions). In its simplest form, sentiment analysis assigns a polarity (positive, negative, or neutral) to a piece of text.


### Project Description:

How do you guess whether a person felt positively or negatively about an experience, just from a short review they wrote?

This project analyzes review sentiment by building a model that predicts a class (positive or negative sentiment) from input features (such as the text of the reviews or user profile information). This notebook uses only the review text.

This is an example of a classification problem.

The notebook trains a logistic regression classifier, measures its accuracy, and finds the most positive and most negative reviews.

## Importing Libraries 


In [43]:
import pandas as pd
import numpy as np 

## Exploring Data

In [44]:
df = pd.read_csv('amazon_baby.csv')

In [45]:
df.shape

(183531, 3)

In [46]:
df.head(20)

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5
6,A Tale of Baby's Days with Peter Rabbit,"Lovely book, it's bound tightly so you may no...",4
7,"Baby Tracker&reg; - Daily Childcare Journal, S...",Perfect for new parents. We were able to keep ...,5
8,"Baby Tracker&reg; - Daily Childcare Journal, S...",A friend of mine pinned this product on Pinter...,5
9,"Baby Tracker&reg; - Daily Childcare Journal, S...",This has been an easy way for my nanny to reco...,4


In [47]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 183531 entries, 0 to 183530
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   name    183213 non-null  object
 1   review  182702 non-null  object
 2   rating  183531 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 4.2+ MB


In [48]:
df.isnull().sum()

name      318
review    829
rating      0
dtype: int64

## Text Cleaning

In [50]:
df = df.fillna({'review':''})

In [51]:
import string

def remove_punctuation(text):
    # Delete every ASCII punctuation character listed in string.punctuation
    return text.translate(str.maketrans('', '', string.punctuation))

df['review_clean'] = df['review'].apply(remove_punctuation)
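
As a quick check of what this cleaning step does, the sketch below applies the same function to a made-up string (the sample sentence is an assumption, not taken from the dataset). Apostrophes and hyphens are removed along with other punctuation, which is why `non-stop` appears as `nonstop` in the cleaned reviews.

```python
import string

def remove_punctuation(text):
    # Same approach as above: delete every character in string.punctuation
    return text.translate(str.maketrans('', '', string.punctuation))

# Hypothetical sample text, for illustration only
sample = "It's OK, but non-stop crying!"
cleaned = remove_punctuation(sample)
print(cleaned)  # Its OK but nonstop crying
```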

In [52]:
df.head()


Unnamed: 0,name,review,rating,review_clean
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3,These flannel wipes are OK but in my opinion n...
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...


## Extract Sentiments

In [54]:
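# Drop neutral 3-star reviews; .copy() avoids a SettingWithCopyWarning when columns are added later
df = df[df['rating'] != 3].copy()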

In [55]:
df['sentiment'] = df['rating'].apply(lambda rating : +1 if rating > 3 else -1)
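
The labeling rule can be checked on a tiny frame. This is a minimal sketch with made-up ratings (the `toy` frame is an assumption for illustration): 3-star rows are dropped, ratings of 4 or 5 map to +1, and ratings of 1 or 2 map to -1.

```python
import pandas as pd

# Made-up ratings, for illustration only
toy = pd.DataFrame({'rating': [1, 2, 3, 4, 5]})

# Drop neutral 3-star rows; .copy() gives an independent frame to modify
toy = toy[toy['rating'] != 3].copy()

# Ratings of 4 or 5 become +1, ratings of 1 or 2 become -1
toy['sentiment'] = toy['rating'].apply(lambda rating: +1 if rating > 3 else -1)

labels = toy['sentiment'].tolist()
print(labels)  # [-1, -1, 1, 1]
```

On the real data, `df['sentiment'].value_counts()` would show how many reviews fall into each class.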

In [56]:
df.head()

Unnamed: 0,name,review,rating,review_clean,sentiment
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5,it came early and was not disappointed i love ...,1
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5,Very soft and comfortable and warmer than it l...,1
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5,This is a product well worth the purchase I h...,1
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5,All of my kids have cried nonstop when I tried...,1
5,Stop Pacifier Sucking without tears with Thumb...,"When the Binky Fairy came to our house, we did...",5,When the Binky Fairy came to our house we didn...,1


## Split into training and test sets

In [69]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['sentiment']), df['sentiment'], test_size=0.20, random_state=42)
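
To illustrate what the split returns, here is a small sketch on a made-up ten-row frame (the `toy` data is hypothetical). With `test_size=0.20`, two rows go to the test set, and the features and labels keep matching index values, so a label column can be reattached to the features later.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical reviews and labels, for illustration only
toy = pd.DataFrame({
    'review_clean': ['good', 'bad', 'great', 'awful', 'nice',
                     'poor', 'love it', 'hate it', 'fine', 'broken'],
    'sentiment': [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
})

# Same call pattern as above; a fixed random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    toy.drop(columns=['sentiment']), toy['sentiment'],
    test_size=0.20, random_state=42,
)

sizes = (len(X_tr), len(X_te))
print(sizes)  # (8, 2)
# Features and labels share the same index after the split
print(X_te.index.equals(y_te.index))  # True
```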

## Build the word count vector for each review


In [70]:
from sklearn.feature_extraction.text import CountVectorizer

# Use this token pattern to keep single-letter words
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
# First, learn the vocabulary from the training data, assign a column to each word,
# and convert the training reviews into a sparse word-count matrix
train_matrix = vectorizer.fit_transform(X_train['review_clean'])
# Second, convert the test reviews into a sparse matrix, using the same word-column mapping
test_matrix = vectorizer.transform(X_test['review_clean'])
# print(vectorizer.vocabulary_)
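
The fit-then-transform pattern is easier to see on a few short sentences. The sketch below uses made-up documents (an assumption for illustration): the vocabulary is learned from the training texts only, so a word that appears only in the test text gets no column and is ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical documents, for illustration only
train_docs = ['I love it', 'I hate it']
test_docs = ['I love this']  # 'this' is not in the training vocabulary

# Same token pattern as above, so the single-letter word 'i' is kept
toy_vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
toy_train = toy_vectorizer.fit_transform(train_docs)
toy_test = toy_vectorizer.transform(test_docs)

vocab = toy_vectorizer.vocabulary_
print(sorted(vocab))        # ['hate', 'i', 'it', 'love']
print(toy_train.toarray())  # rows: [0 1 1 1] and [1 1 1 0]
print(toy_test.toarray())   # [[0 1 0 1]]; the unseen word is ignored
```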

## Train a sentiment classifier with logistic regression

In [73]:
from sklearn.linear_model import LogisticRegression
sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression()
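
The message above says the solver stopped at its iteration limit before converging. One option the message itself suggests is raising `max_iter`. The sketch below shows the idea on made-up data; the value `1000` and the toy documents are assumptions, not settings used in this notebook.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical documents and labels, for illustration only
docs = ['love it', 'great product', 'works well',
        'hate it', 'broke fast', 'terrible product']
labels = [1, 1, 1, -1, -1, -1]

X = CountVectorizer(token_pattern=r'\b\w+\b').fit_transform(docs)

# max_iter=1000 is an arbitrary, assumed value; the default is 100
toy_model = LogisticRegression(max_iter=1000)

# Record any warnings raised during fitting so convergence can be checked
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    toy_model.fit(X, labels)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(converged)
print(toy_model.n_iter_)  # iterations the solver actually used
```

Whether a larger limit changes the accuracy on the full review data would need to be checked by rerunning the notebook.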

In [75]:
predicted_y = sentiment_model.predict(test_matrix)
# Count the test reviews whose predicted label matches the true label
correct_num = np.sum(predicted_y == y_test)
total_num = len(y_test)
print("correct_num: {}, total_num: {}".format(correct_num, total_num))
# Accuracy is the fraction of correct predictions
accuracy = correct_num / total_num
print(accuracy)

correct_num: 31169, total_num: 33351
0.9345746754220263
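
The same accuracy can be computed with scikit-learn's `accuracy_score`. The sketch below uses made-up label arrays (an assumption for illustration). It also computes the share of the most common class, which is the accuracy a classifier would get by always predicting that class and gives a reference point for reading an accuracy figure.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels, for illustration only
y_true = np.array([1, 1, 1, -1, -1])
y_pred = np.array([1, 1, -1, -1, -1])

# Manual computation, as in the cell above
manual = np.sum(y_pred == y_true) / len(y_true)
# Library computation
library = accuracy_score(y_true, y_pred)
# Accuracy of always predicting the most common class
majority = max(np.mean(y_true == 1), np.mean(y_true == -1))

print(manual, library, majority)  # 0.8 0.8 0.6
```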


## Probability Predictions

In [76]:
predicted_probability = sentiment_model.predict_proba(test_matrix)

In [81]:
result = X_test.copy()
result['actual_sentiment'] = y_test

In [84]:
result['predicted_sentiment']= predicted_y

In [98]:
result['negative_class_probability']= predicted_probability[:,0]
result['positive_class_probability']= predicted_probability[:,1]
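
The column order of `predict_proba` follows the model's `classes_` attribute, which holds the labels in sorted order. With labels -1 and +1, column 0 is the negative class and column 1 is the positive class. A minimal sketch on made-up one-feature data (an assumption for illustration) that looks the columns up by label instead of hard-coding them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data, for illustration only
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])

toy_model = LogisticRegression().fit(X, y)
proba = toy_model.predict_proba(X)

print(toy_model.classes_)  # [-1  1]

# Find each class's column from classes_
neg_col = list(toy_model.classes_).index(-1)
pos_col = list(toy_model.classes_).index(1)

negative_probability = proba[:, neg_col]
positive_probability = proba[:, pos_col]
```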

In [99]:
result.head()

Unnamed: 0,name,review,rating,review_clean,actual_sentiment,predicted_sentiment,positive_class_probability,negative_class_probability
146345,"Inglesina 2013 Fast Table Chair, Liquirizia",Not easy to use and didn't fit any of the tab...,1,Not easy to use and didnt fit any of the table...,-1,1,0.654219,0.345781
80867,Tiny Love Island Stroller Set,NOT WHAT I EXPECTED. THEY SHOWING IT LIKE IT'...,1,NOT WHAT I EXPECTED THEY SHOWING IT LIKE ITS A...,-1,-1,0.268006,0.731994
148242,"Munchkin Bath Fun Bubble Blower, Blue",The tip about taping half the air output port ...,2,The tip about taping half the air output port ...,-1,-1,0.303242,0.696758
162192,Kidsline Disney Pooh and Friends Collection Se...,A bit smaller than I had anticipated but my in...,4,A bit smaller than I had anticipated but my in...,1,1,0.999063,0.000937
141236,PRK Products Inc Universal Baby Bottle and Sip...,I just bought this for my grandson. His mom l...,5,I just bought this for my grandson His mom lo...,1,1,0.994243,0.005757


## Finding the most positive and negative reviews


In [100]:
# most positive reviews
result[result['positive_class_probability'] == result['positive_class_probability'].max()]

Unnamed: 0,name,review,rating,review_clean,actual_sentiment,predicted_sentiment,positive_class_probability,negative_class_probability
123632,"Zooper 2011 Waltz Standard Stroller, Flax Brown",I did a TON of research before I purchased thi...,5,I did a TON of research before I purchased thi...,1,1,1.0,0.0
135152,Maxi-Cosi Pria 70 with Tiny Fit Convertible Ca...,We've been using Britax for our boy (now 14 m...,5,Weve been using Britax for our boy now 14 mont...,1,1,1.0,0.0
50735,"Joovy Zoom 360 Swivel Wheel Jogging Stroller, ...",The joovy zoom 360 was the perfect solution fo...,5,The joovy zoom 360 was the perfect solution fo...,1,1,1.0,0.0
100166,"Infantino Wrap and Tie Baby Carrier, Black Blu...",I bought this carrier when my daughter was abo...,5,I bought this carrier when my daughter was abo...,1,1,1.0,0.0
166409,"Kiddy City N Move Stroller, Walnut","For starters, it's the only stroller my littl...",4,For starters its the only stroller my little g...,1,1,1.0,0.0
109574,phil&amp;teds Smart Buggy Bassinet and Strolle...,"My wife's 5', and I'm about 5'6"", our baby...",5,My wifes 5 and Im about 56 our baby is within ...,1,1,1.0,0.0
111155,bumGenius One-Size Hook &amp; Loop Closure Clo...,We did my son in cloth diapers from birth thro...,5,We did my son in cloth diapers from birth thro...,1,1,1.0,0.0
57108,BabyPlus Prenatal Education System,I started wearing the Babyplus when I was 18 w...,5,I started wearing the Babyplus when I was 18 w...,1,1,1.0,0.0
129722,Bumbleride 2011 Flite Lightweight Compact Trav...,This is a review of the 2012 Bumbleride Flite ...,5,This is a review of the 2012 Bumbleride Flite ...,1,1,1.0,0.0
168086,Buttons Cloth Diaper Cover - One Size - 8 Colo...,Buttons vs. Best Bottoms reviewFirst thing I w...,5,Buttons vs Best Bottoms reviewFirst thing I wa...,1,1,1.0,0.0


In [101]:
# most negative reviews
result[result['negative_class_probability'] == result['negative_class_probability'].max()]

Unnamed: 0,name,review,rating,review_clean,actual_sentiment,predicted_sentiment,positive_class_probability,negative_class_probability
175191,"Zooper Twist Escape Stroller, Summer Day",I had to return this stroller for three reason...,1,I had to return this stroller for three reason...,-1,-1,7.871137e-18,1.0
147902,Graco Pack 'n Play Playard - Dempsey,My disappointment with this product prompted m...,1,My disappointment with this product prompted m...,-1,-1,2.189015e-21,1.0
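
Filtering on equality with the maximum returns every row tied at that value, which is why several reviews appear above when the probability reaches 1.0. If a fixed number of rows is wanted instead, pandas' `nlargest` is one alternative. A minimal sketch on a made-up frame (the `toy_result` data is hypothetical):

```python
import pandas as pd

# Hypothetical results, for illustration only
toy_result = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'positive_class_probability': [0.99, 1.0, 0.20, 1.0],
})
toy_result['negative_class_probability'] = 1 - toy_result['positive_class_probability']

# Top-n rows by each probability; ties are kept in order of appearance
top_positive = toy_result.nlargest(2, 'positive_class_probability')
top_negative = toy_result.nlargest(1, 'negative_class_probability')

print(top_positive['name'].tolist())  # ['B', 'D']
print(top_negative['name'].tolist())  # ['C']
```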
