# Project 3 - Sentiment Analysis for E-Commerce Store
### by Azubuogu Peace Udoka

### Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#data">Understanding the Dataset</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#ques">Calculating Metrics</a></li>
<li><a href="#conc">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

### Background
As a data analyst at an e-commerce store, I have access to a dataset of product reviews and their associated ratings. The task is to build a model that classifies each review as positive, negative, or neutral based on its text.

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
# set general style of plots
sns.set(rc={'figure.figsize': (20, 8)}, style="white", font_scale=1.5)

# suppress warnings to keep the notebook output readable
import warnings
warnings.simplefilter("ignore")

### Exploring the Dataset

In [2]:
data = pd.read_csv('Amazon Product Review.txt')
# view 5 random rows of data
data.sample(5)

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date,sentiment
2827,US,31181150,RKJJ3GVGTEPFL,B00IKPYKWG,2693241,"Fire HD 7, 7"" HD Display, Wi-Fi, 8 GB",PC,5,0,0,N,Y,This is my second kindle and it is much better...,This is my second kindle and it is much better...,2015-07-17,1
30007,US,22406499,R12QQUWHKYVIXX,B00LCHRQL6,2693241,"Fire HD 7, 7"" HD Display, Wi-Fi, 8 GB",PC,5,0,0,N,Y,Excellent Kindle,Very pleased with my Fire HD 7. Fantastic col...,2014-11-11,1
1994,US,2748574,R9UE750J6WKQD,B00IKPX4GY,2693241,"Fire HD 7, 7"" HD Display, Wi-Fi, 8 GB",PC,3,0,1,N,Y,"The functionality is great, but the speakers a...","The functionality is great, but the speakers a...",2015-07-29,0
3273,US,21706272,R1TS14QX6OY5P9,B00IKPYKWG,2693241,"Fire HD 7, 7"" HD Display, Wi-Fi, 8 GB",PC,5,0,0,N,Y,Five Stars,Easy to get started and navigate.,2015-07-08,1
11441,US,830922,R1NCRKSJ5PASW4,B00IKPW0UA,2693241,"Fire HD 7, 7"" HD Display, Wi-Fi, 8 GB",PC,5,0,0,N,Y,Play or not great tablet.,"Great tablet, just what one needs to play or g...",2015-04-17,1


In [3]:
# size of dataset
data.shape

(30846, 16)

There are 30,846 rows and 16 columns.

In [4]:
# checking for missing values
data.isnull().sum()

marketplace          0
customer_id          0
review_id            0
product_id           0
product_parent       0
product_title        0
product_category     0
star_rating          0
helpful_votes        0
total_votes          0
vine                 0
verified_purchase    0
review_headline      0
review_body          0
review_date          0
sentiment            0
dtype: int64

There are no missing values.

In [5]:
# checking for duplicates
data.duplicated().sum()

0

There are no duplicates.
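The two checks above look at missing values per column and at rows that repeat in every column. As a minimal sketch, assuming a small made-up frame (`toy` and its values are hypothetical, not taken from the review file), the snippet below shows the same calls and adds a narrower check on `review_id` alone, since `duplicated()` without `subset` only flags rows that match in all columns.

```python
import pandas as pd

# Toy frame standing in for the review data (values are made up for illustration).
toy = pd.DataFrame({
    "review_id": ["R1", "R2", "R2", "R3"],
    "review_body": ["Great tablet", "Too slow", "Too slow!", None],
    "star_rating": [5, 2, 2, 4],
})

# Count of missing values in each column
missing_per_column = toy.isnull().sum()

# Rows that are identical to an earlier row in every column
full_row_duplicates = toy.duplicated().sum()

# Rows whose review_id repeats, even if other columns differ
id_duplicates = toy.duplicated(subset=["review_id"]).sum()

print(missing_per_column)
print(full_row_duplicates, id_duplicates)
```

On the real data, a zero from `data.duplicated().sum()` rules out exact repeated rows; whether a key such as `review_id` is also unique is a separate question that the `subset` form can answer.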

In [7]:
data.dtypes

marketplace          object
customer_id           int64
review_id            object
product_id           object
product_parent        int64
product_title        object
product_category     object
star_rating           int64
helpful_votes         int64
total_votes           int64
vine                 object
verified_purchase    object
review_headline      object
review_body          object
review_date          object
sentiment             int64
dtype: object

In [9]:
# convert the customer_id and product_parent columns to strings and the review_date column to datetime
data['customer_id'] = data['customer_id'].astype('str')
data['product_parent'] = data['product_parent'].astype('str')
data['review_date'] = pd.to_datetime(data['review_date'])
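To see what this conversion changes, here is a small sketch on a two-row frame. The frame name `toy` is hypothetical; the values are copied from the sample rows shown earlier. Identifier columns become strings so they are not treated as quantities, and the parsed date column gains access to the `.dt` accessor.

```python
import pandas as pd

# Two rows mirroring the sample output above
toy = pd.DataFrame({
    "customer_id": [31181150, 22406499],
    "product_parent": [2693241, 2693241],
    "review_date": ["2015-07-17", "2014-11-11"],
})

# Identifiers are labels, not numbers, so store them as strings
toy["customer_id"] = toy["customer_id"].astype("str")
toy["product_parent"] = toy["product_parent"].astype("str")

# Parse the date strings into datetime values
toy["review_date"] = pd.to_datetime(toy["review_date"])

print(toy.dtypes)
# Date parts are now available through the .dt accessor
print(toy["review_date"].dt.year.tolist())
```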

In [11]:
# share of reviews in each sentiment class
data.sentiment.value_counts()/len(data)

1    0.835343
0    0.164657
Name: sentiment, dtype: float64

The sentiment column contains two values.

0 - negative sentiment

1 - positive sentiment

This means the majority of the reviews are positive (about 83.5%), while about 16.5% are negative, so the two classes are imbalanced.
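The class shares matter when judging a model later: a classifier that always predicts the larger class would already be right for that class's share of the reviews. The sketch below, on a made-up frame (`toy` and its labels are hypothetical), computes the shares with `value_counts(normalize=True)`, reads off that majority-class baseline, and cross-tabulates `star_rating` against `sentiment` as one way to inspect how the labels relate to the ratings. It does not assume any particular rating-to-sentiment rule for the real dataset.

```python
import pandas as pd

# Made-up ratings and labels, for illustration only
toy = pd.DataFrame({
    "star_rating": [5, 5, 3, 5, 1, 4],
    "sentiment":   [1, 1, 0, 1, 0, 1],
})

# Proportion of rows in each sentiment class
shares = toy["sentiment"].value_counts(normalize=True)

# Accuracy of always predicting the most frequent class
baseline_accuracy = shares.max()

# Counts of each sentiment label per star rating
table = pd.crosstab(toy["star_rating"], toy["sentiment"])

print(shares)
print(baseline_accuracy)
print(table)
```

Applied to the real data, the same two lines would show which star ratings fall under each sentiment value and give the accuracy any trained model needs to beat.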