## Sentiment Analysis of Amazon-Reviews
The reviews were collected with ParseHub, a web-scraping tool that can be downloaded from its official website, https://www.parsehub.com/. ParseHub simulates human browsing behavior such as opening a web page, logging into an account, entering text, and pointing and clicking on web elements. You click the information you want in its built-in browser, start the extraction, and receive the structured data you need.
In our case, we clicked on the required elements, such as the reviews and ratings.

Import the required libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
!pip install textblob
import textblob

Read the CSV file using read_csv (a built-in pandas function)

In [2]:
df = pd.read_csv('Reviews_amazon.csv')

In [3]:
df.head()

Unnamed: 0,Reviwes_name,Reviwes_Ratings,Reviwes_Ratings_url
0,Dont buy this it is not worth for money. It is...,1.0 out of 5 stars,https://www.amazon.in/gp/customer-reviews/RDIM...
1,Pros:-\n1. Display (Best display you can get a...,5.0 out of 5 stars,https://www.amazon.in/gp/customer-reviews/R120...
2,Cons:\n1.Good phone at this price.\n2.Display ...,4.0 out of 5 stars,https://www.amazon.in/gp/customer-reviews/R2WR...
3,So greatful to Amazon for sending me such a ni...,1.0 out of 5 stars,https://www.amazon.in/gp/customer-reviews/RKJS...
4,I am sure there will be many reviewers who wil...,4.0 out of 5 stars,https://www.amazon.in/gp/customer-reviews/RI96...


In [4]:
X = df['Reviwes_Ratings']

In [5]:
df.drop(['Reviwes_Ratings_url', 'Reviwes_Ratings'], axis = 1, inplace = True)

In [6]:
df.head(3)

Unnamed: 0,Reviwes_name
0,Dont buy this it is not worth for money. It is...
1,Pros:-\n1. Display (Best display you can get a...
2,Cons:\n1.Good phone at this price.\n2.Display ...


Before running the sentiment analysis, I need to clean the data, which requires importing a few more libraries.

In [None]:
import nltk
import re
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

In data cleaning, I perform the following steps:
1. Remove all stopwords from the data (except "not", which the code keeps).
2. Convert all characters to lower case.
3. Perform stemming on the dataset.
4. Store the preprocessed text back in the same row.

In [7]:
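corpus = []
ps = PorterStemmer()
# Build the stopword set once; keep 'not' so negations survive the cleaning
all_stopwords = set(stopwords.words('english'))
all_stopwords.discard('not')
for i in range(len(df)):
  # Replace every non-letter character with a space
  review = re.sub('[^a-zA-Z]', ' ', str(df['Reviwes_name'][i]))
  review = review.lower()
  review = review.split()
  # Drop stopwords and stem the remaining words
  review = [ps.stem(word) for word in review if word not in all_stopwords]
  review = ' '.join(review)
  corpus.append(review)
  # Write back with .loc to avoid pandas chained assignment
  df.loc[i, 'Reviwes_name'] = review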

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\THINKPAD\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
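
To see what the cleaning steps do to a single review, here is a minimal sketch that uses only the standard library. It is not the notebook's exact pipeline: `DEMO_STOPWORDS` is a hypothetical, hand-picked subset standing in for NLTK's full English stopword list, `clean_review` is an illustrative helper name, and stemming is left out because it needs NLTK's `PorterStemmer`.

```python
import re

# Assumption: a tiny hand-picked stopword subset, for illustration only.
# The notebook uses NLTK's full English list with 'not' removed from it.
DEMO_STOPWORDS = {"this", "it", "is", "for", "the", "a", "an", "and", "to", "of"}


def clean_review(text: str) -> str:
    """Keep letters only, lowercase the text, and drop the demo stopwords."""
    # Replace every non-letter character with a space
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    # Lowercase and split on whitespace
    words = letters_only.lower().split()
    # 'not' is absent from DEMO_STOPWORDS, so negations are preserved
    kept = [word for word in words if word not in DEMO_STOPWORDS]
    return " ".join(kept)


print(clean_review("Dont buy this it is not worth for money."))
# prints: dont buy not worth money
```

Keeping "not" matters here: without it, the cleaned text would read "dont buy worth money", which loses the negation.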


In [None]:
df.head()

TextBlob performs different operations on textual data, such as noun phrase extraction, sentiment analysis, classification, translation, etc.

In [8]:
from textblob import TextBlob

In [9]:
# Score every review once; polarity ranges from -1.0 (negative) to 1.0 (positive)
df['sentiment_scores_tb'] = [round(TextBlob(article).sentiment.polarity, 3) for article in df['Reviwes_name']]
# Map each polarity score to a sentiment label
df['sentiment_category_tb'] = ['positive' if score > 0
                               else 'negative' if score < 0
                               else 'neutral'
                               for score in df['sentiment_scores_tb']]
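
The labeling rule in the cell above can also be written as a small standalone function, which may be easier to read and test than the nested conditional expression. This is only a sketch: `categorize_polarity` is a hypothetical helper name, and the sample scores are three of the polarity values shown in this notebook's output.

```python
def categorize_polarity(score: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Exactly zero is treated as neutral, as in the notebook
    return "neutral"


# Polarity values taken from the notebook output (rows 0, 1 and 445)
sample_scores = [-0.225, 0.203, 0.0]
labels = [categorize_polarity(score) for score in sample_scores]
print(labels)
# prints: ['negative', 'positive', 'neutral']
```

With a pandas column, the same function could be applied as `df['sentiment_scores_tb'].apply(categorize_polarity)`.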

In [10]:
print(df['sentiment_category_tb'])

0      negative
1      positive
2      positive
3      positive
4      positive
         ...   
445     neutral
446    positive
447    positive
448    positive
449    positive
Name: sentiment_category_tb, Length: 450, dtype: object


In [11]:
df.head()

Unnamed: 0,Reviwes_name,sentiment_scores_tb,sentiment_category_tb
0,dont buy not worth money work slow,-0.225,negative
1,pro display best display get price e super amo...,0.203,positive
2,con good phone price display qualiti realli gr...,0.523,positive
3,great amazon send nice product alreadi land sa...,0.232,positive
4,sure mani review depth analysi list pro con de...,0.168,positive


In [13]:
final_df = pd.concat([df,X], axis=1)

In [15]:
final_df

Unnamed: 0,Reviwes_name,sentiment_scores_tb,sentiment_category_tb,Reviwes_Ratings
0,dont buy not worth money work slow,-0.225,negative,1.0 out of 5 stars
1,pro display best display get price e super amo...,0.203,positive,5.0 out of 5 stars
2,con good phone price display qualiti realli gr...,0.523,positive,4.0 out of 5 stars
3,great amazon send nice product alreadi land sa...,0.232,positive,1.0 out of 5 stars
4,sure mani review depth analysi list pro con de...,0.168,positive,4.0 out of 5 stars
...,...,...,...,...
445,phone excel item dearer k inr last sale make d...,0.000,neutral,5.0 out of 5 stars
446,phone good given star one thing want say much ...,0.625,positive,5.0 out of 5 stars
447,best batteri back rang decent camera qualiti l...,0.342,positive,5.0 out of 5 stars
448,overal best smartphon daili use social media b...,0.046,positive,5.0 out of 5 stars


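Reviews with negative sentiment tend to have low ratings, but the agreement is not perfect: row 3, for example, is scored positive (0.232) despite its 1.0-star rating.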
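
One way to check this relationship numerically is to turn the rating strings into numbers and average them per sentiment category. The sketch below uses only the five rows displayed above rather than the full 450-row dataset, so the averages are purely illustrative; `parse_rating` is a hypothetical helper, and it assumes every rating follows the "X out of 5 stars" pattern seen in the output.

```python
import re
from collections import defaultdict
from statistics import mean


def parse_rating(text: str) -> float:
    """Extract the numeric rating from a string like '4.0 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?) out of 5 stars", text)
    if match is None:
        raise ValueError(f"Unexpected rating format: {text!r}")
    return float(match.group(1))


# (sentiment category, rating) pairs for rows 0-4 of final_df
rows = [
    ("negative", "1.0 out of 5 stars"),
    ("positive", "5.0 out of 5 stars"),
    ("positive", "4.0 out of 5 stars"),
    ("positive", "1.0 out of 5 stars"),
    ("positive", "4.0 out of 5 stars"),
]

# Group the numeric ratings by sentiment category
ratings_by_category = defaultdict(list)
for category, rating_text in rows:
    ratings_by_category[category].append(parse_rating(rating_text))

# Average rating per category
mean_rating = {cat: mean(vals) for cat, vals in ratings_by_category.items()}
print(mean_rating)
# prints: {'negative': 1.0, 'positive': 3.5}
```

Running the same grouping over the whole of `final_df` would show whether the pattern holds beyond these five rows.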

In [21]:
pip install openpyxl

Collecting openpyxl
  Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
     ------------------------------------ 242.2/242.2 KB 644.2 kB/s eta 0:00:00
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
Note: you may need to restart the kernel to use updated packages.


In [22]:
final_df.to_excel('final review sentiment.xlsx', index = False)

In [23]:
from IPython.display import FileLink
FileLink('final review sentiment.xlsx')