# Amazon Rating Stars Predictions

[YouTube Link](https://www.youtube.com/watch?v=AekvfY5Rnlc)

[Github Link](https://github.com/laxmimerit/Amazon-Musical-Reviews-Rating-Dataset)

[Kaggle Link](https://www.kaggle.com/eswarchandt/amazon-music-reviews?select=Musical_instruments_reviews.csv)

In [1]:
import pandas as pd 
import numpy as np

In [2]:
url = "https://raw.githubusercontent.com/laxmimerit/Amazon-Musical-Reviews-Rating-Dataset/master/Musical_instruments_reviews.csv"

data = pd.read_csv(url)

In [3]:
data.head()

| | reviewerID | asin | reviewerName | helpful | reviewText | overall | summary | unixReviewTime | reviewTime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A2IBPI20UZIR0U | 1384719342 | cassandra tu "Yeah, well, that's just like, u... | [0, 0] | Not much to write about here, but it does exac... | 5.0 | good | 1393545600 | 02 28, 2014 |
| 1 | A14VAT5EAX3D9S | 1384719342 | Jake | [13, 14] | The product does exactly as it should and is q... | 5.0 | Jake | 1363392000 | 03 16, 2013 |
| 2 | A195EZSQDW3E21 | 1384719342 | Rick Bennette "Rick Bennette" | [1, 1] | The primary job of this device is to block the... | 5.0 | It Does The Job Well | 1377648000 | 08 28, 2013 |
| 3 | A2C00NNG1ZQQG2 | 1384719342 | RustyBill "Sunday Rocker" | [0, 0] | Nice windscreen protects my MXL mic and preven... | 5.0 | GOOD WINDSCREEN FOR THE MONEY | 1392336000 | 02 14, 2014 |
| 4 | A94QU4C90B1AX | 1384719342 | SEAN MASLANKA | [0, 0] | This pop filter is great. It looks and perform... | 5.0 | No more pops when I record my vocals. | 1392940800 | 02 21, 2014 |


In [4]:
data.columns

Index(['reviewerID', 'asin', 'reviewerName', 'helpful', 'reviewText',
       'overall', 'summary', 'unixReviewTime', 'reviewTime'],
      dtype='object')

In [5]:
df_review = data.loc[:,['reviewText', 'overall']]

In [6]:
df_review

| | reviewText | overall |
|---|---|---|
| 0 | Not much to write about here, but it does exac... | 5.0 |
| 1 | The product does exactly as it should and is q... | 5.0 |
| 2 | The primary job of this device is to block the... | 5.0 |
| 3 | Nice windscreen protects my MXL mic and preven... | 5.0 |
| 4 | This pop filter is great. It looks and perform... | 5.0 |
| ... | ... | ... |
| 10256 | Great, just as expected. Thank to all. | 5.0 |
| 10257 | I've been thinking about trying the Nanoweb st... | 5.0 |
| 10258 | I have tried coated strings in the past ( incl... | 4.0 |
| 10259 | Well, MADE by Elixir and DEVELOPED with Taylor... | 4.0 |


In [7]:
df_review.sample(3)

| | reviewText | overall |
|---|---|---|
| 8742 | I got this to compliment my filming of events,... | 5.0 |
| 1720 | Perfect thickness and action.What else can I s... | 5.0 |
| 3920 | This stand is probably one of the most popular... | 4.0 |


In [8]:
df_review['overall'].value_counts()

5.0    6938
4.0    2084
3.0     772
2.0     250
1.0     217
Name: overall, dtype: int64
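The counts are heavily skewed toward 5-star reviews (6938 of 10261). The SVM trained later uses `class_weight='balanced'`, which scikit-learn documents as `n_samples / (n_classes * np.bincount(y))`. A minimal sketch that reproduces those weights from the counts above, assuming scikit-learn's `compute_class_weight` helper (the label vector is rebuilt from the counts, not loaded from the dataset):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Star counts copied from the value_counts() output above
counts = {1.0: 217, 2.0: 250, 3.0: 772, 4.0: 2084, 5.0: 6938}

# Rebuild a label vector with those frequencies (row order does not matter here)
y = np.repeat(list(counts.keys()), list(counts.values()))
classes = np.array(sorted(counts))

# What class_weight='balanced' computes inside scikit-learn
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same formula written out by hand: n_samples / (n_classes * count_per_class)
manual = len(y) / (len(classes) * np.array([counts[c] for c in classes]))

for star, w in zip(classes, weights):
    print(f"{star:.1f} stars -> weight {w:.3f}")
```

With these counts a 1-star review is weighted roughly 32 times as heavily as a 5-star review (6938 / 217 ≈ 32), which is how the balanced setting keeps the minority ratings from being ignored.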

## Note

The text cleaning below uses the `preprocess_kgptalkie` package. Install its pinned dependencies first:

> pip install spacy==2.2.3

> python -m spacy download en_core_web_sm

> pip install beautifulsoup4==4.9.1

> pip install textblob==0.15.3

Then install the package itself:

> pip install git+https://github.com/laxmimerit/preprocess_kgptalkie.git

In [9]:
import preprocess_kgptalkie as ps 
import re


In [10]:
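def get_clean(x):
    # Lowercase and drop backslashes and underscores
    x = str(x).lower().replace('\\', '').replace('_', '')
    x = ps.cont_exp(x)                # expand contractions, e.g. "don't" -> "do not"
    x = ps.remove_emails(x)
    x = ps.remove_urls(x)
    x = ps.remove_html_tags(x)
    x = ps.remove_rt(x)               # remove the retweet marker "rt"
    x = ps.remove_accented_chars(x)   # e.g. "é" -> "e"
    x = ps.remove_special_chars(x)    # drop punctuation and other symbols
    # Collapse 3+ repeats of a character into one ("sooo" -> "so")
    x = re.sub(r"(.)\1{2,}", r"\1", x)
    return x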
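The `ps.*` helpers come from a third-party package with old pinned dependencies. As a rough stand-in, here is a standard-library-only sketch; the name `get_clean_approx` and every regex in it are assumptions, not the package's implementation. It skips contraction expansion and `rt` removal, additionally collapses whitespace, and keeps the notebook's `(.)\1{2,}` substitution as its last step:

```python
import re
import unicodedata

# Hypothetical approximations of the ps.* helpers (not the package's own regexes)
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character followed by 2+ copies of itself


def get_clean_approx(text):
    x = str(text).lower().replace("\\", "").replace("_", "")
    x = EMAIL_RE.sub(" ", x)   # emails first, so the URL pattern cannot split them
    x = URL_RE.sub(" ", x)
    x = TAG_RE.sub(" ", x)
    # Decompose accented characters and drop the combining marks ("é" -> "e")
    x = unicodedata.normalize("NFKD", x).encode("ascii", "ignore").decode("ascii")
    x = SPECIAL_RE.sub("", x)  # keep only letters, digits and whitespace
    x = REPEAT_RE.sub(r"\1", x)
    return " ".join(x.split())  # extra step: collapse runs of whitespace


print(get_clean_approx("Sooo GOOD!!! <b>Café</b> see http://x.com or me@x.com"))
```

Note that the repeat rule only fires on three or more identical characters, so "good" survives while "sooo" becomes "so".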

In [11]:
df_review['reviewText'] = df_review['reviewText'].apply(get_clean)

In [12]:
df_review.head()

| | reviewText | overall |
|---|---|---|
| 0 | not much to write about here but it does exact... | 5.0 |
| 1 | the product does exactly as it should and is q... | 5.0 |
| 2 | the primary job of this device is to block the... | 5.0 |
| 3 | nice windscreen protects my mxl mic and preven... | 5.0 |
| 4 | this pop filter is great it looks and performs... | 5.0 |


# TF-IDF and Linear SVM

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

In [38]:
tfidf = TfidfVectorizer(max_features=20000, ngram_range=(1,5), analyzer='char')
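With `analyzer='char'` the vectorizer works on character substrings rather than words, so `ngram_range=(1,5)` means every substring of 1 to 5 characters, spaces included; `max_features=20000` then keeps the 20000 most frequent ones. A small sketch that inspects the analyzer on toy strings (the strings are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same analyzer settings as the notebook's vectorizer
vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 5))
analyze = vec.build_analyzer()

# Lowercases, then emits all 1- to 5-character substrings
print(analyze("Bad"))

# Spaces are kept, so n-grams can span word boundaries ("t ba", "not b")
print(analyze("not bad")[-3:])

# max_features keeps only the n-grams with the highest term frequency in the corpus
small = TfidfVectorizer(analyzer="char", ngram_range=(1, 5), max_features=5)
X_small = small.fit_transform(["not bad", "bad product", "great product"])
print(X_small.shape)
```

Character n-grams that cross word boundaries let the model pick up short phrases such as "not b" without a separate word n-gram step.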

In [39]:
X = tfidf.fit_transform(df_review['reviewText'])
y = df_review['overall']

In [40]:
X.shape, y.shape

((10261, 20000), (10261,))

In [79]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [80]:
X_train.shape

(7182, 20000)
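In the cells above, `tfidf.fit_transform` runs on all 10,261 reviews before `train_test_split`, so the test reviews influence both the `max_features` vocabulary and the IDF weights. A sketch of a leakage-free variant, assuming a scikit-learn `Pipeline` is acceptable: it splits the raw text first, fits the vectorizer on the training part only, and passes `stratify` so each star rating keeps its proportion in both splits. The helper name `train_rating_model` is hypothetical, and no results are claimed for it:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_rating_model(texts, labels, test_size=0.3, seed=0):
    """Hypothetical helper: split raw text first, then fit TF-IDF + SVM on the train part only."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = make_pipeline(
        TfidfVectorizer(max_features=20000, ngram_range=(1, 5), analyzer="char"),
        LinearSVC(C=5, class_weight="balanced"),
    )
    model.fit(X_train, y_train)  # the vectorizer only sees training reviews
    report = classification_report(y_test, model.predict(X_test), zero_division=0)
    return model, report


# Usage with the notebook's data (not run here):
# model, report = train_rating_model(df_review["reviewText"], df_review["overall"])
# print(report)
```

Because the fitted pipeline contains the vectorizer, later predictions can call `model.predict([get_clean(text)])` directly instead of transforming by hand.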

In [81]:
svm_model = LinearSVC(C=5, class_weight='balanced')
svm_model.fit(X_train, y_train)

LinearSVC(C=5, class_weight='balanced')
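`LinearSVC` is a linear classifier, so with five star ratings it defaults to a one-vs-rest scheme: it learns one weight vector per rating, and `predict` returns the rating whose score from `decision_function` is highest. A toy check of that behaviour (the 2-D data and labels are made up, only the hyperparameters match the notebook):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 2-D points with three labels standing in for star ratings
X = np.array([[0, 0], [0, 1], [2, 2], [2, 3], [4, 0], [4, 1]], dtype=float)
y = np.array([1.0, 1.0, 3.0, 3.0, 5.0, 5.0])

clf = LinearSVC(C=5, class_weight="balanced").fit(X, y)

# One-vs-rest: one row of coef_ per class
print(clf.coef_.shape)

# predict() picks the class with the highest one-vs-rest score
scores = clf.decision_function(X)
print(np.array_equal(clf.predict(X), clf.classes_[scores.argmax(axis=1)]))
```

In the notebook's model the same holds with `coef_` of shape `(5, 20000)`: one weight per character n-gram for each of the five ratings.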

In [82]:
y_pred = svm_model.predict(X_test)

In [83]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

         1.0       0.30      0.22      0.25        55
         2.0       0.20      0.10      0.13        80
         3.0       0.22      0.23      0.22       213
         4.0       0.34      0.32      0.33       648
         5.0       0.78      0.81      0.80      2083

    accuracy                           0.64      3079
   macro avg       0.37      0.33      0.35      3079
weighted avg       0.62      0.64      0.63      3079
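The two averages weight the classes differently: macro avg treats all five ratings equally, while weighted avg scales each class by its support. A quick recomputation from the rounded per-class F1 scores in the report (so the results are approximate), which also gives the accuracy of a trivial model that always predicts 5 stars:

```python
# Per-class F1 and support copied from the classification report above
f1 = {1.0: 0.25, 2.0: 0.13, 3.0: 0.22, 4.0: 0.33, 5.0: 0.80}
support = {1.0: 55, 2.0: 80, 3.0: 213, 4.0: 648, 5.0: 2083}
n_test = sum(support.values())

# Macro average: every class counts equally
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / n_test

# Accuracy of always predicting the majority class (5.0 stars)
majority_baseline = max(support.values()) / n_test

print(f"test reviews      : {n_test}")
print(f"macro avg F1      : {macro_f1:.2f}")
print(f"weighted avg F1   : {weighted_f1:.2f}")
print(f"majority baseline : {majority_baseline:.2f}")
```

This reproduces the report's 0.35 and 0.63 and gives a majority baseline of about 0.68, above the SVM's 0.64 accuracy. On data this imbalanced, macro avg F1 says more about how well the lower ratings are recognised than accuracy does.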



In [84]:
comment1 = "this product is really bad. don't buy it"
x = get_clean(comment1)
vec = tfidf.transform([x])
svm_model.predict(vec)


array([1.])

In [85]:
comment2 = "This is great product, I love it"
x = get_clean(comment2)
vec = tfidf.transform([x])
svm_model.predict(vec)

array([5.])

In [86]:
comment3 = "Well, this product is not really good but not too bad"
x = get_clean(comment3)
vec = tfidf.transform([x])
svm_model.predict(vec)

array([3.])

In [87]:
comment4 = "Not the best one, but you should buy"
x = get_clean(comment4)
vec = tfidf.transform([x])
svm_model.predict(vec)

array([4.])

In [88]:
comment5 = "Bad product, not recommended"
x = get_clean(comment5)
vec = tfidf.transform([x])
svm_model.predict(vec)

array([2.])
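Each of the five cells above repeats the same three steps: clean the text, `transform` it with the already-fitted vectorizer, then `predict`. A hypothetical `predict_rating` helper bundles them for a batch of reviews and also returns the raw `decision_function` scores, which are margins rather than probabilities. The toy vectorizer and model below are stand-ins for the notebook's `tfidf` and `svm_model`, and `str.lower` stands in for `get_clean`; only the two test sentences come from the cells above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def predict_rating(texts, vectorizer, model, clean=str.lower):
    """Hypothetical helper: clean -> transform -> predict for a list of reviews.

    In the notebook you would pass clean=get_clean, vectorizer=tfidf, model=svm_model.
    """
    cleaned = [clean(t) for t in texts]
    vec = vectorizer.transform(cleaned)  # transform only; never refit at prediction time
    return model.predict(vec), model.decision_function(vec)


# Toy stand-ins so the sketch runs on its own
train_texts = ["great product love it", "awesome works great",
               "bad product do not buy", "terrible broke fast"]
train_labels = [5.0, 5.0, 1.0, 1.0]
tfidf_toy = TfidfVectorizer(analyzer="char", ngram_range=(1, 5)).fit(train_texts)
svm_toy = LinearSVC(C=5, class_weight="balanced").fit(
    tfidf_toy.transform(train_texts), train_labels
)

labels, scores = predict_rating(
    ["This is great product, I love it", "Bad product, not recommended"],
    tfidf_toy,
    svm_toy,
)
print(labels)
print(scores.shape)
```

With this two-class toy model the scores are one-dimensional; with the notebook's five-class `svm_model` they would have one column per star rating.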