#**RECOMMENDING QUALITY PRODUCTS BY EXPLOITING USER WRITTEN REVIEWS**








**PROBLEM STATEMENT**

Until recently, few recommendation systems could scrape review data and filter out the information that actually helps the recommendation process.

Recommendation systems have since improved and can now pick out the best products from large volumes of data. The dataset imported below contains the reviews that individual users wrote for specific products. A review or a rating on its own cannot determine whether a product is the best; substantial filtering is needed to single out the best product among many good ones.

Binning the rating data and applying Natural Language Processing to the reviews (tokenization, lemmatization, and sentiment analysis) lets us filter the data and surface the best products.

These methods yield the best-rated products, but that is still not enough: every user has unique preferences, and those preferences may not match the so-called objectively best products. For that we use collaborative filtering, which gathers the preferences of many users together with those of the individual user and filters the data with both in view.

#**Importing Packages**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.metrics import accuracy_score 
from sklearn.metrics import classification_report
from sklearn import metrics 
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay  # plot_confusion_matrix was removed in scikit-learn 1.2

In [None]:
from google.colab import drive
drive.mount('/content/drive')

#**Converting JSON files to DataFrames**

In [None]:
import gzip

def parse(path):
  # Yield one review record (a dict) per line of the gzipped file
  g = gzip.open(path, 'r')
  for l in g:
    yield eval(l)

In [None]:
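import json
import gzip

def parse(path):
  # Re-serialize each record as strict JSON, one record per line
  with gzip.open(path, 'r') as g:
    for l in g:
      yield json.dumps(eval(l))

# The with-block closes (and flushes) the output file once writing is done
with open("output.strict", 'w') as f:
  for l in parse("/content/drive/MyDrive/ML/reviews_Musical_Instruments_5 (1).json.gz"):
    f.write(l + '\n')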

In [None]:
import pandas as pd
import gzip

def parse(path):
  g = gzip.open(path, 'rb')
  for l in g:
    yield eval(l)

def getDF(path):
  i = 0
  df = {}
  for d in parse(path):
    df[i] = d
    i += 1
  return pd.DataFrame.from_dict(df, orient='index')

data= getDF('/content/drive/MyDrive/ML/reviews_Musical_Instruments_5 (1).json.gz')
metadata = getDF('/content/drive/MyDrive/ML/meta_Musical_Instruments  (1).json.gz')
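The `parse` helpers above call `eval` on every line, which executes whatever the file contains. A more defensive variant could look like the sketch below. It assumes each line is either strict JSON or a plain Python dict literal; `parse_line`, `parse_safe`, and the two sample records are hypothetical names and data used only for illustration.

```python
import ast
import gzip
import io
import json


def parse_line(line):
    """Decode one record: strict JSON first, Python-literal syntax as a fallback."""
    text = line.decode("utf-8") if isinstance(line, bytes) else line
    text = text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only accepts literals (dicts, lists, strings, numbers),
        # so it cannot run arbitrary code the way eval() can.
        return ast.literal_eval(text)


def parse_safe(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed file or path."""
    with gzip.open(fileobj, "rb") as g:
        for line in g:
            if line.strip():
                yield parse_line(line)


# Tiny in-memory sample standing in for the real .json.gz files (made-up records).
sample = b'{"asin": "X1", "overall": 5.0}\n' + b"{'asin': 'X2', 'overall': 3.0}\n"
buffer = io.BytesIO(gzip.compress(sample))
records = list(parse_safe(buffer))
print(records)
```

Because `gzip.open` accepts a path as well as a file object, `parse_safe` could be swapped into `getDF` in place of `parse`.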


#**Understanding data**

In [None]:
data.head()

In [None]:
metadata.head()

#**Dropping column**

In [None]:
pr = data.copy()

In [None]:
pr.head()

In [None]:
pr.shape

In [None]:
pr.dtypes

In [None]:
pr.describe()

In [None]:
pr.info()

In [None]:
#Deleting 'reviewTime' and 'reviewerName' columns

pr.drop(["reviewTime","reviewerName"], axis = 1, inplace = True)

#**Checking NULL Values**

In [None]:
#Finding Number of NULL Values in Columns
pr.isnull().sum()

#**Data Wrangling**



**CHANGING DATE FORMAT**

In [None]:
pr['unixReviewTime'] = pd.to_datetime(pr['unixReviewTime'], unit='s')

In [None]:
pr['unixReviewTime'].head().to_frame()
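As a quick sanity check on what `unit='s'` means, the same conversion can be done with the standard library: a Unix timestamp counts seconds since 1970-01-01 UTC. The helper name `unix_to_date` is hypothetical, and the timestamp used is simply the value that corresponds to midnight UTC on 2014-02-28.

```python
from datetime import datetime, timezone


def unix_to_date(seconds):
    """Convert seconds since the Unix epoch to a calendar date in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()


print(unix_to_date(1393545600))
```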

**CONCATENATING REVIEW TEXT AND SUMMARY**

In [None]:
pr['review_text'] = pr[['summary', 'reviewText']].apply(lambda x: " ".join(str(y) for y in x if str(y) != 'nan'), axis = 1)

In [None]:
#Dropping 'reviewText' and 'summary' as they are added to 'review_text'

pr = pr.drop(['reviewText', 'summary'], axis = 1)

In [None]:
pr.head(3)

#**NATURAL LANGUAGE PROCESSING**

In [None]:
import nltk
import nltk.corpus
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
product_reviews_nlp = pr.copy()

In [None]:
product_reviews_nlp.head()

Unnamed: 0,reviewerID,asin,helpful,overall,unixReviewTime,review_text
0,A2IBPI20UZIR0U,1384719342,"[0, 0]",5.0,2014-02-28,"good Not much to write about here, but it does..."
1,A14VAT5EAX3D9S,1384719342,"[13, 14]",5.0,2013-03-16,Jake The product does exactly as it should and...
2,A195EZSQDW3E21,1384719342,"[1, 1]",5.0,2013-08-28,It Does The Job Well The primary job of this d...
3,A2C00NNG1ZQQG2,1384719342,"[0, 0]",5.0,2014-02-14,GOOD WINDSCREEN FOR THE MONEY Nice windscreen ...
4,A94QU4C90B1AX,1384719342,"[0, 0]",5.0,2014-02-21,No more pops when I record my vocals. This pop...


In [None]:
#Converting String in Review Text to Lower Case

product_reviews_nlp["review_text"] = product_reviews_nlp["review_text"].str.lower()

In [None]:
#Importing Libraries for Tokenization and Removing Punctuation

from nltk.tokenize import word_tokenize
import string 
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [None]:
#Removing Punctuation

def remove_punctuation(text):
    no_punct=[words for words in text if words not in string.punctuation]
    words_wo_punct=''.join(no_punct)
    return words_wo_punct

product_reviews_nlp['review_text']=product_reviews_nlp['review_text'].apply(lambda x: remove_punctuation(x))
product_reviews_nlp.head()

Unnamed: 0,reviewerID,asin,helpful,overall,unixReviewTime,review_text
0,A2IBPI20UZIR0U,1384719342,"[0, 0]",5.0,2014-02-28,good not much to write about here but it does ...
1,A14VAT5EAX3D9S,1384719342,"[13, 14]",5.0,2013-03-16,jake the product does exactly as it should and...
2,A195EZSQDW3E21,1384719342,"[1, 1]",5.0,2013-08-28,it does the job well the primary job of this d...
3,A2C00NNG1ZQQG2,1384719342,"[0, 0]",5.0,2014-02-14,good windscreen for the money nice windscreen ...
4,A94QU4C90B1AX,1384719342,"[0, 0]",5.0,2014-02-21,no more pops when i record my vocals this pop ...
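The character-by-character filter above works, but a translation table is a common alternative that removes all punctuation in a single pass. The sketch below uses only the standard library; `strip_punctuation` and the sample sentence are hypothetical.

```python
import string

# Translation table that maps every punctuation character to "deleted".
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with all characters in string.punctuation removed."""
    return text.translate(_PUNCT_TABLE)


cleaned = strip_punctuation("didn't fit my 1996 fender strat, sadly!")
print(cleaned)
```

With pandas it could be applied as `product_reviews_nlp['review_text'].apply(strip_punctuation)`.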


In [None]:
#Creating a New Column with Tokenized Words

product_reviews_nlp['tokenized_review_text'] = product_reviews_nlp['review_text'].apply(word_tokenize)

In [None]:
product_reviews_nlp['tokenized_review_text'].head(3).to_frame()

Unnamed: 0,tokenized_review_text
0,"[good, not, much, to, write, about, here, but,..."
1,"[jake, the, product, does, exactly, as, it, sh..."
2,"[it, does, the, job, well, the, primary, job, ..."


In [None]:
#CONVERTED TO LOWER CASE, SEPARATED REVIEW TEXT INTO WORDS (Tokenization), AND REMOVED PUNCTUATION

product_reviews_nlp[['tokenized_review_text']].head()

Unnamed: 0,tokenized_review_text
0,"[good, not, much, to, write, about, here, but,..."
1,"[jake, the, product, does, exactly, as, it, sh..."
2,"[it, does, the, job, well, the, primary, job, ..."
3,"[good, windscreen, for, the, money, nice, wind..."
4,"[no, more, pops, when, i, record, my, vocals, ..."


In [None]:
#Importing Libraries for Lemmatization

from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


In [None]:
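#Lemmatization converts words to their base (dictionary) form. Unlike stemming, it is less aggressive and tries not to lose context.
#Without part-of-speech tags, WordNetLemmatizer treats every token as a noun, which is why 'does' becomes 'doe' and 'as' becomes 'a' in the output.
#Converting the words in 'tokenized_review_text' to their base form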

def Lemmatization(text):
    lemm_words=[lemmatizer.lemmatize(words) for words in text]
    return lemm_words

product_reviews_nlp['tokenized_review_text']=product_reviews_nlp['tokenized_review_text'].apply(lambda x: Lemmatization(x))

In [None]:
product_reviews_nlp['tokenized_review_text'].head(5).to_frame()

Unnamed: 0,tokenized_review_text
0,"[good, not, much, to, write, about, here, but,..."
1,"[jake, the, product, doe, exactly, a, it, shou..."
2,"[it, doe, the, job, well, the, primary, job, o..."
3,"[good, windscreen, for, the, money, nice, wind..."
4,"[no, more, pop, when, i, record, my, vocal, th..."


In [None]:
from nltk.probability import FreqDist

In [None]:
#Finding Frequency of each word in tokenized review text

def freq_count(text):
    
    fdist=FreqDist(text)
    return fdist

product_reviews_nlp['freq_count']=product_reviews_nlp['tokenized_review_text'].apply(lambda x: freq_count(x))
product_reviews_nlp['freq_count'].head().to_frame()

Unnamed: 0,freq_count
0,"{'good': 1, 'not': 1, 'much': 2, 'to': 2, 'wri..."
1,"{'jake': 1, 'the': 5, 'product': 2, 'doe': 1, ..."
2,"{'it': 3, 'doe': 1, 'the': 9, 'job': 2, 'well'..."
3,"{'good': 1, 'windscreen': 2, 'for': 1, 'the': ..."
4,"{'no': 1, 'more': 1, 'pop': 3, 'when': 2, 'i':..."
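NLTK's `FreqDist` is a subclass of `collections.Counter`, so the per-review counts above behave like ordinary counters. The sketch below uses a made-up token list to show the kind of lookups that become available.

```python
from collections import Counter

# Hypothetical tokens, loosely modelled on a pop-filter review.
tokens = ["no", "more", "pop", "when", "i", "record", "pop", "filter", "pop"]

counts = Counter(tokens)
print(counts["pop"])          # how often one token occurs
print(counts.most_common(1))  # the most frequent token with its count
```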


In [None]:
#Importing Preprogrammed SentimentAnalyzer

nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...




In [None]:
#Finding Total Polarity Score (VADER compound score, ranging from -1 to +1)

def sent_count(text):
  senti = sia.polarity_scores(text)
  return senti['compound']

product_reviews_nlp['sentiment_value']=product_reviews_nlp['review_text'].apply(sent_count)


In [None]:
product_reviews_nlp[['review_text','tokenized_review_text','freq_count','sentiment_value']].head()

Unnamed: 0,review_text,tokenized_review_text,freq_count,sentiment_value
0,good not much to write about here but it does ...,"[good, not, much, to, write, about, here, but,...","{'good': 1, 'not': 1, 'much': 2, 'to': 2, 'wri...",0.7681
1,jake the product does exactly as it should and...,"[jake, the, product, doe, exactly, a, it, shou...","{'jake': 1, 'the': 5, 'product': 2, 'doe': 1, ...",0.9359
2,it does the job well the primary job of this d...,"[it, doe, the, job, well, the, primary, job, o...","{'it': 3, 'doe': 1, 'the': 9, 'job': 2, 'well'...",-0.5719
3,good windscreen for the money nice windscreen ...,"[good, windscreen, for, the, money, nice, wind...","{'good': 1, 'windscreen': 2, 'for': 1, 'the': ...",0.7717
4,no more pops when i record my vocals this pop ...,"[no, more, pop, when, i, record, my, vocal, th...","{'no': 1, 'more': 1, 'pop': 3, 'when': 2, 'i':...",0.6597


In [None]:
pr.head()

Unnamed: 0,reviewerID,asin,helpful,overall,unixReviewTime,review_text
0,A2IBPI20UZIR0U,1384719342,"[0, 0]",5.0,2014-02-28,"good Not much to write about here, but it does..."
1,A14VAT5EAX3D9S,1384719342,"[13, 14]",5.0,2013-03-16,Jake The product does exactly as it should and...
2,A195EZSQDW3E21,1384719342,"[1, 1]",5.0,2013-08-28,It Does The Job Well The primary job of this d...
3,A2C00NNG1ZQQG2,1384719342,"[0, 0]",5.0,2014-02-14,GOOD WINDSCREEN FOR THE MONEY Nice windscreen ...
4,A94QU4C90B1AX,1384719342,"[0, 0]",5.0,2014-02-21,No more pops when I record my vocals. This pop...


In [None]:
pr.drop(['review_text'], axis = 1, inplace=True)

In [None]:
pr.drop(['helpful'], axis = 1, inplace=True)

In [None]:
pr.drop(['unixReviewTime'], axis = 1, inplace=True)

In [None]:
pr.head()

Unnamed: 0,reviewerID,asin,overall
0,A2IBPI20UZIR0U,1384719342,5.0
1,A14VAT5EAX3D9S,1384719342,5.0
2,A195EZSQDW3E21,1384719342,5.0
3,A2C00NNG1ZQQG2,1384719342,5.0
4,A94QU4C90B1AX,1384719342,5.0


In [None]:
pr = pr.join(product_reviews_nlp["review_text"])

In [None]:
pr = pr.join(product_reviews_nlp["sentiment_value"])

In [None]:
pr['sentiment_value']=pr['sentiment_value']+1
pr['sentiment_value']=(pr['sentiment_value']/2)*5

In [None]:
pr['total_score']=(pr['overall']+pr['sentiment_value'])/2

In [None]:
pr.head(10)

Unnamed: 0,reviewerID,asin,overall,review_text,sentiment_value,total_score
0,A2IBPI20UZIR0U,1384719342,5.0,good not much to write about here but it does ...,4.42025,4.710125
1,A14VAT5EAX3D9S,1384719342,5.0,jake the product does exactly as it should and...,4.83975,4.919875
2,A195EZSQDW3E21,1384719342,5.0,it does the job well the primary job of this d...,1.07025,3.035125
3,A2C00NNG1ZQQG2,1384719342,5.0,good windscreen for the money nice windscreen ...,4.42925,4.714625
4,A94QU4C90B1AX,1384719342,5.0,no more pops when i record my vocals this pop ...,4.14925,4.574625
5,A2A039TZMZHH9Y,B00004Y2UT,5.0,the best cable so good that i bought another o...,4.4685,4.73425
6,A1UPZM995ZAH90,B00004Y2UT,5.0,monster standard 100 21 instrument cable i ha...,1.93425,3.467125
7,AJNFQI3YR6XJ5,B00004Y2UT,3.0,didnt fit my 1996 fender strat i now use this ...,2.803,2.9015
8,A3M1PLEYNDEYO8,B00004Y2UT,5.0,great cable perfect for my epiphone sheraton i...,4.7585,4.87925
9,AMNTZU1YQN1TH,B00004Y2UT,5.0,best instrument cables on the market monster m...,4.832,4.916
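The two rescaling steps map the compound sentiment from the range [-1, 1] onto [0, 5] and then average it with the star rating. The sketch below restates that arithmetic as plain functions (the names `rescale_compound` and `total_score` are hypothetical) and checks it against the first row shown above.

```python
def rescale_compound(compound, top=5.0):
    """Map a compound sentiment score from [-1, 1] onto [0, top]."""
    return (compound + 1.0) / 2.0 * top


def total_score(overall, compound):
    """Average the star rating with the rescaled sentiment score."""
    return (overall + rescale_compound(compound)) / 2.0


# First review: 5 stars, compound sentiment 0.7681.
print(rescale_compound(0.7681))
print(total_score(5.0, 0.7681))
```

One thing to keep in mind: star ratings run from 1 to 5 while the rescaled sentiment runs from 0 to 5, so the two halves of the average do not share exactly the same scale.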


In [None]:
len(pr.reviewerID.unique())

1429

In [None]:
len(pr.asin.unique())

900

#**KNN**

In [None]:
user_item_table = pr.pivot_table(index = ["asin"],columns = ["reviewerID"],values = "total_score").fillna(0)
user_item_table

reviewerID,A00625243BI8W1SSZNLMD,A10044ECXDUVKS,A102MU6ZC9H1N6,A109JTUZXO61UY,A109ME7C09HM2M,A10APIDAZISWQF,A10B2J2IRQXBWA,A10E3QH2FQUBLF,A10FM4ILBIMJJ7,A10H2F00ZOT8S2,A10HYGDU2NITYQ,A10KH8EN77ZKWH,A10N243R7A5ZW3,A10NJEIG56RHN5,A10VG94SAKVSC0,A10ZSXTQA264C7,A110ZEDSNASVCO,A118PM0B1PGWDA,A11E4FWMN9BXJD,A11INIL2YFJ137,A120FZ2ESIMA63,A121QRWXZIO6UP,A126XEMCLHPBNZ,A127K5WGHNUUH3,A12ABV9NU02O29,A12DQZKRKTNF5E,A12N7TJQR2RB9W,A12O5B8XNKNBOL,A12P4A1OC41KUO,A12SSZIN555FTL,A12YXGXV4MATDS,A1365RYO0BLEMI,A136IQFGB01KQB,A136M3QYHUVN9A,A13798OPDBLDCO,A13A81NN0NRD1S,A13GMS7FWV3TQ0,A13IKQCJKFAP5S,A13KBLFF4IZF7H,A13NWJUMVNS6YZ,...,AWBZIK5JYWB5J,AWCJ12KBO5VII,AWIG50VOI5VUV,AWKVQQZKTRFAL,AWQQ1QHCECDJ3,AWV58YYFEAUL0,AWYE428W5MRQN,AWYXB9L41T82S,AX11NOUMV8G95,AX69H7INJKE76,AXABTEYS7A4A8,AXG9N4QFS4QYP,AXJ19189TLBLJ,AXMXE3RT660HQ,AXMYGK3WC8BPP,AXP7888CP2222,AXP9CF1UTFRSU,AXTOICJWZBAJ0,AXU9VX024GPSS,AXWB93VKVML6K,AXWEQHTXQWR7Q,AXWI0P2EGDEQT,AXX6BZDD8K4JL,AXXGP6UT41KAS,AXXYMIJBD0J9G,AYJJDQQ4EZ5V3,AYQ46BHSK99YV,AYQCAPXJ81XTN,AYTKUTAP0VA53,AZ0LJNEP2VRD1,AZ9KESC05F6RI,AZAYBFPLEDFL7,AZBUUKQLYKUCL,AZCP5P3BARLS5,AZE83O4F1IJPR,AZJPNK73JF3XP,AZMHABTPXVLG3,AZMIKIG4BB6BZ,AZPDO6FLSMLFP,AZVME8JMPD3F4
asin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1384719342,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00004Y2UT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00005ML71,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000068NSX,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000068NTU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.648,0.0,0.0,3.015375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
B00H02C9TG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00HFRXACG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,4.887875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00IZCSW3M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,4.624500,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00J4TBMVO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
query_index = np.random.choice(user_item_table.shape[0])
print("Chosen Item is: ",user_item_table.index[query_index])

Chosen Item is:  B005J9FS0Y


In [None]:
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

user_item_table_matrix = csr_matrix(user_item_table.values)
model_knn = NearestNeighbors()
model_knn.fit(user_item_table_matrix)
distances, indices = model_knn.kneighbors(user_item_table.iloc[query_index,:].values.reshape(1,-1), n_neighbors = 6)

In [None]:
item = []
distance = []

for i in range(0, len(distances.flatten())):
    if i != 0:
        item.append(user_item_table.index[indices.flatten()[i]])
        distance.append(distances.flatten()[i])    

i=pd.Series(item,name='item')
d=pd.Series(distance,name='distance')
recommend = pd.concat([i,d], axis=1)
recommend = recommend.sort_values('distance',ascending=True)  # nearest (most similar) items first

print('Recommendations for {0}:\n'.format(user_item_table.index[query_index]))
for i in range(0,recommend.shape[0]):
    print('{0}: {1}, with distance of {2}'.format(i, recommend["item"].iloc[i], recommend["distance"].iloc[i]))

Recommendations for B005J9FS0Y:

0: B000SL0NCQ, with distance of 11.90500538704729
1: B0002E3MRW, with distance of 12.175012034773312
2: B000RN53LQ, with distance of 12.415913746327735
3: B001E43SK0, with distance of 12.42790294496119
4: B0002E37MM, with distance of 12.499849808472701
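The neighbour search above uses `NearestNeighbors` with its default Euclidean metric. On a zero-filled matrix, Euclidean distance is influenced heavily by how many reviews each item has, so cosine distance is often tried as an alternative (scikit-learn accepts `metric='cosine'` with `algorithm='brute'`). The pure-Python sketch below illustrates the idea on a made-up three-item matrix; `cosine_distance`, `nearest_items`, and the toy data are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_items(item_vectors, query, k=5):
    """Return the k items closest to `query`, nearest first, as (item, distance)."""
    query_vec = item_vectors[query]
    scored = [
        (cosine_distance(query_vec, vec), item)
        for item, vec in item_vectors.items()
        if item != query
    ]
    scored.sort()
    return [(item, dist) for dist, item in scored[:k]]


# Rows are items, columns are reviewers (toy scores, 0 = no review).
toy = {
    "item_a": [5.0, 4.0, 0.0, 0.0],
    "item_b": [4.5, 4.0, 0.0, 0.0],
    "item_c": [0.0, 0.0, 5.0, 3.0],
}
neighbours = nearest_items(toy, "item_a", k=2)
print(neighbours)
```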


In [None]:
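from sklearn import preprocessing
# Use one encoder per column so that each mapping can be inverted later
asin_encoder = preprocessing.LabelEncoder()
reviewer_encoder = preprocessing.LabelEncoder()
pr['asin_new']=asin_encoder.fit_transform(pr['asin'])
pr['reviewerID_new']=reviewer_encoder.fit_transform(pr['reviewerID'])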

In [None]:
from sklearn.model_selection import train_test_split
features=pr[['asin_new','reviewerID_new','sentiment_value']].values
labels=pr['total_score'].values
train, test, train_labels, test_labels = train_test_split(features,labels,test_size=0.33)


In [None]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error


In [None]:
# total_score is continuous, so KNeighborsClassifier rejects it with a ValueError;
# a regressor predicts the score directly and is evaluated with an error metric.
# Note: sentiment_value is one half of total_score, so this feature leaks the target.
knn = KNeighborsRegressor(n_neighbors = 3,metric='euclidean')
knn.fit(train, train_labels)
Y_pred = knn.predict(test)
mean_absolute_error(test_labels, Y_pred)
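`KNeighborsClassifier` expects discrete class labels, whereas `total_score` is a continuous value. One option for a classification setup is to bin the score first, in line with the binning mentioned in the problem statement. The sketch below is only an illustration: the bin edges and label names are assumptions, not values taken from this notebook.

```python
from bisect import bisect_right

# Hypothetical bin edges on the 0-5 score scale and one label per bin.
BIN_EDGES = [2.0, 3.0, 4.0]
BIN_LABELS = ["poor", "fair", "good", "excellent"]


def bin_score(score):
    """Map a continuous score to the label of the bin it falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, score)]


for value in (1.5, 2.9015, 3.0, 4.710125):
    print(value, bin_score(value))
```

Applied to the DataFrame, this could look like `pr['total_score'].apply(bin_score)`, and the resulting column could then serve as the label vector.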

#**TF-IDF Vectorizer**

In [None]:
def final(x):
    
    if x>=3:
      return "Recommended"
    else:
      return "Not Recommended"

pr['total_score']=pr['total_score'].apply(lambda x: final(x))

In [None]:
from sklearn.preprocessing import LabelEncoder
# LabelEncoder maps each distinct string label to an integer.
label_encoder = LabelEncoder()
# Encode the labels in column 'total_score'; classes are sorted alphabetically,
# so "Not Recommended" becomes 0 and "Recommended" becomes 1.
pr['total_score']= label_encoder.fit_transform(pr['total_score'])
pr['total_score'].head(20)

In [None]:
pr.head()

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn import svm
import re

In [None]:
vectorizer = TfidfVectorizer(max_features=700)
vectorizer.fit(pr['review_text'])
features = vectorizer.transform(pr['review_text']) 

features.toarray()

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tf_idf = pd.DataFrame(features.toarray(), columns=vectorizer.get_feature_names_out())
tf_idf.head(10)
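To make the TF-IDF weights less of a black box, the sketch below recomputes them by hand for three made-up reviews. It follows the weighting scikit-learn uses by default (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization) but tokenizes by whitespace only, so it is a simplified illustration rather than a replacement for `TfidfVectorizer`. The function name `tfidf` and the documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (idf per term, one L2-normalized {term: weight} dict per document)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return idf, vectors


docs = ["great cable great sound", "cable did not fit", "great pop filter"]
idf, vectors = tfidf(docs)
print(idf["great"], idf["fit"])
```

Terms that appear in fewer reviews (such as "fit" here) receive a higher IDF than terms that appear in many (such as "great").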

In [None]:
X_train, X_test, y_train, y_test = train_test_split(tf_idf, pr['total_score'], test_size=0.2, random_state=42)

print (f'Train set shape\t:{X_train.shape}\nTest set shape\t:{X_test.shape}')

In [None]:
yy = pd.DataFrame(y_train)
train_data = pd.concat([X_train, yy],axis=1)
train_data.head()

In [None]:
pr.shape

#**K-Means Clustering**

In [None]:
from sklearn.cluster import KMeans
from sklearn import metrics

In [None]:
sse = []
list_k = list(range(1, 10))

for k in list_k:
    km = KMeans(n_clusters=k)
    km.fit(X_train)
    sse.append(km.inertia_)

plt.plot(list_k, sse, '-o')
plt.xlabel(r'Number of clusters(k)')
plt.ylabel('Sum of squared distance');
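The quantity plotted for the elbow method, `inertia_`, is the sum of squared distances from each sample to its closest cluster centre. The sketch below computes that sum by hand for four made-up 2-D points to show why it drops as sensible clusters are added; `inertia`, the points, and the centroids are hypothetical.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


# Two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = inertia(points, [(5.0, 1.0)])
two_clusters = inertia(points, [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)
```

The "elbow" is the value of k after which further clusters stop producing drops of this size.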

In [None]:
kmeanModel = KMeans(n_clusters=8)
kmeanModel.fit(X_train)

In [None]:
X_train['Cluster'] = kmeanModel.labels_
X_train['Cluster'].sample(n=10)

In [None]:
X_train['Cluster'].value_counts()

In [None]:
sns.countplot(x=X_train['Cluster'])
plt.xlabel('Clusters')
plt.ylabel('Count')
plt.show()

In [None]:
kmeanModel.cluster_centers_

#**Recommending through Collaborative Filtering System**

Here, we use the user-item relation to train a model that predicts the top items a user may like. This approach can recommend a large number of products, unlike other models that give lower coverage.

**User-Item Matrix**

In [None]:
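# Note: if the cells above were run in order, 'total_score' now holds the encoded 0/1 labels rather than the averaged score
new_pr=pr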
sentiment_matrix = new_pr.pivot_table(values='total_score', index='asin', columns='reviewerID', fill_value=0)
sentiment_matrix

In [None]:
sentiment_matrix.shape

In [None]:
X = sentiment_matrix
X.head(20)

In [None]:
X.shape

**Correlation Matrix**

In [None]:
#Decomposing the Matrix
# Truncated SVD factorizes the matrix into singular vectors and singular values and
# keeps only the strongest components, which makes it useful for data reduction.
from sklearn.decomposition import TruncatedSVD
SVD = TruncatedSVD(n_components=10)
decomposed_matrix = SVD.fit_transform(X)
decomposed_matrix.shape

In [None]:
correlation_matrix = np.corrcoef(decomposed_matrix) 
correlation_matrix

In [None]:
i = "B00004Y2UT"

product_names = list(X.index)
product_ID = product_names.index(i)
product_ID

In [None]:
correlation_product_ID = correlation_matrix[product_ID]
correlation_product_ID.shape

In [None]:
Recommend = list(X.index[correlation_product_ID > 0.65])
# Removes the item already bought by the customer
Recommend.remove(i)
Recommend[0:24]
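`np.corrcoef` treats each row of `decomposed_matrix` as one variable, so entry (i, j) is the Pearson correlation between the reduced vectors of items i and j. The pure-Python sketch below shows the same selection step on made-up three-component vectors; `pearson`, `correlated_items`, and the toy embeddings are hypothetical, while the 0.65 threshold is the one used above.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlated_items(embeddings, query, threshold=0.65):
    """Return items whose correlation with `query` exceeds the threshold."""
    q = embeddings[query]
    return [
        item
        for item, vec in embeddings.items()
        if item != query and pearson(q, vec) > threshold
    ]


# Toy stand-ins for rows of the decomposed matrix.
toy_embeddings = {
    "item_a": [1.0, 2.0, 3.0],
    "item_b": [2.0, 4.1, 6.2],
    "item_c": [3.0, 2.0, 1.0],
}
print(correlated_items(toy_embeddings, "item_a"))
```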
