DOMAIN: Smartphone, Electronics


CONTEXT: India is the second largest smartphone market globally after China. About 134 million smartphones were sold across India
in 2017, and sales are estimated to increase to about 442 million in 2022. India ranked second in average time spent on the mobile web by
smartphone users across Asia Pacific. The combination of very high sales volumes and average smartphone consumer behaviour has
made India a very attractive market for foreign vendors. In terms of consumer behaviour, 97% of consumers turn to a search engine when they
are buying a product, versus 15% who turn to social media. If a seller succeeds in showing smartphones that match a user's behaviour or choice in the
right place, there is a 90% chance that the user will enquire about them. This case study aims to build a recommendation system
based on an individual consumer's behaviour or choice.



DATA DESCRIPTION:
    • author : name of the person who gave the rating
    • country : country the person who gave the rating belongs to
    • date : date of the rating
    • domain: website from which the rating was taken from
    • extract: rating content
    • language: language in which the rating was given
    • product: name of the product/mobile phone for which the rating was given
    • score: average rating for the phone
    • score_max: highest rating given for the phone
    • source: source from where the rating was taken
    
    
PROJECT OBJECTIVE: We will build recommendation systems using popularity-based and collaborative filtering methods, to recommend
the most popular mobile phones and personalised mobile phones to a user, respectively.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Libraries for recommendation systems
from surprise import SVD
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

### 1. Import the necessary libraries and read the provided CSVs as a data frame and perform the below steps

In [2]:
#A. Merge all the provided CSVs into one dataFrame
data_1 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_1.csv', encoding='iso-8859-1')
data_2 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_2.csv', encoding='iso-8859-1')
data_3 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_3.csv', encoding='iso-8859-1')
data_4 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_4.csv', encoding='iso-8859-1')
data_5 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_5.csv', encoding='iso-8859-1')
data_6 = pd.read_csv('C:\\Users\\AN953317\\Recommendationsystem\\phone_user_review_file_6.csv', encoding='iso-8859-1')



In [3]:
print("Shape of data 1",data_1.shape)
print("Shape of data 2",data_2.shape)
print("Shape of data 3",data_3.shape)
print("Shape of data 4",data_4.shape)
print("Shape of data 5",data_5.shape)
print("Shape of data 6",data_6.shape)

Shape of data 1 (374910, 11)
Shape of data 2 (114925, 11)
Shape of data 3 (312961, 11)
Shape of data 4 (98284, 11)
Shape of data 5 (350216, 11)
Shape of data 6 (163837, 11)


In [4]:
print("Review Data 1")
data_1.head()


Review Data 1


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.2,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8


In [5]:
print("Review Data 2")
data_2.head()


Review Data 2


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/leagoo-lead-7/,4/15/2015,en,us,Amazon,amazon.com,2.0,10.0,"The telephone headset is of poor quality , not...",luis,Leagoo Lead7 5.0 Inch HD JDI LTPS Screen 3G Sm...
1,/cellphones/leagoo-lead-7/,5/23/2015,en,gb,Amazon,amazon.co.uk,10.0,10.0,This is my first smartphone so I have nothing ...,Mark Lavin,Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ...
2,/cellphones/leagoo-lead-7/,4/27/2015,en,gb,Amazon,amazon.co.uk,8.0,10.0,Great phone. Battery life not great but seems ...,tracey,Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ...
3,/cellphones/leagoo-lead-7/,4/22/2015,en,gb,Amazon,amazon.co.uk,10.0,10.0,Best 90 quid I've ever spent on a smart phone,Reuben Ingram,Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ...
4,/cellphones/leagoo-lead-7/,4/18/2015,en,gb,Amazon,amazon.co.uk,10.0,10.0,I m happy with this phone.it s very good.thx team,viorel,Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ...


In [6]:
print("Review Data 3")
data_3.head()


Review Data 3


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,11/7/2015,pt,br,Submarino,submarino.com.br,6.0,10.0,"recomendo, eu comprei um, a um ano, e agora co...",herlington tesch,Samsung Smartphone Samsung Galaxy S3 Slim G381...
1,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,10/2/2015,pt,br,Submarino,submarino.com.br,10.0,10.0,Comprei um pouco desconfiada do site e do celu...,Luisa Silva Marieta,Samsung Smartphone Samsung Galaxy S3 Slim G381...
2,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,9/2/2015,pt,br,Submarino,submarino.com.br,10.0,10.0,"Muito bom o produto, obvio que tem versÃµes me...",Cyrus,Samsung Smartphone Samsung Galaxy S3 Slim G381...
3,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,9/2/2015,pt,br,Submarino,submarino.com.br,8.0,10.0,Unica ressalva fica para a camera que poderia ...,Marcela Santa Clara Brito,Samsung Smartphone Samsung Galaxy S3 Slim G381...
4,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,9/1/2015,pt,br,Colombo,colombo.com.br,8.0,10.0,Rapidez e atenÃ§Ã£o na entrega. O aparelho Ã© ...,Claudine Maria Kuhn Walendorff,"Smartphone Samsung Galaxy S3 Slim, Dual Chip, ..."


In [7]:
print("Review Data 4")
data_4.head()


Review Data 4


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-s7262-duos-galaxy-ace/,3/11/2015,en,us,Amazon,amazon.com,2.0,10.0,was not conpatable with my phone as stated. I ...,Frances DeSimone,Samsung Galaxy Star Pro DUOS S7262 Unlocked Ce...
1,/cellphones/samsung-s7262-duos-galaxy-ace/,17/11/2015,en,in,Zopper,zopper.com,10.0,10.0,Decent Functions and Easy to Operate Pros:- Th...,Expert Review,Samsung Galaxy Star Pro S7262 Black
2,/cellphones/samsung-s7262-duos-galaxy-ace/,29/10/2015,en,in,Amazon,amazon.in,4.0,10.0,Not Good Phone such price. Hang too much and v...,Amazon Customer,Samsung Galaxy Star Pro GT-S7262 (Midnight Black)
3,/cellphones/samsung-s7262-duos-galaxy-ace/,29/10/2015,en,in,Amazon,amazon.in,6.0,10.0,not bad for features,Amazon Customer,Samsung Galaxy Star Pro GT-S7262 (Midnight Black)
4,/cellphones/samsung-s7262-duos-galaxy-ace/,29/10/2015,en,in,Amazon,amazon.in,10.0,10.0,Excellent product,NHK,Samsung Galaxy Star Pro GT-S7262 (Midnight Black)


In [8]:
print("Review Data 5")
data_5.head()


Review Data 5


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/karbonn-k1616/,7/13/2016,en,in,91 Mobiles,91mobiles.com,2.0,10.0,I bought 1 month before. currently speaker is ...,venkatesh,Karbonn K1616
1,/cellphones/karbonn-k1616/,7/13/2016,en,in,91 Mobiles,91mobiles.com,6.0,10.0,"I just bought one week back, I have Airtel con...",Venkat,Karbonn K1616
2,/cellphones/karbonn-k1616/,7/13/2016,en,in,91 Mobiles,91mobiles.com,4.0,10.0,one problem in this handset opera is not worki...,krrish,Karbonn K1616
3,/cellphones/karbonn-k1616/,4/25/2014,en,in,Naaptol,naaptol.com,10.0,10.0,here Karbonn comes up with an another excellen...,BRIJESH CHAUHAN,Karbonn K1616 - Black
4,/cellphones/karbonn-k1616/,4/23/2013,en,in,Naaptol,naaptol.com,10.0,10.0,"What a phone, all so on Naaptol my god 23% off...",Suraj CHAUHAN,Karbonn K1616 - Black


In [9]:
print("Review Data 6")
data_6.head()


Review Data 6


Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-instinct-sph-m800/,9/16/2011,en,us,Phone Arena,phonearena.com,8.0,10.0,I've had the phone for awhile and it's a prett...,ajabrams95,Samsung Instinct HD
1,/cellphones/samsung-instinct-sph-m800/,2/13/2014,en,us,Amazon,amazon.com,6.0,10.0,to be clear it is not the sellers fault that t...,Stephanie,Samsung SPH M800 Instinct
2,/cellphones/samsung-instinct-sph-m800/,12/30/2011,en,us,Phone Scoop,phonescoop.com,9.0,10.0,Well i love this phone. i have had ton of phon...,snickers,Instinct M800
3,/cellphones/samsung-instinct-sph-m800/,10/18/2008,en,us,HandCellPhone,handcellphone.com,4.0,10.0,I have had my Instinct for several months now ...,A4C,Samsung Instinct
4,/cellphones/samsung-instinct-sph-m800/,9/6/2008,en,us,Reviewed.com,reviewed.com,6.0,10.0,i have had this instinct phone for about two m...,betaBgood,Samsung Instinct


In [10]:
# Compare each file's column set with the first file's column set
frames = [data_1, data_2, data_3, data_4, data_5, data_6]
print("All 6 files have same columns [True/False]==", all(set(df.columns) == set(data_1.columns) for df in frames))

All 6 files have same columns [True/False]== True


In [11]:
reviewdf = pd.concat([data_1,data_2,data_3,data_4,data_5,data_6], ignore_index=True) # Merge all files


In [12]:
#B. Explore, understand the Data and share at least 2 observations. 
reviewdf.head()

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.2,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8


In [13]:
reviewdf.tail()

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
1415128,/cellphones/alcatel-ot-club_1187/,5/12/2000,de,de,Ciao,ciao.de,2.0,10.0,Weil mein Onkel bei ALcatel arbeitet habe ich ...,david.paul,Alcatel Club Plus Handy
1415129,/cellphones/alcatel-ot-club_1187/,5/11/2000,de,de,Ciao,ciao.de,10.0,10.0,Hy Liebe Leserinnen und Leser!! Ich habe seit ...,Christiane14,Alcatel Club Plus Handy
1415130,/cellphones/alcatel-ot-club_1187/,5/4/2000,de,de,Ciao,ciao.de,2.0,10.0,"Jetzt hat wohl Alcatell gedacht ,sie machen wa...",michaelawr,Alcatel Club Plus Handy
1415131,/cellphones/alcatel-ot-club_1187/,5/1/2000,de,de,Ciao,ciao.de,8.0,10.0,Ich bin seit 2 Jahren (stolzer) Besitzer eines...,claudia0815,Alcatel Club Plus Handy
1415132,/cellphones/alcatel-ot-club_1187/,4/25/2000,de,de,Ciao,ciao.de,2.0,10.0,"Was sich Alkatel hier wieder ausgedacht hat,sc...",michaelawr,Alcatel Club Plus Handy


In [14]:
reviewdf.shape

(1415133, 11)

In [15]:
reviewdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1415133 entries, 0 to 1415132
Data columns (total 11 columns):
phone_url    1415133 non-null object
date         1415133 non-null object
lang         1415133 non-null object
country      1415133 non-null object
source       1415133 non-null object
domain       1415133 non-null object
score        1351644 non-null float64
score_max    1351644 non-null float64
extract      1395772 non-null object
author       1351931 non-null object
product      1415132 non-null object
dtypes: float64(2), object(9)
memory usage: 118.8+ MB


In [16]:
reviewdf.describe()

Unnamed: 0,score,score_max
count,1351644.0,1351644.0
mean,8.00706,10.0
std,2.616121,0.0
min,0.2,10.0
25%,7.2,10.0
50%,9.2,10.0
75%,10.0,10.0
max,10.0,10.0


In [17]:
reviewdf['lang'].value_counts()

en    554746
ru    207443
de    176600
it    116120
es     99739
fr     95080
pt     67155
nl     38375
tr     28359
sv     17149
fi      6953
cs      2533
no      1918
he      1370
pl       493
da       418
hu       346
id       271
ja        33
zh        19
ar        12
ko         1
Name: lang, dtype: int64

In [18]:
reviewdf['lang'].value_counts().plot(kind="bar")

[Figure: bar chart of review counts per language]

1. There are 11 features in the dataset.
2. score and score_max are the only two numerical features; score_max is 10 for every rated row.
3. score, score_max, extract and author have null values; product has a single null.
4. Reviews are given in different languages; most of the reviews are in English.

In [19]:
#C. Round off scores to the nearest integers
reviewdf['score']=reviewdf['score'].round(0).astype('Int64')
reviewdf['score_max']=reviewdf['score_max'].round(0).astype('Int64')
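
A small check of what this rounding step does may be useful. The sketch below uses a hypothetical toy series (not the review data) and assumes the same two calls as the cell above. It illustrates that pandas rounds halves to the nearest even integer and that the nullable `Int64` dtype keeps missing scores as `<NA>`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores (not taken from the review files).
toy_scores = pd.Series([9.2, 0.2, 2.5, 3.5, np.nan])

# Same two steps as in the notebook cell: round, then cast to nullable Int64.
toy_rounded = toy_scores.round(0).astype("Int64")

# 2.5 and 3.5 are rounded to the nearest even integer; NaN stays missing.
print(toy_rounded.tolist())
```

One consequence is that a score such as 0.2 becomes 0 after rounding, which is worth keeping in mind when a rating scale is declared later.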

In [20]:
reviewdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1415133 entries, 0 to 1415132
Data columns (total 11 columns):
phone_url    1415133 non-null object
date         1415133 non-null object
lang         1415133 non-null object
country      1415133 non-null object
source       1415133 non-null object
domain       1415133 non-null object
score        1351644 non-null Int64
score_max    1351644 non-null Int64
extract      1395772 non-null object
author       1351931 non-null object
product      1415132 non-null object
dtypes: Int64(2), object(9)
memory usage: 121.5+ MB


In [21]:
#D. Check for missing values. Impute the missing values, if any
reviewdf.isna().sum()

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
dtype: int64
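
To put these counts in proportion, they can be divided by the total number of rows. A minimal sketch, using only the numbers printed above (the variable names are illustrative):

```python
# Counts copied from the isna() output and the shape reported earlier.
total_rows = 1415133
missing_counts = {
    "score": 63489,
    "score_max": 63489,
    "extract": 19361,
    "author": 63202,
    "product": 1,
}

# Share of rows with a missing value, in percent, per column.
missing_pct = {col: 100 * n / total_rows for col, n in missing_counts.items()}

for col, pct in missing_pct.items():
    print(f"{col:10s} {pct:6.2f}% missing")
```

Every column is missing in fewer than 5% of rows, so dropping or imputing these rows removes only a small part of the data.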

In [22]:
reviewdf['score'] = reviewdf['score'].fillna(reviewdf['score'].median())
reviewdf['score_max'] = reviewdf['score_max'].fillna(reviewdf['score_max'].median())
reviewdf = reviewdf[reviewdf["author"] != 'Anonymous']
reviewdf = reviewdf.dropna()

In [23]:
#E. Check for duplicate values and remove them, if any
reviewdf = reviewdf.drop_duplicates()

In [24]:
#F. Keep only 1 Million data samples. Use random state=612.
newdf = reviewdf.sample(n=1000000, random_state=612)

In [25]:
#G. Drop irrelevant features. Keep features like Author, Product, and Score. 
newdf.drop(['phone_url','date','lang','country','source','domain','extract'], axis = 1, inplace = True)


In [26]:
newdf.drop(['score_max'],axis=1,inplace=True)

In [27]:
newdf.shape

(1000000, 3)

### 2. Answer the following questions.

In [28]:
#A. Identify the most rated products.
(newdf['product'].value_counts()).head() # products rating count

Lenovo Vibe K4 Note (White,16GB)     3878
Lenovo Vibe K4 Note (Black, 16GB)    3286
OnePlus 3 (Graphite, 64 GB)          3133
OnePlus 3 (Soft Gold, 64 GB)         2643
Samsung Galaxy Express I8730         2019
Name: product, dtype: int64

Lenovo Vibe K4 Note and OnePlus 3 are the most rated products.

In [29]:
newdf.groupby('product')['score'].mean().sort_values(ascending=False).head()   #highly rated products

product
LG Straight Talk LG 420G Prepaid Cell Phone                                                             10.0
Sony Smartphone Sony Xperia M2 Aqua 4G Android 4.4 - CÃ¢m. 8MP Tela 4.8" Proc. Quad Core Wi-Fi A-GPS    10.0
LG D821 Nexus 5                                                                                         10.0
LG D821 Nexus 5 16GB White Fraktfritt                                                                   10.0
Sony Smartphone Sony Xperia T2 Ultra Dual Chip 3G - Android 4.3 CÃ¢m. 13MP Tela 6" Proc. Quad Core      10.0
Name: score, dtype: float64

The above are some of the products whose avg score is 10

In [30]:
#B. Identify the users with most number of reviews.
(newdf['author'].value_counts()).head()

Amazon Customer    57877
Cliente Amazon     14529
e-bit               6309
Client d'Amazon     5819
Amazon Kunde        3564
Name: author, dtype: int64

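"Amazon Customer" has far more reviews (57,877) than any other author. It is a generic display name (as are "Cliente Amazon", "Client d'Amazon" and "Amazon Kunde"), so it most likely aggregates many different people rather than a single user.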

In [31]:
#C. Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset
prodvaluecount=newdf['product'].value_counts()
prodmorethan50rating=prodvaluecount[prodvaluecount>50].index.tolist()
print("No. of products greater than 50 ratings",len(prodmorethan50rating))

uservaluecount=newdf['author'].value_counts()
usermorethan50rating=uservaluecount[uservaluecount>50].index.tolist()
print("No. of user who have greater than 50 ratings",len(usermorethan50rating))


finaldf=newdf[(newdf['product'].isin(prodmorethan50rating))&(newdf['author'].isin(usermorethan50rating))]


finaldf.shape

No. of products greater than 50 ratings 4397
No. of user who have greater than 50 ratings 671


(108302, 3)
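
The same filter can also be written with `groupby(...).transform`, which attaches the rating count to every row. This is only a sketch on hypothetical toy data, with the threshold lowered to 1 so that some toy rows survive; the notebook itself uses 50.

```python
import pandas as pd

# Hypothetical toy ratings (not the review data).
toy = pd.DataFrame({
    "author":  ["a", "a", "a", "b", "b", "c"],
    "product": ["p1", "p2", "p1", "p1", "p2", "p3"],
    "score":   [10, 8, 9, 6, 7, 2],
})
min_count = 1  # assumed threshold for the toy data; the notebook uses 50

# Number of ratings per product and per author, aligned with the rows.
product_counts = toy.groupby("product")["score"].transform("size")
author_counts = toy.groupby("author")["score"].transform("size")

# Keep rows whose product AND author both exceed the threshold.
toy_filtered = toy[(product_counts > min_count) & (author_counts > min_count)]
print(toy_filtered.shape)
```

As in the notebook, the counts are computed before filtering, so a product may end up with fewer ratings than the threshold once rows from infrequent authors are removed.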

### 3. Build a popularity based model and recommend top 5 mobile phones.

In [32]:
meanrating= pd.DataFrame(newdf.groupby('product')['score'].mean())
meanrating['Numberofratings'] = pd.DataFrame(newdf.groupby('product')['score'].count()) 
meanrating = meanrating.sort_values(by=['score','Numberofratings'], ascending=[False,False])
print('Top 5 recommendations for the products are: \n')
display(meanrating.head())

Top 5 recommendations for the products are: 



Unnamed: 0_level_0,score,Numberofratings
product,Unnamed: 1_level_1,Unnamed: 2_level_1
Samsung Galaxy Note5,10.0,153
Nokia Smartphone Nokia Lumia 520 Desbloqueado Oi Preto Windows Phone 8 CÃ¢mera 5MP 3G Wi-Fi MemÃ³ria Interna 8G GPS,10.0,138
Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 CÃ¢mera 10MP e Frontal 2MP MemÃ³ria Interna de 16GB GSM,10.0,137
Samsung Smartphone Dual Chip Samsung Galaxy SIII Duos Desbloqueado Claro Azul Android 4.1 3G/Wi-Fi CÃ¢mera 5MP,10.0,132
Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi CÃ¢mera 5MP - Preto,10.0,130
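
Sorting by mean score first places any product whose ratings are all 10 at the top, however few ratings it has. One common alternative, which is not what the cell above does, is a weighted rating that pulls products with few ratings towards the global mean: WR = v/(v+m) * R + m/(v+m) * C, where R is the product mean, v its number of ratings, C the global mean and m a damping constant. The sketch below uses hypothetical toy data and an assumed value of m.

```python
import pandas as pd

# Hypothetical toy ratings: A has one perfect rating, B has many high ratings.
toy = pd.DataFrame({
    "product": ["A"] + ["B"] * 10 + ["D"] * 10,
    "score":   [10] + [10] * 9 + [9] + [5] * 10,
})

stats = toy.groupby("product")["score"].agg(mean_score="mean", n_ratings="count")

C = toy["score"].mean()  # global mean score
m = 5                    # assumed damping constant (hypothetical choice)

# Weighted rating: products with few ratings are pulled towards C.
v = stats["n_ratings"]
stats["weighted"] = (v / (v + m)) * stats["mean_score"] + (m / (v + m)) * C

ranked = stats.sort_values("weighted", ascending=False)
print(ranked)
```

With this weighting, product B (ten ratings, mean 9.9) ranks above product A (a single rating of 10), whereas a plain sort on the mean would put A first.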


### 4. Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch

(Note: In case you’re building it from scratch, you can limit your data points to 5000 samples if you face memory issues.)

### Build a collaborative filtering model using kNNWithMeans from surprise. You can try both user-based and item-based model.

In [33]:
col= ['author','product','score']
finaldf=finaldf.reindex(columns=col)
#finaldfresample = finaldf.sample(n=5000, random_state=612)

In [34]:
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(finaldf,reader = reader)
#trainset = data.build_full_trainset()
#testset = trainset.build_anti_testset()
trainset, testset = train_test_split(data, test_size=.15)

In [35]:
def svd_func(train, test):
    svd = SVD(random_state=612)
    svd.fit(train)
    svd_pred = svd.test(test)
    return svd_pred, svd

svd_pred, svd = svd_func(trainset,testset)
print('First few prediction values: \n',svd_pred[0:2])


First few prediction values: 
 [Prediction(uid='Amazon Customer', iid='HTC Desire 620G (Santroni White)', r_ui=10.0, est=5.871633591534618, details={'was_impossible': False}), Prediction(uid='Samuel', iid='LG OPTIMUS L7 II DUAL P715 Factory Unlocked International Version - BLACK (No-Warranty)', r_ui=2.0, est=8.05978024959822, details={'was_impossible': False})]
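
For intuition, the `est` value of an SVD-style model is the global mean plus a user bias, an item bias and the dot product of the user and item factor vectors, clipped to the rating scale. The sketch below recomputes such an estimate by hand; all parameter values are made up for illustration and are not taken from the fitted model.

```python
# Hypothetical learned parameters for one user and one item (made-up numbers).
global_mean = 7.9                    # mu: mean of all training ratings
user_bias = -0.4                     # b_u
item_bias = 0.6                      # b_i
user_factors = [0.10, -0.30, 0.25]   # p_u
item_factors = [0.20, 0.40, -0.10]   # q_i


def svd_estimate(mu, b_u, b_i, p_u, q_i, low=1, high=10):
    """Biased matrix-factorisation estimate, clipped to the rating scale."""
    dot = sum(p * q for p, q in zip(p_u, q_i))
    raw = mu + b_u + b_i + dot
    return min(high, max(low, raw))


estimate = svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors)
print(round(estimate, 3))
```

The `low` and `high` defaults mirror the `rating_scale=(1, 10)` passed to `Reader` above.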


In [36]:
def knn_item(train, test):
    knn_i = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': False})
    knn_i.fit(train)
    knn_i_pred = knn_i.test(test)
    return knn_i_pred, knn_i

knn_i_pred, knn_i = knn_item(trainset, testset)
print('First few prediction values: \n',knn_i_pred[0:2])


Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
First few prediction values: 
 [Prediction(uid='Amazon Customer', iid='HTC Desire 620G (Santroni White)', r_ui=10.0, est=6.96, details={'actual_k': 50, 'was_impossible': False}), Prediction(uid='Samuel', iid='LG OPTIMUS L7 II DUAL P715 Factory Unlocked International Version - BLACK (No-Warranty)', r_ui=2.0, est=10, details={'actual_k': 3, 'was_impossible': False})]


In [37]:
def knn_user(train, test):
    knn_u = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
    knn_u.fit(train)
    knn_u_pred = knn_u.test(test)
    return knn_u_pred, knn_u

knn_u_pred, knn_u = knn_user(trainset, testset)
print('First few prediction values: \n',knn_u_pred[0:2])


Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
First few prediction values: 
 [Prediction(uid='Amazon Customer', iid='HTC Desire 620G (Santroni White)', r_ui=10.0, est=6.96, details={'actual_k': 50, 'was_impossible': False}), Prediction(uid='Samuel', iid='LG OPTIMUS L7 II DUAL P715 Factory Unlocked International Version - BLACK (No-Warranty)', r_ui=2.0, est=6.416256157635468, details={'actual_k': 1, 'was_impossible': False})]
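
The idea behind a mean-centred KNN estimate can be sketched as follows: start from the target user's mean rating and add the similarity-weighted average of how far each neighbour's rating of the item lies from that neighbour's own mean. The function and numbers below are hypothetical, and the fallback to the user's mean when no neighbour is available is an assumption of this sketch.

```python
def knn_with_means_estimate(user_mean, neighbours):
    """Mean-centred KNN estimate.

    neighbours: list of (similarity, neighbour_rating, neighbour_mean).
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    denominator = sum(sim for sim, _, _ in neighbours)
    if denominator == 0:
        return user_mean  # assumed fallback: no usable neighbours
    return user_mean + numerator / denominator


# Two hypothetical neighbours who rated the item.
neighbours = [(0.9, 10, 8.0), (0.5, 6, 7.0)]
estimate = knn_with_means_estimate(7.5, neighbours)
print(round(estimate, 3))
```

When `actual_k` is as small as 1, as in the second prediction above, the estimate rests on a single neighbour and can be far from the true rating.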


### 5. Evaluate the collaborative model. Print RMSE value.

In [38]:
svd_rmse = round(accuracy.rmse(svd_pred),2) # compute RMSE
print('\nRMSE value SVD collaborative model(test-set): ',svd_rmse)


RMSE: 2.6454

RMSE value SVD collaborative model(test-set):  2.65


In [39]:
knn_i_rmse = round(accuracy.rmse(knn_i_pred),2)
print('\nRMSE value Item-based Model: ',knn_i_rmse) # compute RMSE


RMSE: 2.6627

RMSE value Item-based Model:  2.66


In [40]:
knn_u_rmse = round(accuracy.rmse(knn_u_pred),2)
print('\nRMSE value(User-based Model, test-set): ',knn_u_rmse) # compute RMSE


RMSE: 2.7191

RMSE value(User-based Model, test-set):  2.72
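
RMSE squares each error before averaging, so large misses weigh more than they do in the mean absolute error (MAE). A minimal worked example, using only the two SVD predictions printed earlier as input:

```python
import math

# (actual, estimated) pairs copied from the two SVD predictions printed earlier.
pairs = [(10.0, 5.871633591534618), (2.0, 8.05978024959822)]

errors = [actual - est for actual, est in pairs]

# Root mean squared error: square, average, take the square root.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# Mean absolute error: average of the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}")
```

RMSE is never smaller than MAE on the same errors, which is consistent with the test-set RMSE values here being larger than the average absolute prediction errors reported for the same models.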


### 6. Predict score (average rating) for test users.

In [41]:
svd_pred_df=pd.DataFrame(svd_pred, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',svd_pred_df['est'].mean())
print('average rating  by test users: ',svd_pred_df['rui'].mean())
print('average prediction error for test users: ',(svd_pred_df['rui']-svd_pred_df['est']).abs().mean())

average prediction for test users:  7.938742081187367
average rating  by test users:  7.877262095284993
average prediction error for test users:  2.0304896429137274


In [42]:
knn_i_pred_df=pd.DataFrame(knn_i_pred, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',knn_i_pred_df['est'].mean())
print('average rating  by test users: ',knn_i_pred_df['rui'].mean())
print('average prediction error for test users: ',(knn_i_pred_df['rui']-knn_i_pred_df['est']).abs().mean())

average prediction for test users:  7.855641120039388
average rating  by test users:  7.877262095284993
average prediction error for test users:  2.0385427880791274


In [43]:
knn_u_pred_df=pd.DataFrame(knn_u_pred, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',knn_u_pred_df['est'].mean())
print('average rating  by test users: ',knn_u_pred_df['rui'].mean())
print('average prediction error for test users: ',(knn_u_pred_df['rui']-knn_u_pred_df['est']).abs().mean())

average prediction for test users:  7.8774919676925705
average rating  by test users:  7.877262095284993
average prediction error for test users:  2.053533850488572


### 7. Report your findings and inferences.

| Recommendation system | RMSE | Avg prediction error for test users |
| --- | --- | --- |
|SVD|2.65|2.03|
|Item based collaborative filtering|2.66|2.03|
|User Based Collaborative filtering|2.72|2.05|


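SVD and item-based collaborative filtering yield nearly the same results (RMSE 2.65 vs 2.66, average prediction error 2.03 for both).

SVD performs slightly better than the other two systems on this split; cross validation is needed to confirm this.

The average absolute deviation of the predicted rating from the actual rating is 2.03-2.05 for all the models.

Samsung Galaxy Note5 is the most popular phone (average score 10 from 153 ratings).

The most frequent author name is the generic "Amazon Customer".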

### 8. Try and recommend top 5 products for test users.

In [44]:


from collections import defaultdict
def get_top_n(predictions, n=5):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n
top_5 = get_top_n(knn_i_pred,5)

print('Top 5 recommendations for all test users are: \n')
for key,value in top_5.items(): print(key,'-> ',value,'\n') # to print all the recommendations for all the users

Top 5 recommendations for all test users are: 

Amazon Customer ->  [('Samsung E2550 Handy (Social Networking Dienste, Kamera, MP3-Player) strong-black', 10), ('Motorola Moto Z Play - Black - 32GB (U.S. Warranty)', 10), ('Samsung T699 "Galaxy S Relay 4G" T-Mobile Android Phone - Black', 10), ('Lenovo A526 (Blue)', 10), ('Nokia 7705 Twist Phone, Black (Verizon Wireless) Works With Page Plus And Red Pocket cdma Verizon', 10)] 

Samuel ->  [('LG OPTIMUS L7 II DUAL P715 Factory Unlocked International Version - BLACK (No-Warranty)', 10), ('Samsung Galaxy S7 edge SM-G935F 32GB 4G Pink gold - Smartphone (SIM Ãºnica, Android, NanoSIM, GSM, UMTS, WCDMA, LTE)', 8.057885837323365), ('Sony Ericsson W300', 8.0), ('Huawei P8 Lite Smartphone, Display 5" IPS, Processore Octa-Core 1.5 GHz, Memoria Interna da 16 GB, 2 GB RAM, Fotocamera 13 MP, monoSIM, Android 5.0, Bianco [Italia]', 7.453850028152508), ('Samsung Galaxy S3 White Sprint Android Smart Phone', 4.745775470109253)] 

Diego ->  [('Smartphone S

ÐÐ¾ÑÑÑ ->  [('Nokia 5250', 10), ('Nokia 6303i Classic', 9.487621170758969), ('Apple iPhone SE 64GB (Ñ\x81ÐµÑ\x80Ñ\x8bÐ¹ ÐºÐ¾Ñ\x81Ð¼Ð¾Ñ\x81)', 9.432416530329645), ('Nokia 5228', 9.327722102679912), ('Apple iPhone 3GS 16Gb', 9.246064887704314)] 

George ->  [('Smartphone Moto E DTV Colors Preto com TV Digital, Dual Chip, Tela de 4.3â\x80\x9d, Android 4.4, 3G, Wi-Fi, CÃ¢mera 5MP e Duas Capas Coloridas', 10), ('LG G2 Mini 3G DUAL D618 8GB Unlocked GSM Dual-SIM Android 4.4 (KitKat) Quad-Core Smartphone - Titan Black - International Version No Warranty', 9.808545914614033), ('LG Chocolate', 9.787120720596521), ('LG P920 Optimus 3D', 8.851063829787234), ('Samsung Galaxy S II White', 7.9850295891449425)] 

ozlemce108 ->  [('Nokia 5610 Cep Telefonu', 9.0), ('Nokia 6300 Cep Telefonu', 9.0), ('Nokia E51 Cep Telefonu', 9.0), ('Nokia N81 Cep Telefonu', 9.0), ('Samsung I560 Cep Telefonu', 9.0)] 

ÐÐ»ÐµÐºÑÐµÐ¹ ->  [('Samsung Galaxy S7', 10), ('Samsung Galaxy Note 3', 10), ('Apple iPhone 5s 16GB

Joseph ->  [('HTC One X 32GB', 10), ('OnePlus One (16GB, Silk White)', 8.073084863270818), ('Huawei Mate 2 - Factory Unlocked (White)', 7.459885812693887), ('LG Nexus 5X Unlocked Smartphone with 5.2-Inch 32GB H790 4G LTE (Carbon Black)', 6.567765576570648)] 

Tony ->  [('BlackBerry Z30 Unlocked Cellphone, 16GB, Black', 10), ('Apple iPhone 3G', 10), ('Samsung T959 Galaxy S Vibrant 4G GSM Unlocked Android Smartphone', 10), ('Honor 7 Smartphone dÃ©bloquÃ© 4G (Ecran: 5,2 pouces - 16 Go - Double Nano SIM - Android 5.0 Lollipop) Argent/Blanc', 10), ('LG Nexus 4 Smartphone, Nero [Italia]', 9.90661284920727)] 

okuyan ->  [('Apple iPhone 3GS 16GB', 9.016392950625693), ('Nokia N93 Cep Telefonu', 9.015580585841727), ('Nokia 6500 Slide Cep Telefonu', 9.01259018931843), ('Sony Ericsson K850i Cep Telefonu', 9.0), ('Nokia 5610 Cep Telefonu', 9.0)] 

Sam ->  [('Samsung Galaxy J3 (8GB)', 9.972749195055254), ('Motorola RAZRi UK Sim Free Smartphone', 9.871352381372883), ('Huawei P8 Lite Smartphone, Disp

dunya ->  [('Nokia 6600i Slide Cep Telefonu', 9.0), ('Nokia 5800 XpressMusic Cep Telefonu', 9.0), ('Nokia 6600i Slide Cep Telefonu', 9.0), ('Nokia 6300 Cep Telefonu', 9.0), ('LG KP500 COOKIE Cep Telefonu', 9.0)] 

Benjamin ->  [('Sony Xperia E Smartphone (8,9 cm (3,5 Zoll) Touchscreen, Qualcomm, 1GHz, 512MB RAM, 4GB HDD, 3,2 Megapixel Kamera, Android 4.1) wei??', 4.0102276140717175)] 

ozer1299 ->  [('Nokia N73 Cep Telefonu', 9.011672367054427)] 

Jean ->  [('Samsung Galaxy Grand Prime Dual Sim Factory Unlocked Phone - Retail Packaging - Gold(International Version)', 9.025632770590212), ('Samsung Galaxy S7 Edge zwart / 32 GB', 8.9213984217413)] 

Sophie ->  [('Honor 7 Smartphone dÃ©bloquÃ© 4G (Ecran: 5,2 pouces - 16 Go - Double Nano SIM - Android 5.0 Lollipop) Gris/Noir', 7.444943820224719)] 

osmntyr ->  [('Samsung M150 Cep Telefonu', 9.0)] 

sumeyyehavvaay ->  [('Nokia 6500c Cep Telefonu', 9.0)] 
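
The lists above rank only the products that appear in the test set for each user, which is why some users receive fewer than five entries. A possible alternative is to score every product the user has not rated yet. The sketch below is a hypothetical illustration: `recommend_unseen` and `toy_predict` are made-up names, and `toy_predict` stands in for a fitted model's estimate.

```python
from collections import defaultdict


def recommend_unseen(ratings, predict, n=5):
    """Rank items each user has not rated yet.

    ratings: iterable of (user, item, score)
    predict: callable (user, item) -> estimated score
    """
    seen = defaultdict(set)
    items = set()
    for user, item, _ in ratings:
        seen[user].add(item)
        items.add(item)

    top_n = {}
    for user, rated in seen.items():
        candidates = items - rated
        scored = [(item, predict(user, item)) for item in candidates]
        # Highest estimate first; item name breaks ties deterministically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        top_n[user] = scored[:n]
    return top_n


def toy_predict(user, item):
    # Hypothetical stand-in for a fitted model's estimate.
    return {"p1": 9.0, "p2": 7.5, "p3": 8.2}[item]


toy_ratings = [("u1", "p1", 10), ("u2", "p2", 6), ("u2", "p3", 8)]
recs = recommend_unseen(toy_ratings, toy_predict, n=2)
print(recs)
```

With a fitted Surprise model, `predict` could wrap `algo.predict(uid, iid).est`; scoring every unseen pair can be slow on a dataset of this size.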



### 9. Try other techniques (Example: cross validation) to get better results.

In [45]:
svd_cv = cross_validate(svd,data, measures=['RMSE'], cv=5, verbose=False)
print('\n Mean svd cv score:', round(svd_cv['test_rmse'].mean(),2),'\n')
svd_cv


 Mean svd cv score: 2.67 



{'test_rmse': array([2.67453342, 2.68164476, 2.65692785, 2.66123273, 2.65132542]),
 'fit_time': (11.297166347503662,
  11.020620346069336,
  10.84618091583252,
  9.752644538879395,
  8.378888130187988),
 'test_time': (0.27921104431152344,
  0.3684988021850586,
  0.28350043296813965,
  0.3727595806121826,
  0.25853443145751953)}

In [46]:
knn_i_cv = cross_validate(knn_i,data, measures=['RMSE'], cv=5, verbose=False)
print('\n Mean knn_i_cv score:', round(knn_i_cv['test_rmse'].mean(),2),'\n')
knn_i_cv

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.

 Mean knn_i_cv score: 2.69 



{'test_rmse': array([2.70685576, 2.68514252, 2.68588342, 2.70740822, 2.678186  ]),
 'fit_time': (410.4503929615021,
  377.36462020874023,
  452.88364839553833,
  401.20813369750977,
  402.32679176330566),
 'test_time': (453.17704725265503,
  468.7737078666687,
  427.5407781600952,
  462.86636900901794,
  452.5380048751831)}

In [47]:
knn_u_cv = cross_validate(knn_u,data, measures=['RMSE'], cv=5, verbose=False)
print('\n Mean knn_u_cv score:', round(knn_u_cv['test_rmse'].mean(),2),'\n')
knn_u_cv

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.

 Mean knn_u_cv score: 2.75 



{'test_rmse': array([2.75551822, 2.74472613, 2.72573154, 2.74658002, 2.76177857]),
 'fit_time': (4.839931488037109,
  4.843189477920532,
  5.0993921756744385,
  4.6870927810668945,
  5.020069360733032),
 'test_time': (9.259884357452393,
  9.039757013320923,
  8.988811731338501,
  8.954061269760132,
  9.702831268310547)}

| Recommendation system | Cross-validation RMSE (mean of 5 folds) |
| --- | --- |
|SVD|2.67|
|Item based collaborative filtering|2.69|
|User Based Collaborative filtering|2.75|

With cross validation as well, SVD performs better than the item-based and user-based KNN models.
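
The means in the table can be reproduced from the per-fold RMSE arrays printed above, and the fold-to-fold standard deviation gives a rough sense of how much of the gap between models is noise. A minimal sketch using only those printed values:

```python
from statistics import mean, stdev

# Per-fold test RMSE values copied from the cross_validate outputs above.
fold_rmse = {
    "SVD": [2.67453342, 2.68164476, 2.65692785, 2.66123273, 2.65132542],
    "KNN item-based": [2.70685576, 2.68514252, 2.68588342, 2.70740822, 2.678186],
    "KNN user-based": [2.75551822, 2.74472613, 2.72573154, 2.74658002, 2.76177857],
}

# Mean and sample standard deviation across the five folds.
summary = {name: (mean(vals), stdev(vals)) for name, vals in fold_rmse.items()}

for name, (avg, sd) in summary.items():
    print(f"{name:15s} mean RMSE {avg:.3f}  (std {sd:.3f})")
```

Comparing the difference between two means with these standard deviations is a quick, informal check; it is not a formal significance test.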

### 10. In what business scenario should you use popularity based Recommendation Systems ?

1. Popularity-based recommendation can be used when we want to recommend the most watched videos or best-selling products.
2. We can recommend to users what other people like.
3. When we have no information about the user (a new user), popularity-based recommendation does not suffer from the cold-start problem.
4. It is also used to determine trending or popular items in different categories.
5. Most watched Tamil movies, or "Trending in India".
6. Best-selling items in western dresses.
7. Most selected trip package.
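
The "trending per category" idea from the list above can be sketched with simple counting. The event log and names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: (category, item) per view or purchase.
events = [
    ("movies", "Film A"), ("movies", "Film B"), ("movies", "Film A"),
    ("dresses", "Dress X"), ("dresses", "Dress Y"), ("dresses", "Dress Y"),
    ("movies", "Film A"),
]


def top_per_category(events, n=1):
    """Return the n most frequent items for each category."""
    counts = defaultdict(Counter)
    for category, item in events:
        counts[category][item] += 1
    return {cat: [item for item, _ in c.most_common(n)] for cat, c in counts.items()}


trending = top_per_category(events, n=1)
print(trending)
```

No user history is needed, which is why this approach works for brand-new users.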

### 11. In what business scenario should you use CF based Recommendation Systems ?

1. Collaborative filtering can be used when we have past data about users and items, to give personalised recommendations.
2. It can be used for recommendations such as "since you watched this video/song, you may also like this video/song".
3. "People who bought this product also bought this additional product" (for example, mobile + mobile cover).
4. Recommendations based on likes and interactions from people who have similar taste.
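
The "also bought" scenario can be illustrated with item co-occurrence counts, a very simple form of item-based collaborative filtering. The baskets and function names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase baskets per user.
baskets = {
    "u1": {"phone", "cover", "charger"},
    "u2": {"phone", "cover"},
    "u3": {"phone", "earphones"},
    "u4": {"cover", "charger"},
}


def co_occurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_bought(item, co, n=2):
    """Items most often bought together with `item` (ties broken by name)."""
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


co = co_occurrence(baskets)
suggestions = also_bought("phone", co)
print(suggestions)
```

Unlike a popularity list, the suggestions depend on the item (or user) in question, which is what makes the result personalised.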

### 12. What other possible methods can you think of which can further improve the recommendation for different users ?

1. Session-based recommendation, based on the user's interactions within the current session.
2. Hybrid recommendation combining content-based and collaborative filtering.
3. Knowledge-based recommendation.
4. Locality- or language-based recommendation.
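
As one possible reading of the hybrid idea, a content-based score and a collaborative score can be blended, with more weight on the collaborative score as a user accumulates ratings. This is a hypothetical sketch: the function name, the linear weighting and the `full_trust_at` parameter are assumptions made for illustration.

```python
def hybrid_score(cf_score, content_score, n_user_ratings, full_trust_at=20):
    """Blend collaborative and content-based scores.

    The weight on the collaborative score grows linearly with the number of
    ratings the user has given, reaching 1.0 at `full_trust_at` ratings.
    """
    alpha = min(1.0, n_user_ratings / full_trust_at)
    return alpha * cf_score + (1 - alpha) * content_score


# A new user relies on content; a frequent rater relies on collaborative filtering.
new_user = hybrid_score(cf_score=9.0, content_score=6.0, n_user_ratings=0)
mid_user = hybrid_score(cf_score=9.0, content_score=6.0, n_user_ratings=10)
heavy_user = hybrid_score(cf_score=9.0, content_score=6.0, n_user_ratings=20)
print(new_user, mid_user, heavy_user)
```

A blend of this kind sidesteps the cold-start problem for new users while still personalising for users with a long rating history.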
