### Q1 A - Merge all the provided CSVs into one data-frame.

In [1]:
# import the relevant libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

In [2]:
# read & merge the dataset

data1 = pd.read_csv('phone_user_review_file_1.csv', encoding = "ISO-8859-1")
data2 = pd.read_csv('phone_user_review_file_2.csv', encoding = "ISO-8859-1")
data3 = pd.read_csv('phone_user_review_file_3.csv', encoding = "ISO-8859-1")
data4 = pd.read_csv('phone_user_review_file_4.csv', encoding = "ISO-8859-1")
data5 = pd.read_csv('phone_user_review_file_5.csv', encoding = "ISO-8859-1")
data6 = pd.read_csv('phone_user_review_file_6.csv', encoding = "ISO-8859-1")

merged_data = pd.concat([data1, data2, data3, data4, data5, data6], axis=0).reset_index()
merged_data

Unnamed: 0,index,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.2,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8
...,...,...,...,...,...,...,...,...,...,...,...,...
1415128,163832,/cellphones/alcatel-ot-club_1187/,5/12/2000,de,de,Ciao,ciao.de,2.0,10.0,Weil mein Onkel bei ALcatel arbeitet habe ich ...,david.paul,Alcatel Club Plus Handy
1415129,163833,/cellphones/alcatel-ot-club_1187/,5/11/2000,de,de,Ciao,ciao.de,10.0,10.0,Hy Liebe Leserinnen und Leser!! Ich habe seit ...,Christiane14,Alcatel Club Plus Handy
1415130,163834,/cellphones/alcatel-ot-club_1187/,5/4/2000,de,de,Ciao,ciao.de,2.0,10.0,"Jetzt hat wohl Alcatell gedacht ,sie machen wa...",michaelawr,Alcatel Club Plus Handy
1415131,163835,/cellphones/alcatel-ot-club_1187/,5/1/2000,de,de,Ciao,ciao.de,8.0,10.0,Ich bin seit 2 Jahren (stolzer) Besitzer eines...,claudia0815,Alcatel Club Plus Handy
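
The six `read_csv` calls above can be collapsed into a loop. This is a minimal sketch, assuming all files share the same columns. `merge_csvs` is a hypothetical helper, and `ignore_index=True` is swapped in for `reset_index()` so that no extra `index` column is created. The later cell that drops `'index'` by name would need adjusting if this variant were used.

```python
import pandas as pd

def merge_csvs(paths, encoding="ISO-8859-1"):
    """Read each CSV and stack the rows into one frame with a fresh 0..n-1 index."""
    frames = [pd.read_csv(path, encoding=encoding) for path in paths]
    return pd.concat(frames, axis=0, ignore_index=True)

# File names follow the pattern used in the notebook.
paths = [f"phone_user_review_file_{i}.csv" for i in range(1, 7)]
# merged = merge_csvs(paths)  # uncomment where the files are available
```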


### Q1 B - Explore, understand the Data and share at least 2 observations.

In [3]:
# basic statistic

merged_data.describe()

Unnamed: 0,index,score,score_max
count,1415133.0,1351644.0,1351644.0
mean,145167.3,8.00706,10.0
std,101459.9,2.616121,0.0
min,0.0,0.2,10.0
25%,58963.0,7.2,10.0
50%,123589.0,9.2,10.0
75%,228101.0,10.0,10.0
max,374909.0,10.0,10.0


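- 'score' and 'score_max' have the same non-null count (1,351,644), which is lower than the row count, so both columns contain missing values.
- 'score_max' is 10 in every non-null row (its standard deviation is 0), so it carries no information.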

In [4]:
# most rating w.r.t. country

merged_data['country'].value_counts().head(1)

us    318435
Name: country, dtype: int64

- Most ratings come from the country 'us' (318,435 rows).

In [5]:
# most rating w.r.t. language

merged_data['lang'].value_counts().head(1)

en    554746
Name: lang, dtype: int64

- Most ratings are in the 'en' language (554,746 rows).

### Q1 C - Round off scores to the nearest integers.

In [6]:
merged_data['score'] = merged_data['score'].round(0)
merged_data['score_max'] = merged_data['score_max'].round(0)
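
One detail worth knowing about the rounding above: `Series.round` follows NumPy's round-half-to-even rule, so a score of exactly 8.5 becomes 8.0 rather than 9.0. The sketch below contrasts the two behaviours on toy values. `round_half_up` is a hypothetical helper, not something the notebook uses, and it assumes non-negative scores.

```python
import numpy as np

def round_half_up(values):
    """Round non-negative values to the nearest integer, sending .5 upward."""
    return np.floor(np.asarray(values, dtype=float) + 0.5)

scores = np.array([0.2, 7.5, 8.5, 9.2])
half_even = np.round(scores)     # NumPy default: ties go to the even neighbour
half_up = round_half_up(scores)  # ties always go up
```

Which rule is appropriate depends on how the assignment defines "nearest"; the difference only affects scores that end in exactly .5.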

### Q1 D - Check for missing values. Impute the missing values, if any.

In [7]:
# percentage missing value

merged_data.isna().sum()/len(merged_data) * 100

index        0.000000
phone_url    0.000000
date         0.000000
lang         0.000000
country      0.000000
source       0.000000
domain       0.000000
score        4.486433
score_max    4.486433
extract      1.368140
author       4.466153
product      0.000071
dtype: float64

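- Fewer than 5% of rows are missing 'score' (and 'score_max'), so either dropping or imputing them should have little effect. The cells below impute these two columns and then drop the rows that are still missing 'extract', 'author', or 'product'.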

In [8]:
# impute the missing value in column 'score' with median

merged_data['score'] = merged_data['score'].fillna(merged_data['score'].median())

In [9]:
# impute the missing value in column 'score_max' with '10'

merged_data['score_max'] = merged_data['score_max'].fillna(10)

In [10]:
# drop the rows with null values

merged_data_v1 = merged_data.dropna()

In [11]:
# again cross verify the null values

merged_data_v1.isna().sum()

index        0
phone_url    0
date         0
lang         0
country      0
source       0
domain       0
score        0
score_max    0
extract      0
author       0
product      0
dtype: int64

- No missing values remain after imputing and dropping.

In [12]:
# shape of the dataset after removing the missing values

merged_data_v1.shape

(1336416, 12)

### Q1 E - Check for duplicate values and remove them, if any.

In [13]:
len(merged_data_v1[merged_data_v1.duplicated()])

0

- There are no duplicate values.

### Q1 F - Keep only 1 Million data samples. Use random state=612.

In [14]:
merged_data_v2 = merged_data_v1.sample(1000000, random_state=612)
merged_data_v2.shape

(1000000, 12)

### Q1 G - Drop irrelevant features. Keep features like Author, Product, and Score.

In [15]:
col = ['index', 'phone_url', 'date', 'lang', 'country', 'source', 'domain', 'score_max', 'extract']
merged_data_v3 = merged_data_v2.drop(col, axis=1).reset_index(drop=True)
merged_data_v3

Unnamed: 0,score,author,product
0,8.0,ÐÑÐ½Ð´Ð°ÑÐµÐ² Ð¡ÐµÑÐ³ÐµÐ¹,Apple iPhone 5S 32Gb
1,10.0,Rosemarie Boeshans,"Samsung Galaxy S3 mini I8190 Smartphone (10,2 ..."
2,10.0,Federico Minetti,"Honor 4X Smartphone 4G, Display 5.5 Pollici, P..."
3,10.0,EugÃªnio,Smartphone Samsung Galaxy J1 Mini Duos SM-J105...
4,10.0,AH Kamalati,Samsung Wave 2 S8530 Sim Free Mobile Smart Pho...
...,...,...,...
999995,8.0,Suelen Araripe,Samsung Smartphone Samsung Galaxy Gran Duos GT...
999996,6.0,isidre viÃ±as,Samsung S4 Mini - Smartphone libre (pantalla 4...
999997,10.0,skinnyguy,Samsung Galaxy S III 16GB (Sprint)
999998,10.0,just me,Microsoft Nokia Lumia 630 UK SIM-Free Smartpho...
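
Some author and product names in the output above (for example `EugÃªnio`) look like UTF-8 text that was decoded as ISO-8859-1. A minimal sketch of how such strings could be repaired, assuming the underlying bytes really were UTF-8; `fix_mojibake` is a hypothetical helper and leaves any string it cannot convert unchanged.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was decoded as ISO-8859-1; return it unchanged on failure."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return text

garbled = "Eug\u00c3\u00aanio"    # the name as it is displayed in the table above
repaired = fix_mojibake(garbled)  # "Eug\u00eanio"
```

On the data frame this could be applied with something like `merged_data_v3['author'].map(fix_mojibake)` once missing values have been removed, since the helper expects strings.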


### Q2 A - Identify the most rated features.

In [16]:
merged_data_v3['product'].value_counts().head()

Lenovo Vibe K4 Note (White,16GB)     3835
Lenovo Vibe K4 Note (Black, 16GB)    3275
OnePlus 3 (Graphite, 64 GB)          3097
OnePlus 3 (Soft Gold, 64 GB)         2673
Samsung Galaxy Express I8730         2009
Name: product, dtype: int64

### Q2 B - Identify the users with most number of reviews.

In [17]:
merged_data_v3['author'].value_counts().head()

Amazon Customer    57643
Cliente Amazon     14458
e-bit               6492
Client d'Amazon     5803
Amazon Kunde        3545
Name: author, dtype: int64

### Q2 C - Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset.

In [18]:
# extracting authors who gave greater than 50 ratings

df1 = pd.DataFrame(columns=['author', 'a_count'])
df1['author']=merged_data_v3['author'].value_counts().index.tolist() 
df1['a_count'] = list(merged_data_v3['author'].value_counts() > 50)

In [19]:
# get names of indexes for which count column value is False

index_names = df1[ df1['a_count'] == False ].index

# drop these row indexes from dataFrame

df1.drop(index_names, inplace = True) 
df1

Unnamed: 0,author,a_count
0,Amazon Customer,True
1,Cliente Amazon,True
2,e-bit,True
3,Client d'Amazon,True
4,Amazon Kunde,True
...,...,...
669,Gonzalo,True
670,Jeremy,True
671,MarÃ­a,True
672,Monique,True


In [20]:
# extracting product that got more than 50 ratings

df2 = pd.DataFrame(columns=['product', 'p_count'])
df2['product'] = merged_data_v3['product'].value_counts().index.tolist() 
df2['p_count'] = list(merged_data_v3['product'].value_counts() > 50)

In [21]:
# get names of indexes for which count column value is False

index_names = df2[ df2['p_count'] == False ].index

# drop these row indexes from dataFrame

df2.drop(index_names, inplace = True)
df2

Unnamed: 0,product,p_count
0,"Lenovo Vibe K4 Note (White,16GB)",True
1,"Lenovo Vibe K4 Note (Black, 16GB)",True
2,"OnePlus 3 (Graphite, 64 GB)",True
3,"OnePlus 3 (Soft Gold, 64 GB)",True
4,Samsung Galaxy Express I8730,True
...,...,...
4369,Samsung Evergreen A667 Unlocked GSM 3G Phone w...,True
4370,Cubot X9 Unlocked Cell Phone 5.0 inch Octa Cor...,True
4371,Samsung Star II DUOS,True
4372,"Samsung A187 At&t Phone with QWERTY Keyboard, ...",True


In [22]:
# selecting data rows where product is having more than 50 ratings.

df3 = merged_data_v3[merged_data_v3['product'].isin(df2['product'])] 
df3

Unnamed: 0,score,author,product
0,8.0,ÐÑÐ½Ð´Ð°ÑÐµÐ² Ð¡ÐµÑÐ³ÐµÐ¹,Apple iPhone 5S 32Gb
1,10.0,Rosemarie Boeshans,"Samsung Galaxy S3 mini I8190 Smartphone (10,2 ..."
2,10.0,Federico Minetti,"Honor 4X Smartphone 4G, Display 5.5 Pollici, P..."
8,2.0,Tom,"Apple iPhone 5s GSM Unlocked Cellphone, 16 GB,..."
9,10.0,BESOURO FELIX,LG GX200
...,...,...,...
999994,8.0,VINICIUS_RO0,BlackBerry Curve 8320
999995,8.0,Suelen Araripe,Samsung Smartphone Samsung Galaxy Gran Duos GT...
999996,6.0,isidre viÃ±as,Samsung S4 Mini - Smartphone libre (pantalla 4...
999997,10.0,skinnyguy,Samsung Galaxy S III 16GB (Sprint)


In [23]:
# selecting data rows from df3 where author has given more than 50 ratings.
# so that we get the data with products having more than 50 ratings and users who have given more than 50 ratings

df4 = df3[df3['author'].isin(df1['author'])]
df4

Unnamed: 0,score,author,product
8,2.0,Tom,"Apple iPhone 5s GSM Unlocked Cellphone, 16 GB,..."
10,6.0,Amazon Customer,Samsung Galaxy SIII UK SIM-Free Smartphone - P...
21,8.0,Cliente Amazon,Samsung Galaxy A3 (2016) 16GB White
45,6.0,JR,"Huawei P8 Lite - Smartphone de 5"" (cÃ¡mara 13 ..."
56,10.0,Ali,Sim Free Samsung Galaxy S7 Edge Mobile Phone -...
...,...,...,...
999975,10.0,e-bit,Smartphone Samsung Galaxy Gran Prime SM-G531
999978,10.0,lorenzo,"Huawei P8 lite Smartphone, Display 5.0"" IPS, D..."
999979,6.0,Amazon Customer,"Lenovo Vibe K4 Note (White,16GB)"
999985,2.0,Amazon Customer,"Motorola Moto G 3rd Generation (Black, 8GB)"


In [24]:
# data shape of the df4

df4.shape

(108657, 3)
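
The same selection can be written without the helper frames `df1` and `df2`. The sketch below maps each row to the number of times its product and author occur and filters on both counts at once. `filter_by_min_counts` and the toy frame are illustrative assumptions; as in the cells above, both counts are taken over the full input frame.

```python
import pandas as pd

def filter_by_min_counts(df, min_count=50):
    """Keep rows whose product AND author each appear more than min_count times in df."""
    product_counts = df["product"].map(df["product"].value_counts())
    author_counts = df["author"].map(df["author"].value_counts())
    return df[(product_counts > min_count) & (author_counts > min_count)]

toy = pd.DataFrame({
    "author":  ["a", "a", "a", "b", "c"],
    "product": ["p", "p", "q", "p", "p"],
    "score":   [10, 8, 6, 4, 2],
})
kept = filter_by_min_counts(toy, min_count=2)
```

In the notebook this would be called as `filter_by_min_counts(merged_data_v3, min_count=50)`.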

### Q3 - Build a popularity based model and recommend top 5 mobile phones.

In [25]:
#calculating the mean score for a product by grouping it.

ratings_mean_count = pd.DataFrame(merged_data_v3.groupby('product')['score'].mean())
ratings_mean_count

product,score
"'Smartphone Meizu Pro 5, 5,7 pouces avec Exynos 7420 Octa 8 Core Processeur. mÃ©moire RAM 4 Go et 64 Go mÃ©moire...",8.000000
"'Sony Xperia X (F5122) â rosa â Dual Sim (Google Android 6.0.1, 5 Display, 2 x CORTEX A72 1.8 GHz + 4 x cortex-a53...",10.000000
"(CUBOT) GT88 5.5"" qHD 1.3GHz MTK6572 2-Core Android 4.2.2 3G Phone 8MP CAM 512MB RAM 4GB ROM",8.000000
"(DG300 Versione Aggiornata)5'' DOOGEE VOYAGER2 DG310 Dual Flashlights IPS Screen 3G Smartphone Android 4.4 MTK6582 1.3GHz Quad Core Telefono Cellulare Dual SIM 8G ROM OTG OTA GPS WIFI, BIANCO",7.529412
(Part 2) Lenovo VIBE X2,9.000000
...,...
ä¸­å¹é«éæ©ãDell Venue æºæ§äºé¨æ² å¯¦æ©åé®®,9.000000
å¹¾å¯äºçãéç¶²éå¡ç½è² iPhone 4ï¼,9.000000
æ©æç½æ C168i,10.000000
è·¨å¹´ååä¸èµ·ä¾ç Nokia C5-03 ç«éè©¦ç¨å ±å,9.000000


In [26]:
# calculating the number of ratings a product got

ratings_mean_count['rating_counts'] = pd.DataFrame(merged_data_v3.groupby('product')['score'].count())
ratings_mean_count

product,score,rating_counts
"'Smartphone Meizu Pro 5, 5,7 pouces avec Exynos 7420 Octa 8 Core Processeur. mÃ©moire RAM 4 Go et 64 Go mÃ©moire...",8.000000,1
"'Sony Xperia X (F5122) â rosa â Dual Sim (Google Android 6.0.1, 5 Display, 2 x CORTEX A72 1.8 GHz + 4 x cortex-a53...",10.000000,1
"(CUBOT) GT88 5.5"" qHD 1.3GHz MTK6572 2-Core Android 4.2.2 3G Phone 8MP CAM 512MB RAM 4GB ROM",8.000000,1
"(DG300 Versione Aggiornata)5'' DOOGEE VOYAGER2 DG310 Dual Flashlights IPS Screen 3G Smartphone Android 4.4 MTK6582 1.3GHz Quad Core Telefono Cellulare Dual SIM 8G ROM OTG OTA GPS WIFI, BIANCO",7.529412,34
(Part 2) Lenovo VIBE X2,9.000000,1
...,...,...
ä¸­å¹é«éæ©ãDell Venue æºæ§äºé¨æ² å¯¦æ©åé®®,9.000000,1
å¹¾å¯äºçãéç¶²éå¡ç½è² iPhone 4ï¼,9.000000,1
æ©æç½æ C168i,10.000000,1
è·¨å¹´ååä¸èµ·ä¾ç Nokia C5-03 ç«éè©¦ç¨å ±å,9.000000,1


In [27]:
# Recommending the 5 mobile phones based on highest mean score and highest number of ratings the product got.

ratings_mean_count.sort_values(by=['score','rating_counts'], ascending=[False,False]).head()

product,score,rating_counts
Samsung Galaxy Note5,10.0,152
Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 CÃ¢mera 10MP e Frontal 2MP MemÃ³ria Interna de 16GB GSM,10.0,138
Samsung Smartphone Dual Chip Samsung Galaxy SIII Duos Desbloqueado Claro Azul Android 4.1 3G/Wi-Fi C??mera 5MP,10.0,135
Samsung Smartphone Dual Chip Samsung Galaxy SIII Duos Desbloqueado Claro Azul Android 4.1 3G/Wi-Fi CÃ¢mera 5MP,10.0,133
Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi CÃ¢mera 5MP - Preto,10.0,127
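
Sorting on mean score first means any product with a perfect mean can reach the top, and only the tie-break on count separates a product with 152 ratings from one with a single rating. A common guard is to require a minimum number of ratings before ranking by mean. The sketch below shows that variant; `top_n_popular` and the `min_ratings` threshold are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def top_n_popular(df, n=5, min_ratings=50):
    """Rank products by mean score, considering only those with at least min_ratings ratings."""
    stats = df.groupby("product")["score"].agg(mean_score="mean", rating_counts="count")
    eligible = stats[stats["rating_counts"] >= min_ratings]
    return eligible.sort_values(["mean_score", "rating_counts"], ascending=[False, False]).head(n)

toy = pd.DataFrame({
    "product": ["A", "A", "A", "B", "C", "C"],
    "score":   [8, 10, 9, 10, 6, 8],
})
ranking = top_n_popular(toy, n=2, min_ratings=2)
```

In the toy data product B has a perfect score from a single rating and is excluded by the threshold.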


### Q4 - Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch.

In [28]:
# import relevant libraries
from sklearn import preprocessing
from collections import defaultdict
from surprise import SVD
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

# Suppressing Warnings
import warnings
warnings.filterwarnings('ignore')

In [29]:
# arranging columns in the order of user id,item id and rating

columns_titles = ['author','product','score']
merged_data_v4 = merged_data_v3.reindex(columns=columns_titles)

In [30]:
# Keep only 5000 data samples. Use random state=612

data_svd = merged_data_v4.sample(n=5000, random_state=612)

In [31]:
# load the sample into a Surprise dataset and split it into train and test sets
# (the KNNWithMeans models are fitted in the next two cells)

reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(data_svd,reader = reader)
trainset, testset = train_test_split(data, test_size=.15)

In [32]:
# item-based collaborative filtering

algo1 = KNNWithMeans(k=50, sim_options={'name': 'cosine', 'user_based': False})
algo1.fit(trainset)

# run the trained model against the testset

test_pred1 = algo1.test(testset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


In [33]:
# user-based collaborative filtering

algo2 = KNNWithMeans(k=50, sim_options={'name': 'cosine'})
algo2.fit(trainset)

# run the trained model against the testset

test_pred2 = algo2.test(testset)

Computing the cosine similarity matrix...
Done computing similarity matrix.
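
The cells above fit `KNNWithMeans` models; `SVD` is imported but not fitted. For the "from scratch" option in the question, the sketch below shows the basic idea of a low-rank approximation with NumPy on a tiny toy matrix. It is a plain truncated SVD of a mean-centred matrix, which is a simplification and not the SGD-based factorisation that Surprise's `SVD` class implements. `low_rank_predict` and the toy ratings are assumptions for illustration.

```python
import numpy as np

def low_rank_predict(ratings, k=2):
    """Fill missing ratings (NaN) using a rank-k truncated SVD of the mean-centred matrix."""
    ratings = np.asarray(ratings, dtype=float)
    global_mean = np.nanmean(ratings)
    # Centre on the global mean; missing entries become 0 after centring.
    centred = np.where(np.isnan(ratings), 0.0, ratings - global_mean)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return approx + global_mean

# Rows are users, columns are products, NaN marks "not rated" (toy data).
toy = np.array([
    [10.0, 8.0, np.nan],
    [9.0, np.nan, 2.0],
    [np.nan, 7.0, 3.0],
])
predicted = low_rank_predict(toy, k=2)
```

A dense matrix like this is only practical for small samples; the full user-product matrix in this dataset would be far too sparse and large for this approach.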


### Q5 - Evaluate the collaborative model. Print RMSE value.

In [34]:
# get RMSE for item-based model

print("Item-based Model : Test Set")
accuracy.rmse(test_pred1, verbose=True)

Item-based Model : Test Set
RMSE: 2.5822


2.582246948076654

In [35]:
# get RMSE for user-based model

print("User-based Model : Test Set")
accuracy.rmse(test_pred2, verbose=True)

User-based Model : Test Set
RMSE: 2.5383


2.538318576907625

### Q6 - Predict score (average rating) for test users.

In [36]:
# item-based

item_pred_df=pd.DataFrame(test_pred1, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',item_pred_df['est'].mean())
print('actual average rating by test users: ',item_pred_df['rui'].mean())
print('average prediction error for test users: ',(item_pred_df['rui']-item_pred_df['est']).abs().mean())

average prediction for test users:  8.042592356316991
actual average rating by test users:  8.090666666666667
average prediction error for test users:  1.9935751370820851


In [37]:
# user-based

user_pred_df=pd.DataFrame(test_pred2, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',user_pred_df['est'].mean())
print('actual average rating by test users: ',user_pred_df['rui'].mean())
print('average prediction error for test users: ',(user_pred_df['rui']-user_pred_df['est']).abs().mean())

average prediction for test users:  8.044622606572466
actual average rating by test users:  8.090666666666667
average prediction error for test users:  1.9782184754184433
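
`accuracy.rmse` in Q5 reports the root mean squared error, while the "average prediction error" printed above is the mean absolute error. The sketch below computes both from `(true rating, estimate)` pairs; the function names and the toy pairs are illustrative.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimate) pairs."""
    return math.sqrt(sum((r - e) ** 2 for r, e in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, estimate) pairs."""
    return sum(abs(r - e) for r, e in pairs) / len(pairs)

pairs = [(10, 8), (2, 8), (8, 8), (6, 8)]  # toy (true, estimate) values
```

RMSE is never smaller than MAE because squaring gives large errors more weight, which is consistent with the RMSE of about 2.5 and the mean absolute error of about 2.0 reported for the same predictions.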


### Q7 - Report your findings and inferences.

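- Sorting by mean score and then by rating count puts Samsung Galaxy Note5 first, with a mean score of 10.0 from 152 ratings. It is not the most-rated product overall; that is 'Lenovo Vibe K4 Note (White,16GB)' with 3835 ratings.
- The author with the most reviews is 'Amazon Customer' with 57643. This is most likely a default display name shared by many reviewers rather than a single user, as are 'Cliente Amazon', 'Client d'Amazon' and 'Amazon Kunde'.
- The item-based and user-based KNN models have similar test-set RMSE (2.58 and 2.54).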

### Q8 - Try and recommend top 5 products for test users.

In [38]:
def get_top_rec(predictions, n=5):
    # First map the predictions to each user.
    top_rec = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_rec[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_rec.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_rec[uid] = user_ratings[:n]

    return top_rec

In [39]:
# item-based

top_rec = get_top_rec(test_pred1,5)
print('Top 5 recommendations for all test users are: \n')
for key,value in top_rec.items(): print(key,'-> ',value,'\n')    

Top 5 recommendations for all test users are: 

Stefan R. ->  [('Huawei Ascend P6 Smartphone (11,9 cm (4,7 Zoll) Touchscreen, 8 Megapixel, 8GB Speicher, Android 4.2) weiÃ\x9f', 8.069176470588236)] 

densil ->  [('Cubot X17S Sim Free Smartphone 3 GB Ram Unlocked Mobile Phone Dual Sim 5 Inch Quad Core Android 5.1 LTE/FDD/4G Dual Camera 16 MP Pixels 16GB Rom Unlock Cell Phone (White)', 8.069176470588236)] 

J.Janssen ->  [('Samsung Galaxy S3 mini', 8.069176470588236)] 

Cliente Amazon ->  [('Samsung Galaxy J5 - Smartphone libre Android (pantalla 5", cÃ¡mara 13 Mp, 8 GB, Quad-Core 1.2 GHz, 1.5 GB RAM), blanco', 10), ('Samsung Galaxy A3 (2017) Smartphone (pantalla tÃ¡ctil de 4,7 pulgadas (12,04 cm), 16 GB de memoria, Android 6.0)', 8.069176470588236), ('Sony Sony, Xperia E5, Smartphone', 8.069176470588236), ('Oukitel 5.5 "OUKITEL U7 Pro HD Schermo IPS 3G Android 5.1 MT6580 Quad Core Smartphone Dual SIM Dual Standby 1G/8G Smart Gesture Wake Movimento HotKnot Telefono Mobile del Cellulare (Or

SPLITRECIFE2008 ->  [('Samsung B2100', 8.069176470588236)] 

Shah ->  [('LG GD570 dLite Lollipop Unlocked GSM QuadBand Cell Phone with 2 MP Camera, Bluetooth, and MP3 Player--No Warranty (Sky Blue)', 8.069176470588236)] 

Scooby ->  [('OnePlus 3T A3000 64GB Gunmetal 4G/LTE North American Version GSM Factory Unlocked US Warranty', 8.069176470588236)] 

Asmidada ->  [('Cubot P11 Android 5,1 3G Smartphone Handy Ohne Vertrag 5 Zoll IPS HD Quad Core Dual SIM 8MP Kamera Air Control 1GB 8GB - Gold', 8.069176470588236)] 

Peter T. ->  [('Huawei P9 Lite Dual-SIM Smartphone, 13,2 cm (5,2 Zoll) Display, LTE (4G), Android 6.0 (Marshmallow)', 8.069176470588236)] 

Arturo ->  [('Samsung S5830 Galaxy Ace - Unlocked Phone - Black', 8.069176470588236)] 

Anzaar ->  [('HTC Desire 526G Plus (Fervor Red, 16GB)', 8.069176470588236)] 

Hopalaibins ->  [('Microsoft Nokia Asha 308 Dual-SIM Smartphone (7,6 cm (3 Zoll) Touchscreen, 2 Megapixel Kamera) schwarz', 8.069176470588236)] 

Levindo ->  [('Samsung Galax

PICHU55 ->  [('LG KP110', 8.069176470588236)] 

Anupam Choubey ->  [('Asus Zenfone Max ZC550KL-6A068IN (Black, 2GB, 16GB)', 8.069176470588236)] 

BakBak ->  [('Apple iPhone 4 16Go Noir', 8.069176470588236)] 

H.Mohamed Salahudeen ->  [('Mi Xiaomi Mi Max Prime (Gold, 128GB)', 8.069176470588236)] 

leiver ->  [('Samsung Galaxy S4 mini GT-i9190 - 8GB - Blue Artic (Unlocked) Intenational Model', 8.069176470588236)] 

Valeria Notari ->  [('Huawei Ascend Y300 Smartphone, 4 GB, Nero', 8.069176470588236)] 

supero23 ->  [('Sony Ericsson Xperia Arc', 8.069176470588236)] 

terry121 ->  [('Sony Ericsson Z310i', 8.069176470588236)] 

iolo003 ->  [('Nokia N82', 8.069176470588236)] 

Jennifer k popp ->  [('Samsung Galaxy S6 Edge G925a 64GB Unlocked GSM 4G LTE Octa-Core Smartphone w/ 16MP Camera - Gold Platinum', 8.069176470588236)] 

bgpl ->  [('Asus New Asus Zenfone 6 16GB Dual SIM (Unlocked) A601CG 3G 6" Intel Z2580 2G RAM (White) - International Version No Warranty', 8.069176470588236)] 

Vivek D

TB RC ->  [('Sony Xperia U Smartphone (8,9 cm (3,5 Zoll) Touchscreen, 5 Megapixel Kamera, Android 2.3, UMTS) schwarz/pink/gelb', 8.069176470588236)] 

Jack8716 ->  [('Siemens SX1', 8.069176470588236)] 

Vo Blinn ->  [('Samsung Galaxy Note 5 N920C 32GB Factory Unlocked GSM - International Version (Silver)', 8.069176470588236)] 

investigatore ->  [('LG Optimus One P500', 8.069176470588236)] 

Samuele sapio ->  [('Meizu M681H/16GB/SW M3 Note Smartphone da Memoria 16GB, Bianco/Argento [Italia]', 8.069176470588236)] 

valencia.arte ->  [('Apple iPhone 4S 32GB', 8.069176470588236), ('Sony Mobile Xperia Ion 16GB', 8.069176470588236)] 

V ->  [('Apple iPhone 5s (Silver, 16GB)', 8.069176470588236)] 

klausboysen ->  [('Microsoft Nokia 5230 Navi Smartphone (8,1 cm (3,2 Zoll) Display, Touchscreen, 2 Megapixel Kamera) black-chrome', 8.069176470588236)] 

J. Venzke ->  [('Blackberry Bold 9700 Smartphone (QWERTZ-Tastatur, 3 Megapixel-Digitalkamera, GPS-EmpfÃ¤nger, UMTS, WLAN, HSDPA) schwarz', 8.069

PatrusPetronius ->  [('Palm Treo Pro', 8.069176470588236)] 

Tristan ->  [('HTC UNLOCKED BLUE HTC 8X 16GB, Windows Phone 8, C625a, Front and Rear Camera, 8MP, 1080p Video, 4.3" LCD, Beats Audio...', 8.069176470588236)] 

Matteo ->  [('Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 8.069176470588236)] 

Tekilos ->  [('Samsung E1080', 8.069176470588236)] 

coolrasan ->  [('Nokia 5190', 8.069176470588236)] 

k.b. ->  [('LG Electronics G2 mini Smartphone (4,7 Zoll (11,9 cm) Touch-Display, 8 GB Speicher, Android 4.4) schwarz', 8.069176470588236)] 

Natayla ->  [("Casio G'zOne Ravine 2 C781 Verizon Black", 8.069176470588236)] 

Cristina Dominguez ->  [('Apple iPhone 5S 16GB', 8.069176470588236)] 

Jared ->  [('Apple iPhone 6 128GB (AT&T) - Grey', 8.069176470588236)] 

ÐÑÑÐµÐ¼ ÐÐ°Ð»ÐµÑÑÐµÐ²Ð¸Ñ ->  [('Nokia 6600 fold', 8.069176470588236)] 

Sanjay Kumar Singh ->  [('Lenovo Vibe K5 (Gold, VoLTE update)',

enurwo ->  [('Motorola Moto E (2nd Gen.) - Smartphone libre de 4.5" (Quad Core 1.2 GHz, 1 GB de RAM, 8 GB, cÃ¡mara 5 MP, Android) color negro', 8.069176470588236)] 

Ð»ÑÐ½Ð½ÑÐ¹ ÑÐ²ÐµÑ ->  [('Siemens C60', 8.069176470588236)] 

rilu96 ->  [('Samsung Galaxy S II GT-I9100', 8.069176470588236)] 
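
Almost every estimate printed above is 8.069176470588236. In Surprise, when an algorithm cannot compute an estimate (for example because the user or the item does not appear in the training set), `predict` falls back to a default value, the training set's global mean, and sets `details['was_impossible']` to `True`. The sketch below measures how often that happens. The tuples are made-up stand-ins with the same `(uid, iid, r_ui, est, details)` layout as Surprise predictions, and `fallback_share` is a hypothetical helper.

```python
def fallback_share(predictions):
    """Fraction of predictions flagged as impossible (filled with the fallback estimate)."""
    flagged = sum(1 for *_, details in predictions if details.get("was_impossible", False))
    return flagged / len(predictions) if predictions else 0.0

toy_predictions = [
    ("u1", "phone A", 10.0, 8.07, {"was_impossible": True}),
    ("u2", "phone B", 6.0, 8.07, {"was_impossible": True}),
    ("u3", "phone C", 8.0, 10.0, {"was_impossible": False}),
    ("u4", "phone D", 4.0, 8.07, {"was_impossible": True}),
]
share = fallback_share(toy_predictions)  # 0.75 for this toy list
```

Calling `fallback_share(test_pred1)` on the real predictions would show how much of the reported RMSE comes from the fallback rather than from neighbour-based estimates.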



In [40]:
# user-based

top_rec = get_top_rec(test_pred2,5)
print('Top 5 recommendations for all test users are: \n')
for key,value in top_rec.items(): print(key,'-> ',value,'\n')

Top 5 recommendations for all test users are: 

Stefan R. ->  [('Huawei Ascend P6 Smartphone (11,9 cm (4,7 Zoll) Touchscreen, 8 Megapixel, 8GB Speicher, Android 4.2) weiÃ\x9f', 8.069176470588236)] 

densil ->  [('Cubot X17S Sim Free Smartphone 3 GB Ram Unlocked Mobile Phone Dual Sim 5 Inch Quad Core Android 5.1 LTE/FDD/4G Dual Camera 16 MP Pixels 16GB Rom Unlock Cell Phone (White)', 8.069176470588236)] 

J.Janssen ->  [('Samsung Galaxy S3 mini', 8.069176470588236)] 

Cliente Amazon ->  [('Samsung Galaxy A3 (2017) Smartphone (pantalla tÃ¡ctil de 4,7 pulgadas (12,04 cm), 16 GB de memoria, Android 6.0)', 8.069176470588236), ('Sony Sony, Xperia E5, Smartphone', 8.069176470588236), ('Oukitel 5.5 "OUKITEL U7 Pro HD Schermo IPS 3G Android 5.1 MT6580 Quad Core Smartphone Dual SIM Dual Standby 1G/8G Smart Gesture Wake Movimento HotKnot Telefono Mobile del Cellulare (Oro)', 8.069176470588236), ('Lenovo K6 Smartphone Dual SIM, Display 5 Pollici, LTE, Fotocamera 13 MP, Memoria 16 GB, 2 GB RAM, Gri

NATHAN1914 ->  [('Nokia X2', 8.069176470588236)] 

Robby ->  [('LG 530G Prepaid Phone With Triple Minutes (Tracfone)', 8.069176470588236)] 

francesco ->  [('Samsung Galaxy Mini 2 Smartphone, Display 3.27 Pollici, Bianco [Italia]', 10)] 

huliye ->  [('Sony Ericsson J230Ä° Cep Telefonu', 8.069176470588236)] 

bob ->  [('Huawei Ascend P8 Lite Smartphone, 16 GB, Marchio TIM, Nero', 8.069176470588236)] 

IanMoone ->  [('Samsung Galaxy S6 edge zwart / 32 GB', 8.069176470588236)] 

R rana ->  [('OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 8.069176470588236)] 

Emerson Reed ->  [('Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 CÃ¢mera 10MP e Frontal 2MP MemÃ³ria Interna de 16GB GSM', 8.069176470588236)] 

Martin ->  [('LG Electronics L7 II Sim Free Smartphone', 6.0)] 

ÐÐ»Ð¸Ð½ÐºÐ ->  [('Samsung GT-S5230 Star', 8.069176470588236)] 

Anny Ruch ->  [('Sony Ericsson Xperia ray Smartphone ??cran tactile 8,4 cm (3,3") Appareil photo 8 Mpx Android 2.3 Noir (Import...', 8.


saints fan ->  [('Samsung Eternity A867 Unlocked Phone with Touchscreen, 3MP Camera, GPS and Music Player', 8.069176470588236)] 

MAGOOGF ->  [('Motorola A853', 8.069176470588236)] 

CRATER_300 ->  [('Nokia E50', 8.069176470588236)] 

A R Hartley ->  [('HTC One S Z560E Sim Free Smartphone - Black', 8.069176470588236)] 

NiMi ->  [('Samsung Galaxy J3 (2016) DUOS Smartphone (5,0 Zoll (12,63 cm Touch-Display, 8 GB Speicher, Android 5.1) gold', 8.069176470588236)] 

kismas ->  [('Ð\x9cÐ¾Ð±Ð¸Ð»Ñ\x8cÐ½Ñ\x8bÐ¹ Ñ\x82ÐµÐ»ÐµÑ\x84Ð¾Ð½ Apple iPhone 6s 16Gb A1688 Silver', 8.069176470588236)] 

niall garrity ->  [('Unlocked Cubot BOBBY 5.0 Inch 3G SmartPhone Android 4.2.2 QHD Screen MTK6572W 1.3GHz Dual Core Dual SIM Cell Phone...', 8.069176470588236)] 

birba ->  [('Thuraya SG 2520', 8.069176470588236)] 

thedro1231 ->  [('Dare VX-9700', 8.069176470588236)] 

C. J. Tindall ->  [('Nokia 105 SIM-Free Mobile Phone, Black', 8.069176470588236)] 

shel ->  [('Verizon LG VX7100 Glance Thin Bluetooth Came

Scottyb123  ->  [('Samsung Galaxy S III 16 or 32GB (T-Mobile 4G LTE)', 8.069176470588236)] 

SamsungEdge ->  [('Samsung Galaxy S6 edge zwart / 32 GB', 8.069176470588236)] 

menkmax ->  [('Samsung GT I8000 Omnia II', 8.069176470588236)] 

LolaVologda ->  [('Acer neoTouch', 8.069176470588236)] 

elizabeth hughes ->  [('Microsoft Lumia 650 UK SIM-Free Smartphone - White', 8.069176470588236)] 

Omid ->  [('Samsung A177 Unlocked QuadBand Phone with QWERTY Keyboard and Camera - US Warranty - Black', 8.069176470588236)] 

LawnmowerMan ->  [('Philips Savvy Dual Band', 8.069176470588236)] 

Peter Limbach ->  [('Microsoft Nokia A00021557 130 DS RM-1035 NV DE Handy (4,5 cm (1,8 Zoll) Display, Dual-Sim, portabler Video-/Musikplayer, Radio, Taschenlampe) weiÃ\x9f', 8.069176470588236)] 

Surgeon ->  [('Huawei Ascend D1', 8.069176470588236)] 

deepak soni ->  [('Microsoft Lumia 640 XL (Dual SIM, White)', 8.069176470588236)] 

Vinod S. ->  [('VIVO V3 Max (Gold)', 8.069176470588236)] 

Ð Ð·Ð°ÐµÐ² ÐÐ»Ð

### Q9 - Try other techniques (Example: cross validation) to get better results.

In [41]:
# item based

knn_i_cv = cross_validate(algo1,data, measures=['RMSE'], cv=5, verbose=False)
print('\n Mean knn_i_cv score:', round(knn_i_cv['test_rmse'].mean(),2),'\n')
knn_i_cv

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.

 Mean knn_i_cv score: 2.56 



{'test_rmse': array([2.54115308, 2.62397289, 2.57894421, 2.54775612, 2.51042126]),
 'fit_time': (1.3201382160186768,
  1.514343500137329,
  1.4390156269073486,
  1.0184502601623535,
  1.5131676197052002),
 'test_time': (0.09112119674682617,
  0.07814598083496094,
  0.07189154624938965,
  0.05875563621520996,
  0.05971384048461914)}

In [42]:
# user based

knn_u_cv = cross_validate(algo2,data, measures=['RMSE'], cv=5, verbose=False)
print('\n Mean knn_u_cv score:', round(knn_u_cv['test_rmse'].mean(),2),'\n')
knn_u_cv

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.

 Mean knn_u_cv score: 2.54 



{'test_rmse': array([2.56818934, 2.48105576, 2.50251024, 2.55400235, 2.58655857]),
 'fit_time': (0.9049086570739746,
  1.404658317565918,
  1.22743558883667,
  2.447636365890503,
  1.0814216136932373),
 'test_time': (0.06663966178894043,
  0.042606353759765625,
  0.03126263618469238,
  0.06251835823059082,
  0.046888113021850586)}

- The user-based model has the slightly lower (better) mean cross-validated RMSE: 2.54, versus 2.56 for the item-based model.
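
With `cv=5`, `cross_validate` splits the ratings into five folds and evaluates on each fold once while training on the other four. The sketch below shows that splitting on row indices. It illustrates the idea and is not Surprise's implementation; `k_fold_indices` and the seed are assumptions.

```python
import random

def k_fold_indices(n_rows, k=5, seed=612):
    """Yield (train_idx, test_idx) lists; each row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, k=5))
```

Averaging the per-fold RMSE gives a steadier estimate than a single train/test split, though it does not by itself improve the model.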

### Q10 - In what business scenario you should use popularity based Recommendation Systems ?

#### Popularity based recommendation systems are used when the scenario is to:
- Recommend the items viewed or purchased by the most people.
- Recommend from a list of items ranked by purchase count or view count, for example a "Popular news" section.

#### It uses:
- Text only
- Purchase history
- User and item features
- Scale 

#### The problem with a popularity based recommendation system is that it offers no personalization. For example, a popularity based song recommender is not tailored to any user and returns the same list of songs to everyone.

### Q11 - In what business scenario you should use CF based Recommendation Systems ?

- Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people. 

- For each user, recommender systems recommend items based on how similar users liked the item.


- For example, a person who wants to see a movie might ask friends for recommendations. Recommendations from friends with similar interests are trusted more than recommendations from others, and this information is used to decide which movie to see.
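
The notion of "similar users" can be made concrete with cosine similarity between rating vectors, the same similarity name passed to `KNNWithMeans` earlier. The sketch below is a simplification: it treats 0 as "not rated" and compares full vectors, whereas a library implementation would typically restrict the comparison to co-rated items. The users and ratings are made up.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy ratings on three phones; 0 means "not rated".
alice = [10, 8, 0]
bob = [9, 7, 0]
carol = [0, 2, 10]
sim_ab = cosine_similarity(alice, bob)
sim_ac = cosine_similarity(alice, carol)
```

Here Bob's tastes are much closer to Alice's than Carol's are, so Bob's ratings would carry more weight when recommending to Alice.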


### Q12 - What other possible methods can you think of which can further improve the recommendation for different users ?

A hybrid filtering approach can be used to further improve the recommendations.

Methods of Hybridization

- Weighted - Recommendations from each system are weighted to calculate the final recommendation.

- Switching - The system switches between different recommendation models.

- Mixed - Recommendations from different recommenders are presented together.

Multiple recommender systems are combined to improve recommendations.


- Although any types of recommender systems can be combined, a common approach in industry is to combine content based and collaborative filtering approaches.

- Content based models can be used to solve the Cold start and Grey sheep problems in Collaborative filtering.
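
The weighted method can be sketched in a few lines. The example assumes each recommender returns a dict of item scores on the same scale; `weighted_hybrid`, the weight of 0.5 and the scores are all hypothetical.

```python
def weighted_hybrid(cf_scores, content_scores, cf_weight=0.7):
    """Blend two score dicts; an item missing from one recommender is scored by the other alone."""
    blended = {}
    for item in set(cf_scores) | set(content_scores):
        if item in cf_scores and item in content_scores:
            blended[item] = cf_weight * cf_scores[item] + (1 - cf_weight) * content_scores[item]
        else:
            blended[item] = cf_scores.get(item, content_scores.get(item))
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

cf = {"phone A": 9.0, "phone B": 6.0}
content = {"phone A": 8.0, "phone C": 8.0}
ranked = weighted_hybrid(cf, content, cf_weight=0.5)
```

"phone C" has no collaborative score, as would be the case for a newly listed product, yet it still receives a ranking from the content-based side. That is the cold start benefit mentioned above.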