• DOMAIN: Smartphone, Electronics

• CONTEXT: India is the second-largest smartphone market in the world after China. About 134 million smartphones were sold in India in 2017, and the figure is estimated to increase to about 442 million in 2022. India ranked second in average time spent on the mobile web by smartphone users across Asia Pacific. The combination of very high sales volumes and the behaviour of the average smartphone consumer has made India a very attractive market for foreign vendors. According to consumer-behaviour data, 97% of consumers turn to a search engine when they are buying a product, versus 15% who turn to social media. If a seller succeeds in showing smartphones that match a user's behaviour or choices in the right place, there is a 90% chance that the user will enquire about them. This case study aims to build a recommendation system based on individual consumers' behaviour or choices.

• DATA DESCRIPTION:

• author : name of the person who gave the rating

• country : country the person who gave the rating belongs to

• date : date of the rating

• domain: website from which the rating was taken

• extract: rating content

• language: language in which the rating was given

• product: name of the product/mobile phone for which the rating was given

• score: rating given in the review

• score_max: maximum possible score on the rating scale

• source: source from where the rating was taken

• PROJECT OBJECTIVE: We will build a recommendation system that uses a popularity-based method to recommend the most popular mobile phones and collaborative filtering methods to recommend personalised ones.

## Steps and tasks:
### **1. Import the necessary libraries and read the provided CSVs as a data frame and perform the below steps.**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
import os
import chardet

file = "/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_1.csv"

with open(file, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

In [None]:
os.chdir("/content/drive/MyDrive/Data Set")

In [None]:
df1 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_1.csv", encoding= 'ISO-8859-1')
df2 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_2.csv", encoding= 'ISO-8859-1')
df3 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_3.csv", encoding= 'ISO-8859-1')
df4 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_4.csv", encoding= 'ISO-8859-1')
df5 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_5.csv", encoding= 'ISO-8859-1')
df6 = pd.read_csv("/content/drive/MyDrive/Data Set/Data Set/phone_user_review_file_6.csv", encoding= 'ISO-8859-1')
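The `chardet` check above reports UTF-8 for the first 100 KB, yet the files are read as ISO-8859-1. ISO-8859-1 maps every byte to some character, so decoding never fails. The cost is that each multi-byte UTF-8 character turns into two or more Latin-1 characters, which is why later outputs show strings such as `QualitÃ prezzo`.

Below is a minimal sketch of the effect and of how to reverse it. It assumes the text really was UTF-8 before the wrong decode. `repair_mojibake` is a hypothetical helper, not part of pandas.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8 -> ISO-8859-1 mis-decode; return the text unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 or its bytes are not valid UTF-8
        return text

original = "Qualità prezzo"
garbled = original.encode("utf-8").decode("latin-1")  # what reading UTF-8 bytes as ISO-8859-1 does
print(repr(garbled))             # 'QualitÃ\xa0 prezzo'
print(repair_mojibake(garbled))  # Qualità prezzo
print(repair_mojibake(original)) # already-correct text is returned unchanged
```

If only a few bytes in the files are invalid UTF-8, another option is `encoding='utf-8', encoding_errors='replace'`, available in `pd.read_csv` since pandas 1.3. It keeps non-Latin text intact at the cost of replacing the bad bytes.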

In [None]:
df2.head(5)

| | phone_url | date | lang | country | source | domain | score | score_max | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /cellphones/leagoo-lead-7/ | 4/15/2015 | en | us | Amazon | amazon.com | 2.0 | 10.0 | The telephone headset is of poor quality , not... | luis | Leagoo Lead7 5.0 Inch HD JDI LTPS Screen 3G Sm... |
| 1 | /cellphones/leagoo-lead-7/ | 5/23/2015 | en | gb | Amazon | amazon.co.uk | 10.0 | 10.0 | This is my first smartphone so I have nothing ... | Mark Lavin | Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ... |
| 2 | /cellphones/leagoo-lead-7/ | 4/27/2015 | en | gb | Amazon | amazon.co.uk | 8.0 | 10.0 | Great phone. Battery life not great but seems ... | tracey | Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ... |
| 3 | /cellphones/leagoo-lead-7/ | 4/22/2015 | en | gb | Amazon | amazon.co.uk | 10.0 | 10.0 | Best 90 quid I've ever spent on a smart phone | Reuben Ingram | Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ... |
| 4 | /cellphones/leagoo-lead-7/ | 4/18/2015 | en | gb | Amazon | amazon.co.uk | 10.0 | 10.0 | I m happy with this phone.it s very good.thx team | viorel | Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ... |


### A. Merge all the provided CSVs into one DataFrame

In [None]:
phone_data = pd.concat([df1,df2,df3,df4,df5,df6], ignore_index=True)

In [None]:
phone_data.shape

(1415133, 11)

In [None]:
phone_data.head()

| | phone_url | date | lang | country | source | domain | score | score_max | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Verizon Wireless | verizonwireless.com | 10.0 | 10.0 | As a diehard Samsung fan who has had every Sam... | CarolAnn35 | Samsung Galaxy S8 |
| 1 | /cellphones/samsung-galaxy-s8/ | 4/28/2017 | en | us | Phone Arena | phonearena.com | 10.0 | 10.0 | Love the phone. the phone is sleek and smooth ... | james0923 | Samsung Galaxy S8 |
| 2 | /cellphones/samsung-galaxy-s8/ | 5/4/2017 | en | us | Amazon | amazon.com | 6.0 | 10.0 | Adequate feel. Nice heft. Processor's still sl... | R. Craig | Samsung Galaxy S8 (64GB) G950U 5.8" 4G LTE Unl... |
| 3 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Samsung | samsung.com | 9.2 | 10.0 | Never disappointed. One of the reasons I've be... | Buster2020 | Samsung Galaxy S8 64GB (AT&T) |
| 4 | /cellphones/samsung-galaxy-s8/ | 5/11/2017 | en | us | Verizon Wireless | verizonwireless.com | 4.0 | 10.0 | I've now found that i'm in a group of people t... | S Ate Mine | Samsung Galaxy S8 |


### B. Explore, understand the Data and share at least 2 observations.

In [None]:
phone_data.dtypes

phone_url     object
date          object
lang          object
country       object
source        object
domain        object
score        float64
score_max    float64
extract       object
author        object
product       object
dtype: object

In [None]:
phone_data.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| score | 1351644.0 | 8.00706 | 2.616121 | 0.2 | 7.2 | 9.2 | 10.0 | 10.0 |
| score_max | 1351644.0 | 10.0 | 0.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |


In [None]:
print(phone_data.isnull().sum())

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
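dtype: int64

Observations:
- `score` and `score_max` are missing in 63,489 of 1,415,133 rows (about 4.5%). `author` is missing in 63,202 rows, `extract` in 19,361 and `product` in 1.
- `score_max` is 10.0 in every non-null row (std 0.0). All scores are therefore out of 10, and the column carries no information.
- Ratings are skewed towards the top of the scale: the median score is 9.2 and the mean is 8.0, with a quarter of scores at 7.2 or below.
- `date` is stored as a string (`object`), not as a datetime.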


### C. Round off scores to the nearest integers.

In [None]:
phone_data['score'] = phone_data['score'].round(0).astype('Int64')

In [None]:
phone_data.head()

| | phone_url | date | lang | country | source | domain | score | score_max | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Verizon Wireless | verizonwireless.com | 10 | 10.0 | As a diehard Samsung fan who has had every Sam... | CarolAnn35 | Samsung Galaxy S8 |
| 1 | /cellphones/samsung-galaxy-s8/ | 4/28/2017 | en | us | Phone Arena | phonearena.com | 10 | 10.0 | Love the phone. the phone is sleek and smooth ... | james0923 | Samsung Galaxy S8 |
| 2 | /cellphones/samsung-galaxy-s8/ | 5/4/2017 | en | us | Amazon | amazon.com | 6 | 10.0 | Adequate feel. Nice heft. Processor's still sl... | R. Craig | Samsung Galaxy S8 (64GB) G950U 5.8" 4G LTE Unl... |
| 3 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Samsung | samsung.com | 9 | 10.0 | Never disappointed. One of the reasons I've be... | Buster2020 | Samsung Galaxy S8 64GB (AT&T) |
| 4 | /cellphones/samsung-galaxy-s8/ | 5/11/2017 | en | us | Verizon Wireless | verizonwireless.com | 4 | 10.0 | I've now found that i'm in a group of people t... | S Ate Mine | Samsung Galaxy S8 |
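`Series.round` follows NumPy and sends exact halves to the nearest even integer (round-half-to-even). So 8.5 becomes 8 while 9.5 becomes 10. If any raw scores fall exactly on .5 and conventional half-up rounding is intended, adding 0.5 and flooring is one option for non-negative values. The sketch below compares both methods on made-up values, not on the review data:

```python
import numpy as np
import pandas as pd

scores = pd.Series([0.5, 1.5, 2.5, 8.5, 9.5, 7.2])  # made-up example values

half_even = scores.round(0)       # NumPy-style: exact halves go to the even integer
half_up = np.floor(scores + 0.5)  # exact halves always go up (valid for non-negative scores)

print(half_even.tolist())  # [0.0, 2.0, 2.0, 8.0, 10.0, 7.0]
print(half_up.tolist())    # [1.0, 2.0, 3.0, 9.0, 10.0, 7.0]
```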


### D. Check for missing values. Impute the missing values, if any

In [None]:
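# fill missing 'score' and 'score_max' with each column's median
# (numeric_only=True: recent pandas raises a TypeError when taking the median of text columns)
phone_data = phone_data.fillna(phone_data.median(numeric_only=True))

# drop the remaining rows with nulls (in 'extract', 'author' and 'product')
phone_data = phone_data.dropna()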



In [None]:
phone_data.isnull().sum()

phone_url    0
date         0
lang         0
country      0
source       0
domain       0
score        0
score_max    0
extract      0
author       0
product      0
dtype: int64

### E. Check for duplicate values and remove them, if any

In [None]:
phone_data.duplicated().sum()

4823

In [None]:
phone_data = phone_data.drop_duplicates()

In [None]:
print(phone_data.duplicated().sum())
print(phone_data.shape)

0
(1331593, 11)


### F. Keep only 1 million data samples. Use random_state=612

In [None]:
df_1M = phone_data.sample(n = 1000000, random_state= 612)

In [None]:
df_1M.head(5)

| | phone_url | date | lang | country | source | domain | score | score_max | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8765 | /cellphones/samsung-galaxy-s7-edge/ | 5/23/2016 | en | us | Samsung | samsung.com | 10 | 10.0 | I love this phone. Very fast no problems since... | Kdotj15 | Samsung Galaxy S7 edge 32GB (Sprint) |
| 233365 | /cellphones/asus-zenfone-2-ze551ml/ | 2/20/2017 | it | it | Amazon | amazon.it | 10 | 10.0 | QualitÃ prezzo davvero ottimo, rispetto ai pi... | Cliente Amazon | Asus ZE551ML-2A760WW Smartphone ZenFone 2 Delu... |
| 145859 | /cellphones/huawei-mate-s/ | 1/14/2017 | he | il | Zap.il | zap.co.il | 10 | 10.0 | ×§× ××ª× ××ª ××××©××¨ ×1500 ×©:× ××... | ron | ×××¤×× ×¡××××¨× Huawei Mate S 32GB |
| 1203260 | /cellphones/sony-ericsson-w395/ | 5/28/2009 | de | de | Amazon | amazon.de | 8 | 10.0 | Ich habe dieses Handy am 30.3. bei amazon erwo... | katha_maria93 | Sony Ericsson W395 blush titanium Handy |
| 1205666 | /cellphones/apple-iphone-3g/ | 2/6/2009 | en | gb | Amazon | amazon.co.uk | 2 | 10.0 | There not unock t to any network and not engli... | paul george | Apple iPhone 3G 8GB SIM-Free - Black |


In [None]:
df_1M.shape

(1000000, 11)

### 2. Answer the following questions.
### **A. Identify the most rated products**

In [None]:
df_1M['product'].value_counts().head()

Lenovo Vibe K4 Note (White,16GB)     3913
Lenovo Vibe K4 Note (Black, 16GB)    3228
OnePlus 3 (Graphite, 64 GB)          3127
OnePlus 3 (Soft Gold, 64 GB)         2643
Huawei P8lite zwart / 16 GB          1994
Name: product, dtype: int64

### B. Identify the users with the most reviews.

In [None]:
print('Users with most number of reviews: \n\n',df_1M['author'].value_counts().head(10))


Users with most number of reviews: 

 Amazon Customer    57801
Cliente Amazon     14656
e-bit               6260
Client d'Amazon     5715
Amazon Kunde        3563
Anonymous           1968
einer Kundin        1953
einem Kunden        1432
unknown             1283
Anonymous           1096
Name: author, dtype: int64
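Many of the most frequent `author` values are generic labels that retail sites show for anonymous reviewers, not individual users. Treating them as single users would merge thousands of unrelated people's ratings into one profile.

`Anonymous` appears twice in the output above. Since `value_counts()` keys are unique, the two strings must differ in some invisible way, for example whitespace. The sketch below therefore strips whitespace before comparing. It drops these authors before user-level modelling and assumes a hand-picked placeholder list taken from the output above; the list is illustrative, not exhaustive.

```python
import pandas as pd

# Hypothetical, hand-picked list based on the value_counts() output above; extend as needed
PLACEHOLDER_AUTHORS = {
    "Amazon Customer", "Cliente Amazon", "e-bit", "Client d'Amazon",
    "Amazon Kunde", "Anonymous", "einer Kundin", "einem Kunden", "unknown",
}

def drop_placeholder_authors(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows whose author is a generic placeholder (compared after stripping whitespace)."""
    names = df["author"].astype(str).str.strip()
    return df[~names.isin(PLACEHOLDER_AUTHORS)]

# Tiny made-up example
reviews = pd.DataFrame({
    "author": ["Amazon Customer", "krisheed", "Anonymous ", "Margarita"],
    "product": ["A", "B", "C", "D"],
    "score": [2, 10, 8, 6],
})
cleaned = drop_placeholder_authors(reviews)
print(cleaned["author"].tolist())  # ['krisheed', 'Margarita']
```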


### C. Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset

In [None]:
#authors/users who gave more than 50 ratings
user_50 = pd.DataFrame(columns=['author', 'count'])
user_50['author']=df_1M['author'].value_counts().index.tolist() 
user_50['count'] = list(df_1M['author'].value_counts() > 50)

In [None]:
#removing False values from count column

false_index = user_50[user_50['count'] == False ].index 
user_50.drop(false_index, inplace = True) 
user_50

| | author | count |
|---|---|---|
| 0 | Amazon Customer | True |
| 1 | Cliente Amazon | True |
| 2 | e-bit | True |
| 3 | Client d'Amazon | True |
| 4 | Amazon Kunde | True |
| ... | ... | ... |
| 685 | Federica | True |
| 686 | vesponethebest | True |
| 687 | vincent | True |
| 688 | Cindy | True |


In [None]:
# products with more than 50 ratings
prod_50 = pd.DataFrame(columns=['product', 'p_count'])
prod_50['product']=df_1M['product'].value_counts().index.tolist() 
prod_50['p_count'] = list(df_1M['product'].value_counts() > 50)

In [None]:
false_index = prod_50[prod_50['p_count'] == False ].index 
prod_50.drop(false_index, inplace = True) 
prod_50

| | product | p_count |
|---|---|---|
| 0 | Lenovo Vibe K4 Note (White,16GB) | True |
| 1 | Lenovo Vibe K4 Note (Black, 16GB) | True |
| 2 | OnePlus 3 (Graphite, 64 GB) | True |
| 3 | OnePlus 3 (Soft Gold, 64 GB) | True |
| 4 | Huawei P8lite zwart / 16 GB | True |
| ... | ... | ... |
| 4371 | Smartphone Samsung Galaxy Core Plus Preto com ... | True |
| 4372 | Samsung Verizon Samsung Brightside SCH-U380 - ... | True |
| 4373 | BLU Dash Music II Android 4.4 KK, 3.2MP/VGA - ... | True |
| 4374 | Samsung Galaxy S6 SM-G920F 32GB | True |


In [None]:
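# keep rows whose product AND author each have more than 50 ratings in df_1M
# (a new name avoids overwriting df3, which holds the third CSV file)
df_prod50 = df_1M[df_1M['product'].isin(prod_50['product'])]
df_50 = df_prod50[df_prod50['author'].isin(user_50['author'])]
print(df_50.shape)  # shape of the final dataset
df_50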

                                   phone_url        date lang country  \
233365   /cellphones/asus-zenfone-2-ze551ml/   2/20/2017   it      it   
537487          /cellphones/apple-iphone-5s/  12/18/2013   en      in   
518771           /cellphones/zte-blade-a452/   4/12/2017   de      de   
353663        /cellphones/samsung-galaxy-s5/    7/5/2014   fr      fr   
224123         /cellphones/motorola-moto-g3/   9/20/2015   en      in   
...                                      ...         ...  ...     ...   
177801           /cellphones/huawei-p8-lite/   2/18/2016   it      it   
505475          /cellphones/nokia-lumia-635/   9/10/2014   en      us   
1170634        /cellphones/samsung-sgh-m150/   3/10/2010   tr      tr   
577011       /cellphones/huawei-ascend-y330/   9/21/2014   es      es   
287871    /cellphones/samsung-galaxy-note-4/  12/21/2014   en      us   

           source        domain  score  score_max  \
233365     Amazon     amazon.it     10  
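The two helper frames above store booleans and then drop the `False` rows. The same selection can be written directly with `value_counts()` and `isin()`.

`filter_min_ratings` below is a hypothetical helper. Like the notebook, it counts ratings on the full frame it is given and keeps rows where both the product and the author have more than `min_count` ratings. The returned frame's `.shape` answers the "report the shape" part of the task. The example uses made-up data and a threshold of 1 so the result is easy to check by hand.

```python
import pandas as pd

def filter_min_ratings(df: pd.DataFrame, min_count: int = 50) -> pd.DataFrame:
    """Keep rows whose product and author each have MORE than `min_count` ratings in df."""
    prod_counts = df["product"].value_counts()
    user_counts = df["author"].value_counts()
    keep_products = prod_counts[prod_counts > min_count].index
    keep_users = user_counts[user_counts > min_count].index
    return df[df["product"].isin(keep_products) & df["author"].isin(keep_users)]

# Made-up example with a threshold of 1 instead of 50
toy = pd.DataFrame({
    "author": ["u1", "u1", "u2", "u3", "u3"],
    "product": ["p1", "p2", "p1", "p1", "p3"],
    "score": [8, 6, 10, 4, 9],
})
result = filter_min_ratings(toy, min_count=1)
print(result.shape)  # (2, 3): only p1 has >1 rating, and only u1 and u3 have >1 rating
```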

### 3. Build a popularity based model and recommend top 5 mobile phones

In [None]:
df_1M.groupby('product')['score'].mean().sort_values(ascending = False).head()

product
Motorola U9 PEBL Pink Mobile Phone Unlocked Sim Free                                                                                                                           10.0
Motorola Moto X - Smartphone libre Android (pantalla 5.2", cÃ¡mara 13 Mp, 32 GB, 2 GB RAM), marrÃ³n (importado)                                                                10.0
Sony Xperia T2 Ultra Dual D5322 (White)                                                                                                                                        10.0
Motorola Moto X 2. Generation Smartphone (13,2 cm (5,2 Zoll) Full HD-Display, 13 Megapixel Kamera, Quad-Core Prozessor, 32GB interner Speicher, Android KitKat 4.4.4) weiÃ    10.0
DOOGEE T6 Pro Smartphone 5.5'' 4G Android 6.0 Octa Core 6250mAh di Grande Capienza della Batteria Fast Charge Dual SIM 1.5GHZ 3GB RAM 32GB 13.0MP                              10.0
Name: score, dtype: Float64

In [None]:
df_1M.groupby('product')['score'].count().sort_values(ascending = False).head()

product
Lenovo Vibe K4 Note (White,16GB)     3913
Lenovo Vibe K4 Note (Black, 16GB)    3228
OnePlus 3 (Graphite, 64 GB)          3127
OnePlus 3 (Soft Gold, 64 GB)         2643
Huawei P8lite zwart / 16 GB          1994
Name: score, dtype: int64

In [None]:
# mean score and number of ratings per product in one pass (named aggregation)
score_mean_count = df_1M.groupby('product')['score'].agg(score='mean', score_count='count')
score_mean_count.sort_values(by=['score', 'score_count'], ascending=[False, False]).head()

| product | score | score_count |
|---|---|---|
| Samsung Galaxy Note5 | 10.0 | 144 |
| Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 CÃ¢mera 10MP e Frontal 2MP MemÃ³ria Interna de 16GB GSM | 10.0 | 140 |
| Nokia Smartphone Nokia Lumia 520 Desbloqueado Oi Preto Windows Phone 8 CÃ¢mera 5MP 3G Wi-Fi MemÃ³ria Interna 8G GPS | 10.0 | 135 |
| Samsung Smartphone Dual Chip Samsung Galaxy SIII Duos Desbloqueado Claro Azul Android 4.1 3G/Wi-Fi CÃ¢mera 5MP | 10.0 | 128 |
| Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi CÃ¢mera 5MP - Preto | 10.0 | 126 |
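Sorting by mean score first means any product with a few perfect ratings ties at 10.0. That is why the mean-only ranking in the first cell of this step is topped by products averaging 10.0, regardless of how many ratings they have.

A common alternative is a Bayesian average (the IMDb-style weighted rating): WR = v/(v+m)·R + m/(v+m)·C. Here R is the product's mean, v its number of ratings, C the global mean, and m a prior weight that pulls small-sample means towards C. The sketch below is a swapped-in technique, not the notebook's method; `weighted_popularity`, the toy data and m=2 are illustrative assumptions.

```python
import pandas as pd

def weighted_popularity(df: pd.DataFrame, m: float, top_n: int = 5) -> pd.DataFrame:
    """Rank products by a Bayesian average that shrinks small-sample means toward the global mean."""
    stats = df.groupby("product")["score"].agg(R="mean", v="count")
    global_mean = df["score"].mean()
    stats["weighted"] = (stats["v"] / (stats["v"] + m)) * stats["R"] + (m / (stats["v"] + m)) * global_mean
    return stats.sort_values("weighted", ascending=False).head(top_n)

# Made-up ratings: A has a single 10, B has four high ratings, C has two low ones
toy = pd.DataFrame({
    "product": ["A", "B", "B", "B", "B", "C", "C"],
    "score":   [10, 9, 9, 10, 9, 4, 6],
})
top = weighted_popularity(toy, m=2)
print(top)  # B (mean 9.25 over 4 ratings) now outranks A (a single 10)
```

In practice m would be tuned, for example set near a chosen percentile of the per-product rating counts; that choice is an assumption, not something this notebook fixes.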


### **4. Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch (note: if you build it from scratch, you can limit your data to 5,000 samples if you face memory issues). Build a collaborative filtering model using KNNWithMeans from surprise. You can try both user-based and item-based models.**
### **SVD**

In [None]:
#importing required libraries
from surprise import SVD
from surprise import Dataset
from surprise import KNNWithMeans
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

In [None]:
#arranging columns in order author/user, product, score/rating
columns = ['author','product','score']
df_rs = df_1M.reindex(columns = columns)

In [None]:
df_5K = df_rs.sample(n = 5000, random_state= 612)
df_5K.head()

| | author | product | score |
|---|---|---|---|
| 766097 | Amazon Customer | Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB) | 2 |
| 357351 | krisheed | Samsung Galaxy J7 Prime 16GB | 10 |
| 614054 | Margarita | Samsung Galaxy Mega 6.3 - Smartphone libre And... | 2 |
| 507950 | Amazon Customer | Lenovo Vibe K5 (Gold, VoLTE update) | 4 |
| 33359 | Henry P. | Samsung Galaxy S7 Smartphone, 12,9 cm (5,1 Zol... | 10 |


In [None]:
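from surprise import Reader
# note: raw scores of 0.5 or less (the minimum is 0.2) round to 0, which lies outside this (1, 10) scale
data = Dataset.load_from_df(df_5K, reader = Reader(rating_scale=(1, 10)))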

In [None]:
trainset = data.build_full_trainset()

In [None]:
trainset.ur

defaultdict(list,
            {0: [(0, 2.0),
              (3, 4.0),
              (0, 4.0),
              (16, 6.0),
              (26, 2.0),
              (40, 10.0),
              (3, 6.0),
              (48, 4.0),
              (54, 2.0),
              (72, 2.0),
              (98, 8.0),
              (109, 10.0),
              (120, 10.0),
              (132, 10.0),
              (134, 8.0),
              (159, 6.0),
              (163, 6.0),
              (177, 2.0),
              (3, 6.0),
              (205, 10.0),
              (205, 10.0),
              (287, 2.0),
              (292, 2.0),
              (306, 2.0),
              (178, 8.0),
              (159, 8.0),
              (326, 6.0),
              (341, 10.0),
              (349, 8.0),
              (362, 10.0),
              (363, 4.0),
              (382, 8.0),
              (395, 10.0),
              (163, 2.0),
              (421, 4.0),
              (444, 8.0),
              (26, 10.0),
              (109, 10.0)

In [None]:
algo = SVD()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f71ec0a5910>
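Surprise's `SVD` is the biased matrix factorisation popularised by Simon Funk during the Netflix Prize. Each rating is estimated as r̂_ui = μ + b_u + b_i + q_iᵀp_u. The biases and latent factors are learned by stochastic gradient descent with L2 regularisation.

The pure-Python sketch below shows the estimate and the SGD update for a single rating. The learning rate 0.005 and regularisation 0.02 mirror Surprise's documented defaults (`lr_all`, `reg_all`). The 3 factors, the global mean of 8.0 and the single toy rating of 10 are illustrative assumptions.

```python
import random

def predict(mu, bu, bi, pu, qi):
    """Biased matrix-factorisation estimate: mu + b_u + b_i + q_i . p_u."""
    return mu + bu + bi + sum(p * q for p, q in zip(pu, qi))

def sgd_step(r_ui, mu, bu, bi, pu, qi, lr=0.005, reg=0.02):
    """One SGD update on a single observed rating; factor updates use the pre-update values."""
    err = r_ui - predict(mu, bu, bi, pu, qi)
    bu_new = bu + lr * (err - reg * bu)
    bi_new = bi + lr * (err - reg * bi)
    pu_new = [p + lr * (err * q - reg * p) for p, q in zip(pu, qi)]
    qi_new = [q + lr * (err * p - reg * q) for p, q in zip(pu, qi)]
    return bu_new, bi_new, pu_new, qi_new

random.seed(0)
k = 3  # number of latent factors (illustrative; Surprise defaults to 100)
mu, bu, bi = 8.0, 0.0, 0.0
pu = [random.gauss(0, 0.1) for _ in range(k)]
qi = [random.gauss(0, 0.1) for _ in range(k)]

before = predict(mu, bu, bi, pu, qi)
for _ in range(500):  # repeatedly fit one observed rating of 10
    bu, bi, pu, qi = sgd_step(10.0, mu, bu, bi, pu, qi)
after = predict(mu, bu, bi, pu, qi)
print(f"estimate before: {before:.3f}, after: {after:.3f}")
```

Regularisation keeps the biases slightly short of the full gap, so the estimate approaches 10 without reaching it exactly.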

In [None]:
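# build_anti_testset() returns every (user, item) pair that is NOT in the trainset,
# with the global mean rating as a placeholder r_ui (8.048 below)
testset = trainset.build_anti_testset()
testset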

[('Amazon Customer', 'Samsung Galaxy J7 Prime 16GB', 8.048),
 ('Amazon Customer',
  'Samsung Galaxy Mega 6.3 - Smartphone libre Android (pantalla 6.3", cÃ¡mara 8 Mp, 8 GB, Dual-Core 1.7 GHz, 1.5 GB RAM), blanco (importado)',
  8.048),
 ('Amazon Customer',
  'Samsung Galaxy S7 Smartphone, 12,9 cm (5,1 Zoll) Display, LTE (4G)',
  8.048),
 ('Amazon Customer', 'Nokia 5800', 8.048),
 ('Amazon Customer', 'Sony Ericsson W890i', 8.048),
 ('Amazon Customer',
  'Samsung Wave S8500 Smartphone (Super Amoled Display, Touchscreen, bada-Betriebssystem) metallic-black',
  8.048),
 ('Amazon Customer',
  'Samsung Galaxy Note 4 Smartphone (5,7 Zoll (14,5 cm) Touch-Display, 32 GB Speicher, Android 4.4) weiÃ\x9f',
  8.048),
 ('Amazon Customer',
  'Samsung A157 Unlocked GSM Cell Phone with Internet Browser, 3G Capabilities, SMS & MMS and Speakerphone - Black',
  8.048),
 ('Amazon Customer', 'Nokia N9 (Black) 16GB', 8.048),
 ('Amazon Customer', 'Motorola MOTORAZR2 V9', 8.048),
 ('Amazon Customer', 'Samsung G

In [None]:
prediction = algo.test(testset)
prediction

[Prediction(uid='Amazon Customer', iid='Samsung Galaxy J7 Prime 16GB', r_ui=8.048, est=7.279510285464847, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Samsung Galaxy Mega 6.3 - Smartphone libre Android (pantalla 6.3", cÃ¡mara 8 Mp, 8 GB, Dual-Core 1.7 GHz, 1.5 GB RAM), blanco (importado)', r_ui=8.048, est=6.761298670133152, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Samsung Galaxy S7 Smartphone, 12,9 cm (5,1 Zoll) Display, LTE (4G)', r_ui=8.048, est=7.907238215974977, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Nokia 5800', r_ui=8.048, est=7.521803027515043, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Sony Ericsson W890i', r_ui=8.048, est=7.174830083681851, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Samsung Wave S8500 Smartphone (Super Amoled Display, Touchscreen, bada-Betriebssystem) metallic-black', r_ui=8.048, est=7.32908750

In [None]:
from collections import defaultdict
def get_top_n(prediction, n=5):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in prediction:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the k highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [None]:
top_n = get_top_n(prediction, n=5)
top_n

defaultdict(list,
            {'Amazon Customer': [('Nokia Lumia 925 16GB NFC LTE - Smartphone libre Windows Phone (pantalla 4.5", cÃ¡mara 8.7 Mp, 16 GB, 1.5 GHz), blanco (importado)',
               9.04728732922704),
              ('Motorola Spice XT300 Unlocked GSM Phone with Android 2.1 OS, 3MP Camera, GPS, Wi-Fi, Bluetooth and FM Radio - Black',
               8.894700181375615),
              ('LG Leon 4G - Smartphone libre Android (pantalla 4.5", cÃ¡mara 5 Mp, 8 GB, Quad-Core 1.2 GHz, 1 GB RAM), titan',
               8.813785422482054),
              ('Micromax Canvas A1 AQ4501 (White)', 8.793688748802573),
              ('Apple iPhone 5c Unlocked Cellphone, 32GB, White',
               8.764260731696655)],
             'krisheed': [('OnePlus 3 (Graphite, 64 GB)', 9.38313281195469),
              ('Samsung Galaxy S7 edge 32GB (AT&T)', 9.055637510258828),
              ('Samsung Galaxy S7 edge 32GB (Verizon)', 9.034937436382302),
              ('Samsung GT-S5230 Star', 8.9452074

In [None]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

Amazon Customer ['Nokia Lumia 925 16GB NFC LTE - Smartphone libre Windows Phone (pantalla 4.5", cÃ¡mara 8.7 Mp, 16 GB, 1.5 GHz), blanco (importado)', 'Motorola Spice XT300 Unlocked GSM Phone with Android 2.1 OS, 3MP Camera, GPS, Wi-Fi, Bluetooth and FM Radio - Black', 'LG Leon 4G - Smartphone libre Android (pantalla 4.5", cÃ¡mara 5 Mp, 8 GB, Quad-Core 1.2 GHz, 1 GB RAM), titan', 'Micromax Canvas A1 AQ4501 (White)', 'Apple iPhone 5c Unlocked Cellphone, 32GB, White']
krisheed ['OnePlus 3 (Graphite, 64 GB)', 'Samsung Galaxy S7 edge 32GB (AT&T)', 'Samsung Galaxy S7 edge 32GB (Verizon)', 'Samsung GT-S5230 Star', 'Nokia 6210']
Margarita ['OnePlus 3 (Graphite, 64 GB)', 'Samsung Galaxy S7 edge 32GB (Verizon)', 'Samsung Galaxy S7 edge 32GB (AT&T)', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Nokia 6210']
Henry P. ['OnePlus 3 (Graphite, 64 GB)', 'Samsung Galaxy S7 edge 32GB (Verizon)', "Huawei P9 Lite Smartphone, LTE, Display 5.2'' FHD, Processore Octa-Core Kirin 

In [None]:
print("SVD Model : Test Set")
RMSE_SVD = accuracy.rmse(prediction, verbose=True)
RMSE_SVD

SVD Model : Test Set
RMSE: 0.3350


0.33502169011784844
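RMSE is the square root of the mean squared difference between true and estimated ratings. Every "true" rating in `trainset.build_anti_testset()` is the same placeholder: the global mean, 8.048 here. So the 0.335 above measures how far SVD's estimates stray from that constant, not how well they match real ratings.

The arithmetic sketch below makes the point: a trivial predictor that always returns the global mean would score a perfect 0.0. The five estimates are the first SVD estimates from the prediction output above, rounded. A meaningful error needs held-out ratings, such as those produced by `train_test_split` or `cross_validate` later in this notebook.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (true_rating, estimate) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))

global_mean = 8.048  # the placeholder r_ui assigned by build_anti_testset() above

# First five SVD estimates from the prediction output above, rounded to 2 decimals
svd_estimates = [7.28, 6.76, 7.91, 7.52, 7.17]
print(round(rmse([(global_mean, e) for e in svd_estimates]), 3))

# A "model" that always predicts the global mean scores a perfect 0.0 on the anti-testset
print(rmse([(global_mean, global_mean) for _ in svd_estimates]))  # 0.0
```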

### Build a collaborative filtering model using KNNWithMeans from surprise. You can try both user-based and item-based models
### **Building Item based**

In [None]:
from surprise import KNNWithMeans
from surprise.model_selection import train_test_split

reader = Reader(rating_scale=(1, 10))
data_item = Dataset.load_from_df(df_5K, reader=reader)

In [None]:
trainset_k, testset_k = train_test_split(data_item, test_size=.15)

In [None]:
#building item based first. We can simply switch user_based to True for user based model
algo_item = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo_item.fit(trainset_k)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f7035db1610>
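For `KNNWithMeans`, the Surprise documentation gives the user-based estimate as r̂_ui = μ_u + Σ_v sim(u, v)·(r_vi − μ_v) / Σ_v sim(u, v). Here v ranges over the k nearest neighbours of u that rated item i; the item-based variant swaps the roles of users and items.

The pure-Python sketch below computes that estimate for one (user, item) pair. `knn_with_means_estimate` is a hypothetical helper. As a simplifying choice, it ignores neighbours with non-positive similarity and falls back to the user's own mean when no neighbour is usable. Surprise itself returns the global mean when the user or item is unknown, as the outputs below show.

```python
def knn_with_means_estimate(user_mean, neighbours, k=50):
    """
    User-based kNN-with-means estimate for one (user, item) pair.
    neighbours: (similarity, neighbour_rating_of_item, neighbour_mean) tuples
    for users who rated the item; only the k most similar are considered.
    """
    top = sorted(neighbours, key=lambda t: t[0], reverse=True)[:k]
    num = sum(sim * (r - mean) for sim, r, mean in top if sim > 0)
    den = sum(sim for sim, _, _ in top if sim > 0)
    if den == 0:
        return user_mean  # no usable neighbours: fall back to the user's own mean
    return user_mean + num / den

# Made-up example: the user averages 7.0; two similar users rated the item above/below their means
est = knn_with_means_estimate(7.0, [(0.9, 10, 8.0), (0.5, 6, 7.0), (-0.3, 2, 6.0)])
print(round(est, 3))  # 7 + (0.9*2 + 0.5*(-1)) / (0.9 + 0.5)
```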

In [None]:
test_predict = algo_item.test(testset_k)

In [None]:
test_predict

[Prediction(uid='calesa', iid='Sony Ericsson V630i', r_ui=10.0, est=8.042588235294117, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Samson', iid='Samsung Intensity', r_ui=10.0, est=8.042588235294117, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Herbert Augusto Infante Romero', iid='Lenovo Golden Warrior S8 S898+ 16GB Gold, Dual Sim, 5.3 inch, Unlocked International Model, No Warranty', r_ui=10.0, est=8.042588235294117, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Ð\x9aÐ¸Ñ\x81Ñ\x81Ð° Ð\x9dÐ°Ñ\x82Ð°Ð»Ð¸Ñ\x8f', iid='Sony Ericsson C905', r_ui=10.0, est=8.042588235294117, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='sherrie springer', iid='BLU Zoey II Quadband Unlocked Dual Sim Phone with Camera Bluetooth and Social Networks - Retail Packaging - White Blue', r_ui=2.0, est=8.042588235294117, details

### RMSE for item based model

In [None]:
print("Item-based Model : Test Set")
Itembased_RMSE = accuracy.rmse(test_predict, verbose=True)

Itembased_RMSE

Item-based Model : Test Set
RMSE: 2.6870


2.687020134083868

### Building user based



In [None]:
reader = Reader(rating_scale=(1, 10))
data_user = Dataset.load_from_df(df_5K, reader=reader)

trainset_u, testset_u = train_test_split(data_user, test_size=.15)

algo_user = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo_user.fit(trainset_u)

test_predict_u = algo_user.test(testset_u)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [None]:
test_predict_u

[Prediction(uid='Tracy Minor', iid='BLU Studio 5.0 C HD Quad Core - Unlocked Cell Phone - (Blue)', r_ui=8.0, est=8.03364705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Matt', iid='Motorola Moto X Pure Edition Unlocked Smartphone, 64 GB Black XT1575, 5.7" Quad HD display, 21 MP Camera, Quad-core 1.8GHz', r_ui=6.0, est=8.03364705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Indira Prashant', iid='LG Optimus GT540 Unlocked GSM Quad-Band Phone with 3 MP Camera, Android OS, Touchscreen, Wi-Fi, and Bluetooth-...', r_ui=10.0, est=8.03364705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='bublikovazawuba', iid='LG Optimus One P500', r_ui=8.0, est=8.03364705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Adnan', iid='HTC One (M8) Smartphone (12,7 cm (5 Zoll) LCD-Display, Quad-Co
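The kNN predictions shown in the last two outputs carry `'was_impossible': True` with the reason "User and/or item is unknown.", and each estimate is the same constant (the trainset's global mean). With only 5,000 ratings spread over many users and products, a held-out user or product is often missing from the training split, so the model has nothing to compute from.

The share of such fallback predictions can be measured. The sketch below uses a stand-in `Prediction` namedtuple with the same fields as Surprise's (`uid`, `iid`, `r_ui`, `est`, `details`). `impossible_share`, a hypothetical helper, should therefore also accept lists such as `test_predict` or `test_predict_u`.

```python
from collections import namedtuple

# Stand-in with the same fields as surprise's Prediction, for a self-contained example
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

def impossible_share(predictions):
    """Fraction of predictions where the model fell back to its default estimate."""
    if not predictions:
        return 0.0
    flagged = sum(1 for p in predictions if p.details.get("was_impossible", False))
    return flagged / len(predictions)

# Made-up example: one fallback prediction and one real one
preds = [
    Prediction("calesa", "Sony Ericsson V630i", 10.0, 8.04,
               {"was_impossible": True, "reason": "User and/or item is unknown."}),
    Prediction("u2", "p2", 8.0, 7.6, {"actual_k": 3, "was_impossible": False}),
]
print(impossible_share(preds))  # 0.5
```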

### RMSE for user based model.

In [None]:
print("user-based Model : Test Set")
userbased_RMSE = accuracy.rmse(test_predict_u, verbose=True)

userbased_RMSE

user-based Model : Test Set
RMSE: 2.4904


2.490406518124238

### 5. Evaluate the collaborative model. Print RMSE

In [None]:
rmse_d = [['SVD', RMSE_SVD], ['Item Based', Itembased_RMSE], ['User based', userbased_RMSE]]

RMSE = pd.DataFrame(rmse_d, columns=['Model', 'RMSE Score'])

RMSE

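| | Model | RMSE Score |
|---|---|---|
| 0 | SVD | 0.335022 |
| 1 | Item Based | 2.68702 |
| 2 | User based | 2.490407 |

Note: these three scores are not comparable. The SVD score was computed on `trainset.build_anti_testset()`, whose "true" ratings are all the global-mean placeholder, while the two kNN scores come from held-out ratings. Under the 5-fold cross-validation in step 9, SVD's mean RMSE is about 2.56, close to the kNN models.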


### 6. Predict score (average rating) for test users

In [None]:
##SVD
pred_svd_df= pd.DataFrame(prediction, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',pred_svd_df['est'].mean())
print('average rating  by test users: ',pred_svd_df['rui'].mean())
print('average prediction error for test users: ',(pred_svd_df['rui']-pred_svd_df['est']).abs().mean())

average prediction for test users:  8.057311016478252
average rating  by test users:  8.048000000000252
average prediction error for test users:  0.2684675983461329


In [None]:
##Item based
pred_knn_item= pd.DataFrame(test_predict, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',pred_knn_item['est'].mean())
print('average rating  by test users: ',pred_knn_item['rui'].mean())
print('average prediction error for test users: ',(pred_knn_item['rui']-pred_knn_item['est']).abs().mean())

average prediction for test users:  7.970442548311643
average rating  by test users:  8.078666666666667
average prediction error for test users:  2.018201669563866


In [None]:
##User based
pred_knn_user= pd.DataFrame(test_predict_u, columns=['uid', 'iid', 'rui', 'est', 'details'])
print('average prediction for test users: ',pred_knn_user['est'].mean())
print('average rating  by test users: ',pred_knn_user['rui'].mean())
print('average prediction error for test users: ',(pred_knn_user['rui']-pred_knn_user['est']).abs().mean())

average prediction for test users:  8.020355009434047
average rating  by test users:  8.129333333333333
average prediction error for test users:  1.91289404764561


### 7. Report your findings and inferences.

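### 1.) Most popular product: Samsung Galaxy Note5, which has the most ratings (144) among products with a perfect mean score of 10.0

### 2.) Most reviews: the top "authors" (Amazon Customer, Cliente Amazon, Client d'Amazon, Amazon Kunde, Anonymous, ...) are generic placeholder names for anonymous reviewers, not individual users

### 3.) Lenovo Vibe K4 Note (White,16GB) is the most-rated product, with 3,913 ratings in the 1-million-row sample

### 4.) Under 5-fold cross-validation on the 5,000-rating sample, the three collaborative models perform similarly: mean RMSE is about 2.56 for SVD, 2.60 for item-based kNN and 2.58 for user-based kNN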

### 8. Try and recommend top 5 products for test users

In [None]:
## get_top_n() was defined in the SVD section above; here it is applied to the
## item-based kNN test predictions (test_predict). A test set only holds each
## user's held-out items, so this ranks those items rather than unseen ones.
top_5_rec = get_top_n(test_predict, 5)
print('Top 5 recommendations for all test users are: \n')
for key, value in top_5_rec.items():
    print(key, '-> ', value, '\n')

Top 5 recommendations for all test users are: 

calesa ->  [('Sony Ericsson V630i', 8.042588235294117)] 

Samson ->  [('Samsung Intensity', 8.042588235294117)] 

Herbert Augusto Infante Romero ->  [('Lenovo Golden Warrior S8 S898+ 16GB Gold, Dual Sim, 5.3 inch, Unlocked International Model, No Warranty', 8.042588235294117)] 

ÐÐ¸ÑÑÐ° ÐÐ°ÑÐ°Ð»Ð¸Ñ ->  [('Sony Ericsson C905', 8.042588235294117)] 

sherrie springer ->  [('BLU Zoey II Quadband Unlocked Dual Sim Phone with Camera Bluetooth and Social Networks - Retail Packaging - White Blue', 8.042588235294117)] 

Jp Roberts ->  [('Nokia 6310i Mobile Phone - Silver', 8.042588235294117)] 

Subir Sarkar ->  [('Lenovo Vibe K4 Note (White,16GB)', 8.042588235294117)] 

Mirko Kocic ->  [('Lenovo Razr V3i Handy (1.2 MP Kamera, MP3-Player) stone grey', 8.042588235294117)] 

Cliente Amazon ->  [('Asus ZenFone 3 Smartphone, Memoria Interna da 64 GB, Dual-SIM, Bianco [Italia]', 10), ('Lenovo Zuk Z1 - Smartphone libre (pantalla 5.5", cÃ¡mara 13 Mp

### 9. Try other techniques (Example: cross validation) to get better results

In [None]:
cross_validate(algo,data, measures=['RMSE'], cv=5, verbose=False)

{'test_rmse': array([2.58286685, 2.53299549, 2.55933605, 2.48194859, 2.62263149]),
 'fit_time': (0.07486534118652344,
  0.07799410820007324,
  0.07658219337463379,
  0.0696563720703125,
  0.07617449760437012),
 'test_time': (0.005811452865600586,
  0.005888938903808594,
  0.005252361297607422,
  0.005097389221191406,
  0.005494594573974609)}

In [None]:
cross_validate(algo_item,data_item, measures=['RMSE'], cv=5, verbose=False)


Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.73466832, 2.56146222, 2.52536826, 2.59013293, 2.56735692]),
 'fit_time': (0.40533995628356934,
  0.26630210876464844,
  0.21066832542419434,
  0.217681884765625,
  0.20723652839660645),
 'test_time': (0.012036323547363281,
  0.011937618255615234,
  0.011735677719116211,
  0.015027046203613281,
  0.011170148849487305)}

In [None]:
cross_validate(algo_user,data_user, measures=['RMSE'], cv=5, verbose=False)


Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.56453629, 2.59580271, 2.59370207, 2.55502335, 2.6103888 ]),
 'fit_time': (0.2838578224182129,
  0.24063587188720703,
  0.2391979694366455,
  0.24200701713562012,
  0.24304413795471191),
 'test_time': (0.006062746047973633,
  0.005977153778076172,
  0.006693601608276367,
  0.006032228469848633,
  0.005863189697265625)}
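`cross_validate` returns one RMSE per fold, and averaging them gives one comparable number per model. The sketch below does this for the three `test_rmse` arrays printed above, with the values copied verbatim. The means come out at roughly 2.56 for SVD, 2.60 for item-based and 2.58 for user-based kNN. These differences are small compared with the spread across folds: SVD alone ranges from 2.48 to 2.62.

```python
from statistics import mean, stdev

# test_rmse arrays copied from the three cross_validate outputs above (5 folds each)
cv_rmse = {
    "SVD": [2.58286685, 2.53299549, 2.55933605, 2.48194859, 2.62263149],
    "Item based": [2.73466832, 2.56146222, 2.52536826, 2.59013293, 2.56735692],
    "User based": [2.56453629, 2.59580271, 2.59370207, 2.55502335, 2.6103888],
}

for model, folds in cv_rmse.items():
    print(f"{model:<10} mean RMSE={mean(folds):.3f}  std={stdev(folds):.3f}")

best = min(cv_rmse, key=lambda m: mean(cv_rmse[m]))
print("lowest mean RMSE:", best)
```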

### **10. In what business scenarios should you use popularity-based recommendation systems?**
### -Popularity-based recommendation systems suit businesses that have no user preferences or past history available, for example when new users sign up on a website.

### -Popular items can be recommended directly within a particular field, depending on the type of business.

### For example: a travel agency can promote its most popular holiday packages, or the best hotels in a destination the user has searched for.

### -Music websites/apps can recommend popular or trending tracks.

### -Movie websites/OTT platforms can recommend popular movies.

### -Shopping websites can recommend popular products in the categories they deal in, such as electronics or clothes.

### **11. In what business scenarios should you use CF-based recommendation systems?**

### -To give more personalised recommendations when users' past history and preferences are known.

### -To recommend movies and music based on a user's own interests and history, or on the history of other users with similar interests.

### **12. What other methods could further improve the recommendations for different users?**

### **Hybrid models, which combine two recommendation systems (for example popularity-based and collaborative filtering), can give better recommendations and help with the cold-start and grey-sheep problems.**
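One simple way to combine the two approaches is a weighted hybrid. It leans on popularity for users with little history and shifts towards collaborative filtering as their history grows.

The sketch below is a hypothetical illustration, not a method from this notebook. `hybrid_score`, `recommend`, the linear `ramp` of 10 ratings and the candidate scores are all assumptions. Items the CF model cannot score (cold start) fall back to their popularity score alone.

```python
def hybrid_score(cf_estimate, popularity_score, n_user_ratings, ramp=10):
    """
    Blend a collaborative-filtering estimate with a popularity score.
    The CF weight grows linearly with the user's number of ratings, reaching 1.0 at `ramp`.
    """
    w_cf = min(n_user_ratings / ramp, 1.0)
    return w_cf * cf_estimate + (1 - w_cf) * popularity_score

def recommend(candidates, n_user_ratings, n=5, ramp=10):
    """candidates: dict product -> (cf_estimate or None, popularity_score); returns top-n products."""
    scored = []
    for product, (cf_est, pop) in candidates.items():
        if cf_est is None:  # CF could not score this item (cold start): use popularity only
            score = pop
        else:
            score = hybrid_score(cf_est, pop, n_user_ratings, ramp)
        scored.append((score, product))
    scored.sort(reverse=True)
    return [product for _, product in scored[:n]]

# Made-up candidates: (CF estimate, popularity score)
cands = {"A": (9.5, 7.0), "B": (6.0, 9.0), "C": (None, 8.0)}
print(recommend(cands, n_user_ratings=0))   # new user: pure popularity -> ['B', 'C', 'A']
print(recommend(cands, n_user_ratings=20))  # established user: CF dominates -> ['A', 'C', 'B']
```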