<font size="4"> <b> • DOMAIN: </b>Smartphone, Electronics</font>

<font size="4"> <b> • CONTEXT: </b> India is the second largest market globally for smartphones after China. About 134 million smartphones were sold across India in the year 2017 and is estimated to increase to about 442 million in 2022. India ranked second in the average time spent on mobile web by smartphone users across Asia Pacific. The combination of very high sales volumes and the average smartphone consumer behaviour has made India a very attractive market for foreign vendors. As per Consumer behaviour, 97% of consumers turn to a search engine when they are buying a product vs. 15% who turn to social media. If a seller succeeds to publish smartphones based on user’s behaviour/choice at the right place, there are 90% chances that user will enquire for the same. This Case Study is targeted to build a recommendation system based on individual consumer’s behaviour or choice.</font> 

<font size="4"> <b> • DATA DESCRIPTION: </b>
    
• author : name of the person who gave the rating
    
• country : country the person who gave the rating belongs to
    
• data : date of the rating
    
• domain: website from which the rating was taken from
    
• extract: rating content
    
• language: language in which the rating was given
    
• product: name of the product/mobile phone for which the rating was given
    
• score: average rating for the phone
    
• score_max: highest rating given for the phone
    
• source: source from where the rating was taken
    

    

<font size="4"> <b> • PROJECT OBJECTIVE: </b> We will build a recommendation system using popularity based and collaborative filtering methods to recommend
mobile phones to a user which are most popular and personalised respectively.</font>
    
<font size="4">Steps and tasks: [ Total Score: 60 points]
  
1. Import the necessary libraries and read the provided CSVs as a data frame and perform the below steps.
    
> • Merge the provided CSVs into one data-frame.
    
> • Check a few observations and shape of the data-frame.
    
> • Round off scores to the nearest integers.
    
> • Check for missing values. Impute the missing values if there is any.
    
> • Check for duplicate values and remove them if there is any.
    
> • Keep only 1000000 data samples. Use random state=612.
    
> • Drop irrelevant features. Keep features like Author, Product, and Score.

2. Answer the following questions
    
> • Identify the most rated features.
    
> • Identify the users with most number of reviews.
    
> • Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset.
    
3. Build a popularity based model and recommend top 5 mobile phones.  
4. Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch(Note: Incase you’re building it from scratch you can limit your data points to 5000 samples if you face memory issues). Build a collaborative filtering model using kNNWithMeans from surprise. You can try both user-based and item-based model.
5. Evaluate the collaborative model. Print RMSE value.
6. Predict score (average rating) for test users.
7. Report your findings and inferences.
8. Try and recommend top 5 products for test users.
9. Try cross validation techniques to get better results.
10. In what business scenario you should use popularity based Recommendation Systems ?
11. In what business scenario you should use CF based Recommendation Systems ?
12. What other possible methods can you think of which can further improve the recommendation for different users ?</font>
 

<font size="5"><p style="color:black"> <b>1. Import the necessary libraries and read the provided CSVs as a data frame and perform the below steps.</p></font>

In [1]:
import pandas as pd
import os
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn import preprocessing
from collections import defaultdict
from surprise import SVD
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

In [2]:
d1 = pd.read_csv('phone_user_review_file_1.csv')
d2 = pd.read_csv('phone_user_review_file_2.csv')
d3 = pd.read_csv('phone_user_review_file_3.csv')
d4 = pd.read_csv('phone_user_review_file_4.csv')
d5 = pd.read_csv('phone_user_review_file_5.csv')
d6 = pd.read_csv('phone_user_review_file_6.csv')

In [3]:
d1.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8


In [4]:
d1.shape

(374910, 11)

In [5]:
d2.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/leagoo-lead-7/,4/15/2015,en,us,Amazon,amazon.com,2.0,10.0,"The telephone headset is of poor quality , not...",luis,Leagoo Lead7 5.0 Inch HD JDI LTPS Screen 3G Sm...
1,/cellphones/leagoo-lead-7/,5/23/2015,en,gb,Amazon,amazon.co.uk,10.0,10.0,This is my first smartphone so I have nothing ...,Mark Lavin,Leagoo Lead 7 Lead7 MTK6582 Quad core 1GB RAM ...


In [6]:
d2.shape

(114925, 11)

In [7]:
d3.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,11/7/2015,pt,br,Submarino,submarino.com.br,6.0,10.0,"recomendo, eu comprei um, a um ano, e agora co...",herlington tesch,Samsung Smartphone Samsung Galaxy S3 Slim G381...
1,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,10/2/2015,pt,br,Submarino,submarino.com.br,10.0,10.0,Comprei um pouco desconfiada do site e do celu...,Luisa Silva Marieta,Samsung Smartphone Samsung Galaxy S3 Slim G381...


In [8]:
d3.shape

(312961, 11)

In [9]:
d4.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-s7262-duos-galaxy-ace/,3/11/2015,en,us,Amazon,amazon.com,2.0,10.0,was not conpatable with my phone as stated. I ...,Frances DeSimone,Samsung Galaxy Star Pro DUOS S7262 Unlocked Ce...
1,/cellphones/samsung-s7262-duos-galaxy-ace/,17/11/2015,en,in,Zopper,zopper.com,10.0,10.0,Decent Functions and Easy to Operate Pros:- Th...,Expert Review,Samsung Galaxy Star Pro S7262 Black


In [10]:
d4.shape

(98284, 11)

In [11]:
d5.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/karbonn-k1616/,7/13/2016,en,in,91 Mobiles,91mobiles.com,2.0,10.0,I bought 1 month before. currently speaker is ...,venkatesh,Karbonn K1616
1,/cellphones/karbonn-k1616/,7/13/2016,en,in,91 Mobiles,91mobiles.com,6.0,10.0,"I just bought one week back, I have Airtel con...",Venkat,Karbonn K1616


In [12]:
d5.shape

(350216, 11)

In [13]:
d6.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-instinct-sph-m800/,9/16/2011,en,us,Phone Arena,phonearena.com,8.0,10.0,I've had the phone for awhile and it's a prett...,ajabrams95,Samsung Instinct HD
1,/cellphones/samsung-instinct-sph-m800/,2/13/2014,en,us,Amazon,amazon.com,6.0,10.0,to be clear it is not the sellers fault that t...,Stephanie,Samsung SPH M800 Instinct


In [14]:
d6.shape

(163837, 11)

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.1 Merge the provided CSVs into one data-frame.

In [15]:
phone = pd.concat([d1,d2,d3,d4,d5,d6],axis=0)
phone.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8


<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.2 Check a few observations and shape of the data-frame

In [16]:
phone.shape

(1415133, 11)

In [17]:
phone.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1415133 entries, 0 to 163836
Data columns (total 11 columns):
 #   Column     Non-Null Count    Dtype  
---  ------     --------------    -----  
 0   phone_url  1415133 non-null  object 
 1   date       1415133 non-null  object 
 2   lang       1415133 non-null  object 
 3   country    1415133 non-null  object 
 4   source     1415133 non-null  object 
 5   domain     1415133 non-null  object 
 6   score      1351644 non-null  float64
 7   score_max  1351644 non-null  float64
 8   extract    1395772 non-null  object 
 9   author     1351931 non-null  object 
 10  product    1415132 non-null  object 
dtypes: float64(2), object(9)
memory usage: 129.6+ MB


In [18]:
phone.describe()

Unnamed: 0,score,score_max
count,1351644.0,1351644.0
mean,8.00706,10.0
std,2.616121,0.0
min,0.2,10.0
25%,7.2,10.0
50%,9.2,10.0
75%,10.0,10.0
max,10.0,10.0


<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.3 Round off scores to the nearest integers.

In [19]:
phone.dtypes

phone_url     object
date          object
lang          object
country       object
source        object
domain        object
score        float64
score_max    float64
extract       object
author        object
product       object
dtype: object

In [20]:
phone['score'].unique()

array([10. ,  6. ,  9.2,  4. ,  8. ,  2. ,  9.6,  7.2,  6.8,  9. ,  8.3,
        8.8,  8.4,  5.3,  7. ,  6.4,  7.6,  nan,  5.2,  3.2,  4.4,  2.8,
        5.6,  3.6,  4.8,  1. ,  5. ,  3. ,  2.4,  9.3,  8.5,  9.5,  6.5,
        5.5,  9.8,  8.2,  8.6,  7.8,  9.4,  6.6,  6.2,  7.5,  9.9,  2.7,
        8.7,  6.7,  3.3,  7.7,  7.3,  9.7,  6.3,  7.4,  5.7,  4.7,  4.3,
        5.8,  4.2,  4.5,  2.2,  5.4,  7.9,  3.5,  4.6,  3.7,  2.5,  3.4,
        7.1,  8.1,  1.2,  1.4,  3.8,  9.1,  2.6,  1.6,  1.7,  1.5,  1.8,
        2.3,  6.1,  5.9,  1.3,  0.2,  0.4,  8.9,  6.9,  0.6,  4.9])

In [21]:
phone['score_max'].unique()

array([10., nan])

##### Rounding off the 'score' column values to the nearest integer

In [22]:
phone['score'] = round(phone['score'])
phone['score_max'] = round(phone['score_max'])

In [23]:
phone['score'].unique()

array([10.,  6.,  9.,  4.,  8.,  2.,  7.,  5., nan,  3.,  1.,  0.])

In [24]:
phone['score_max'].unique()

array([10., nan])

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.4 Check for missing values. Impute the missing values if there is any.

In [25]:
def missing_check(df):
    total = df.isnull().sum().sort_values(ascending=False)   # total number of null values
    percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)  # percentage of values that are null
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])  # putting the above two together
    return missing_data # return the dataframe
missing_check(phone)

Unnamed: 0,Total,Percent
score,63489,0.04486433
score_max,63489,0.04486433
author,63202,0.04466153
extract,19361,0.0136814
product,1,7.066474e-07
phone_url,0,0.0
date,0,0.0
lang,0,0.0
country,0,0.0
source,0,0.0


In [26]:
phone.isnull().values.sum()

209542

##### The missing values are imputed with Median

In [27]:
phone = phone.fillna(phone.median())
phone = phone.dropna()
phone.isnull().values.sum()

0

In [28]:
phone.shape

(1336416, 11)

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.5 Check for duplicate values and remove them if there is any

In [29]:
dupe = phone.duplicated()
sum(dupe)

4823

In [30]:
phone = phone.drop_duplicates()
phone.shape

(1331593, 11)

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.6 Keep only 1000000 data samples. Use random state=612.

In [31]:
phone1 = phone.sample(n=1000000, random_state=612)

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">1.7 Drop irrelevant features. Keep features like Author, Product, and Score

##### Irrelavent features are those features that do not contribute to the rating given by the users 

In [32]:
phone1.dtypes

phone_url     object
date          object
lang          object
country       object
source        object
domain        object
score        float64
score_max    float64
extract       object
author        object
product       object
dtype: object

In [33]:
phone1.drop(['phone_url','date','lang','country','source','domain','score_max','extract'], axis = 1, inplace = True)

In [34]:
phone1.dtypes

score      float64
author      object
product     object
dtype: object

In [35]:
phone1 = phone1.reset_index()
phone1

Unnamed: 0,index,score,author,product
0,8765,10.0,Kdotj15,Samsung Galaxy S7 edge 32GB (Sprint)
1,233365,10.0,Cliente Amazon,Asus ZE551ML-2A760WW Smartphone ZenFone 2 Delu...
2,145859,10.0,ron,טלפון סלולרי Huawei Mate S 32GB
3,302180,8.0,katha_maria93,Sony Ericsson W395 blush titanium Handy
4,304586,2.0,paul george,Apple iPhone 3G 8GB SIM-Free - Black
...,...,...,...,...
999995,351525,10.0,irene,"Samsung G900 Galaxy S5 Smartphone, 16 GB, Nero..."
999996,260069,10.0,anaid96,Sony Ericsson Aino
999997,296533,8.0,Shrijith Menon,HTC One A9 (Carbon Grey)
999998,41165,6.0,Ms. C. M. Nichols,Sony Xperia Z3 Compact UK SIM-Free Smartphone ...


In [36]:
phone1.drop('index',axis = 1,inplace = True)

In [37]:
phone1

Unnamed: 0,score,author,product
0,10.0,Kdotj15,Samsung Galaxy S7 edge 32GB (Sprint)
1,10.0,Cliente Amazon,Asus ZE551ML-2A760WW Smartphone ZenFone 2 Delu...
2,10.0,ron,טלפון סלולרי Huawei Mate S 32GB
3,8.0,katha_maria93,Sony Ericsson W395 blush titanium Handy
4,2.0,paul george,Apple iPhone 3G 8GB SIM-Free - Black
...,...,...,...
999995,10.0,irene,"Samsung G900 Galaxy S5 Smartphone, 16 GB, Nero..."
999996,10.0,anaid96,Sony Ericsson Aino
999997,8.0,Shrijith Menon,HTC One A9 (Carbon Grey)
999998,6.0,Ms. C. M. Nichols,Sony Xperia Z3 Compact UK SIM-Free Smartphone ...


<font size="5"><p style="color:black"> <b>2. Answer the following questions</p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">2.1 Identify the most rated features.

In [38]:
phone1['product'] = phone1['product'].str[:45]

In [39]:
phone1.groupby('product')['score'].mean().sort_values(ascending=False).head(15)

product
Huawei Ascend Y100 Mobile Phone O2 Pay As You    10.0
BlackBerry Passport 4.5-inch SIM-Free Smartph    10.0
LG Google Nexus 5X ( 32GB, Blanco Cuarzo ) EU    10.0
BlackBerry Passport 32 GB                        10.0
BlackBerry Passport 32 GB, Handy                 10.0
BlackBerry Passport 32GB 4G Plata - Smartphon    10.0
LG Google Nexus 5 – Smartphone sbloccato, sch    10.0
BlackBerry Passport 32GB NFC LTE Smartphone C    10.0
BlackBerry Passport Black                        10.0
LG Google Nexus 5X (16GB, quartz)                10.0
BlackBerry Passport QWERTY 4.5-Inch SIM-Free     10.0
BlackBerry Passport QWERTY Black                 10.0
BlackBerry Passport QWERTY Red                   10.0
BlackBerry Passport QWERTY White                 10.0
LG Google Nexus 5 D821 16GB Black                10.0
Name: score, dtype: float64

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">2.2 Identify the users with most number of reviews.

##### Top 15 Users with most number of reviews 

In [40]:
phone1['author'].value_counts().sort_values(ascending=False).head(15)

Amazon Customer    57801
Cliente Amazon     14656
e-bit               6260
Client d'Amazon     5715
Amazon Kunde        3563
Anonymous           1968
einer Kundin        1953
einem Kunden        1432
unknown             1283
Anonymous           1096
David                751
Александр            746
Alex                 655
Сергей               653
Marco                646
Name: author, dtype: int64

##### Top 15 products with most number of reviews 

In [41]:
phone1['product'].value_counts().sort_values(ascending=False).head(15)

Lenovo Vibe K4 Note (White,16GB)                 3913
Lenovo Vibe K4 Note (Black, 16GB)                3228
OnePlus 3 (Graphite, 64 GB)                      3127
OnePlus 3 (Soft Gold, 64 GB)                     2643
Huawei P8lite zwart / 16 GB                      1994
Samsung Galaxy Express I8730                     1982
Lenovo Vibe K5 (Gold, VoLTE update)              1865
Huawei P9 Lite Smartphone, LTE, Display 5.2''    1821
Samsung Galaxy S6 zwart / 32 GB                  1729
Lenovo Vibe K5 (Grey, VoLTE update)              1596
Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB)     1453
OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)     1424
Samsung Galaxy J3 (8GB)                          1393
Samsung Galaxy S7 edge 32GB (Verizon)            1347
Nokia N95                                        1337
Name: product, dtype: int64

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;">2.3 Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final
dataset.

##### Users that have given more than 50 ratings for the products

In [42]:
phone2 = pd.DataFrame(columns=['author', 'a_count'])
phone2['author']=phone1['author'].value_counts().index.tolist() 
phone2['a_count'] = list(phone1['author'].value_counts() > 50)

In [43]:
index_names = phone2[ phone2['a_count'] == False ].index 

In [44]:
phone2.drop(index_names, inplace = True) 

In [45]:
phone2.value_counts().sum()

690

In [46]:
phone2.head(15)

Unnamed: 0,author,a_count
0,Amazon Customer,True
1,Cliente Amazon,True
2,e-bit,True
3,Client d'Amazon,True
4,Amazon Kunde,True
5,Anonymous,True
6,einer Kundin,True
7,einem Kunden,True
8,unknown,True
9,Anonymous,True


- There are 690 users that have given over 50 ratings for products out of 1 million which is 0.069% of the entire database

##### Products that have more than 50 ratings given by users

In [47]:
phone3 = pd.DataFrame(columns=['product', 'a_count'])
phone3['product']=phone1['product'].value_counts().index.tolist() 
phone3['a_count'] = list(phone1['product'].value_counts() > 50)

In [48]:
index_names = phone3[ phone3['a_count'] == False ].index 

In [49]:
phone3.drop(index_names, inplace = True) 

In [50]:
phone3.value_counts().sum()

4568

In [51]:
phone3.head(15)

Unnamed: 0,product,a_count
0,"Lenovo Vibe K4 Note (White,16GB)",True
1,"Lenovo Vibe K4 Note (Black, 16GB)",True
2,"OnePlus 3 (Graphite, 64 GB)",True
3,"OnePlus 3 (Soft Gold, 64 GB)",True
4,Huawei P8lite zwart / 16 GB,True
5,Samsung Galaxy Express I8730,True
6,"Lenovo Vibe K5 (Gold, VoLTE update)",True
7,"Huawei P9 Lite Smartphone, LTE, Display 5.2''",True
8,Samsung Galaxy S6 zwart / 32 GB,True
9,"Lenovo Vibe K5 (Grey, VoLTE update)",True


- There are 4568 products with over 50 ratings given by users out of 1 million which is 0.45% of the entire database

<font size="5"><p style="color:black"> <b>3. Build a popularity based model and recommend top 5 mobile phones.</p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> Popularity Based Recommender System

In [52]:
ratings_mean_count = pd.DataFrame(phone1.groupby('product')['score'].mean())
ratings_mean_count['rating_counts'] = pd.DataFrame(phone1.groupby('product')['score'].count()) 
ratings_mean_count.sort_values(by=['score','rating_counts'], ascending=[False,False]).head(15)

Unnamed: 0_level_0,score,rating_counts
product,Unnamed: 1_level_1,Unnamed: 2_level_1
Samsung Galaxy Note5,10.0,144
Motorola Smartphone Motorola Novo Moto G DTV,10.0,119
Apple iPhone 4S Branco 8GB - Apple,10.0,116
Samsung Smartphone Galaxy Win Duos Branco Des,10.0,116
"Smartphone Samsung Galaxy S7 Edge G935F, Octa",10.0,37
"ZTE Axon 7 64GB Smartphone (Unlocked, Ion Gol",10.0,32
BlackBerry Passport QWERTY 4.5-Inch SIM-Free,10.0,28
"Smartphone Samsung Galaxy S7 G930F, Octa Core",10.0,26
Smartphone Samsung Galaxy J7 Metal SM-J710MN/,10.0,25
Smartphone Asus Zenfone 2 ZE551ML - 6C542WW c,10.0,23


- Here, the recommender system is based on the smartphone with the most number of ratings and highest average scores

<font size="5"><p style="color:black"> <b>4. Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch</p></font>
    
<font size="5">(Note: Incase you’re building it from scratch you can limit your data points to 5000 samples if you face memory issues). </font>
    
    
<font size="5"><p style="color:black"> <b>Build a collaborative filtering model using kNNWithMeans from surprise. You
can try both user-based and item-based model.</p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 4.1 Collaborative filtering model with SVD (singular value decomposition)

In [53]:
columns_titles = ['author','product','score']
DS_phone = phone.reindex(columns=columns_titles)

In [54]:
DS_phone['product'] = DS_phone['product'].str[:25]
DS_phone['product']

0                 Samsung Galaxy S8
1                 Samsung Galaxy S8
2         Samsung Galaxy S8 (64GB) 
3         Samsung Galaxy S8 64GB (A
4                 Samsung Galaxy S8
                    ...            
163832      Alcatel Club Plus Handy
163833      Alcatel Club Plus Handy
163834      Alcatel Club Plus Handy
163835      Alcatel Club Plus Handy
163836      Alcatel Club Plus Handy
Name: product, Length: 1331593, dtype: object

In [55]:
DS_phone.shape

(1331593, 3)

In [56]:
DS_phone.head(5)

Unnamed: 0,author,product,score
0,CarolAnn35,Samsung Galaxy S8,10.0
1,james0923,Samsung Galaxy S8,10.0
2,R. Craig,Samsung Galaxy S8 (64GB),6.0
3,Buster2020,Samsung Galaxy S8 64GB (A,9.0
4,S Ate Mine,Samsung Galaxy S8,4.0


#### Limiting the datasamples to 5000 to prevent memory issues

In [57]:
DS_phone1 = DS_phone.sample(n=5000, random_state=5)

In [58]:
DS_phone1 = DS_phone1.reset_index()
DS_phone1 = DS_phone1.drop('index',axis= 1)
DS_phone1

Unnamed: 0,author,product,score
0,Karthik Shankar,Gionee Elife E3 (Blue),8.0
1,A. Fieber,Microsoft Nokia 5320 Xpre,4.0
2,Jeannot lapin,SAMSUNG - Smartphone Gala,10.0
3,Amazon Customer,"Lenovo Vibe K5 (Grey, VoL",2.0
4,DonkeyInSpace,Huawei Ascend Y200 Androi,2.0
...,...,...,...
4995,яицкая наташа,Samsung Galaxy Y GT-S5360,10.0
4996,Claudette Jackson,Samsung Galaxy S III T999,10.0
4997,Amazon Customer,Lenovo A7000 (Black),2.0
4998,Блаблабла БлаБлабла,Samsung Galaxy S II Plus,10.0


In [59]:
DS_phone1.shape

(5000, 3)

In [60]:
DS_phone1.head(5)

Unnamed: 0,author,product,score
0,Karthik Shankar,Gionee Elife E3 (Blue),8.0
1,A. Fieber,Microsoft Nokia 5320 Xpre,4.0
2,Jeannot lapin,SAMSUNG - Smartphone Gala,10.0
3,Amazon Customer,"Lenovo Vibe K5 (Grey, VoL",2.0
4,DonkeyInSpace,Huawei Ascend Y200 Androi,2.0


In [61]:
DS_phone1['product'].unique()

array(['Gionee Elife E3 (Blue)', 'Microsoft Nokia 5320 Xpre',
       'SAMSUNG - Smartphone Gala', ..., 'Samsung Galaxy S III T999',
       'Samsung Galaxy S II Plus ', 'Sony Xperia Z Ultra (Blac'],
      dtype=object)

In [62]:
DS_phone1['author'].unique()

array(['Karthik Shankar', 'A. Fieber', 'Jeannot lapin', ...,
       'Claudette Jackson', 'Блаблабла БлаБлабла', 'vikram'], dtype=object)

#### Since the columns are strings,there are duplicates and no missing values in Item or User. Hence, we cannot make a pivot or matrix for the Collaborative filtering. With Reader function from Surprise library, we can segregate the user from item for the collaborative filtering


In [63]:
reader = Reader(rating_scale=(1, 10))
d1 = Dataset.load_from_df(DS_phone1,reader = reader)

##### Trainset is constructed from the dataset 'DS_phone1'

In [64]:
train = d1.build_full_trainset()

#### User ratings in Train set

In [65]:
train.ur

defaultdict(list,
            {0: [(0, 8.0)],
             1: [(1, 4.0)],
             2: [(2, 10.0)],
             3: [(3, 2.0),
              (15, 10.0),
              (18, 8.0),
              (25, 2.0),
              (73, 8.0),
              (76, 8.0),
              (78, 10.0),
              (113, 8.0),
              (126, 10.0),
              (155, 8.0),
              (190, 2.0),
              (196, 2.0),
              (217, 8.0),
              (218, 2.0),
              (222, 10.0),
              (225, 10.0),
              (236, 10.0),
              (250, 8.0),
              (275, 2.0),
              (76, 10.0),
              (310, 10.0),
              (358, 2.0),
              (246, 2.0),
              (382, 2.0),
              (400, 2.0),
              (318, 8.0),
              (429, 10.0),
              (439, 8.0),
              (491, 8.0),
              (543, 2.0),
              (547, 10.0),
              (551, 4.0),
              (554, 4.0),
              (559, 2.0),
         

#### Item ratings in Train set

In [66]:
train.ir

defaultdict(list,
            {0: [(0, 8.0)],
             1: [(1, 4.0)],
             2: [(2, 10.0)],
             3: [(3, 2.0),
              (3, 4.0),
              (1151, 10.0),
              (3, 6.0),
              (3, 10.0),
              (3, 6.0),
              (3, 2.0),
              (3382, 8.0)],
             4: [(4, 2.0)],
             5: [(5, 9.0), (4144, 10.0), (4306, 10.0)],
             6: [(6, 10.0)],
             7: [(7, 10.0)],
             8: [(8, 6.0)],
             9: [(9, 10.0), (534, 8.0), (3875, 10.0)],
             10: [(10, 2.0)],
             11: [(11, 9.0),
              (385, 10.0),
              (871, 10.0),
              (1022, 10.0),
              (1735, 8.0),
              (2602, 10.0),
              (3424, 9.0)],
             12: [(12, 2.0)],
             13: [(13, 2.0), (999, 9.0)],
             14: [(14, 8.0)],
             15: [(3, 10.0)],
             16: [(15, 8.0)],
             17: [(16, 9.0), (84, 9.0), (1949, 9.0)],
             18: [(3, 8.0)],

#### Singular Value Decomposition (SVD) 

In [67]:
svd = SVD()
svd.fit(train)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x22baef14760>

##### All values that are not present in train dataset can be recalled using the 'build_anti_testset() function'

In [68]:
test = train.build_anti_testset()

#### Predicting the test dataset for test users

In [69]:
predict = svd.test(test)

##### Creating dataframe for the predicted test users

In [110]:
ds = pd.DataFrame(predict, columns=['uid', 'iid', 'rui', 'est', 'details'])
ds['uid'].unique()

array(['Karthik Shankar', 'A. Fieber', 'Jeannot lapin', ...,
       'Claudette Jackson', 'Блаблабла БлаБлабла', 'vikram'], dtype=object)

##### Finding the top 10 best and worst predictions from collaborative filtering with SVD

In [77]:
ds['err'] = abs(ds.est - ds.rui)
best_predictions = ds.sort_values(by='err')[:10]
worst_predictions = ds.sort_values(by='err')[-10:]

In [78]:
best_predictions

Unnamed: 0,uid,iid,rui,est,details,err
12352052,Nonie17,LG G2 Verizon VS980,7.988,7.988,{'was_impossible': False},7.640353e-09
8752045,vishwas,HTC One Smartphone con Di,7.988,7.988,{'was_impossible': False},9.596615e-08
4111099,pudding40,Micromax Canvas HD A116 (,7.988,7.988,{'was_impossible': False},1.16059e-07
12946566,L,Blu Win HD LTE (Grey),7.988,7.988,{'was_impossible': False},1.268625e-07
9181957,things4methings4you,Lenovo Moto G4 - Smartpho,7.988,7.988,{'was_impossible': False},1.843304e-07
2949628,Sunrise007,Samsung (936) Galaxy S Du,7.988,7.988,{'was_impossible': False},1.886783e-07
3444871,persona2010,MM-535,7.988,7.988,{'was_impossible': False},1.920643e-07
4567796,D00M5D4Y,"Nokia X3 Handy (Ovi, UKW",7.988,7.988,{'was_impossible': False},1.934282e-07
10430344,menkmax,Telit GM 830,7.988,7.988,{'was_impossible': False},2.204145e-07
13470378,bbb_forever,Motorola A1200,7.988,7.988,{'was_impossible': False},2.506544e-07


- The best prediction only has predicted value of 7.988 even though there are values with 9 and above. Is because, this is the most closest to the actual value.

In [79]:
worst_predictions

Unnamed: 0,uid,iid,rui,est,details,err
11680,Amazon Customer,ZTE Nubia Z5S Mini LTE NX,7.988,5.36748,{'was_impossible': False},2.62052
667984,Client d'Amazon,Microsoft Nokia 6131 blac,7.988,5.331256,{'was_impossible': False},2.656744
669640,Client d'Amazon,Nokia Lumia 521 T-Mobile,7.988,5.3282,{'was_impossible': False},2.6598
670243,Client d'Amazon,Celular Motorola EX117,7.988,5.316927,{'was_impossible': False},2.671073
669440,Client d'Amazon,BlackBerry Curve 8310 Unl,7.988,5.221385,{'was_impossible': False},2.766615
13329,Amazon Customer,Blackberry Q5 Smartphone,7.988,5.213661,{'was_impossible': False},2.774339
667854,Client d'Amazon,HTC Desire 620G (Santroni,7.988,5.207254,{'was_impossible': False},2.780746
668560,Client d'Amazon,HTC Desire 820 (Santorini,7.988,5.16794,{'was_impossible': False},2.82006
10560,Amazon Customer,HTC Desire 620G (Santroni,7.988,5.132614,{'was_impossible': False},2.855386
10419,Amazon Customer,Microsoft Nokia Lumia 630,7.988,4.130223,{'was_impossible': False},3.857777


- For the worst predictions, the predicted values are far worse than the actual values

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 4.2 collaborative filtering model using kNNWithMeans from surprise for Item based model

In [80]:
reader = Reader(rating_scale=(1, 10))
d2 = Dataset.load_from_df(DS_phone1,reader = reader)

In [81]:
train1, test1= train_test_split(d2, test_size= 0.3)

#### Setting Userbased as 'False' sets it to Item-Item Collaborative Filtering

In [82]:
svd = KNNWithMeans(k=10, sim_options={'name': 'pearson_baseline', 'user_based': False})
svd.fit(train1)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x22bea2c7e20>

In [84]:
test_pred = svd.test(test1)

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 4.3 Collaborative filtering model using kNNWithMeans from surprise for User based model

In [88]:
reader = Reader(rating_scale=(1, 10))
d3 = Dataset.load_from_df(DS_phone1,reader = reader)

In [89]:
train2, test2= train_test_split(d3, test_size= 0.3)

#### Setting Userbased as 'True' sets it to User-User Collaborative Filtering

In [90]:
svd = KNNWithMeans(k=10, sim_options={'name': 'pearson_baseline', 'user_based': True})
svd.fit(train1)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x22b9ae57220>

##### Now add input for UserID and ItemID and then use it for user based recommendations

In [92]:
uid = 'Andi'
iid = 'Sony Xperia Z1 Compact D5'

In [93]:
pred = svd.predict(uid, iid, verbose=True)

user: Andi       item: Sony Xperia Z1 Compact D5 r_ui = None   est = 8.02   {'was_impossible': True, 'reason': 'User and/or item is unknown.'}


In [99]:
test_pred1 = svd.test(test2)

- From the above prediction, we can understand that the users with similar interests were outputted. This helps to recommend the user of interest 

<font size="5"><p style="color:black"> <b>5. Evaluate the collaborative model. Print RMSE value.</p></font>


<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 5.1 RMSE of Collaborative filtering model with SVD

In [103]:
accuracy.rmse(predict, verbose=True)

RMSE: 0.3476


0.3476316448039195

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 5.2 RMSE of Collaborative filtering model using kNNWithMeans from surprise for Item based model

In [97]:
accuracy.rmse(test_pred, verbose=True)

RMSE: 2.7128


2.7127828532055154

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 5.3 RMSE of Collaborative filtering model using kNNWithMeans from surprise for User based model

In [98]:
accuracy.rmse(test_pred1, verbose=True)

RMSE: 1.5364


1.5364215563456562

<font size="5"><p style="color:black"> <b>6. Predict score (average rating) for test users.</p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 6.1 Average Predicted Score for Collaborative filtering model with SVD for Users

In [70]:
predict

[Prediction(uid='Karthik Shankar', iid='Microsoft Nokia 5320 Xpre', r_ui=7.988, est=7.769965916212418, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='SAMSUNG - Smartphone Gala', r_ui=7.988, est=8.064366879556255, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='Lenovo Vibe K5 (Grey, VoL', r_ui=7.988, est=7.846300847358594, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='Huawei Ascend Y200 Androi', r_ui=7.988, est=7.279919520784936, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='Samsung N7100 Galaxy Note', r_ui=7.988, est=8.401332788988668, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='Lenovo Moto G4 Play - Sma', r_ui=7.988, est=8.075767452490656, details={'was_impossible': False}),
 Prediction(uid='Karthik Shankar', iid='Apple iPhone SE 64 Go 4" ', r_ui=7.988, est=8.22598322293534, details={'was_impossible': False}),
 Prediction(uid='Karthik Sha

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 6.1 Average Predicted Score for Collaborative filtering model using kNNWithMeans from surprise for Item based model

In [100]:
test_pred

[Prediction(uid='Cliente de Amazon', iid='Elephone P8000 - Smartpho', r_ui=6.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Wesley0264 ', iid='Samsung Galaxy S7 edge 32', r_ui=10.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Andi', iid='Sony Xperia Z1 Compact D5', r_ui=6.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='SIO_SET', iid='Motorola i776', r_ui=10.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Amazon Customer', iid='Micromax YU YU Yureka Plu', r_ui=8.0, est=7.76113484126683, details={'actual_k': 10, 'was_impossible': False}),
 Prediction(uid='Mrs. S. J. Parkes', iid='HTC 10 Sim Free Smartphon', r_ui=10.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Buecherjule', iid='Nokia 5200

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 6.2 Average Predicted Score for Collaborative filtering model using kNNWithMeans from surprise for User based model

In [101]:
test_pred1

[Prediction(uid='Anonymous ', iid='Samsung Galaxy S7 32GB (S', r_ui=10.0, est=10, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='krise20', iid='Samsung GT S8300 Ultra To', r_ui=8.0, est=8.0, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='lenusia9829', iid='Sony Ericsson W910i', r_ui=6.0, est=8.024, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Zákazník', iid='ASUS ZenFone 5 A500KL 16G', r_ui=10.0, est=10, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='Maico80', iid='Samsung Galaxy S6 zwart /', r_ui=9.0, est=9.0, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='Vinayak K.', iid='Micromax Canvas A1 AQ4502', r_ui=10.0, est=10, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='Wogenwolf', iid='Microsoft Nokia Lumia 800', r_ui=10.0, est=10, details={'actual_k': 1, 'was_impossible': False}),
 Prediction(uid='ilmir-n', iid='Sony Xperia P', r_ui=6.

<font size="5"><p style="color:black"> <b>7. Report your findings and inferences. </p></font>

<font size="3"> From the Three different recommendation systems :

> 1 - Collaborative Filtering with SVD
    
> 2 - Collaborative Filtering using kNNWithMeans from surprise for Item based model
    
> 3 - Collaborative Filtering using kNNWithMeans from surprise for User based model

The above recommendation systems have predicted Users/Items respectively. The following are the observations and inferences:
    
>1 - From Collaborative Filtering with SVD : The respective ratings were predicted for the following test users: 'Karthik Shankar', 'A. Fieber', 'Jeannot lapin','Claudette Jackson', 'Блаблабла БлаБлабла','vikram. The top 3 best predictions were for: 'Nonie17','vishwas','pudding40' with the closest to original rating. The RMSE was found to be 0.3476. The best predicted score is 7.988.
 
>2 - From Collaborative Filtering using kNNWithMeans from surprise for Item based model : The RMSE was found to be 2.7128. The average predicted score is 8.024.
    
>3 - From Collaborative Filtering using kNNWithMeans from surprise for User based model : The RMSE was found to be 1.5364. The sample user is 'Andi' and item is 'Sony Xperia Z1 Compact D5'. The 'test_pred1' shows the list of recommended items/phones.
    
</font>


<font size="5"><p style="color:black"> <b>8. Try and recommend top 5 products for test users. </p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 8.1 Using loop to retreive the item and the top 5 predicted rating by the user

In [71]:
def get_pred(predictions, n =5):
    pred_n = defaultdict(list)
    for uid, iid,true_r, est, _ in predictions:
        pred_n[uid].append((iid,est))
        
    for uid, user_ratings in pred_n.items():
        user_ratings.sort(key=lambda x: x[1],reverse=True)
        pred_n[uid] = user_ratings[:n]

    return pred_n

In [72]:
pred_n = get_pred(predict, n = 5)

In [73]:
pred_n

defaultdict(list,
            {'Karthik Shankar': [('Samsung Galaxy S7 edge 32',
               9.555474617756078),
              ('Nokia 5800', 9.145265641192088),
              ('Smartphone Samsung Galaxy', 9.006670338864144),
              ('Motorola Smartphone Motor', 8.931689659018149),
              ('Samsung Smartphone Samsun', 8.929392201823577)],
             'A. Fieber': [('Samsung Galaxy S7 edge 32', 9.126426309325302),
              ('Nokia 5800', 8.679693063971987),
              ('Asus Zenfone 2 Laser ZE55', 8.58754118945116),
              ('Samsung Smartphone Samsun', 8.507790903404452),
              ('Honor 7 Smartphone débloq', 8.505328041969436)],
             'Jeannot lapin': [('Samsung Galaxy S7 edge 32',
               9.500340115506226),
              ('Motorola Smartphone Motor', 9.16779287985926),
              ('Samsung Galaxy S6 Edge G9', 9.118048389168393),
              ('Nokia 5800', 9.095094794992274),
              ('Asus Zenfone 2 Laser ZE55', 9.092092

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 8.2 Using loop to retrieve the recommended item for each users based on the ratings given for the items

In [74]:
# Print the recommended items for each user
for uid, user_ratings in pred_n.items():
    print(uid,[iid for (iid,_)in user_ratings])

Karthik Shankar ['Samsung Galaxy S7 edge 32', 'Nokia 5800', 'Smartphone Samsung Galaxy', 'Motorola Smartphone Motor', 'Samsung Smartphone Samsun']
A. Fieber ['Samsung Galaxy S7 edge 32', 'Nokia 5800', 'Asus Zenfone 2 Laser ZE55', 'Samsung Smartphone Samsun', 'Honor 7 Smartphone débloq']
Jeannot lapin ['Samsung Galaxy S7 edge 32', 'Motorola Smartphone Motor', 'Samsung Galaxy S6 Edge G9', 'Nokia 5800', 'Asus Zenfone 2 Laser ZE55']
Amazon Customer ['Nokia Asha 311', 'Samsung Galaxy Ace S5830 ', 'Samsung Galaxy A3 Smartph', 'Samsung SGH G800 Gris tit', 'SONY ERICSSON Vivaz']
DonkeyInSpace ['Samsung Galaxy S7 edge 32', 'Smartphone Samsung Galaxy', 'Samsung B5722', 'Nokia 5800', 'Samsung Galaxy S6 Edge G9']
Василий ['Samsung Galaxy S7 edge 32', 'OnePlus 3 (Graphite, 64 G', 'Nokia 5800', 'Motorola Smartphone Motor', 'Samsung Galaxy S5 16GB (V']
gabriel grado ['Samsung Galaxy S7 edge 32', 'Motorola Smartphone Motor', 'Asus Zenfone 2 Laser ZE55', 'Smartphone Samsung Galaxy', 'Nokia 5800']
Olivi

<font size="5"><p style="color:black"> <b>9. Try cross validation techniques to get better results </p></font>

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 9.1 Cross Validation for Collaborative filtering model with SVD

In [104]:
cross_validate(svd,d1, measures=['RMSE'], cv=3, verbose=False)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.65474502, 2.60085878, 2.66267588]),
 'fit_time': (0.20722627639770508, 0.21970844268798828, 0.20869135856628418),
 'test_time': (0.02037978172302246,
  0.010971546173095703,
  0.010972261428833008)}

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 9.2 Cross Validation for Collaborative filtering model using kNNWithMeans from surprise for Item based model

In [105]:
cross_validate(svd,d2, measures=['RMSE'], cv=3, verbose=False)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.60420083, 2.55589334, 2.69831208]),
 'fit_time': (0.19919228553771973, 0.24020910263061523, 0.2594943046569824),
 'test_time': (0.01562190055847168, 0.01097249984741211, 0.014010190963745117)}

<span style="font-family: Arial; font-weight:bold;font-size:1.6em;color:#0000FF;"> 9.3 Cross Validation for Collaborative filtering model using kNNWithMeans from surprise for User based model

In [106]:
cross_validate(svd,d3, measures=['RMSE'], cv=3, verbose=False)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.61173392, 2.65177687, 2.6430079 ]),
 'fit_time': (0.27725672721862793, 0.25481319427490234, 0.24534273147583008),
 'test_time': (0.014957666397094727,
  0.011972904205322266,
  0.014960527420043945)}

<font size="5"><p style="color:black"> <b>10. In what business scenario you should use popularity based Recommendation Systems </p></font>

 <font size="3">
    
- Recommender systems are the systems that are designed to recommend things to the user based on many different factors. These systems predict the most likely product that the users are most likely to purchase and are of interest to. 
    
- Popularity based recommendation system relies on the popularity,trends and frequency counts of which items were most purchased.  This is the most basic recommendation system which provides generalized recommendation to every user depending on the popularity. Whatever is more popular among the general public that is more likely to be recommended to new customers. The generalized recommendation not personalized is based on count. 

- It is used by Advertising companies for recommending popular items or activities. Popularity based recommendation system is widely used by news websites to show Top Stories with images based on the frequency of viewed. Most media platforms like youtube, twitter, instagram have the feature to show the most popular/trending posts/items. </font>

<font size="5"><p style="color:black"> <b>11. In what business scenario you should use CF based Recommendation Systems ? </p></font>

<font size="3">
    
- Collaborative filtering is used in different fields of application but it is particularly popular in recommender systems, where it aims at providing personalized suggestions to users by predicting their preference towards items or services that have not been considered before.
    
- Collaborative filtering assumes that in order to predict the rating of an item by a user, it is useful to exploit information about other users who have expressed preferences for items similar to this one. Its underlying assumption is that preferences are shared among people with similar interests or tastes in movies, music, etc. Collaborative filtering techniques fall into two major categories: memory-based and model-based.
    
- Most websites like Amazon, YouTube, and Netflix use collaborative filtering along with their hybrid recommendation system.</font>

<font size="5"><p style="color:black"> <b>12. What other possible methods can you think of which can further improve the recommendation for different users ? </p></font>

<font size="3">
    
- Comparing products rather than rating them on an absolute scale, will lead to algorithms that better predict customers preferences. One of the problems with basing recommendations on ratings is that an individual’s rating scale will tend to fluctuate based on mood, mindset, etc. 
    
- The extreme rating given by the user should be disregarded when providing recommendation to reduce the amount of items from recommendation to give the algorithm only preferrable items to work with.
    
- Hybrid recommendation systems are also very useful in improving recommendation systems, as it can be compiled with multiple recommendation systems to increase the functionality, and the performance at recommending to different users. </font>