### DOMAIN: 
Smartphone, Electronics
### CONTEXT: 
India is the second-largest smartphone market in the world after China. About 134 million smartphones were sold across India in 2017, and sales are estimated to increase to about 442 million in 2022. India ranked second in Asia Pacific for average time spent on the mobile web by smartphone users. The combination of very high sales volumes and typical smartphone consumer behaviour has made India a very attractive market for foreign vendors. In terms of consumer behaviour, 97% of consumers turn to a search engine when buying a product, versus 15% who turn to social media. If a seller succeeds in showing smartphones that match a user's behaviour or choice at the right place, there is a 90% chance that the user will enquire about them. This case study aims to build a recommendation system based on an individual consumer's behaviour or choice.
### DATA DESCRIPTION: 
* author : name of the person who gave the rating
* country : country the person who gave the rating belongs to
* date : date of the rating
* domain: website from which the rating was taken
* extract: rating content
* language: language in which the rating was given
* product: name of the product/mobile phone for which the rating was given
* score: average rating for the phone
* score_max: highest rating given for the phone
* source: source from where the rating was taken

### PROJECT OBJECTIVE:
We will build a recommendation system using popularity based and collaborative filtering methods to recommend mobile phones to a user which are most popular and personalised respectively.

### 1. Import the necessary libraries and read the provided CSVs as a data frame.

In [1]:
# import Libraries
import numpy as np
import pandas as pd

# Library to impute missing values
from sklearn.impute import SimpleImputer

# import the Surprise library for recommendation systems.
from surprise.model_selection import train_test_split
from surprise import SVD,KNNWithMeans,accuracy,Reader,Dataset
from surprise.model_selection import cross_validate

In [2]:
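# Read all the datasets.
# Note: encoding="latin" decodes every byte without raising an error, but UTF-8 text
# (e.g. Cyrillic author names) then shows up as mojibake in the outputs below.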
review_file1 = pd.read_csv('phone_user_review_file_1.csv',encoding="latin")
review_file2 = pd.read_csv('phone_user_review_file_2.csv',encoding="latin")
review_file3 = pd.read_csv('phone_user_review_file_3.csv',encoding="latin")
review_file4 = pd.read_csv('phone_user_review_file_4.csv',encoding="latin")
review_file5 = pd.read_csv('phone_user_review_file_5.csv',encoding="latin")
review_file6 = pd.read_csv('phone_user_review_file_6.csv',encoding="latin")

In [3]:
# Combine all datasets into 1.
files = [review_file1,review_file2,review_file3,review_file4,review_file5,review_file6]
phone_data = pd.concat(files)
phone_data.reset_index(drop=True,inplace=True)

In [4]:
# Look at the first rows of the data.
phone_data.head()

# Observation: the data has 11 columns. The extract, score_max, phone_url and date columns are not useful for this analysis and can be dropped.

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.2,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8


In [5]:
# check for shape of the data
phone_data.shape

(1415133, 11)

In [6]:
# check for columns in the data.
phone_data.columns

Index(['phone_url', 'date', 'lang', 'country', 'source', 'domain', 'score',
       'score_max', 'extract', 'author', 'product'],
      dtype='object')

In [7]:
# check for data types in the data.
phone_data.dtypes

# Observation : Score,Score_max are Float type columns.

phone_url     object
date          object
lang          object
country       object
source        object
domain        object
score        float64
score_max    float64
extract       object
author        object
product       object
dtype: object

In [8]:
# round the scores to Nearest Integers.

phone_data['score'] = np.round(phone_data['score'])

In [9]:
# check for null values in the data
phone_data.isna().sum()

# Observation: the score, score_max, extract and author columns have missing values (and product has one).

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
dtype: int64

In [10]:
# Fill the null values in author with 'unknown' and in score with the integer part of the mean score.
phone_data['author'] = phone_data['author'].fillna('unknown')
phone_data['score'] = phone_data['score'].fillna(int(phone_data['score'].mean()))

In [11]:
# Check for duplicate rows in the data.
duplicate = phone_data[phone_data.duplicated()]
  
print("shape of Duplicate Rows :\n")
duplicate.shape

# Observation: 6436 rows are duplicates.

shape of Duplicate Rows :



(6436, 11)

In [12]:
# Drop duplicate rows using the drop_duplicates function.

phone_data.drop_duplicates(keep='first',inplace=True)

In [13]:
# check the shape of data after drop duplicate rows.
phone_data.shape

(1408697, 11)

In [14]:
# Keep only 1000000 data samples.
phone_df = phone_data.sample(n=1000000,random_state=612)

In [15]:
# drop the irrelevant features.
phone_df = phone_df.drop(phone_df.columns.difference(['author', 'product','score']),axis =1)
phone_df.dropna(inplace=True)

### 2. Answer the following questions

#### > Identify the most rated features

In [16]:
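# Most-rated products: the commented-out value_counts() call counts ratings per product.
#phone_df['product'].value_counts().head(10)
# The line that runs here shows the first 10 rows with the maximum score of 10.
phone_df[phone_df['score']==10].head(10)
# Observation noted by the author: Lenovo Vibe K4 Note (White) was the most-rated product.
# The output shown below lists top-scored rows and does not by itself show this.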

Unnamed: 0,score,author,product
1231940,10.0,egornesterenko2009,Nokia 6233
1150123,10.0,unknown,Samsung C5212
1041704,10.0,chads3010,HTC Nexus One - Black Smartphone
1171786,10.0,unknown,Nokia E52
1278230,10.0,viktoria-o,Nokia 6131
124134,10.0,Alessio T.,"Lenovo Motorola Moto X Play Smartphone, 5.5"", ..."
1140490,10.0,amigo-vulnerable,Nokia 5800 XpressMusic
801649,10.0,Merendeiro,"Samsung Galaxy S III Smartphone, Bianco [Italia]"
981008,10.0,SCORPIONNE666,Nokia Asha 306 Smartphone GPRS 2G Bluetooth Wi...
1244089,10.0,Anonymous,Samsung Rant


#### > Identify the users with most number of reviews

In [17]:
# users with Most no of reviews
phone_df['author'].value_counts().head(10)

# Observation: the generic author name 'Amazon Customer' has the most reviews, followed by 'unknown' (the filled-in missing authors).

Amazon Customer    54542
unknown            45014
Cliente Amazon     13661
e-bit               5959
Client d'Amazon     5495
Amazon Kunde        3283
Anonymous           1970
einer Kundin        1890
einem Kunden        1350
Anonymous           1014
Name: author, dtype: int64

#### > Select the data with products having more than 50 ratings and users who have given more than 50 ratings. 

In [18]:
# Define a function that returns the values of columns x and y that occur more than 50 times.
def val_counts(x,y):
    temp = phone_df[x].value_counts()
    users = temp[temp>50].index.tolist()

    temp = phone_df[y].value_counts()
    phones = temp[temp>50].index.tolist()
    return users,phones

In [19]:
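# Keep only the phones and users that have more than 50 ratings.
# Repeat until the lists stop changing, because each filter pass can push other users or phones below the threshold.
while True: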
    users1 ,phones1 = val_counts('author','product')

    phone_df = phone_df.loc[phone_df['product'].isin(phones1) & phone_df['author'].isin(users1)]
    
    users2 ,phones2 = val_counts('author','product')
    
    if (users1==users2) and (phones1==phones2):
        break
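
The loop above repeats the filter because removing rarely rated products can push a user back under the 50-rating threshold, and vice versa. A minimal library-free sketch of the same idea follows. It assumes ratings are plain `(user, product)` pairs, and `filter_min_counts` is a hypothetical helper, shown with a small threshold for illustration:

```python
from collections import Counter

def filter_min_counts(pairs, min_count):
    """Repeatedly drop (user, product) pairs until every remaining user and
    product appears more than `min_count` times (hypothetical helper)."""
    pairs = list(pairs)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        product_counts = Counter(product for _, product in pairs)
        kept = [
            (user, product)
            for user, product in pairs
            if user_counts[user] > min_count and product_counts[product] > min_count
        ]
        if len(kept) == len(pairs):  # nothing was removed, so the filter has converged
            return kept
        pairs = kept

# Toy data: user "c" rates only one product and product "z" is rated only once.
toy = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("a", "z")]
filtered = filter_min_counts(toy, min_count=1)
print(filtered)
```

Stopping when a pass removes nothing is equivalent to the notebook's check that the user and phone lists no longer change.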

In [20]:
# Look at the top rows after keeping only users and products with more than 50 ratings.
phone_df.head()

Unnamed: 0,score,author,product
1150123,10.0,unknown,Samsung C5212
1171786,10.0,unknown,Nokia E52
766631,2.0,Amazon Customer,"Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB)"
119096,10.0,Amazon Customer,"OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)"
511423,8.0,Amazon Customer,"Lenovo Vibe K5 (Gold, VoLTE update)"


In [21]:
# lets see the shape of the data
phone_df.shape

# Observation: 57436 rows remain after dropping rarely rated products and infrequent raters.

(57436, 3)

In [22]:
phone_df['score']=phone_df['score'].astype('int64')

In [23]:
# Count the unique products and authors that remain.

print('No of products having more than 50 ratings : ',len(phone_df['product'].unique()))
print('No of users who gave more than 50 ratings : ',len(phone_df['author'].unique()))

No of products having more than 50 ratings :  416
No of users who gave more than 50 ratings :  56


In [24]:
# Check the score distribution to confirm that no score falls outside the 1-10 range; drop any rows that do.
phone_df['score'].value_counts()

10    27160
8     13116
2      7226
6      5264
4      3170
9      1113
7       219
5        95
1        37
3        36
Name: score, dtype: int64

### 3. Popularity Based Model

In [25]:
# Build a popularity based model and recommend top 5 mobile phones.
phone_df.groupby('product')['score'].mean().sort_values(ascending=False).head()

product
Apple iPhone 6s 4,7" 128 GB             9.796178
Apple iPhone 7 4,7" 32 GB               9.776358
Samsung Samsung Galaxy A5 2016 - Wit    9.754386
Smartphone Asus ZenFone 3 ZE552KL       9.731343
Apple iPhone 6s Plus 5,5" 128 GB        9.722222
Name: score, dtype: float64
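
The popularity model above is a per-product mean score. A library-free sketch of the same ranking follows, assuming ratings arrive as `(product, score)` pairs. The `top_popular` helper and its `min_ratings` guard are assumptions added for illustration (the notebook has already filtered to products with more than 50 ratings); the guard keeps a product with a single perfect score from topping the list:

```python
from collections import defaultdict

def top_popular(ratings, n=5, min_ratings=1):
    """Rank products by mean score, ignoring products with too few ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, score in ratings:
        totals[product] += score
        counts[product] += 1
    means = {
        product: totals[product] / counts[product]
        for product in totals
        if counts[product] >= min_ratings
    }
    # Highest mean score first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]

# Toy data: "Phone B" has one perfect rating and is excluded by min_ratings=2.
toy_ratings = [
    ("Phone A", 10), ("Phone A", 8),
    ("Phone B", 10),
    ("Phone C", 6), ("Phone C", 8),
]
ranking = top_popular(toy_ratings, n=2, min_ratings=2)
print(ranking)
```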

### 4. Collaborative Filtering

#### SVD (Singular Value Decomposition)

In [26]:
# read the dataset and split into train and testset
read = Reader(rating_scale=(1, 10))
phone_df_final = Dataset.load_from_df(phone_df[['author', 'product', 'score']],read)
trainset, testset = train_test_split(phone_df_final, test_size=.25,random_state=42)

In [27]:
# Show the raw user id and raw item id that map to inner id 0 in the trainset.
print(trainset.to_raw_uid(0))
print(trainset.to_raw_iid(0))

Amazon Customer
YU Yuphoria YU5010A (Black+Silver)


In [28]:
# build svd model and predict the values
svd = SVD(n_factors=50,biased=False)
svd.fit(trainset)
svd_predict = svd.test(testset)

# compute RMSE
accuracy.rmse(svd_predict)

RMSE: 2.6172


2.617171650498187
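
`accuracy.rmse` reports the root mean squared error between the actual rating `r_ui` and the estimate `est`. A small sketch of that computation on made-up pairs follows; the helper name `rmse` and the numbers are illustrative and are not taken from the model:

```python
import math

def rmse(pairs):
    """Root mean squared error over (actual, estimated) rating pairs."""
    pairs = list(pairs)
    squared_errors = [(actual - estimated) ** 2 for actual, estimated in pairs]
    return math.sqrt(sum(squared_errors) / len(pairs))

# Errors are 1, 2 and 4 points, so the squared errors are 1, 4 and 16.
toy_pairs = [(8.0, 9.0), (10.0, 8.0), (2.0, 6.0)]
toy_rmse = rmse(toy_pairs)
print(round(toy_rmse, 4))
```

Because the errors are squared before averaging, a few large misses (such as predicting 6 for an actual 2) dominate the result.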

In [29]:
svd_predict

[Prediction(uid='unknown', iid='Nokia 5228', r_ui=8.0, est=9.423667733507845, details={'was_impossible': False}),
 Prediction(uid='unknown', iid='HTC Desire HD', r_ui=8.0, est=8.825912083523075, details={'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Lenovo Vibe K4 Note (Black, 16GB)', r_ui=2.0, est=5.759761809182034, details={'was_impossible': False}),
 Prediction(uid='unknown', iid='Sony Ericsson Vivaz', r_ui=8.0, est=6.701316500663942, details={'was_impossible': False}),
 Prediction(uid='einer Kundin', iid='Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', r_ui=10.0, est=9.593627807217132, details={'was_impossible': False}),
 Prediction(uid='Ð®Ñ\x80Ð¸Ð¹', iid='Sony Xperia C (Ð±ÐµÐ»Ñ\x8bÐ¹)', r_ui=9.0, est=9.095623742832583, details={'was_impossible': False}),
 Prediction(uid='unknown', iid='Samsung Star GT-S5230', r_ui=10.0, est=8.75267551246204, details={'was_impossible': False}),
 Prediction(uid='e-bit', iid='Smartphone Asus ZenFone 3 ZE

#### KNNWithMeans

In [30]:
knn_model = KNNWithMeans()
knn_model.fit(trainset)
knn_predict = knn_model.test(testset)

# compute RMSE
accuracy.rmse(knn_predict)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 2.5666


2.5666388079063256

In [31]:
knn_predict

[Prediction(uid='unknown', iid='Nokia 5228', r_ui=8.0, est=8.3, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='unknown', iid='HTC Desire HD', r_ui=8.0, est=8.7, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Lenovo Vibe K4 Note (Black, 16GB)', r_ui=2.0, est=6.95, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Sony Ericsson Vivaz', r_ui=8.0, est=6.725, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='einer Kundin', iid='Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', r_ui=10.0, est=9.85, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='Ð®Ñ\x80Ð¸Ð¹', iid='Sony Xperia C (Ð±ÐµÐ»Ñ\x8bÐ¹)', r_ui=9.0, est=8.833582910712595, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Samsung Star GT-S5230', r_ui=10.0, est=7.3, details={'actual_k': 40, 'was_impossible': False}),
 Prediction(uid='e-bit', 

#### User-User Based

In [32]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering
user_algo = KNNWithMeans(k=50, sim_options={'name': 'pearson', 'user_based': True})
user_algo.fit(trainset)
user_predict = user_algo.test(testset)

# compute RMSE
accuracy.rmse(user_predict)

Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 2.5640


2.564018494863441
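
KNNWithMeans predicts a rating as the user's mean plus a similarity-weighted average of how far each neighbour's rating sits from that neighbour's own mean. A simplified user-based sketch follows. It assumes ratings are stored as nested dicts and leaves out details that Surprise handles (the `k` limit, `min_k`, clipping to the rating scale); `pearson` and `predict_with_means` are hypothetical helper names:

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings_a[item] for item in common)
    mean_b = mean(ratings_b[item] for item in common)
    numerator = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    spread_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    spread_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    denominator = spread_a * spread_b
    return numerator / denominator if denominator else 0.0

def predict_with_means(ratings, user, item):
    """Mean-centred prediction from positively correlated users who rated `item`."""
    user_mean = mean(ratings[user].values())
    weighted_sum = similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = pearson(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # only positively correlated neighbours contribute
        weighted_sum += similarity * (other_ratings[item] - mean(other_ratings.values()))
        similarity_sum += similarity
    if similarity_sum == 0:
        return user_mean  # no usable neighbour: fall back to the user's mean
    return user_mean + weighted_sum / similarity_sum

toy = {
    "u1": {"A": 8, "B": 6, "C": 10},
    "u2": {"A": 6, "B": 4, "C": 8, "D": 8},
    "u3": {"A": 10, "B": 10, "D": 2},
}
prediction = predict_with_means(toy, "u1", "D")
print(round(prediction, 2))
```

In the toy data, `u2` rates everything two points lower than `u1` but in the same pattern, so the correlation is 1 and `u2`'s above-average rating for `D` lifts the prediction above `u1`'s mean of 8. Setting `user_based` to `False` applies the same idea to items instead of users.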

In [33]:
user_predict

[Prediction(uid='unknown', iid='Nokia 5228', r_ui=8.0, est=8.24, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='unknown', iid='HTC Desire HD', r_ui=8.0, est=8.72, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Lenovo Vibe K4 Note (Black, 16GB)', r_ui=2.0, est=6.96, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Sony Ericsson Vivaz', r_ui=8.0, est=6.379999999999999, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='einer Kundin', iid='Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', r_ui=10.0, est=9.88, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='Ð®Ñ\x80Ð¸Ð¹', iid='Sony Xperia C (Ð±ÐµÐ»Ñ\x8bÐ¹)', r_ui=9.0, est=8.814742591370573, details={'actual_k': 38, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Samsung Star GT-S5230', r_ui=10.0, est=7.2, details={'actual_k': 50, 'was_impossible': False}),
 Prediction

#### Item-Item Based

In [34]:
item_algo = KNNWithMeans(k=50,sim_options={'name': 'pearson', 'user_based': False})
item_algo.fit(trainset)
product_predict = item_algo.test(testset)

# compute RMSE
accuracy.rmse(product_predict)

Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 2.5687


2.5687006112642052

In [35]:
product_predict

[Prediction(uid='unknown', iid='Nokia 5228', r_ui=8.0, est=8.24, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='unknown', iid='HTC Desire HD', r_ui=8.0, est=8.72, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='Amazon Customer', iid='Lenovo Vibe K4 Note (Black, 16GB)', r_ui=2.0, est=6.96, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Sony Ericsson Vivaz', r_ui=8.0, est=6.38, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='einer Kundin', iid='Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', r_ui=10.0, est=9.88, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='Ð®Ñ\x80Ð¸Ð¹', iid='Sony Xperia C (Ð±ÐµÐ»Ñ\x8bÐ¹)', r_ui=9.0, est=8.728682918985578, details={'actual_k': 17, 'was_impossible': False}),
 Prediction(uid='unknown', iid='Samsung Star GT-S5230', r_ui=10.0, est=7.2, details={'actual_k': 50, 'was_impossible': False}),
 Prediction(uid='e-bit',

###### Collaborative filtering with KNNWithMeans gives a lower RMSE than SVD on the phone data (2.56 to 2.57 versus 2.62), so let's use the KNNWithMeans predictions to recommend products to users.

In [36]:
knn_data = pd.DataFrame(knn_predict,columns=['Users','Phones','Score','est','details'])
knn_data.head()

Unnamed: 0,Users,Phones,Score,est,details
0,unknown,Nokia 5228,8.0,8.3,"{'actual_k': 40, 'was_impossible': False}"
1,unknown,HTC Desire HD,8.0,8.7,"{'actual_k': 40, 'was_impossible': False}"
2,Amazon Customer,"Lenovo Vibe K4 Note (Black, 16GB)",2.0,6.95,"{'actual_k': 40, 'was_impossible': False}"
3,unknown,Sony Ericsson Vivaz,8.0,6.725,"{'actual_k': 40, 'was_impossible': False}"
4,einer Kundin,"Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,...",10.0,9.85,"{'actual_k': 40, 'was_impossible': False}"


In [37]:
# Average actual rating per test user (the Score column holds r_ui, not the estimate).
knn_data.groupby('Users')['Score'].mean().sort_values(ascending=False)

Users
Fabio                 9.882353
Anonymous             9.750000
ÐÐ°ÑÑÑ            9.642857
einem Kunden          9.467085
ÐÐ¸ÐºÐ¾Ð»Ð°Ð¹        9.437500
einer Kundin          9.354067
Luca                  9.333333
Francesco             9.285714
Stefano               9.272727
Ð¡Ð²ÐµÑÐ»Ð°Ð½Ð°      9.235294
#                     9.200000
ÐÐ°ÑÐ¸Ð½Ð°          9.190476
ÐÐ°ÑÐ¸Ñ            9.111111
ÐÐ¸ÑÐ¸Ð»Ð»          9.052632
Ð¡ÐµÑÐ³ÐµÐ¹          9.048387
ÐÐ¸ÑÐ°Ð¸Ð»          9.045455
ÐÐ»ÐµÐ½Ð°            9.040000
Andrea                9.037037
ÐÐ²Ð³ÐµÐ½Ð¸Ð¹        9.034483
ÐÐ°ÑÐ°Ð»ÑÑ        9.000000
ÐÐ³Ð¾ÑÑ            9.000000
ÐÐ½Ð°ÑÑÐ°ÑÐ¸Ñ    9.000000
ÐÐºÐ°ÑÐµÑÐ¸Ð½Ð°    9.000000
e-bit                 8.929134
Marco                 8.916667
Alex                  8.850000
ÐÐ¼Ð¸ÑÑÐ¸Ð¹        8.848485
Alessandro            8.823529
ÐÐ»Ð°Ð´Ð¸Ð¼Ð¸Ñ      8.821429
ÐÐ°ÐºÑÐ¸Ð¼          8.777778
Ð¢Ð°ÑÑÑÐ½Ð°        8.750000
ÐÐ»ÐµÐºÑÐ°Ð½Ð´Ñ    8.743243
Ð

### 5. Findings and Inferences

* When a new user wants to buy a phone, I would recommend the top 5 highest-rated phones:

    1. Apple iPhone 6s 4,7" 128 GB
    2. Apple iPhone 7 4,7" 32 GB
    3. Samsung Samsung Galaxy A5 2016 - Wit
    4. Smartphone Asus ZenFone 3 ZE552KL
    5. Apple iPhone 6s Plus 5,5" 128 GB
    
    
* The KNNWithMeans model gives the lowest RMSE, so I take it as the best model for recommending smartphones. In the test set, the user named Fabio gave high ratings to almost everything he rated (average 9.88).
* The data has missing values, and most users have not rated most products, which may affect the recommendations.
* About half of the users in the test set have an average rating of around 9.
* Most users gave ratings of 10 or 8.
* The two phone models rated by the most people are the Lenovo Vibe and the OnePlus.
* Most of the ratings come from Amazon users.

### 6. Let's recommend the top 5 products for the test users.

In [38]:
# Define a function that recommends the top 5 phones for a user,
# ranked by the KNNWithMeans estimated rating (est) on the test set.
def Phone_recommend(original_title):
    # All test-set predictions for this user (a boolean mask also works for a single row).
    user_rows = knn_data[knn_data['Users'] == original_title]
    # Highest estimated rating first; mergesort is stable, so ties keep their original order.
    ranked = user_rows.sort_values('est', ascending=False, kind='mergesort')
    # Keep the first 5 distinct phone names.
    pred_values = ranked['Phones'].astype(str).unique()[:5]
    print('For User : ',original_title)
    for i in range(len(pred_values)):
        print(i+1,':',pred_values[i])
    print('===============================================================================================================')

In [39]:
# Lets Recommend top 5 phones for testset users.
test_users = knn_data['Users'].unique()
for i in test_users:
    Phone_recommend(i)

For User :  unknown
1 : Samsung Samsung Galaxy A5 2016 - Wit
2 : Sony Ericsson K750i
3 : Sony Ericsson W810i
4 : Sony Ericsson K800i
5 : Huawei Ascend G 510 schwarz [11,43cm (4,5") LCD, Android 4.1, 1.2 GHz Dual Core]
For User :  Amazon Customer
1 : OnePlus 3 (Graphite, 64 GB)
2 : Motorola Moto X Pure Edition Unlocked Smartphone, 64 GB Black XT1575, 5.7" Quad HD display, 21 MP Camera, Quad-core 1.8GHz
3 : Honor 6X (Silver, 32GB)
4 : ZTE Axon 7 Unlocked Smartphone,64GB Ion Gold (US Warranty)
5 : OnePlus 3 (Soft Gold, 64 GB)
For User :  einer Kundin
1 : Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)
2 : Huawei P9 Lite Dual-SIM Smartphone, 13,2 cm (5,2 Zoll) Display, LTE (4G), Android 6.0 (Marshmallow)
3 : Apple iPhone 6s Plus 5,5" 128 GB
4 : Samsung Galaxy S7 Smartphone, 12,9 cm (5,1 Zoll) Display, LTE (4G)
5 : Apple iPhone 7 4,7" 32 GB
For User :  Ð®ÑÐ¸Ð¹
1 : Sony Xperia Z1 Compact (Ð±ÐµÐ»ÑÐ¹)
2 : Samsung N7100 Galaxy Note II 16GB (Ð±ÐµÐ»ÑÐ¹)
3 : Apple iPhone
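
The function above ranks phones from the test set, which each user has already rated. A common alternative is to score only the items a user has not rated and keep the top N. A library-free sketch of that selection step follows, assuming predictions are available as `(user, item, estimate)` tuples; `top_n_unseen` is a hypothetical helper:

```python
from collections import defaultdict

def top_n_unseen(predictions, already_rated, n=5):
    """Return each user's n highest-estimated items that they have not rated yet."""
    per_user = defaultdict(list)
    for user, item, estimate in predictions:
        if item not in already_rated.get(user, set()):
            per_user[user].append((item, estimate))
    return {
        user: [item for item, _ in sorted(items, key=lambda pair: pair[1], reverse=True)[:n]]
        for user, items in per_user.items()
    }

# Toy data: "u1" has already rated "A", so it is skipped for that user.
toy_predictions = [
    ("u1", "A", 9.1), ("u1", "B", 7.4), ("u1", "C", 8.8),
    ("u2", "A", 6.0), ("u2", "C", 9.5),
]
toy_rated = {"u1": {"A"}, "u2": set()}
recommendations = top_n_unseen(toy_predictions, toy_rated, n=2)
print(recommendations)
```

With Surprise, one way to obtain such unseen pairs is `trainset.build_anti_testset()`, whose output can be passed to the model's `test` method; whether that is practical here depends on the number of users and products.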

### 7. Cross Validation

In [40]:
# Perform 5-fold cross-validation on the SVD model to get a more reliable estimate of its error.
cross_validate(svd, phone_df_final, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    2.5982  2.5812  2.5952  2.5709  2.6325  2.5956  0.0209  
MAE (testset)     2.0937  2.0879  2.0960  2.0783  2.1363  2.0984  0.0199  
Fit time          1.94    1.93    1.94    1.98    1.86    1.93    0.04    
Test time         0.15    0.09    0.11    0.11    0.07    0.11    0.03    


{'test_rmse': array([2.59816765, 2.5811705 , 2.59518338, 2.57086104, 2.63251802]),
 'test_mae': array([2.09372126, 2.08789711, 2.09596872, 2.07826885, 2.13631704]),
 'fit_time': (1.936985731124878,
  1.9276258945465088,
  1.9436607360839844,
  1.9775488376617432,
  1.8552331924438477),
 'test_time': (0.15202569961547852,
  0.08906793594360352,
  0.10800838470458984,
  0.11130261421203613,
  0.06900978088378906)}
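
The Mean and Std columns in the table above summarise the five fold scores. As a quick check, the sketch below recomputes them from the `test_rmse` array using only the standard library; the reported Std matches the population standard deviation (dividing by 5 rather than 4):

```python
import statistics

# Per-fold RMSE values copied from the cross_validate output above.
fold_rmse = [2.59816765, 2.5811705, 2.59518338, 2.57086104, 2.63251802]

mean_rmse = statistics.mean(fold_rmse)
std_rmse = statistics.pstdev(fold_rmse)  # population standard deviation

print(f"mean={mean_rmse:.4f} std={std_rmse:.4f}")
```

A small spread across folds suggests the single train/test split used earlier was not unusually lucky or unlucky.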

### 8. In what business scenario should you use popularity-based recommendation systems?

Many businesses use popularity-based recommendation systems, typically when nothing is known about the user yet: recommending best-selling products to new customers, the most-watched videos on video-sharing websites to new viewers, or trending news to readers of news websites. Some popular websites that use popularity-based recommendations are:
* Facebook - recommends the top-selling products to users.
* Twitter - recommends the most trending tweets to users.
* Google News - recommends currently trending news.

### 9. In what business scenario should you use CF-based recommendation systems?

Most websites use collaborative filtering as part of their recommendation systems. You can use this technique to build recommenders that give suggestions to a user on the basis of the likes and dislikes of similar users. Websites that use it include:
* Amazon: recommends products to users based on sales and on the characteristics of similar users.
* Netflix: recommends movies to users based on similar genres and top-viewed movies.
* YouTube: recommends videos to users based on the viewing patterns of earlier users.

### 10. What other methods could further improve the recommendations for different users?


* ROC curve: plots recall (true positive rate) against fallout (false positive rate) as the recommendation set size increases.
* PR curve: the precision-recall curve evaluates the top-N recommendations by plotting precision against recall.
* Cross-validation techniques: k-fold cross-validation, bootstrap sampling, etc.
* Different similarity measures: run the models with Pearson, cosine, Jaccard, etc.
* Changing the k value: trying different values of k may improve performance.
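
To make the precision and recall idea concrete, here is a minimal sketch of precision@k and recall@k for a single user. It assumes a rating counts as "relevant" at or above a chosen threshold; the function name, the threshold of 8 and the toy numbers are all illustrative assumptions:

```python
def precision_recall_at_k(user_predictions, k=5, threshold=8.0):
    """user_predictions: list of (estimated, actual) ratings for one user."""
    ranked = sorted(user_predictions, key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    # Relevant: items the user actually rated at or above the threshold.
    n_relevant = sum(actual >= threshold for _, actual in ranked)
    # Recommended: top-k items whose estimate reaches the threshold.
    n_recommended = sum(estimated >= threshold for estimated, _ in top_k)
    # Hits: recommended items that are also relevant.
    n_hits = sum(
        estimated >= threshold and actual >= threshold for estimated, actual in top_k
    )
    precision = n_hits / n_recommended if n_recommended else 0.0
    recall = n_hits / n_relevant if n_relevant else 0.0
    return precision, recall

# Toy (estimated, actual) pairs for one user.
toy = [(9.5, 10), (9.0, 6), (8.5, 8), (7.0, 10), (6.0, 2)]
precision, recall = precision_recall_at_k(toy, k=3, threshold=8.0)
print(round(precision, 3), round(recall, 3))
```

Averaging these values over all test users and repeating for several values of k gives the points of a precision-recall curve.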