![example](images/director_shot.jpeg)

# Project Title

**Authors:** Student 1, Student 2, Student 3
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [77]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [78]:
# Here you run your code to explore the data

In [112]:
df = pd.read_csv('data/Video_Games.csv', names=['asin', 'user', 'rating', 'timestamp'])
df

Unnamed: 0,asin,user,rating,timestamp
0,0439381673,A21ROB4YDOZA5P,1.0,1402272000
1,0439381673,A3TNZ2Q5E7HTHD,3.0,1399680000
2,0439381673,A1OKRM3QFEATQO,4.0,1391731200
3,0439381673,A2XO1JFCNEYV3T,1.0,1391731200
4,0439381673,A19WLPIRHD15TH,4.0,1389830400
...,...,...,...,...
2565344,B01HJEBIAA,ANGB54K3888S4,5.0,1533081600
2565345,B01HJEBIAA,A3TEVKR0ZVQB2T,5.0,1531785600
2565346,B01HJEBIAA,ABE7YPWEHNVJZ,5.0,1530835200
2565347,B01HJEBIAA,A3ES9QBK3G192O,5.0,1528761600


In [114]:
asin_list = df['asin'].unique()

In [115]:
np.arange(len(asin_list))

array([    0,     1,     2, ..., 71979, 71980, 71981])

In [126]:
# Integer ID -> original ASIN string
asin_lookup = dict(enumerate(asin_list))

In [120]:
# Original ASIN string -> integer ID
asin_map = {asin: idx for idx, asin in enumerate(asin_list)}

In [121]:
asin_map

{'0439381673': 0,
 '0700026657': 1,
 '0700099867': 2,
 '0700026398': 3,
 '0758534531': 4,
 '0804161380': 5,
 '1616616873': 6,
 '3815864844': 7,
 '3828770193': 8,
 '3866811659': 9,
 '6050036071': 10,
 '7293000960': 11,
 '7293000936': 12,
 '7543450933': 13,
 '7544256944': 14,
 '7561321074': 15,
 '8176503290': 16,
 '8565000168': 17,
 '907843905X': 18,
 '952590444X': 19,
 '9625990674': 20,
 '9625990992': 21,
 '9629551462': 22,
 '9629971372': 23,
 '9752300480': 24,
 '9758648950': 25,
 '9756663855': 26,
 '975539463X': 27,
 '9867299434': 28,
 '988800171X': 29,
 '9882155456': 30,
 '9882106463': 31,
 'B000003SQQ': 32,
 'B000006OTB': 33,
 'B000006OWS': 34,
 'B000006OVF': 35,
 'B000006OVJ': 36,
 'B000006OWT': 37,
 'B000006OVG': 38,
 'B000006P0M': 39,
 'B000006P0J': 40,
 'B000006OVE': 41,
 'B000006P0K': 42,
 'B000006OVK': 43,
 'B000006OWR': 44,
 'B000006OVL': 45,
 'B000006RGS': 46,
 'B000006P0P': 47,
 'B000006RGQ': 48,
 'B000006RGO': 49,
 'B000006RGR': 50,
 'B000006RGP': 51,
 'B000006OVI': 52,
 'B

In [123]:
df['asin'] = df['asin'].map(asin_map)

In [124]:
df

Unnamed: 0,asin,user,rating,timestamp
0,0,A21ROB4YDOZA5P,1.0,1402272000
1,0,A3TNZ2Q5E7HTHD,3.0,1399680000
2,0,A1OKRM3QFEATQO,4.0,1391731200
3,0,A2XO1JFCNEYV3T,1.0,1391731200
4,0,A19WLPIRHD15TH,4.0,1389830400
...,...,...,...,...
2565344,25455,ANGB54K3888S4,5.0,1533081600
2565345,25455,A3TEVKR0ZVQB2T,5.0,1531785600
2565346,25455,ABE7YPWEHNVJZ,5.0,1530835200
2565347,25455,A3ES9QBK3G192O,5.0,1528761600


In [127]:
df['asin'].map(asin_lookup)

0          0439381673
1          0439381673
2          0439381673
3          0439381673
4          0439381673
              ...    
2565344    B01HJEBIAA
2565345    B01HJEBIAA
2565346    B01HJEBIAA
2565347    B01HJEBIAA
2565348    B01HJEBIAA
Name: asin, Length: 2565349, dtype: object
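The integer encoding and its reverse lookup built by hand above can also be sketched with `pandas.factorize`, which returns the codes and the unique values in one call. This is a minimal illustration on a toy frame, not the notebook's own method; the `toy` frame and the `asin_id` column name are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the ratings table; only the 'asin' column matters here.
toy = pd.DataFrame(
    {"asin": ["0439381673", "B01HJEBIAA", "0439381673", "0700026657"]}
)

# factorize returns integer codes plus the unique values in first-appearance
# order, the same ordering that Series.unique() produces.
codes, uniques = pd.factorize(toy["asin"])
toy["asin_id"] = codes  # hypothetical column name

# Indexing the uniques with the codes reverses the encoding,
# which is the role asin_lookup plays in the cells above.
restored = pd.Series(uniques[codes], index=toy.index)

print(toy)
print(restored.tolist())
```

Keeping the encoded values in a separate column, rather than overwriting `asin`, leaves the original identifiers available for later joins.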

In [80]:
df['asin'].nunique()

71982

In [81]:
df['user'].nunique()

1540618

In [82]:
# Shuffle the rows; no random_state is set, so the order differs between runs
df = df.sample(frac=1)

In [106]:
df['rating'].value_counts()

5.0    1447324
4.0     397993
1.0     302251
3.0     205656
2.0     136972
Name: rating, dtype: int64

In [111]:
df.dtypes

asin          object
user          object
rating       float64
timestamp      int64
dtype: object

In [83]:
df.isna().sum()

asin         0
user         0
rating       0
timestamp    0
dtype: int64

In [84]:
# Show every row that has at least one exact duplicate
df[df.duplicated(keep=False)].head(20)

Unnamed: 0,asin,user,rating,timestamp
1782415,B00SN1QEGW,A71Z5AIGEFK11,5.0,1532995200
550506,B0013016O0,A2EJIPSUG5J6NH,4.0,1228521600
464001,B000ZK698C,A35WDB667ARZU,5.0,1406851200
475396,B0012N8WXQ,A534OK7WDYERU,5.0,1348444800
579797,B0017IUFAE,A2336BHPQW6UQH,5.0,1329436800
566814,B00163EWQI,A3KBYGS21NC80X,4.0,1414368000
518876,B000WMEEAI,A2QRWCIV0Z4QMF,5.0,1484697600
472225,B00128CH7S,AQITO3THZGYBR,5.0,1300579200
461728,B000ZK9QCS,A2FKZC2P6K3O99,5.0,1314576000
478705,B0013E9HP6,A299YIRXVGEJM6,5.0,1445299200


In [85]:
df[(df['user']=='AF3EVH5OFWIQN') & (df['asin']=='1300450991')]

Unnamed: 0,asin,user,rating,timestamp


In [86]:
# Show rows that have no exact duplicate
df[~df.duplicated(keep=False)].head(20)

Unnamed: 0,asin,user,rating,timestamp
1431057,B00FWWY1V0,A19Y6MQ2928NX6,2.0,1411516800
2345048,B002C92OP6,A1ZQ8PVBNGZPOM,1.0,1258156800
642026,B001H0RZX2,A15H0U1SHC4VI1,5.0,1242345600
2235464,B0002FQVBA,A142B0CTP2PUBB,4.0,1428796800
1634890,B00KY0QH0I,AY5651M8C8NR8,5.0,1412726400
2545630,B01BKY707K,A2ZY4LWSNB68O9,3.0,1527206400
1554013,B00JKM06EO,A1T2DNKLHYWP80,5.0,1501200000
1110690,B008DBJPLS,A34X974YQ9BYRX,5.0,1429660800
195840,B00022GIYI,AKRH6V60MHFNZ,5.0,1089936000
2558980,B01FVLA43A,A37FYBF0IUWQIW,5.0,1504310400


In [87]:
df.drop_duplicates(inplace=True)
df

Unnamed: 0,asin,user,rating,timestamp
1431057,B00FWWY1V0,A19Y6MQ2928NX6,2.0,1411516800
2345048,B002C92OP6,A1ZQ8PVBNGZPOM,1.0,1258156800
642026,B001H0RZX2,A15H0U1SHC4VI1,5.0,1242345600
2235464,B0002FQVBA,A142B0CTP2PUBB,4.0,1428796800
1634890,B00KY0QH0I,AY5651M8C8NR8,5.0,1412726400
...,...,...,...,...
679714,B001TOQ8K2,A1RXC1H08QBH2T,5.0,1448236800
1648286,B00LLBZINQ,A1Y9UC4APAI599,4.0,1415491200
1214449,B00BGD6LMG,A1YWA9YV80SYGB,5.0,1421193600
2209424,B000063RRK,A5VDI4RFJ4X8W,5.0,1033948800


In [88]:
df['rating'].value_counts(normalize=True).sort_index(ascending=False)

5.0    0.581209
4.0    0.159824
3.0    0.082586
2.0    0.055005
1.0    0.121376
Name: rating, dtype: float64

In [89]:
df['asin'].nunique()

71982

In [90]:
df['user'].nunique()

1540618

In [91]:
meta_df = pd.read_json('data/meta_Video_Games.json.gz', lines=True)
meta_df

Unnamed: 0,category,tech1,description,fit,title,also_buy,tech2,brand,feature,rank,also_view,main_cat,similar_item,date,price,asin,imageURL,imageURLHighRes,details
0,"[Video Games, PC, Games]",,[],,Reversi Sensory Challenger,[],,Fidelity Electronics,[],"[>#2,623,937 in Toys &amp; Games (See Top 100 ...",[],Toys &amp; Games,,,,0042000742,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,
1,"[Video Games, Xbox 360, Games, </span></span><...",,[Brand new sealed!],,Medal of Honor: Warfighter - Includes Battlefi...,[B00PADROYW],,by\n \n EA Games,[],"[>#67,231 in Video Games (See Top 100 in Video...","[B0050SY5BM, B072NQJCW5, B000TI836G, B002SRSQ7...",Video Games,,,"\n\t\t\t\t\t\t\t\t\t\t\t\t<span class=""vertica...",0078764343,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,
2,"[Video Games, Retro Gaming & Microconsoles, Su...",,[],,street fighter 2 II turbo super nintendo snes ...,[],,Nintendo,[],"[>#134,433 in Video Games (See Top 100 in Vide...",[],Video Games,,,$0.72,0276425316,[],[],
3,"[Video Games, Xbox 360, Accessories, Controlle...",,[MAS's Pro Xbox 360 Stick (Perfect 360 Stick) ...,,Xbox 360 MAS STICK,[],,by\n \n MAS SYSTEMS,[Original PCB used from Xbox 360 Control Pad (...,"[>#105,263 in Video Games (See Top 100 in Vide...",[],Video Games,,,,0324411812,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,
4,"[Video Games, PC, Games, </span></span></span>...",,"[Phonics Alive! 3, The Speller teaches student...",,Phonics Alive! 3: The Speller,[],,by\n \n Advanced Software Pty. Ltd.,"[Grades 2-12, Spelling Program, Teaches Spelli...","[>#92,397 in Video Games (See Top 100 in Video...",[B000BCZ7U0],Video Games,,,,0439335310,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84814,"[Video Games, PlayStation Vita, Digital Games ...",,[<div>The Force is strong with this one The No...,,Lego Star Wars: The Force Awakens - PS Vita [D...,[],,by\n \n Warner Bros.,[],"[>#74,224 in Video Games (See Top 100 in Video...",[],Video Games,,,,B01HJ1521Y,[],[],{}
84815,"[Video Games, PlayStation 4, Digital Games & D...",,[<div>The Season Pass includes three Level Pac...,,Lego Star Wars: The Force Awakens Season Pass...,[],,by\n \n Warner Bros.,[DLC Requires base game],"[>#62,150 in Video Games (See Top 100 in Video...",[],Video Games,,,\n\t\t ...,B01HJ14TTA,[],[],{}
84816,"[Video Games, PlayStation 4, Digital Games & D...",,"[<div>The Technomancer takes you to Mars, wher...",,The Technomancer - PS4 [Digital Code],[],,by\n \n Focus Home Interactive,[],"[>#94,234 in Video Games (See Top 100 in Video...",[],Video Games,,,,B01HJ14OT0,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,{}
84817,"[Video Games, Xbox 360, Accessories, </span></...",,[<b>FUNCTIONS:</b><br> 1.Take apart your Xbox ...,,"Repair T8 T6 Tools for XBOX One Xbox 360, YTTL...","[B01KBNB7K2, B06X6JSYPC, B01N6Y0Z7W, B06VXD2W5...",,by\n \n YTTL,[If you want to Replacement you Xbox one /360 ...,"[>#16,087 in Video Games (See Top 100 in Video...","[B01KH25ZY6, B00PG8SU26, B07G122BVS, B016XLTQP...",Video Games,,,"\n\t\t\t\t\t\t\t\t\t\t\t\t<span class=""vertica...",B01HJC33WS,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...,{}


In [92]:
meta_df = meta_df[['title', 'asin']]

In [93]:
merged_df = df.merge(meta_df, how='inner', on='asin')
merged_df

Unnamed: 0,asin,user,rating,timestamp,title
0,B00FWWY1V0,A19Y6MQ2928NX6,2.0,1411516800,KontrolFreek FPS Freek Phantom for PlayStation...
1,B00FWWY1V0,A2IOLR781CHYDE,5.0,1413417600,KontrolFreek FPS Freek Phantom for PlayStation...
2,B00FWWY1V0,AN8SBLH1T2MXG,5.0,1388448000,KontrolFreek FPS Freek Phantom for PlayStation...
3,B00FWWY1V0,A94EZP705DDOY,5.0,1450051200,KontrolFreek FPS Freek Phantom for PlayStation...
4,B00FWWY1V0,A2TMASRK1MNWZ2,2.0,1391126400,KontrolFreek FPS Freek Phantom for PlayStation...
...,...,...,...,...,...
2770597,B002XDLLB4,AM3ICG8AUMFUX,5.0,1362614400,God War PS3 Playstation 3 Body Protector Skin ...
2770598,B0043M64GA,A27EBTIQEO2XGC,1.0,1330128000,Dead Space 2
2770599,B00006LJUR,A1BQKZ2IIKXQF4,5.0,1391385600,Valkyrie Profile [Limited Deluxe Pack] [Japan ...
2770600,B00006LJUR,A1BQKZ2IIKXQF4,5.0,1391385600,Valkyrie Profile [Limited Deluxe Pack] [Japan ...


In [94]:
merged_df.tail(20)

Unnamed: 0,asin,user,rating,timestamp,title
2770582,B000R7YDJU,A1JRYELA59GKH2,5.0,1410825600,SPROINK - PC
2770583,B0001RBMMC,A33FVFKPMPAVLV,4.0,1363478400,4mb Memory Card (Japanese Import)
2770584,B0001RBMMC,A33FVFKPMPAVLV,4.0,1363478400,4mb Memory Card (Japanese Import)
2770585,B00YFQM6N4,A2Q14TLLNVHPJA,5.0,1483920000,Legend of Zelda Link Ocarina 3D of Time Epona ...
2770586,B000069T8M,A2MJIKNJ8JHDPZ,4.0,1381795200,Sonic 3D: Flicky's Island [Japan Import]
2770587,B000069T8M,A2MJIKNJ8JHDPZ,4.0,1381795200,Sonic 3D: Flicky's Island [Japan Import]
2770588,B005553P2A,A24X3S2YINEYYR,2.0,1439164800,Final Fantasy Crystal Chronicles:The Crystal B...
2770589,B00R06VT5C,A24AV8V31CGZY8,5.0,1481673600,MODFREAKZ Pair of Vinyl Controller Skins - Fly...
2770590,B00AC41M6A,A1O17XN1STC1UT,5.0,1369872000,The Game Chamber DS Game Vault 12
2770591,B001FSKJZW,A258RT106MZYOM,3.0,1443916800,Sengoku Basara: Battle Heroes [Japan Import]


In [95]:
merged_df['user'].nunique()

1539732

In [96]:
merged_df['title'].nunique()

68663

In [97]:
merged_df.isna().sum()

asin         0
user         0
rating       0
timestamp    0
title        0
dtype: int64

In [98]:
# Show every merged row that has at least one exact duplicate
merged_df[merged_df.duplicated(keep=False)].head(20)

Unnamed: 0,asin,user,rating,timestamp,title
832,B0002FQVBA,A142B0CTP2PUBB,4.0,1428796800,Close Combat: First to Fight - Xbox
833,B0002FQVBA,A142B0CTP2PUBB,4.0,1428796800,Close Combat: First to Fight - Xbox
834,B0002FQVBA,A20ZBW8WN0MEMT,5.0,1119916800,Close Combat: First to Fight - Xbox
835,B0002FQVBA,A20ZBW8WN0MEMT,5.0,1119916800,Close Combat: First to Fight - Xbox
836,B0002FQVBA,A2WW5VUQN2JFMC,4.0,1116288000,Close Combat: First to Fight - Xbox
837,B0002FQVBA,A2WW5VUQN2JFMC,4.0,1116288000,Close Combat: First to Fight - Xbox
838,B0002FQVBA,APUIZOMJCYQ6F,3.0,1114905600,Close Combat: First to Fight - Xbox
839,B0002FQVBA,APUIZOMJCYQ6F,3.0,1114905600,Close Combat: First to Fight - Xbox
840,B0002FQVBA,A2H8UCVKM5YFF8,3.0,1114560000,Close Combat: First to Fight - Xbox
841,B0002FQVBA,A2H8UCVKM5YFF8,3.0,1114560000,Close Combat: First to Fight - Xbox
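The merged frame's index runs past 2.77 million rows, more than the roughly 2.57 million ratings loaded originally, and the duplicated rows above come in identical pairs. One plausible cause, assumed here rather than verified, is that the metadata file lists some ASINs more than once, so an inner merge repeats each matching rating. A minimal sketch of guarding against that on toy data:

```python
import pandas as pd

# Toy ratings and metadata. The metadata repeats one ASIN, which is assumed
# (not confirmed) to mirror what happens in meta_Video_Games.json.gz.
ratings = pd.DataFrame({
    "asin": ["A1", "A1", "B2"],
    "user": ["u1", "u2", "u3"],
    "rating": [4.0, 5.0, 3.0],
})
meta = pd.DataFrame({
    "asin": ["A1", "A1", "B2"],
    "title": ["Game A", "Game A", "Game B"],
})

# A plain inner merge repeats every rating whose ASIN appears twice in meta.
inflated = ratings.merge(meta, how="inner", on="asin")

# Keeping one metadata row per ASIN makes the join many-to-one;
# validate raises MergeError if the right side still has repeated keys.
meta_unique = meta.drop_duplicates(subset="asin")
clean = ratings.merge(meta_unique, how="inner", on="asin", validate="many_to_one")

print(len(ratings), len(inflated), len(clean))
```

Duplicated ratings matter for the models that follow, because identical rows can land in both the training and test splits.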


In [99]:
from surprise import Dataset, Reader
from surprise import accuracy
from surprise.prediction_algorithms import knns
from surprise.similarities import cosine, msd, pearson
from surprise.model_selection import cross_validate, train_test_split
from surprise.prediction_algorithms import SVD
from surprise.model_selection import GridSearchCV

In [100]:
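# Surprise expects the columns in the order: user, item, rating
data = merged_df[['user', 'title', 'rating']]
# load_from_df only uses the Reader's rating_scale; line_format and sep apply to files
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data, reader=reader)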

In [101]:
trainset, testset= train_test_split(data, test_size=0.25, random_state=42)

In [102]:
testset

[('A1ZOBRYGBL5OD1', 'Dragon Age Inquisition - Deluxe Edition - PC', 5.0),
 ('AQW328WL34MNQ',
  'Animal Crossing:  Happy Home Designer - 3DS [Digital Code]',
  4.0),
 ('A26VTX5W1NWTUF', 'Pokemon, Crystal Version', 5.0),
 ('A2Y4NXYIKCR5EU', 'Dying Light - PlayStation 4', 5.0),
 ('A87SK4BBWBI8Q',
  'Modern-Tech Nintendo DSi XL High Capacity 2000 mAh Replacement Battery',
  4.0),
 ('A319B5UXCJQW8B', 'DmC Devil May Cry', 5.0),
 ('A19QPUTHGBZN3Z', 'Spy vs Spy', 3.0),
 ('A36V0ID0XVBW10',
  'The Elder Scrolls V: Skyrim Special Edition - PS4 [Digital Code]',
  5.0),
 ('A77J3FEM7573',
  'Mayflash GameCube Controller Adapter for Wii U, PC USB and Switch, 4 Port',
  5.0),
 ('A3RYR8X088WP7D',
  'World of Tanks-X360 Xbox 360 English US NTSC DVD - Xbox 360',
  1.0),
 ('A2CYOFT3EMRCX1', 'Kingdoms of Amalur: Reckoning - PS3 [Digital Code]', 5.0),
 ('A3REF9LD0INKRI', ' Majesty 2', 3.0),
 ('A21EKIKJ2WWKDQ', 'Xenoblade Chronicles X', 5.0),
 ('A28MNVWDTIABGC', 'At Games ATARI Flashback 4 Deluxe Edition', 5

## KNN Basic

In [25]:
KNN_model= knns.KNNBasic(sim_options={'name': 'cosine', 'user_based': False}).fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


In [26]:
cross_validate(KNN_model, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2967  1.3165  1.3062  1.3070  1.2987  1.3050  0.0070  
MAE (testset)     0.9703  0.9858  0.9748  0.9759  0.9743  0.9762  0.0051  
Fit time          56.14   57.66   56.20   53.79   53.85   55.53   1.50    
Test time         0.64    0.35    0.39    0.61    0.37    0.47    0.13    


{'test_rmse': array([1.29667705, 1.3164655 , 1.30615174, 1.3069601 , 1.29872379]),
 'test_mae': array([0.97029964, 0.98580177, 0.97480664, 0.97591937, 0.9743212 ]),
 'fit_time': (56.141650915145874,
  57.65526604652405,
  56.20202279090881,
  53.78870487213135,
  53.850388050079346),
 'test_time': (0.6395540237426758,
  0.34877920150756836,
  0.38604021072387695,
  0.6083400249481201,
  0.3662452697753906)}

In [27]:
KNN_model2= knns.KNNBasic(sim_options={'name': 'msd', 'user_based': False}).fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


In [28]:
cross_validate(KNN_model2, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3070  1.2933  1.3055  1.2950  1.2943  1.2990  0.0060  
MAE (testset)     0.9739  0.9611  0.9742  0.9641  0.9623  0.9671  0.0057  
Fit time          55.81   55.55   55.80   55.31   55.03   55.50   0.30    
Test time         0.62    0.58    0.66    0.61    0.50    0.59    0.05    


{'test_rmse': array([1.30702091, 1.29331413, 1.30550743, 1.29498519, 1.29428809]),
 'test_mae': array([0.97390054, 0.96114017, 0.97424211, 0.96405965, 0.9622922 ]),
 'fit_time': (55.805038928985596,
  55.55095672607422,
  55.80167579650879,
  55.30667281150818,
  55.02721285820007),
 'test_time': (0.6193728446960449,
  0.5824072360992432,
  0.6564249992370605,
  0.6091680526733398,
  0.5015571117401123)}

In [29]:
KNN_model3= knns.KNNBasic(sim_options={'name': 'pearson', 'user_based': False}).fit(trainset)

Computing the pearson similarity matrix...
Done computing similarity matrix.


In [30]:
cross_validate(KNN_model3, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3051  1.2931  1.3026  1.2888  1.2918  1.2963  0.0064  
MAE (testset)     0.9615  0.9541  0.9655  0.9541  0.9579  0.9587  0.0044  
Fit time          50.71   53.08   50.44   53.93   49.82   51.60   1.61    
Test time         0.59    0.34    0.52    0.28    0.54    0.45    0.12    


{'test_rmse': array([1.30506132, 1.29309865, 1.30259424, 1.28879011, 1.29182685]),
 'test_mae': array([0.96154875, 0.95411434, 0.96553037, 0.95412569, 0.9579485 ]),
 'fit_time': (50.708773136138916,
  53.078081130981445,
  50.44257378578186,
  53.9336199760437,
  49.816184759140015),
 'test_time': (0.5917799472808838,
  0.3407597541809082,
  0.5162580013275146,
  0.28071093559265137,
  0.5371177196502686)}

In [31]:
KNN_model4= knns.KNNBasic(sim_options={'name': 'pearson_baseline', 'user_based': False}).fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [32]:
cross_validate(KNN_model4, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3024  1.2958  1.2967  1.2976  1.2975  1.2980  0.0023  
MAE (testset)     0.9649  0.9626  0.9582  0.9613  0.9596  0.9613  0.0023  
Fit time          49.75   51.10   50.25   50.21   50.10   50.28   0.44    
Test time         0.53    0.50    0.39    0.48    0.45    0.47    0.05    


{'test_rmse': array([1.30237473, 1.29577235, 1.29667927, 1.29759325, 1.29745356]),
 'test_mae': array([0.96488078, 0.96259135, 0.95820719, 0.96128778, 0.95964681]),
 'fit_time': (49.753724813461304,
  51.09508514404297,
  50.25413107872009,
  50.213900089263916,
  50.098995208740234),
 'test_time': (0.5322749614715576,
  0.49553728103637695,
  0.3861398696899414,
  0.4799790382385254,
  0.45453691482543945)}

## KNN With Means

In [35]:
KNN_model= knns.KNNWithMeans(sim_options={'name': 'cosine', 'user_based': False}).fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


In [36]:
cross_validate(KNN_model, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3145  1.3162  1.3036  1.3027  1.3096  1.3093  0.0055  
MAE (testset)     0.9768  0.9782  0.9682  0.9685  0.9714  0.9726  0.0042  
Fit time          47.81   48.31   48.48   47.62   47.04   47.85   0.51    
Test time         0.72    0.47    0.31    0.33    0.35    0.43    0.15    


{'test_rmse': array([1.31445833, 1.31624245, 1.30363781, 1.30273256, 1.3096166 ]),
 'test_mae': array([0.97683726, 0.97820841, 0.96823514, 0.96850464, 0.97138305]),
 'fit_time': (47.81424903869629,
  48.30622100830078,
  48.47829604148865,
  47.615411043167114,
  47.041229009628296),
 'test_time': (0.716500997543335,
  0.46961116790771484,
  0.30996012687683105,
  0.3289520740509033,
  0.34900879859924316)}

In [37]:
KNN_model2= knns.KNNWithMeans(sim_options={'name': 'msd', 'user_based': False}).fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


In [38]:
cross_validate(KNN_model2, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2996  1.3020  1.3097  1.3040  1.3070  1.3044  0.0036  
MAE (testset)     0.9605  0.9601  0.9692  0.9675  0.9674  0.9650  0.0038  
Fit time          60.95   61.51   61.69   60.84   60.60   61.12   0.41    
Test time         0.72    0.64    0.49    0.43    0.44    0.54    0.11    


{'test_rmse': array([1.29959384, 1.30200683, 1.30969295, 1.3039749 , 1.30696601]),
 'test_mae': array([0.96051324, 0.96014564, 0.96915187, 0.96753465, 0.96743836]),
 'fit_time': (60.95057988166809,
  61.51298928260803,
  61.68616318702698,
  60.83865475654602,
  60.598276138305664),
 'test_time': (0.7162270545959473,
  0.6378788948059082,
  0.49219298362731934,
  0.434877872467041,
  0.4386141300201416)}

In [39]:
KNN_model3= knns.KNNWithMeans(sim_options={'name': 'pearson', 'user_based': False}).fit(trainset)

Computing the pearson similarity matrix...
Done computing similarity matrix.


In [40]:
cross_validate(KNN_model3, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3006  1.2987  1.3009  1.3052  1.3059  1.3023  0.0028  
MAE (testset)     0.9559  0.9537  0.9566  0.9627  0.9618  0.9581  0.0035  
Fit time          60.51   60.83   58.53   60.28   58.92   59.81   0.91    
Test time         0.43    0.44    0.74    0.31    0.33    0.45    0.15    


{'test_rmse': array([1.30062477, 1.29869273, 1.3008936 , 1.3052147 , 1.3059231 ]),
 'test_mae': array([0.95592434, 0.95368688, 0.95656881, 0.96271495, 0.96178389]),
 'fit_time': (60.5059609413147,
  60.82538604736328,
  58.5308837890625,
  60.27626395225525,
  58.915345907211304),
 'test_time': (0.42750096321105957,
  0.44039320945739746,
  0.7377080917358398,
  0.30954909324645996,
  0.3265848159790039)}

In [41]:
KNN_model4= knns.KNNWithMeans(sim_options={'name': 'pearson_baseline', 'user_based': False}).fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [42]:
cross_validate(KNN_model4, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3056  1.2993  1.3002  1.3037  1.3092  1.3036  0.0036  
MAE (testset)     0.9657  0.9563  0.9542  0.9612  0.9624  0.9600  0.0042  
Fit time          64.94   68.00   67.20   66.59   65.86   66.52   1.06    
Test time         0.87    0.41    0.41    0.40    0.47    0.51    0.18    


{'test_rmse': array([1.30561396, 1.29928417, 1.3002484 , 1.30365682, 1.3091943 ]),
 'test_mae': array([0.96572902, 0.95633506, 0.9541786 , 0.96123706, 0.96241009]),
 'fit_time': (64.94071006774902,
  68.00457620620728,
  67.20421576499939,
  66.59040379524231,
  65.85597372055054),
 'test_time': (0.865253210067749,
  0.41051506996154785,
  0.407520055770874,
  0.3995490074157715,
  0.4683661460876465)}
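To compare the eight KNN runs side by side, the mean scores can be collected and ranked. The sketch below copies the `Mean` column from the `cross_validate` outputs above; the `results` dictionary layout is just one convenient way to hold them.

```python
# Mean 5-fold RMSE and MAE copied from the cross_validate outputs above.
results = {
    ("KNNBasic", "cosine"): (1.3050, 0.9762),
    ("KNNBasic", "msd"): (1.2990, 0.9671),
    ("KNNBasic", "pearson"): (1.2963, 0.9587),
    ("KNNBasic", "pearson_baseline"): (1.2980, 0.9613),
    ("KNNWithMeans", "cosine"): (1.3093, 0.9726),
    ("KNNWithMeans", "msd"): (1.3044, 0.9650),
    ("KNNWithMeans", "pearson"): (1.3023, 0.9581),
    ("KNNWithMeans", "pearson_baseline"): (1.3036, 0.9600),
}

# Rank configurations by mean RMSE (lower is better).
ranked = sorted(results.items(), key=lambda item: item[1][0])

for (algo, sim), (rmse, mae) in ranked:
    print(f"{algo:<13} {sim:<17} RMSE={rmse:.4f}  MAE={mae:.4f}")

best_config, best_scores = ranked[0]
```

The gap between the best and worst mean RMSE is about 0.013, while the per-fold standard deviations reported above range from 0.002 to 0.007, so the ranking should be read with some caution.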

## SVD

In [103]:
svd = SVD()

In [104]:
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7feefe4aeeb0>

In [105]:
predictions= svd.test(testset)
accuracy.rmse(predictions)

RMSE: 1.2324


1.2324324052459386

In [108]:
accuracy.mae(predictions)

MAE:  0.9516


0.9515967603715499

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [33]:
# Here you run your code to clean the data

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [34]:
# Here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***
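One way to approach the baseline question is to score a predictor that always returns the training-set mean rating, then compare its RMSE and MAE with the model's. The sketch below computes both metrics from scratch; the rating lists are hypothetical and the helper names `rmse` and `mae` are chosen for the example, not taken from the notebook.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings, for illustration only (not taken from the dataset).
train_ratings = [5.0, 4.0, 1.0, 5.0, 3.0]
test_ratings = [5.0, 2.0, 4.0]

# Baseline: predict the training-set mean for every test rating.
global_mean = sum(train_ratings) / len(train_ratings)
baseline_preds = [global_mean] * len(test_ratings)

print(f"baseline RMSE: {rmse(test_ratings, baseline_preds):.4f}")
print(f"baseline MAE:  {mae(test_ratings, baseline_preds):.4f}")
```

A model is only worth deploying if it beats this kind of baseline by a margin that matters for the business problem.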

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***