![example](images/director_shot.jpeg)

# Project Title

**Authors:** Student 1, Student 2, Student 3
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [83]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [292]:
df = pd.read_csv('https://github.com/ds-papes/dsc-phase-4-project-clone/blob/main/data/Luxury_Beauty.csv?raw=true', names=['asin', 'user', 'rating', 'timestamp'])
df

Unnamed: 0,asin,user,rating,timestamp
0,B00004U9V2,A1Q6MUU0B2ZDQG,2.0,1276560000
1,B00004U9V2,A3HO2SQDCZIE9S,5.0,1262822400
2,B00004U9V2,A2EM03F99X3RJZ,5.0,1524009600
3,B00004U9V2,A3Z74TDRGD0HU,5.0,1524009600
4,B00004U9V2,A2UXFNW9RTL4VM,5.0,1523923200
...,...,...,...,...
574623,B01HIQEOLO,AHYJ78MVF4UQO,5.0,1489968000
574624,B01HIQEOLO,A1L2RT7KBNK02K,5.0,1477440000
574625,B01HIQEOLO,A36MLXQX9WPPW9,5.0,1475193600
574626,B01HJ2UY0W,A23DRCOMC2RIXF,1.0,1480896000
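
The `timestamp` column appears to hold Unix epoch seconds. As a minimal sketch, `pd.to_datetime` with `unit='s'` turns such integers into readable dates. The two values are copied from the first rows above; `ts` and `dates` are names introduced only for this illustration.

```python
import pandas as pd

# Two timestamp values copied from the first rows of the table above
ts = pd.Series([1276560000, 1262822400])

# Interpret the integers as seconds since the Unix epoch (UTC)
dates = pd.to_datetime(ts, unit='s')
print(dates.dt.strftime('%Y-%m-%d').tolist())
```

Both values fall exactly on midnight UTC, which suggests the source records only the review date, not the time of day.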


In [18]:
df.dtypes

asin           int64
user           int64
rating       float64
timestamp      int64
dtype: object

In [19]:
df['asin'].nunique()

12120

In [20]:
df['user'].nunique()

416174

In [149]:
df['rating'].value_counts()

5    355484
4     65899
1     47634
3     39440
2     27838
Name: rating, dtype: int64

In [151]:
df.isna().sum()

asin      0
user      0
rating    0
dtype: int64

In [26]:
df[df.duplicated(keep=False)].head(20)

Unnamed: 0,asin,user,rating
25,0,25,5
26,0,25,5
49,0,48,5
50,0,48,5
118,0,116,5
147,0,145,5
148,0,145,5
157,0,154,5
158,0,154,5
161,0,157,5


In [153]:
df.describe()

Unnamed: 0,asin,user,rating
count,536295.0,536295.0,536295.0
mean,2710.716481,188902.131303,4.219032
std,2155.861348,122243.472041,1.30209
min,0.0,0.0,1.0
25%,1023.0,81430.5,4.0
50%,2305.0,181283.0,5.0
75%,3977.0,293029.5,5.0
max,12119.0,416173.0,5.0


In [154]:
df['rating'].value_counts(normalize=True).sort_index(ascending=False)

5    0.662852
4    0.122878
3    0.073542
2    0.051908
1    0.088821
Name: rating, dtype: float64
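
Because about two thirds of the ratings are 5s, a model that always predicts the mean rating is a useful reference point. The sketch below derives the error of that constant predictor from the shares printed above. It is an in-sample figure computed from the rounded shares, so treat it as an approximate baseline, not a cross-validated score. The names `shares`, `rmse_const` and `mae_const` are introduced here.

```python
import math

# Rating shares copied from the value_counts(normalize=True) output above
shares = {5: 0.662852, 4: 0.122878, 3: 0.073542, 2: 0.051908, 1: 0.088821}

# Mean rating implied by the distribution
mean_rating = sum(r * p for r, p in shares.items())

# RMSE and MAE of always predicting that mean, measured on the same data
rmse_const = math.sqrt(sum(p * (r - mean_rating) ** 2 for r, p in shares.items()))
mae_const = sum(p * abs(r - mean_rating) for r, p in shares.items())

print(round(mean_rating, 3), round(rmse_const, 3), round(mae_const, 3))
```

The resulting mean and RMSE agree with the `mean` and `std` rows of `df.describe()` for `rating`, since the RMSE of a constant-mean predictor is the standard deviation of the ratings.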

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to clean the data

In [3]:
asin_list = df['asin'].unique()

In [4]:
np.arange(len(asin_list))

array([    0,     1,     2, ..., 12117, 12118, 12119])

In [5]:
asin_lookup = dict(zip(np.arange(len(asin_list)), asin_list))

In [6]:
asin_map = dict(zip(asin_list, np.arange(len(asin_list))))

In [7]:
asin_map

{'B00004U9V2': 0,
 'B00005A77F': 1,
 'B00005NDTD': 2,
 'B00005V50C': 3,
 'B00005V50B': 4,
 'B000066SYB': 5,
 'B000068DWY': 6,
 'B00008WFSM': 7,
 'B0000Y3NO6': 8,
 'B0000ZREXG': 9,
 'B0000ZREXQ': 10,
 'B00011JU6I': 11,
 'B00011QUKW': 12,
 'B00012C5RS': 13,
 'B000142FVW': 14,
 'B000141PIG': 15,
 'B00014351Q': 16,
 'B0001433OU': 17,
 'B000141PYK': 18,
 'B00014340I': 19,
 'B0001435D4': 20,
 'B00014353E': 21,
 'B0001432PK': 22,
 'B00014GT8W': 23,
 'B0001EKVCW': 24,
 'B0001EKVGS': 25,
 'B0001EKTTC': 26,
 'B0001EL5KO': 27,
 'B0001EL5JA': 28,
 'B0001EL9BO': 29,
 'B0001EL0WC': 30,
 'B0001EL5OU': 31,
 'B0001EL59K': 32,
 'B0001EL5R2': 33,
 'B0001EL4DC': 34,
 'B0001EL39C': 35,
 'B0001EL5Q8': 36,
 'B0001EL4M8': 37,
 'B0001EKYEM': 38,
 'B0001F3QV4': 39,
 'B0001NAYOI': 40,
 'B0001QNLNG': 41,
 'B0001UWRCI': 42,
 'B0001XDU2Q': 43,
 'B0001XDTYA': 44,
 'B0001XDUBC': 45,
 'B0001XDTWM': 46,
 'B0001Y74TA': 47,
 'B0001Y74TU': 48,
 'B0001Y74XG': 49,
 'B0001Y74H2': 50,
 'B0001Y74KO': 51,
 'B0001Y74SG': 52,
 'B

In [8]:
df['asin'] = df['asin'].map(asin_map)
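
The `unique` / `zip` / `map` sequence above can also be written with `pd.factorize`, which returns the integer codes and the lookup array in one call. The sketch below uses a toy Series with ASINs taken from the mapping shown above and checks that both approaches give the same codes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['asin']; the IDs are taken from the mapping shown above
asins = pd.Series(['B00004U9V2', 'B00005A77F', 'B00004U9V2', 'B00005NDTD'])

# Manual approach used in this notebook: unique values -> consecutive integers
asin_list = asins.unique()
asin_map = dict(zip(asin_list, np.arange(len(asin_list))))
manual_codes = asins.map(asin_map)

# pd.factorize returns the integer codes and the unique values in one call
codes, uniques = pd.factorize(asins)

print(codes.tolist())    # integer code per row
print(list(uniques))     # position i holds the ASIN for code i
```

Both approaches number values in order of first appearance, so `uniques[i]` plays the same role as `asin_lookup[i]`.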

In [9]:
df

Unnamed: 0,asin,user,rating,timestamp
0,0,A1Q6MUU0B2ZDQG,2.0,1276560000
1,0,A3HO2SQDCZIE9S,5.0,1262822400
2,0,A2EM03F99X3RJZ,5.0,1524009600
3,0,A3Z74TDRGD0HU,5.0,1524009600
4,0,A2UXFNW9RTL4VM,5.0,1523923200
...,...,...,...,...
574623,6012,AHYJ78MVF4UQO,5.0,1489968000
574624,6012,A1L2RT7KBNK02K,5.0,1477440000
574625,6012,A36MLXQX9WPPW9,5.0,1475193600
574626,12118,A23DRCOMC2RIXF,1.0,1480896000


In [10]:
# df['asin'].map(asin_lookup)

In [11]:
user_list = df['user'].unique()

In [12]:
np.arange(len(user_list))

array([     0,      1,      2, ..., 416171, 416172, 416173])

In [13]:
user_lookup = dict(zip(np.arange(len(user_list)), user_list))

In [14]:
user_map = dict(zip(user_list, np.arange(len(user_list))))

In [15]:
user_map

{'A1Q6MUU0B2ZDQG': 0,
 'A3HO2SQDCZIE9S': 1,
 'A2EM03F99X3RJZ': 2,
 'A3Z74TDRGD0HU': 3,
 'A2UXFNW9RTL4VM': 4,
 'AXX5G4LFF12R6': 5,
 'A7GUKMOJT2NR6': 6,
 'A3FU4L59BHA9FY': 7,
 'A1AMNMIPQMXH9M': 8,
 'A3DMBDTA8VGWSX': 9,
 'A160DTI3H7VHLQ': 10,
 'A1H41DKPDPVA0R': 11,
 'A2BDI7THUMJ8V': 12,
 'AM7EBP5TRX7AC': 13,
 'A31FOVCS3WTWPT': 14,
 'AXUU8F9EM6U3E': 15,
 'A24B46V78ATNRP': 16,
 'ABUBKML2EONCG': 17,
 'A2UA6E1RVG3C1I': 18,
 'A1TRMJHEDGX0HF': 19,
 'A2TTJS62322SXW': 20,
 'AX2K33SNI3WHN': 21,
 'ALX99DYO827ZK': 22,
 'A3PVVQ9MHYFTV9': 23,
 'A22NEUQTKWQM98': 24,
 'A1TQQZ6NVDTPNL': 25,
 'A32E3RVLI6D4TM': 26,
 'A3KUYXBMJ8AVIX': 27,
 'A3TMPSQ7X4M9LO': 28,
 'AUEUNR2AQQ0SY': 29,
 'A2P5MRZ68JX8EE': 30,
 'A8VD1E2O6N2KO': 31,
 'A1Q3N7GU27KGMA': 32,
 'A3QEV9GSI4HPA5': 33,
 'A2FMDHT0HNA3WY': 34,
 'A2QPHVVXS9FUBS': 35,
 'AL63CNA6X6IX8': 36,
 'A2N6AACMA6WOMN': 37,
 'A35I4FD5EARKTS': 38,
 'A2N8V79LWVR8F2': 39,
 'A2R9R1DJ9RHXOX': 40,
 'A60EV0X26JNB3': 41,
 'A3CG9DJUY5F2UY': 42,
 'AEDOSTGV48XO9': 43,
 'A21BQWP17Y

In [16]:
df['user'] = df['user'].map(user_map)

In [17]:
df

Unnamed: 0,asin,user,rating,timestamp
0,0,0,2.0,1276560000
1,0,1,5.0,1262822400
2,0,2,5.0,1524009600
3,0,3,5.0,1524009600
4,0,4,5.0,1523923200
...,...,...,...,...
574623,6012,194479,5.0,1489968000
574624,6012,175357,5.0,1477440000
574625,6012,416172,5.0,1475193600
574626,12118,416173,1.0,1480896000


In [21]:
df['rating']=df['rating'].astype(np.int8)

In [22]:
df['asin']=df['asin'].astype(np.int32)

In [23]:
df['user']=df['user'].astype(np.int32)

In [24]:
df.dtypes

asin         int32
user         int32
rating        int8
timestamp    int64
dtype: object
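
Downcasting is only safe if every value fits in the smaller type. A quick sketch of that check, using the largest values that appear in this notebook's outputs (rating 5, item index 12119, user index 416173) and the limits reported by `np.iinfo`:

```python
import numpy as np

# Largest values seen in this notebook's outputs
max_rating, max_asin, max_user = 5, 12119, 416173

# Each value must fit within the range of its target integer type
assert max_rating <= np.iinfo(np.int8).max     # int8 holds up to 127
assert max_asin <= np.iinfo(np.int32).max      # int32 holds up to 2,147,483,647
assert max_user <= np.iinfo(np.int32).max

# Bytes per row for the three columns: three 8-byte columns before the cast,
# two int32 columns plus one int8 column after it
bytes_before = 3 * 8
bytes_after = 2 * np.dtype(np.int32).itemsize + np.dtype(np.int8).itemsize
print(bytes_before, bytes_after)
```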

In [25]:
df.drop('timestamp', axis=1, inplace=True)
df

Unnamed: 0,asin,user,rating
0,0,0,2
1,0,1,5
2,0,2,5
3,0,3,5
4,0,4,5
...,...,...,...
574623,6012,194479,5
574624,6012,175357,5
574625,6012,416172,5
574626,12118,416173,1


In [27]:
df.drop_duplicates(inplace=True)
df

Unnamed: 0,asin,user,rating
0,0,0,2
1,0,1,5
2,0,2,5
3,0,3,5
4,0,4,5
...,...,...,...
574623,6012,194479,5
574624,6012,175357,5
574625,6012,416172,5
574626,12118,416173,1
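
Note that `drop_duplicates()` without arguments removes only rows that match in every column. If a user rated the same product twice with different scores, both rows survive. Whether that happens in this dataset is not checked here; the toy frame below only illustrates the difference, and the `subset=` variant is one possible stricter rule, not the one used in this notebook.

```python
import pandas as pd

# Toy ratings: user 25 rated item 0 twice with the same score (exact duplicate),
# and user 48 rated item 0 twice with different scores (conflicting pair)
toy = pd.DataFrame({'asin': [0, 0, 0, 0],
                    'user': [25, 25, 48, 48],
                    'rating': [5, 5, 5, 3]})

# What the cell above does: drop rows identical in every column
exact = toy.drop_duplicates()

# Stricter alternative: keep one row per (user, asin) pair, here the last one
per_pair = toy.drop_duplicates(subset=['user', 'asin'], keep='last')

print(len(exact), len(per_pair))
```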


In [28]:
df.dtypes

asin      int32
user      int32
rating     int8
dtype: object

In [29]:
df['asin'].nunique()

12120

In [30]:
df['user'].nunique()

416174

In [31]:
# df.to_csv(r'data/Luxury_Beauty_reduced.csv', index=False)

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to model the data


In [32]:
%pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1617601 sha256=0358eb5b23d056f1b6e1ea394cc940742b389547632d8c5e51c078ae7c26576a
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [33]:
from surprise import Dataset, Reader
from surprise import accuracy
from surprise.prediction_algorithms import knns
from surprise.similarities import cosine, msd, pearson
from surprise.model_selection import cross_validate, train_test_split
from surprise.prediction_algorithms import SVD
from surprise.model_selection import GridSearchCV

In [34]:
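# surprise expects the columns in (user, item, rating) order
data = df[['user', 'asin', 'rating']]
# Only rating_scale matters when loading from a DataFrame; ratings here run from 1 to 5
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data, reader=reader)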

In [35]:
trainset, testset= train_test_split(data, test_size=0.25, random_state=42)

In [36]:
testset

[(188795, 2024, 1.0),
 (311026, 4453, 5.0),
 (16548, 200, 5.0),
 (256627, 3212, 5.0),
 (263708, 3354, 5.0),
 (9734, 44, 5.0),
 (410889, 5538, 4.0),
 (39708, 271, 5.0),
 (124912, 3208, 5.0),
 (262784, 3327, 5.0),
 (200905, 2194, 5.0),
 (210733, 2376, 5.0),
 (203708, 2234, 5.0),
 (255719, 3208, 1.0),
 (13935, 2254, 1.0),
 (297383, 4088, 1.0),
 (531, 5552, 4.0),
 (207075, 2300, 5.0),
 (330737, 5080, 5.0),
 (340, 3868, 5.0),
 (42474, 272, 5.0),
 (267228, 3436, 1.0),
 (346456, 5603, 1.0),
 (290318, 3898, 5.0),
 (371574, 1604, 2.0),
 (21658, 129, 5.0),
 (111074, 1039, 5.0),
 (166514, 1769, 5.0),
 (356537, 5972, 5.0),
 (42651, 272, 3.0),
 (60984, 443, 5.0),
 (45118, 299, 5.0),
 (120528, 1233, 5.0),
 (273370, 3576, 3.0),
 (415031, 5911, 5.0),
 (280341, 3728, 5.0),
 (409937, 11159, 5.0),
 (408430, 5317, 5.0),
 (345435, 5544, 1.0),
 (175310, 3494, 5.0),
 (237293, 2839, 5.0),
 (154660, 4342, 5.0),
 (118028, 1194, 4.0),
 (53847, 379, 5.0),
 (234990, 2805, 5.0),
 (164117, 1752, 5.0),
 (63655, 478, 

## KNN Basic

In [37]:
KNN_model= knns.KNNBasic(sim_options={'name': 'cosine', 'user_based': False}).fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


In [38]:
cross_validate(KNN_model, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2615  1.2647  1.2633  1.2566  1.2644  1.2621  0.0030  
MAE (testset)     0.9386  0.9408  0.9391  0.9352  0.9401  0.9388  0.0019  
Fit time          6.37    6.52    6.29    6.48    6.26    6.39    0.10    
Test time         0.92    0.93    0.93    0.93    0.95    0.93    0.01    


{'fit_time': (6.366328239440918,
  6.5206944942474365,
  6.291171073913574,
  6.482555627822876,
  6.264395713806152),
 'test_mae': array([0.93862232, 0.94078812, 0.9391082 , 0.93520325, 0.94006197]),
 'test_rmse': array([1.26145607, 1.26467147, 1.26333156, 1.25661226, 1.26444157]),
 'test_time': (0.9239859580993652,
  0.9295411109924316,
  0.9254364967346191,
  0.9261338710784912,
  0.9491074085235596)}

In [39]:
KNN_model2= knns.KNNBasic(sim_options={'name': 'msd', 'user_based': False}).fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


In [40]:
cross_validate(KNN_model2, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2629  1.2654  1.2597  1.2589  1.2657  1.2625  0.0028  
MAE (testset)     0.9393  0.9400  0.9379  0.9378  0.9396  0.9389  0.0009  
Fit time          4.94    4.92    5.05    5.03    5.10    5.01    0.07    
Test time         0.92    0.92    0.91    0.93    0.92    0.92    0.01    


{'fit_time': (4.940540790557861,
  4.923975467681885,
  5.0526442527771,
  5.032916069030762,
  5.103760719299316),
 'test_mae': array([0.93931173, 0.9400327 , 0.93788005, 0.93776529, 0.93962971]),
 'test_rmse': array([1.26285026, 1.26537035, 1.25970674, 1.25887806, 1.26567356]),
 'test_time': (0.9221189022064209,
  0.9210367202758789,
  0.9116344451904297,
  0.9341559410095215,
  0.9156935214996338)}

In [41]:
KNN_model3= knns.KNNBasic(sim_options={'name': 'pearson', 'user_based': False}).fit(trainset)

Computing the pearson similarity matrix...
Done computing similarity matrix.


In [42]:
cross_validate(KNN_model3, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2608  1.2595  1.2603  1.2656  1.2552  1.2603  0.0033  
MAE (testset)     0.9578  0.9563  0.9559  0.9614  0.9544  0.9572  0.0024  
Fit time          7.87    7.57    7.46    7.43    7.46    7.56    0.16    
Test time         0.90    0.91    0.91    0.91    0.92    0.91    0.01    


{'fit_time': (7.870452404022217,
  7.5726964473724365,
  7.460343837738037,
  7.4263389110565186,
  7.45734977722168),
 'test_mae': array([0.9577557 , 0.95626984, 0.95593341, 0.9613597 , 0.95444794]),
 'test_rmse': array([1.26082074, 1.25945167, 1.26032358, 1.26560771, 1.25520061]),
 'test_time': (0.9031574726104736,
  0.9138782024383545,
  0.910822868347168,
  0.9138634204864502,
  0.924767255783081)}

In [43]:
KNN_model4= knns.KNNBasic(sim_options={'name': 'pearson_baseline', 'user_based': False}).fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [44]:
cross_validate(KNN_model4, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2603  1.2621  1.2540  1.2643  1.2558  1.2593  0.0039  
MAE (testset)     0.9503  0.9517  0.9467  0.9526  0.9483  0.9499  0.0022  
Fit time          7.77    7.66    7.43    6.65    6.56    7.21    0.51    
Test time         0.83    0.81    0.88    0.87    0.88    0.86    0.03    


{'fit_time': (7.771963596343994,
  7.655634164810181,
  7.427996397018433,
  6.652855157852173,
  6.559553146362305),
 'test_mae': array([0.95034818, 0.95172215, 0.94665873, 0.95260601, 0.94827692]),
 'test_rmse': array([1.26026515, 1.26211865, 1.25395268, 1.26433715, 1.25577056]),
 'test_time': (0.8348603248596191,
  0.8060896396636963,
  0.8804202079772949,
  0.8711402416229248,
  0.883711576461792)}

## KNN With Means

In [45]:
KNN_model= knns.KNNWithMeans(sim_options={'name': 'cosine', 'user_based': False}).fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


In [46]:
cross_validate(KNN_model, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2613  1.2627  1.2615  1.2533  1.2602  1.2598  0.0034  
MAE (testset)     0.9431  0.9446  0.9436  0.9393  0.9432  0.9428  0.0018  
Fit time          4.93    6.46    6.50    7.11    7.03    6.40    0.79    
Test time         1.00    1.01    1.00    0.98    0.98    0.99    0.01    


{'fit_time': (4.925355911254883,
  6.457956075668335,
  6.499936819076538,
  7.112385272979736,
  7.027133941650391),
 'test_mae': array([0.94310686, 0.94457577, 0.94359825, 0.93926055, 0.94324548]),
 'test_rmse': array([1.26129118, 1.2627159 , 1.26151527, 1.25326047, 1.26024412]),
 'test_time': (0.9963021278381348,
  1.0102460384368896,
  0.999931812286377,
  0.9765207767486572,
  0.9803135395050049)}

In [47]:
KNN_model2= knns.KNNWithMeans(sim_options={'name': 'msd', 'user_based': False}).fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


In [48]:
cross_validate(KNN_model2, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2564  1.2647  1.2580  1.2636  1.2619  1.2609  0.0032  
MAE (testset)     0.9403  0.9457  0.9406  0.9448  0.9449  0.9433  0.0023  
Fit time          5.41    5.39    5.44    5.28    5.55    5.42    0.09    
Test time         1.00    0.98    0.99    0.97    0.94    0.98    0.02    


{'fit_time': (5.40801477432251,
  5.394423246383667,
  5.444626808166504,
  5.280726432800293,
  5.55126953125),
 'test_mae': array([0.94030681, 0.94572333, 0.94063084, 0.94480617, 0.94490126]),
 'test_rmse': array([1.25643469, 1.26468233, 1.25804419, 1.26361476, 1.26193457]),
 'test_time': (1.0006492137908936,
  0.9804389476776123,
  0.9915900230407715,
  0.9654686450958252,
  0.9432570934295654)}

In [49]:
KNN_model3= knns.KNNWithMeans(sim_options={'name': 'pearson', 'user_based': False}).fit(trainset)

Computing the pearson similarity matrix...
Done computing similarity matrix.


In [50]:
cross_validate(KNN_model3, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2602  1.2632  1.2578  1.2586  1.2631  1.2606  0.0022  
MAE (testset)     0.9552  0.9567  0.9532  0.9537  0.9586  0.9555  0.0020  
Fit time          9.90    7.75    7.99    7.88    7.99    8.30    0.80    
Test time         0.93    0.80    0.92    0.92    0.92    0.90    0.05    


{'fit_time': (9.902523040771484,
  7.745898723602295,
  7.994913101196289,
  7.883168458938599,
  7.990584135055542),
 'test_mae': array([0.95519745, 0.95668861, 0.95315636, 0.95371527, 0.95859716]),
 'test_rmse': array([1.26019284, 1.26321081, 1.25784184, 1.25862082, 1.26306489]),
 'test_time': (0.9307336807250977,
  0.8015198707580566,
  0.9222090244293213,
  0.9159443378448486,
  0.9176650047302246)}

In [51]:
KNN_model4= knns.KNNWithMeans(sim_options={'name': 'pearson_baseline', 'user_based': False}).fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [52]:
cross_validate(KNN_model4, data, verbose= True, n_jobs=-1)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2641  1.2597  1.2610  1.2567  1.2556  1.2594  0.0031  
MAE (testset)     0.9525  0.9503  0.9526  0.9485  0.9476  0.9503  0.0020  
Fit time          7.66    7.70    7.36    7.74    7.76    7.64    0.15    
Test time         0.84    0.83    0.83    0.83    0.82    0.83    0.01    


{'fit_time': (7.6642844676971436,
  7.701639890670776,
  7.3556365966796875,
  7.735780715942383,
  7.758908987045288),
 'test_mae': array([0.9524784 , 0.95030535, 0.95259456, 0.94849835, 0.9476131 ]),
 'test_rmse': array([1.26414569, 1.2597381 , 1.26095924, 1.25669364, 1.25555429]),
 'test_time': (0.8410019874572754,
  0.8272409439086914,
  0.8289005756378174,
  0.8319263458251953,
  0.8213996887207031)}

## SVD

In [53]:
svd = SVD()

In [54]:
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f02857b6650>

In [55]:
predictions= svd.test(testset)
accuracy.rmse(predictions)

RMSE: 1.2392


1.2392145851014515

In [56]:
accuracy.mae(predictions)

MAE:  0.9536


0.9535933319656562
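
For reference, RMSE and MAE are both averages over the prediction errors: RMSE squares each error before averaging and then takes the square root, while MAE averages the absolute errors. The sketch below computes both by hand on hypothetical (true, estimated) pairs that are not taken from the model above; `rmse_mae` is a helper introduced only for this illustration.

```python
import math

def rmse_mae(pairs):
    """Return (RMSE, MAE) for a list of (true_rating, estimated_rating) pairs."""
    errors = [true - est for true, est in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Hypothetical predictions, not taken from the model above
toy_pairs = [(5.0, 4.0), (1.0, 3.0), (4.0, 4.0), (5.0, 5.0)]
rmse, mae = rmse_mae(toy_pairs)
print(round(rmse, 4), round(mae, 4))
```

Because squaring weights large misses more heavily, the two metrics can rank models differently. That is one plausible reading of why the default SVD shows a lower RMSE but a higher MAE than the KNN models above, although those KNN scores come from cross-validation and the SVD scores from a single test split.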

In [57]:
param_grid = {'n_factors':[110, 130],'n_epochs': [25, 30], 'lr_all': [0.025, 0.05],
              'reg_all': [0.1, 0.2]}
gs_model = GridSearchCV(SVD,param_grid=param_grid,joblib_verbose=5, n_jobs=-1)
gs_model.fit(data)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done  18 out of  80 | elapsed:  4.0min remaining: 13.7min
[Parallel(n_jobs=-1)]: Done  35 out of  80 | elapsed:  7.3min remaining:  9.4min
[Parallel(n_jobs=-1)]: Done  52 out of  80 | elapsed: 10.5min remaining:  5.6min
[Parallel(n_jobs=-1)]: Done  69 out of  80 | elapsed: 13.8min remaining:  2.2min
[Parallel(n_jobs=-1)]: Done  80 out of  80 | elapsed: 15.8min finished
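
The "80" in the log above is the number of model fits: every combination of grid values is fitted once per cross-validation fold. A small sketch of that arithmetic, assuming 5 folds (the value consistent with the log; `n_folds` is a name introduced here):

```python
from itertools import product

# First grid from the cell above
param_grid = {'n_factors': [110, 130], 'n_epochs': [25, 30],
              'lr_all': [0.025, 0.05], 'reg_all': [0.1, 0.2]}

# Every combination of one value per parameter
combos = list(product(*param_grid.values()))

# Assumption: 5 cross-validation folds, which matches the "80 out of 80" log line
n_folds = 5
total_fits = len(combos) * n_folds
print(len(combos), total_fits)
```

Adding one more value to any parameter list multiplies the number of fits, so narrowing the grid between rounds keeps the run time manageable.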


In [58]:
gs_model.best_params

{'mae': {'lr_all': 0.05, 'n_epochs': 30, 'n_factors': 110, 'reg_all': 0.1},
 'rmse': {'lr_all': 0.025, 'n_epochs': 30, 'n_factors': 130, 'reg_all': 0.1}}

In [59]:
# use best params
svd = SVD(n_factors=130, n_epochs=30, lr_all=0.025, reg_all=0.1)
svd.fit(trainset)
predictions = svd.test(testset)
print(accuracy.rmse(predictions))

RMSE: 1.2237
1.223711651813143


In [60]:
print(accuracy.mae(predictions))

MAE:  0.9316
0.9315987932280569


In [61]:
param_grid = {'n_factors':[130, 150],'n_epochs': [30, 40], 'lr_all': [0.01, 0.025],
              'reg_all': [0.05, 0.1]}
gs_model = GridSearchCV(SVD,param_grid=param_grid,joblib_verbose=5, n_jobs=-1)
gs_model.fit(data)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done  18 out of  80 | elapsed:  4.2min remaining: 14.6min
[Parallel(n_jobs=-1)]: Done  35 out of  80 | elapsed:  7.7min remaining:  9.9min
[Parallel(n_jobs=-1)]: Done  52 out of  80 | elapsed: 10.9min remaining:  5.9min
[Parallel(n_jobs=-1)]: Done  69 out of  80 | elapsed: 14.4min remaining:  2.3min
[Parallel(n_jobs=-1)]: Done  80 out of  80 | elapsed: 16.5min finished


In [62]:
gs_model.best_params

{'mae': {'lr_all': 0.025, 'n_epochs': 40, 'n_factors': 130, 'reg_all': 0.05},
 'rmse': {'lr_all': 0.025, 'n_epochs': 40, 'n_factors': 150, 'reg_all': 0.1}}

In [63]:
# use best params
svd = SVD(n_factors=150, n_epochs=40, lr_all=0.025, reg_all=0.1)
svd.fit(trainset)
predictions = svd.test(testset)
print(accuracy.rmse(predictions))
print(accuracy.mae(predictions))

RMSE: 1.2229
1.2229478827171454
MAE:  0.9290
0.928993616071036


In [64]:
param_grid = {'n_factors':[150, 200],'n_epochs': [40, 50], 'lr_all': [0.025],
              'reg_all': [0.1]}
gs_model = GridSearchCV(SVD,param_grid=param_grid,joblib_verbose=5, n_jobs=-1)
gs_model.fit(data)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done   6 out of  20 | elapsed:  2.4min remaining:  5.6min
[Parallel(n_jobs=-1)]: Done  11 out of  20 | elapsed:  3.4min remaining:  2.8min
[Parallel(n_jobs=-1)]: Done  16 out of  20 | elapsed:  4.7min remaining:  1.2min
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:  5.5min finished


In [65]:
gs_model.best_params

{'mae': {'lr_all': 0.025, 'n_epochs': 50, 'n_factors': 150, 'reg_all': 0.1},
 'rmse': {'lr_all': 0.025, 'n_epochs': 50, 'n_factors': 150, 'reg_all': 0.1}}

In [67]:
# use best params
svd = SVD(lr_all=0.025, n_epochs=50, n_factors=150, reg_all=0.1)
svd.fit(trainset)
predictions = svd.test(testset)
print(accuracy.rmse(predictions))
print(accuracy.mae(predictions))

RMSE: 1.2224
1.2224346375151383
MAE:  0.9264
0.9263962795321625


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Making Recommendations

In [249]:
meta_df = pd.read_json('data/meta_Luxury_Beauty.json.gz', lines=True)
meta_df

Unnamed: 0,category,tech1,description,fit,title,also_buy,tech2,brand,feature,rank,also_view,details,main_cat,similar_item,date,price,asin,imageURL,imageURLHighRes
0,[],,"[After a long day of handling thorny situations, our new hand therapy pump is just the help you ...",,Crabtree &amp; Evelyn - Gardener's Ultra-Moisturising Hand Therapy Pump - 250g/8.8 OZ,"[B00GHX7H0A, B00FRERO7G, B00R68QXCS, B000Z65AZE, B07GFHJRMX, B074KGBGL7, B00R68QXJG, B00025WYZC,...",,,[],"4,324 in Beauty & Personal Care (","[B00FRERO7G, B00GHX7H0A, B07GFHJRMX, B00TJ3NBN2, B00KOBT82G, B00R68QXCS, B074KGBGL7, B075MH4Q9L,...","{'  Product Dimensions: ': '2.2 x 2.2 x 7 inches ; 8.8 ounces', 'Shipping Weight:': '14....",Luxury Beauty,,NaT,$30.00,B00004U9V2,"[https://images-na.ssl-images-amazon.com/images/I/41ClX6BRvZL._SX50_SY65_CR,0,0,50,65_.jpg, http...","[https://images-na.ssl-images-amazon.com/images/I/41ClX6BRvZL.jpg, https://images-na.ssl-images-..."
1,[],,"[If you haven't experienced the pleasures of bathing in the Dead Sea, Bath Crystals are the next...",,AHAVA Bath Salts,[],,,[],"1,633,549 in Beauty & Personal Care (",[],"{'  Product Dimensions: ': '3 x 3.5 x 6 inches ; 2.2 pounds', 'Shipping Weight:': '2.6 p...",Luxury Beauty,,NaT,,B0000531EN,[],[]
2,[],,"[Rich, black mineral mud, harvested from the banks of the Dead Sea, is comprised of layer upon l...",,"AHAVA Dead Sea Mineral Mud, 8.5 oz, Pack of 4",[],,,[],"1,806,710 in Beauty &amp; Personal Care (",[],"{'  Product Dimensions: ': '5.1 x 3 x 5.5 inches ; 2.48 pounds', 'Shipping Weight:': '2....",Luxury Beauty,,NaT,,B0000532JH,"[https://images-na.ssl-images-amazon.com/images/I/41O1luEZuHL._SX50_SY65_CR,0,0,50,65_.jpg]",[https://images-na.ssl-images-amazon.com/images/I/41O1luEZuHL.jpg]
3,[],,[This liquid soap with convenient pump dispenser is formulated with conditioning extracts of sag...,,"Crabtree &amp; Evelyn Hand Soap, Gardeners, 10.1 fl. oz.",[],,,[],[],"[B00004U9V2, B00GHX7H0A, B00FRERO7G, B00R68QXCS, B00KOBT82G, B071G8FG2N, B07FYFXBK8, B00TJ3NBN2,...","{'  Product Dimensions: ': '2.6 x 2.6 x 6.7 inches ; 1.5 pounds', 'Shipping Weight:': '1...",Luxury Beauty,,NaT,$15.99,B00005A77F,"[https://images-na.ssl-images-amazon.com/images/I/31BBeRbXZsL._SX50_SY65_CR,0,0,50,65_.jpg, http...","[https://images-na.ssl-images-amazon.com/images/I/31BBeRbXZsL.jpg, https://images-na.ssl-images-..."
4,[],,"[Remember why you love your favorite blanket? The soft, comforting feeling of wrapping it around...",,Soy Milk Hand Crme,"[B000NZT6KM, B001BY229Q, B008J724QY, B0009YGKJ2, B001JB55SQ, B000M3OR7C, B00J0A3ZCQ, B00SKBJ4L2,...",,,[],"42,464 in Beauty &amp; Personal Care (",[],"{'  Product Dimensions: ': '7.2 x 2.2 x 7.2 inches ; 4 ounces', 'Shipping Weight:': '7.2...",Luxury Beauty,,NaT,$18.00,B00005NDTD,"[https://images-na.ssl-images-amazon.com/images/I/31agMAVCHtL._SX50_SY65_CR,0,0,50,65_.jpg, http...","[https://images-na.ssl-images-amazon.com/images/I/31agMAVCHtL.jpg, https://images-na.ssl-images-..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12294,[],,"[, CND Craft Culture Collection: Patina Buckle, Discover the beauty of artisanal design. Distres...",,"CND Shellac Power Polish, Patina Buckle","[B003ONLAXQ, B00YDEZ9T6, B074KHRD13, B00R3PZK14, B074KJZJYW, B01KTKO4CU, B01MT91G4R, B00DP64TLM,...",,,[],"88,740 in Beauty & Personal Care (","[B00D2VMUA2, B074KJZJYW, B074KHRD13, B073SB9JWB, B00R3PZK14, B0721YJ13B, B01KTKO4CU, B00EFGDYZS,...","{'  Item Weight: ': '0.48 ounces', 'Shipping Weight:': '1.4 ounces (', 'Domestic Shippin...",Luxury Beauty,,NaT,$15.95,B01HIQIEYC,[],[]
12295,[],,"[CND Shellac was designed to be used as a system. Featuring a Base Coat, Color Coat, and Top Coa...",,CND Shellac power polish denim patch,"[B003ONLAXQ, B003OH0KBA, B004LEMWGG, B01MT91G4R, B00AAV7H14, B074KBT2NM, B004N2SQUC, B00DP64TLM,...",,,[],"122,331 in Beauty & Personal Care (","[B00D2VMUA2, B01L0EV8X2, B004LEMWGG, B00EFGDYZS, B074KHRD13, B00R3PZK14, B074KJZJYW, B074KBT2NM,...","{'Shipping Weight:': '1.4 ounces (', 'ASIN:': 'B01HIQHQU0', 'Item model number:': 'C40625'}",Luxury Beauty,,NaT,$15.95,B01HIQHQU0,[],[]
12296,[],,"[CND Shellac was designed to be used as a system. Featuring a Base Coat, Color Coat, and Top Coa...",,"CND Shellac, Leather Satchel","[B003ONLAXQ, B003OH0KBA, B004LEMWGG, B01MT91G4R, B00AAV7H14, B074KBT2NM, B004N2SQUC, B00DP64TLM,...",,,[],"168,028 in Beauty & Personal Care (","[B00D2VMUA2, B01L0EV8X2, B004LEMWGG, B00EFGDYZS, B074KHRD13, B00R3PZK14, B074KJZJYW, B074KBT2NM,...","{'Shipping Weight:': '1.4 ounces (', 'Domestic Shipping: ': 'Item can be shipped within U.S.', '...",Luxury Beauty,,NaT,$15.95,B01HIQEOLO,"[https://images-na.ssl-images-amazon.com/images/I/41epzK1J%2BXL._SX50_SY65_CR,0,0,50,65_.jpg]",[https://images-na.ssl-images-amazon.com/images/I/41epzK1J%2BXL.jpg]
12297,[],,[The I AM JUICY COUTURE girl is once again taking a strong stance by declaring her love for the ...,,"Juicy Couture I Love Juicy Couture, 1.7 fl. Oz., perfume for women",[],,,[],"490,755 in Beauty & Personal Care (","[B0757439SY, B01HJ2UY1G, B01KX3TK7C, B01LX71LJV, B07K1Y92VL, B07GBSC3L2, B00ZCFJE7I, B076LKLB5G,...","{'  Product Dimensions: ': '3.3 x 2.7 x 4.6 inches', 'Shipping Weight:': '8 ounces (', '...",Luxury Beauty,,NaT,$76.00,B01HJ2UY0W,"[https://images-na.ssl-images-amazon.com/images/I/51vValOSv9L._SX50_SY65_CR,0,0,50,65_.jpg, http...","[https://images-na.ssl-images-amazon.com/images/I/51vValOSv9L.jpg, https://images-na.ssl-images-..."


In [250]:
pd.options.display.max_colwidth = 100

In [251]:
meta_df = meta_df[['asin', 'title']]
meta_df

Unnamed: 0,asin,title
0,B00004U9V2,Crabtree &amp; Evelyn - Gardener's Ultra-Moisturising Hand Therapy Pump - 250g/8.8 OZ
1,B0000531EN,AHAVA Bath Salts
2,B0000532JH,"AHAVA Dead Sea Mineral Mud, 8.5 oz, Pack of 4"
3,B00005A77F,"Crabtree &amp; Evelyn Hand Soap, Gardeners, 10.1 fl. oz."
4,B00005NDTD,Soy Milk Hand Crme
...,...,...
12294,B01HIQIEYC,"CND Shellac Power Polish, Patina Buckle"
12295,B01HIQHQU0,CND Shellac power polish denim patch
12296,B01HIQEOLO,"CND Shellac, Leather Satchel"
12297,B01HJ2UY0W,"Juicy Couture I Love Juicy Couture, 1.7 fl. Oz., perfume for women"


In [252]:
meta_df[meta_df['title'].str.contains("medicated", case=False, na=False)].head(200)

Unnamed: 0,asin,title
8579,B00LO1DNXU,"La Roche-Posay Effaclar Medicated Gel Acne Cleanser, 6.76 Fl. Oz."
8591,B00LRWAZU0,"IMAGE Skincare Clear Cell Medicated Acne Lotion, 1.7 oz."
9639,B00T57UPNQ,"IMAGE Skincare Clear Cell Medicated Acne Masque, 2 oz."


In [290]:
meta_df[meta_df['title'].str.contains("crabtree", case=False, na=False)].head(60)

Unnamed: 0,asin,title
0,B00004U9V2,Crabtree &amp; Evelyn - Gardener's Ultra-Moisturising Hand Therapy Pump - 250g/8.8 OZ
3,B00005A77F,"Crabtree &amp; Evelyn Hand Soap, Gardeners, 10.1 fl. oz."
126,B00025WYZC,"Crabtree &amp; Evelyn Body Lotion, Rosewater, 16.9 fl oz"
188,B00004U9V2,Crabtree &amp; Evelyn - Gardener's Ultra-Moisturising Hand Therapy Pump - 250g/8.8 OZ
191,B00005A77F,"Crabtree &amp; Evelyn Hand Soap, Gardeners, 10.1 fl. oz."
314,B00025WYZC,"Crabtree &amp; Evelyn Body Lotion, Rosewater, 16.9 fl oz"
1146,B000MNJMDG,"Crabtree &amp; Evelyn Bath and Shower Gel, 16.9, fl. oz."
1333,B000Q2Y0QM,"Crabtree &amp; Evelyn Bath and Shower Gel, 8.5 fl. oz."
1334,B000Q2Y0QC,"Crabtree &amp; Evelyn Conditioning Bath and Massage Oil, Jojoba Oil, 6.8 fl. oz."
1336,B000Q2ZPL6,Crabtree &amp; Evelyn Triple Milled Soap Set
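
The output above shows two data-quality issues in the metadata: some ASINs appear more than once (for example rows 0 and 188), and titles contain HTML entities such as `&amp;`. One possible cleanup is sketched below on toy rows with shortened titles. `toy_meta` and `clean_meta` are names introduced here, and the sketch assumes no missing titles (`html.unescape` expects a string).

```python
import html
import pandas as pd

# Toy rows shaped like the output above: one ASIN repeats, titles contain &amp;
toy_meta = pd.DataFrame({
    'asin': ['B00004U9V2', 'B00005A77F', 'B00004U9V2'],
    'title': ['Crabtree &amp; Evelyn Hand Therapy',
              'Crabtree &amp; Evelyn Hand Soap',
              'Crabtree &amp; Evelyn Hand Therapy'],
})

# Keep one row per ASIN, then decode HTML entities such as &amp; -> &
clean_meta = toy_meta.drop_duplicates(subset='asin').copy()
clean_meta['title'] = clean_meta['title'].map(html.unescape)

print(clean_meta['title'].tolist())
```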


In [139]:
meta_df[meta_df['title'].str.contains("toleriane", case=False, na=False)].head(60)

Unnamed: 0,asin,title
966,B000IO6NFE,La Roche-Posay Toleriane Fluide Soothing Protective Moisturizer
969,B000IOBEQ2,"La Roche-Posay Toleriane Soothing Protective Moisturizer, 1.35 Fl. Oz."
971,B000IOBEG2,"La Roche-Posay Toleriane Dermo Cleanser and Makeup Remover, 6.76 Fl. Oz."
972,B000IOBETE,"La Roche-Posay Toleriane Purifying Foaming Cream Cleanser, 4.22 Fl. Oz."
2234,B001AMBAIS,"La Roche-Posay Toleriane Riche Soothing Protective Moisturizer, 1.35 Fl. Oz."
2697,B001NZ1OWO,"La Roche-Posay Toleriane Teint Color Correcting Concealer Pen, 0.35 Fl. Oz."
3876,B004JKNYL4,"La Roche-Posay Toleriane Ultra Intense Soothing Moisturizer, 1.35-Fluid Ounce"
5881,B00B95PWYE,"La Roche-Posay Toleriane Teint Mattifying Mousse Foundation, 1 Fl. Oz."
5882,B00B95PWFS,"La Roche-Posay Toleriane Teint Water-Cream Liquid Foundation, 1 Fl. Oz."
9943,B00VGCXN0U,"La Roche-Posay Toleriane Ultra Soothing Eye Cream for Very Sensitive Eyes, 0.66 Fl. Oz."


In [220]:
meta_df[meta_df['title'].str.contains(" men", case=False, na=False)].head(60)

Unnamed: 0,asin,title
67,B0001XDUBC,"Anthony Logistics for Men Facial Moisturizer SPF 15, 2.5 oz."
68,B0001XDU94,"Anthony Logistics for Men Body Cleansing Gel Citrus, 8 fl. oz."
80,B00021AKJI,"Calvin Klein ETERNITY for Men Eau de Toilette, 3.4 fl. oz."
112,B00021E4AE,"Anthony Logistics for Men Body Cleansing Gel, Citrus, 8 fl. oz."
113,B00021E4J0,"Anthony Logistics for Men Shave Gel, 6 Oz"
122,B0002279MS,Anthony Logistics for Men Pre-shave Oil
134,B0002CHLW6,Anthony Logistics for Men Everyday Shampoo
137,B0002COKF2,Anthiony Logistics for Men Vitamin C Facial Serum (Anti-Aging)
147,B0002MES86,Lacoste Style in Play Eau de Toilette for Men
255,B0001XDUBC,"Anthony Logistics for Men Facial Moisturizer SPF 15, 2.5 oz."


In [None]:
# Products the new user will rate 5 stars (asin, title)
# B000NHZSKC     Paul Mitchell Tea Tree Lemon Sage Thickening Shampoo and Conditioner Set, 33.8 oz
# B00LO1DNXU     La Roche-Posay Effaclar Medicated Gel Acne Cleanser, 6.76 Fl. Oz.
# B000IOBEQ2     La Roche-Posay Toleriane Soothing Protective Moisturizer, 1.35 Fl. Oz.
# B000C2148S     Boss In Motion By Hugo Boss For Men. Eau De Toilette Spray 3 Ounces

In [146]:
df['user'].sort_values().tail()

574619    416169
574620    416170
574621    416171
574625    416172
574626    416173
Name: user, dtype: int32

In [221]:
display(asin_map['B000NHZSKC'], asin_map['B00LO1DNXU'], asin_map['B000IOBEQ2'],
        asin_map['B000C2148S'])

655

4512

521

333

In [222]:
# New user id 416174 is one past the largest existing id (416173).
# Ratings are stored as floats to match the dtype of the existing rating column.
my_ratings = [{'user': 416174, 'asin': 655, 'rating': 5.0},
              {'user': 416174, 'asin': 4512, 'rating': 5.0},
              {'user': 416174, 'asin': 521, 'rating': 5.0},
              {'user': 416174, 'asin': 333, 'rating': 5.0}]

In [223]:
my_ratings

[{'user': 416174, 'asin': 655, 'rating': 5.0},
 {'user': 416174, 'asin': 4512, 'rating': 5.0},
 {'user': 416174, 'asin': 521, 'rating': 5.0},
 {'user': 416174, 'asin': 333, 'rating': 5.0}]
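
The hand-typed ids above can also be derived from the raw ASINs. The following is a minimal sketch, not the notebook's actual code: `build_ratings` and `existing_user_ids` are hypothetical names, and the small `asin_map` dictionary only reproduces the four lookups displayed earlier. It builds equivalent rows with the ratings stored as floats and picks the new user id as one past the largest existing id.

```python
# Hypothetical slice of the notebook's asin_map (raw ASIN -> integer id);
# the four values match the lookups displayed above.
asin_map = {
    'B000NHZSKC': 655,
    'B00LO1DNXU': 4512,
    'B000IOBEQ2': 521,
    'B000C2148S': 333,
}

# Stand-in for df['user']; in the notebook the largest id is 416173.
existing_user_ids = [416171, 416172, 416173]


def build_ratings(liked_asins, asin_map, existing_user_ids, rating=5.0):
    """Return rating rows for a new user whose id is max(existing) + 1."""
    new_user_id = max(existing_user_ids) + 1
    return [
        {'user': new_user_id, 'asin': asin_map[asin], 'rating': float(rating)}
        for asin in liked_asins
    ]


my_ratings = build_ratings(list(asin_map), asin_map, existing_user_ids)
```

Deriving the ids this way avoids transcription mistakes when the list of liked products changes.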

In [224]:
# Add the new ratings to the original ratings DataFrame.
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
new_ratings_df = pd.concat([df, pd.DataFrame(my_ratings)], ignore_index=True)
new_data = Dataset.load_from_df(new_ratings_df, reader)

In [225]:
# train a model using the new combined DataFrame
svd_ = SVD(lr_all=0.025, n_epochs=50, n_factors=150, reg_all=0.1)
svd_.fit(new_data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7ff4243b40d0>

In [226]:
# Predict a rating for every product for the new user,
# collecting (asin, predicted_rating) tuples
list_of_products = []
for asin in df['asin'].unique():
    # .est is the estimated rating on the Prediction returned by predict()
    list_of_products.append((asin, svd_.predict(416174, asin).est))

In [227]:
list_of_products

[(0, 2.88702111462112),
 (1, 4.474598631227732),
 (2, 4.528632141620819),
 (3, 4.512569996045508),
 (4, 4.460050778943761),
 (5, 4.47965898194238),
 (6, 4.521096347185239),
 (7, 4.511488108443416),
 (8, 4.572431055019959),
 (9, 4.474871598879756),
 (10, 4.540256047921031),
 (11, 4.4031921525345785),
 (12, 4.4271261873419565),
 (13, 4.418681918070362),
 (14, 4.610899742688031),
 (15, 3.9318564602918196),
 (16, 4.401784359562941),
 (17, 4.471199591576272),
 (18, 4.473700188050436),
 (19, 2.323060856955598),
 (20, 4.423604395356111),
 (21, 4.443387492719955),
 (22, 4.435886607703005),
 (23, 1.9673420988940227),
 (24, 4.472534492239102),
 (25, 4.443556319521219),
 (26, 4.506469878454968),
 (27, 4.411905066015681),
 (28, 3.4763835624150063),
 (29, 3.979636190110112),
 (30, 2.4450361501872075),
 (31, 4.476649039741968),
 (32, 4.491545500958249),
 (33, 4.434640276273312),
 (34, 4.531824887206712),
 (35, 4.477191084842422),
 (36, 4.53126658783703),
 (37, 4.515007382313729),
 (38, 3.95650116890

In [228]:
# order the predictions from highest to lowest rated
ranked_products = sorted(list_of_products, key=lambda x:x[1], reverse=True)
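
A full sort ranks every product, including the ones the new user has already rated. As a hedged sketch under assumed data (the `predictions` values and the `top_n_unrated` helper are hypothetical, not taken from the notebook), the snippet below drops already-rated ids and keeps only the top `n` with `heapq.nlargest`, which avoids sorting the whole list when only a short recommendation list is needed.

```python
import heapq

# Hypothetical (asin_id, predicted_rating) pairs standing in for list_of_products
predictions = [(0, 2.89), (1, 4.47), (333, 4.95), (655, 4.99), (14, 4.61), (8, 4.57)]

# Integer ids the new user has already rated (two of the ids in my_ratings)
already_rated = {655, 333}


def top_n_unrated(predictions, already_rated, n=10):
    """Return the n highest-scoring predictions for items not yet rated."""
    candidates = (p for p in predictions if p[0] not in already_rated)
    return heapq.nlargest(n, candidates, key=lambda p: p[1])


top_products = top_n_unrated(predictions, already_rated, n=3)
```

Filtering first matters for a recommender: products the user has already rated 5 stars would otherwise tend to crowd the top of the list.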

In [229]:
ranked_products

[(10265, 5),
 (7532, 4.960541684387153),
 (9387, 4.946481021947744),
 (10353, 4.928406593969814),
 (1867, 4.924513287920506),
 (10629, 4.92184662611201),
 (9680, 4.918898415301735),
 (7339, 4.905734799247285),
 (8359, 4.890674673301468),
 (10326, 4.888849214635034),
 (9660, 4.885623251775405),
 (10385, 4.883316712949011),
 (4155, 4.878752347763556),
 (5247, 4.877364878374815),
 (10250, 4.876300820849254),
 (10503, 4.87404844004325),
 (9817, 4.864740764359),
 (1078, 4.864324576672627),
 (798, 4.857457130900409),
 (7503, 4.855032894497705),
 (9334, 4.854882751754088),
 (673, 4.854104105388437),
 (10270, 4.84336502383907),
 (10283, 4.842370076012857),
 (3122, 4.839704174832726),
 (11042, 4.839193530167339),
 (10225, 4.839182903620607),
 (9740, 4.839048523130962),
 (10246, 4.838417712982168),
 (7396, 4.838098710759246),
 (642, 4.83659672422877),
 (9938, 4.836054988440434),
 (10424, 4.83568047070994),
 (7331, 4.833298358714615),
 (8463, 4.831935230986618),
 (3281, 4.831857903779585),
 (716,

In [230]:
ranked_df = pd.DataFrame(ranked_products, columns=['asin_map', 'rating'])
ranked_df

Unnamed: 0,asin_map,rating
0,10265,5.000000
1,7532,4.960542
2,9387,4.946481
3,10353,4.928407
4,1867,4.924513
...,...,...
12115,283,2.084241
12116,159,2.054789
12117,23,1.967342
12118,9269,1.931810


In [231]:
asin_df = pd.DataFrame(asin_lookup.items(), columns=['asin_map', 'asin'])
asin_df

Unnamed: 0,asin_map,asin
0,0,B00004U9V2
1,1,B00005A77F
2,2,B00005NDTD
3,3,B00005V50C
4,4,B00005V50B
...,...,...
12115,12115,B01HHGDG82
12116,12116,B01HIIO7Q4
12117,12117,B01HIQCSBC
12118,12118,B01HJ2UY0W


In [232]:
merged_df = ranked_df.merge(asin_df, how='inner', on='asin_map')
merged_df

Unnamed: 0,asin_map,rating,asin
0,10265,5.000000,B00Q5GIUUU
1,7532,4.960542,B005V1PAFS
2,9387,4.946481,B00J0C1IOG
3,10353,4.928407,B00RXCJV4O
4,1867,4.924513,B002HMOWHQ
...,...,...,...
12115,283,2.084241,B000AZUGU4
12116,159,2.054789,B0006Q3P50
12117,23,1.967342,B00014GT8W
12118,9269,1.931810,B00IG4ZU4I


In [233]:
lookup_df = merged_df.merge(meta_df, how='inner', on='asin')
lookup_df.head(20)

Unnamed: 0,asin_map,rating,asin,title
0,10265,5.0,B00Q5GIUUU,"SKIN &amp; CO Women's Umbrian Truffle Body Oil, 4 fl. oz."
1,7532,4.960542,B005V1PAFS,"Indie Lee Lemongrass Citrus Body Wash, 8 fl. oz."
2,9387,4.946481,B00J0C1IOG,"Alfaparf Milano Alfaparf Semi Di Lino Discipline Frizz Control Butter Mask, 17.28 Fl Oz"
3,10353,4.928407,B00RXCJV4O,BABOR SPA Shaping for Body Feet Smoothing Balm for Feet 4.25 oz &ndash; Best Natural Foot Cream ...
4,1867,4.924513,B002HMOWHQ,"KAPLAN MD Lip 20 Moisture Therapy plus Sunscreen SPF 20- Ruby, 0.11 oz."
5,10629,4.921847,B00VNFMAAY,"StriVectin Ageless Essentials Collection, 5.76 oz."
6,9680,4.918898,B00KO6FOHY,Supergoop! City &amp; Sand Sunscreen Travel Tote
7,7339,4.905735,B00547HRWS,"The Art of Shaving Pre-Shave Oil, Lemon, 1 oz"
8,8359,4.890675,B00C1G44Q2,Malibu C Scalp Therapy Treatment
9,10326,4.888849,B00R69ZE4A,Georgie Beauty Deux Coeurs Luxury Edition Bridal Lash Compact
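
The title searches earlier show that `meta_df` repeats some ASINs (for example `B00004U9V2` appears at index 0 and 188) and that titles carry HTML entities such as `&amp;`. An inner merge against repeated keys produces repeated recommendation rows. The sketch below, built on a small hypothetical stand-in for `meta_df` and `merged_df`, shows one way to keep a single row per ASIN and decode the entities before merging. `meta_unique` and `ranked` are illustrative names.

```python
import html

import pandas as pd

# Hypothetical miniature of meta_df, with a repeated ASIN and HTML entities
meta_df = pd.DataFrame({
    'asin': ['B00004U9V2', 'B00005A77F', 'B00004U9V2'],
    'title': ['Crabtree &amp; Evelyn Hand Therapy',
              'Crabtree &amp; Evelyn Hand Soap',
              'Crabtree &amp; Evelyn Hand Therapy'],
})

# Keep one row per ASIN so the inner merge cannot multiply recommendation rows
meta_unique = meta_df.drop_duplicates(subset='asin').copy()

# Decode entities such as &amp; and &ndash; into plain characters
meta_unique['title'] = meta_unique['title'].map(html.unescape)

# Hypothetical stand-in for merged_df (ranked predictions with raw ASINs)
ranked = pd.DataFrame({'asin': ['B00005A77F', 'B00004U9V2'],
                       'rating': [4.9, 4.8]})
lookup_df = ranked.merge(meta_unique, how='inner', on='asin')
```

Because the merge keeps the key order of the left frame, the ranking from highest to lowest predicted rating is preserved.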


## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***