<img src="../images/cs_logo_pink.png" style="float: left; margin: 36px 20px 0 0; height: 60px">

# Capstone Project - Cos Skin <br><i style = "font-size:16px">Your skin but better</i>

## Notebook 6: Modelling
Notebook 1: Introduction & Data Collection Part 1 of 3<br>
Notebook 2: Data Collection Part 2 of 3<br>
Notebook 3: Data Collection Part 3 of 3<br>
Notebook 4: EDA & Data Cleaning<br>
Notebook 5: Preprocessing<br>
<b>Notebook 6: Modelling<br></b>
Notebook 7: Streamlit 

In [1]:
# import libraries

import pandas as pd
import numpy as np 
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
import pickle

In [2]:
# import datasets
dummified_cleanser = pd.read_csv('../data/dummified_cleanser.csv', index_col = 0)
dummified_toner = pd.read_csv('../data/dummified_toner.csv', index_col = 0)
dummified_day_moisturizer = pd.read_csv('../data/dummified_day_moisturizer.csv', index_col = 0)
dummified_night_cream = pd.read_csv('../data/dummified_night_cream.csv', index_col = 0)
dummified_sunscreen = pd.read_csv('../data/dummified_sunscreen.csv', index_col = 0)

In [12]:
# Set unique id as index
def set_unique_id_index(df):
    df_dummified = df.set_index(keys = 'unique_id')
    return df_dummified

In [14]:
dummified_cleanser = set_unique_id_index(dummified_cleanser)
dummified_toner = set_unique_id_index(dummified_toner)
dummified_day_moisturizer = set_unique_id_index(dummified_day_moisturizer)
dummified_night_cream = set_unique_id_index(dummified_night_cream)
dummified_sunscreen = set_unique_id_index(dummified_sunscreen)

## Profile-Based Recommendations

In this segment, I will use content-based filtering to generate recommendations for _new users_ from a profile of their stated attributes. <br>
Because the profile is built directly from user input rather than from past ratings or purchases, this approach avoids the user cold-start problem.

In [15]:
# Cleanser dataset vector
print(dummified_cleanser.shape)
dummified_cleanser.head()

(269, 36)


unique_id,rating,Combination,Dry,Normal,Oily,Sensitive,Ageing,Blackheads,Blemishes,DarkCircles,...,Liquid,Lotion,Oil,Powder,Wipe,20s,30s,40s,50+,Under20
FRESH-Soy Face Cleanser,4.4,1,1,1,1,0,1,0,0,0,...,0,0,0,0,0,1,1,1,1,1
THE ORDINARY-Squalane Cleanser,4.0,1,1,1,1,0,0,0,0,0,...,1,0,0,0,0,1,1,1,1,1
SEPHORA COLLECTION-Cleansing Face Wipe,4.1,1,1,1,1,1,0,0,0,0,...,0,0,0,0,1,1,1,1,1,1
FARMACY-Green Clean Makeup Meltaway Cleansing Balm,4.7,1,1,1,1,0,0,0,0,0,...,0,0,0,0,0,1,1,1,1,1
SEPHORA COLLECTION-Triple Action Cleansing Micellar Water,4.4,1,1,1,1,1,0,0,0,0,...,1,0,0,0,0,1,1,1,1,1


### Get User Recommendation

In [16]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
cleanser_cols = dummified_cleanser.columns # match feature space for new user with the existing dataset
cleanser_user_profile = pd.Series(data=np.zeros(len(cleanser_cols)), index=cleanser_cols) # initialize 0s for all categories to create new user vector
cleanser_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Balm                      0.0
Bar                       0.0
ClayMud                   0.0
Cream                     0.0
Foam                      0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Oil                       0.0
Powder                    0.0
Wipe                      0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [23]:
# assign preference weights to features to test recommendations (a larger weight means the feature counts more in the score)
cleanser_user_profile['20s'] = 1
cleanser_user_profile['Oily'] = 1
cleanser_user_profile['Cream'] = 2
cleanser_user_profile['DarkCircles'] = 3
cleanser_user_profile['rating'] = 1.5
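
With these weights, `rating` can contribute up to 1.5 × 5 = 7.5 points (assuming ratings are on a 0 to 5 scale), which is more than all the other weights combined (1 + 1 + 2 + 3 = 7). One possible way to keep the rating from dominating is to rescale it to the 0 to 1 range before scoring. The sketch below illustrates the effect on a small made-up table; the product names, values, and the 0 to 5 scale are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: a rating column (assumed 0-5 scale) plus binary attributes
items = pd.DataFrame(
    {"rating": [5.0, 2.0], "Oily": [0, 1], "Cream": [0, 1]},
    index=["high_rated_no_match", "lower_rated_full_match"],
)

# Hypothetical preference weights, in the same spirit as the user profile above
weights = pd.Series({"rating": 1.5, "Oily": 1.0, "Cream": 2.0})

# Raw weighted score: the rating term alone outweighs the attribute matches
raw_scores = items.mul(weights, axis=1).sum(axis=1)

# Rescale rating to [0, 1] so it is comparable to the 0/1 attribute columns
scaled = items.copy()
scaled["rating"] = scaled["rating"] / 5.0
scaled_scores = scaled.mul(weights, axis=1).sum(axis=1)

print(raw_scores.idxmax())
print(scaled_scores.idxmax())
```

Whether rescaling is desirable depends on how strongly the rating should influence the ranking; lowering the `rating` weight would be an alternative with a similar effect.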

In [24]:
# Similarity score: raw dot product (cosine similarity without the length normalization)
# Get the dot product between dummified dataset vector and user_profile
cleanser_recommendation = np.dot(dummified_cleanser.values, cleanser_user_profile.values)
# Convert results to pandas Series 
cleanser_recommendation = pd.Series(cleanser_recommendation, index = dummified_cleanser.index)
# Get the top 10 recommendations for a new user
cleanser_recommendation.sort_values(ascending = False).head(10)

unique_id
DIOR-Prestige La Mousse Micellaire Face Cleanser        13.50
SKIN INC-Pure Serum Infused O2 Cleanser                 13.50
LAURA MERCIER-Skin Essentials Collection                13.50
INNISFREE-Green Tea Hydration Duo                       12.75
INNISFREE-Volcanic Pore Cleansing Foam EX               12.75
LAURA MERCIER-Balancing Foaming Cleanser                12.60
FOREO-Micro-Foam Cleanser                               12.60
EVE LOM-Foaming Cream Cleanser                          12.60
EVE LOM-Foaming Cream Cleanser                          12.60
CLINIQUE-All About Clean™ Rinse-Off Foaming Cleanser    12.60
dtype: float64
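
The ranking above uses a raw dot product, so a product that ticks many attribute boxes can score as high as one that matches the profile more precisely. Cosine similarity divides the dot product by the lengths of both vectors, which removes that effect. The following is a minimal sketch on a made-up item matrix (the item names and values are hypothetical); `sklearn.metrics.pairwise.cosine_similarity`, imported at the top of this notebook, should give the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy item matrix: rows are products, columns are binary attributes
items = pd.DataFrame(
    {"Oily": [1, 1, 0], "Dry": [0, 1, 1], "Cream": [1, 1, 0], "Gel": [0, 1, 1]},
    index=["item_a", "item_b", "item_c"],
)

# Hypothetical user profile over the same columns, in the same order
user = pd.Series({"Oily": 1.0, "Dry": 0.0, "Cream": 1.0, "Gel": 0.0})

# Raw dot product: item_a and item_b tie, because item_b matches every attribute
dot_scores = items.values @ user.values

# Cosine similarity: divide by the vector lengths
# (assumes no all-zero rows and a non-zero profile, otherwise this divides by zero)
item_norms = np.linalg.norm(items.values, axis=1)
user_norm = np.linalg.norm(user.values)
cosine_scores = pd.Series(dot_scores / (item_norms * user_norm), index=items.index)

ranked = cosine_scores.sort_values(ascending=False)
print(ranked)
```

Here `item_a` ranks first under cosine similarity because its attributes match the profile exactly, while `item_b` is penalized for the extra attributes the user did not ask for.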

In [25]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
toner_cols = dummified_toner.columns # match feature space for new user with the existing dataset
toner_user_profile = pd.Series(data=np.zeros(len(toner_cols)), index=toner_cols) # initialize 0s for all categories to create new user vector
toner_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Cream                     0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Sheet                     0.0
Spray                     0.0
Wipe                      0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [26]:
# assign preference weights to features to test recommendations (a larger weight means the feature counts more in the score)
toner_user_profile['20s'] = 1
toner_user_profile['Oily'] = 1
toner_user_profile['Cream'] = 2
toner_user_profile['DarkCircles'] = 3
toner_user_profile['rating'] = 1.5

In [27]:
# Similarity score: raw dot product (cosine similarity without the length normalization)
# Get the dot product between dummified dataset vector and user_profile
toner_recommendation = np.dot(dummified_toner.values, toner_user_profile.values)
# Convert results to pandas Series 
toner_recommendation = pd.Series(toner_recommendation, index = dummified_toner.index)
# Get the top 10 recommendations for a new user
toner_recommendation.sort_values(ascending = False).head(10)

unique_id
ELEMIS-Dynamic Resurfacing Pads                                                            11.45
FRESH-Best Of Beauty Bundle (Christmas Limited Edition)                                    11.20
LANEIGE-Cream Skin Refiner                                                                 10.75
FRESH-Rose Deep Hydration Essentials (Christmas Limited Edition)                           10.75
THE INKEY LIST-Glycolic Acid Toner                                                         10.60
ORIGINS-Mega-Mushroom Magic Ultimate Skin Relief Collection (Christmas Limited Edition)     9.50
LAB SERIES-Daily Rescue Water Lotion                                                        9.50
SHISEIDO-Revitalizing Treatment Softener Lotion                                             9.50
INNISFREE-Jeju Volcanic Pore Toner 2X                                                       9.50
FRESH-Kombucha Facial Treatment Essence Oamul Lu (Limited Edition)                          9.50
dtype: float64
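
The cleanser and toner cells repeat the same three steps: build a zero profile, set the weights, and rank by dot product. Those steps could be wrapped in one helper that works for any product category. The sketch below is one way to do that; `recommend`, its parameters, and the toy data are hypothetical names introduced for illustration. It also drops repeated `unique_id` rows before ranking, since the cleanser ranking above lists one product twice.

```python
import pandas as pd

def recommend(item_features, preferences, top_n=10):
    """Rank items by the dot product with a profile built from `preferences`.

    Hypothetical helper: `item_features` is a dummified DataFrame indexed by
    product id, `preferences` maps feature names to weights.
    """
    # Fail early on feature names that are not in this category's feature space
    unknown = set(preferences) - set(item_features.columns)
    if unknown:
        raise KeyError(f"Unknown features: {sorted(unknown)}")

    # Start from an all-zero profile, then fill in the user's weights
    profile = pd.Series(0.0, index=item_features.columns)
    for feature, weight in preferences.items():
        profile[feature] = weight

    # Keep only the first row for each product id so a product appears once
    deduped = item_features[~item_features.index.duplicated(keep="first")]

    scores = pd.Series(deduped.values @ profile.values, index=deduped.index)
    return scores.sort_values(ascending=False).head(top_n)

# Toy usage with made-up products (note the repeated "brand-y" row)
toy = pd.DataFrame(
    {"rating": [4.0, 5.0, 5.0], "Oily": [1, 0, 0], "Cream": [0, 1, 1]},
    index=["brand-x", "brand-y", "brand-y"],
)
top = recommend(toy, {"Oily": 1, "Cream": 2, "rating": 1.5}, top_n=5)
print(top)
```

With the notebook's data, a call such as `recommend(dummified_toner, {'20s': 1, 'Oily': 1})` would follow the same pattern.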

The attributes in each product category form the basis of the recommendation system. Because a new user's profile is built from the attributes they select, recommendations can be generated immediately, without running into cold-start issues or waiting to collect explicit feedback from that user.

With this, I now have all the building blocks for the web app, which I will build in the next notebook!