<img src="../images/cs_logo_pink.PNG" style="float: left; margin: 36px 20px 0 0; height: 60px">

# Capstone Project - Cos Skin <br><i style = "font-size:16px">Your skin but better</i>

## Notebook 6: Modelling
Notebook 1: Introduction & Data Collection Part 1 of 3<br>
Notebook 2: Data Collection Part 2 of 3<br>
Notebook 3: Data Collection Part 3 of 3<br>
Notebook 4: EDA & Data Cleaning<br>
Notebook 5: Pre-processing<br>
<b>Notebook 6: Modelling<br></b>
Notebook 7: App Deployment

In [1]:
# import libraries

import pandas as pd
import numpy as np 
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
import pickle

In [2]:
# import datasets
dummified_cleanser = pd.read_csv('../data/dummified_cleanser.csv', index_col = 0)
dummified_toner = pd.read_csv('../data/dummified_toner.csv', index_col = 0)
dummified_day_moisturizer = pd.read_csv('../data/dummified_day_moisturizer.csv', index_col = 0)
dummified_night_cream = pd.read_csv('../data/dummified_night_cream.csv', index_col = 0)
dummified_sunscreen = pd.read_csv('../data/dummified_sunscreen.csv', index_col = 0)

In [3]:
# Set unique id as index
def set_unique_id_index(df):
    df_dummified = df.set_index(keys = 'unique_id')
    return df_dummified

In [4]:
# Set unique id as index
dummified_cleanser = set_unique_id_index(dummified_cleanser)
dummified_toner = set_unique_id_index(dummified_toner)
dummified_day_moisturizer = set_unique_id_index(dummified_day_moisturizer)
dummified_night_cream = set_unique_id_index(dummified_night_cream)
dummified_sunscreen = set_unique_id_index(dummified_sunscreen)

## Profile-Based Recommendations

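In this segment, I use content-based filtering to generate profile-based recommendations for _new users_. <br>
Profile-based recommendations sidestep the user cold-start problem: each product is scored against the preferences a new user states (age group, skin type, skin concerns and formulation), so no purchase or rating history is needed before the first recommendation.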
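As a minimal sketch of the profile-based idea, the snippet below scores a made-up three-product catalogue (product names, columns and values are hypothetical). The new user's stated preferences become a weight vector over the same columns as the product matrix, and each product's score is the dot product of the two, so nothing about the user's history is required.

```python
import pandas as pd

# Hypothetical toy catalogue: one row per product, 0/1 tag columns plus a rating
products = pd.DataFrame(
    {
        "rating": [4.0, 4.5, 3.5],
        "Oily": [1, 0, 1],
        "Cream": [0, 1, 1],
        "20s": [1, 1, 1],
    },
    index=["Product A", "Product B", "Product C"],
)

# A new user's stated preferences, expressed as weights on the same columns
user_profile = pd.Series(0.0, index=products.columns)
user_profile["Oily"] = 1
user_profile["Cream"] = 2
user_profile["20s"] = 1
user_profile["rating"] = 1.5

# Score each product with a dot product; no purchase or rating history is used
scores = pd.Series(products.values @ user_profile.values, index=products.index)
ranking = scores.sort_values(ascending=False)
print(ranking)
```

Product B ranks first (9.75) because it matches the heavily weighted `Cream` tag and has the highest rating, even though it lacks the `Oily` tag.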

In [5]:
# Cleanser dataset vector
print(dummified_cleanser.shape)
dummified_cleanser.head()

(265, 36)


| unique_id | rating | Combination | Dry | Normal | Oily | Sensitive | Ageing | Blackheads | Blemishes | DarkCircles | ... | Liquid | Lotion | Oil | Powder | Wipe | 20s | 30s | 40s | 50+ | Under20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FRESH-Soy Face Cleanser | 4.4 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| THE ORDINARY-Squalane Cleanser | 4.0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| SEPHORA COLLECTION-Cleansing Face Wipe | 4.1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| FARMACY-Green Clean Makeup Meltaway Cleansing Balm | 4.7 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| SEPHORA COLLECTION-Triple Action Cleansing Micellar Water | 4.4 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |


### Get User Recommendations

#### Cleanser Recommendation

In [6]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
cleanser_cols = dummified_cleanser.columns # match feature space for new user with the existing dataset
cleanser_user_profile = pd.Series(data=np.zeros(len(cleanser_cols)), index=dummified_cleanser.columns) # initialize 0s for all categories to create new user vector
cleanser_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Balm                      0.0
Bar                       0.0
ClayMud                   0.0
Cream                     0.0
Foam                      0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Oil                       0.0
Powder                    0.0
Wipe                      0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [7]:
# assign input values to categories to test recommendations
cleanser_user_profile['20s'] = 1
cleanser_user_profile['Oily'] = 1
cleanser_user_profile['Cream'] = 2
cleanser_user_profile['Oiliness'] = 3
cleanser_user_profile['rating'] = 1.5

In [8]:
# Score each product: dot product between its feature vector and the user profile
# (an unnormalized similarity, not cosine similarity)
cleanser_recommendation = np.dot(dummified_cleanser.values, cleanser_user_profile.values)
# Convert results to pandas Series 
cleanser_recommendation = pd.Series(cleanser_recommendation, index = dummified_cleanser.index)
# Get the top 10 recommendations for a new user
cleanser = cleanser_recommendation.sort_values(ascending = False).head(10)
cleanser

unique_id
SKIN INC-Pure Serum Infused O2 Cleanser                                  14.50
SK-II-MEN Moisturizing Cleanser                                          14.20
FARMACY-Whipped Greens oil-free foaming cleanser                         13.90
INNISFREE-Volcanic Pore Cleansing Foam EX                                13.75
DERMALOGICA-Active Clay Cleanser                                         13.75
FOREO-Micro-Foam Cleanser                                                13.60
LAURA MERCIER-Balancing Foaming Cleanser                                 13.60
ORIGINS-Checks and Balances™ Frothy Face Wash                            13.60
TATCHA-Clarifying Cleanse And Hydrate Duo (Christmas Limited Edition)    13.60
OLE HENRIKSEN-Splashing Stars Cleanser & Scrub Duo                       13.45
dtype: float64
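Because every column except `rating` is a 0/1 tag, the score for this test profile reduces to a weighted sum, where $r_p$ is product $p$'s rating and $x_{p,t}$ is its 0/1 value for tag $t$. Assuming ratings lie on a 0 to 5 scale, the highest achievable score is 14.5, which matches the top cleanser above, so that product is rated 5.0 and carries all four tags:

$$
\begin{aligned}
\text{score}(p) &= 1.5\,r_p + x_{p,\text{20s}} + x_{p,\text{Oily}} + 2\,x_{p,\text{Cream}} + 3\,x_{p,\text{Oiliness}} \\
\max_p \text{score}(p) &= 1.5 \times 5 + 1 + 1 + 2 + 3 = 14.5
\end{aligned}
$$

The rating term alone can contribute up to 7.5, more than the four tags combined (7), so with these weights the rating has a large influence on the ranking.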

#### Toner Recommendation

In [9]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
toner_cols = dummified_toner.columns # match feature space for new user with the existing dataset
toner_user_profile = pd.Series(data=np.zeros(len(toner_cols)), index=dummified_toner.columns) # initialize 0s for all categories to create new user vector
toner_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Cream                     0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Sheet                     0.0
Spray                     0.0
Wipe                      0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [10]:
# assign input values to categories to test recommendations
toner_user_profile['20s'] = 1
toner_user_profile['Oily'] = 1
toner_user_profile['Cream'] = 2
toner_user_profile['Oiliness'] = 3
toner_user_profile['rating'] = 1.5

In [11]:
# Score each product: dot product between its feature vector and the user profile
# (an unnormalized similarity, not cosine similarity)
toner_recommendation = np.dot(dummified_toner.values, toner_user_profile.values)
# Convert results to pandas Series 
toner_recommendation = pd.Series(toner_recommendation, index = dummified_toner.index)
# Get the top 10 recommendations for a new user
toner = toner_recommendation.sort_values(ascending = False).head(10)
toner

unique_id
DRGL-Toner Oil Control                                  12.50
INNISFREE-Jeju Volcanic Pore Toner 2X                   12.50
DR.JART+-Ctrl-A Teatreement™ Toner                      12.20
LANEIGE-Essential Balancing Skin Refiner Light          12.05
PIXI-Skintreats Clarity Tonic Clarifying Toner          12.05
TATCHA-The Texture Tonic                                12.05
CLINIQUE-Clarifying Lotion 1 - Very Dry To Dry Skin     12.05
CLINIQUE-Clarifying Lotion 2 - Dry Combination Skin     12.05
BIOSSANCE-Squalane + BHA Pore Minimizing Toner          11.75
CLINIQUE-Clarifying Lotion 3 - Combination Oily Skin    11.60
dtype: float64
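The scoring cells compute a raw dot product with `np.dot`; `cosine_similarity` is imported in the first cell but never called. The two measures can rank products differently, as this toy example with two hypothetical products shows: a product tagged for every skin type gets the same dot-product score as one tagged only for oily skin, while cosine similarity, which divides by both vector lengths, favours the more specific product.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical products that both match the user's 'Oily' and 'Cream' tags;
# the second is also tagged for every other skin type in this toy example
products = pd.DataFrame(
    {"Oily": [1, 1], "Dry": [0, 1], "Normal": [0, 1], "Cream": [1, 1]},
    index=["Oily-skin cream", "All-skin-type cream"],
)
user = pd.Series({"Oily": 1.0, "Cream": 2.0}).reindex(products.columns, fill_value=0.0)

# What the notebook computes: an unnormalized dot product
dot_scores = pd.Series(products.values @ user.values, index=products.index)

# Cosine similarity: the dot product divided by both vector lengths
cos_scores = pd.Series(
    cosine_similarity(products.values, user.values.reshape(1, -1)).ravel(),
    index=products.index,
)

print(dot_scores)  # both products tie at 3.0
print(cos_scores)  # the more specific product scores higher
```

Which measure suits this recommender is a design choice. If the `rating` column were kept in the vectors, normalizing would also change how much the rating weighs against the tags.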

#### Day Moisturizer Recommendation

In [12]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
day_moisturizer_cols = dummified_day_moisturizer.columns # match feature space for new user with the existing dataset
day_moisturizer_user_profile = pd.Series(data=np.zeros(len(day_moisturizer_cols)), index=dummified_day_moisturizer.columns) # initialize 0s for all categories to create new user vector
day_moisturizer_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Balm                      0.0
Cream                     0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Oil                       0.0
Spray                     0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [13]:
# assign input values to categories to test recommendations
day_moisturizer_user_profile['20s'] = 1
day_moisturizer_user_profile['Oily'] = 1
day_moisturizer_user_profile['Cream'] = 2
day_moisturizer_user_profile['Oiliness'] = 3
day_moisturizer_user_profile['rating'] = 1.5

In [14]:
# Score each product: dot product between its feature vector and the user profile
# (an unnormalized similarity, not cosine similarity)
day_moisturizer_recommendation = np.dot(dummified_day_moisturizer.values, day_moisturizer_user_profile.values)
# Convert results to pandas Series 
day_moisturizer_recommendation = pd.Series(day_moisturizer_recommendation, index = dummified_day_moisturizer.index)
# Get the top 10 recommendations for a new user
day_moisturizer = day_moisturizer_recommendation.sort_values(ascending = False).head(10)
day_moisturizer

unique_id
LAB SERIES-All-In-One Face Treatment                        14.50
LANCÔME-Clarifique Brightening Plumping Milky Cream         14.05
DR.JART+-Ctrl-A Teatreement™ Moisturizer                    13.90
SEPHORA COLLECTION-Matte Moisturizer                        13.90
ORIGINS-Original Skin™ Matte Moisturizer with Willowherb    13.60
OLE HENRIKSEN-Cold Plunge™ Pore Remedy Moisturizer          13.60
NUDESTIX-Citrus-C Mask & Daily Moisturizer                  13.60
LAB SERIES-Oil Control Moisturizer                          13.45
AUGUSTINUS BADER-The Light Cream                            13.45
FENTY SKIN-Instant Reset Overnight Recovery Gel-Cream       13.45
dtype: float64

#### Night Cream Recommendation

In [15]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
night_cream_cols = dummified_night_cream.columns # match feature space for new user with the existing dataset
night_cream_user_profile = pd.Series(data=np.zeros(len(night_cream_cols)), index=dummified_night_cream.columns) # initialize 0s for all categories to create new user vector
night_cream_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
Puffiness                 0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Balm                      0.0
Cream                     0.0
Foam                      0.0
Gel                       0.0
Liquid                    0.0
Lotion                    0.0
Oil                       0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [16]:
# assign input values to categories to test recommendations
night_cream_user_profile['20s'] = 1
night_cream_user_profile['Oily'] = 1
night_cream_user_profile['Cream'] = 2
night_cream_user_profile['Oiliness'] = 3
night_cream_user_profile['rating'] = 1.5

In [17]:
# Score each product: dot product between its feature vector and the user profile
# (an unnormalized similarity, not cosine similarity)
night_cream_recommendation = np.dot(dummified_night_cream.values, night_cream_user_profile.values)
# Convert results to pandas Series 
night_cream_recommendation = pd.Series(night_cream_recommendation, index = dummified_night_cream.index)
# Get the top 10 recommendations for a new user
night_cream = night_cream_recommendation.sort_values(ascending = False).head(10)
night_cream

unique_id
LAB SERIES-All-In-One Face Treatment                     14.50
LAB SERIES-All-In-One Face Treatment                     14.50
LANCÔME-Clarifique Brightening Plumping Milky Cream      14.05
SEPHORA COLLECTION-Matte Moisturizer                     13.90
NUDESTIX-Citrus-C Mask & Daily Moisturizer               13.60
OLE HENRIKSEN-Cold Plunge™ Pore Remedy Moisturizer       13.60
FENTY SKIN-Instant Reset Overnight Recovery Gel-Cream    13.45
FIRST AID BEAUTY-Skin Rescue Daily Face Cream            13.30
THE INKEY LIST-Vitamin B, C And E Moisturizer            13.30
CLINIQUE-Dramatically Different Moisturizing Gel         11.75
dtype: float64
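The night cream top 10 lists LAB SERIES-All-In-One Face Treatment twice, which suggests `unique_id` is not unique in `dummified_night_cream`, so the list effectively contains only 9 distinct products. One way to handle this, sketched here on made-up scores, is to drop repeated index labels before ranking; applying the same `~df.index.duplicated()` mask to the DataFrame itself would fix it at the source. Keeping the first occurrence is an assumption: if the duplicate rows carry different tags, they may need to be merged instead.

```python
import pandas as pd

# Hypothetical scores with a repeated index label, like the duplicated
# 'LAB SERIES-All-In-One Face Treatment' row in the night cream output
scores = pd.Series(
    [14.50, 14.50, 14.05, 13.90],
    index=["Product A", "Product A", "Product B", "Product C"],
)

# Keep only the first occurrence of each product before taking the top n
deduped = scores[~scores.index.duplicated(keep="first")]
top = deduped.sort_values(ascending=False).head(3)
print(top)
```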

#### Sunscreen Recommendation

In [18]:
# Create a new vector for a 'new user' by initializing all features in the dataset above
sunscreen_cols = dummified_sunscreen.columns # match feature space for new user with the existing dataset
sunscreen_user_profile = pd.Series(data=np.zeros(len(sunscreen_cols)), index=dummified_sunscreen.columns) # initialize 0s for all categories to create new user vector
sunscreen_user_profile

rating                    0.0
Combination               0.0
Dry                       0.0
Normal                    0.0
Oily                      0.0
Sensitive                 0.0
Ageing                    0.0
Blackheads                0.0
Blemishes                 0.0
DarkCircles               0.0
Dryness                   0.0
Dullness                  0.0
FineLines&Wrinkles        0.0
Firmness&Elasticity       0.0
Oiliness                  0.0
Pigmentation&DarkSpots    0.0
UnevenSkinTexture         0.0
UnevenSkinTone            0.0
VisiblePores              0.0
Balm                      0.0
Cream                     0.0
Gel                       0.0
Liquid                    0.0
LoosePowder               0.0
Lotion                    0.0
Oil                       0.0
Spray                     0.0
20s                       0.0
30s                       0.0
40s                       0.0
50+                       0.0
Under20                   0.0
dtype: float64

In [19]:
# assign input values to categories to test recommendations
sunscreen_user_profile['20s'] = 1
sunscreen_user_profile['Oily'] = 1
sunscreen_user_profile['Cream'] = 2
sunscreen_user_profile['Oiliness'] = 3
sunscreen_user_profile['rating'] = 1.5

In [20]:
# Score each product: dot product between its feature vector and the user profile
# (an unnormalized similarity, not cosine similarity)
sunscreen_recommendation = np.dot(dummified_sunscreen.values, sunscreen_user_profile.values)
# Convert results to pandas Series 
sunscreen_recommendation = pd.Series(sunscreen_recommendation, index = dummified_sunscreen.index)
# Get the top 10 recommendations for a new user
sunscreen = sunscreen_recommendation.sort_values(ascending = False).head(10)
sunscreen

unique_id
ULTRA VIOLETTE-Lean Screen Mineral Mattifying SPF 50+                                     13.90
SUPERGOOP!-Unseen Sunscreen Broad Spectrum Sunscreen SPF 40 PA+++                         13.45
SUPERGOOP!-Mineral Mattescreen SPF 40 PA+++                                               13.30
DERMALOGICA CLEAR START-Clearing Defense SPF 30                                           12.55
THREE-Balancing UV Protector R                                                            11.50
DIOR-Prestige Light-In-White The UV Protector Youth And Light Sheer Glow SPF 50+ PA+++    11.50
CLE DE PEAU-UV Protective Cream SPF 50+                                                   11.50
LANCÔME-UV Expert Youth-Shield™ Aqua Gel SPF50 PA++++                                     11.45
SHISEIDO-Clear Stick UV Protector SPF 50+ PA++++                                          11.20
SK-II-Atmosphere CC Cream SPF 50 PA++++                                                   11.20
dtype: float64
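The five category cells repeat the same three steps: build a zero vector over the category's columns, set the user's weights, and rank by dot product. A minimal sketch of folding them into one function follows (the name `recommend` and its parameters are hypothetical, not part of the app). `reindex` aligns the weights to each category's own columns, so a tag such as `DarkCircles`, which has no column in the night cream data, needs no special handling.

```python
import pandas as pd


def recommend(dummified: pd.DataFrame, preferences: dict, top_n: int = 10) -> pd.Series:
    """Return the top_n products by dot-product score against a user's preference weights.

    Hypothetical helper: `preferences` maps column names to weights; columns missing
    from `preferences` get weight 0, and keys missing from `dummified` are ignored.
    """
    profile = pd.Series(preferences, dtype=float).reindex(dummified.columns, fill_value=0.0)
    scores = pd.Series(dummified.values @ profile.values, index=dummified.index)
    return scores.sort_values(ascending=False).head(top_n)


# Usage on a hypothetical two-product table ('Oiliness' has no column here, so it is ignored)
toy = pd.DataFrame(
    {"rating": [4.0, 5.0], "Oily": [1, 0], "Cream": [1, 1], "20s": [1, 1]},
    index=pd.Index(["Product A", "Product B"], name="unique_id"),
)
prefs = {"20s": 1, "Oily": 1, "Cream": 2, "Oiliness": 3, "rating": 1.5}
print(recommend(toy, prefs, top_n=2))
```

With such a helper, each category becomes a single call, for example `recommend(dummified_cleanser, prefs)` or `recommend(dummified_sunscreen, prefs)`, reusing the same `prefs` dict across all five.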

A user in their 20s with oily skin, a skin concern of oiliness and a preference for cream formulations would be recommended the following products:


In [21]:
print(f'Cleanser: {cleanser.index[0]}\nToner: {toner.index[0]}\nDay Moisturizer: {day_moisturizer.index[0]}\nNight Cream: {night_cream.index[0]}\nSunscreen: {sunscreen.index[0]}')
      

Cleanser: SKIN INC-Pure Serum Infused O2 Cleanser
Toner: DRGL-Toner Oil Control
Day Moisturizer: LAB SERIES-All-In-One Face Treatment
Night Cream: LAB SERIES-All-In-One Face Treatment
Sunscreen: ULTRA VIOLETTE-Lean Screen Mineral Mattifying SPF 50+


The output above shows that the recommender returns one product per category, giving the user a set of 5 products. Note that the day moisturizer and night cream slots both return LAB SERIES-All-In-One Face Treatment, since that product appears in both datasets and tops both rankings.
With these building blocks in place, I will build the web app in the next notebook.
I will use filter tags as the basis of the recommendation system. Because recommendations depend only on the tags a new user selects, the app avoids the cold-start problem and does not need to wait for explicit feedback (such as ratings or purchases) before making its first recommendations.
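This notebook does not show how filter tags will be applied in the app. As one hypothetical reading, a hard filter keeps only the products that carry every tag the user selects and then ranks the survivors by rating. The function name and behaviour below are assumptions for illustration:

```python
import pandas as pd


def filter_by_tags(dummified: pd.DataFrame, required_tags: list, top_n: int = 5) -> pd.DataFrame:
    """Keep products carrying every required tag, then rank them by rating.

    Hypothetical helper illustrating one reading of 'filter tags'.
    """
    missing = [tag for tag in required_tags if tag not in dummified.columns]
    if missing:
        raise KeyError(f"Unknown tags: {missing}")
    mask = (dummified[required_tags] == 1).all(axis=1)
    return dummified[mask].sort_values("rating", ascending=False).head(top_n)


# Usage on a hypothetical three-product table
toy = pd.DataFrame(
    {"rating": [4.2, 4.8, 3.9], "Oily": [1, 0, 1], "Cream": [1, 1, 1], "20s": [1, 1, 1]},
    index=pd.Index(["Product A", "Product B", "Product C"], name="unique_id"),
)
print(filter_by_tags(toy, ["Oily", "Cream", "20s"]))
```

Unlike the weighted dot product, a hard filter never surfaces a product that misses a stated requirement (Product B is excluded despite its higher rating), but it can return an empty result when the selected tags are too restrictive.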

In [22]:
# Export datasets
dummified_cleanser.to_csv('../data/recommender_cleanser.csv', index = True)
dummified_toner.to_csv('../data/recommender_toner.csv', index = True)
dummified_day_moisturizer.to_csv('../data/recommender_day_moisturizer.csv', index = True)
dummified_night_cream.to_csv('../data/recommender_night_cream.csv', index = True)
dummified_sunscreen.to_csv('../data/recommender_sunscreen.csv', index = True)
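Because `index=True` writes `unique_id` as a named first column, the exported files can be loaded with `index_col='unique_id'`, skipping the two-step `index_col=0` plus `set_index` load used at the top of this notebook. A round-trip sketch, with an in-memory buffer and a made-up two-row table standing in for the real CSV files:

```python
import io

import pandas as pd

# Hypothetical stand-in for one exported dataset, indexed by unique_id
df = pd.DataFrame(
    {"rating": [4.4, 4.0], "Oily": [1, 1]},
    index=pd.Index(["Product A", "Product B"], name="unique_id"),
)

# to_csv(index=True) writes unique_id as the first column, with its name as the header
buffer = io.StringIO()
df.to_csv(buffer, index=True)
buffer.seek(0)

# Reading it back with index_col restores the same index in one step
restored = pd.read_csv(buffer, index_col="unique_id")
print(restored)
```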