# Skincare Recommendation Engine


This notebook walks through the development of a content-based recommendation engine. It takes a list of skin metrics and concerns (skin type, tone, acne, blemishes, redness, etc.) as input and returns several products that may suit the user's skin.

In [2]:
import numpy as np 
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import heapq

In [3]:
# 'essentials' implies essential facial skincare products
df = pd.read_csv('product_details.csv')
makeup = pd.read_csv('result2.csv')

In [4]:
df.head()

Unnamed: 0,label,url,brand,name,skin_type,concern_1,concern_2,concern_3,concern_4,formulation,price
0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,Tatcha,The Dewy Skin Cream Plumping & Hydrating Refil...,Normal and Dry,Dryness,Dullness,Fine Lines & Wrinkles,,Rich Cream,$72.00
1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,Glow Recipe,Plum Plump Refillable Hyaluronic Acid Moisturizer,"Normal, Dry, Combination, and Oily",Dryness,Dullness,Loss of Firmness,Elasticity,Cream,$40.00
2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,Drunk Elephant,Lala Retro™ Nourishing Whipped Refillable Mois...,"Normal, Dry, Combination, and Oily",Dryness,Fine Lines and Wrinkles,Loss of Firmness,Elasticity,Rich Cream,$64.00
3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,Youth To The People,Superfood Air-Whip Lightweight Face Moisturize...,"Normal, Combination, and Oily",Fine Lines and Wrinkles,Dryness,Loss of Firmness,Elasticity,,$48.00
4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,Clarins,"Multi-Active Day Moisturizer for Lines, Pores,...","Normal, Dry, Combination, and Oily",Fine Lines and Wrinkles,Pores,Dullness,,Cream,$59.00


In [5]:
makeup['skin tone'].value_counts()

skin tone
Light to Medium    317
Fair to Light      109
Medium to Dark      49
Dark to Deep        19
Name: count, dtype: int64

## Data Preprocessing

### Imputation of values

In [6]:
df['label'].value_counts()

label
face-moisturizer    328
cleanser            214
eye-cream           119
sunscreen           106
masks                90
Primer               80
Name: count, dtype: int64

In [7]:
df.isna().sum()

label            0
url              0
brand            1
name             0
skin_type      156
concern_1      181
concern_2      333
concern_3      352
concern_4      743
formulation    348
price            8
dtype: int64

In [8]:
df['concern_1'].value_counts()

concern_1
Fine Lines and Wrinkles                                        205
Dryness                                                        200
Pores                                                          120
Dark Spots                                                      59
Dryness and Dullness                                            15
Dryness and Redness                                             15
Dullness                                                        14
Redness                                                         11
Oiliness                                                         8
Loss of Firmness and Elasticity                                  8
Fine lines and wrinkles                                          7
Pores and Oiliness                                               6
Fine Lines                                                       6
Fine Lines/Wrinkles                                              6
Dryness and Oiliness                                

In [9]:
df['concern_1'] = df['concern_1'].fillna('')
df['concern_2'] = df['concern_2'].fillna('')
df['concern_3'] = df['concern_3'].fillna('')
df['concern_4'] = df['concern_4'].fillna('')
df['concern'] = df['concern_1'] + ',' + df['concern_2'] + ',' + df['concern_3']  + ',' + df['concern_4']
df['concern']

0               Dryness, Dullness,Fine Lines & Wrinkles,
1          Dryness, Dullness,Loss of Firmness,Elasticity
2      Dryness, Fine Lines and Wrinkles,Loss of Firmn...
3      Fine Lines and Wrinkles, Dryness,Loss of Firmn...
4               Fine Lines and Wrinkles, Pores,Dullness,
                             ...                        
932                                           Dryness,,,
933                                                  ,,,
934                                                  ,,,
935                                                  ,,,
936                                                  ,,,
Name: concern, Length: 937, dtype: object
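The concatenation above keeps empty slots as bare commas (for example `Dryness,,,`). Below is a minimal sketch of a join that skips blanks instead, assuming the concern values arrive as plain strings. `join_concerns` is a hypothetical helper, not part of the notebook.

```python
def join_concerns(values):
    """Join concern strings with commas, skipping blanks, None and NaN."""
    # Non-string values (None, float NaN) are ignored by the isinstance check.
    cleaned = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    return ','.join(cleaned)


print(join_concerns(['Dryness', ' Dullness', '', None]))  # Dryness,Dullness
print(repr(join_concerns(['', '', '', ''])))              # ''
```

If this were adopted, the later cells that test for `',,,'` would need to test for an empty string instead.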

In [10]:
df.drop(columns=['concern_1', 'concern_2', 'concern_3', 'concern_4', 'formulation'], inplace = True)
df['concern'].value_counts()

concern
,,,                                                             181
Fine Lines and Wrinkles, Dryness,Loss of Firmness,Elasticity     56
Dryness,,,                                                       48
Dryness, Dullness,Uneven Texture,                                40
Fine Lines and Wrinkles, Dryness,Dullness,                       22
                                                               ... 
Pores,Dryness. And Oiliness,,                                     1
Dark Spots, Uneven Texture,Acne,Blemishes                         1
Oiliness, Acne,Blemishes,                                         1
Dullness, Uneven Texture,Dryness,                                 1
Uneven Skin Tone,,,                                               1
Name: count, Length: 233, dtype: int64

In [11]:
df2 = df[df['label'].isin(['face-moisturizer', 'masks', 'cleanser', 'eye-cream', 'sunscreen'])].copy()
df2

Unnamed: 0,label,url,brand,name,skin_type,price,concern
0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,Tatcha,The Dewy Skin Cream Plumping & Hydrating Refil...,Normal and Dry,$72.00,"Dryness, Dullness,Fine Lines & Wrinkles,"
1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,Glow Recipe,Plum Plump Refillable Hyaluronic Acid Moisturizer,"Normal, Dry, Combination, and Oily",$40.00,"Dryness, Dullness,Loss of Firmness,Elasticity"
2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,Drunk Elephant,Lala Retro™ Nourishing Whipped Refillable Mois...,"Normal, Dry, Combination, and Oily",$64.00,"Dryness, Fine Lines and Wrinkles,Loss of Firmn..."
3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,Youth To The People,Superfood Air-Whip Lightweight Face Moisturize...,"Normal, Combination, and Oily",$48.00,"Fine Lines and Wrinkles, Dryness,Loss of Firmn..."
4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,Clarins,"Multi-Active Day Moisturizer for Lines, Pores,...","Normal, Dry, Combination, and Oily",$59.00,"Fine Lines and Wrinkles, Pores,Dullness,"
...,...,...,...,...,...,...,...
852,eye-cream,https://www.sephora.com/product/cucumber-de-to...,Peter Thomas Roth,Cucumber De-Tox™ Hydra-Gel Eye Patches,"Normal, Dry, Combination, and Oily",$55.00,"Fine Lines and Wrinkles, Dryness,Puffiness,"
853,eye-cream,https://www.sephora.com/product/glopro-eye-mic...,BeautyBio,GloPRO® EYE MicroTip™ Attachment Head,"Normal, Dry, Combination, and Oily",$39.00,"Fine Lines and Wrinkles, Dullness and Uneven T..."
854,eye-cream,https://www.sephora.com/product/guerlain-abeil...,GUERLAIN,Abeille Royale Anti-Aging Eye Cream,"Normal, Dry, Combination, and Oily",$110.00,"Fine Lines and Wrinkles, Dullness and Uneven T..."
855,eye-cream,https://www.sephora.com/product/goop-goopglow-...,goop,GOOPGLOW Vita-C Brightening Eye Cream,"Normal, Dry, Combination, and Oily",$58.00,"Dryness, Dullness,Dark Circles,"


In [12]:
LABELS = list(df2.label.unique())
LABELS

['face-moisturizer', 'cleanser', 'sunscreen', 'masks', 'eye-cream']

In [13]:
df2 = df2[df2['skin_type'].notna()].copy()
df2.index = list(range(len(df2)))
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 736 entries, 0 to 735
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   label      736 non-null    object
 1   url        736 non-null    object
 2   brand      735 non-null    object
 3   name       736 non-null    object
 4   skin_type  736 non-null    object
 5   price      731 non-null    object
 6   concern    736 non-null    object
dtypes: object(7)
memory usage: 46.0+ KB


In [14]:
df2[df2['concern'] == ',,,']['label'].value_counts()

label
sunscreen           6
face-moisturizer    5
cleanser            5
masks               1
Name: count, dtype: int64

In [15]:
df2[df2['label'] == 'sunscreen']['concern'].value_counts()

concern
Dryness,,,                                                                  8
,,,                                                                         6
Dryness, Dullness,Uneven Texture,                                           5
Dark Spots, Fine Lines and Wrinkles,Dryness,                                5
Dryness and Dullness,,,                                                     3
Dark Spots, Fine Lines and Wrinkles,Redness,                                3
Dark Spots, Fine Lines and Wrinkles,Dullness,                               3
Fine Lines and Wrinkles, Redness,Loss of Firmness,Elasticity                3
Fine Lines and Wrinkles, Dryness,Dullness,                                  3
Dullness,,,                                                                 2
Fine Lines and Wrinkles, Loss of Firmness and Elasticity,Uneven Texture,    2
Dryness and Redness,,,                                                      2
Dark Spots and Dullness,,,                              

In [16]:
top_concerns = {
    'face-moisturizer':'fine lines and wrinkles, dryness,loss of firmness,elasticity',
    'masks':'pores, dullness,uneven texture,', 
    'cleanser':'pores, dullness,uneven texture,', 
    'eye-cream':'fine lines and wrinkles, dark circles,puffiness,',
    'sunscreen': 'dryness,,,'            
}

# Fill products that list no concern with the default for their label.
no_concern = df2['concern'] == ',,,'
df2.loc[no_concern, 'concern'] = df2.loc[no_concern, 'label'].map(top_concerns).fillna('')
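The defaults in `top_concerns` are hard-coded. For sunscreen, the default matches the most frequent concern string in the counts above. Below is a sketch of deriving such defaults from the data, assuming the rows are available as `(label, concern)` pairs. `most_common_concern` and the toy rows are illustrative only.

```python
from collections import Counter, defaultdict


def most_common_concern(rows, empty=',,,'):
    """Return {label: most frequent non-empty concern string, lowercased}."""
    counts = defaultdict(Counter)
    for label, concern in rows:
        if concern != empty:
            counts[label][concern.lower()] += 1
    return {label: c.most_common(1)[0][0] for label, c in counts.items()}


# Toy rows; the values are illustrative, not taken from the dataset.
rows = [
    ('sunscreen', 'Dryness,,,'),
    ('sunscreen', 'Dryness,,,'),
    ('sunscreen', ',,,'),
    ('sunscreen', 'Dullness,,,'),
    ('eye-cream', 'Dark Circles,,,'),
]
print(most_common_concern(rows))
```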

In [17]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 736 entries, 0 to 735
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   label      736 non-null    object
 1   url        736 non-null    object
 2   brand      735 non-null    object
 3   name       736 non-null    object
 4   skin_type  736 non-null    object
 5   price      731 non-null    object
 6   concern    736 non-null    object
dtypes: object(7)
memory usage: 62.2+ KB


In [27]:
df2.to_csv('final_dataset.csv')

### Successfully merged and filled null values for skin concerns


## Finding top Skin Concerns

In [143]:
product = pd.read_csv('skincare.csv')
product

Unnamed: 0.1,Unnamed: 0,label,url,brand,name,skin_type,price,concern,img
0,0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,Tatcha,The Dewy Skin Cream Plumping & Hydrating Refil...,Normal and Dry,$72.00,"Dryness, Dullness,Fine Lines & Wrinkles,",https://www.sephora.com/productimages/sku/s278...
1,1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,Glow Recipe,Plum Plump Refillable Hyaluronic Acid Moisturizer,"Normal, Dry, Combination, and Oily",$40.00,"Dryness, Dullness,Loss of Firmness,Elasticity",https://www.sephora.com/productimages/sku/s253...
2,2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,Drunk Elephant,Lala Retro™ Nourishing Whipped Refillable Mois...,"Normal, Dry, Combination, and Oily",$64.00,"Dryness, Fine Lines and Wrinkles,Loss of Firmn...",https://www.sephora.com/productimages/sku/s223...
3,3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,Youth To The People,Superfood Air-Whip Lightweight Face Moisturize...,"Normal, Combination, and Oily",$48.00,"Fine Lines and Wrinkles, Dryness,Loss of Firmn...",https://www.sephora.com/productimages/sku/s186...
4,4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,Clarins,"Multi-Active Day Moisturizer for Lines, Pores,...","Normal, Dry, Combination, and Oily",$59.00,"Fine Lines and Wrinkles, Pores,Dullness,",https://www.sephora.com/productimages/sku/s273...
...,...,...,...,...,...,...,...,...,...
731,731,eye-cream,https://www.sephora.com/product/plantscription...,Origins,Plantscription™ Anti-Aging Power Eye Cream,"Normal, Dry, Combination, and Oily",$66.00,"Fine Lines and Wrinkles,,,",https://www.sephora.com/productimages/sku/s167...
732,732,eye-cream,https://www.sephora.com/product/cucumber-de-to...,Peter Thomas Roth,Cucumber De-Tox™ Hydra-Gel Eye Patches,"Normal, Dry, Combination, and Oily",$55.00,"Fine Lines and Wrinkles, Dryness,Puffiness,",https://www.sephora.com/productimages/sku/s186...
733,733,eye-cream,https://www.sephora.com/product/glopro-eye-mic...,BeautyBio,GloPRO® EYE MicroTip™ Attachment Head,"Normal, Dry, Combination, and Oily",$39.00,"Fine Lines and Wrinkles, Dullness and Uneven T...",https://www.sephora.com/productimages/sku/s216...
734,734,eye-cream,https://www.sephora.com/product/guerlain-abeil...,GUERLAIN,Abeille Royale Anti-Aging Eye Cream,"Normal, Dry, Combination, and Oily",$110.00,"Fine Lines and Wrinkles, Dullness and Uneven T...",https://www.sephora.com/productimages/sku/s235...


In [144]:
product['brand'] = product['brand'].str.lower()
product['name'] = product['name'].str.lower()
product['skin_type'] = product['skin_type'].str.lower()
product['skin_type'] = product['skin_type'].str.replace(' and ', ',').str.replace(' or ', ',')
product['concern'] = product['concern'].str.lower()
product['concern'] = product['concern'].str.replace(' and ', ',').str.replace(' or ', ',')
product

Unnamed: 0.1,Unnamed: 0,label,url,brand,name,skin_type,price,concern,img
0,0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,tatcha,the dewy skin cream plumping & hydrating refil...,"normal,dry",$72.00,"dryness, dullness,fine lines & wrinkles,",https://www.sephora.com/productimages/sku/s278...
1,1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,glow recipe,plum plump refillable hyaluronic acid moisturizer,"normal, dry, combination,,oily",$40.00,"dryness, dullness,loss of firmness,elasticity",https://www.sephora.com/productimages/sku/s253...
2,2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,drunk elephant,lala retro™ nourishing whipped refillable mois...,"normal, dry, combination,,oily",$64.00,"dryness, fine lines,wrinkles,loss of firmness,...",https://www.sephora.com/productimages/sku/s223...
3,3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,youth to the people,superfood air-whip lightweight face moisturize...,"normal, combination,,oily",$48.00,"fine lines,wrinkles, dryness,loss of firmness,...",https://www.sephora.com/productimages/sku/s186...
4,4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,clarins,"multi-active day moisturizer for lines, pores,...","normal, dry, combination,,oily",$59.00,"fine lines,wrinkles, pores,dullness,",https://www.sephora.com/productimages/sku/s273...
...,...,...,...,...,...,...,...,...,...
731,731,eye-cream,https://www.sephora.com/product/plantscription...,origins,plantscription™ anti-aging power eye cream,"normal, dry, combination,,oily",$66.00,"fine lines,wrinkles,,,",https://www.sephora.com/productimages/sku/s167...
732,732,eye-cream,https://www.sephora.com/product/cucumber-de-to...,peter thomas roth,cucumber de-tox™ hydra-gel eye patches,"normal, dry, combination,,oily",$55.00,"fine lines,wrinkles, dryness,puffiness,",https://www.sephora.com/productimages/sku/s186...
733,733,eye-cream,https://www.sephora.com/product/glopro-eye-mic...,beautybio,glopro® eye microtip™ attachment head,"normal, dry, combination,,oily",$39.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s216...
734,734,eye-cream,https://www.sephora.com/product/guerlain-abeil...,guerlain,abeille royale anti-aging eye cream,"normal, dry, combination,,oily",$110.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s235...


In [145]:
LABELS = list(product.label.unique())
entries = len(product)

In [146]:
# Tally how often each individual concern appears across all products.
concerns = {}

def concern_elements(comma_sep_concerns):
    for w in comma_sep_concerns.split(','):
        temp = w.strip()
        if temp != '':
            concerns[temp] = concerns.get(temp, 0) + 1

for concern in product['concern']:
    concern_elements(concern)

concerns

{'dryness': 437,
 'dullness': 292,
 'fine lines & wrinkles': 2,
 'loss of firmness': 180,
 'elasticity': 179,
 'fine lines': 292,
 'wrinkles': 293,
 'pores': 161,
 'redness': 90,
 'uneven texture': 186,
 'dark spots': 67,
 'oiliness': 108,
 'acne': 49,
 'blemishes': 56,
 'acne/blemishes': 3,
 'fine lines/wrinkles': 6,
 'fines lines': 2,
 'uneven skin tone': 7,
 'dryness.': 1,
 'dark spot': 4,
 'dullness/uneven texture': 2,
 'pregnancy': 1,
 'anti-aging': 1,
 'loss if firmness': 1,
 'dark circles': 61,
 'puffiness': 50,
 'loss of firmness/elasticity': 2,
 'puffiness/dark circles': 1}

In [147]:
print(sorted(concerns.items(), key=lambda kv:(kv[1], kv[0])))   

[('anti-aging', 1), ('dryness.', 1), ('loss if firmness', 1), ('pregnancy', 1), ('puffiness/dark circles', 1), ('dullness/uneven texture', 2), ('fine lines & wrinkles', 2), ('fines lines', 2), ('loss of firmness/elasticity', 2), ('acne/blemishes', 3), ('dark spot', 4), ('fine lines/wrinkles', 6), ('uneven skin tone', 7), ('acne', 49), ('puffiness', 50), ('blemishes', 56), ('dark circles', 61), ('dark spots', 67), ('redness', 90), ('oiliness', 108), ('pores', 161), ('elasticity', 179), ('loss of firmness', 180), ('uneven texture', 186), ('dullness', 292), ('fine lines', 292), ('wrinkles', 293), ('dryness', 437)]


In [148]:
concerns.pop('anti-aging')
concerns.pop('dryness.')
concerns.pop('loss if firmness')
concerns.pop('pregnancy')
concerns.pop('puffiness/dark circles')
concerns.pop('dullness/uneven texture')
concerns.pop('fine lines & wrinkles')
concerns.pop('fines lines')
concerns.pop('loss of firmness/elasticity')
concerns.pop('acne/blemishes')
concerns.pop('dark spot')
concerns.pop('fine lines/wrinkles')
concerns.pop('uneven skin tone')

7
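Dropping the rare variants discards the products tagged with them. An alternative is to fold each variant into its canonical name before counting. The sketch below assumes a small hand-written alias table (`ALIASES`) built from the misspellings visible in the counts above. Both the table and `normalize_concern` are hypothetical.

```python
import re

# Hypothetical alias table for the misspelled or singular variants seen above.
ALIASES = {
    'dryness.': 'dryness',
    'fines lines': 'fine lines',
    'loss if firmness': 'loss of firmness',
    'dark spot': 'dark spots',
}


def normalize_concern(raw):
    """Split compound labels on '/' or '&' and map known variants to canonical names."""
    parts = [p.strip() for p in re.split(r'[/&]', raw.lower())]
    return [ALIASES.get(p, p) for p in parts if p]


print(normalize_concern('acne/blemishes'))         # ['acne', 'blemishes']
print(normalize_concern('Fine Lines & Wrinkles'))  # ['fine lines', 'wrinkles']
print(normalize_concern('loss if firmness'))       # ['loss of firmness']
```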

In [149]:
def split_skin_types(df, skin_type_column='skin_type'):
    """Collapse each product's skin-type text into one category.

    Priority order: combination > oily > dry > normal.
    """
    data = []

    def expand_row(row):
        skin_types = row[skin_type_column]
        # ',,' is left over from the earlier ' and ' -> ',' replacement
        skin_types = skin_types.replace(',,', " ")
        # Note: 'normal,dry' becomes the single token 'normaldry' here
        # and therefore falls through to 'normal' below
        skin_types = skin_types.replace(',', '')
        skin_types = skin_types.split()

        if 'combination' in skin_types:
            stype = 'combination'
        elif 'oily' in skin_types:
            stype = 'oily'
        elif 'dry' in skin_types:
            stype = 'dry'
        else:
            stype = 'normal'

        row_copy = row.copy()
        row_copy[skin_type_column] = stype
        data.append(row_copy)

    # Apply the function to each row and collect the rewritten rows
    df.apply(expand_row, axis=1)

    # Create a DataFrame from the collected data
    return pd.DataFrame(data)

product_expanded = split_skin_types(product)
product_expanded

Unnamed: 0.1,Unnamed: 0,label,url,brand,name,skin_type,price,concern,img
0,0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,tatcha,the dewy skin cream plumping & hydrating refil...,normal,$72.00,"dryness, dullness,fine lines & wrinkles,",https://www.sephora.com/productimages/sku/s278...
1,1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,glow recipe,plum plump refillable hyaluronic acid moisturizer,combination,$40.00,"dryness, dullness,loss of firmness,elasticity",https://www.sephora.com/productimages/sku/s253...
2,2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,drunk elephant,lala retro™ nourishing whipped refillable mois...,combination,$64.00,"dryness, fine lines,wrinkles,loss of firmness,...",https://www.sephora.com/productimages/sku/s223...
3,3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,youth to the people,superfood air-whip lightweight face moisturize...,combination,$48.00,"fine lines,wrinkles, dryness,loss of firmness,...",https://www.sephora.com/productimages/sku/s186...
4,4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,clarins,"multi-active day moisturizer for lines, pores,...",combination,$59.00,"fine lines,wrinkles, pores,dullness,",https://www.sephora.com/productimages/sku/s273...
...,...,...,...,...,...,...,...,...,...
731,731,eye-cream,https://www.sephora.com/product/plantscription...,origins,plantscription™ anti-aging power eye cream,combination,$66.00,"fine lines,wrinkles,,,",https://www.sephora.com/productimages/sku/s167...
732,732,eye-cream,https://www.sephora.com/product/cucumber-de-to...,peter thomas roth,cucumber de-tox™ hydra-gel eye patches,combination,$55.00,"fine lines,wrinkles, dryness,puffiness,",https://www.sephora.com/productimages/sku/s186...
733,733,eye-cream,https://www.sephora.com/product/glopro-eye-mic...,beautybio,glopro® eye microtip™ attachment head,combination,$39.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s216...
734,734,eye-cream,https://www.sephora.com/product/guerlain-abeil...,guerlain,abeille royale anti-aging eye cream,combination,$110.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s235...


In [150]:
product_expanded['skin_type'].value_counts()

skin_type
combination    658
normal          56
dry             15
oily             7
Name: count, dtype: int64
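The counts show 658 of the 736 products ending up as `combination`, because the collapse keeps a single category per product. Below is a sketch of a parser that keeps every listed skin type. `KNOWN_TYPES` and `parse_skin_types` are hypothetical names introduced for illustration.

```python
import re

KNOWN_TYPES = ('normal', 'dry', 'combination', 'oily')


def parse_skin_types(text):
    """Return the set of known skin types mentioned in a free-text field."""
    # Split on commas, whitespace and slashes; words such as 'and' are filtered out.
    tokens = re.split(r'[,\s/]+', text.lower())
    return {t for t in tokens if t in KNOWN_TYPES}


print(sorted(parse_skin_types('normal, dry, combination,,oily')))
print(sorted(parse_skin_types('Normal and Dry')))
```

Each returned set could then set several skin-type slots in the feature vector at once, rather than relying on `combination` as a stand-in for "all types".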

In [151]:
features = list(product_expanded['skin_type'].unique()) + list(concerns)
print(features)

['normal', 'combination', 'dry', 'oily', 'dryness', 'dullness', 'loss of firmness', 'elasticity', 'fine lines', 'wrinkles', 'pores', 'redness', 'uneven texture', 'dark spots', 'oiliness', 'acne', 'blemishes', 'dark circles', 'puffiness']


In [152]:
len(features)

19

In [153]:
product_expanded.info()

<class 'pandas.core.frame.DataFrame'>
Index: 736 entries, 0 to 735
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  736 non-null    int64 
 1   label       736 non-null    object
 2   url         736 non-null    object
 3   brand       735 non-null    object
 4   name        736 non-null    object
 5   skin_type   736 non-null    object
 6   price       731 non-null    object
 7   concern     736 non-null    object
 8   img         732 non-null    object
dtypes: int64(1), object(8)
memory usage: 57.5+ KB


In [155]:
entries = len(product_expanded)
entries

736

In [156]:
N_SKIN_TYPES = 4  # features[:4] are skin types; the remaining entries are concerns

def search_concern(target, i):
    # Substring test, so 'fine lines' also matches 'fine lines & wrinkles'
    return target in product_expanded.iloc[i]['concern']

one_hot_encodings = np.zeros([entries, len(features)])

# skin types first
for i in range(entries):
    sk_type = product_expanded.iloc[i]['skin_type']
    if sk_type == 'combination':
        # 'combination' products are marked as suitable for every skin type
        one_hot_encodings[i][0:N_SKIN_TYPES] = 1
    else:
        one_hot_encodings[i][features.index(sk_type)] = 1

# concern features only: starting at N_SKIN_TYPES keeps the skin type 'dry'
# from being switched on by the concern 'dryness'
for i in range(entries):
    for j in range(N_SKIN_TYPES, len(features)):
        if search_concern(features[j], i):
            one_hot_encodings[i][j] = 1

In [157]:
x = one_hot_encodings[2]
print(list(x))

[1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
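The test `feature in concern_string` is a substring match, so a short name can fire inside a longer one (`'dry'` is a substring of `'dryness'`). Below is a sketch of an encoder that compares whole comma-separated tokens instead. `encode` and its toy feature list are assumptions made for illustration.

```python
def encode(skin_types, concern_text, features, n_skin_types=4):
    """Binary vector: the first n_skin_types slots come from skin_types,
    the remaining slots from exact matches against the concern tokens."""
    tokens = {t.strip() for t in concern_text.split(',') if t.strip()}
    vec = []
    for j, feature in enumerate(features):
        source = skin_types if j < n_skin_types else tokens
        vec.append(1.0 if feature in source else 0.0)
    return vec


features = ['normal', 'combination', 'dry', 'oily', 'dryness', 'pores']
# The 'dry' slot stays 0.0 even though the concern text contains 'dryness'.
print(encode({'normal'}, 'dryness, pores,,', features))
# [1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Exact matching would miss compound labels such as `acne/blemishes` unless they are split first, so it pairs best with a normalization step.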


In [158]:
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors(n_neighbors=6, algorithm='ball_tree').fit(one_hot_encodings)
distances, indices = nbrs.kneighbors(one_hot_encodings)
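`kneighbors` returns two arrays with one row per query: the distances to the nearest rows and their positions. When the query set is the fitted data itself, each row's closest neighbour is normally the row itself at distance 0, so asking for 6 neighbours leaves 5 others. The brute-force sketch below mimics that output shape in plain Python. It illustrates the idea only and is not how scikit-learn's ball tree works internally.

```python
import math


def k_nearest(vectors, k):
    """For every row, return (distances, indices) of its k nearest rows (Euclidean)."""
    all_dist, all_idx = [], []
    for v in vectors:
        ranked = sorted((math.dist(v, w), j) for j, w in enumerate(vectors))[:k]
        all_dist.append([d for d, _ in ranked])
        all_idx.append([j for _, j in ranked])
    return all_dist, all_idx


toy = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
distances, indices = k_nearest(toy, k=2)
print(indices)  # [[0, 1], [1, 0], [2, 0]]
```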

## Cosine Similarity

In [91]:
# utility functions
def name2index(name):
    return product_expanded[product_expanded["name"]==name].index.tolist()[0]

def index2prod(index):
    return product_expanded.iloc[index]

In [92]:
def wrap(info_arr):
    # Pack one row of [brand, name, price, url, skin_type, concern, img] into a dict
    result = {'brand': info_arr[0], 'name': info_arr[1], 'price': info_arr[2], 'url': info_arr[3],
              'skin_type': info_arr[4], 'concern': str(info_arr[5]).split(','), 'img': str(info_arr[6])}
    return result


# recommend the top `count` similar items, optionally restricted to one category
def recs_cs(vector=None, name=None, label=None, count=5):
    products = []
    if name:
        idx = name2index(name)
        fv = one_hot_encodings[idx]
    elif vector is not None:
        # `is not None` also accepts NumPy arrays, whose truth value is ambiguous
        fv = vector
    else:
        raise ValueError('pass either a feature vector or a product name')
    cs_values = cosine_similarity(np.array([fv]), one_hot_encodings)
    product_expanded['cs'] = cs_values[0]

    if label:
        dff = product_expanded[product_expanded['label'] == label]
    else:
        dff = product_expanded

    if name:
        # do not recommend the query product itself
        dff = dff[dff['name'] != name]
    recommendations = dff.sort_values('cs', ascending=False).head(count)
    data = recommendations[['brand', 'name', 'price', 'url', 'skin_type', 'concern', 'img']].to_dict('split')['data']
    for element in data:
        products.append(wrap(element))
    return products
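The ranking step boils down to scoring every product vector against the query with cosine similarity and keeping the highest scores. Below is a dependency-free sketch of that idea. The `catalog` entries and the helper names `cosine` and `top_matches` are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query, catalog, count=5):
    """Return the names of the `count` catalog entries most similar to `query`."""
    scored = [(cosine(query, vec), name) for name, vec in catalog.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored[:count]]


# Toy catalog: names and vectors are illustrative only.
catalog = {
    'cream a': [1, 0, 1, 1],
    'cream b': [1, 1, 0, 0],
    'cream c': [0, 1, 0, 1],
}
print(top_matches([1, 0, 1, 0], catalog, count=2))  # ['cream a', 'cream b']
```

Because the vectors are binary, the score depends on how many features overlap relative to how many each side has switched on. A product tagged with many concerns is therefore penalized slightly against a narrowly targeted one.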

In [93]:
# overall recommendation: top matches for every product category in LABELS
def recs_essentials(vector=None, name=None):
    response = {}
    for label in LABELS:
        # recs_cs uses `name` when it is given and falls back to `vector` otherwise
        response[label] = recs_cs(vector, name, label)
    return response
            

In [94]:
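# Feature order this vector was built against. It differs from the `features` list printed earlier,
# so rebuild `x` whenever the feature order changes.
# features = ['normal', 'dry', 'combination', 'oily', 'dullness', 'loss of firmness', 'elasticity', 'fine lines', 'wrinkles', 'pores', 'redness', 'uneven texture', 'dark spots', 'oiliness', 'acne', 'blemishes', 'dryness.', 'dark circles', 'puffiness']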
x = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

y = recs_essentials(x, None)
y

{'face-moisturizer': [{'brand': 'saint jane beauty',
   'name': 'luxury sun ritual pore smoothing face sunscreen spf 30',
   'price': '$38.00 ',
   'url': 'https://www.sephora.com/product/luxury-sun-ritual-pore-smoothing-sunscreen-spf-30-P501321?skuId=2599330&icid2=products%20grid:p501321:product',
   'skin_type': 'combination',
   'concern': ['pores', ' redness', 'oiliness', ''],
   'img': 'https://www.sephora.com/productimages/sku/s2599330-main-zoom.jpg?pb=clean-at-sephora&imwidth=175'},
  {'brand': 'bobbi brown',
   'name': 'jumbo vitamin enriched face base moisturizer & primer with vitamin c + hyaluronic acid',
   'price': '$110.00 ',
   'url': 'https://www.sephora.com/product/bobbi-brown-vitamin-enriched-face-base-jumbo-P468634?skuId=2421840&icid2=products%20grid:p468634:product',
   'skin_type': 'combination',
   'concern': ['dryness', ' fine lines', 'wrinkles', 'oiliness', ''],
   'img': 'https://www.sephora.com/productimages/sku/s2421840-main-zoom.jpg?imwidth=175'},
  {'brand':
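Typing the query vector by hand ties it to one particular feature order. Below is a sketch of building it from feature names instead, assuming `features` is the ordered list used for the product encodings. `build_query` is a hypothetical helper and the short feature list is a toy.

```python
def build_query(selected, features):
    """Return a binary list with 1.0 wherever a feature name is in `selected`."""
    unknown = set(selected) - set(features)
    if unknown:
        # Fail loudly on typos instead of silently producing an all-zero slot.
        raise ValueError(f'unknown features: {sorted(unknown)}')
    return [1.0 if f in selected else 0.0 for f in features]


features = ['normal', 'combination', 'dry', 'oily', 'dryness', 'pores', 'redness']
print(build_query({'dry', 'pores', 'redness'}, features))
# [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```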

In [159]:
product_expanded.to_csv('final_extended_dataset2.csv')

## Makeup Items

In [160]:
makeup = pd.read_csv('makeup_dataset.csv')
makeup.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   label        246 non-null    object
 1   url          246 non-null    object
 2   brand        246 non-null    object
 3   name         246 non-null    object
 4   skin_tone    246 non-null    object
 5   skin_type    54 non-null     object
 6   concern_1    90 non-null     object
 7   concern_2    36 non-null     object
 8   formulation  246 non-null    object
 9   price        246 non-null    object
 10  image        246 non-null    object
dtypes: object(11)
memory usage: 21.3+ KB


In [161]:
makeup.drop(columns = ['concern_1', 'concern_2', 'formulation'], inplace = True)
makeup

Unnamed: 0,label,url,brand,name,skin_tone,skin_type,price,image
0,foundation,https://www.sephora.com/product/ilia-super-ser...,ILIA,Super Serum Skin Tint SPF 40 Skincare Foundation,Very Light,,$48.00,https://www.sephora.com/productimages/sku/s233...
1,foundation,https://www.sephora.com/product/ilia-super-ser...,ILIA,Super Serum Skin Tint SPF 40 Skincare Foundation,Light,,$48.00,https://www.sephora.com/productimages/sku/s233...
2,foundation,https://www.sephora.com/product/ilia-super-ser...,ILIA,Super Serum Skin Tint SPF 40 Skincare Foundation,Medium,,$48.00,https://www.sephora.com/productimages/sku/s233...
3,foundation,https://www.sephora.com/product/ilia-super-ser...,ILIA,Super Serum Skin Tint SPF 40 Skincare Foundation,Medium Deep,,$48.00,https://www.sephora.com/productimages/sku/s242...
4,foundation,https://www.sephora.com/product/ilia-super-ser...,ILIA,Super Serum Skin Tint SPF 40 Skincare Foundation,Deep,,$48.00,https://www.sephora.com/productimages/sku/s233...
...,...,...,...,...,...,...,...,...
241,concealer,https://www.sephora.com/product/givenchy-prism...,Givenchy,Prisme Libre Skin-Caring 24H Hydrating + Radia...,Light,,$39.00,https://www.sephora.com/productimages/sku/s263...
242,concealer,https://www.sephora.com/product/givenchy-prism...,Givenchy,Prisme Libre Skin-Caring 24H Hydrating + Radia...,Medium,,$39.00,https://www.sephora.com/productimages/sku/s263...
243,concealer,https://www.sephora.com/product/givenchy-prism...,Givenchy,Prisme Libre Skin-Caring 24H Hydrating + Radia...,Medium Deep,,$39.00,https://www.sephora.com/productimages/sku/s263...
244,concealer,https://www.sephora.com/product/givenchy-prism...,Givenchy,Prisme Libre Skin-Caring 24H Hydrating + Radia...,Deep,,$39.00,https://www.sephora.com/productimages/sku/s263...


In [162]:
makeup['skin_tone'].value_counts()

skin_tone
Very Light     41
Light          41
Medium         41
Medium Deep    41
Deep           41
Extra Deep     41
Name: count, dtype: int64

In [163]:
makeup.isna().sum()

label          0
url            0
brand          0
name           0
skin_tone      0
skin_type    192
price          0
image          0
dtype: int64

In [35]:
makeup.dropna(subset=['skin_tone'], inplace=True)


In [100]:
makeup[makeup['label'] == 'foundation']['skin_type'].value_counts()

skin_type
Normal, Dry, Combination, and Oily     6
Normal, Combination, and Oily          6
Normal and Combination                 6
Combination and Oily                   6
Name: count, dtype: int64

In [102]:
makeup[makeup['label'] == 'foundation']['skin_type'].isna().sum()

102

In [103]:
makeup[makeup['label'] == 'primer']['skin_type'].isna().sum()

0

In [104]:
makeup[makeup['label'] == 'primer']['skin_type'].value_counts()

Series([], Name: count, dtype: int64)

In [40]:
makeup[makeup['label'] == 'concealer']['skin_type'].isna().sum()

70

In [41]:
makeup[makeup['label'] == 'concealer']['skin_type'].value_counts()

All            32
Normal         11
Combination     1
Name: skin type, dtype: int64

In [164]:
makeup['skin_type'] = makeup['skin_type'].fillna('normal')

In [165]:
makeup['brand'] = makeup['brand'].str.lower()
makeup['name'] = makeup['name'].str.lower()
makeup['skin_type'] = makeup['skin_type'].str.lower()
makeup['skin_tone'] = makeup['skin_tone'].str.lower()
makeup

Unnamed: 0,label,url,brand,name,skin_tone,skin_type,price,image
0,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,very light,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
1,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,light,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
2,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,medium,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
3,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,medium deep,normal,$48.00,https://www.sephora.com/productimages/sku/s242...
4,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,deep,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
...,...,...,...,...,...,...,...,...
241,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,light,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
242,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,medium,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
243,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,medium deep,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
244,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,deep,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
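
Lower-casing alone leaves stray whitespace in place: one `skin type` value in the recommendation output further down ends in a trailing space (`'normal, dry, combination, and oily '`). A possible way to normalize both at once is sketched below on a made-up frame; `toy` and `text_cols` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the `makeup` frame; the rows are invented for illustration.
toy = pd.DataFrame({
    'brand': ['Sephora', 'ILIA '],
    'skin_type': ['Normal, Dry, Combination, and Oily ', 'Normal'],
})

# Assumed list of text columns to normalize.
text_cols = ['brand', 'skin_type']
for col in text_cols:
    # Trim surrounding whitespace first, then lower-case.
    toy[col] = toy[col].str.strip().str.lower()

print(toy['skin_type'].tolist())
```

Stripping before comparison matters if values are later matched with `==`, since `'normal '` and `'normal'` would otherwise count as different skin types.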


In [167]:
def wrap_makeup(info_arr):
    # Turn one row of [brand, name, price, url, skin_type, skin_tone] into a dict.
    keys = ['brand', 'name', 'price', 'url', 'skin type', 'skin tone']
    return dict(zip(keys, info_arr))


def makeup_recommendation(skin_tone):
    # Take up to three foundations and three concealers that match the skin tone.
    tone_match = makeup['skin_tone'] == skin_tone
    dff = pd.concat([
        makeup[tone_match & (makeup['label'] == 'foundation')].head(3),
        makeup[tone_match & (makeup['label'] == 'concealer')].head(3),
    ])
    # Primer recommendations (previously filtered by skin type as well) are disabled.
    # Shuffle the rows so foundations and concealers are mixed in the output.
    dff = dff.sample(frac=1)
    data = dff[['brand', 'name', 'price', 'url', 'skin_type', 'skin_tone']].to_dict('split')['data']
    return [wrap_makeup(element) for element in data]



In [171]:
makeup_recommendation('deep')

[{'brand': 'tower 28 beauty',
  'name': 'swipe all-over hydrating serum concealer',
  'price': '$22.00',
  'url': 'https://www.sephora.com/product/swipe-all-over-hydrating-serum-concealer-P507142?skuId=2697621&icid2=products%20grid:p507142:product',
  'skin type': 'normal',
  'skin tone': 'deep'},
 {'brand': 'sephora',
  'name': 'reveal the real 12hr soft radiant skin tint',
  'price': '$22.00',
  'url': 'https://www.sephora.com/product/reveal-real-soft-radiant-skin-tint-P511752?skuId=2760767',
  'skin type': 'normal, dry, combination, and oily ',
  'skin tone': 'deep'},
 {'brand': 'estée lauder',
  'name': 'double wear stay-in-place flawless longwear cream concealer',
  'price': '$32.00',
  'url': 'https://www.sephora.com/product/double-wear-stay-in-place-flawless-wear-concealer-P379951?icid2=products%20grid:p379951:product&skuId=1464973',
  'skin type': 'normal, dry, combination, and oily',
  'skin tone': 'deep'},
 {'brand': 'ilia',
  'name': 'super serum skin tint spf 40 skincare fo
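
The notebook's function returns an empty list for an unrecognized skin tone and shuffles differently on every call. The variant below is only a sketch of how those two points might be handled, not the notebook's implementation: `recommend_makeup`, `per_label`, and `seed` are hypothetical names, and the `toy` rows are invented.

```python
import pandas as pd

def recommend_makeup(frame, skin_tone, per_label=3, seed=None):
    """Hypothetical variant of makeup_recommendation (illustrative only)."""
    known_tones = set(frame['skin_tone'].unique())
    if skin_tone not in known_tones:
        raise ValueError(f"unknown skin tone {skin_tone!r}")
    # Up to `per_label` matching products for each product label.
    picks = pd.concat([
        frame[(frame['skin_tone'] == skin_tone) & (frame['label'] == label)].head(per_label)
        for label in ('foundation', 'concealer')
    ])
    # A fixed seed makes the shuffle reproducible.
    picks = picks.sample(frac=1, random_state=seed)
    cols = ['brand', 'name', 'price', 'url', 'skin_type', 'skin_tone']
    return picks[cols].to_dict('records')

# Toy stand-in for the `makeup` frame; the rows are invented for illustration.
toy = pd.DataFrame({
    'label': ['foundation', 'foundation', 'concealer', 'concealer'],
    'brand': ['a', 'b', 'c', 'd'],
    'name': ['tint', 'tint', 'stick', 'stick'],
    'price': ['$10.00', '$11.00', '$12.00', '$13.00'],
    'url': ['u1', 'u2', 'u3', 'u4'],
    'skin_type': ['normal'] * 4,
    'skin_tone': ['deep', 'light', 'deep', 'deep'],
})

recs = recommend_makeup(toy, 'deep', per_label=1, seed=0)
print(recs)
```

Unlike `wrap_makeup`, `to_dict('records')` keeps the column names as keys (`skin_type` rather than `skin type`), so any consumer expecting the spaced keys would need adjusting.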

In [50]:
df2.to_csv('general_skin_care_final.csv')

In [47]:
df2.drop(columns = ['cs'], inplace = True)

In [51]:
len(df2.to_dict('split')['data'])

967

In [None]:
makeup

In [None]:
makeup.to_csv('makeup_test.csv')

In [52]:
df2.to_csv('general_test_final.csv')

In [142]:
product_expanded

Unnamed: 0.1,Unnamed: 0,label,url,brand,name,skin_type,price,concern,img,cs
0,0,face-moisturizer,https://www.sephora.com/product/the-dewy-skin-...,tatcha,the dewy skin cream plumping & hydrating refil...,normal,$72.00,"dryness, dullness,fine lines & wrinkles,",https://www.sephora.com/productimages/sku/s278...,0.182574
1,1,face-moisturizer,https://www.sephora.com/product/glow-recipe-pl...,glow recipe,plum plump refillable hyaluronic acid moisturizer,combination,$40.00,"dryness, dullness,loss of firmness,elasticity",https://www.sephora.com/productimages/sku/s253...,0.154303
2,2,face-moisturizer,https://www.sephora.com/product/drunk-elephant...,drunk elephant,lala retro™ nourishing whipped refillable mois...,combination,$64.00,"dryness, fine lines,wrinkles,loss of firmness,...",https://www.sephora.com/productimages/sku/s223...,0.288675
3,3,face-moisturizer,https://www.sephora.com/product/kale-spinach-h...,youth to the people,superfood air-whip lightweight face moisturize...,combination,$48.00,"fine lines,wrinkles, dryness,loss of firmness,...",https://www.sephora.com/productimages/sku/s186...,0.288675
4,4,face-moisturizer,https://www.sephora.com/product/clarins-multi-...,clarins,"multi-active day moisturizer for lines, pores,...",combination,$59.00,"fine lines,wrinkles, pores,dullness,",https://www.sephora.com/productimages/sku/s273...,0.288675
...,...,...,...,...,...,...,...,...,...,...
731,731,eye-cream,https://www.sephora.com/product/plantscription...,origins,plantscription™ anti-aging power eye cream,combination,$66.00,"fine lines,wrinkles,,,",https://www.sephora.com/productimages/sku/s167...,0.333333
732,732,eye-cream,https://www.sephora.com/product/cucumber-de-to...,peter thomas roth,cucumber de-tox™ hydra-gel eye patches,combination,$55.00,"fine lines,wrinkles, dryness,puffiness,",https://www.sephora.com/productimages/sku/s186...,0.462910
733,733,eye-cream,https://www.sephora.com/product/glopro-eye-mic...,beautybio,glopro® eye microtip™ attachment head,combination,$39.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s216...,0.258199
734,734,eye-cream,https://www.sephora.com/product/guerlain-abeil...,guerlain,abeille royale anti-aging eye cream,combination,$110.00,"fine lines,wrinkles, dullness,uneven texture,l...",https://www.sephora.com/productimages/sku/s235...,0.258199


In [172]:
makeup

Unnamed: 0,label,url,brand,name,skin_tone,skin_type,price,image
0,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,very light,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
1,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,light,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
2,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,medium,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
3,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,medium deep,normal,$48.00,https://www.sephora.com/productimages/sku/s242...
4,foundation,https://www.sephora.com/product/ilia-super-ser...,ilia,super serum skin tint spf 40 skincare foundation,deep,normal,$48.00,https://www.sephora.com/productimages/sku/s233...
...,...,...,...,...,...,...,...,...
241,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,light,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
242,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,medium,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
243,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,medium deep,normal,$39.00,https://www.sephora.com/productimages/sku/s263...
244,concealer,https://www.sephora.com/product/givenchy-prism...,givenchy,prisme libre skin-caring 24h hydrating + radia...,deep,normal,$39.00,https://www.sephora.com/productimages/sku/s263...


In [173]:
makeup.to_csv('makeup_dataset2.csv')
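
The `product_expanded` display above appears to carry a leftover index column, which is what usually happens when a frame is written with the default `to_csv` and read back. The sketch below, on a made-up frame (`toy` is an assumption), compares the default round trip with `index=False`.

```python
import io
import pandas as pd

# Toy stand-in for the `makeup` frame; the rows are invented for illustration.
toy = pd.DataFrame({'brand': ['ilia', 'givenchy'], 'price': ['$48.00', '$39.00']})

# Default: the index is written as an unnamed first column.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# index=False: only the data columns are written.
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)

print(list(with_index.columns))     # ['Unnamed: 0', 'brand', 'price']
print(list(without_index.columns))  # ['brand', 'price']
```

Passing `index_col=0` to `pd.read_csv` is an alternative when the saved index is meaningful and should be restored instead of dropped.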