
# Cosmetic Recommendation based on Chemical Composition

This project maps cosmetic items by the similarity of their chemical composition and gives content-based recommendations. The dataset was prepared in advance; the details are [here](https://towardsdatascience.com/for-your-skin-beauty-mapping-cosmetic-items-with-bokeh-af7523ca68e5).

## 1. Importing the necessary libraries and the dataset

```python
# import libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# only the Bokeh names used below; widgetbox no longer exists in Bokeh 3.x
from bokeh.io import show, output_notebook, push_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
from ipywidgets import interact
```

```python
df = pd.read_csv('data/cosmetic_TSNE.csv')
df.head()
```

| | index | Label | brand | name | price | rank | ingredients | Combination | Dry | Normal | Oily | Sensitive | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Moisturizer_Combination | LA MER | Crème de la Mer | 175 | 4.1 | Algae (Seaweed) Extract, Mineral Oil, Petrolat... | 1 | 1 | 1 | 1 | 1 | 2.303123 | 17.373549 |
| 1 | 1 | Moisturizer_Combination | SK-II | Facial Treatment Essence | 179 | 4.1 | Galactomyces Ferment Filtrate (Pitera), Butyle... | 1 | 1 | 1 | 1 | 1 | 7.593926 | -10.227859 |
| 2 | 2 | Moisturizer_Combination | DRUNK ELEPHANT | Protini™ Polypeptide Cream | 68 | 4.4 | Water, Dicaprylyl Carbonate, Glycerin, Ceteary... | 1 | 1 | 1 | 1 | 0 | -18.200281 | 4.022318 |
| 3 | 3 | Moisturizer_Combination | LA MER | The Moisturizing Soft Cream | 175 | 3.8 | Algae (Seaweed) Extract, Cyclopentasiloxane, P... | 1 | 1 | 1 | 1 | 1 | 32.75558 | -40.191727 |
| 4 | 4 | Moisturizer_Combination | IT COSMETICS | Your Skin But Better™ CC+™ Cream with SPF 50+ | 38 | 4.1 | Water, Snail Secretion Filtrate, Phenyl Trimet... | 1 | 1 | 1 | 1 | 1 | -12.003542 | -57.294006 |


```python
df.columns
```

```
Index(['index', 'Label', 'brand', 'name', 'price', 'rank', 'ingredients',
       'Combination', 'Dry', 'Normal', 'Oily', 'Sensitive', 'X', 'Y'],
      dtype='object')
```

All the steps up to the decomposition were done in advance, and I combined all the data into one file covering every possible combination. `brand`, `name`, `price` and `rank` are the details of each item scraped from [Sephora](https://www.sephora.com); `X` and `Y` are the 2-D t-SNE coordinates used for the map below.

```python
# the 30 different combinations of options
df.Label.unique()
```

```
array(['Moisturizer_Combination', 'Moisturizer_Dry', 'Moisturizer_Normal',
       'Moisturizer_Oily', 'Moisturizer_Sensitive',
       'Cleanser_Combination', 'Cleanser_Dry', 'Cleanser_Normal',
       'Cleanser_Oily', 'Cleanser_Sensitive', 'Treatment_Combination',
       'Treatment_Dry', 'Treatment_Normal', 'Treatment_Oily',
       'Treatment_Sensitive', 'Face Mask_Combination', 'Face Mask_Dry',
       'Face Mask_Normal', 'Face Mask_Oily', 'Face Mask_Sensitive',
       'Eye cream_Combination', 'Eye cream_Dry', 'Eye cream_Normal',
       'Eye cream_Oily', 'Eye cream_Sensitive', 'Sun protect_Combination',
       'Sun protect_Dry', 'Sun protect_Normal', 'Sun protect_Oily',
       'Sun protect_Sensitive'], dtype=object)
```

```python
# cosmetic filtering options
option_1 = ['Moisturizer', 'Cleanser', 'Treatment', 'Face Mask', 'Eye cream', 'Sun protect']
option_2 = ['Combination', 'Dry', 'Normal', 'Oily', 'Sensitive']
```

There are 6 item categories and 5 skin-type options, so the `Label` column contains all 30 possible combinations shown above. To support selecting and filtering by these options, I calculated the similarities separately for each combination. Users can choose one value from `option_1` and one from `option_2` and get the filtered plot accordingly.
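
Because each `Label` is just the two options joined with an underscore, the 30 labels can be generated and used for filtering as sketched below. `all_labels` and `filter_items` are hypothetical names, and the small DataFrame is made up for illustration.

```python
from itertools import product

import pandas as pd

option_1 = ['Moisturizer', 'Cleanser', 'Treatment', 'Face Mask', 'Eye cream', 'Sun protect']
option_2 = ['Combination', 'Dry', 'Normal', 'Oily', 'Sensitive']

# every category/skin-type pair, in the same order as df.Label.unique() above
all_labels = [f'{cat}_{skin}' for cat, skin in product(option_1, option_2)]


def filter_items(df: pd.DataFrame, category: str, skin: str) -> pd.DataFrame:
    """Hypothetical helper: rows for one category/skin-type combination."""
    return df[df['Label'] == f'{category}_{skin}']


# made-up rows to show the filter
toy = pd.DataFrame({'Label': ['Moisturizer_Dry', 'Moisturizer_Oily', 'Cleanser_Dry'],
                    'name': ['Cream A', 'Gel B', 'Foam C']})
print(len(all_labels))                                           # 30
print(filter_items(toy, 'Moisturizer', 'Dry')['name'].tolist())  # ['Cream A']
```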

## 2. Mapping with Bokeh

```python
output_notebook()
```

`output_notebook()` loads BokehJS so that Bokeh plots render inline in the Jupyter notebook; it has to be called before any plot is shown.

```python
# make a source and scatter bokeh plot
source = ColumnDataSource(df)
plot = figure(x_axis_label='T-SNE 1', y_axis_label='T-SNE 2',
              width=500, height=400)
# scatter() draws circle markers by default; circle(size=...) is deprecated since Bokeh 3.4
plot.scatter(x='X', y='Y', source=source,
             size=10, color='#FF7373', alpha=.8)

plot.background_fill_color = "beige"
plot.background_fill_alpha = 0.2

# add hover tool
hover = HoverTool(tooltips=[
        ('Item', '@name'),
        ('Brand', '@brand'),
        ('Price', '$ @price'),
        ('Rank', '@rank')])
plot.add_tools(hover)
```

```python
# define the callback: keep only the rows whose Label matches the two selected options
def update(op1=option_1[0], op2=option_2[0]):
    a_b = op1 + '_' + op2
    subset = df[df['Label'] == a_b]
    source.data = {col: subset[col]
                   for col in ['X', 'Y', 'name', 'brand', 'price', 'rank']}
    # send the new data to the plot shown with notebook_handle=True
    push_notebook()
```

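```python
# show the plot first so that push_notebook() has a handle to update;
# interact() then builds the two dropdowns and calls update() once with the defaults
show(plot, notebook_handle=True)
interact(update, op1=option_1, op2=option_2)
```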


## 3. Cosine similarity

Now that each item is plotted on the plane, we can calculate the cosine similarity between any two points. As an example, I took [Peat Miracle Revital Cream](https://www.sephora.com/product/peat-miracle-revital-cream-P412440) from Belif.
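
For two points $p = (x_1, y_1)$ and $q = (x_2, y_2)$ on the map, the cosine similarity is the dot product divided by the product of the Euclidean norms. A common way to turn it into a distance, where smaller means more similar, is $d = 1 - \cos$. Note that in two dimensions this depends only on the angle between the points as seen from the origin, so two items on the same ray from the origin score a similarity of 1 however far apart they are.

$$
\cos(p, q) = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}},
\qquad d(p, q) = 1 - \cos(p, q)
$$

For example, $p = (3, 4)$ and $q = (4, 3)$ give $\cos(p, q) = (12 + 12)/(5 \cdot 5) = 0.96$ and $d(p, q) = 0.04$.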

```python
df_2 = df[df.Label == 'Moisturizer_Dry'].reset_index().drop('index', axis = 1)
df_2['dist'] = 0.0

myItem = df_2[df_2.name.str.contains('Peat Miracle Revital')]
myItem
```

| | level_0 | Label | brand | name | price | rank | ingredients | Combination | Dry | Normal | Oily | Sensitive | X | Y | dist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87 | 286 | Moisturizer_Dry | BELIF | Peat Miracle Revital Cream | 58 | 4.7 | Water, Dipropylene Glycol, Glycerin, Caprylic/... | 1 | 1 | 1 | 0 | 0 | -0.076813 | -0.521115 | 0.0 |


```python
# getting the array for myItem: its t-SNE coordinates as a (1, 2) array
P1 = myItem[['X', 'Y']].values
```

```python
# cosine distance (1 - cosine similarity) between myItem and every item in df_2,
# so that a smaller dist means a more similar item
df_2['dist'] = 1 - cosine_similarity(df_2[['X', 'Y']].values, P1).ravel()
```

If we sort the results in ascending order, we can see the five closest cosmetic items.

```python
# exclude myItem itself (its distance to itself is 0), then sort by distance
df_2 = df_2.drop(myItem.index).sort_values('dist')
df_2[['name', 'brand', 'dist']].head(5)
```


These are the top 5 cosmetics most similar to `myItem`. With this list, we can recommend new products. Sorted in descending order instead, the same list could be used as *'the worst choices for you'*.
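
The same steps can be wrapped into one reusable function. The sketch below is hypothetical (`recommend` is not part of the original notebook); it assumes a DataFrame with the `Label`, `name`, `X` and `Y` columns used above, computes the cosine distance with NumPy instead of scikit-learn, and returns the opposite end of the ranking when `worst=True`. The toy data are made up.

```python
import numpy as np
import pandas as pd


def recommend(df: pd.DataFrame, label: str, item_name: str,
              k: int = 5, worst: bool = False) -> pd.DataFrame:
    """Hypothetical helper: k items of the same Label ranked by cosine distance."""
    group = df[df['Label'] == label]
    is_item = group['name'] == item_name
    if not is_item.any():
        raise ValueError(f'{item_name!r} not found under {label!r}')

    target = group.loc[is_item, ['X', 'Y']].to_numpy()[0]   # shape (2,)
    others = group[~is_item].copy()                          # drop the item itself
    points = others[['X', 'Y']].to_numpy()                   # shape (n, 2)

    # cosine similarity = dot product / product of Euclidean norms
    cos = points @ target / (np.linalg.norm(points, axis=1) * np.linalg.norm(target))
    others['dist'] = 1 - cos

    # ascending: most similar first; descending: the "worst choice" list
    return others.sort_values('dist', ascending=not worst).head(k)[['name', 'dist']]


toy = pd.DataFrame({
    'Label': ['Moisturizer_Dry'] * 4,
    'name': ['Mine', 'Close', 'Far', 'Opposite'],
    'X': [1.0, 2.0, 0.0, -1.0],
    'Y': [0.0, 0.1, 1.0, 0.0],
})
print(recommend(toy, 'Moisturizer_Dry', 'Mine', k=2))
print(recommend(toy, 'Moisturizer_Dry', 'Mine', k=1, worst=True))
```

A point exactly at the origin has zero norm, so its cosine is undefined and NumPy would produce `nan`; with t-SNE coordinates this is unlikely but not impossible.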