<h1 style="
    color:#F06292;
    font-family:Georgia;
    text-align:center;
">
🧴 Skincare Recommendation System
</h1>


<h2 style="
    color:#9575CD;
    font-family:Georgia;
    text-align:center;
">
Skin-Type-Based Product Recommendation Using Ingredients
</h2>



<h3 style="color:#4DB6AC; font-family:Georgia;">
 1. Import required libraries
</h3>


In [1]:
import pandas as pd #data handling
import numpy as np
import re #cleaning ingredient list

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity #ingredient based recommendation
from collections import Counter


<h3 style="color:#4DB6AC; font-family:Georgia;">
2. Load the dataset
</h3>


In [2]:
import pandas as pd
df = pd.read_csv("skincare_products_clean.csv")
df.head()


Unnamed: 0,product_name,product_url,product_type,clean_ingreds,price
0,The Ordinary Natural Moisturising Factors + HA...,https://www.lookfantastic.com/the-ordinary-nat...,Moisturiser,"['capric triglyceride', 'cetyl alcohol', 'prop...",£5.20
1,CeraVe Facial Moisturising Lotion SPF 25 52ml,https://www.lookfantastic.com/cerave-facial-mo...,Moisturiser,"['homosalate', 'glycerin', 'octocrylene', 'eth...",£13.00
2,The Ordinary Hyaluronic Acid 2% + B5 Hydration...,https://www.lookfantastic.com/the-ordinary-hya...,Moisturiser,"['sodium hyaluronate', 'sodium hyaluronate', '...",£6.20
3,AMELIORATE Transforming Body Lotion 200ml,https://www.lookfantastic.com/ameliorate-trans...,Moisturiser,"['ammonium lactate', 'c12-15', 'glycerin', 'pr...",£22.50
4,CeraVe Moisturising Cream 454g,https://www.lookfantastic.com/cerave-moisturis...,Moisturiser,"['glycerin', 'cetearyl alcohol', 'capric trigl...",£16.00


<h3 style="color:#4DB6AC; font-family:Georgia;">
3. Basic data inspection
</h3>


In [3]:
df.info()
df.isnull().sum()  # not displayed: only a cell's last expression is shown
# Drop any rows with missing ingredients
df = df.dropna(subset=["clean_ingreds"])


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1138 entries, 0 to 1137
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   product_name   1138 non-null   object
 1   product_url    1138 non-null   object
 2   product_type   1138 non-null   object
 3   clean_ingreds  1138 non-null   object
 4   price          1138 non-null   object
dtypes: object(5)
memory usage: 44.6+ KB


<h3 style="color:#4DB6AC; font-family:Georgia;">
4. Clean the ingredients text
</h3>


In [4]:
import re

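# Flatten the stringified ingredient list into plain space-separated text
def clean_ingredients(text):
    if not isinstance(text, str):
        return ""
    text = text.lower()
    # Keep only letters, commas and spaces. This also strips digits and
    # hyphens, so a name such as "c12-15" is reduced to "c".
    text = re.sub(r"[^a-z, ]", "", text)
    # Commas separate ingredients; turn them into spaces
    return text.replace(",", " ")

df["clean_ingredients"] = df["clean_ingreds"].apply(clean_ingredients)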
df[["clean_ingreds", "clean_ingredients"]].head()



Unnamed: 0,clean_ingreds,clean_ingredients
0,"['capric triglyceride', 'cetyl alcohol', 'prop...",capric triglyceride cetyl alcohol propanedio...
1,"['homosalate', 'glycerin', 'octocrylene', 'eth...",homosalate glycerin octocrylene ethylhexyl ...
2,"['sodium hyaluronate', 'sodium hyaluronate', '...",sodium hyaluronate sodium hyaluronate panthe...
3,"['ammonium lactate', 'c12-15', 'glycerin', 'pr...",ammonium lactate c glycerin prunus amygdalu...
4,"['glycerin', 'cetearyl alcohol', 'capric trigl...",glycerin cetearyl alcohol capric triglycerid...


<h3 style="color:#4DB6AC; font-family:Georgia;">
5. Exploratory data insights
</h3>


<h3 style="color:#4DB6AC; font-family:Georgia;">
5.1. Product type distribution
</h3>


In [5]:
df["product_type"].value_counts()


product_type
Mask           124
Body Wash      123
Moisturiser    115
Cleanser       115
Serum          113
Eye Care       100
Mist            80
Oil             76
Toner           73
Balm            61
Exfoliator      57
Bath Salts      36
Bath Oil        33
Peel            32
Name: count, dtype: int64

<h3 style="color:#4DB6AC; font-family:Georgia;">
5.2. Price distribution
</h3>


In [6]:
df["price"].describe()


count       1138
unique       290
top       £22.00
freq          33
Name: price, dtype: object
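
Because `price` is stored as text, `describe()` only reports counts and the most frequent value. A minimal sketch of converting the column to numbers first, assuming every price is a currency symbol followed by a decimal amount; `prices` is a toy stand-in for `df["price"]` and `price_gbp` is a hypothetical name:

```python
import pandas as pd

# Toy stand-in for df["price"]; the real column holds strings such as "£5.20".
prices = pd.Series(["£5.20", "£13.00", "£6.20", "£22.50", "£16.00"])

# Drop everything except digits and the decimal point, then convert to float.
# errors="coerce" turns anything unparseable into NaN instead of raising.
price_gbp = pd.to_numeric(
    prices.str.replace(r"[^0-9.]", "", regex=True), errors="coerce"
)

# With a numeric dtype, describe() reports mean, std, min, quartiles and max.
print(price_gbp.describe())
```

On the real data the same two steps could be applied to `df["price"]` and stored in a new column.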

<h3 style="color:#4DB6AC; font-family:Georgia;">
5.3. Most common ingredients
</h3>


In [7]:
# Counts individual words, so multi-word ingredient names are split apart
all_ingredients = " ".join(df["clean_ingredients"]).split()
Counter(all_ingredients).most_common(15)


[('extract', 3460),
 ('sodium', 2036),
 ('oil', 1342),
 ('acid', 1319),
 ('glycol', 1099),
 ('glycerin', 1069),
 ('alcohol', 720),
 ('phenoxyethanol', 618),
 ('flower', 601),
 ('parfum', 584),
 ('ci', 546),
 ('seed', 538),
 ('disodium', 452),
 ('edta', 449),
 ('citrus', 441)]
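
The counts above are per word, which is why fragments such as "extract" and "acid" dominate. A possible alternative, sketched here on toy data, is to parse the original `clean_ingreds` strings back into lists and count whole ingredient names. The helper `parse_ingredients` and the sample `raw` values are illustrative assumptions, not part of the notebook:

```python
import ast
from collections import Counter

# Toy stand-ins for df["clean_ingreds"]: each value is the string form of a list.
raw = [
    "['glycerin', 'cetyl alcohol', 'sodium hyaluronate']",
    "['glycerin', 'sodium hyaluronate', 'c12-15']",
    "['glycerin', 'phenoxyethanol']",
]

def parse_ingredients(text):
    """Turn a stringified list into a real list of ingredient names."""
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return []  # malformed input contributes no ingredients
    if not isinstance(parsed, list):
        return []
    return [str(item).strip().lower() for item in parsed]

# Count every occurrence of each full ingredient name.
ingredient_counts = Counter(
    name for text in raw for name in parse_ingredients(text)
)
print(ingredient_counts.most_common(3))
```

Parsing the list also keeps names like "c12-15" intact, since no characters are stripped.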

<h3 style="color:#4DB6AC; font-family:Georgia;">
6. Define skin-type ingredient rules
</h3>


In [8]:
SKIN_TYPE_RULES = {
    "oily": {
        "good": ["niacinamide", "salicylic", "zinc", "tea tree"],
        "avoid": ["coconut oil", "shea butter", "lanolin"]
    },
    "dry": {
        "good": ["glycerin", "hyaluronic", "ceramide", "squalane"],
        "avoid": ["alcohol"]
    },
    "sensitive": {
        "good": ["centella", "aloe", "panthenol"],
        "avoid": ["fragrance", "essential oil", "alcohol"]
    },
    "combination": {
        "good": ["niacinamide", "green tea"],
        "avoid": []
    }
}


<h3 style="color:#4DB6AC; font-family:Georgia;">
7. Compute skin-type compatibility scores
</h3>


<h3 style="color:#4DB6AC; font-family:Georgia;">
7.1. Scoring function
</h3>


In [9]:
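def skin_type_score(ingredients, skin_type):
    """Return the number of "good" keywords found minus the number of "avoid" keywords found."""
    score = 0
    rules = SKIN_TYPE_RULES[skin_type]

    # Plain substring checks on the cleaned text, so "alcohol" also
    # matches fatty alcohols such as "cetyl alcohol"
    for good in rules["good"]:
        if good in ingredients:
            score += 1

    for bad in rules["avoid"]:
        if bad in ingredients:
            score -= 1

    return score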
    


<h3 style="color:#4DB6AC; font-family:Georgia;">
7.2. Apply scores to dataset
</h3>


In [10]:
for skin in SKIN_TYPE_RULES:
    df[f"{skin}_score"] = df["clean_ingredients"].apply(skin_type_score, args=(skin,))

df[[col for col in df.columns if "score" in col]].head()


Unnamed: 0,oily_score,dry_score,sensitive_score,combination_score
0,0,0,-1,0
1,1,1,-1,1
2,0,1,1,0
3,0,0,-1,0
4,0,1,-1,0
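
The scoring function matches keywords as substrings of the whole ingredient text, which is one reason products containing "cetyl alcohol" or "cetearyl alcohol" lose a point under the "alcohol" rule. The sketch below shows one way this could be tightened: score a list of ingredient names, match keywords on word boundaries, and skip an explicit exclusion set. `RULES`, `IGNORED` and `score_ingredient_list` are hypothetical names, and the contents of the exclusion set are an assumption to adjust:

```python
import re

# Same rule shape as SKIN_TYPE_RULES, reduced to one skin type for the example.
RULES = {"dry": {"good": ["glycerin", "hyaluronic"], "avoid": ["alcohol"]}}

# Hypothetical exclusion set: fatty alcohols that the "alcohol" rule
# is probably not meant to penalise.
IGNORED = {"cetyl alcohol", "cetearyl alcohol", "stearyl alcohol"}

def score_ingredient_list(ingredients, skin_type, rules=RULES, ignored=IGNORED):
    """Score a list of ingredient names, matching keywords on word boundaries."""
    names = [name for name in ingredients if name not in ignored]

    def hits(keyword):
        # \b ensures the keyword matches whole words only
        pattern = re.compile(rf"\b{re.escape(keyword)}\b")
        return any(pattern.search(name) for name in names)

    good = sum(hits(k) for k in rules[skin_type]["good"])
    bad = sum(hits(k) for k in rules[skin_type]["avoid"])
    return good - bad

# Fatty alcohol is ignored, so only the glycerin point counts.
print(score_ingredient_list(["glycerin", "cetyl alcohol"], "dry"))
# "alcohol denat" is not in the exclusion set, so it cancels the glycerin point.
print(score_ingredient_list(["glycerin", "alcohol denat"], "dry"))
```

This variant expects a list of names rather than the flattened `clean_ingredients` string, so it would need the ingredient lists parsed first.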


<h3 style="color:#4DB6AC; font-family:Georgia;">
8. Skin-type insight summary
</h3>


In [11]:
df[
    ["oily_score", "dry_score", "sensitive_score", "combination_score"]
].mean()


oily_score           0.162566
dry_score            0.323374
sensitive_score     -0.115114
combination_score    0.059754
dtype: float64

<h3 style="color:#4DB6AC; font-family:Georgia;">
9. Feature engineering using TF-IDF
</h3>


In [12]:
tfidf = TfidfVectorizer(stop_words="english")
ingredient_matrix = tfidf.fit_transform(df["clean_ingredients"])


<h3 style="color:#4DB6AC; font-family:Georgia;">
10. Compute cosine similarity matrix
</h3>


In [13]:
similarity_matrix = cosine_similarity(ingredient_matrix)
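

To see what the TF-IDF and cosine similarity steps produce, here is a small illustration on three made-up ingredient strings. The products and the names `docs`, `toy_matrix` and `toy_similarity` are invented for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "glycerin niacinamide zinc",         # product A
    "glycerin niacinamide zinc parfum",  # product B: A plus one extra ingredient
    "shea butter coconut oil",           # product C: shares no words with A or B
]

# Each row is a product, each column a word weighted by TF-IDF.
toy_matrix = TfidfVectorizer().fit_transform(docs)

# Entry [i, j] is the cosine similarity between products i and j.
toy_similarity = cosine_similarity(toy_matrix)
print(toy_similarity.round(2))
```

Each product has similarity 1 with itself, A and B score high because they share most of their words, and C scores 0 against both because it shares none.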


<h3 style="color:#4DB6AC; font-family:Georgia;">
11. Recommendation function (core logic)
</h3>


In [14]:
def recommend_products(skin_type, top_n=5):
    # Rank products by skin-type compatibility
    ranked = df.sort_values(by=f"{skin_type}_score", ascending=False)

    # Use the best-scored product as the reference. similarity_matrix is
    # indexed by row position, so convert the index label to a position.
    idx = df.index.get_loc(ranked.index[0])

    # Sort all products by ingredient similarity to the reference
    similarity_scores = sorted(
        enumerate(similarity_matrix[idx]), key=lambda x: x[1], reverse=True
    )

    # The reference itself has similarity 1.0, so it normally comes first
    top_indices = [i for i, _ in similarity_scores[:top_n]]

    return df.iloc[top_indices][
        ["product_name", "product_type", "price", "product_url"]
    ]
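

In the function above, the skin-type score only selects the reference product. The neighbours are then drawn from the whole catalogue, and the reference itself is among them. A possible variant, sketched here on toy data, restricts the neighbours to products with a positive score and leaves the reference out. `recommend_filtered`, `toy_df` and `toy_sim` are hypothetical names, and the function takes the DataFrame and similarity matrix as arguments so the example is self-contained:

```python
import numpy as np
import pandas as pd

def recommend_filtered(df, similarity, skin_type, top_n=5):
    """Rank positively scored products by similarity to the best-scored one."""
    scores = df[f"{skin_type}_score"].to_numpy()
    ref_pos = int(np.argmax(scores))                # position of the reference product
    candidates = np.flatnonzero(scores > 0)         # positions with a positive score
    candidates = candidates[candidates != ref_pos]  # do not recommend the reference itself
    # Order the remaining candidates from most to least similar to the reference
    order = candidates[np.argsort(-similarity[ref_pos, candidates])]
    return df.iloc[order[:top_n]]

toy_df = pd.DataFrame({
    "product_name": ["A", "B", "C", "D"],
    "oily_score": [2, 1, -1, 1],
})
toy_sim = np.array([
    [1.0, 0.2, 0.9, 0.6],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.6, 0.3, 0.4, 1.0],
])

# C is the most similar to A but has a negative score, so it is filtered out.
print(recommend_filtered(toy_df, toy_sim, "oily", top_n=2))
```

Whether a negative score should exclude a product outright, or only lower its rank, is a design choice this sketch does not settle.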


<h3 style="color:#4DB6AC; font-family:Georgia;">
12. Test the recommendation system
</h3>


In [15]:
recommend_products("oily")


Unnamed: 0,product_name,product_type,price,product_url
521,Origins Out of Trouble 10 Minute Mask to Rescu...,Mask,£22.00,https://www.lookfantastic.com/origins-out-of-t...
834,PIXI Milky Tonic 100ml,Toner,£10.00,https://www.lookfantastic.com/pixi-milky-tonic...
851,PIXI Milky Tonic 250ml,Toner,£18.00,https://www.lookfantastic.com/pixi-milky-tonic...
309,PIXI Hydrating Milky Mist 80ml,Mist,£18.00,https://www.lookfantastic.com/pixi-hydrating-m...
205,L'Oreal Paris Dermo Expertise Revitalift Laser...,Serum,£24.99,https://www.lookfantastic.com/l-oreal-paris-de...


In [16]:
recommend_products("sensitive")


Unnamed: 0,product_name,product_type,price,product_url
449,Holika Holika Pure Essence Mask Sheet - Cucumber,Mask,£1.95,https://www.lookfantastic.com/holika-holika-pu...
465,Holika Holika Pure Essence Mask Sheet - Damask...,Mask,£1.95,https://www.lookfantastic.com/holika-holika-pu...
479,Holika Holika Pure Essence Mask Sheet - Lemon,Mask,£1.95,https://www.lookfantastic.com/holika-holika-pu...
475,Holika Holika Pure Essence Mask Sheet - Acai B...,Mask,£1.95,https://www.lookfantastic.com/holika-holika-pu...
450,Holika Holika Pure Essence Mask Sheet - Avocado,Mask,£1.95,https://www.lookfantastic.com/holika-holika-pu...


<h1 style="
    font-family: 'Georgia';
    color: #E75480;
">
    
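1. Ingredient similarity looks like a useful signal for skincare recommendation: the nearest neighbours returned in the tests are closely related products, such as the two PIXI Milky Tonic sizes and the Holika Holika mask sheet variants

2. Rule-based skin-type scoring enables personalization without explicit skin-type labels

3. Oily and dry skin types are better supported in the dataset (mean scores of 0.16 and 0.32, against 0.06 for combination and -0.12 for sensitive)

4. Content-based filtering is suitable because the dataset has no user ratings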
</h1>
