# **Problem Statement**
We are a personalized skincare brand, and we are looking for someone who is skilled in using Typeform and can also create recommendation algorithms. As a specialist, your job will involve creating user-friendly Typeform surveys and forms to collect the data we need. You will also be responsible for developing and improving recommendation algorithms based on the data we collect. This will involve analyzing user preferences, behavior, and patterns to create personalized recommendations.

# **Solution**
The Typeform survey used to collect the data is linked below:
https://oroni7j6an3.typeform.com/to/MJ6DKbTd

# **Building the model**
## 1. About the Data
The dataset contains 200 survey responses covering user preferences, behavior, and attributes such as age group, skin type, skin concerns, allergies, and product satisfaction. These attributes let us make personalized recommendations based on patterns found among similar users.
## 2. Building the Recommendation System (Content-Based Filtering)
I chose this type of recommendation system because it suggests products or routines based on each user's own features (age group, skin type, concerns, etc.). Collaborative filtering, by contrast, recommends products from other users' feedback on those products; since we do not have product-level feedback yet, it is not suitable here.
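A minimal sketch of the idea, using a few made-up survey rows rather than the real data: each respondent's profile answers are one-hot encoded, profiles are compared with cosine similarity, and a user is offered the products chosen by their most similar respondents. The column names mirror the survey; the rows and the `recommend` helper are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical survey answers, for illustration only
profiles = pd.DataFrame({
    'Skin Type': ['Dry', 'Oily', 'Dry', 'Combination'],
    'Skin Concerns': ['Wrinkles', 'Acne', 'Wrinkles', 'Acne'],
    'Skincare Products': ['Moisturizer', 'Cleanser', 'Serum', 'Toner'],
})

# One-hot encode the profile features so no answer is "closer" to another by code value
features = pd.get_dummies(profiles[['Skin Type', 'Skin Concerns']]).astype(float)
similarity = cosine_similarity(features)

def recommend(user_idx, top_n=1):
    """Return the products used by the top_n most similar other users."""
    scores = pd.Series(similarity[user_idx], index=profiles.index).drop(user_idx)
    neighbours = scores.sort_values(ascending=False).index[:top_n]
    return profiles.loc[neighbours, 'Skincare Products'].tolist()

print(recommend(0))  # user 0 (Dry, Wrinkles) matches user 2 exactly -> ['Serum']
```

One-hot encoding keeps every answer equally distant from every other answer, whereas integer codes make, for example, code 4 look further from code 0 than code 1 does.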





In [None]:
import pandas as pd
import numpy as np


In [None]:
skincare=pd.read_csv('/content/skincare_survey_data.csv')

In [None]:
# Add a user_id column to uniquely identify each user
skincare['user_id'] = skincare.index

In [None]:
skincare.head()

| | Age Group | Gender | Skin Type | Skin Concerns | Breakouts Frequency | Allergies/Skin Conditions | Skincare Products | Routine Frequency | Satisfaction with Products | Environment | ... | Skin Irritation from Ingredients | Avoided Ingredients | Skincare Goals | Purchase Frequency | Purchase Location | Monthly Spend | Recommendation Preference | Willingness to Try New Products | Sunscreen Use | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36-45 | Female | Dry | Dark circles | Often | Eczema | Exfoliant | Twice a day | Dissatisfied | Urban | ... | Not sure | Fragrances | Hydration | Annually | Subscription services | \$50-\$100 | In-app | Somewhat willing | Sometimes | 0 |
| 1 | 18-25 | Female | Combination | Dark spots | Often | Psoriasis | Serum | Twice a day | Very satisfied | Urban | ... | No | Alcohol | Hydration | Annually | Subscription services | Less than \$50 | On the website | Very willing | No | 1 |
| 2 | 46+ | Female | Sensitive | Dark circles | Rarely | Rosacea | Cleanser | Twice a day | Satisfied | Cold | ... | Yes | Parabens | Skin barrier repair | Every 2-3 months | Subscription services | Less than \$50 | Via email | Somewhat willing | Sometimes | 2 |
| 3 | 18-25 | Female | Sensitive | Wrinkles | Often | Psoriasis | Moisturizer | Twice a day | Very dissatisfied | Dry | ... | Not sure | Fragrances | Hydration | Annually | Online | \$50-\$100 | In-app | Very willing | Sometimes | 3 |
| 4 | 26-35 | Female | Combination | Dark circles | Often | Eczema | Moisturizer | Occasionally | Satisfied | Hot | ... | Not sure | Parabens | Pore minimizing | Every 2-3 months | Subscription services | More than \$100 | Via email | Not willing | No | 4 |


In [None]:
skincare.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 21 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   Age Group                         200 non-null    object
 1   Gender                            200 non-null    object
 2   Skin Type                         200 non-null    object
 3   Skin Concerns                     200 non-null    object
 4   Breakouts Frequency               200 non-null    object
 5   Allergies/Skin Conditions         154 non-null    object
 6   Skincare Products                 200 non-null    object
 7   Routine Frequency                 200 non-null    object
 8   Satisfaction with Products        200 non-null    object
 9   Environment                       200 non-null    object
 10  Sensitivity to New Products       200 non-null    object
 11  Skin Irritation from Ingredients  200 non-null    object
 12  Avoided Ingredients   

In [None]:
skincare.isnull().sum()

| Column | Missing values |
|---|---|
| Age Group | 0 |
| Gender | 0 |
| Skin Type | 0 |
| Skin Concerns | 0 |
| Breakouts Frequency | 0 |
| Allergies/Skin Conditions | 46 |
| Skincare Products | 0 |
| Routine Frequency | 0 |
| Satisfaction with Products | 0 |
| Environment | 0 |
Environment,0


In [None]:
skincare['Allergies/Skin Conditions'].value_counts()

| Allergies/Skin Conditions | count |
|---|---|
| Psoriasis | 62 |
| Rosacea | 51 |
| Eczema | 41 |
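The three listed conditions add up to 154 answers, so 46 of the 200 respondents left this question blank. One option, sketched here on a toy column, is to turn the blank into an explicit category before encoding rather than relying on how an encoder happens to treat NaN. The label `'Unknown'` is an assumption; it also sorts after the three existing conditions, so their alphabetical codes would not shift.

```python
import numpy as np
import pandas as pd

# Toy column mirroring the survey field; NaN marks a skipped answer
allergies = pd.Series(['Eczema', np.nan, 'Psoriasis', 'Rosacea', np.nan])

# Make the missing answer an explicit category (the label 'Unknown' is an assumption)
allergies_filled = allergies.fillna('Unknown')

print(allergies_filled.value_counts())
```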


In [None]:
#Converting categorical data into numeric values
from sklearn.preprocessing import LabelEncoder

# Make skipped allergy answers an explicit category instead of NaN
# ('Unknown' sorts after the existing answers, so their codes are unchanged)
skincare['Allergies/Skin Conditions'] = skincare['Allergies/Skin Conditions'].fillna('Unknown')

# List of columns to encode
columns_to_encode = [
    'Gender', 'Skin Type', 'Skin Concerns', 'Breakouts Frequency', 'Allergies/Skin Conditions',
    'Skincare Products', 'Routine Frequency', 'Satisfaction with Products', 'Environment',
    'Sensitivity to New Products', 'Skin Irritation from Ingredients', 'Avoided Ingredients',
    'Skincare Goals', 'Purchase Frequency', 'Purchase Location', 'Recommendation Preference',
    'Willingness to Try New Products', 'Sunscreen Use'
]

# Fit one LabelEncoder per column and keep it, so codes can be mapped back to answers later
# (codes follow the alphabetical order of the answers, not their natural order)
encoders = {}
for column in columns_to_encode:
    encoders[column] = LabelEncoder()
    skincare[column] = encoders[column].fit_transform(skincare[column])

# Check the first few rows of the updated dataframe
print(skincare.head())


  Age Group  Gender  Skin Type  Skin Concerns  Breakouts Frequency  \
0     36-45       0          1              1                    2   
1     18-25       0          0              2                    2   
2       46+       0          4              1                    3   
3     18-25       0          4              5                    2   
4     26-35       0          0              1                    2   

   Allergies/Skin Conditions  Skincare Products  Routine Frequency  \
0                          0                  1                  3   
1                          1                  3                  3   
2                          2                  0                  3   
3                          1                  2                  3   
4                          0                  2                  1   

   Satisfaction with Products  Environment  ...  \
0                           0            5  ...   
1                           4            5  ...   
2    
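Reusing a single `LabelEncoder` in a loop keeps only the mapping of the last column it was fitted on, which is why product names later have to be hard-coded in `product_mapping`. A sketch on toy data that keeps one fitted encoder per column, so a code can be looked up for a given answer and codes can be turned back into answers. It also shows that codes follow sorted (alphabetical) order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data, for illustration only
df = pd.DataFrame({
    'Skin Concerns': ['Wrinkles', 'Acne', 'Dark spots', 'Acne'],
    'Skincare Products': ['Serum', 'Cleanser', 'Moisturizer', 'Toner'],
})

encoders = {}
for column in df.columns:
    encoders[column] = LabelEncoder()          # one encoder per column
    df[column] = encoders[column].fit_transform(df[column])

# Codes follow sorted order: Acne=0, Dark spots=1, Wrinkles=2
acne_code = encoders['Skin Concerns'].transform(['Acne'])[0]
print(acne_code)  # 0

# Map encoded products back to their names
print(encoders['Skincare Products'].inverse_transform(df['Skincare Products']))
```

With the encoders kept, a feature such as Breakout Severity can compare the encoded 'Skin Concerns' column against the code for 'Acne' instead of the string.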

In [None]:
#Standardizing 'Age Group' and 'Monthly Spend' columns
from sklearn.preprocessing import StandardScaler

# Map Age Group to numerical values
age_group_mapping = {
    'Under 18': 0,
    '18-25': 1,
    '26-35': 2,
    '36-45': 3,
    '46+': 4
}
skincare['Age Group'] = skincare['Age Group'].map(age_group_mapping)

# Map Monthly Spend to numerical values
monthly_spend_mapping = {
    'Less than $50': 0,
    '$50-$100': 1,
    'More than $100': 2
}
skincare['Monthly Spend'] = skincare['Monthly Spend'].map(monthly_spend_mapping)

# Initialize the StandardScaler
scaler = StandardScaler()

# Standardize the 'Age Group' and 'Monthly Spend' columns
skincare[['Age Group', 'Monthly Spend']] = scaler.fit_transform(skincare[['Age Group', 'Monthly Spend']])

# Check the first few rows of the updated dataframe
print(skincare[['Age Group', 'Monthly Spend']].head())


   Age Group  Monthly Spend
0   0.539039       0.037236
1  -1.185885      -1.203971
2   1.401500      -1.203971
3  -1.185885       0.037236
4  -0.323423       1.278443
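`StandardScaler` replaces each value x with z = (x - mean) / std, using the population standard deviation (ddof=0). Because the ordinal codes are evenly spaced, the scaled age groups above stay evenly spaced too (about 0.86 apart). A small check on hypothetical age codes:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical ordinal age codes (0 = 'Under 18' ... 4 = '46+')
codes = np.array([[1.0], [2.0], [3.0], [4.0]])

scaled = StandardScaler().fit_transform(codes)

# Same result by hand: subtract the mean, divide by the population standard deviation
manual = (codes - codes.mean()) / codes.std(ddof=0)
print(scaled.ravel())
```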


## 3. Feature Engineering

In [None]:
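#Behavior-Based Features:
#1.Create Breakout Severity by combining Breakouts Frequency and Acne Concern
# 'Skin Concerns' is already label-encoded, so compare against the code for 'Acne' rather than the string
# (LabelEncoder numbers categories in sorted order of the raw answers)
raw_concerns = pd.read_csv('/content/skincare_survey_data.csv')['Skin Concerns']
acne_code = sorted(raw_concerns.unique()).index('Acne')
skincare['Breakout Severity'] = skincare['Breakouts Frequency'] * (skincare['Skin Concerns'] == acne_code).astype(int)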

#2.Create Sensitivity Risk based on product sensitivity and irritation risk
skincare['Sensitivity Risk'] = skincare['Sensitivity to New Products'] + skincare['Skin Irritation from Ingredients']


In [None]:
# The column is 'Skincare Products' (plural); 'Skincare Product' raised a KeyError.
# It has also been label-encoded already, so count products on the raw survey answers instead
raw_products = pd.read_csv('/content/skincare_survey_data.csv')['Skincare Products']
print(raw_products.head())

# Count the number of products each user uses by splitting the string and counting the items
skincare['Total Products Used'] = raw_products.apply(lambda x: len(str(x).split(', ')))

# Preview the result
print(skincare[['Skincare Products', 'Total Products Used']].head())

In [None]:
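#Product Preferences
# 'Skincare Products' is label-encoded at this point (one integer per user), so substring checks
# such as '2' in str(x) cannot detect several products; use the raw survey answers instead
raw_products = pd.read_csv('/content/skincare_survey_data.csv')['Skincare Products'].fillna('')

#1.Create binary feature for moisturizer preference
skincare['Moisturizer User'] = raw_products.str.contains('Moisturizer').astype(int)

#2.Create a 'Heavy Routine' feature for users who use Cleanser, Moisturizer, and Sunscreen
skincare['Heavy Routine'] = raw_products.apply(
    lambda x: int(all(product in x for product in ['Cleanser', 'Moisturizer', 'Sunscreen'])))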
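If the Typeform question lets a respondent pick several products (an assumption; the encoded column holds one code per user), `str.get_dummies` turns comma-separated answers into one 0/1 column per product, and the product count, the moisturizer flag, and the 'Heavy Routine' flag all follow from it. A sketch on hypothetical answers:

```python
import pandas as pd

# Hypothetical multi-select answers, for illustration only
answers = pd.Series(['Cleanser, Moisturizer, Sunscreen', 'Serum', 'Cleanser, Toner'])

# One 0/1 column per product found in the answers
multi_hot = answers.str.get_dummies(sep=', ')
total_products = multi_hot.sum(axis=1)

# reindex guards against a product that nobody selected (its column would otherwise be missing)
routine = multi_hot.reindex(columns=['Cleanser', 'Moisturizer', 'Sunscreen'], fill_value=0)
moisturizer_user = routine['Moisturizer']
heavy_routine = routine.all(axis=1).astype(int)

print(multi_hot)
```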




In [None]:
# Interaction features built from the encoded columns.
# Note: LabelEncoder codes are alphabetical (e.g. 'Very dissatisfied' = 3 > 'Satisfied' = 2),
# so these products do not follow the natural order of the answers.

# Multiply Satisfaction with Routine Frequency to create an interaction feature
skincare['Satisfaction vs Routine'] = skincare['Satisfaction with Products'] * skincare['Routine Frequency']

# Multiply Satisfaction with Breakouts Frequency to create an interaction feature
skincare['Satisfaction vs Breakouts'] = skincare['Satisfaction with Products'] * skincare['Breakouts Frequency']

# Create an interaction feature between Skin Type and Sensitivity to New Products
skincare['SkinType-Sensitivity Interaction'] = skincare['Skin Type'] * skincare['Sensitivity to New Products']



In [None]:
#Clustering
from sklearn.cluster import KMeans

# Select routine-related features for clustering
routine_features = skincare[['Routine Frequency', 'Breakout Severity']]
kmeans = KMeans(n_clusters=3, random_state=0)
skincare['Routine Cluster'] = kmeans.fit_predict(routine_features)

# Select skin type and environment for clustering
kmeans = KMeans(n_clusters=3, random_state=0)
skincare['SkinType-Environment Cluster'] = kmeans.fit_predict(skincare[['Skin Type', 'Environment']])

# Select satisfaction and spending behavior for clustering
kmeans = KMeans(n_clusters=3, random_state=0)
skincare['Satisfaction-Spend Cluster'] = kmeans.fit_predict(skincare[['Satisfaction with Products', 'Monthly Spend']])
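Each clustering above fixes `n_clusters=3` by hand. One way to sanity-check k, assuming numeric features on comparable scales, is the silhouette score (higher is better). The sketch below uses synthetic blobs with arbitrary centers rather than the survey data, and sets `n_init` explicitly because recent scikit-learn versions changed its default.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative only)
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the survey columns the label-encoded values are alphabetical ranks, so distances between them only partly reflect how similar two answers are.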


In [None]:
# Download the processed dataset as a CSV file (Colab only)

from google.colab import files

skincare.to_csv('skincare_processed.csv', encoding = 'utf-8-sig')
files.download('skincare_processed.csv')



In [None]:
skincare.head()

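| | Age Group | Gender | Skin Type | Skin Concerns | Breakouts Frequency | Allergies/Skin Conditions | Skincare Products | Routine Frequency | Satisfaction with Products | Environment | ... | Purchase Frequency | Purchase Location | Monthly Spend | Recommendation Preference | Willingness to Try New Products | Sunscreen Use | Breakout Severity | Sensitivity Risk | Moisturizer User | Heavy Routine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.539039 | 0 | 1 | 1 | 2 | 0 | 1 | 3 | 0 | 5 | ... | 0 | 2 | 0.037236 | 0 | 1 | 1 | 0 | 2 | 0 | 0 |
| 1 | -1.185885 | 0 | 0 | 2 | 2 | 1 | 3 | 3 | 4 | 5 | ... | 0 | 2 | -1.203971 | 1 | 2 | 0 | 0 | 1 | 0 | 0 |
| 2 | 1.4015 | 0 | 4 | 1 | 3 | 2 | 0 | 3 | 2 | 0 | ... | 1 | 2 | -1.203971 | 2 | 1 | 1 | 0 | 3 | 0 | 0 |
| 3 | -1.185885 | 0 | 4 | 5 | 2 | 1 | 2 | 3 | 3 | 1 | ... | 0 | 1 | 0.037236 | 0 | 2 | 1 | 0 | 3 | 1 | 0 |
| 4 | -0.323423 | 0 | 0 | 1 | 2 | 0 | 2 | 1 | 2 | 2 | ... | 1 | 2 | 1.278443 | 2 | 0 | 0 | 0 | 3 | 1 | 0 |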


In [None]:
from sklearn.metrics.pairwise import cosine_similarity
scaled_features = skincare[['Age Group', 'Monthly Spend']]

# Compute cosine similarity between users
similarity_matrix = cosine_similarity(scaled_features)

def recommend_for_user(user_id, top_n=3):
    # Rank users from most to least similar and drop user_id explicitly;
    # with only two features many users tie at 1.0, so the user need not sort last
    ranked = similarity_matrix[user_id].argsort()[::-1]
    similar_users = [u for u in ranked if u != user_id][:top_n]

    # Recommend products based on similar users' preferences
    recommendations = []
    for similar_user in similar_users:
        # Collect the preferred products or routines of similar users
        recommended_products = skincare.iloc[similar_user]['Skincare Products']
        recommendations.append(recommended_products)

    return recommendations

# Example: Recommend for user with user_id = 4
print(recommend_for_user(4))


[0.0, 2.0, 5.0]
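The slice `argsort()[-top_n-1:-1]` assumes the user is always the single most similar entry to themself. With only Age Group and Monthly Spend as features, many respondents share the same vector and tie at similarity 1.0, so the user can land inside the slice while a genuine neighbor is dropped. A sketch with a hypothetical helper `top_similar` that removes the user explicitly, shown on a toy matrix:

```python
import numpy as np

def top_similar(similarity_matrix, user_id, top_n=3):
    """Indices of the top_n most similar users, excluding user_id itself."""
    scores = np.array(similarity_matrix[user_id], dtype=float)  # copy of the user's row
    scores[user_id] = -np.inf                                    # never return the user themself
    order = np.argsort(scores, kind='stable')[::-1]              # highest similarity first
    return order[:top_n].tolist()

# Toy matrix: users 0, 1 and 2 have identical profiles (similarity 1.0)
sim = np.array([
    [1.0, 1.0, 1.0, 0.2],
    [1.0, 1.0, 1.0, 0.2],
    [1.0, 1.0, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
])
print(top_similar(sim, user_id=0, top_n=2))  # [2, 1]
```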


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Assuming you already have the scaled features for 'Age Group' and 'Monthly Spend'
scaled_features = skincare[['Age Group', 'Monthly Spend']]

# Compute cosine similarity between users
similarity_matrix = cosine_similarity(scaled_features)

# Mapping encoded values to product names
product_mapping = {
    0: 'Cleanser',
    1: 'Exfoliant',
    2: 'Moisturizer',
    3: 'Serum',
    4: 'Sunscreen',
    5: 'Toner'
}

def recommend_for_user(user_id, top_n=2):
    # Rank users from most to least similar and drop user_id itself explicitly
    ranked = similarity_matrix[user_id].argsort()[::-1]
    similar_users = [u for u in ranked if u != user_id][:top_n]

    # Recommend products based on similar users' preferences
    recommendations = []
    for similar_user in similar_users:
        # Collect the preferred product (encoded value) of the similar user
        recommended_product_encoded = skincare.iloc[similar_user]['Skincare Products']

        # A single encoded product (Python or NumPy number): map it to the product name
        if isinstance(recommended_product_encoded, (int, float, np.integer, np.floating)):
            recommended_product = product_mapping[int(recommended_product_encoded)]
            recommendations.append(recommended_product)
        else:
            # Handle if the value is a list of products
            recommended_products = [product_mapping[product] for product in recommended_product_encoded]
            recommendations.append(recommended_products)

    return recommendations

# Example: Recommend for user with user_id = 4
recommendations = recommend_for_user(4)
for i, rec in enumerate(recommendations, start=1):
    print(f"Recommendation from similar user {i}: {rec}")


Recommendation from similar user 1: Cleanser
Recommendation from similar user 2: Moisturizer
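Each similar user contributes one product, so the result is a list that can contain repeats, for example `['Exfoliant', 'Moisturizer', 'Moisturizer']` from the commented-out cell. One way to turn such a list into a single ranked suggestion, assuming every neighbor gets an equal vote, is to count with `collections.Counter`; `rank_products` is a hypothetical helper.

```python
from collections import Counter

def rank_products(neighbour_products):
    """Rank products by how many similar users chose them (most common first)."""
    return [product for product, _ in Counter(neighbour_products).most_common()]

print(rank_products(['Exfoliant', 'Moisturizer', 'Moisturizer']))  # ['Moisturizer', 'Exfoliant']
```

Weighting each vote by the neighbor's similarity score is a natural alternative when neighbors differ a lot in how similar they are.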



In [None]:
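import pickle

# Caution: pickle stores a function by reference (module and name), not its code or the
# similarity_matrix it uses, so this file only loads where recommend_for_user is already defined.
# The later cell that pickles (similarity_matrix, product_mapping) overwrites this file.
with open('derma.pkl', 'wb') as f:
    pickle.dump(recommend_for_user, f)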

In [None]:
# #Real model code
# # Get the encoded user inputs (these should have been passed to the function earlier)
# from sklearn.metrics.pairwise import cosine_similarity
# # Select the encoded profile features used for similarity (this selects columns; it does not filter rows)
# filtered_data = skincare[['Gender','Skin Type','Skin Concerns']]

# similarity_matrix = cosine_similarity(filtered_data)

# # Mapping encoded values to product names
# product_mapping = {
#     0: 'Cleanser',
#     1: 'Exfoliant',
#     2: 'Moisturizer',
#     3: 'Serum',
#     4: 'Sunscreen',
#     5: 'Toner'
# }

# # Guard against an empty feature table
# if filtered_data.empty:
#     print("No recommendations available for this combination.")

# # Use the first row as the base user (always user 0 here; substitute the target user's index)
# base_user_index = filtered_data.index[0]

# # Find the most similar users, excluding the base user itself
# ranked = similarity_matrix[base_user_index].argsort()[::-1]
# similar_users = [u for u in ranked if u != base_user_index][:3]

# # Recommend products based on similar users' preferences
# recommendations = []
# for similar_user in similar_users:
#     # Collect the preferred product (encoded value) of the similar user
#     recommended_product_encoded = skincare.iloc[similar_user]['Skincare Products']

#     # Check if the value is a single encoded product (float/int) and map it to the product name
#     if isinstance(recommended_product_encoded, (int, float)):
#         recommended_product = product_mapping[int(recommended_product_encoded)]
#         recommendations.append(recommended_product)
#     else:
#         # Handle if the value is a list of products
#         recommended_products = [product_mapping[product] for product in recommended_product_encoded]
#         recommendations.append(recommended_products)

# # Return the recommendations list
# print (recommendations)


['Exfoliant', 'Moisturizer', 'Moisturizer']
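The commented-out "real model" cell aims to recommend from a new set of user inputs, but it always compares against row 0, and cosine similarity on label codes has a blind spot: a respondent whose codes are all 0 is a zero vector and scores 0 against everyone. A sketch of scoring a respondent who is not in the table, on hypothetical data: existing users are one-hot encoded, the new answers are encoded the same way and aligned to the training columns with `reindex` (answers never seen before are simply dropped), and the closest existing users supply the products. `recommend_new_user` is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical existing respondents
users = pd.DataFrame({
    'Skin Type': ['Dry', 'Oily', 'Sensitive', 'Dry'],
    'Skin Concerns': ['Wrinkles', 'Acne', 'Dark circles', 'Dark spots'],
    'Skincare Products': ['Serum', 'Cleanser', 'Moisturizer', 'Toner'],
})
feature_cols = ['Skin Type', 'Skin Concerns']

# One-hot encode existing users and remember the column layout
train = pd.get_dummies(users[feature_cols]).astype(float)

def recommend_new_user(answers, top_n=2):
    """Products of the top_n existing users most similar to a new respondent."""
    new = pd.get_dummies(pd.DataFrame([answers])[feature_cols]).astype(float)
    new = new.reindex(columns=train.columns, fill_value=0.0)  # align with training columns
    scores = cosine_similarity(new, train)[0]
    best = scores.argsort()[::-1][:top_n]
    return users['Skincare Products'].iloc[best].tolist()

print(recommend_new_user({'Skin Type': 'Dry', 'Skin Concerns': 'Wrinkles'}, top_n=1))  # ['Serum']
```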


In [None]:
import pickle

# Save the similarity matrix and product mapping to a pickle file
with open('derma.pkl', 'wb') as f:
    # You can pickle the similarity matrix and product mapping together
    pickle.dump((similarity_matrix, product_mapping), f)
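Saving the data rather than the function is what lets a separate app load the model. A sketch of the round trip, using toy values in place of the real matrix and a temporary path instead of `derma.pkl`:

```python
import os
import pickle
import tempfile

import numpy as np

similarity_matrix = np.array([[1.0, 0.5], [0.5, 1.0]])   # toy stand-in
product_mapping = {0: 'Cleanser', 1: 'Exfoliant'}

path = os.path.join(tempfile.mkdtemp(), 'derma.pkl')
with open(path, 'wb') as f:
    pickle.dump((similarity_matrix, product_mapping), f)

# Later, e.g. in the app that serves recommendations
with open(path, 'rb') as f:
    loaded_matrix, loaded_mapping = pickle.load(f)

print(loaded_mapping[1])  # Exfoliant
```

Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code. The saved matrix also covers only the respondents it was computed from; a new respondent's similarities have to be computed from their features.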



In [None]:
from sklearn.neighbors import NearestNeighbors

# Use KNN with cosine distance to find similar users
knn = NearestNeighbors(n_neighbors=5, metric='cosine')
knn.fit(scaled_features)

# Find the nearest neighbors for user_id = 10; scaled_features is a DataFrame,
# so select the row with .iloc (scaled_features[10] looks up a column and raises KeyError)
user_id = 10
distances, indices = knn.kneighbors(scaled_features.iloc[[user_id]])

# Drop the user themself, then recommend products based on the remaining neighbors
similar_users = [i for i in indices.flatten() if i != user_id]
recommendations = skincare.iloc[similar_users]['Skincare Products'].map(product_mapping)
print(recommendations)
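With `metric='cosine'` the query user is at distance 0 from themself and normally comes back as the first neighbor. A sketch that asks for one extra neighbor and filters the query index out, so exactly k other users remain; the feature values are made up and `knn_neighbours` is a hypothetical helper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy standardized features (Age Group, Monthly Spend), illustrative only
features = np.array([
    [0.5, 0.0],
    [-1.2, -1.2],
    [1.4, -1.2],
    [-1.2, 0.1],
    [-0.3, 1.3],
])

def knn_neighbours(features, user_id, k=2):
    """Indices of the k nearest users by cosine distance, excluding user_id."""
    knn = NearestNeighbors(n_neighbors=k + 1, metric='cosine').fit(features)
    _, indices = knn.kneighbors(features[[user_id]])
    return [int(i) for i in indices[0] if i != user_id][:k]

print(knn_neighbours(features, user_id=1, k=2))  # [3, 2]
```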