# REVOLUTIONIZING SKINCARE WITH AI-POWERED RECOMMENDATIONS FOR MELANIN-RICH SKIN

GROUP MEMBERS
1. Esther Cheruiyot
2. Brian Githinji
3. Grace Gitau 
4. Maureen Imanene

In [2]:
## PROJECT SUMMARY

## Business Problem
Black women represent a significant demographic in the beauty and skincare industry, yet
they have limited access to skincare products tailored to their specific needs, such as
hyperpigmentation, dryness, and sensitivity. Most available recommendation systems overlook
these concerns and offer general suggestions rather than targeted solutions. This gap
reduces consumer satisfaction, as Black women often struggle to find effective products
for their melanin-rich skin.

This project aims to develop a recommendation system that caters specifically to Black
women’s skincare needs. By integrating machine learning, content-based filtering,
collaborative filtering, and sentiment analysis, the system will offer personalized
skincare recommendations. Using `variation_desc` (product descriptions such as
“tone for fairest skin”) as a classification feature, we aim to identify products
that align with melanin-rich skin concerns.
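
As a rough illustration of how `variation_desc` could act as a classification feature, the sketch below flags descriptions that mention deeper skin tones. The toy rows, the keyword list, and the `targets_deep_tone` column are assumptions made for illustration; they are not taken from the dataset or from the project's final method.

```python
import pandas as pd

# Toy rows standing in for the product table (descriptions are illustrative).
toy_products = pd.DataFrame({
    "product_id": ["A1", "A2", "A3", "A4"],
    "variation_desc": [
        "tone for fairest skin",
        "for deep skin with warm undertones",
        None,  # most products have no variation_desc
        "for rich skin tones with neutral undertones",
    ],
})

# Hypothetical keyword list; a real one would need domain review.
DEEP_TONE_KEYWORDS = ["deep", "rich", "dark", "ebony"]
pattern = r"\b(?:" + "|".join(DEEP_TONE_KEYWORDS) + r")\b"

# Missing descriptions are treated as "no match" rather than dropped.
toy_products["targets_deep_tone"] = (
    toy_products["variation_desc"]
    .fillna("")
    .str.contains(pattern, case=False, regex=True)
)

print(toy_products)
```

Simple keyword matching is only a starting point: it cannot tell whether a product suits a skin concern, only whether its description mentions a tone.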

## Objectives
1. Develop a melanin-centered skincare recommendation system using deep learning, tailored for Black women’s unique skin needs.
2. Utilize content-based and collaborative filtering along with sentiment analysis to enhance recommendation accuracy.
3. Deploy an accessible Streamlit interface for personalized, user-friendly skincare suggestions. 
4. Integrate Logistic Regression, SVD, and K-means clustering to improve recommendation precision.
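
To make the collaborative-filtering objective more concrete, here is a minimal sketch that factorizes a tiny user-item rating matrix with `TruncatedSVD` and ranks unseen products by their reconstructed scores. The ratings, the `recommend` helper, and the choice of two components are assumptions for illustration. Treating missing ratings as zeros is a simplification, not necessarily what the final system does.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Toy review ratings (illustrative values only).
ratings = pd.DataFrame({
    "author_id": ["u1", "u1", "u2", "u2", "u3", "u3", "u4"],
    "product_id": ["p1", "p2", "p1", "p3", "p2", "p3", "p1"],
    "rating": [5, 4, 4, 2, 5, 1, 3],
})

# Users as rows, products as columns; unrated pairs become 0.
user_item = ratings.pivot_table(
    index="author_id", columns="product_id", values="rating", fill_value=0
)

# Low-rank factorization of the rating matrix.
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(user_item)

# Reconstructed scores approximate how each user might rate each product.
predicted = pd.DataFrame(
    user_factors @ svd.components_,
    index=user_item.index,
    columns=user_item.columns,
)

def recommend(user, k=1):
    """Return up to k products the user has not rated, best score first."""
    seen = user_item.loc[user] > 0
    scores = predicted.loc[user][~seen]
    return scores.sort_values(ascending=False).head(k).index.tolist()

print(recommend("u4"))
```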

## Stakeholders
1. *Users*: Black women seeking tailored skincare solutions. 
2. *Skincare Brands*: Companies interested in product insights for melanin-rich skin. 
3. *Healthcare Professionals*: Dermatologists who may use the system as a recommendation tool. 
4. *AI and Skincare Researchers*: Those exploring AI applications in skincare for underrepresented groups.

## Data Understanding
The dataset was collected via a Python scraper and contains:
- Product Information: Over 8,000 beauty products from the Sephora online store, including product and brand names, prices, ingredients, ratings, and various features. 
- User Reviews: Approximately 1 million reviews across over 2,000 products in the skincare category. These reviews include user appearances, skin types, and review ratings.

The key features include:
- Product Features: `product_id`, `product_name`, `brand_name`, `ingredients`, `rating`, `price_usd`, `highlights`, `variation_desc` (e.g., tone for fairest skin). 
- Review Features: `author_id`, `rating`, `review_text`, `skin_type`, `skin_tone`, and
`helpfulness`.
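
Both tables share `product_id`, and both carry a `rating` column (a product-level rating and a per-review rating). The sketch below shows one way the two could be joined without the rating columns colliding. The toy values, including the `skin_tone` labels, and the `_review`/`_product` suffixes are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are illustrative).
products = pd.DataFrame({
    "product_id": ["P1", "P2"],
    "product_name": ["Example Serum", "Example Moisturizer"],
    "rating": [4.2, 3.8],  # product-level rating
})
reviews = pd.DataFrame({
    "author_id": [101, 102, 103],
    "product_id": ["P1", "P1", "P2"],
    "rating": [5, 3, 4],  # rating given in a single review
    "skin_tone": ["deep", "fair", "deep"],
})

# Many reviews map to one product; validate guards against duplicate product rows.
merged = reviews.merge(
    products,
    on="product_id",
    how="left",
    suffixes=("_review", "_product"),
    validate="many_to_one",
)

# Average review rating per product and skin tone.
tone_summary = (
    merged.groupby(["product_name", "skin_tone"])["rating_review"]
    .agg(["mean", "count"])
    .reset_index()
)

print(tone_summary)
```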

### STEP 1: DATA LOADING AND PREPARATION

In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import mean_squared_error, precision_score, recall_score
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

import pickle
import nltk  # For NLP tasks
from nltk.sentiment import SentimentIntensityAnalyzer
import re

In [4]:
# Load the product info and skincare products reviews from Excel files
product_info_df = pd.read_excel('data/product_info.xls', engine='xlrd')
reviews_df = pd.read_excel('data/skincare_products_reviews.xls', engine='xlrd')

In [11]:
product_info_df.head()

Unnamed: 0,"product_id,product_name,brand_id,brand_name,loves_count,rating,reviews,size,variation_type,variation_value,variation_desc,ingredients,price_usd,value_price_usd,sale_price_usd,limited_edition,new,online_only,out_of_stock,sephora_exclusive,highlights,primary_category,secondary_category,tertiary_category,child_count,child_max_price,child_min_price",Unnamed: 1
0,"P473671,Fragrance Discovery Set,6342,19-69,632...",
1,"P473668,La Habana Eau de Parfum,6342,19-69,382...",
2,"P473662,Rainbow Bar Eau de Parfum,6342,19-69,3...",
3,"P473660,Kasbah Eau de Parfum,6342,19-69,3018,4...",
4,"P473658,Purple Haze Eau de Parfum,6342,19-69,2...",


In [12]:
reviews_df.head()

Unnamed: 0,",author_id,rating,is_recommended,helpfulness,total_feedback_count,total_neg_feedback_count,total_pos_feedback_count,submission_time,review_text,review_title,skin_tone,eye_color,skin_type,hair_color,product_id,product_name,brand_name,price_usd",Unnamed: 1
0,"0,1945004256,5,1.0,0.0,2,2,0,2022-12-10,""I abs...",
1,"1,5478482359,3,1.0,0.3333329856395721,3,2,1,20...",
2,"2,29002209922,5,1.0,1.0,2,0,2,2021-06-07,Works...",
3,"3,7391078463,5,1.0,1.0,2,0,2,2021-05-21,""this ...",
4,"4,1766313888,5,1.0,1.0,13,0,13,2021-03-29,""Thi...",
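
Both previews show every field packed into a single comma-joined column, with the column name itself holding the real header. That suggests the rows were stored as CSV text inside one spreadsheet column. Assuming so, the sketch below rebuilds a proper table from such a column. The `unpack_single_column` helper and the toy values are hypothetical.

```python
import io
import pandas as pd

# Toy frame mimicking the preview: the header and each row are single
# comma-joined strings (values are illustrative, not from the dataset).
packed = pd.DataFrame({
    "product_id,product_name,rating": [
        "P1,Example Serum,4.5",
        'P2,"Cream, Rich",3.9',  # quoted field containing a comma
    ]
})

def unpack_single_column(df, column):
    """Rebuild a table from a column whose name and values are CSV text."""
    lines = [column] + df[column].astype(str).tolist()
    return pd.read_csv(io.StringIO("\n".join(lines)))

unpacked = unpack_single_column(packed, packed.columns[0])
print(unpacked)
```

Parsing with `pd.read_csv` rather than `str.split(",")` keeps quoted fields such as review text intact when they contain commas.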


In [13]:
# Save the dataframes as CSV files.
# to_csv returns None when given a path, so its result is not assigned.
product_info_df.to_csv('data/product_info.csv', index=False)
reviews_df.to_csv('data/skincare_products_reviews.csv', index=False)
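
`DataFrame.to_csv` writes to disk and returns `None` when it is given a path, so saved data has to be read back with `pd.read_csv` instead of being taken from the return value. A minimal round-trip sketch on a toy frame follows; the file name is hypothetical.

```python
import os
import tempfile
import pandas as pd

frame = pd.DataFrame({"product_id": ["P1", "P2"], "rating": [4.5, 3.9]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_products.csv")
    result = frame.to_csv(path, index=False)  # writes the file, returns None
    reloaded = pd.read_csv(path)              # read the saved data back

print(result)
print(reloaded.head())
```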

In [2]:
# Load the product info dataset from the saved CSV file
products = pd.read_csv('data/product_info.csv')
products.head()

Unnamed: 0,product_id,product_name,brand_id,brand_name,loves_count,rating,reviews,size,variation_type,variation_value,...,online_only,out_of_stock,sephora_exclusive,highlights,primary_category,secondary_category,tertiary_category,child_count,child_max_price,child_min_price
0,P473671,Fragrance Discovery Set,6342,19-69,6320,3.6364,11.0,,,,...,1,0,0,"['Unisex/ Genderless Scent', 'Warm &Spicy Scen...",Fragrance,Value & Gift Sets,Perfume Gift Sets,0,,
1,P473668,La Habana Eau de Parfum,6342,19-69,3827,4.1538,13.0,3.4 oz/ 100 mL,Size + Concentration + Formulation,3.4 oz/ 100 mL,...,1,0,0,"['Unisex/ Genderless Scent', 'Layerable Scent'...",Fragrance,Women,Perfume,2,85.0,30.0
2,P473662,Rainbow Bar Eau de Parfum,6342,19-69,3253,4.25,16.0,3.4 oz/ 100 mL,Size + Concentration + Formulation,3.4 oz/ 100 mL,...,1,0,0,"['Unisex/ Genderless Scent', 'Layerable Scent'...",Fragrance,Women,Perfume,2,75.0,30.0
3,P473660,Kasbah Eau de Parfum,6342,19-69,3018,4.4762,21.0,3.4 oz/ 100 mL,Size + Concentration + Formulation,3.4 oz/ 100 mL,...,1,0,0,"['Unisex/ Genderless Scent', 'Layerable Scent'...",Fragrance,Women,Perfume,2,75.0,30.0
4,P473658,Purple Haze Eau de Parfum,6342,19-69,2691,3.2308,13.0,3.4 oz/ 100 mL,Size + Concentration + Formulation,3.4 oz/ 100 mL,...,1,0,0,"['Unisex/ Genderless Scent', 'Layerable Scent'...",Fragrance,Women,Perfume,2,75.0,30.0


In [3]:
products.shape

(8494, 27)

In [4]:
products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8494 entries, 0 to 8493
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   product_id          8494 non-null   object 
 1   product_name        8494 non-null   object 
 2   brand_id            8494 non-null   int64  
 3   brand_name          8494 non-null   object 
 4   loves_count         8494 non-null   int64  
 5   rating              8216 non-null   float64
 6   reviews             8216 non-null   float64
 7   size                6863 non-null   object 
 8   variation_type      7050 non-null   object 
 9   variation_value     6896 non-null   object 
 10  variation_desc      1250 non-null   object 
 11  ingredients         7549 non-null   object 
 12  price_usd           8494 non-null   float64
 13  value_price_usd     451 non-null    float64
 14  sale_price_usd      270 non-null    float64
 15  limited_edition     8494 non-null   int64  
 16  new   