# **Makeup Shades Analysis**

## **Introduction**  
The cosmetics industry thrives on the ability to deliver products that resonate with customer preferences. Analyzing makeup shades and their properties is essential for brands to understand trends, improve product offerings, and cater to diverse markets. This project explores a dataset of makeup shades, focusing on their color properties represented in HSV (Hue, Saturation, Value) and HEX codes, as well as their grouping and classification.

## **Problem Statement**  
The dataset contains missing values and unstructured information, which complicates analysis of the makeup shades. Without cleaning and exploration, it is hard to identify patterns such as how shades align across product groups and brands. An unclear picture of the color properties also limits the dataset's usefulness for product development or marketing strategies.

## **Objective**  
This notebook aims to:
1. **Clean the Dataset**: Handle missing values and ensure a structured dataset ready for analysis.
2. **Analyze Color Properties**: Explore the distribution and relationships of color attributes (HSV and HEX).
3. **Provide Insights**: Uncover trends and patterns in makeup shades across brands and product groups to support data-driven decision-making in the cosmetics industry.


## **1. Data Cleaning and Preprocessing of Makeup Dataset**

In this section, we clean and preprocess the makeup dataset. The main steps are identifying missing values, imputing them, verifying the result, dropping redundant columns, and saving the cleaned data.


**1.1 Importing necessary libraries**

We import pandas as `pd`; it handles loading, inspecting, and cleaning the data.

**1.2 Loading Dataset**

Next, we load the dataset from a CSV file into a pandas DataFrame. Be sure to update the file path to match the location of your dataset.

**1.3 Identifying Missing Values**

Before cleaning the dataset, we check for missing values. The code cell after step 1.5 counts them per column with `isnull().sum()`.
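
As a minimal sketch of this check, the snippet below counts missing entries per column on a small hypothetical frame (`toy` is an illustration, not the real dataset). The `mean()` variant, which reports the share of missing rows, is an addition that the notebook itself does not use.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "hex": ["f3cfb3", "ffe3c2", "ffe0cd"],
    "H": [26.0, np.nan, 23.0],
    "S": [0.26, 0.24, np.nan],
})

missing_counts = toy.isnull().sum()   # number of missing entries per column
missing_share = toy.isnull().mean()   # fraction of rows missing per column

print(missing_counts)
print(missing_share.round(2))
```

Looking at the share alongside the count can help judge whether imputation is reasonable or whether too much of a column is missing.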

**1.4 Imputing Missing Values**

In this step, we handle the missing values in the numerical columns (H, S, and V) by imputing the mean of each respective column.
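
The imputation step can be sketched on a small hypothetical frame as follows. The values in `toy` are made up for illustration; the notebook applies the same `fillna` pattern to `makeup_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; the real notebook applies this to makeup_data
toy = pd.DataFrame({
    "H": [20.0, np.nan, 30.0],
    "S": [0.2, 0.4, np.nan],
    "V": [np.nan, 0.8, 1.0],
})

cols = ["H", "S", "V"]
col_means = toy[cols].mean()                     # NaN values are skipped by default
imputed = toy.copy()
imputed[cols] = imputed[cols].fillna(col_means)  # each column is filled with its own mean

print(imputed)
```

Because hue is an angle, an arithmetic mean is only a rough fill for `H`; whether that matters depends on how spread out the hues in the data are.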

**1.5 Verifying the Cleaning Process**

After imputing the missing values, we verify that the missing values have been handled correctly by checking again for any missing data.


In [20]:
# Import necessary libraries
import pandas as pd

# Load the dataset
makeup_data = pd.read_csv('shades.csv')  # Ensure the correct path to the dataset

# Display the first few rows of the dataset
print("Initial Dataset:")
display(makeup_data.head())

# Step 1: Identify missing values
print("\nMissing Values Summary (Before Cleaning):")
missing_values_summary = makeup_data.isnull().sum()
print(missing_values_summary)

# Step 2: Impute missing values with column means for numerical columns
columns_to_impute = ['H', 'S', 'V']
makeup_data[columns_to_impute] = makeup_data[columns_to_impute].fillna(makeup_data[columns_to_impute].mean())

# Step 3: Verify the cleaning process
print("\nMissing Values Summary (After Cleaning):")
cleaned_data_missing_summary = makeup_data.isnull().sum()
print(cleaned_data_missing_summary)



Initial Dataset:


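| | brand | brand_short | product | product_short | hex | H | S | V | L | group |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Maybelline | mb | Fit Me | fmf | f3cfb3 | 26.0 | 0.26 | 0.95 | 86 | 2 |
| 1 | Maybelline | mb | Fit Me | fmf | ffe3c2 | 32.0 | 0.24 | 1.0 | 92 | 2 |
| 2 | Maybelline | mb | Fit Me | fmf | ffe0cd | 23.0 | 0.2 | 1.0 | 91 | 2 |
| 3 | Maybelline | mb | Fit Me | fmf | ffd3be | 19.0 | 0.25 | 1.0 | 88 | 2 |
| 4 | Maybelline | mb | Fit Me | fmf | bd9584 | 18.0 | 0.3 | 0.74 | 65 | 2 |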
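
The `H`, `S`, and `V` values in these rows appear to follow from the `hex` column. As a sketch under that assumption, the standard-library `colorsys` module can convert a hex code to HSV. `hex_to_hsv` is a hypothetical helper, and the scaling (hue in degrees, saturation and value between 0 and 1) is inferred from the rows shown here rather than documented by the dataset.

```python
import colorsys

def hex_to_hsv(hex_code):
    """Convert a 6-digit hex colour such as 'f3cfb3' to (H in degrees, S, V)."""
    hex_code = hex_code.lstrip("#")
    # Split into red, green, and blue bytes and scale each to the 0-1 range
    r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # all three results are in the 0-1 range
    return round(h * 360), round(s, 2), round(v, 2)

print(hex_to_hsv("f3cfb3"))  # (26, 0.26, 0.95), matching row 0
```

Since `hex` has no missing entries in this dataset, a conversion like this could serve as an alternative to mean imputation for `H`, `S`, and `V`, provided the relationship holds for every row.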



Missing Values Summary (Before Cleaning):
brand             0
brand_short       0
product           0
product_short     0
hex               0
H                12
S                12
V                12
L                 0
group             0
dtype: int64

Missing Values Summary (After Cleaning):
brand            0
brand_short      0
product          0
product_short    0
hex              0
H                0
S                0
V                0
L                0
group            0
dtype: int64


**1.6 Displaying Column Names**

Printing the column names confirms that the dataset contains the expected columns before any further manipulation.

In [21]:
# Display the column names to verify their existence
print("Columns in the dataset:")
print(makeup_data.columns)


Columns in the dataset:
Index(['brand', 'brand_short', 'product', 'product_short', 'hex', 'H', 'S',
       'V', 'L', 'group'],
      dtype='object')
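
Beyond printing the names, the check can be made explicit. The sketch below compares a set of required columns against a frame and lists any that are absent. Both `required_columns` and the one-row `toy` frame are hypothetical; `toy` deliberately lacks `group` so the check has something to report.

```python
import pandas as pd

# Hypothetical requirement: the columns that later analysis steps rely on
required_columns = {"hex", "H", "S", "V", "group"}

# Hypothetical one-row frame that is missing the 'group' column
toy = pd.DataFrame(
    [["Maybelline", "f3cfb3", 26.0, 0.26, 0.95]],
    columns=["brand", "hex", "H", "S", "V"],
)

# Required columns that do not appear in the frame
missing_columns = sorted(required_columns - set(toy.columns))
print("Missing columns:", missing_columns)
```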


**1.7 Dropping Irrelevant Columns**

Dropping columns: `drop(columns=[...])` removes the named columns from the DataFrame. Here we remove `brand_short` and `product_short`, which appear to be abbreviations of `brand` and `product`.

Confirmation: after dropping the columns, `display(makeup_data.head())` shows the first few rows so we can confirm that the columns are gone.

In [22]:
# Drop the 'brand_short' and 'product_short' columns
makeup_data = makeup_data.drop(columns=['brand_short', 'product_short'])

# Display the first few rows to confirm the columns were dropped
print("Dataset after dropping 'brand_short' and 'product_short' columns:")
display(makeup_data.head())


Dataset after dropping 'brand_short' and 'product_short' columns:


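| | brand | product | hex | H | S | V | L | group |
|---|---|---|---|---|---|---|---|---|
| 0 | Maybelline | Fit Me | f3cfb3 | 26.0 | 0.26 | 0.95 | 86 | 2 |
| 1 | Maybelline | Fit Me | ffe3c2 | 32.0 | 0.24 | 1.0 | 92 | 2 |
| 2 | Maybelline | Fit Me | ffe0cd | 23.0 | 0.2 | 1.0 | 91 | 2 |
| 3 | Maybelline | Fit Me | ffd3be | 19.0 | 0.25 | 1.0 | 88 | 2 |
| 4 | Maybelline | Fit Me | bd9584 | 18.0 | 0.3 | 0.74 | 65 | 2 |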


**1.8 Saving the Cleaned Dataset**

After performing the necessary cleaning and preprocessing steps, we save the cleaned dataset to a CSV file for further use.
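
As a portable variation on this step, the sketch below writes a small hypothetical frame to a temporary directory with `pathlib` and reads it back to confirm the round trip. The `toy` frame and the temporary location are assumptions for illustration; the notebook itself writes to a fixed local path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned dataset
toy = pd.DataFrame({
    "brand": ["Maybelline", "Maybelline"],
    "hex": ["f3cfb3", "ffe3c2"],
    "H": [26.0, 32.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "cleaned_shades.csv"  # temporary location for this sketch
    toy.to_csv(out_path, index=False)                # index=False keeps the row index out of the file
    # Read hex as text so that all-digit codes are not parsed as numbers
    reloaded = pd.read_csv(out_path, dtype={"hex": str})

print(reloaded)
```

Passing `dtype={"hex": str}` on reload is a precaution: a code made only of digits would otherwise be read as a number and lose its leading zeros.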


In [24]:
# Save the cleaned dataset
cleaned_file_path = r'C:\Users\silin\Downloads\Makeup.Popularity\cleaned_shades.csv'  # Updated path
makeup_data.to_csv(cleaned_file_path, index=False)
print(f"\nCleaned dataset saved to {cleaned_file_path}")



Cleaned dataset saved to C:\Users\silin\Downloads\Makeup.Popularity\cleaned_shades.csv


## **2. Exploratory Data Analysis**