<a id="1"></a>
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> BUSINESS UNDERSTANDING</p>

ChimpAI Beauty Boost: Smart Recommendations for Glam

The goal of this study is to help develop a product recommendation system from the provided datasets. The system suggests products to users based on their preferences and interactions, drawing on the makeup product metadata and the user review data. We also develop a business value proposition for predictive marketing, addressed to the board of Chimp Beauty, that targets customers by features of their purchasing behavior in order to increase customer lifetime value. To structure this study, we follow a 5-step methodology.


<a id='top'></a>
<div class="list-group" id="list-tab" role="tablist">
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:150%;text-align:center;border-radius:10px 10px;">TABLE OF CONTENTS</p> 

[OVERVIEW](#0)
- [1. BUSINESS UNDERSTANDING](#1)

- [2. DATA UNDERSTANDING](#2)

    - [2.1 Get information on the data](#2.1)
    - [2.2 Conducting summary statistics](#2.2)
    - [2.3 Taking care of Null values](#2.3)

- [3. DATA PREPARATION](#3)
    - [3.1 Taking care of "Null" string present in Metadata](#3.1)
    - [3.2 Replacing "null" values in Product Category with "Category Not Specified"](#3.2)
    - [3.3 Investigating all elements within each feature](#3.3)
    - [3.4 Transforming user reviews to long format](#3.4)

- [4. DATA VISUALIZATION STUDY](#4)

    - [4.1 Visualizing Distribution of Review Scores](#4.1)    
    - [4.2 Visualizing Top 20 Product Categories Distribution](#4.2)
    - [4.3 Visualizing Product Price Distribution](#4.3)
    - [4.4 Visualizing Top 20 Brands Distribution](#4.4)
    - [4.5 Visualizing Product Price vs. Category](#4.5)

- [5. DEPLOYMENT](#5)
    - [5.1 Recommendations Engine (POC)](#5.1)

- [6. IMPLEMENTATION PLAN](#6)

In [2]:
# IMPORTING PACKAGES
import numpy as np
import pandas as pd
import plotly.express as px

In [3]:
# LOADING PRODUCT META DATA
product__metadata= pd.read_excel("/Users/rupesh/Desktop/Res/Interviews/Chimp_AI/Data/Makeup_Products_Metadata.xlsx")
metadata_df = product__metadata.copy()
metadata_df

Unnamed: 0,Product ID,Product Category,Product Brand,Product Name,Product Price [SEK],Product Description,Product Tags,Product Contents
0,90001,Makeup > Face > Contour,ETUDE HOUSE,ETUDE HOUSE Face Color Shading - 02,120.0,Etude House Face Color Shading provides a shad...,"ETUDE HOUSE Face Color Shading - 02, Makeup, F...",
1,90002,Brand > L'Oreal Paris,L'Oreal Paris,L'Oreal Paris Glow Mon Amour Highlighting Drop...,90.8,It's time to skip the snooze button and get up...,L'Oreal Paris Glow Mon Amour Highlighting Drop...,"G927637, Cyclopentasiloxane, Dimethicone, Isod..."
2,90003,Makeup > Face > Foundation,The Body Shop,The Body Shop All-In-One Face Base - 045,279.0,Note: The Body Shop products will be dispatche...,"The Body Shop All-In-One Face Base - 045 , Mak...",
3,90004,Health & Wellness > Good for You > Super Food,True Elements,True Elements Sunflower Raw Seeds,35.0,Sunflower has always been admired for its beau...,"True Elements Sunflower Raw Seeds, Wellness, S...",Raw Sunflower Seeds
4,90005,Makeup > Lips > Lip Stain,Nykaa Cosmetics,Nykaa Wonderpuff Cushion Liquid Lipstick - Wer...,107.8,It's no secret that a good lippie is a real mo...,NykaaÃ‚Â Wonderpuff!Ã‚Â LipÃ‚Â &Ã‚Â CheekÃ‚Â C...,
...,...,...,...,...,...,...,...,...
561,90562,Null,Ralph Lauren,Ralph Lauren Polo Red Intense Eau De Parfum,980.0,"SOURCED FROM PARCOS, THE OFFICIAL BRAND PARTNE...","Ralph Lauren Polo Red Intense Eau De Parfum, F...",
562,90563,Makeup > Nails > Nail Polish,Orly,Orly Breathable Treatment + Color Nail Lacquer...,200.0,Orly Breathable Treatment + Colour Nail Lacque...,Orly Breathable Treatment + Color Nail Lacquer...,
563,90564,Makeup > Eyes > Under Eye Concealer,BOLLYGLOW,BOLLYGLOW Concealer + Corrector Masala - Cocoa,133.0,Have you been waking up with raccoon eyes due ...,BOLLYGLOW Concealer + Corrector Masala - Cocoa...,
564,90565,Makeup > Makeup Kits > Eye Palettes,L.A. Girl,L.A. Girl Eye Lux Mesmerizing Eyeshadow - Trop...,135.0,"L.A Girl introduces a luxurious, long wearing ...",L.A. Girl Eye Lux Mesmerizing Eyeshadow - Trop...,


In [4]:
metadata_df.head(), metadata_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Product ID           566 non-null    int64  
 1   Product Category     566 non-null    object 
 2   Product Brand        566 non-null    object 
 3   Product Name         566 non-null    object 
 4   Product Price [SEK]  566 non-null    float64
 5   Product Description  566 non-null    object 
 6   Product Tags         523 non-null    object 
 7   Product Contents     259 non-null    object 
dtypes: float64(1), int64(1), object(6)
memory usage: 35.5+ KB


(   Product ID                               Product Category    Product Brand  \
 0       90001                        Makeup > Face > Contour      ETUDE HOUSE   
 1       90002                          Brand > L'Oreal Paris    L'Oreal Paris   
 2       90003                     Makeup > Face > Foundation    The Body Shop   
 3       90004  Health & Wellness > Good for You > Super Food    True Elements   
 4       90005                      Makeup > Lips > Lip Stain  Nykaa Cosmetics   
 
                                         Product Name  Product Price [SEK]  \
 0                ETUDE HOUSE Face Color Shading - 02                120.0   
 1  L'Oreal Paris Glow Mon Amour Highlighting Drop...                 90.8   
 2           The Body Shop All-In-One Face Base - 045                279.0   
 3                  True Elements Sunflower Raw Seeds                 35.0   
 4  Nykaa Wonderpuff Cushion Liquid Lipstick - Wer...                107.8   
 
                                  Pr

In [5]:
# LOADING USER REVIEW DATA
user_reviewdata= pd.read_excel("/Users/rupesh/Desktop/Res/Interviews/Chimp_AI/Data/User_review_data.xlsx")
userdata_df = user_reviewdata.copy()
userdata_df

Unnamed: 0,User,90001,90002,90003,90004,90005,90006,90007,90008,90009,...,90557,90558,90559,90560,90561,90562,90563,90564,90565,90566
0,Vincent,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Edgar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Addilyn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marlee,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Javier,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
595,Mariana,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
596,Ivy,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
597,Kevin,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
598,Nora,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<a id="2"></a>  
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> DATA UNDERSTANDING</p>

#### **Makeup Products Metadata:**
#### The Makeup Products Metadata consists of 566 rows and 8 columns:

- Product ID: A unique identifier for each product.
- Product Category: The category to which the product belongs, often including a hierarchy (e.g., Makeup > Face > Contour).
- Product Brand: The brand of the product.
- Product Name: The name of the product.
- Product Price [SEK]: The price of the product in Swedish Krona.
- Product Description: A detailed description of the product.
- Product Tags: Tags associated with the product for easier search and categorization.
- Product Contents: Ingredients or contents of the product, if applicable.


#### **User Review Data:**
#### The User Review Data consists of 600 rows and 567 columns:

- User: The customer's name.
- One column per product ID (90001, 90002, ..., 90566), holding that user's review score for the product.
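
Because the review table is a wide user-by-product matrix in which most cells are 0, it can help to quantify how sparse it is before modeling. The sketch below uses a small made-up frame (`toy_reviews` is hypothetical, not the real data) with the same layout, and assumes that 0 means "no review", as the later preparation step does.

```python
import pandas as pd

# Hypothetical miniature of the wide review table: one row per user,
# one column per product ID, 0 assumed to mean "no review".
toy_reviews = pd.DataFrame({
    "User": ["Vincent", "Edgar", "Mariana"],
    90001: [0, 0, 5],
    90002: [3, 0, 0],
    90003: [0, 0, 0],
})

ratings = toy_reviews.drop(columns="User")
n_cells = ratings.size                      # users x products
n_rated = int((ratings != 0).sum().sum())   # cells holding an actual score
sparsity = 1 - n_rated / n_cells

print(f"{n_rated} of {n_cells} cells rated, sparsity = {sparsity:.2%}")
```

On the real data the same three lines would run on `userdata_df.drop(columns="User")`.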



<a id="2.1"></a>
## <b>2.1<span style='color:#de9e46'> Get information on the data</span></b>

In [6]:
metadata_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Product ID           566 non-null    int64  
 1   Product Category     566 non-null    object 
 2   Product Brand        566 non-null    object 
 3   Product Name         566 non-null    object 
 4   Product Price [SEK]  566 non-null    float64
 5   Product Description  566 non-null    object 
 6   Product Tags         523 non-null    object 
 7   Product Contents     259 non-null    object 
dtypes: float64(1), int64(1), object(6)
memory usage: 35.5+ KB


In [7]:
userdata_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Columns: 567 entries, User to 90566
dtypes: int64(566), object(1)
memory usage: 2.6+ MB


<a id="2.2"></a>
## <b>2.2 <span style='color:#de9e46'> Conducting summary statistics</span></b>

In [8]:
metadata_df.isnull().sum()

Product ID               0
Product Category         0
Product Brand            0
Product Name             0
Product Price [SEK]      0
Product Description      0
Product Tags            43
Product Contents       307
dtype: int64
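
The raw counts are easier to judge as shares of the 566 rows. A minimal sketch, using the counts printed above rather than the real file:

```python
import pandas as pd

# Missing-value counts copied from the output above; 566 is the row count from .info()
n_rows = 566
missing_counts = pd.Series({"Product Tags": 43, "Product Contents": 307})

# Percentage of rows with a missing value, per column
missing_share = (missing_counts / n_rows * 100).round(1)
print(missing_share)
```

Roughly 8% of the tags and 54% of the contents are missing. On the real frame the equivalent expression is `metadata_df.isnull().mean() * 100`.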

In [9]:
userdata_df.isnull().sum()

User     0
90001    0
90002    0
90003    0
90004    0
        ..
90562    0
90563    0
90564    0
90565    0
90566    0
Length: 567, dtype: int64

<a id="2.3"></a>
## <b>2.3 <span style='color:#de9e46'> Taking care of Null values in Makeup Metadata</span></b>

In [10]:
from string import punctuation

# Define a function to normalize text
def normalize_text(text):
    # Convert text to lowercase
    text = text.lower()
    # Remove punctuation
    text = ''.join([char for char in text if char not in punctuation])
    # Remove extra spaces
    text = ' '.join(text.split())
    return text

# Apply text normalization to the 'Product Description' column
metadata_df['Product Description'] = metadata_df['Product Description'].apply(normalize_text)

# Fill missing values in 'Product Tags' and 'Product Contents'
metadata_df['Product Tags'] = metadata_df['Product Tags'].fillna('No Tags').apply(normalize_text)
metadata_df['Product Contents'] = metadata_df['Product Contents'].fillna('Not Available').apply(normalize_text)

# Display the first few rows of the updated dataframe to verify changes
metadata_df.head(25)


Unnamed: 0,Product ID,Product Category,Product Brand,Product Name,Product Price [SEK],Product Description,Product Tags,Product Contents
0,90001,Makeup > Face > Contour,ETUDE HOUSE,ETUDE HOUSE Face Color Shading - 02,120.0,etude house face color shading provides a shad...,etude house face color shading 02 makeup face ...,not available
1,90002,Brand > L'Oreal Paris,L'Oreal Paris,L'Oreal Paris Glow Mon Amour Highlighting Drop...,90.8,its time to skip the snooze button and get up ...,loreal paris glow mon amour highlighting drops...,g927637 cyclopentasiloxane dimethicone isodode...
2,90003,Makeup > Face > Foundation,The Body Shop,The Body Shop All-In-One Face Base - 045,279.0,note the body shop products will be dispatched...,the body shop allinone face base 045 makeup face,not available
3,90004,Health & Wellness > Good for You > Super Food,True Elements,True Elements Sunflower Raw Seeds,35.0,sunflower has always been admired for its beau...,true elements sunflower raw seeds wellness sho...,raw sunflower seeds
4,90005,Makeup > Lips > Lip Stain,Nykaa Cosmetics,Nykaa Wonderpuff Cushion Liquid Lipstick - Wer...,107.8,its no secret that a good lippie is a real moo...,nykaaã‚â wonderpuffã‚â lipã‚â ã‚â cheekã‚â cus...,not available
5,90006,Natural > Shop By Concern > Acne Treatment,HealthVit,HealthVit Activated Charcoal Powder,49.8,experience the goodness of pure activated char...,healthvit activated charcoal powder 250gm well...,not available
6,90007,Brand > Nivea,Nivea,NIVEA Body Lotion Oil in Lotion Rose & Argan O...,52.0,indulge in the goodness of natural oils in a f...,nivea oil in lotion rose argan oil skin body l...,aqua glycerin dicaprylyl ether alcohol denat g...
7,90008,Natural > Skin > Face Wash,Lotus Herbals,Lotus Herbals Whiteglow Activated Charcoal Bri...,28.0,lotus herbals whiteglow activated charcoal bri...,lotus herbals whiteglow activated charcoal bri...,key ingredientsactivated charcoal coconut shel...
8,90009,Mom & Baby > Maternity Wear > Maternity Bra,Floret,Floret Pack of 2 Full-Coverage Maternity Bras ...,151.6,product color multicolorpack of two fullcovera...,floret pack of 2 fullcoverage maternity bras m...,not available
9,90010,Mom & Baby > Maternity Wear > Maternity Bra,Inner Sense,Inner Sense Organic Antimicrobial Soft Nursing...,449.4,product color multicolor its better for you an...,inner sense organic antimicrobial soft nursing...,not available
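
To make the effect of `normalize_text` concrete, the sketch below re-declares the same function and runs it on two short strings. The first string is made up for illustration; the second is the product-name fragment from row 2 above.

```python
from string import punctuation

def normalize_text(text):
    # Same steps as the cell above: lowercase, drop ASCII punctuation, collapse whitespace
    text = text.lower()
    text = ''.join(char for char in text if char not in punctuation)
    return ' '.join(text.split())

sample = "It's time to  skip the SNOOZE button, and get up!"  # made-up example string
print(normalize_text(sample))  # its time to skip the snooze button and get up

# Hyphens are removed without inserting a space, so hyphenated words are fused
print(normalize_text("All-In-One Face Base - 045"))  # allinone face base 045
```

Two side effects are visible in the output table: hyphenated words are fused ("allinone"), and non-ASCII encoding debris such as the characters in row 4 survives, because `string.punctuation` only covers ASCII punctuation.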


In [11]:
# Check for null values across the entire dataset, focusing on 'Product Category' and others
null_values_summary = metadata_df.isnull().sum()

# Fill 'Product Category' only if it still contains true nulls (NaN);
# "Null" placeholder strings are not caught here and are handled in section 3.1
if 'Product Category' in null_values_summary and null_values_summary['Product Category'] > 0:
    metadata_df['Product Category'] = metadata_df['Product Category'].fillna('Category Not Specified')

# Update summary after handling
updated_null_values_summary = metadata_df.isnull().sum()

null_values_summary, updated_null_values_summary


(Product ID             0
 Product Category       0
 Product Brand          0
 Product Name           0
 Product Price [SEK]    0
 Product Description    0
 Product Tags           0
 Product Contents       0
 dtype: int64,
 Product ID             0
 Product Category       0
 Product Brand          0
 Product Name           0
 Product Price [SEK]    0
 Product Description    0
 Product Tags           0
 Product Contents       0
 dtype: int64)

<a id="3"></a>
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> DATA PREPARATION</p>

<a id="3.1"></a>
## <b>3.1 <span style='color:#de9e46'> Taking care of "Null" string present in Metadata</span></b>

In [12]:
# Define a function to detect "Null" strings in a dataframe
def detect_null_strings(df):
    contains_null_string = {}
    for column in df.columns:
        # Case-insensitive match against common placeholder strings
        null_string_mask = df[column].astype(str).str.lower().isin(["null", "n/a", "na", "none"])
        if null_string_mask.any():
            contains_null_string[column] = df[null_string_mask].index.tolist()
    return contains_null_string

# Apply the function to the metadata dataframe
null_string_occurrences = detect_null_strings(metadata_df)

null_string_occurrences


{'Product Category': [45,
  47,
  75,
  77,
  80,
  81,
  82,
  89,
  91,
  95,
  104,
  113,
  125,
  139,
  141,
  149,
  153,
  154,
  156,
  162,
  167,
  172,
  176,
  188,
  193,
  196,
  200,
  205,
  208,
  210,
  215,
  218,
  222,
  223,
  235,
  244,
  246,
  251,
  263,
  272,
  274,
  284,
  290,
  299,
  345,
  353,
  355,
  364,
  370,
  371,
  372,
  383,
  385,
  388,
  393,
  400,
  417,
  418,
  423,
  433,
  436,
  459,
  473,
  474,
  494,
  497,
  505,
  508,
  509,
  512,
  525,
  547,
  561]}
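
A long list of row indices is hard to scan. As an alternative view, the sketch below counts placeholder strings per column instead of listing positions. `toy_meta` and `count_placeholder_strings` are hypothetical names introduced here; the placeholder set is the one used by the detector above.

```python
import pandas as pd

PLACEHOLDERS = {"null", "n/a", "na", "none"}  # same strings the detector checks

# Hypothetical miniature frame; the real call would pass metadata_df instead
toy_meta = pd.DataFrame({
    "Product Category": ["Makeup > Face > Contour", "Null", "null", "Brand > Nivea"],
    "Product Brand": ["ETUDE HOUSE", "Ralph Lauren", "Orly", "Nivea"],
})

def count_placeholder_strings(df):
    """Return, per column, how many cells hold a placeholder string."""
    counts = df.apply(
        lambda col: col.astype(str).str.strip().str.lower().isin(PLACEHOLDERS).sum()
    )
    return counts[counts > 0]  # keep only columns that contain placeholders

print(count_placeholder_strings(toy_meta))
```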

<a id="3.2"></a>
## <b>3.2 <span style='color:#de9e46'> Replacing "null" values in Product Category with "Category Not Specified" </span></b>

In [13]:
# Work on a copy so the changes below do not also modify metadata_df
metadata_original_df = metadata_df.copy()
# Lowercase every category, then replace placeholder strings with "category not specified"
metadata_original_df['Product Category'] = metadata_original_df['Product Category'].astype(str).str.lower().replace(["null", "n/a", "na", "none"], "category not specified")

# Verify the replacement by checking the unique values in 'Product Category' to ensure "null" strings are replaced
updated_categories = metadata_original_df['Product Category'].unique()

# Check the first few updated categories to ensure correctness of the replacement
updated_categories[:10], len(updated_categories)


(array(['makeup > face > contour', "brand > l'oreal paris",
        'makeup > face > foundation',
        'health & wellness > good for you > super food',
        'makeup > lips > lip stain',
        'natural > shop by concern > acne treatment', 'brand > nivea',
        'natural > skin > face wash',
        'mom & baby > maternity wear > maternity bra',
        'makeup > face > concealer'], dtype=object),
 169)

In [14]:
metadata_original_df.head()

Unnamed: 0,Product ID,Product Category,Product Brand,Product Name,Product Price [SEK],Product Description,Product Tags,Product Contents
0,90001,makeup > face > contour,ETUDE HOUSE,ETUDE HOUSE Face Color Shading - 02,120.0,etude house face color shading provides a shad...,etude house face color shading 02 makeup face ...,not available
1,90002,brand > l'oreal paris,L'Oreal Paris,L'Oreal Paris Glow Mon Amour Highlighting Drop...,90.8,its time to skip the snooze button and get up ...,loreal paris glow mon amour highlighting drops...,g927637 cyclopentasiloxane dimethicone isodode...
2,90003,makeup > face > foundation,The Body Shop,The Body Shop All-In-One Face Base - 045,279.0,note the body shop products will be dispatched...,the body shop allinone face base 045 makeup face,not available
3,90004,health & wellness > good for you > super food,True Elements,True Elements Sunflower Raw Seeds,35.0,sunflower has always been admired for its beau...,true elements sunflower raw seeds wellness sho...,raw sunflower seeds
4,90005,makeup > lips > lip stain,Nykaa Cosmetics,Nykaa Wonderpuff Cushion Liquid Lipstick - Wer...,107.8,its no secret that a good lippie is a real moo...,nykaaã‚â wonderpuffã‚â lipã‚â ã‚â cheekã‚â cus...,not available


In [15]:
metadata_original_df.isnull().sum()

Product ID             0
Product Category       0
Product Brand          0
Product Name           0
Product Price [SEK]    0
Product Description    0
Product Tags           0
Product Contents       0
dtype: int64

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **The checks above show that no missing values and no "Null" placeholder strings remain in the metadata.**

<a id="3.3"></a>
## <b>3.3 <span style='color:#de9e46'> Investigating all elements within each feature</span></b>

In [15]:
# Print the number of unique values per column (and the values themselves when there are few)
for column in metadata_original_df:
    unique_values = np.unique(metadata_original_df[column])
    nr_values = len(unique_values)
    if nr_values < 22:
        print("The number of unique values for features {} : {} --- {}".format(column, nr_values, unique_values))
    else:
        print("The number of unique values for features {} : {}".format(column, nr_values))

The number of unique values for features Product ID : 566
The number of unique values for features Product Category : 169
The number of unique values for features Product Brand : 235
The number of unique values for features Product Name : 566
The number of unique values for features Product Price [SEK] : 315
The number of unique values for features Product Description : 563
The number of unique values for features Product Tags : 524
The number of unique values for features Product Contents : 254


In [16]:
# Summary statistics for 'Product Price [SEK]'
price_summary_statistics = metadata_original_df['Product Price [SEK]'].describe()

price_summary_statistics


count     566.000000
mean      200.979505
std       270.225308
min         5.000000
25%        35.850000
50%        81.100000
75%       250.000000
max      1900.000000
Name: Product Price [SEK], dtype: float64
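
The mean (about 201 SEK) is well above the median (81.1 SEK), which points to a right-skewed price distribution. One common way to flag unusually expensive products is Tukey's 1.5 x IQR rule. The sketch below applies it to the quartiles reported above; the choice of this rule is an assumption made for illustration, not something the notebook prescribes.

```python
# Quartiles and mean copied from the describe() output above
q1, median, q3 = 35.85, 81.10, 250.00
mean = 200.979505

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # Tukey's rule of thumb for high outliers

print(f"IQR = {iqr:.2f} SEK, upper fence = {upper_fence:.3f} SEK")
print(f"mean / median = {mean / median:.2f}")
```

Under this rule, prices above roughly 571 SEK would count as high outliers; the maximum of 1,900 SEK lies far beyond that.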

<a id="3.4"></a>
## <b>3.4 <span style='color:#de9e46'> Transforming user reviews to long format</span></b>

In [17]:
# Transforming the user reviews dataset from wide to long format
user_reviews_long_df = userdata_df.melt(id_vars='User', var_name='Product ID', value_name='Review Score')

# Dropping rows where Review Score is 0, assuming 0 indicates no interaction/review
# (.copy() avoids pandas' SettingWithCopyWarning on the assignment below)
user_reviews_long_df = user_reviews_long_df[user_reviews_long_df['Review Score'] != 0].copy()

# Convert Product ID to numeric for consistency with the 'Product ID' column of the metadata
user_reviews_long_df['Product ID'] = pd.to_numeric(user_reviews_long_df['Product ID'])

# Display the transformed user reviews dataset
user_reviews_long_df.head()


Unnamed: 0,User,Product ID,Review Score
89,Lila,90001,3
122,Emery,90001,5
175,Sadie,90001,1
191,Adelyn,90001,5
209,Abby,90001,4
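
The wide-to-long step can be traced on a tiny example. The sketch below repeats the melt, filter, and type conversion on a made-up two-user table, then joins the result to a made-up metadata frame on `Product ID`. The join is not part of the notebook above; it is shown as one plausible next step, and `toy_wide`, `toy_long`, `toy_meta`, and `reviews_with_meta` are hypothetical names.

```python
import pandas as pd

# Hypothetical wide table with the same layout as userdata_df
toy_wide = pd.DataFrame({
    "User": ["Lila", "Emery"],
    "90001": [3, 5],
    "90002": [0, 4],
})

# One row per (user, product) pair, then drop the pairs without a review
toy_long = toy_wide.melt(id_vars="User", var_name="Product ID", value_name="Review Score")
toy_long = toy_long[toy_long["Review Score"] != 0].copy()   # 0 assumed to mean "no review"
toy_long["Product ID"] = pd.to_numeric(toy_long["Product ID"])

# Hypothetical metadata rows to join on the shared key
toy_meta = pd.DataFrame({
    "Product ID": [90001, 90002],
    "Product Brand": ["ETUDE HOUSE", "L'Oreal Paris"],
})

reviews_with_meta = toy_long.merge(toy_meta, on="Product ID", how="left")
print(reviews_with_meta)
```

A left join keeps every review even if a product ID were missing from the metadata, which makes such gaps visible as missing brand values.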


<a id="4"></a>
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> DATA VISUALIZATION STUDY</p>

<a id="4.1"></a>
## <b>4.1 <span style='color:#de9e46'> Visualizing Distribution of Review Scores</span></b>

In [20]:
# Create a histogram with Plotly
fig = px.histogram(user_reviews_long_df, x='Review Score', nbins=10,
                   labels={'Review Score': 'Review Score', 'count': 'Frequency'},
                   title='Distribution of Review Scores')

# Update layout
fig.update_layout(
    xaxis=dict(title='Review Score'),
    yaxis=dict(title='Frequency')
)

# Show the plot
fig.show()


![Opp](Images/Distribution_reviewscore.png)

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **Review score "3" is very common, with its bar at about 1,456 reviews.**
- **Review scores "4" and "5" together form the tallest bar at about 2,277 reviews, which suggests that most customers had a positive experience overall.**
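
The figures read off the histogram can also be computed directly. The sketch below uses made-up scores in place of `user_reviews_long_df['Review Score']`; treating scores of 4 and above as "positive" is an assumption.

```python
import pandas as pd

# Made-up review scores standing in for user_reviews_long_df['Review Score']
scores = pd.Series([5, 4, 3, 3, 1, 4, 5, 2, 3, 4], name="Review Score")

counts = scores.value_counts().sort_index()   # number of reviews per score
positive_share = (scores >= 4).mean()         # share of reviews scored 4 or 5

print(counts)
print(f"Share of scores >= 4: {positive_share:.0%}")
```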

<a id="4.2"></a>
## <b>4.2 <span style='color:#de9e46'> Visualizing Top 20 Product Categories Distribution </span></b>

In [19]:
# Exclude 'Category Not Specified' from the analysis
filtered_df = metadata_original_df[metadata_original_df['Product Category'] != 'category not specified']

# Count the occurrences of each category and select the top 20
category_counts = filtered_df['Product Category'].value_counts().head(20)

# Create the bar chart
fig = px.bar(
    category_counts,
    x=category_counts.index,
    y=category_counts.values,
    title="Top 20 Product Categories Distribution (Excluding 'Category Not Specified')",
    labels={'x': 'Product Category', 'y': 'Number of Products'},
    color=category_counts.values,
    color_continuous_scale=px.colors.sequential.Viridis
)

# Show the figure
fig.show()


![Opp](Images/Top20Products.png)

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **The plot shows that "makeup > lips > lipstick" and "makeup > face > blush" are the most common categories, meaning the catalog carries the most products of these types. This may reflect high consumer interest or market demand, although product counts alone do not measure it.**
- **The rest of the top categories include a mix of makeup, skincare, and natural products, reflecting a diverse interest in various beauty and personal care items among the consumers.**
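
Since the category strings encode a hierarchy separated by " > ", the mix of makeup, skincare, and natural products can be counted at the top level. The sketch below splits a few category strings taken from the metadata preview; the `level_N` column names are hypothetical.

```python
import pandas as pd

# A few category strings taken from the metadata preview above
categories = pd.Series([
    "makeup > face > contour",
    "brand > l'oreal paris",
    "makeup > lips > lip stain",
    "natural > skin > face wash",
])

# Split on the ' > ' separator; shorter paths get missing values in deeper levels
levels = categories.str.split(" > ", expand=True)
levels.columns = [f"level_{i + 1}" for i in range(levels.shape[1])]

print(levels["level_1"].value_counts())
```

Paths with only two levels, such as "brand > l'oreal paris", leave `level_3` empty, so any aggregation on deeper levels has to tolerate missing values.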

<a id="4.3"></a>
## <b>4.3 <span style='color:#de9e46'> Visualizing Product Price Distribution </span></b>

In [20]:
# Histogram for Product Price Distribution
fig_price_distribution = px.histogram(
    metadata_original_df,
    x='Product Price [SEK]',
    title='Product Price Distribution',
    labels={'Product Price [SEK]': 'Price (SEK)'},
    color_discrete_sequence=['#636EFA']
)

fig_price_distribution.update_layout(bargap=0.2)
fig_price_distribution.show()


![Opp](Images/ProductPriceDisc.png)

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **The majority of products are priced at the lower end, indicating a large market for affordable beauty products.**
- **The distribution has a long tail to the right, with a small number of products at much higher prices, suggesting the presence of premium or luxury items in the dataset.**

<a id="4.4"></a>
## <b>4.4 <span style='color:#de9e46'> Visualizing Top 20 Brands Distribution </span></b>

In [21]:
# Get the top 20 brands as a two-column frame so the axis labels are explicit
top_brands = (
    metadata_original_df['Product Brand']
    .value_counts()
    .head(20)
    .rename_axis('Brand')
    .reset_index(name='Number of Products')
)

fig_brand_distribution = px.bar(
    top_brands,
    x='Brand',
    y='Number of Products',
    title='Top 20 Brands Distribution',
    color='Number of Products',
    color_continuous_scale='Agsunset'
)

fig_brand_distribution.show()


![Opp](Images/Top20Brands.png)

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **Himalaya is the leading brand in terms of the number of products, followed closely by Nykaa Cosmetics and Lakme, indicating these brands have a wide variety of offerings in the dataset.**
- **The distribution of products across these brands is fairly even, with no single brand dominating, which could imply a competitive market with numerous choices for consumers.**

<a id="4.5"></a>
## <b>4.5 <span style='color:#de9e46'> Visualizing Product Price vs. Category </span></b>

In [22]:
# Keep only the 20 most frequent categories so the x-axis stays readable
# (this includes 'category not specified' if it ranks among the top 20)
top_categories = metadata_original_df['Product Category'].value_counts().head(20).index

fig_price_vs_category = px.scatter(
    metadata_original_df[metadata_original_df['Product Category'].isin(top_categories)],
    x='Product Category',
    y='Product Price [SEK]',
    title='Product Price vs. Category',
    labels={'Product Price [SEK]': 'Price (SEK)', 'Product Category': 'Category'},
    color='Product Price [SEK]',
    color_continuous_scale='Viridis'
)

fig_price_vs_category.update_layout(xaxis={'categoryorder':'total descending'})
fig_price_vs_category.show()


![Opp](Images/PpVsCategory.png)

#### <b><span style='color:#FF0000'> NOTE :</span></b>
- **Prices vary widely within some categories: "makeup > lips > lipstick", for example, spans a broad range, indicating product options from budget to premium within the same category.**
- **Other categories, such as "mom & baby > maternity wear > maternity bra", show a tight cluster of prices, suggesting more uniform pricing.**
- **Outliers are present, particularly in categories like "makeup > face > foundation", where a few products are priced significantly higher than the rest.**
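
The spread per category can also be tabulated rather than read off the scatter plot. A minimal sketch on made-up prices follows; on the real data the same `groupby` would run on `metadata_original_df`. The shortened category names and the `range` column are illustrative.

```python
import pandas as pd

# Made-up prices; the real call would use metadata_original_df
toy_prices = pd.DataFrame({
    "Product Category": ["lipstick", "lipstick", "lipstick", "foundation", "foundation"],
    "Product Price [SEK]": [40.0, 120.0, 400.0, 150.0, 900.0],
})

# Per-category product count and price summary, most populated categories first
price_by_category = (
    toy_prices.groupby("Product Category")["Product Price [SEK]"]
    .agg(["count", "min", "median", "max"])
    .sort_values("count", ascending=False)
)
price_by_category["range"] = price_by_category["max"] - price_by_category["min"]
print(price_by_category)
```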

<a id="5"></a>
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> DEPLOYMENT</p>

<p align="center">
  <img src="Images/ChimpAI.png">
</p>
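
As a rough illustration of what a recommendation proof of concept could look like on a user-by-product score matrix, the sketch below implements item-based collaborative filtering with cosine similarity. This is an assumed technique chosen for illustration, not necessarily the engine shown in the diagram; `toy_reviews`, `item_similarity`, and `recommend` are hypothetical names, and 0 is again assumed to mean "no review".

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide review table (0 assumed to mean "no review")
toy_reviews = pd.DataFrame(
    {90001: [5, 4, 0, 0], 90002: [4, 5, 0, 1], 90003: [0, 0, 5, 4]},
    index=["Lila", "Emery", "Sadie", "Abby"],
)

def item_similarity(ratings: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between the product columns of a user x product matrix."""
    m = ratings.to_numpy(dtype=float)
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0                  # avoid division by zero for unrated products
    sim = (m.T @ m) / np.outer(norms, norms)
    return pd.DataFrame(sim, index=ratings.columns, columns=ratings.columns)

def recommend(user: str, ratings: pd.DataFrame, top_n: int = 2) -> list:
    """Rank products the user has not rated by similarity-weighted scores."""
    sim = item_similarity(ratings)
    user_ratings = ratings.loc[user]
    scores = pd.Series(
        sim.to_numpy() @ user_ratings.to_numpy(dtype=float), index=ratings.columns
    )
    unseen = scores[user_ratings == 0]       # only suggest products without a score
    return unseen.sort_values(ascending=False).head(top_n).index.tolist()

print(recommend("Lila", toy_reviews))
```

Item-based similarity needs nothing beyond the review matrix, which makes it a convenient baseline; the product metadata (category, brand, tags) could later feed a content-based component for products with few reviews.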


<a id="6"></a>
<p style="background-color:#2d8077;font-family:newtimeroman;color:#FFF9ED;font-size:120%;text-align:center;border-radius:10px 10px;"> IMPLEMENTATION PLAN</p>

**6. IMPLEMENTATION PLAN FOR CHIMPAI**

*Model Deployment and Prediction:*

Deploy the trained model on AI Platform, and schedule daily updates using the first three days of sales data for new SKUs. Update the model daily with new sales data and use it to predict sales for the next day, repeating this process until the 30th day. Cloud Functions and Cloud Scheduler can automate this process. Store the model predictions in BigQuery for easy access by stakeholders.

*Monitoring and Evaluation:*

  - *a. Production Schedule:* Based on the model's predictions, adjust the production plan to optimize inventory levels.

  - *b. Lead Time Consideration:* Factor in the 1.5-month lead time for production and delivery to the warehouse when making adjustments.

  - *c. Batch Orders:* Work within the constraint of monthly batch orders, finding ways to optimize production without modifying existing orders.

  - *d. Business Objectives:* Focus on minimizing lost sales opportunities due to stock-outs (1st criterion) and reducing overstock (2nd criterion) while adjusting the production plan.

Monitor the model's performance by comparing predictions with actual sales data. Use tools like AI Platform's Monitoring or Data Studio for visualization and analysis. If the model's performance degrades, retrain it using the latest data or explore alternative modeling techniques.
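
The comparison of predictions with actuals can be reduced to a single tracked metric. The sketch below uses mean absolute error and flags the model for retraining when the error exceeds a baseline by a tolerance factor. The metric, the 25% tolerance, the baseline value, and the sample figures are all assumptions for illustration.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def needs_retraining(actual, predicted, baseline_mae, tolerance=1.25):
    """Flag the model when the current MAE exceeds the baseline by the given factor."""
    return mean_absolute_error(actual, predicted) > baseline_mae * tolerance

# Made-up daily sales figures for illustration
actual_sales = [120, 95, 130, 110]
predicted_sales = [100, 90, 150, 80]

print(mean_absolute_error(actual_sales, predicted_sales))
print(needs_retraining(actual_sales, predicted_sales, baseline_mae=12.0))
```

A check like this could run in the same scheduled job that stores the daily predictions, so that degradation is noticed without manual inspection.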


*Testing Strategy:*

Conduct A/B testing to validate the model's effectiveness in meeting the business objectives. Compare the performance of the production plan based on the model's predictions with a control group using the traditional production plan. Track metrics like lost sales opportunities and overstock to ensure the model achieves the desired outcomes.

By following these steps, we can develop, deploy, and maintain an effective predictive recommendation model for ChimpAI.

*Cost Approximation:*

- Data Exploration and Preprocessing (1 week): €5,000
- Feature Engineering (2 weeks): €10,000
- Model Selection and Training (2 weeks): €10,000
- Prediction and Updating Implementation (1 week): €5,000
- Evaluation and Model Fine-tuning (1 week): €5,000
- Testing and Deployment (1 week): €5,000
- Total Time: 8 weeks
- Total Cost: €40,000


In summary, this end-to-end solution leverages GCP services like BigQuery, Dataflow, DataPrep, AI Platform, Cloud Functions, and Cloud Scheduler to integrate, preprocess, and analyze the data, develop and deploy the predictive recommendation model, and continuously monitor and evaluate its performance. The costs associated with these services will depend on the specific requirements, such as data volume, storage, and processing needs. The time to implement this solution will also vary, depending on factors like data quality, model complexity, and resource availability.