# Dataset Overview
The dataset comprises the following features related to the fashion and makeup product supply chain:

* Product Type: The category of the product (e.g., haircare, skincare, cosmetics).
* SKU: Stock Keeping Unit, a unique identifier for each product.
* Price: The price of the product.
* Availability: The current availability status of the product.
* Number of Products Sold: The number of products sold for a given period.
* Revenue Generated: The total revenue generated from product sales.
* Customer Demographics: Information about the customers. In this file the column holds one category per row: Female, Male, Non-binary, or Unknown.
* Stock Levels: The quantity of each product available in the inventory.
* Lead Times: Time taken for an order to be fulfilled from the supplier's end to the customer's end.
* Order Quantities: The number of products ordered in each transaction.
* Shipping Times: Time taken for shipping products to customers.
* Shipping Carriers: The company responsible for shipping the products.
* Shipping Costs: The cost incurred for shipping each product.
* Supplier Name: The name of the supplier providing the products.
* Location: Location of the supplier.
* Production Volumes: The volume of products manufactured.
* Manufacturing Lead Time: Time taken for the manufacturing process.
* Manufacturing Costs: The cost incurred during the manufacturing process.
* Inspection Results: The results of quality inspection for products.
* Defect Rates: The percentage of defective products.
* Transportation Modes: The modes of transportation used to deliver products.
* Routes: The transportation routes taken for delivery.
* Costs: Various costs associated with the supply chain process.

## ETL (process)

In [1]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# read and load data from a semicolon-delimited CSV file
supply = pd.read_csv("supply_chain_data.CSV", delimiter=";")
supply

Unnamed: 0,Product type,SKU,Price,Availability,Number of products sold,Revenue generated,Customer demographics,Stock levels,Lead times,Order quantities,...,Location,Lead time,Production volumes,Manufacturing lead time,Manufacturing costs,Inspection results,Defect rates,Transportation modes,Routes,Costs
0,haircare,SKU0,70,55,802,8662,Non-binary,58,7,96,...,Mumbai,29,215,29,46,Pending,0.2,Road,Route B,188
1,skincare,SKU1,15,95,736,7461,Female,53,30,37,...,Mumbai,23,517,30,34,Pending,4.9,Road,Route B,503
2,haircare,SKU2,11,34,8,9578,Unknown,1,10,88,...,Mumbai,12,971,27,31,Pending,4.6,Air,Route C,142
3,skincare,SKU3,61,68,83,7767,Non-binary,23,13,59,...,Kolkata,24,937,18,36,Fail,4.7,Rail,Route A,255
4,skincare,SKU4,5,26,871,2687,Non-binary,5,3,56,...,Delhi,5,414,3,92,Fail,3.1,Air,Route A,923
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,haircare,SKU95,78,65,672,7386,Unknown,15,14,26,...,Mumbai,18,450,26,59,Pending,1.2,Air,Route A,779
96,cosmetics,SKU96,24,29,324,7698,Non-binary,67,2,32,...,Mumbai,28,648,28,18,Pending,3.9,Road,Route A,189
97,haircare,SKU97,4,56,62,4371,Male,46,19,4,...,Mumbai,10,535,13,66,Fail,3.4,Road,Route A,540
98,skincare,SKU98,20,43,913,8526,Female,53,1,27,...,Chennai,28,581,9,6,Pending,2.9,Rail,Route A,882
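
The load step depends on the file using `;` as its field separator. A minimal sketch, using a made-up three-column in-memory sample rather than the real file (the values are copied from the first two rows shown above), illustrates what the separator argument does. `delimiter` and `sep` are interchangeable names for the same `read_csv` parameter.

```python
import io
import pandas as pd

# Hypothetical sample in the same semicolon-separated layout as the source file.
sample = io.StringIO(
    "Product type;SKU;Price\n"
    "haircare;SKU0;70\n"
    "skincare;SKU1;15\n"
)

# sep=";" splits each row into fields; with the default "," every row
# would be read as a single column.
demo = pd.read_csv(sample, sep=";")
print(demo.shape)
print(list(demo.columns))
```

If the shape reports a single column after loading a real file, the separator is the first thing to check.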


In [3]:
supply.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 24 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Product type             100 non-null    object 
 1   SKU                      100 non-null    object 
 2   Price                    100 non-null    int64  
 3   Availability             100 non-null    int64  
 4   Number of products sold  100 non-null    int64  
 5   Revenue generated        100 non-null    int64  
 6   Customer demographics    100 non-null    object 
 7   Stock levels             100 non-null    int64  
 8   Lead times               100 non-null    int64  
 9   Order quantities         100 non-null    int64  
 10  Shipping times           100 non-null    int64  
 11  Shipping carriers        100 non-null    object 
 12  Shipping costs           100 non-null    int64  
 13  Supplier name            100 non-null    object 
 14  Location                 10

In [4]:
supply.describe()

Unnamed: 0,Price,Availability,Number of products sold,Revenue generated,Stock levels,Lead times,Order quantities,Shipping times,Shipping costs,Lead time,Production volumes,Manufacturing lead time,Manufacturing costs,Defect rates,Costs
count,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
mean,49.46,48.4,460.99,5776.02,47.77,15.96,49.22,5.75,5.55,17.08,567.84,14.77,47.3,2.272,529.24
std,31.18191,30.743317,303.780074,2732.87984,31.369372,8.785801,26.784429,2.724283,2.713137,8.846251,263.046861,8.91243,28.957126,1.461498,258.284986
min,2.0,1.0,8.0,1062.0,0.0,1.0,1.0,1.0,1.0,1.0,104.0,1.0,1.0,0.0,104.0
25%,19.75,22.75,184.25,2812.5,16.75,8.0,26.0,3.75,3.75,10.0,352.0,7.0,23.0,1.0,319.0
50%,51.0,43.5,392.5,6006.5,47.5,17.0,52.0,6.0,5.0,18.0,568.5,14.0,46.0,2.15,520.0
75%,77.25,75.0,704.25,8253.75,73.0,24.0,71.25,8.0,8.0,25.0,797.0,23.0,68.5,3.525,762.75
max,99.0,100.0,996.0,9866.0,100.0,30.0,96.0,10.0,10.0,30.0,985.0,30.0,99.0,4.9,997.0


In [5]:
supply.columns

Index(['Product type', 'SKU', 'Price', 'Availability',
       'Number of products sold', 'Revenue generated', 'Customer demographics',
       'Stock levels', 'Lead times', 'Order quantities', 'Shipping times',
       'Shipping carriers', 'Shipping costs', 'Supplier name', 'Location',
       'Lead time', 'Production volumes', 'Manufacturing lead time',
       'Manufacturing costs', 'Inspection results', 'Defect rates',
       'Transportation modes', 'Routes', 'Costs'],
      dtype='object')
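
With 24 columns, it can help to split the column names into text and numeric groups before inspecting them one by one. The sketch below does this on a small made-up frame (the name `demo` and its four columns are illustrative, not the full dataset). It uses `exclude="number"` rather than naming the `object` dtype, so it does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Illustrative frame with two text columns and two numeric columns.
demo = pd.DataFrame({
    "Product type": ["haircare", "skincare"],
    "Price": [70, 15],
    "Inspection results": ["Pending", "Fail"],
    "Defect rates": [0.2, 4.9],
})

# Columns that are not numeric (candidates for value_counts / unique checks).
text_cols = demo.select_dtypes(exclude="number").columns.tolist()
# Columns that are numeric (covered by describe()).
numeric_cols = demo.select_dtypes(include="number").columns.tolist()

print(text_cols)
print(numeric_cols)
```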

In [6]:
# check object columns
supply["Customer demographics"].value_counts(normalize=True)

Unknown       0.31
Female        0.25
Non-binary    0.23
Male          0.21
Name: Customer demographics, dtype: float64

In [7]:
supply["Customer demographics"].unique()

array(['Non-binary', 'Female', 'Unknown', 'Male'], dtype=object)

In [12]:
# drop rows whose customer demographic is "Unknown"
supply = supply[supply["Customer demographics"] != "Unknown"]

In [13]:
supply["Customer demographics"].unique()

array(['Non-binary', 'Female', 'Male'], dtype=object)
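
Filtering with a boolean mask removes rows but keeps the original row labels, so the index of the filtered frame has gaps. The share of rows removed equals the share that `value_counts(normalize=True)` reports for the excluded category. A minimal sketch on four made-up rows (not the full dataset) shows both effects:

```python
import pandas as pd

# Four illustrative rows; the demographics match the first rows shown earlier.
demo = pd.DataFrame({
    "SKU": ["SKU0", "SKU1", "SKU2", "SKU3"],
    "Customer demographics": ["Non-binary", "Female", "Unknown", "Non-binary"],
})

# Share of each category before filtering.
shares = demo["Customer demographics"].value_counts(normalize=True)
unknown_share = float(shares.get("Unknown", 0.0))

# Keep only rows with a known demographic.
mask = demo["Customer demographics"] != "Unknown"
filtered = demo[mask]

# Fraction of rows removed, and the row labels that survive.
dropped_share = 1 - len(filtered) / len(demo)
kept_labels = filtered.index.tolist()

print(unknown_share, dropped_share, kept_labels)
```

Applied to the proportions above, the same reasoning means the filter discards roughly a third of the dataset, which is worth keeping in mind when interpreting later results.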

In [14]:
# check for null values
supply.isnull().sum()

Product type               0
SKU                        0
Price                      0
Availability               0
Number of products sold    0
Revenue generated          0
Customer demographics      0
Stock levels               0
Lead times                 0
Order quantities           0
Shipping times             0
Shipping carriers          0
Shipping costs             0
Supplier name              0
Location                   0
Lead time                  0
Production volumes         0
Manufacturing lead time    0
Manufacturing costs        0
Inspection results         0
Defect rates               0
Transportation modes       0
Routes                     0
Costs                      0
dtype: int64
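
A null check does not catch repeated records. Since SKU is described as a unique identifier, a duplicate check is a natural companion. The sketch below runs on a made-up frame that deliberately contains a repeated row; the name `demo` and its contents are illustrative only.

```python
import pandas as pd

# Illustrative frame in which the last row repeats the second one.
demo = pd.DataFrame({
    "SKU": ["SKU0", "SKU1", "SKU1"],
    "Price": [70, 15, 15],
})

# True only if every SKU value appears exactly once.
sku_is_unique = demo["SKU"].is_unique

# Number of rows that are exact copies of an earlier row.
n_duplicate_rows = int(demo.duplicated().sum())

# SKU values that occur more than once.
repeated_skus = demo.loc[demo["SKU"].duplicated(keep=False), "SKU"].unique().tolist()

print(sku_is_unique, n_duplicate_rows, repeated_skus)
```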

In [15]:
supply

Unnamed: 0,Product type,SKU,Price,Availability,Number of products sold,Revenue generated,Customer demographics,Stock levels,Lead times,Order quantities,...,Location,Lead time,Production volumes,Manufacturing lead time,Manufacturing costs,Inspection results,Defect rates,Transportation modes,Routes,Costs
0,haircare,SKU0,70,55,802,8662,Non-binary,58,7,96,...,Mumbai,29,215,29,46,Pending,0.2,Road,Route B,188
1,skincare,SKU1,15,95,736,7461,Female,53,30,37,...,Mumbai,23,517,30,34,Pending,4.9,Road,Route B,503
3,skincare,SKU3,61,68,83,7767,Non-binary,23,13,59,...,Kolkata,24,937,18,36,Fail,4.7,Rail,Route A,255
4,skincare,SKU4,5,26,871,2687,Non-binary,5,3,56,...,Delhi,5,414,3,92,Fail,3.1,Air,Route A,923
5,haircare,SKU5,2,87,147,2828,Non-binary,90,27,66,...,Bangalore,10,104,17,57,Fail,2.8,Road,Route A,235
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,cosmetics,SKU91,62,90,916,1935,Male,98,22,85,...,Delhi,5,207,28,40,Pending,0.6,Rail,Route B,997
92,cosmetics,SKU92,48,44,276,2100,Male,90,25,10,...,Mumbai,4,671,29,63,Pass,0.3,Rail,Route B,230
96,cosmetics,SKU96,24,29,324,7698,Non-binary,67,2,32,...,Mumbai,28,648,28,18,Pending,3.9,Road,Route A,189
97,haircare,SKU97,4,56,62,4371,Male,46,19,4,...,Mumbai,10,535,13,66,Fail,3.4,Road,Route A,540


In [16]:
# load the processed dataset
test = pd.read_csv("processed_data.csv")
test

Unnamed: 0,Product type,SKU,Price,Availability,Number of products sold,Revenue generated,Customer demographics,Stock levels,Lead times,Order quantities,...,Location,Lead time,Production volumes,Manufacturing lead time,Manufacturing costs,Inspection results,Defect rates,Transportation modes,Routes,Costs
0,haircare,SKU0,69.808006,55,802,8661.996792,Non-binary,58,7,96,...,Mumbai,29,215,29,46.279879,Pending,0.226410,Road,Route B,187.752075
1,skincare,SKU1,14.843523,95,736,7460.900065,Female,53,30,37,...,Mumbai,23,517,30,33.616769,Pending,4.854068,Road,Route B,503.065579
2,skincare,SKU3,61.163343,68,83,7766.836426,Non-binary,23,13,59,...,Kolkata,24,937,18,35.624741,Fail,4.746649,Rail,Route A,254.776159
3,skincare,SKU4,4.805496,26,871,2686.505152,Non-binary,5,3,56,...,Delhi,5,414,3,92.065161,Fail,3.145580,Air,Route A,923.440632
4,haircare,SKU5,1.699976,87,147,2828.348746,Non-binary,90,27,66,...,Bangalore,10,104,17,56.766476,Fail,2.779194,Road,Route A,235.461237
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,cosmetics,SKU91,62.111965,90,916,1935.206794,Male,98,22,85,...,Delhi,5,207,28,39.772883,Pending,0.626002,Rail,Route B,996.778315
65,cosmetics,SKU92,47.714233,44,276,2100.129755,Male,90,25,10,...,Mumbai,4,671,29,62.612690,Pass,0.333432,Rail,Route B,230.092783
66,cosmetics,SKU96,24.423131,29,324,7698.424766,Non-binary,67,2,32,...,Mumbai,28,648,28,17.803756,Pending,3.872048,Road,Route A,188.742141
67,haircare,SKU97,3.526111,56,62,4370.916580,Male,46,19,4,...,Mumbai,10,535,13,65.765156,Fail,3.376238,Road,Route A,540.132423
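
The notebook does not show how `processed_data.csv` was written. The renumbered index in the output above (SKU3 sits at position 2 instead of label 3) is what one would see if a filtered frame were saved without its index and read back. The sketch below demonstrates that round trip with an in-memory buffer and made-up rows; it is an assumption about the workflow, not a record of it, and it does not account for the extra decimal places in the processed values.

```python
import io
import pandas as pd

# Illustrative filtered frame whose labels skip 2, as after the mask filter.
filtered = pd.DataFrame(
    {"SKU": ["SKU0", "SKU1", "SKU3"], "Price": [70, 15, 61]},
    index=[0, 1, 3],
)

# Write without the index column, then read the text back.
buffer = io.StringIO()
filtered.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The reloaded frame gets a fresh 0..n-1 index.
print(reloaded.index.tolist())
```

Writing with `index=False` also avoids an extra `Unnamed: 0` column appearing when the file is read again.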
