# SUPPLY CHAIN ANALYTICS PROJECT

## EXERCISE DESCRIPTION

The project provides a real-world dataset focusing on supply chain analytics.

### Business demand analysis

**Requirements:** Create a dashboard to analyze the business problem and improve the supply chain's efficiency.



### **Overall target:**
Create an _interactive dashboard_ that summarizes the analysis of the supply chain problem and suggests a solution.


In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## III) DATA PREPROCESSING

### Data overview

In [2]:
fulfillment=pd.read_csv('Data/fulfillment.csv')
fulfillment.head()

FileNotFoundError: [Errno 2] No such file or directory: 'Data/fulfillment.csv'

In [3]:
fulfillment.shape

(118, 2)

In [4]:
order=pd.read_excel('suppply-chain-data- 600final.xlsx')
order.head()

Unnamed: 0,Product type,SKU,Price,Availability,Number of products sold,Revenue generated,Customer demographics,Stock levels,Lead times,Order quantities,...,Location,Lead time,Production volumes,Manufacturing lead time,Manufacturing costs,Inspection results,Defect rates,Transportation modes,Routes,Costs
0,haircare,SKU0,69.808006,55.0,802.0,8661.996792,Non-binary,58.0,7.0,96.0,...,Mumbai,29.0,215.0,29.0,46.279879,Pending,0.22641,Road,Route B,187.752075
1,skincare,SKU1,14.843523,95.0,736.0,7460.900065,Female,53.0,30.0,37.0,...,Mumbai,23.0,517.0,30.0,33.616769,Pending,4.854068,Road,Route B,503.065579
2,haircare,SKU2,11.319683,34.0,8.0,9577.749626,Unknown,1.0,10.0,88.0,...,Mumbai,12.0,971.0,27.0,30.688019,Pending,4.580593,Air,Route C,141.920282
3,skincare,SKU3,61.163343,68.0,83.0,7766.836426,Non-binary,23.0,13.0,59.0,...,Kolkata,24.0,937.0,18.0,35.624741,Fail,4.746649,Rail,Route A,254.776159
4,skincare,SKU4,4.805496,26.0,871.0,2686.505152,Non-binary,5.0,3.0,56.0,...,Delhi,5.0,414.0,3.0,92.065161,Fail,3.14558,Air,Route A,923.440632


In [5]:
order.shape

(30871, 24)

The dataset provides three data tables: order_and_shipment, inventory and fulfillment. After examining the data fields, I noticed that the dataset generally represents the following key information:

**Customer:** General information about customers including identifiers and addresses

**Order:** Information about the order including date of order, product and quantity ordered, order value

**Shipment:** Shipping information including shipping date, shipping mode

**Product:** Specific information about the ordered item including product name, product category, product department

**Warehouse Inventory:** Information on inventory management for each product name including monthly inventory, warehouse location, storage costs, order fulfillment



### Data cleaning

In [7]:
order.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30871 entries, 0 to 30870
Data columns (total 24 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Order ID                     30871 non-null  int64 
 1    Order Item ID               30871 non-null  int64 
 2    Order YearMonth             30871 non-null  int64 
 3    Order Year                  30871 non-null  int64 
 4    Order Month                 30871 non-null  int64 
 5    Order Day                   30871 non-null  int64 
 6   Order Time                   30871 non-null  object
 7   Order Quantity               30871 non-null  int64 
 8   Product Department           30871 non-null  object
 9   Product Category             30871 non-null  object
 10  Product Name                 30871 non-null  object
 11   Customer ID                 30871 non-null  int64 
 12  Customer Market              30871 non-null  object
 13  Customer Region              30

We have some unnecessary columns in the dataset. For example, the Order Item ID column shows the ID of the product in the _order_and_shipment_ table, but it does not appear in the _fulfillment_ or _inventory_ tables, so we drop this column.
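The `order.info()` output above suggests that some column names carry a leading space (for example ` Order Item ID`). Below is a minimal sketch of dropping the column, assuming that is the case. The `order_demo` frame is a toy stand-in for `order`, not the real data.

```python
import pandas as pd

# Toy stand-in for the order table. The leading space in ' Order Item ID'
# mirrors what order.info() appears to show (an assumption).
order_demo = pd.DataFrame({
    'Order ID': [1, 2],
    ' Order Item ID': [10, 11],
    'Product Name': ['Product A', 'Product B'],
})

# Strip stray whitespace from the column names, then drop the unneeded column.
order_demo.columns = order_demo.columns.str.strip()
order_demo = order_demo.drop(columns=['Order Item ID'])

print(list(order_demo.columns))
```

Normalizing the names first avoids a `KeyError` when the column is referenced without the leading space.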

Checking for missing values

In [12]:
fulfillment_missing_count = fulfillment.isna().sum()
print(fulfillment_missing_count)

Product Name                          0
Warehouse Order Fulfillment (days)    0
dtype: int64


There are no missing values in any of the three tables. The dataset appears to have been cleaned already.
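Only the check on `fulfillment` is shown above. As a sketch, the same check could be run over several tables at once. The helper `missing_summary` and the toy frames are hypothetical and only illustrate the idea.

```python
import pandas as pd

def missing_summary(tables):
    """Return the total number of missing cells for each named table."""
    return pd.Series({name: int(df.isna().sum().sum()) for name, df in tables.items()})

# Toy stand-ins for the real tables (illustrative values only).
demo_tables = {
    'fulfillment': pd.DataFrame({
        'Product Name': ['Product A'],
        'Warehouse Order Fulfillment (days)': [2.0],
    }),
    'inventory': pd.DataFrame({'Product Name': ['Product A', None]}),
}

summary = missing_summary(demo_tables)
print(summary)
```

With the real data, the dictionary would map each table name to its loaded DataFrame.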

Checking for duplicates

In [13]:
# Check the order table for fully duplicated rows
duplicate_rows = order[order.duplicated()]

In [19]:
duplicate_rows = fulfillment[fulfillment.duplicated()]

There are no duplicates. This dataset's quality is good!
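The cells above store the duplicated rows without displaying them. A small sketch of reporting the count explicitly, using a toy frame (`demo` is hypothetical) that deliberately contains one repeated row:

```python
import pandas as pd

# Toy frame with one repeated row (illustrative values only).
demo = pd.DataFrame({
    'Product Name': ['Product A', 'Product A', 'Product B'],
    'Warehouse Order Fulfillment (days)': [2.0, 2.0, 3.1],
})

# duplicated() marks every repeat after the first occurrence of a row.
n_duplicates = int(demo.duplicated().sum())
deduped = demo.drop_duplicates()

print(f"duplicated rows: {n_duplicates}, rows after dropping: {len(deduped)}")
```

A count of zero on the real tables would confirm the statement above.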

Checking which product category and product department each product name belongs to

In [14]:
# Create a new DataFrame with isolated columns
product_info = order[['Product Name', 'Product Category', 'Product Department']]
# Drop the duplicates to keep the distinct product names
product = product_info.drop_duplicates()
product

Unnamed: 0,Product Name,Product Category,Product Department
0,Field & Stream Sportsman 16 Gun Fire Safe,Fishing,Fan Shop
157,Pelican Sunstream 100 Kayak,Water Sports,Fan Shop
294,Diamondback Women's Serene Classic Comfort Bi,Camping & Hiking,Fan Shop
418,O'Brien Men's Neoprene Life Vest,Indoor/Outdoor Games,Fan Shop
463,Team Golf Texas Longhorns Putter Grip,Accessories,Outdoors
...,...,...,...
15458,First aid kit,Health and Beauty,Health and Beauty
15464,Rock music,Music,Discs Shop
15621,Men's gala suit,Men's Clothing,Apparel
18869,Toys,Toys,Fan Shop
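
Because the later merge joins on `Product Name`, it may be worth confirming that each name maps to exactly one category and department pair. A minimal sketch of that check on toy data (`product_demo` and its values are hypothetical):

```python
import pandas as pd

# Toy product table in which 'Product B' appears under two departments.
product_demo = pd.DataFrame({
    'Product Name': ['Product A', 'Product B', 'Product B'],
    'Product Category': ['Category X', 'Category Y', 'Category Y'],
    'Product Department': ['Department 1', 'Department 1', 'Department 2'],
})

# Count the distinct (category, department) combinations per product name.
combos = product_demo.drop_duplicates().groupby('Product Name').size()
ambiguous = combos[combos > 1]

print(ambiguous)
```

An empty result on the real `product` table would mean a left merge on `Product Name` cannot multiply rows.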


In [15]:
# Export the product information to a new CSV file
product.to_csv('product.csv', index=False)

In [17]:
new_merge = pd.merge(fulfillment, product, on='Product Name', how='left')
new_merge

Unnamed: 0,Product Name,Warehouse Order Fulfillment (days),Product Category,Product Department
0,Perfect Fitness Perfect Rip Deck,8.3,Cleats,Apparel
1,Nike Men's Dri-FIT Victory Golf Polo,6.6,Women's Apparel,Golf
2,O'Brien Men's Neoprene Life Vest,5.5,Indoor/Outdoor Games,Fan Shop
3,Nike Men's Free 5.0+ Running Shoe,9.4,Cardio Equipment,Footwear
4,Under Armour Girls' Toddler Spine Surge Runni,6.3,Shop By Sport,Golf
...,...,...,...,...
113,Stiga Master Series ST3100 Competition Indoor,4.7,Hockey,Fitness
114,SOLE E35 Elliptical,1.9,,
115,Bushnell Pro X7 Jolt Slope Rangefinder,2.0,,
116,SOLE E25 Elliptical,2.1,,


The merged table has some missing values. Let's check them.

In [18]:
# Check for missing values
missing_count = new_merge.isna().sum()
missing_count

Product Name                          0
Warehouse Order Fulfillment (days)    0
Product Category                      5
Product Department                    5
dtype: int64

In [19]:
null_records = new_merge[(new_merge['Product Category'].isna()) | (new_merge['Product Department'].isna())]
null_records

Unnamed: 0,Product Name,Warehouse Order Fulfillment (days),Product Category,Product Department
56,Dell Laptop,3.0,,
114,SOLE E35 Elliptical,1.9,,
115,Bushnell Pro X7 Jolt Slope Rangefinder,2.0,,
116,SOLE E25 Elliptical,2.1,,
117,Bowflex SelectTech 1090 Dumbbells,7.7,,


In [20]:
null_records['Product Name'].unique()

array(['Dell Laptop', 'SOLE E35 Elliptical',
       'Bushnell Pro X7 Jolt Slope Rangefinder', 'SOLE E25 Elliptical',
       'Bowflex SelectTech 1090 Dumbbells'], dtype=object)
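
These products have no match in the product table. As an alternative sketch, unmatched rows can be flagged at merge time with the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The toy frames below are hypothetical stand-ins for `fulfillment` and `product`.

```python
import pandas as pd

# Toy stand-ins (illustrative values only).
fulfillment_demo = pd.DataFrame({
    'Product Name': ['Product A', 'Product B'],
    'Warehouse Order Fulfillment (days)': [3.0, 1.9],
})
product_demo = pd.DataFrame({
    'Product Name': ['Product A'],
    'Product Category': ['Category X'],
    'Product Department': ['Department Y'],
})

# indicator=True records whether each row matched ('both') or not ('left_only').
merged = pd.merge(fulfillment_demo, product_demo, on='Product Name',
                  how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Product Name'].tolist()

print(unmatched)
```

This separates rows that found no match from rows whose category or department was already missing in the source table.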

In [21]:
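# Assign the result back rather than calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to the original frame
cols = ['Product Category', 'Product Department']
new_merge[cols] = new_merge[cols].fillna('None')
new_merge[new_merge['Product Department'].isna()]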

Unnamed: 0,Product Name,Warehouse Order Fulfillment (days),Product Category,Product Department


In [22]:
new_merge.head(10)

Unnamed: 0,Product Name,Warehouse Order Fulfillment (days),Product Category,Product Department
0,Perfect Fitness Perfect Rip Deck,8.3,Cleats,Apparel
1,Nike Men's Dri-FIT Victory Golf Polo,6.6,Women's Apparel,Golf
2,O'Brien Men's Neoprene Life Vest,5.5,Indoor/Outdoor Games,Fan Shop
3,Nike Men's Free 5.0+ Running Shoe,9.4,Cardio Equipment,Footwear
4,Under Armour Girls' Toddler Spine Surge Runni,6.3,Shop By Sport,Golf
5,Nike Men's CJ Elite 2 TD Football Cleat,7.0,Men's Footwear,Apparel
6,Field & Stream Sportsman 16 Gun Fire Safe,4.9,Fishing,Fan Shop
7,Pelican Sunstream 100 Kayak,1.8,Water Sports,Fan Shop
8,Diamondback Women's Serene Classic Comfort Bi,6.9,Camping & Hiking,Fan Shop
9,ENO Atlas Hammock Straps,2.2,Hunting & Shooting,Fan Shop
