# Exploratory Data Analysis – DataCo Supply Chain

This notebook inspects and cleans the DataCo dataset before further analysis.


In [None]:
# Import Library
import pandas as pd

In [1]:
# Load Data
df = pd.read_csv("./data/DataCoSupplyChainDataset.csv", encoding="latin1")
print(df.head())
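The cell above loads the file with `encoding="latin1"`, presumably because it contains bytes that are not valid UTF-8 (pandas' default). The sketch below uses a tiny hypothetical CSV to show the difference. Latin-1 maps every byte from 0x00 to 0xFF to a character, so it never raises a decode error. UTF-8 rejects a lone 0xE9 byte.

```python
import io

import pandas as pd

# Hypothetical CSV bytes: "é" is stored as the single Latin-1 byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
raw = b"Customer City,Sales\nCaf\xe9 Town,10.5\n"

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 decodes any byte, so this read always succeeds.
df_demo = pd.read_csv(io.BytesIO(raw), encoding="latin1")
print(utf8_ok)                          # False
print(df_demo.loc[0, "Customer City"])  # Café Town
```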

In [None]:
# Inspect Column Types and Missing Values
df.info()  # info() prints its report directly and returns None, so no print() is needed
print(df.isnull().sum())
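Raw null counts are hard to compare across columns. Percentages, restricted to columns that actually have gaps, are easier to read. The following is a minimal sketch. The helper name `missing_summary` and the toy values are hypothetical. In this notebook you would call `missing_summary(df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame with hypothetical values standing in for the DataCo columns.
toy = pd.DataFrame({
    "Customer Zipcode": [725.0, np.nan, 921.0, 600.0],
    "Order Zipcode": [np.nan, np.nan, np.nan, 10035.0],
    "Product Description": [np.nan] * 4,
    "Sales": [10.0, 20.0, 30.0, 40.0],
})
print(missing_summary(toy))
```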

<!-- Insight -->
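- Several columns contain missing values, for example Customer Zipcode, Order Zipcode, and Product Description. They are not essential to this analysis, so they can be dropped (optional) or left as-is.
- Convert the order date (DateOrders) and shipping date (DateOrders) columns to datetime.
- Create a new column, shipping delays (DateOrders), holding the number of whole days between the order date and the shipping date.
- Combine Customer Fname and Customer Lname into a single Customer FullName column.
- Drop irrelevant columns: Latitude, Longitude, Order Zipcode, Product Description, and the now-redundant Customer Fname and Customer Lname.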
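The insight leaves dropping the sparse columns optional. One way to make that decision reproducible is a missing-fraction threshold. In the sketch below, the helper `drop_mostly_missing` and the 0.5 cutoff are hypothetical choices, not part of the original notebook. The toy data only illustrates the rule.

```python
import numpy as np
import pandas as pd


def drop_mostly_missing(frame: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose fraction of missing values exceeds `threshold` (assumed 0.5)."""
    missing_frac = frame.isnull().mean()  # per-column fraction of NaN values
    to_drop = missing_frac[missing_frac > threshold].index
    return frame.drop(columns=to_drop)


toy = pd.DataFrame({
    "Customer Zipcode": [725.0, np.nan, 921.0, 600.0],   # 25% missing -> kept
    "Order Zipcode": [np.nan, np.nan, np.nan, 10035.0],  # 75% missing -> dropped
    "Product Description": [np.nan] * 4,                 # 100% missing -> dropped
})
print(drop_mostly_missing(toy).columns.tolist())  # ['Customer Zipcode']
```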

In [None]:
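# Cleaning and Preprocessing Data
# Merge first and last name; fillna("") stops a missing part from turning the whole name into NaN
df["Customer FullName"] = (
    df["Customer Fname"].fillna("") + " " + df["Customer Lname"].fillna("")
).str.strip()

# Parse the date strings into datetime64 so date arithmetic works
df["order date (DateOrders)"] = pd.to_datetime(df["order date (DateOrders)"])
df["shipping date (DateOrders)"] = pd.to_datetime(df["shipping date (DateOrders)"])
# Whole days from order to shipment (.dt.days discards the leftover hours and minutes)
df["shipping delays (DateOrders)"] = (df["shipping date (DateOrders)"] - df["order date (DateOrders)"]).dt.days

# Drop coordinates, columns with missing values, and the name parts merged above
df = df.drop(columns=["Latitude", "Longitude", "Order Zipcode", "Product Description", "Customer Fname", "Customer Lname"])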
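To check the name merge and the delay calculation, the sketch below applies the same logic as the cleaning cell to two hypothetical rows. The names and dates are invented for illustration. The second row shows that `.dt.days` truncates: 1 day 22 hours counts as 1.

```python
import numpy as np
import pandas as pd

# Two hypothetical rows using the DataCo column names.
toy = pd.DataFrame({
    "Customer Fname": ["Mary", "Cally"],
    "Customer Lname": ["Smith", np.nan],  # a missing last name
    "order date (DateOrders)": ["1/31/2018 22:56", "1/13/2018 12:27"],
    "shipping date (DateOrders)": ["2/3/2018 22:56", "1/15/2018 10:27"],
})

# Full name: missing parts become "" and stray spaces are stripped.
toy["Customer FullName"] = (
    toy["Customer Fname"].fillna("") + " " + toy["Customer Lname"].fillna("")
).str.strip()

# Delay in whole days between order and shipment.
order = pd.to_datetime(toy["order date (DateOrders)"])
ship = pd.to_datetime(toy["shipping date (DateOrders)"])
toy["shipping delays (DateOrders)"] = (ship - order).dt.days

print(toy[["Customer FullName", "shipping delays (DateOrders)"]])
```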

In [None]:
# Save Cleaned Data
df.to_csv("./data/DataCoSupplyChainDataset_Cleaned.csv", index=False)
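CSV stores only text, so the datetime columns created above come back as plain strings when the cleaned file is read again. The sketch below round-trips a tiny hypothetical frame through an in-memory CSV. It shows that `parse_dates` restores the datetime dtype.

```python
import io

import pandas as pd

# Tiny hypothetical frame with one datetime column.
frame = pd.DataFrame({
    "order date (DateOrders)": pd.to_datetime(["2018-01-31 22:56"]),
    "shipping delays (DateOrders)": [3],
})
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

buffer.seek(0)
plain = pd.read_csv(buffer)  # dates stay as strings
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["order date (DateOrders)"])

print(pd.api.types.is_datetime64_any_dtype(plain["order date (DateOrders)"]))   # False
print(pd.api.types.is_datetime64_any_dtype(parsed["order date (DateOrders)"]))  # True
```

Applied to this notebook, a later analysis could reload the cleaned file with `parse_dates=["order date (DateOrders)", "shipping date (DateOrders)"]`. Because `to_csv` writes UTF-8 by default, the `encoding="latin1"` argument should no longer be necessary for the cleaned file.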