# **Project Name**    - **FedEx Logistics Performance Analysis**



##### **Project Type**    - **EDA**
##### **Contribution**    - **Individual**
##### **Team Member 1 -** **Kailas Balaji Wadje**



# **Project Summary -**

This Exploratory Data Analysis (EDA) project focuses on uncovering insights from FedEx Logistics' global supply chain dataset. The dataset encompasses detailed information on various aspects of logistics operations, including purchase orders (POs), shipment methods, vendor agreements (INCO terms), delivery schedules, and product-level data such as item descriptions and dosage forms. Given the complexity and scale of international logistics, the primary objective of this EDA is to support FedEx in enhancing supply chain efficiency, reducing operational costs, and ensuring timely deliveries.

The project also investigates vendor performance by comparing delivery timelines and fulfillment rates. This helps in identifying reliable vendors and those requiring closer monitoring or contract renegotiation. Furthermore, the dosage form and item description fields are assessed to identify any product-specific handling requirements that may influence shipping timelines or costs.

One of the critical goals of this analysis is to detect bottlenecks in the supply chain and recommend actionable solutions. These insights are especially crucial in the context of rising global eCommerce demand, where speed and reliability are essential to maintaining a competitive advantage. Through visualizations and statistical summaries, the EDA provides a comprehensive overview of how various factors interact and contribute to either operational efficiency or delays.

In conclusion, this EDA lays the groundwork for data-driven decision-making within FedEx Logistics. By highlighting inefficiencies and opportunities within the logistics network, it empowers the company to streamline operations, optimize shipment strategies, and improve customer satisfaction while effectively controlling freight costs.

# **GitHub Link -**

##### GitHub URL: https://github.com/Kailaswadje

# **Problem Statement**


1. **Team Performance in On-Time Delivery** :
Are shipments managed by specific teams (e.g., PMO – US) more likely to be
delivered on time compared to others?

2. **Impact of Shipment Mode on Delivery** :
Does the shipment mode (air, sea, etc.) influence how often shipments meet the scheduled delivery date?

3. **Country-Wise Delivery Delays** :
Do shipments from certain countries experience more delays compared to others?

4. **Vendor Lead Time and Delivery Performance** :
Is there a difference in delivery performance (on-time vs. delayed) based on the time between the PO Sent to Vendor Date and the Scheduled Delivery Date?

5. **Effect of INCO Terms on Vendor Performance** :
Does the type of INCO term used impact vendor delivery performance?

6. **Weight vs. Insurance Cost** :
Are shipments with higher weights more likely to incur higher insurance costs (Line Item Insurance)?
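
As a rough illustration of how question 4 could be approached, the sketch below buckets the lead time between the PO Sent to Vendor Date and the Scheduled Delivery Date and compares on-time rates per bucket. The rows are made-up toy data, and the `On_Time` flag, the bucket edges, and the ISO date format are assumptions for illustration, not values taken from the dataset.

```python
import pandas as pd

# Toy rows (hypothetical); the real notebook would use the loaded dataset instead.
toy = pd.DataFrame({
    "PO Sent to Vendor Date": ["2014-01-01", "2014-01-10", "2014-02-01", "2014-03-01"],
    "Scheduled Delivery Date": ["2014-01-20", "2014-04-10", "2014-02-15", "2014-08-01"],
    "On_Time": [True, False, True, False],  # assumed flag, derived elsewhere
})

# Parse both date columns; unparseable entries become NaT instead of raising.
for col in ["PO Sent to Vendor Date", "Scheduled Delivery Date"]:
    toy[col] = pd.to_datetime(toy[col], format="%Y-%m-%d", errors="coerce")

# Lead time in whole days between sending the PO and the scheduled delivery.
toy["Lead_Time_Days"] = (
    toy["Scheduled Delivery Date"] - toy["PO Sent to Vendor Date"]
).dt.days

# Bucket the lead times (bucket edges are an arbitrary choice for this sketch).
toy["Lead_Time_Bucket"] = pd.cut(
    toy["Lead_Time_Days"],
    bins=[0, 30, 120, float("inf")],
    labels=["<=30 days", "31-120 days", ">120 days"],
)

# Share of on-time shipments within each lead-time bucket.
rate = toy.groupby("Lead_Time_Bucket", observed=True)["On_Time"].mean()
print(rate)
```

The same grouping pattern would apply to questions 1, 2, 3 and 5 by swapping the bucket column for the team, shipment mode, country, or INCO term column.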

#### **Define Your Business Objective?**

The primary business objective of this Exploratory Data Analysis (EDA) project is to optimize FedEx Logistics' global supply chain performance by identifying key factors that influence delivery timelines, cost-efficiency, and vendor reliability. By analyzing historical shipment data—spanning purchase orders, shipment modes, vendor terms (INCO), delivery schedules, and product-level attributes—FedEx aims to:

Improve On-Time Delivery Rates:
Determine which variables (e.g., shipment mode, team, country of origin) most significantly impact timely deliveries.

Minimize Freight and Insurance Costs:
Understand cost drivers such as shipment weight and insurance fees to recommend more economical shipping practices.

Enhance Vendor and Route Performance:
Evaluate vendor reliability and regional performance trends to renegotiate terms, switch suppliers, or adjust lead times.

Support Data-Driven Decision-Making:
Provide actionable insights through visualizations and statistical validation to inform logistics planning, procurement strategies, and stakeholder communications.

Ultimately, the goal is to enable faster, more reliable, and cost-effective logistics operations, which enhances customer satisfaction and sustains FedEx’s competitive edge in a growing global eCommerce market.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
import plotly.express as px

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
data = pd.read_csv("/content/drive/MyDrive/SCMS_Delivery_History_Dataset.csv")
data

### Dataset First View

In [None]:
# Dataset First Look
data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
# Visualizing the missing values
missing_counts = data.isnull().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)

plt.figure(figsize=(5, 5))
missing_counts.plot(kind='bar')
plt.title("Number of Missing Values per Column")
plt.xlabel("Columns")
plt.ylabel("Missing Values")
plt.xticks(rotation=45)
plt.show()

### What did you know about your dataset?

**Domain and Context:**

The dataset is related to FedEx Logistics operations, covering the global supply chain.

It includes details on purchase orders (POs), shipment logistics, vendor terms, delivery timelines, and product-level information.

**Structure and Columns:**

The dataset contains a mix of categorical, date, and numerical columns, such as:

  **Categorical:** Managed By, Vendor INCO Term, Item Description, Brand, Dosage Form, Manufacturing Site

  **Date:** PQ First Sent to Client Date, Scheduled Delivery Date, Delivery Recorded Date

  **Numerical:** Weight (Kilograms), Line Item Quantity, Pack Price, Line Item Insurance (USD)

**Missing Values Present:**

Columns like Sub Classification, Dosage Form, and Brand have notable missing values, which could affect item-level analysis.

A few records are missing in critical date fields, which can hinder the calculation of delivery performance (on-time vs delayed).

**Potential Key Features for Analysis:**

Delivery performance can be assessed using date columns.

Cost analysis can utilize fields like Weight, Insurance, and Pack Price.

INCO terms and Managed By teams may impact responsibility and efficiency.
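
A minimal sketch of how delivery performance might be derived from the date columns is shown below. It uses toy rows rather than the real file, and the `Delivered to Client Date` column name, the ISO date format, and the "delay of zero days or less counts as on time" rule are assumptions made for this example.

```python
import pandas as pd

# Toy rows (hypothetical); one scheduled date is deliberately unparseable.
toy = pd.DataFrame({
    "Managed By": ["PMO - US", "PMO - US", "Field Office", "Field Office"],
    "Scheduled Delivery Date": ["2006-06-02", "2006-11-14", "2006-08-27", "not available"],
    "Delivered to Client Date": ["2006-06-02", "2006-11-20", "2006-08-25", "2006-09-01"],
})

# Parse dates; invalid entries become NaT so they can be excluded explicitly.
for col in ["Scheduled Delivery Date", "Delivered to Client Date"]:
    toy[col] = pd.to_datetime(toy[col], format="%Y-%m-%d", errors="coerce")

# Positive values mean the shipment arrived after the scheduled date.
toy["Delay_Days"] = (
    toy["Delivered to Client Date"] - toy["Scheduled Delivery Date"]
).dt.days

# Rows with a missing date cannot be classified, so drop them before flagging.
valid = toy.dropna(subset=["Delay_Days"]).copy()
valid["On_Time"] = valid["Delay_Days"] <= 0

# On-time rate per managing team.
on_time_rate = valid.groupby("Managed By")["On_Time"].mean()
print(on_time_rate)
```

Dropping rows with missing dates, rather than filling them, keeps unclassifiable shipments from being counted as either on time or delayed.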

**Consistency:**

Most numeric fields seem well-populated and clean, ideal for analysis without much imputation.

Categorical fields show sparsity in certain areas, possibly due to inconsistent data collection or vendor differences.

**Data Size:**

The dataset has approximately 10,000 rows (the exact count is reported by `data.shape` above), making it suitable for statistical analysis and machine learning models.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe()

### Variables Description

**Key Observations:**

High Variability: Large standard deviations in "Line Item Quantity", "Line Item Value", "Pack Price", and "Line Item Insurance" indicate high variability in the dataset.

Zero Values: Minimum values for "Line Item Value", "Pack Price", "Unit Price", and "Line Item Insurance" are zero, suggesting that some line items might not have a cost or insurance associated.

Wide Ranges: All monetary variables and quantities cover a broad range, indicating a mix of small and very large transactions or items.
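
The observations above can be checked numerically instead of being read off the `describe()` table by eye. The sketch below does this on a small made-up frame; the values are illustrative only, and the coefficient of variation (`std_over_mean`) is my own choice of variability measure.

```python
import pandas as pd

# Toy values (hypothetical) mimicking a few small orders plus one very large one.
toy = pd.DataFrame({
    "Line_Item_Quantity": [1, 10, 25, 40, 100, 50000],
    "Pack_Price": [0.0, 1.5, 2.0, 3.25, 9.0, 120.0],
})

summary = pd.DataFrame({
    "skewness": toy.skew(),                     # > 0 means a long right tail
    "zero_count": (toy == 0).sum(),             # how many entries are exactly zero
    "std_over_mean": toy.std() / toy.mean(),    # variability relative to the mean
})
print(summary)
```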



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
data.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Step 1: Clean column names (remove spaces and special characters for easier access)
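# regex=False makes the parentheses literal characters in every pandas version
data.columns = (data.columns.str.strip()
                .str.replace(' ', '_', regex=False)
                .str.replace('(', '', regex=False)
                .str.replace(')', '', regex=False))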

In [None]:
# Step 2: Check missing values
missing_values = data.isnull().sum()

In [None]:
# Step 3: Handle missing values
# For simplicity, fill missing numerical values with 0 and categorical with 'Unknown'
num_cols = data.select_dtypes(include=['float64', 'int64']).columns
cat_cols = data.select_dtypes(include=['object']).columns

data[num_cols] = data[num_cols].fillna(0)
data[cat_cols] = data[cat_cols].fillna('Unknown')

In [None]:
# Step 4: Convert relevant columns to numeric (if they were read as object)
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

In [None]:
# Step 5: Remove potential outliers (optional example: quantities over 500,000)
data = data[data['Line_Item_Quantity'] <= 500000]

In [None]:
# Step 6: Reset index after filtering
data = data.reset_index(drop=True)

In [None]:
# Step 7: Final dataset check
print(data.info())
print(data.head())

### What all manipulations have you done and insights you found?

1. **Column Name Cleaning**
What: Removed spaces and special characters in column names to make them code-friendly.

Why: Simplifies the data handling process and avoids syntax issues.

2. **Missing Value Handling**
What:

Filled missing numerical values with 0 (assumed missing means unavailable or not applicable).

Filled missing categorical values with 'Unknown'.

Why: Ensures the dataset is complete and ready for analysis or modeling without crashing due to NaN values.

3. **Data Type Conversion**
What: Converted columns like Unit_of_Measure_Per_Pack, Line_Item_Quantity, Line_Item_Value, Pack_Price, Unit_Price, Weight_Kilograms, Freight_Cost_USD, and Line_Item_Insurance_USD to numeric types.

Why: Some of these may have been read as object due to formatting issues. Numeric types allow proper mathematical operations and statistical analysis.

4. **Outlier Handling**
What: Removed rows where Line_Item_Quantity exceeded 500,000 units.

Why: Based on the summary statistics from `data.describe()`, quantities above 500,000 are extremely high and could heavily skew the analysis. These are potential outliers or data entry errors.

5. **Index Reset**
What: Reset the index after filtering the dataset.

Why: Keeps the dataset clean and orderly.
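
One caveat with steps 2 and 3 is that `errors='coerce'` followed by `fillna(0)` silently turns text entries into zeros. The sketch below, on a made-up series, shows one way to count how many values were coerced before filling them. The sample strings are hypothetical placeholders, not actual entries from the dataset.

```python
import pandas as pd

# Toy column (hypothetical): numbers stored as text, one text note, one missing value.
raw = pd.Series(
    ["13", "358", "text note instead of a number", "171", None],
    name="Weight_Kilograms",
)

# Non-numeric strings become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")

# Separate values that were already missing from values lost during conversion.
was_missing = raw.isna()
coerced = converted.isna() & ~was_missing
print("coerced to NaN:", coerced.sum(), "| originally missing:", was_missing.sum())

# Same fill as in the wrangling step; the counts above tell how many zeros are artificial.
filled = converted.fillna(0)
print(filled.tolist())
```

If the coerced count is large for a column, the zeros in that column should be read as "not recorded" rather than as true zero weights or costs.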

We found some key insights when performing data wrangling. They are listed below:

1. **Presence of Missing Data**:
Some insurance costs were missing, which is significant because it may indicate incomplete transactional records or cases where no insurance was applied.

2. **High Variability Across Orders**:
There’s a very wide range of quantities and prices.

Example: Some line items have a quantity of 1, while others go up to hundreds of thousands.

Prices range from $0.00 to $1,345.64 per pack.

This suggests:

The dataset includes both low-cost, high-frequency items (possibly consumables) and high-cost, low-frequency items (possibly capital goods or bulk orders).

3. **Data Skewness:**
Variables like Line_Item_Quantity, Line_Item_Value, and Pack_Price are highly skewed.

There are likely some extreme values or power-law distributions, typical in supply chain datasets.

4. **Zero Value Occurrences**:
Some records have zero prices, zero quantities, or zero insurance.

These may represent free-of-cost items, errors, or incomplete transactions.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# The cleaning steps from the Data Wrangling section are repeated here
# so that this cell can also run on its own.

# Clean column names for easier access (regex=False keeps the parentheses literal)
data.columns = (data.columns.str.strip()
                .str.replace(' ', '_', regex=False)
                .str.replace('(', '', regex=False)
                .str.replace(')', '', regex=False))

# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Filter out extreme outliers in quantity
data = data[data['Line_Item_Quantity'] <= 500000].reset_index(drop=True)

# Step 6: Plotting the univariate distribution of Line Item Quantity
sns.set(style="whitegrid")
plt.figure(figsize=(5, 5))
sns.histplot(data['Line_Item_Quantity'], bins=50, kde=True, color='skyblue')
plt.title('Distribution of Line Item Quantity', fontsize=16)
plt.xlabel('Line Item Quantity')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

I selected a Histogram with KDE (Kernel Density Estimation) because:

It effectively displays the distribution of a single continuous variable.

The KDE overlay helps visualize the underlying probability density of the variable.

It shows whether the data is skewed, multi-modal, or uniformly distributed.

##### 2. What is/are the insight(s) found from the chart?

The distribution is heavily right-skewed.

The majority of the orders are for low quantities (between 0 and ~20,000 units).

There is a long tail where a few orders have significantly high quantities (above 100,000 units).

The peak ordering frequency is concentrated at the lower end of the quantity scale.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can positively influence the business:

Inventory Planning: Since most orders are in smaller quantities, warehouses should prioritize stocking smaller batch sizes to meet the majority demand efficiently.

Supply Chain Optimization: High-frequency, low-quantity orders may require better vendor coordination and faster delivery cycles.

Bulk Order Strategy: The rare high-quantity orders could be handled with a specialized fulfillment process, possibly at discounted rates to encourage more such orders.

Yes, potential negative growth risks are indicated:

Order Imbalance: The low frequency of large orders suggests underutilization of bulk purchase opportunities. If the business over-prepares for large orders, it may lead to inventory stagnation.

High Operational Costs: Frequent small orders can increase shipping and handling costs, reducing profit margins unless optimized.



#### Chart - 2

In [None]:
# Chart - 2 visualization code

# Filter out extreme outliers in freight cost (keep within 95th percentile for better visibility)
upper_limit = data['Freight_Cost_USD'].quantile(0.95)
data_filtered = data[data['Freight_Cost_USD'] <= upper_limit].reset_index(drop=True)

# Plotting the box plot for Freight Cost
plt.figure(figsize=(6, 6))
sns.boxplot(x=data_filtered['Freight_Cost_USD'], color='orange')
plt.title('Box Plot of Freight Cost (USD)', fontsize=16)
plt.xlabel('Freight Cost (USD)')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a box plot because:

It is perfect for identifying outliers and spread of continuous numerical data.

It provides quick visual cues for median, quartiles, and extreme values.

Freight costs often have high variability due to differing shipment modes, distances, and package sizes, so visualizing this distribution is essential.



##### 2. What is/are the insight(s) found from the chart?

Median Freight Cost: Most deliveries have a low to moderate freight cost.

Outliers: There are some significant outliers (beyond the whiskers), even after trimming to the 95th percentile.

Cost Concentration: Freight costs are heavily concentrated in the lower range.
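
For reference, the points drawn beyond the whiskers of a box plot follow the usual 1.5 × IQR rule, which can also be computed directly. The sketch below applies that rule to a made-up freight cost series; the numbers are illustrative only.

```python
import pandas as pd

# Toy freight costs in USD (hypothetical), with one unusually expensive shipment.
freight = pd.Series([100, 120, 150, 180, 200, 220, 250, 2000], dtype=float)

# Quartiles and interquartile range.
q1, q3 = freight.quantile([0.25, 0.75])
iqr = q3 - q1

# Whisker limits used by the standard box plot convention.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the limits are the points plotted as outliers.
outliers = freight[(freight < lower) | (freight > upper)]
print(f"limits: [{lower}, {upper}]")
print(outliers.tolist())
```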



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Shipping Cost Management: Understanding that most shipments are low-cost can help optimize shipping contracts and negotiate better deals with logistics providers.

Cost Forecasting: Businesses can focus forecasting models around the majority cost range rather than budgeting for the extreme outliers.

Resource Allocation: Efficient allocation of transportation budgets to focus on regular, low-cost shipments.

Potential Overcharging or Inefficiency: The presence of high-cost outliers may indicate:

Poor vendor selection.

Last-minute shipments using expensive modes (like air freight instead of sea).

Inefficient route planning.

If these high-cost shipments are frequent but not adding significant business value, they can erode profit margins and indicate negative operational efficiency.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

# Plotting the count plot for Shipment Mode
plt.figure(figsize=(10, 6))
sns.countplot(y='Shipment_Mode', data=data, order=data['Shipment_Mode'].value_counts().index, palette='viridis')
plt.title('Frequency of Shipment Modes', fontsize=16)
plt.xlabel('Number of Shipments')
plt.ylabel('Shipment Mode')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot (bar chart) because:

It is ideal for visualizing the frequency distribution of categorical variables.

It clearly shows which shipment modes are most frequently used.

Helps in understanding operational preferences and logistic strategies.

##### 2. What is/are the insight(s) found from the chart?

Most Common Mode: The most frequently used shipment mode is Air.

Less Frequent Modes: Modes like Truck and Sea are used less often.

Operational Preference: There is a heavy reliance on fast shipping (Air), which could indicate the need for quick deliveries.
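
To back the chart with numbers, the share of each shipment mode can be computed with `value_counts(normalize=True)`. The counts below are made up for illustration and are not the dataset's actual figures.

```python
import pandas as pd

# Toy shipment modes (hypothetical counts).
modes = pd.Series(["Air"] * 6 + ["Truck"] * 3 + ["Sea"] * 1, name="Shipment_Mode")

# Percentage share of each mode, sorted from most to least frequent.
share = modes.value_counts(normalize=True).mul(100).round(1)
print(share)
```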

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, here’s how:

Cost Savings Potential: If air shipments dominate, the company can explore opportunities to shift some shipments to cheaper modes like sea or truck when time sensitivity is not critical.

Vendor Negotiation: High air shipment volume could strengthen bargaining power with air freight providers.

Delivery Optimization: Identifying overuse of air freight could help in creating optimized shipping schedules that balance cost and delivery time.

**Insights that may lead to negative growth:**

Excessive Air Shipping Costs: Heavy dependence on air freight could inflate shipping costs significantly, hurting profit margins in the long run.

Potential Inventory Mismanagement: Frequent need for urgent deliveries may indicate poor inventory forecasting or supply chain bottlenecks that force last-minute orders.

Sustainability Impact: Air freight has a higher carbon footprint, which could negatively impact companies focusing on green supply chain initiatives.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Filter out extreme outliers for weight (keep within 95th percentile)
upper_limit = data['Weight_Kilograms'].quantile(0.95)
df_filtered = data[data['Weight_Kilograms'] <= upper_limit]

# Plotting the distribution of Weight
plt.figure(figsize=(12, 6))
sns.histplot(df_filtered['Weight_Kilograms'], bins=50, kde=True, color='green')
plt.title('Distribution of Shipment Weight (Kilograms)', fontsize=16)
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a Histogram with KDE (Kernel Density Estimation) because:

It helps visualize the distribution and density of continuous variables.

It clearly shows skewness, spread, and the presence of outliers.

Weight is a logistics-critical variable that directly impacts shipping cost, shipment mode, and delivery efficiency.

##### 2. What is/are the insight(s) found from the chart?

Key Observations:
The distribution is right-skewed.

Most shipments are lightweight (clustered below ~500 kg).

Very few shipments are heavy (beyond 1000 kg), indicating rare bulk deliveries.

The frequency sharply declines as weight increases.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Optimize Shipment Modes: Since most shipments are lightweight, each order can be matched to the cheapest mode that still meets its deadline, with air freight reserved for time-sensitive orders.

Vendor and Route Selection: Lightweight shipments allow for flexible logistics options and route optimizations.

Inventory Forecasting: Knowing that the majority of shipments are light can reduce warehousing and transportation cost assumptions.

Are there any insights that lead to negative growth?
Yes, potential risks include:

Missed Bulk Shipping Opportunities: If the company is only shipping small loads, it may be missing out on bulk discounts and lower per-unit shipping rates.

High Frequency, Low Volume: Frequent lightweight shipments could lead to increased handling and packaging costs, driving operational inefficiencies.

Supply Chain Stress: The system may be configured to frequently handle many small shipments rather than fewer, larger ones, which may overburden resources.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Filter out extreme outliers for better visibility
upper_limit_weight = data['Weight_Kilograms'].quantile(0.95)
upper_limit_freight = data['Freight_Cost_USD'].quantile(0.95)
df_filtered = data[(data['Weight_Kilograms'] <= upper_limit_weight) & (data['Freight_Cost_USD'] <= upper_limit_freight)]

# Plotting the scatter plot
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Weight_Kilograms', y='Freight_Cost_USD', data=df_filtered, alpha=0.6, color='purple')
plt.title('Freight Cost vs. Shipment Weight', fontsize=16)
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Freight Cost (USD)')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a scatter plot because:

It is the best visual tool to explore relationships between two continuous variables.

It helps identify correlations, clusters, outliers, and trends.

It is perfect for cost-related logistic analysis to understand how weight drives freight costs.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Positive Correlation: There is a visible upward trend — as shipment weight increases, freight cost also tends to increase.

Cost Variability: Even for shipments with similar weights, freight costs can vary significantly. This could be due to factors like:

Shipment mode

Shipping distance

Urgency

Data Clustering: Most shipments are concentrated at lower weights and lower freight costs.

Outliers: A few shipments have much higher freight costs for moderate weights, which may indicate inefficient shipping decisions or premium services.
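
The upward trend seen in the scatter plot can be quantified with a correlation coefficient. The sketch below computes Pearson and Spearman correlations on made-up weight and cost pairs. Spearman is computed here as the Pearson correlation of the ranks, which is its definition and avoids any extra dependency.

```python
import pandas as pd

# Toy shipments (hypothetical values).
toy = pd.DataFrame({
    "Weight_Kilograms": [10, 50, 120, 300, 800],
    "Freight_Cost_USD": [200, 150, 900, 1100, 4000],
})

x = toy["Weight_Kilograms"]
y = toy["Freight_Cost_USD"]

# Pearson measures linear association; sensitive to extreme values.
pearson = x.corr(y)

# Spearman measures monotonic association; computed as Pearson on the ranks.
spearman = x.rank().corr(y.rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because freight cost and weight are both right-skewed, the rank-based Spearman value is usually the safer one to report.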

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Cost Control: Understanding this relationship helps logistics teams optimize shipping strategies for different weight brackets.

Shipping Mode Optimization: By analyzing cases where moderate-weight shipments incurred high freight costs, the company can reconsider shipping modes or renegotiate vendor contracts.

Forecasting Models: This can improve cost prediction models for future shipments based on weight.

Negative Impacts:
Cost Inefficiency: The variability in freight costs for shipments of similar weight suggests possible inefficiencies or inconsistencies in shipping practices.

Vendor Selection Risk: Frequent use of premium shipping options for non-critical deliveries can erode profit margins.

Potential Customer Dissatisfaction: If customers are charged based on fluctuating freight costs, inconsistent pricing could negatively affect trust.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Extract year from 'Scheduled_Delivery_Date'
data['Scheduled_Delivery_Date'] = pd.to_datetime(data['Scheduled_Delivery_Date'], errors='coerce')
data['Delivery_Year'] = data['Scheduled_Delivery_Date'].dt.year

# Filter data to remove extreme outliers for clear visualization
upper_limit_weight = data['Weight_Kilograms'].quantile(0.95)
upper_limit_freight = data['Freight_Cost_USD'].quantile(0.95)

df_filtered = data[(data['Weight_Kilograms'] <= upper_limit_weight) &
                 (data['Freight_Cost_USD'] <= upper_limit_freight) &
                 (~data['Shipment_Mode'].isnull()) &
                 (~data['Delivery_Year'].isnull())]

# Create Facet Grid
g = sns.FacetGrid(df_filtered, col="Shipment_Mode", hue="Delivery_Year", col_wrap=3, height=5, palette="viridis")
g.map_dataframe(sns.scatterplot, x="Weight_Kilograms", y="Freight_Cost_USD", alpha=0.7)
g.add_legend()
g.set_axis_labels("Weight (Kilograms)", "Freight Cost (USD)")
g.fig.suptitle("Multivariate Analysis: Freight Cost vs Weight by Shipment Mode and Year", fontsize=16, y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A Facet Grid Scatter Plot is selected because:

It enables comparison across categories (Shipment Modes) in separate subplots.

It adds another variable (like Year of shipment) to track trends over time.

It is highly effective when exploring multi-dimensional patterns while retaining visual clarity.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Cost-Weight Consistency: Across most shipment modes, heavier shipments consistently lead to higher freight costs.

Year-on-Year Cost Variance: Freight costs for similar weight ranges can vary significantly between years, indicating price fluctuations or changing vendor contracts.

Mode-Specific Behavior:

Air shipments consistently have higher costs per weight.

Sea shipments can carry heavier loads at lower costs.

Volume Clustering: Most shipments are concentrated in low to mid-weight ranges, suggesting a common product size profile.
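
The mode and year comparison can also be summarized in a table rather than judged from scatter density alone. The sketch below computes the median freight cost per kilogram for each shipment mode and year on made-up rows; the values are illustrative, and using the median (rather than the mean) is my own choice to limit the effect of outliers.

```python
import pandas as pd

# Toy shipments (hypothetical); the Truck row has zero weight on purpose.
toy = pd.DataFrame({
    "Shipment_Mode": ["Air", "Air", "Air", "Sea", "Sea", "Truck"],
    "Delivery_Year": [2013, 2013, 2014, 2013, 2014, 2014],
    "Weight_Kilograms": [100.0, 200.0, 50.0, 1000.0, 2000.0, 0.0],
    "Freight_Cost_USD": [800.0, 1200.0, 500.0, 1500.0, 2000.0, 300.0],
})

# Exclude zero weights so the ratio is always defined.
valid = toy[toy["Weight_Kilograms"] > 0].copy()
valid["Cost_per_Kg"] = valid["Freight_Cost_USD"] / valid["Weight_Kilograms"]

# Rows: shipment mode, columns: year, cells: median cost per kilogram.
table = valid.pivot_table(
    index="Shipment_Mode",
    columns="Delivery_Year",
    values="Cost_per_Kg",
    aggfunc="median",
)
print(table)
```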

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Data-Driven Negotiation: Insights on year-wise cost fluctuations can help negotiate better contracts with shipping partners.

Mode Optimization: Understanding mode-specific cost structures can help optimize mode selection based on weight and urgency.

Budget Forecasting: Year-based freight cost trends can help in better logistics budgeting for future periods.

Factors that may contribute to Negative Impact:
Potential Overpayment: Year-on-year cost increases for similar weight shipments may indicate poor vendor management or lack of price monitoring.

Inconsistent Mode Selection: Using expensive shipping modes unnecessarily (like air for non-urgent deliveries) can erode profit margins.

Yearly Cost Escalation: If unmonitored, these rising costs can negatively impact operating margins over time.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

# Convert relevant columns to numeric
data['Weight_Kilograms'] = pd.to_numeric(data['Weight_Kilograms'], errors='coerce').fillna(0)

# Filter outliers for clear visualization
upper_limit_weight = data['Weight_Kilograms'].quantile(0.95)
df_filtered = data[data['Weight_Kilograms'] <= upper_limit_weight]

# Plotting the histogram
plt.figure(figsize=(12, 8))
sns.histplot(df_filtered['Weight_Kilograms'], bins=30, kde=True, color='skyblue')
plt.title('Histogram: Distribution of Shipment Weights', fontsize=16)
plt.xlabel('Shipment Weight (Kilograms)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram plot is ideal because:

It shows frequency distribution of a continuous variable.

It helps identify concentration ranges, spread, and skewness in the data.

It visually detects normal, uniform, or skewed distributions, which is essential for logistics planning.

The histogram is the most effective tool to quickly understand the volume distribution of shipment weights.

##### 2. What is/are the insight(s) found from the chart?

Right-Skewed Distribution:
Most shipments are lightweight (low-weight shipments dominate the dataset), with a long tail for higher weights.

Shipment Volume Clusters:
A large number of shipments fall within the low to mid-weight range (0-200 kg).

Low Frequency of Heavy Shipments:
There are relatively fewer heavyweight shipments, which may suggest that most items are smaller or medium-sized goods.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Better Freight Negotiation:
Knowing that most shipments are lightweight allows the company to negotiate better pricing tiers with logistics providers for low-weight shipments.

Optimized Shipment Mode Selection:
Since most shipments are light, less expensive shipment modes (like sea or truck) can be preferred unless urgent.

Inventory Management:
High volume of small shipments indicates potential to consolidate shipments to further reduce costs.


Factors that can contribute to negative impact:
Overuse of Premium Shipping for Small Shipments:
If lightweight shipments are frequently sent via expensive air freight, it could erode profit margins unnecessarily.

Missed Consolidation Opportunities:
A high frequency of small shipments may indicate lack of shipment bundling, increasing operational and shipping costs.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Clean column names (regex=False treats '(' and ')' as literal characters)
data.columns = data.columns.str.strip().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)

# Convert relevant columns to numeric (non-numeric entries become 0 and are filtered out below)
data['Weight_Kilograms'] = pd.to_numeric(data['Weight_Kilograms'], errors='coerce').fillna(0)
data['Freight_Cost_USD'] = pd.to_numeric(data['Freight_Cost_USD'], errors='coerce').fillna(0)

# Keep rows with positive weight and freight cost to avoid division by zero;
# .copy() avoids pandas' SettingWithCopyWarning when the new column is added
df_filtered = data[(data['Weight_Kilograms'] > 0) & (data['Freight_Cost_USD'] > 0)].copy()

# Create new column: Cost per Kilogram
df_filtered['Cost_per_Kg'] = df_filtered['Freight_Cost_USD'] / df_filtered['Weight_Kilograms']

# Remove outliers for visualization
upper_limit_cost = df_filtered['Cost_per_Kg'].quantile(0.95)
df_filtered = df_filtered[df_filtered['Cost_per_Kg'] <= upper_limit_cost]

# Plotting the new column
plt.figure(figsize=(12, 8))
sns.boxplot(x='Shipment_Mode', y='Cost_per_Kg', data=df_filtered)
plt.title('Cost Efficiency: Freight Cost per Kilogram by Shipment Mode', fontsize=16)
plt.xlabel('Shipment Mode')
plt.ylabel('Cost per Kilogram (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a box plot because:

It effectively shows distribution, medians, and outliers across categories.

It is perfect for comparing cost efficiency (Cost per Kg) across shipment modes.

The chart helps visually identify which shipment modes are consistently more or less cost-efficient.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Air shipments have the highest cost per kg, confirming their premium pricing structure.

Sea shipments consistently offer the lowest cost per kg, making them the most cost-efficient option for heavy shipments.

Truck shipments show moderate and variable cost per kg, indicating possible vendor differences or route-specific pricing.

There are some significant outliers in all shipment modes, suggesting occasional inefficient or urgent shipments.
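The box plot comparison can be backed by a small per-mode table of quartiles. This is a minimal sketch on made-up rows (the numbers and the mode labels are illustrative assumptions); it uses `groupby(...).describe()` to pull the median and quartiles of cost per kg for each shipment mode.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the cleaned names used in this notebook.
shipments = pd.DataFrame({
    "Shipment_Mode": ["Air", "Air", "Air", "Air", "Truck", "Truck", "Truck", "Sea", "Sea", "Sea"],
    "Freight_Cost_USD": [900.0, 1200.0, 1500.0, 8000.0, 400.0, 600.0, 500.0, 300.0, 450.0, 350.0],
    "Weight_Kilograms": [100.0, 120.0, 150.0, 100.0, 200.0, 250.0, 220.0, 1000.0, 1500.0, 1200.0],
})
shipments["Cost_per_Kg"] = shipments["Freight_Cost_USD"] / shipments["Weight_Kilograms"]

# Quartiles per mode; the interquartile range (IQR) measures how variable each mode is.
summary = shipments.groupby("Shipment_Mode")["Cost_per_Kg"].describe()[["count", "25%", "50%", "75%"]]
summary["IQR"] = summary["75%"] - summary["25%"]

print(summary.sort_values("50%", ascending=False))
```

The median column gives the mode ranking, and the IQR column shows which mode has the most variable pricing.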

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Improved Cost Planning:

The client can develop mode-specific cost benchmarks and proactively monitor shipping cost efficiency.

Better Mode Selection:

For low-weight, non-urgent shipments, more cost-efficient options like sea or truck should be prioritized.

Vendor Performance Monitoring:

Outliers can trigger vendor audits to understand why certain shipments are significantly more expensive.

Factors that can cause negative impact:
Cost Leakages in Air Shipments:

Frequent use of shipments with a high cost per kg may erode profitability, especially if the urgency isn’t justified.

Poor Vendor Contracts:

Inconsistent cost per kg across similar shipment modes could indicate unoptimized vendor pricing agreements.

Process Gaps:

Outliers suggest potential lack of oversight in shipment approvals, leading to occasional overpayment.
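One way to turn the outlier observation into an audit list is the usual box plot rule: flag any shipment whose cost per kg exceeds Q3 + 1.5 × IQR within its own shipment mode. The sketch below is an assumption about how such a check could look, on made-up values; `upper_fence` and `Needs_Review` are hypothetical names.

```python
import pandas as pd

# Made-up cost-per-kg values for illustration only.
df = pd.DataFrame({
    "Shipment_Mode": ["Air"] * 5 + ["Sea"] * 5,
    "Cost_per_Kg": [9.0, 10.0, 11.0, 10.5, 40.0, 0.30, 0.35, 0.28, 0.32, 2.0],
})

def upper_fence(values: pd.Series) -> float:
    """Return Q3 + 1.5 * IQR, the upper whisker limit of a standard box plot."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return q3 + 1.5 * (q3 - q1)

# transform() broadcasts each mode's fence back onto its rows.
df["Upper_Fence"] = df.groupby("Shipment_Mode")["Cost_per_Kg"].transform(upper_fence)
df["Needs_Review"] = df["Cost_per_Kg"] > df["Upper_Fence"]

print(df[df["Needs_Review"]])
```

Computing the fence per mode matters here: a cost per kg that is normal for air would be a large outlier for sea.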

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Clean column names (regex=False treats '(' and ')' as literal characters)
data.columns = data.columns.str.strip().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)

# Convert relevant columns to numeric (non-numeric entries become 0 and are filtered out below)
data['Weight_Kilograms'] = pd.to_numeric(data['Weight_Kilograms'], errors='coerce').fillna(0)
data['Line_Item_Value'] = pd.to_numeric(data['Line_Item_Value'], errors='coerce').fillna(0)

# Keep rows with positive weight and line item value to avoid division by zero;
# .copy() avoids pandas' SettingWithCopyWarning when the new column is added
df_filtered = data[(data['Weight_Kilograms'] > 0) & (data['Line_Item_Value'] > 0)].copy()

# Create new column: Value per Kilogram
df_filtered['Value_per_Kg'] = df_filtered['Line_Item_Value'] / df_filtered['Weight_Kilograms']

# Remove outliers for clear visualization
upper_limit_value = df_filtered['Value_per_Kg'].quantile(0.95)
df_filtered = df_filtered[df_filtered['Value_per_Kg'] <= upper_limit_value]

# Plotting the new column using boxplot
plt.figure(figsize=(12, 8))
sns.boxplot(x='Shipment_Mode', y='Value_per_Kg', data=df_filtered)
plt.title('Value Density: Line Item Value per Kilogram by Shipment Mode', fontsize=16)
plt.xlabel('Shipment Mode')
plt.ylabel('Value per Kilogram (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a box plot because:

It visually compares the distribution of value per kilogram across different shipment modes.

It highlights medians, variability, and outliers in value density for each shipping method.

It helps to identify risky shipping patterns like high-value goods being shipped in low-security modes.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Air shipments tend to carry the highest value per kg, indicating that urgent, high-value shipments are rightly prioritized.

Sea and truck shipments typically carry lower value per kg, which is expected as they are often used for bulk, lower-cost shipments.

Some outliers show that extremely high-value shipments are occasionally moved by truck or other less secure modes.

Value density varies significantly across all shipment modes, indicating inconsistent shipping policies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Risk Management:

High-value shipments can be flagged for security audits or insurance reviews.

Mode Optimization:

The client can prioritize air shipment for high-value, low-weight items and sea/truck for low-value bulk shipments.

Fraud & Loss Prevention:

Helps identify when low-security modes are used for high-value goods, increasing security focus on vulnerable shipments.

Factors that may contribute to negative impact:
Security Risk:

Occasional movement of high-value shipments through sea or truck may expose the company to higher theft or damage risk without adequate protection.

Insurance Misalignment:

If high-value shipments are not properly insured or not shipped via appropriate modes, it can lead to major financial losses.

Process Inefficiency:

Lack of a shipment value threshold policy can lead to inconsistencies and missed cost or security optimizations.
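A shipment value threshold policy of the kind mentioned above could be prototyped in a few lines. This is a sketch under stated assumptions: the threshold `HIGH_VALUE_PER_KG` and the set `LOWER_SECURITY_MODES` are hypothetical policy parameters that the business would need to set, and the rows are made up.

```python
import pandas as pd

# Hypothetical policy parameters (assumptions, not derived from the dataset).
HIGH_VALUE_PER_KG = 500.0                 # USD per kg above which a shipment counts as high value
LOWER_SECURITY_MODES = {"Sea", "Truck"}   # modes treated as lower security in this sketch

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Shipment_Mode": ["Air", "Truck", "Sea", "Truck"],
    "Line_Item_Value": [50000.0, 90000.0, 20000.0, 3000.0],
    "Weight_Kilograms": [80.0, 120.0, 2000.0, 150.0],
})
df["Value_per_Kg"] = df["Line_Item_Value"] / df["Weight_Kilograms"]

# Flag high value density shipments that travel by a lower-security mode.
df["Security_Review"] = (
    (df["Value_per_Kg"] > HIGH_VALUE_PER_KG)
    & df["Shipment_Mode"].isin(LOWER_SECURITY_MODES)
)

print(df[["Shipment_Mode", "Value_per_Kg", "Security_Review"]])
```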

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Clean column names (regex=False treats '(' and ')' as literal characters)
data.columns = data.columns.str.strip().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)

# Convert relevant columns to numeric (non-numeric entries are treated as 0)
data['Line_Item_Value'] = pd.to_numeric(data['Line_Item_Value'], errors='coerce').fillna(0)
data['Freight_Cost_USD'] = pd.to_numeric(data['Freight_Cost_USD'], errors='coerce').fillna(0)

# Create new column: Profit per Shipment
# (a proxy: line item value minus freight cost; no other costs are included)
data['Profit_per_Shipment'] = data['Line_Item_Value'] - data['Freight_Cost_USD']

# Filter out extreme outliers for better visualization
upper_limit_profit = data['Profit_per_Shipment'].quantile(0.95)
lower_limit_profit = data['Profit_per_Shipment'].quantile(0.05)
df_filtered = data[(data['Profit_per_Shipment'] <= upper_limit_profit) & (data['Profit_per_Shipment'] >= lower_limit_profit)]

# Plotting the new column using boxplot
plt.figure(figsize=(12, 8))
sns.boxplot(x='Shipment_Mode', y='Profit_per_Shipment', data=df_filtered)
plt.title('Profit per Shipment by Shipment Mode', fontsize=16)
plt.xlabel('Shipment Mode')
plt.ylabel('Profit per Shipment (USD)')
plt.show()


##### 1. Why did you pick the specific chart?

I selected a box plot because:

It effectively compares the distribution of profit per shipment (line item value minus freight cost) across different shipment modes.

It highlights medians, variability, and potential loss-making shipments.

It visually identifies which shipment modes offer consistent profitability and which may result in loss.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Air shipments have the highest profit variability and show multiple shipments with very low or even negative profit margins.

Sea shipments show more stable profit margins, indicating better cost efficiency.

Truck shipments show moderate but consistent profits.

Some negative profit outliers exist in all shipment modes, which means shipping costs exceeded the shipment value.
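To see how common negative-profit shipments are in each mode, the count and share can be tabulated directly. This is a minimal sketch on made-up rows, using the same profit proxy as the chart (line item value minus freight cost).

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Shipment_Mode": ["Air", "Air", "Air", "Sea", "Sea", "Truck"],
    "Line_Item_Value": [1000.0, 200.0, 5000.0, 8000.0, 100.0, 3000.0],
    "Freight_Cost_USD": [400.0, 900.0, 1200.0, 500.0, 300.0, 700.0],
})
df["Profit_per_Shipment"] = df["Line_Item_Value"] - df["Freight_Cost_USD"]
df["Is_Negative"] = df["Profit_per_Shipment"] < 0

# Sum of a boolean column counts the True values; its mean is the share of True values.
negative_share = df.groupby("Shipment_Mode")["Is_Negative"].agg(["sum", "mean"])
negative_share.columns = ["negative_count", "negative_share"]

print(negative_share)
```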

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Profitability Tracking:

This analysis enables the company to track and maximize profitability per shipment mode.

Process Optimization:

The company can review shipments with low or negative profit and refine shipping decisions.

Pricing Strategy:

This metric can help adjust selling prices or freight recovery charges for low-margin shipments.

Vendor Negotiation:

Helps in evaluating which vendors or routes consistently lead to low-profit shipments.

Factors that contribute to Negative impacts:
Negative Profit Shipments:

Repeated negative profit shipments suggest significant cost inefficiencies, poor pricing, or unoptimized shipping decisions.

Misaligned Shipment Mode Selection:

Choosing premium shipment modes (like air) for low-margin shipments can quickly erode overall profitability.

Hidden Process Gaps:

Without continuous monitoring, these profit leakages can go unnoticed and accumulate into substantial losses.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Clean column names
data.columns = data.columns.str.strip().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)

# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Extract year from 'Scheduled_Delivery_Date'
data['Scheduled_Delivery_Date'] = pd.to_datetime(data['Scheduled_Delivery_Date'], errors='coerce')
data['Delivery_Year'] = data['Scheduled_Delivery_Date'].dt.year

# Filter data to remove extreme outliers
upper_limit_weight = data['Weight_Kilograms'].quantile(0.95)
df_filtered = data[(data['Weight_Kilograms'] <= upper_limit_weight) &
                 (~data['Shipment_Mode'].isnull()) &
                 (~data['Delivery_Year'].isnull())]

# Plotting the multivariate violin plot
# (split=True is omitted: older seaborn versions accept it only when hue has exactly two levels)
sns.violinplot(x='Shipment_Mode', y='Weight_Kilograms', hue='Delivery_Year', data=df_filtered)
plt.title('Multivariate Analysis: Distribution of Shipment Weight by Mode and Year', fontsize=16)
plt.xlabel('Shipment Mode')
plt.ylabel('Weight (Kilograms)')
plt.legend(title='Delivery Year', bbox_to_anchor=(1.05, 1), loc=2)
plt.show()

##### 1. Why did you pick the specific chart?

The violin plot is ideal because:

It shows the shape, density, and spread of the distribution across categories, with long tails hinting at extreme values.

It combines the benefits of boxplots and kernel density estimation.

It is highly effective in visualizing multi-year, multi-mode shipment data to detect trends and anomalies.

It’s especially useful when we need to understand both the spread and concentration of values across multiple dimensions (like shipment mode and year).

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Air shipments generally show lower weight distributions with a few significant outliers, indicating occasional bulk shipments via air.

Sea shipments handle a wider range of weights, with a higher concentration of heavy shipments.

Truck and other shipment modes show mixed distributions, but tend to handle moderately weighted shipments.

Yearly Variability:

Some shipment modes (especially air) show fluctuations in shipment weight distributions across years, indicating changing logistics strategies or supply chain issues.
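The year-to-year fluctuations described above can be summarized in a pivot table of median weight by year and mode. This is a sketch on made-up rows; `median_weight` and `yoy_change` are illustrative names.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Shipment_Mode": ["Air", "Air", "Air", "Air", "Sea", "Sea", "Sea", "Sea"],
    "Delivery_Year": [2013, 2013, 2014, 2014, 2013, 2013, 2014, 2014],
    "Weight_Kilograms": [100.0, 140.0, 300.0, 500.0, 2000.0, 2400.0, 2100.0, 2500.0],
})

# Rows are years, columns are shipment modes, cells are median weights.
median_weight = df.pivot_table(
    index="Delivery_Year", columns="Shipment_Mode",
    values="Weight_Kilograms", aggfunc="median",
)

# Relative change versus the previous year (first year is NaN by construction).
yoy_change = median_weight.diff() / median_weight.shift()

print(median_weight)
print(yoy_change.round(3))
```

A table like this makes it easier to tell a real shift in shipping practice from noise in a single year's violin.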

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Better Shipment Mode Planning:

Knowing the weight distributions per mode helps logistics teams plan the most cost-effective and appropriate shipment strategies.

Yearly Performance Review:

Identifying changing patterns across years can help evaluate the effectiveness of previous shipping decisions and adjust procurement strategies.

Targeted Cost Reduction:

Insights from this plot can help minimize unnecessary use of expensive air freight for heavier shipments.

Factors that may cause negative impacts:
Sub-optimal Shipment Mode Use:

Using air freight for occasional heavy shipments could indicate poor planning, leading to higher costs and lower profit margins.

Process Inefficiency:

The wide variability across years without clear trend improvements suggests lack of consistent shipping policies or planning cycles.

Uncontrolled Year-on-Year Variability:

If not monitored, the changing patterns can result in logistical unpredictability and unstable shipping budgets.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Filter outliers for better visibility
upper_limit_weight = data['Weight_Kilograms'].quantile(0.95)
upper_limit_freight = data['Freight_Cost_USD'].quantile(0.95)
upper_limit_value = data['Line_Item_Value'].quantile(0.95)

df_filtered = data[(data['Weight_Kilograms'] <= upper_limit_weight) &
                 (data['Freight_Cost_USD'] <= upper_limit_freight) &
                 (data['Line_Item_Value'] <= upper_limit_value)]

# Plotting the Bubble Plot
plt.figure(figsize=(14, 8))
sns.scatterplot(
    x='Weight_Kilograms',
    y='Freight_Cost_USD',
    size='Line_Item_Value',
    hue='Shipment_Mode',
    data=df_filtered,
    alpha=0.6,
    sizes=(50, 1000)
)

plt.title('Multivariate Analysis: Freight Cost vs. Weight with Shipment Mode & Value', fontsize=16)
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Freight Cost (USD)')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()

##### 1. Why did you pick the specific chart?

A bubble plot is excellent for multivariate analysis because:

It simultaneously shows relationships between three or more variables.

Bubble size can represent monetary impact (Line_Item_Value).

Bubble color can add a categorical dimension (Shipment_Mode).

This plot provides a rich, multi-dimensional view of supply chain efficiency.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Clear Relationship: Heavier shipments tend to have higher freight costs.

Shipment Mode Trends:

Air shipments often have higher freight costs for lighter weights.

Sea shipments handle heavier weights with relatively lower costs.

Value Concentration: Some high-value shipments are lightweight but still incur high shipping costs—indicating premium shipping decisions.

Diverse Cost Behavior: Even shipments with similar weights have different costs depending on the shipment mode.
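The "similar weight, different cost" observation can be expressed as a per-mode cost slope. The sketch below fits a straight line of freight cost against weight for each mode with `numpy.polyfit`; the rows are made up and deliberately lie exactly on a line, so the slopes are only illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration: Air follows cost = 10 * kg + 50, Sea follows cost = 0.5 * kg + 300.
df = pd.DataFrame({
    "Shipment_Mode": ["Air"] * 4 + ["Sea"] * 4,
    "Weight_Kilograms": [100.0, 200.0, 300.0, 400.0, 1000.0, 2000.0, 3000.0, 4000.0],
    "Freight_Cost_USD": [1050.0, 2050.0, 3050.0, 4050.0, 800.0, 1300.0, 1800.0, 2300.0],
})

slopes = {}
for mode, grp in df.groupby("Shipment_Mode"):
    # Degree-1 fit returns (slope, intercept); the slope is the marginal cost per extra kg.
    slope, intercept = np.polyfit(
        grp["Weight_Kilograms"].to_numpy(), grp["Freight_Cost_USD"].to_numpy(), deg=1
    )
    slopes[mode] = slope

for mode, slope in slopes.items():
    print(f"{mode}: about {slope:.2f} USD per additional kg")
```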

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:
Shipping Mode Selection: Helps logistics teams choose cost-efficient shipping modes based on weight and item value.

Premium Shipment Control: Highlights cases where high-cost shipping might be unnecessarily applied to low-weight items, prompting policy revisions.

Vendor Negotiation: Supports better contract negotiation based on weight-cost trends across shipping modes.

Factors that contribute to negative impact:
Inefficient Mode Usage: Using expensive shipping modes (like air) for low-weight, low-value items may increase operational costs unnecessarily.

High Variability Risk: Different shipping modes and value levels can cause pricing inconsistency, potentially impacting customer satisfaction if costs are passed on.

Potential Over-reliance on Premium Freight: May limit profit margins if not balanced carefully.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Select numerical features for correlation (same columns that were converted above)
selected_columns = cols_to_convert

# Compute correlation matrix
correlation_matrix = data[selected_columns].corr()

# Plot correlation heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Supply Chain Variables', fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap is the best visual to:

Understand the strength and direction of relationships between multiple numerical variables simultaneously.

Quickly spot strong positive or negative correlations.

Identify variables that heavily influence each other in supply chain operations (like weight, quantity, cost, value).

It provides a holistic, data-driven view of the dataset for strategic decision-making.

##### 2. What is/are the insight(s) found from the chart?

Key Insights:
Strong Positive Correlation:

Line_Item_Quantity and Line_Item_Value (very high correlation): More items → higher value.

Weight_Kilograms and Freight_Cost_USD: Heavier shipments → higher freight cost.

Moderate Correlations:

Line_Item_Value and Freight_Cost_USD: Higher-value shipments tend to have higher freight costs, but the relationship is weaker than expected, which may point to pricing or shipping inconsistencies.

Unit_Price moderately influences Line_Item_Value (expected as unit price × quantity = total value).

Low or Negligible Correlations:

Unit_of_Measure_Per_Pack has very little correlation with other variables, indicating it does not significantly influence costs or weights directly.
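`DataFrame.corr()` computes Pearson correlation by default, which is sensitive to a few very large shipments. As a cross-check, a rank-based (Spearman) matrix can be computed alongside it. This is a minimal sketch on made-up rows, not a replacement for the heatmap above.

```python
import pandas as pd

# Made-up rows for illustration only; the last row is a deliberately extreme order.
df = pd.DataFrame({
    "Line_Item_Quantity": [10, 50, 100, 500, 1000, 20000],
    "Unit_Price": [2.0, 1.5, 3.0, 0.5, 1.0, 0.2],
})
df["Line_Item_Value"] = df["Line_Item_Quantity"] * df["Unit_Price"]

pearson = df.corr(method="pearson")     # linear association, influenced by extreme values
spearman = df.corr(method="spearman")   # association between ranks, robust to extreme values

print(pearson.round(2))
print(spearman.round(2))
```

If the two matrices disagree strongly for a pair of variables, the Pearson value is probably driven by a handful of large records.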

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Convert relevant columns to numeric
cols_to_convert = ['Unit_of_Measure_Per_Pack', 'Line_Item_Quantity', 'Line_Item_Value',
                   'Pack_Price', 'Unit_Price', 'Weight_Kilograms', 'Freight_Cost_USD',
                   'Line_Item_Insurance_USD']

for col in cols_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce').fillna(0)

# Select variables for the pair plot
selected_vars = ['Line_Item_Quantity', 'Line_Item_Value', 'Freight_Cost_USD', 'Weight_Kilograms']

# Filter out extreme outliers (95th percentile, one column at a time) on a copy,
# so that `data` itself is not permanently truncated
df_pair = data[selected_vars].copy()
for col in selected_vars:
    upper_limit = df_pair[col].quantile(0.95)
    df_pair = df_pair[df_pair[col] <= upper_limit]

# Create pair plot
sns.pairplot(df_pair, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of Key Supply Chain Variables', fontsize=16, y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot is a powerful visualization tool for:

Understanding correlations and relationships between multiple numerical variables at once.

Quickly identifying trends, patterns, and potential outliers.

Spotting clusters and multivariate outliers.

It helps in identifying how different supply chain factors like quantity, cost, weight, and freight charges interact with each other.

##### 2. What is/are the insight(s) found from the chart?

**Positive Correlation:**

There is a clear positive relationship between Line_Item_Quantity and Line_Item_Value (as expected: more quantity → higher value).

Weight_Kilograms is also positively related to both Line_Item_Quantity and Freight_Cost_USD, suggesting heavier shipments cost more to transport.

**Moderate Correlation:**

Freight_Cost_USD seems moderately associated with both Line_Item_Value and Weight_Kilograms, indicating that more expensive or heavier items tend to incur higher shipping costs.

**Cluster Patterns:**

Most data points are concentrated in the lower ranges of all variables, with a few higher-value and high-weight transactions forming outliers.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Based on the detailed exploratory data analysis and the insights derived from the univariate, bivariate, and multivariate plots, here are strategic recommendations to help the client achieve their business objectives efficiently:

✅ Business Recommendations for the Client


📌 1. Optimize Shipment Mode Selection
Current Issue: Air shipments are frequently used, even for moderately heavy items, leading to higher freight costs.

Recommendation:

Reserve air freight for time-sensitive, lightweight, high-value items only.

Shift heavier, less urgent shipments to sea or truck modes to significantly reduce shipping costs.

Implement a shipment mode selection framework based on weight, urgency, and item value.
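A shipment mode selection framework based on weight, urgency, and item value could start as a simple rule set. The sketch below is only one possible shape for it: the `Shipment` class, the `recommend_mode` function, and all three thresholds are hypothetical and would need to be calibrated against the client's own cost data.

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions for illustration, not derived from the dataset).
MAX_AIR_WEIGHT_KG = 200.0      # air is reserved for shipments at or below this weight
MIN_AIR_VALUE_PER_KG = 100.0   # ...and at or above this value density (USD per kg)
SEA_MIN_WEIGHT_KG = 1000.0     # non-urgent shipments at or above this weight go by sea

@dataclass
class Shipment:
    weight_kg: float
    value_usd: float
    urgent: bool

def recommend_mode(s: Shipment) -> str:
    """Return a suggested mode, or 'Manual review' when urgency conflicts with the air criteria."""
    value_per_kg = s.value_usd / s.weight_kg if s.weight_kg > 0 else 0.0
    if s.urgent:
        if s.weight_kg <= MAX_AIR_WEIGHT_KG and value_per_kg >= MIN_AIR_VALUE_PER_KG:
            return "Air"
        return "Manual review"   # urgent but heavy or low value: escalate instead of defaulting to air
    if s.weight_kg >= SEA_MIN_WEIGHT_KG:
        return "Sea"
    return "Truck"

print(recommend_mode(Shipment(weight_kg=50, value_usd=20000, urgent=True)))
print(recommend_mode(Shipment(weight_kg=5000, value_usd=20000, urgent=False)))
```

Routing urgent shipments that fail the air criteria to manual review, rather than silently sending them by air, is a design choice that keeps premium freight an explicit decision.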

📌 2. Implement Freight Cost Control Measures
Current Issue: Freight costs show large variability for similar shipment weights and item values, suggesting inefficiencies.

Recommendation:

Perform freight cost benchmarking to compare against industry standards.

Negotiate long-term contracts with logistics providers based on predictable shipping volumes and weight bands.

Monitor cost trends year-on-year to detect overcharging or poor vendor performance.
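Internal benchmarks by weight band are a natural first step before comparing against industry rates. The sketch below groups made-up shipments into hypothetical weight bands with `pd.cut` and reports the median cost per kg for each mode and band; the band edges are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Shipment_Mode": ["Air", "Air", "Air", "Sea", "Sea", "Sea"],
    "Weight_Kilograms": [50.0, 80.0, 600.0, 700.0, 3000.0, 5000.0],
    "Freight_Cost_USD": [600.0, 800.0, 4200.0, 700.0, 1500.0, 2000.0],
})

# Hypothetical weight bands; intervals are right-inclusive, e.g. (0, 100].
bands = [0, 100, 1000, float("inf")]
labels = ["0-100 kg", "100-1000 kg", ">1000 kg"]
df["Weight_Band"] = pd.cut(df["Weight_Kilograms"], bins=bands, labels=labels)
df["Cost_per_Kg"] = df["Freight_Cost_USD"] / df["Weight_Kilograms"]

# Median cost per kg for each (mode, band) combination that actually occurs.
benchmark = df.groupby(["Shipment_Mode", "Weight_Band"], observed=True)["Cost_per_Kg"].median()

print(benchmark)
```

Shipments priced well above the benchmark for their own mode and band are the ones worth raising in contract negotiations.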

📌 3. Strengthen Inventory and Order Planning
Current Issue: Yearly variations in shipment weights suggest last-minute, high-cost shipments due to inventory gaps.

Recommendation:

Improve demand forecasting and procurement scheduling to avoid emergency shipments.

Consolidate orders where possible to reduce shipment frequency and shipping costs.

Use predictive models to anticipate stock-outs and adjust replenishment cycles.

📌 4. Monitor High-Risk Shipment Patterns
Current Issue: Frequent outliers in air shipments indicate possible emergency orders or poor planning.

Recommendation:

Investigate why high-weight air shipments occur — are they customer-driven or planning errors?

Set internal policies or thresholds that flag shipments requiring escalation or review before approval.

📌 5. Leverage Data-Driven Decisions
Build predictive freight cost models using weight, item value, shipment mode, and delivery distance to:

Quote more accurate shipping charges.

Improve budgeting and financial forecasting.

Use visual dashboards to continuously monitor:

Shipment costs by mode

Annual cost trends

Vendor performance
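As a starting point for the predictive freight cost model suggested above, an ordinary least-squares fit can be written with NumPy alone. This sketch uses only weight and an air/non-air indicator as predictors (item value and distance are left out for brevity), and the rows are made up to follow cost = 200 + 0.5 × kg + 1000 × is_air exactly, so the fit simply recovers those numbers.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; they follow the formula stated above exactly.
df = pd.DataFrame({
    "Weight_Kilograms": [100.0, 200.0, 300.0, 1000.0, 2000.0, 3000.0],
    "Shipment_Mode": ["Air", "Air", "Air", "Sea", "Sea", "Sea"],
    "Freight_Cost_USD": [1250.0, 1300.0, 1350.0, 700.0, 1200.0, 1700.0],
})

# Design matrix: intercept, weight, and a 0/1 indicator for air shipments.
X = np.column_stack([
    np.ones(len(df)),
    df["Weight_Kilograms"].to_numpy(),
    (df["Shipment_Mode"] == "Air").to_numpy(dtype=float),
])
y = df["Freight_Cost_USD"].to_numpy()

# Least-squares solution of X @ coef = y.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicted freight cost for a hypothetical 500 kg air shipment.
predicted = float(coef @ np.array([1.0, 500.0, 1.0]))
print(f"intercept={coef[0]:.1f}, per_kg={coef[1]:.2f}, air_premium={coef[2]:.1f}")
print(f"predicted cost for 500 kg by air: {predicted:.0f} USD")
```

On real data the fit will not be exact, so the model should be evaluated on held-out shipments before it is used for quoting or budgeting.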

# **Conclusion**

Through comprehensive data analysis of the SCMS Delivery History Dataset, we identified significant patterns, cost behaviors, and shipment inefficiencies that directly impact the client's supply chain and logistics operations.

Key findings include:

A strong positive correlation between shipment weight and freight cost, as expected in logistics.

Air freight is frequently used for moderate to heavy shipments, indicating potential cost inefficiencies.

Year-on-year variability in shipment weights and costs, which suggests inconsistencies in procurement planning or shipping practices.

Shipment mode selection inconsistencies and the presence of outliers, pointing towards possible emergency shipments and process gaps.

The multivariate analysis highlighted that a more structured, data-driven approach to shipment mode selection, inventory planning, and freight cost control can significantly reduce operational expenses and improve efficiency.

Implementing these insights will help the client:

Optimize logistics costs

Improve procurement and inventory processes

Enhance vendor negotiations

Build predictive models for better budgeting and forecasting

However, ignoring these insights could lead to:

Increased operational costs

Profit margin erosion

Supply chain disruptions

Final Recommendation:
The client should invest in logistics process optimization, cost prediction models, and real-time shipment monitoring systems to enhance decision-making and secure long-term operational efficiency.
