<a href="https://colab.research.google.com/github/Aniket02102001/fedex-logistics-analysis/blob/main/Aniket_EDA_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - FedEx Logistics Performance Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Aniket Sahu



# **Project Summary -**

# **FedEx Logistics Performance Analysis - Project Summary**

This exploratory data analysis project delves into a comprehensive examination of global medical supply chain operations, focusing on the logistical intricacies of delivering critical healthcare products to developing nations. The study meticulously analyzes a rich historical dataset encompassing shipment details, delivery timelines, cost structures, and operational parameters over a multi-year period. The primary objective is to uncover patterns, identify inefficiencies, and derive actionable insights that can enhance supply chain reliability, optimize costs, and improve overall operational effectiveness in humanitarian health logistics.

The analysis reveals a complex logistics ecosystem where timely delivery of medical supplies is paramount for public health outcomes. Through systematic examination of delivery performance metrics, we observe varying levels of reliability across different geographical regions and transportation modes. Certain destinations consistently demonstrate better on-time delivery rates, while others experience more frequent delays, suggesting opportunities for targeted operational improvements. The relationship between shipment characteristics and delivery outcomes provides valuable clues about factors influencing supply chain efficiency.

Cost analysis uncovers significant variations in logistics expenses across different product categories and shipping methods. Medical products exhibit distinct cost profiles, with some items carrying higher transportation costs relative to their value, while others demonstrate more favorable economics. The distribution of costs across freight, insurance, and product value reveals opportunities for strategic optimization, particularly in weight management and route planning. These financial insights are crucial for developing cost-effective logistics strategies without compromising service quality.

Vendor performance emerges as a critical determinant of supply chain success. The analysis distinguishes between suppliers who consistently meet delivery expectations and those who struggle with reliability issues. This vendor differentiation enables more informed partnership decisions and highlights the importance of performance monitoring in supplier relationships. Similarly, manufacturing locations play a significant role in delivery outcomes, with certain production sites demonstrating better logistical integration than others.

Transportation mode selection proves to be a key strategic decision point, with clear trade-offs between speed and cost. Air shipments, while more expensive, generally ensure faster delivery—a crucial consideration for time-sensitive medical supplies. Alternative transportation methods, though more economical, introduce greater variability in delivery timelines. This balance between urgency and economy requires careful consideration in operational planning.

The temporal analysis of shipment patterns reveals seasonal fluctuations and evolving trends over time. These patterns inform capacity planning and resource allocation strategies, enabling proactive responses to anticipated demand variations. Additionally, the examination of shipping terms and contractual arrangements sheds light on how different logistical responsibilities affect overall performance and costs.

This comprehensive analysis culminates in a set of strategic recommendations aimed at enhancing supply chain resilience, reducing operational costs, and improving delivery reliability. By leveraging data-driven insights, logistics operators can make more informed decisions about vendor selection, transportation modes, route optimization, and operational processes. The findings emphasize the importance of continuous performance monitoring, adaptive strategies, and collaborative partnerships in building a more effective and responsive medical supply chain. The visualization-driven approach adopted throughout this study transforms raw operational data into compelling narratives that guide strategic decision-making, ultimately supporting both business objectives and humanitarian missions through optimized logistics performance.

Ultimately, this project demonstrates how analytical rigor applied to operational data can transform logistics management in the critical healthcare sector. The insights generated not only support business objectives but also contribute to broader humanitarian goals by ensuring reliable access to essential medical products in regions where they are most needed. The methodology establishes a framework for ongoing performance evaluation and continuous improvement in global health logistics operations.

# **GitHub Link -**

https://github.com/Aniket02102001/fedex-logistics-analysis

# **Problem Statement**


FedEx manages a complex global supply chain for medical supplies, but faces challenges in ensuring timely delivery, controlling logistics costs, and maintaining consistent vendor performance. There is a need to analyze historical delivery data to identify patterns of delay, cost overruns, and operational inefficiencies that impact service reliability and profitability.

#### **Define Your Business Objective?**

To analyze FedEx’s historical delivery performance and supply chain operations to:

*   Improve on-time delivery rates.
*   Reduce logistics and operational costs.
*   Identify reliable vendors and optimal shipment modes.
*   Enhance supply chain transparency and risk management.
*   Provide data-driven recommendations for process optimization.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure every chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
%matplotlib inline

### Dataset Loading

In [None]:
import pandas as pd
import requests
from io import StringIO

# Your Google Drive file ID
file_id = "1G3sg45qaljQ9w-bB5i4-n7QAM59cR4xN"

# Construct the direct download URL
url = f"https://drive.google.com/uc?id={file_id}&export=download"

# Download the file content
response = requests.get(url, timeout=60)  # Fail instead of hanging if the server stalls
response.raise_for_status()  # Check if the download was successful

# Load CSV data directly into a DataFrame
df = pd.read_csv(StringIO(response.text), encoding='utf-8')

print("Dataset loaded successfully!")


### Dataset First View

In [None]:
# Dataset First Look
print("Dataset Preview:")
print(df.head())


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")


### Dataset Information

In [None]:
# Dataset Info
df.info()  # info() prints its report itself and returns None, so no print() is needed




#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

In [None]:
# Visualizing the missing values
import matplotlib.pyplot as plt
df.isna().sum().plot(kind='bar')
plt.title('Missing Values')
plt.xlabel('Columns')
plt.ylabel('Count')

plt.show()
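
The bar chart shows raw counts only. A small table of counts and percentages can make it easier to judge how serious each gap is. The sketch below is one possible way to build it; the helper name `missing_summary` and the toy frame (including the `Dosage` column) are illustrative assumptions, and in the notebook the function would be called on `df`.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        'missing_count': counts,
        'missing_pct': (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps
    return summary[summary['missing_count'] > 0].sort_values(
        'missing_count', ascending=False)

# Toy frame standing in for df; the values are made up for illustration
toy = pd.DataFrame({
    'Shipment Mode': ['Air', None, 'Truck', 'Air'],
    'Dosage': [None, None, '10mg', None],
    'Country': ['Nigeria', 'Zambia', 'Haiti', 'Vietnam'],
})
summary = missing_summary(toy)
print(summary)
```

Columns without missing values are left out, so the table stays short even for a wide dataset.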

### What did you know about your dataset?

Based on my comprehensive analysis, here's what I know about this dataset:

1. Dataset Overview
    *   Source: Likely a global health supply chain program (possibly SCMS, the Supply Chain Management System)
    *   Rows: 10324 individual shipment records
    *   Columns: 33 features covering logistics, finance, products, and operations
    *   Primary Context: Humanitarian medical supply chain for HIV/AIDS and related diseases

2. Core Business Context
    *   What this dataset represents: FedEx's delivery history for a major global health initiative, likely funded by organizations like PEPFAR (President's Emergency Plan for AIDS Relief) or The Global Fund. The dataset tracks shipments of life-saving medications and diagnostics to developing countries.
    *   Key Stakeholders Identified:
        *   Client: Healthcare systems in developing countries
        *   Coordinator: PMO-US (Project Management Office - United States)
        *   Suppliers: Pharmaceutical companies and diagnostic manufacturers
        *   Logistics Provider: FedEx (implied from analysis context)
        *   End Beneficiaries: HIV/AIDS patients in developing nations

3. Product Categories (Medical Focus)
    *   Primary Product Groups:
        *   ARV (Antiretroviral) - HIV treatment medications
        *   HRDT (HIV Rapid Diagnostic Tests) - HIV testing kits
        *   ACT (Artemisinin-based Combination Therapy) - Malaria treatment
        *   MRDT (Malaria Rapid Diagnostic Tests)
        *   ANTM (Anti-Malarial medications)
    *   Critical Characteristics:
        *   Temperature-sensitive products: Many require "cool" storage (e.g., Kaletra oral solution)
        *   Regulated medications: Strict quality and handling requirements
        *   Life-critical: Delays directly impact patient health outcomes

4. Geographic Operations
    *   Coverage: 25+ countries across:
        *   Africa: Nigeria, Tanzania, Zambia, Ethiopia, South Africa, etc.
        *   Asia: Vietnam, India, China, etc.
        *   Americas: Haiti, Guyana, etc.
    *   Logistics Network:
        *   Primary Mode: Air freight (for speed and reliability)
        *   Secondary Mode: Truck (for regional distribution)
        *   Fulfillment: Direct Drop and From RDC
        *   Management: Centralized by PMO-US

5. Financial Structure
    *   Cost Components:
        *   Line Item Value: Product cost
        *   Freight Cost: Shipping charges
        *   Insurance Cost: Risk coverage



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

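1. BASIC COLUMNS
    *   `ID` = Order number
    *   `Country` = Destination country
    *   `Product Group` = Product type (ARV = antiretroviral HIV drugs, HRDT = HIV rapid diagnostic tests)
    *   `Weight (Kilograms)` = Shipment weight in kg
    *   `Shipment Mode` = Transport mode (e.g., air or truck)
    *   `Scheduled Delivery Date` = When the shipment should arrive
    *   `Delivered to Client Date` = When the shipment did arrive
    *   `Line Item Value` = Cost of the products
    *   `Freight Cost (USD)` = Shipping cost
    *   `Vendor` = Who sold it
    *   etc.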

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.


# Data Wrangling Code

# Convert date columns to datetime
date_cols = ['PQ First Sent to Client Date', 'PO Sent to Vendor Date',
             'Scheduled Delivery Date', 'Delivered to Client Date', 'Delivery Recorded Date']

for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Extract year and month for analysis
df['Delivery_Year'] = df['Delivered to Client Date'].dt.year
df['Delivery_Month'] = df['Delivered to Client Date'].dt.month

# Handle numeric columns with errors
numeric_cols = ['Weight (Kilograms)', 'Freight Cost (USD)', 'Line Item Insurance (USD)']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Fill missing numeric values with median
# (assign the result back; inplace=True on a selected column triggers warnings in newer pandas)
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())

# Create delivery delay column (in days)
df['Delivery_Delay'] = (df['Delivered to Client Date'] - df['Scheduled Delivery Date']).dt.days

# Create a total cost column
df['Total_Cost'] = df['Freight Cost (USD)'] + df['Line Item Insurance (USD)'] + df['Line Item Value']

print("Data Wrangling Completed.")
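
Because `pd.to_numeric(..., errors='coerce')` silently turns unparseable entries into `NaN`, it can be worth checking how many values were affected before the median fill replaces them. The sketch below shows one way to do that; the helper name `coercion_report` and the sample strings are hypothetical. In the notebook it would have to run on the raw columns, before the conversion step.

```python
import pandas as pd

def coercion_report(series: pd.Series) -> dict:
    """Count entries that are present but cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors='coerce')
    # Non-numeric = had a value originally but became NaN after coercion
    non_numeric = series.notna() & parsed.isna()
    return {
        'total': int(len(series)),
        'originally_missing': int(series.isna().sum()),
        'non_numeric': int(non_numeric.sum()),
        'examples': list(series[non_numeric].unique()[:3]),
    }

# Hypothetical raw values; the text entries are invented for illustration
raw_freight = pd.Series(['780.34', '4521.5', 'see other line', None,
                         '1653.78', 'included'])
report = coercion_report(raw_freight)
print(report)
```

If the `non_numeric` share is large, the median fill effectively invents a value for many rows, which should be kept in mind when reading the cost charts.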

### What all manipulations have you done and insights you found?

All manipulations:

*   Fixed Dates - Converted all date columns to datetime
*   Created New Date Info - Added delivery year and month, and calculated the delivery delay in days
*   Fixed Number Errors - Coerced the weight, freight cost, and insurance columns to numeric and filled missing values with the median
*   Added Total Cost - Combined freight + insurance + product value

Insights:

*   South Africa and Nigeria are the top destinations.
*   SCMS from RDC handles the highest number of shipments with the best reliability.
*   Generic medicines are cheaper than branded alternatives.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10,6))
sns.histplot(df['Delivery_Delay'].dropna(), bins=30, kde=True, color='blue')
plt.title('Distribution of Delivery Delays (Days)')
plt.xlabel('Delay (Days)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the spread and central tendency of delivery delays.



##### 2. What is/are the insight(s) found from the chart?

Most deliveries are on time or slightly delayed, but there are outliers with significant delays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying that most deliveries are on-time or only slightly delayed confirms operational efficiency. Understanding the delay distribution helps set realistic customer expectations and SLAs.

Negative Growth Risk: The presence of significant delay outliers indicates systemic issues in certain shipments. These delays could damage customer relationships, especially for time-sensitive medical supplies where delays directly impact patient health.
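
The histogram reading can be backed with a few numbers. The sketch below summarises a delay column as an on-time share, a median, and a tail percentile; the helper name `delay_summary` and the toy values are assumptions for illustration, and in the notebook the function would be applied to `df['Delivery_Delay']`.

```python
import pandas as pd

def delay_summary(delays: pd.Series) -> dict:
    """Summarise delivery delays in days (negative = early, positive = late)."""
    d = delays.dropna()
    return {
        'on_time_share': float((d <= 0).mean()),  # share delivered on or before schedule
        'median_days': float(d.median()),
        'p95_days': float(d.quantile(0.95)),      # how bad the slowest 5% are
        'max_days': float(d.max()),
    }

# Toy delays in days; values are made up
toy_delays = pd.Series([-3, 0, 0, 0, 0, 2, 0, -1, 45, None])
stats = delay_summary(toy_delays)
print(stats)
```

A median near zero combined with a large 95th percentile would match the pattern described above: mostly punctual deliveries with a small number of severe delays.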

#### Chart - 2

In [None]:
# Chart - 2 visualization code
top_countries = df['Country'].value_counts().head(10)
plt.figure(figsize=(12,6))
sns.barplot(x=top_countries.values, y=top_countries.index, palette='viridis')
plt.title('Top 10 Countries by Number of Shipments')
plt.xlabel('Number of Shipments')
plt.ylabel('Country')
plt.show()

##### 1. Why did you pick the specific chart?

To identify high-volume destinations for resource allocation.

##### 2. What is/are the insight(s) found from the chart?

South Africa, Nigeria, and Côte d'Ivoire are top recipients.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Knowing that South Africa, Nigeria, and Côte d'Ivoire are top destinations allows for targeted resource allocation, localized logistics partnerships, and customs clearance optimization in these regions.

Negative Growth Risk: Over-concentration in a few countries creates dependency risk. Geopolitical issues, natural disasters, or regulatory changes in these key countries could severely disrupt the entire supply chain.


#### Chart - 3

In [None]:
# Chart - 3 visualization code
mode_delay = df.groupby('Shipment Mode')['Delivery_Delay'].mean().sort_values()
plt.figure(figsize=(10,5))
sns.barplot(x=mode_delay.index, y=mode_delay.values, palette='coolwarm')
plt.title('Average Delivery Delay by Shipment Mode')
plt.xlabel('Shipment Mode')
plt.ylabel('Average Delay (Days)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To see which shipment modes are most reliable.

##### 2. What is/are the insight(s) found from the chart?

Air shipments have lower delays compared to Truck/N/A.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Clear evidence that air shipments have minimal delays (often on-time) supports premium pricing for urgent shipments and justifies higher costs for time-sensitive medical goods.

Negative Growth Risk: Truck shipments show higher average delays, which could lead to stockouts at healthcare facilities and potential loss of temperature-sensitive products (like ARV medications), damaging reputation.
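
A mean can be pulled up by a handful of extreme delays, and a mode with few shipments gives an unstable average. The sketch below puts the shipment count and the median next to the mean for each group; the helper name `delay_by_group` and the toy values are illustrative assumptions.

```python
import pandas as pd

def delay_by_group(frame: pd.DataFrame, group_col: str,
                   delay_col: str = 'Delivery_Delay') -> pd.DataFrame:
    """Shipment count, mean delay, and median delay per group."""
    return (frame.groupby(group_col)[delay_col]
                 .agg(shipments='count', mean_delay='mean', median_delay='median')
                 .sort_values('mean_delay'))

# Toy data; one extreme truck delay dominates that group's mean
toy = pd.DataFrame({
    'Shipment Mode': ['Air', 'Air', 'Air', 'Truck', 'Truck', 'Truck'],
    'Delivery_Delay': [0, -2, 2, 0, 0, 30],
})
table = delay_by_group(toy, 'Shipment Mode')
print(table)
```

In this toy example both modes have a median delay of zero, and the difference in means comes from a single late truck shipment. Checking the real data the same way would show whether the gap between modes is systematic or driven by outliers.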

#### Chart - 4

In [None]:
# Chart - 4 visualization code
top_vendors = df['Vendor'].value_counts().head(15)
plt.figure(figsize=(12,6))
sns.barplot(x=top_vendors.values, y=top_vendors.index, palette='magma')
plt.title('Top 15 Vendors by Shipment Volume')
plt.xlabel('Number of Shipments')
plt.ylabel('Vendor')
plt.show()

##### 1. Why did you pick the specific chart?

To identify key vendors in the supply chain.

##### 2. What is/are the insight(s) found from the chart?

SCMS from RDC and Orgenics Ltd are major suppliers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying SCMS from RDC and Orgenics as high-volume vendors enables strategic partnership development, volume discounts, and collaborative process improvements.

Negative Growth Risk: Heavy reliance on few vendors creates supply chain vulnerability. If SCMS from RDC faces production issues, it could cripple ARV supply to multiple countries simultaneously.



#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(8, 5))
df['Product Group'].value_counts().plot(kind='bar', color='steelblue')
plt.title('Product Group Distribution')
plt.ylabel('Number of Shipments')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the composition of shipped products.

##### 2. What is/are the insight(s) found from the chart?

ARV and HRDT (HIV tests) dominate shipments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: ARV and HIV tests comprising ~80% of shipments confirms alignment with global health priorities and validates business focus on high-demand medical products.

Negative Growth Risk: Limited malaria product shipments (ACT, MRDT, ANTM) represent missed opportunities in growing markets and leave the supply chain vulnerable to shifts in global health funding priorities.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10,6))
sns.scatterplot(x='Weight (Kilograms)', y='Freight Cost (USD)', data=df, alpha=0.6, color='green')
plt.title('Freight Cost vs Weight')
plt.xlabel('Weight (kg)')
plt.ylabel('Freight Cost (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

To check correlation between weight and freight cost.

##### 2. What is/are the insight(s) found from the chart?

Positive correlation but with high variance, indicating other cost factors.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Positive correlation between weight and cost provides clear data for pricing models, helping accurately quote customers and identify opportunities for shipment consolidation.

Negative Growth Risk: High variance in cost suggests inconsistent pricing or inefficiencies. Customers may perceive unfair pricing if similar-weight shipments have drastically different costs.
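
The strength of the relationship seen in the scatter plot can be quantified. The sketch below reports the Pearson correlation (linear) and the Spearman correlation (rank-based, computed here as the Pearson correlation of the ranks); the helper name `weight_cost_correlation` and the toy values are assumptions. Note that in the notebook both columns were median-filled during wrangling, which may pull the coefficients toward zero.

```python
import pandas as pd

def weight_cost_correlation(frame: pd.DataFrame,
                            x: str = 'Weight (Kilograms)',
                            y: str = 'Freight Cost (USD)') -> dict:
    """Pearson and Spearman correlation on rows where both values are present."""
    pair = frame[[x, y]].dropna()
    pearson = pair[x].corr(pair[y])
    # Spearman correlation is the Pearson correlation of the ranks
    spearman = pair[x].rank().corr(pair[y].rank())
    return {'n': len(pair), 'pearson': float(pearson), 'spearman': float(spearman)}

# Toy data: cost always rises with weight, but one shipment is far more expensive
toy = pd.DataFrame({
    'Weight (Kilograms)': [1, 2, 3, 4, 5, None],
    'Freight Cost (USD)': [10, 20, 30, 40, 1000, 50],
})
result = weight_cost_correlation(toy)
print(result)
```

A Spearman value clearly above the Pearson value would suggest a monotonic but non-linear relationship, or a few extreme shipments distorting the linear fit.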

#### Chart - 7

In [None]:
# Chart - 7 visualization code
avg_cost_country = df.groupby('Country')['Total_Cost'].mean().sort_values(ascending=False).head(15)
plt.figure(figsize=(12,6))
sns.barplot(x=avg_cost_country.values, y=avg_cost_country.index, palette='rocket')
plt.title('Top 15 Countries by Average Total Cost per Shipment')
plt.xlabel('Average Total Cost (USD)')
plt.ylabel('Country')
plt.show()

##### 1. Why did you pick the specific chart?

To identify high-cost destinations.

##### 2. What is/are the insight(s) found from the chart?

Malawi, Zambia, and Kenya have high average costs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying Malawi, Zambia, and Kenya as high-cost destinations allows for targeted cost-reduction initiatives like alternative routes, local partnerships, or customs process improvements.

Negative Growth Risk: High operational costs in key markets may make services uncompetitive against local logistics providers, potentially losing market share in these countries.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
monthly_shipments = df['Delivery_Month'].value_counts().sort_index()
plt.figure(figsize=(10,5))
sns.lineplot(x=monthly_shipments.index, y=monthly_shipments.values, marker='o', color='red')
plt.title('Monthly Shipment Volume Trends')
plt.xlabel('Month')
plt.ylabel('Number of Shipments')
plt.xticks(range(1,13))
plt.show()

##### 1. Why did you pick the specific chart?

To identify seasonal trends in shipments.

##### 2. What is/are the insight(s) found from the chart?

Peaks in August and March.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Recognizing peaks in August and March enables better capacity planning, staffing optimization, and inventory management to handle seasonal demand.

Negative Growth Risk: Underestimating peak season demand could lead to service failures, while overcapacity during low seasons creates unnecessary fixed costs and reduces profitability.
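
Pooling all years into twelve monthly totals cannot show whether a peak repeats every year or comes from one unusually busy year. A year-by-month count table is one way to check. In the sketch below the helper name `shipments_by_year_month` and the toy dates are hypothetical; in the notebook the input would be `df['Delivered to Client Date']`.

```python
import pandas as pd

def shipments_by_year_month(dates: pd.Series) -> pd.DataFrame:
    """Count shipments per (year, month); rows are years, columns are months."""
    d = pd.to_datetime(dates, errors='coerce').dropna()
    return (d.groupby([d.dt.year.rename('year'), d.dt.month.rename('month')])
             .size()
             .unstack(fill_value=0))

# Toy delivery dates; the unparseable entry is dropped
toy_dates = pd.Series(['2008-03-01', '2008-03-15', '2008-08-02',
                       '2009-03-09', '2009-08-20', '2009-08-21', 'bad'])
table = shipments_by_year_month(toy_dates)
print(table)
```

A peak that appears in the same column for most rows would support a seasonal interpretation; a peak confined to a single row would not.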

#### Chart - 9

In [None]:
# Chart - 9 visualization code
incoterm_ins = df.groupby('Vendor INCO Term')['Line Item Insurance (USD)'].mean().sort_values()
plt.figure(figsize=(10,5))
sns.barplot(x=incoterm_ins.index, y=incoterm_ins.values, palette='Blues_d')
plt.title('Average Insurance Cost by INCO Term')
plt.xlabel('INCO Term')
plt.ylabel('Average Insurance Cost (USD)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To see how shipping terms affect insurance costs.

##### 2. What is/are the insight(s) found from the chart?

N/A - From RDC and CIP terms have higher insurance costs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding that N/A - From RDC and CIP terms have higher insurance costs helps in quoting accurate all-inclusive prices and educating customers on trade-off decisions.

Negative Growth Risk: Customers may choose competitors offering lower apparent costs by selecting different INCO terms without understanding the full risk implications, leading to lost business.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
subclass = df['Sub Classification'].value_counts().head(10)
plt.figure(figsize=(12,6))
sns.barplot(x=subclass.values, y=subclass.index, palette='viridis')
plt.title('Top 10 Product Sub-Classification')
plt.xlabel('Count')
plt.ylabel('Sub-Classification')
plt.show()

##### 1. Why did you pick the specific chart?

To drill down into product types.

##### 2. What is/are the insight(s) found from the chart?

HIV tests, Adult ARVs, and Pediatric ARVs are top sub-categories.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying that Adult ARVs, Pediatric ARVs, and HIV tests are top sub-categories enables specialized handling protocols and temperature-controlled logistics for these critical products.

Negative Growth Risk: Pediatric ARVs require different handling and dosing than adult formulations. Mishandling could lead to medication errors, regulatory violations, and reputational damage.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(6, 4))
df['Managed By'].value_counts().plot(kind='bar', color='lightcoral')
plt.title('Shipments Managed By')
plt.ylabel('Number of Shipments')
plt.show()

##### 1. Why did you pick the specific chart?

To see who manages shipments.

##### 2. What is/are the insight(s) found from the chart?

Most shipments are managed by PMO - US.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Having most shipments managed by PMO - US ensures consistency, standardized processes, and centralized control, improving operational efficiency.

Negative Growth Risk: Single-point dependency on PMO-US creates organizational risk. If PMO-US experiences staffing issues or operational problems, there's no backup management system.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
fulfill_via = df['Fulfill Via'].value_counts()
plt.figure(figsize=(8,8))
plt.pie(fulfill_via.values, labels=fulfill_via.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('Set2'))
plt.title('Fulfillment Method Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

To see primary fulfillment methods.

##### 2. What is/are the insight(s) found from the chart?

All shipments are fulfilled either From RDC (52.3%) or via Direct Drop (47.7%).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Direct Drop and from RDC fulfillment simplifies the supply chain, reduces handling points, and potentially speeds up delivery times for end recipients.

Negative Growth Risk: With only two fulfillment channels, there are few fallback options if Direct Drop or From RDC fulfillment becomes unavailable or inefficient in certain regions due to local restrictions.



#### Chart - 13

In [None]:
# Chart - 13 visualization code
top_brands = df['Brand'].value_counts().head(10)
plt.figure(figsize=(12,6))
sns.barplot(x=top_brands.values, y=top_brands.index, palette='coolwarm')
plt.title('Top 10 Brands by Shipment Volume')
plt.xlabel('Number of Shipments')
plt.ylabel('Brand')
plt.show()

##### 1. Why did you pick the specific chart?

To identify most shipped brands.

##### 2. What is/are the insight(s) found from the chart?

Generic brands dominate, followed by Determine and Uni-Gold.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: High volume of generic products indicates cost-effectiveness for health programs, potentially leading to more contracts with donor-funded organizations.

Negative Growth Risk: Generic manufacturers may have varying quality standards, increasing quality control costs and regulatory compliance risks compared to branded products.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Chart - 14 visualization code
numeric_df = df[['Weight (Kilograms)', 'Freight Cost (USD)', 'Line Item Insurance (USD)',
                 'Line Item Value', 'Unit Price', 'Delivery_Delay', 'Total_Cost']].corr()
plt.figure(figsize=(10,8))
sns.heatmap(numeric_df, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

To understand relationships between numerical variables.

##### 2. What is/are the insight(s) found from the chart?

Line item value correlates with total cost.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Strong weight-cost correlation validates current pricing models, while identifying weak correlations helps avoid incorrect assumptions in business decisions.

Negative Growth Risk: Weak correlation between delivery delay and other factors suggests delays are caused by unmeasured variables (like customs delays, political instability), making them harder to predict and control.
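
One caveat when reading the heatmap: `Total_Cost` was built as the sum of freight, insurance, and line item value, so its correlation with `Line Item Value` is partly true by construction. The sketch below illustrates the effect with simulated, independent values; the value ranges and the random seed are arbitrary assumptions and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 1000

# Two independent simulated components; the ranges are arbitrary
sim = pd.DataFrame({
    'Line Item Value': rng.uniform(0, 100_000, n),
    'Freight Cost (USD)': rng.uniform(0, 1_000, n),
})
# The total is defined as the sum of its parts, as in the wrangling step
sim['Total_Cost'] = sim['Line Item Value'] + sim['Freight Cost (USD)']

corr_total_value = sim['Total_Cost'].corr(sim['Line Item Value'])
corr_value_freight = sim['Line Item Value'].corr(sim['Freight Cost (USD)'])
print(f"Total vs value:  {corr_total_value:.3f}")
print(f"Value vs freight: {corr_value_freight:.3f}")
```

Even though the two components are unrelated here, the total correlates almost perfectly with the larger one. Correlations between independently measured columns, such as weight and freight cost, are therefore the more informative cells in the heatmap.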

#### Chart - 15 - Pair Plot

In [None]:
# Chart - 15 visualization code
# No hue is set, so a palette argument would have no effect and is omitted
sns.pairplot(df[['Weight (Kilograms)', 'Freight Cost (USD)', 'Line Item Value', 'Delivery_Delay']].dropna(),
             diag_kind='kde')
plt.suptitle('Pair Plot of Key Numeric Features', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

To visualize distributions and relationships between key metrics.

##### 2. What is/are the insight(s) found from the chart?

Freight cost increases with weight but with high variability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Visualizing multiple relationships simultaneously helps identify complex patterns that might be missed in individual analyses, supporting more nuanced operational decisions.

Negative Growth Risk: The wide scatter in freight cost vs weight suggests inconsistent carrier pricing or hidden fees, potentially eroding profit margins if not properly managed.



#### Chart - 16

In [None]:
# Chart - 16 visualization code
plt.figure(figsize=(14, 6))

# Monthly on-time delivery rate
df['On_Time'] = df['Delivery_Delay'] <= 0
monthly_performance = df.groupby(df['Delivered to Client Date'].dt.to_period('M'))['On_Time'].mean()

# Plot
monthly_performance.plot(kind='line', marker='o', color='teal', linewidth=2)
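# Derive the year range from the data instead of hard-coding it in the title
year_min, year_max = int(df['Delivery_Year'].min()), int(df['Delivery_Year'].max())
plt.title(f'Monthly On-Time Delivery Rate Trend ({year_min}-{year_max})', fontsize=14, fontweight='bold')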
plt.xlabel('Month-Year')
plt.ylabel('On-Time Delivery Rate')
plt.ylim(0, 1)
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To track delivery performance trends over time and identify seasonal patterns or improvements.

##### 2. What is/are the insight(s) found from the chart?

Shows if on-time delivery rates are improving or deteriorating over the years. Sharp drops may indicate systemic issues.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Showing improving or stable on-time delivery rates demonstrates operational excellence and reliability to clients. Recognizing seasonal patterns (like mid-year peaks) enables proactive resource planning and staffing optimization.

Negative Growth Risk: Any declining trend in on-time delivery would signal deteriorating service quality, potentially leading to contract losses with health organizations that require strict delivery SLAs for life-saving medications.
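
A monthly rate based on only a few shipments can swing sharply without reflecting a real change in performance. The sketch below reports the shipment count alongside each monthly rate and drops sparse months; the helper name `monthly_on_time`, the `min_shipments` threshold, and the toy values are assumptions for illustration.

```python
import pandas as pd

def monthly_on_time(frame: pd.DataFrame,
                    date_col: str = 'Delivered to Client Date',
                    delay_col: str = 'Delivery_Delay',
                    min_shipments: int = 2) -> pd.DataFrame:
    """Monthly on-time rate with shipment counts; sparse months are dropped."""
    # Rows without a date or a delay cannot be classified, so exclude them
    valid = frame.dropna(subset=[date_col, delay_col])
    month = valid[date_col].dt.to_period('M')
    on_time = (valid[delay_col] <= 0).rename('on_time')
    table = on_time.groupby(month).agg(rate='mean', shipments='count')
    return table[table['shipments'] >= min_shipments]

# Toy data: January has three shipments, February and March one usable row each
toy = pd.DataFrame({
    'Delivered to Client Date': pd.to_datetime(
        ['2008-01-05', '2008-01-20', '2008-01-25',
         '2008-02-10', '2008-03-01', '2008-03-02']),
    'Delivery_Delay': [0, 3, -1, 0, None, 5],
})
result = monthly_on_time(toy)
print(result)
```

Unlike `df['Delivery_Delay'] <= 0` applied to the full column, this version does not count rows with a missing delay as late.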

#### Chart - 17

In [None]:
# Chart - 17 visualization code
# Calculate average costs by product group
cost_breakdown = df.groupby('Product Group')[['Freight Cost (USD)',
                                               'Line Item Insurance (USD)',
                                               'Line Item Value']].mean()

# Plot stacked bar
ax = cost_breakdown.plot(kind='bar', stacked=True, figsize=(12, 7),
                        colormap='tab20c', edgecolor='black')
plt.title('Average Cost Breakdown by Product Group', fontsize=14, fontweight='bold')
plt.xlabel('Product Group')
plt.ylabel('Average Cost (USD)')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Cost Components', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To understand which cost components dominate for different product categories.

##### 2. What is/are the insight(s) found from the chart?

ARV products have the highest average line item value, while freight costs are a significant component for all product groups.
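To see which component dominates within each group, the averages can be converted to percentage shares. This is a minimal sketch: the numbers and the `OTHER` group are placeholders, and `cost_breakdown` only mirrors the shape of the DataFrame built for Chart 17.

```python
import pandas as pd

# Hypothetical average costs per product group in USD (illustrative values only);
# the notebook's `cost_breakdown` DataFrame has the same columns.
cost_breakdown = pd.DataFrame(
    {
        "Freight Cost (USD)": [1000.0, 500.0],
        "Line Item Insurance (USD)": [200.0, 100.0],
        "Line Item Value": [8800.0, 1400.0],
    },
    index=["ARV", "OTHER"],
)

# Divide each row by its row total so the components sum to 100% per group.
cost_share = cost_breakdown.div(cost_breakdown.sum(axis=1), axis=0) * 100

print(cost_share.round(1))
```

Plotting `cost_share` as a stacked bar would put every group on the same 0-100 scale, which makes the freight share comparable across groups with very different totals.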

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact: Clear visualization that ARV products have the highest line item value helps focus efforts on securing reliable ARV suppliers and optimizing their supply chain specifically. Understanding freight costs as a consistent component across all products enables targeted freight negotiations.

Negative Growth Risk: The high proportion of line item value in ARVs means that supplier price increases directly and significantly impact overall costs. Dependency on expensive branded ARVs rather than generics could make the service uncompetitive.



#### Chart - 18

In [None]:
# Chart - 18 visualization code
# Calculate vendor metrics
vendor_stats = df.groupby('Vendor').agg({
    'Delivery_Delay': 'mean',
    'Total_Cost': 'mean',
    'ID': 'count'
}).rename(columns={'ID': 'Shipment_Count', 'Delivery_Delay': 'Avg_Delay', 'Total_Cost': 'Avg_Cost'})

# Filter top vendors by shipment volume
top_vendors = vendor_stats.nlargest(20, 'Shipment_Count')

# Bubble chart
plt.figure(figsize=(14, 8))
scatter = plt.scatter(x=top_vendors['Avg_Delay'],
                     y=top_vendors['Avg_Cost'],
                     s=top_vendors['Shipment_Count']/10,  # Bubble size based on volume
                     c=top_vendors['Shipment_Count'],
                     cmap='viridis',
                     alpha=0.7,
                     edgecolors='black',
                     linewidth=0.5)

# Add vendor labels for significant points
for vendor in top_vendors.index:
    row = top_vendors.loc[vendor]
    if row['Shipment_Count'] > 100:  # Label only major vendors
        plt.annotate(vendor.split()[0],  # First word of vendor name
                    xy=(row['Avg_Delay'], row['Avg_Cost']),
                    xytext=(5, 5),
                    textcoords='offset points',
                    fontsize=9,
                    alpha=0.8)

plt.colorbar(scatter, label='Shipment Volume')
plt.title('Vendor Performance: Cost vs Delay (Bubble Size = Volume)', fontsize=14, fontweight='bold')
plt.xlabel('Average Delivery Delay (Days)')
plt.ylabel('Average Total Cost (USD)')
plt.axvline(x=0, color='red', linestyle='--', alpha=0.5, label='On-Time Line')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the trade-off between cost and reliability across vendors.

##### 2. What is/are the insight(s) found from the chart?

The chart separates vendors that offer good value (low average cost, at or left of the on-time line) from those that are expensive, unreliable, or both.
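The same split can be made explicit by assigning each vendor to a quadrant. The sketch below is illustrative only: the vendor names and values are invented, and using the median cost as the "low cost" cut-off is an assumption rather than a rule from the analysis.

```python
import pandas as pd

# Hypothetical per-vendor averages (illustrative values only); in the notebook
# these columns come from `top_vendors` (Avg_Delay in days, Avg_Cost in USD).
vendors = pd.DataFrame(
    {
        "Avg_Delay": [-3.0, 4.0, -1.0, 6.0],
        "Avg_Cost": [900.0, 800.0, 2500.0, 3000.0],
    },
    index=["Vendor A", "Vendor B", "Vendor C", "Vendor D"],
)

# Assumed cut-offs: on time means delay <= 0; low cost means at or below the median.
cost_cutoff = vendors["Avg_Cost"].median()
on_time = vendors["Avg_Delay"] <= 0
low_cost = vendors["Avg_Cost"] <= cost_cutoff

# Map each (on_time, low_cost) combination to a quadrant label.
labels = {
    (True, True): "good value",
    (True, False): "reliable but expensive",
    (False, True): "cheap but late",
    (False, False): "poor value",
}
vendors["Quadrant"] = [labels[(bool(t), bool(c))] for t, c in zip(on_time, low_cost)]

print(vendors)
```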

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact: Identifying vendors in the "sweet spot" (low cost, low delay) enables strategic partnerships that improve both profitability and reliability. Where large bubbles sit at or left of the on-time line, high shipment volume does not necessarily compromise performance.

Negative Growth Risk: Vendors with high delays despite high costs represent poor value and could damage customer satisfaction if not addressed. Over-reliance on a few high-performing vendors creates supply chain vulnerability.

#### Chart - 19

In [None]:
# Chart - 19 visualization code
# Create pivot table for mode efficiency
mode_efficiency = df.pivot_table(values=['Delivery_Delay', 'Total_Cost'],
                                index='Shipment Mode',
                                aggfunc={'Delivery_Delay': 'mean', 'Total_Cost': 'mean'})

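# Min-max normalize each metric to [0, 1] so delay (days) and cost (USD) share one color scale.
# Note: a metric with identical values for every mode would produce NaN (division by zero).
mode_efficiency_norm = (mode_efficiency - mode_efficiency.min()) / (mode_efficiency.max() - mode_efficiency.min())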

# Heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(mode_efficiency_norm.T,
            annot=mode_efficiency.T.round(2),
            fmt='.2f',
            cmap='YlOrRd',
            cbar_kws={'label': 'Normalized Score (0=Best, 1=Worst)'},
            linewidths=1,
            linecolor='gray')

plt.title('Shipment Mode Efficiency Matrix\n(Lower values are better)', fontsize=14, fontweight='bold')
plt.xlabel('Shipment Mode')
plt.ylabel('Performance Metric')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To compare different shipment modes across multiple performance dimensions.

##### 2. What is/are the insight(s) found from the chart?

Air charter shipments have lower average delays but higher costs, while truck shipments have lower costs but higher average delays.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact: Clear evidence that air shipments minimize delays justifies their premium cost for time-sensitive medical supplies. The matrix provides a decision framework for choosing optimal shipment modes based on client priorities (speed vs cost).

Negative Growth Risk: The high cost of air freight may make the service uncompetitive for budget-constrained health programs, potentially losing contracts to cheaper but slower alternatives. Truck shipments' higher delays could lead to stockouts in healthcare facilities.
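One way to turn the speed-versus-cost matrix into a decision aid is a weighted score over the normalized metrics. This is a sketch under assumptions: the mode averages are invented, and `rank_modes` and its `speed_weight` parameter are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# Hypothetical mode averages (illustrative values only); the notebook's
# `mode_efficiency` pivot table has the same two columns.
mode_efficiency = pd.DataFrame(
    {
        "Delivery_Delay": [-5.0, 0.0, 5.0],
        "Total_Cost": [3000.0, 2000.0, 1000.0],
    },
    index=["Air Charter", "Air", "Truck"],
)

# Min-max normalize each column so 0 is the best (lowest) and 1 the worst.
# A zero spread is replaced by 1 to avoid dividing by zero.
spread = mode_efficiency.max() - mode_efficiency.min()
normalized = (mode_efficiency - mode_efficiency.min()) / spread.replace(0, 1)


def rank_modes(speed_weight):
    """Blend normalized delay and cost; a lower combined score is better."""
    score = (
        speed_weight * normalized["Delivery_Delay"]
        + (1 - speed_weight) * normalized["Total_Cost"]
    )
    return score.sort_values()


# A speed-focused client versus a cost-focused client.
print(rank_modes(0.8))
print(rank_modes(0.2))
```

With these made-up numbers, a high `speed_weight` ranks the fastest mode first and a low one ranks the cheapest mode first; the weights would need to be agreed with each client.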

#### Chart - 20

In [None]:
# Chart - 20 visualization code
# Get top manufacturing sites
top_sites = df['Manufacturing Site'].value_counts().head(15).index

# Filter data
site_data = df[df['Manufacturing Site'].isin(top_sites)]

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Subplot 1: Shipment volume by site
site_counts = site_data['Manufacturing Site'].value_counts()
axes[0].barh(site_counts.index, site_counts.values, color='skyblue', edgecolor='black')
axes[0].set_title('Top 15 Manufacturing Sites by Shipment Volume', fontweight='bold')
axes[0].set_xlabel('Number of Shipments')
axes[0].invert_yaxis()  # Highest on top

# Subplot 2: Average delay by site
site_delays = site_data.groupby('Manufacturing Site')['Delivery_Delay'].mean().sort_values()
axes[1].barh(site_delays.index, site_delays.values,
             color=np.where(site_delays.values <= 0, 'lightgreen', 'salmon'),
             edgecolor='black')
axes[1].set_title('Average Delivery Delay by Manufacturing Site', fontweight='bold')
axes[1].set_xlabel('Average Delay (Days)')
axes[1].axvline(x=0, color='red', linestyle='--', alpha=0.7)
axes[1].invert_yaxis()

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To identify key manufacturing locations and their performance impact.

##### 2. What is/are the insight(s) found from the chart?

Certain sites (like Aurobindo Unit III, India) have high volume and good on-time performance.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact: Identifying that Indian manufacturing sites (like Aurobindo Unit III) provide both high volume and good performance validates the sourcing strategy. Geographic concentration in India enables economies of scale and streamlined quality control.

Negative Growth Risk: Over-concentration in Indian manufacturing creates significant geopolitical and supply chain risk. Any disruption in India (monsoons, political issues, port strikes) could paralyze the entire supply chain. Limited manufacturing in Africa increases lead times and costs for African destinations.
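The degree of geographic concentration can be quantified as each country's share of shipments. The sketch below assumes that site names end in ", Country" (as in "Aurobindo Unit III, India"); the placeholder site names and all counts are invented for illustration.

```python
import pandas as pd

# Hypothetical shipment records (made-up counts); the two "Placeholder" sites
# are invented names used only to illustrate the parsing step.
sites = pd.Series(
    ["Aurobindo Unit III, India"] * 5
    + ["Placeholder Site B, India"] * 3
    + ["Placeholder Site C, Germany"] * 2,
    name="Manufacturing Site",
)

# Assumption: the country is the text after the last comma in the site name.
country = sites.str.rsplit(",", n=1).str[-1].str.strip()

# Share of all shipments sourced from each country, largest first.
country_share = country.value_counts(normalize=True)

print(country_share)
```

A single country holding a large share of `country_share` is the kind of concentration the risk above refers to; real site names may need cleaning before this parsing works reliably.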

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

Based on the analysis, I recommend the following:

**Improve On-Time Delivery:**

- Prioritize air shipments for time-sensitive goods.
- Monitor vendors with consistent delays and enforce SLAs.

**Cost Optimization:**

- Consolidate shipments to reduce freight costs.
- Renegotiate terms with high-cost vendors.
- Optimize packaging to reduce weight.

**Vendor Management:**

- Develop a vendor scorecard based on delivery performance, cost, and reliability.
- Diversify supplier base to mitigate risks.

**Route & Mode Optimization:**

- Analyze high-cost routes for alternatives.
- Use multimodal transport where feasible.

**Data-Driven Forecasting:**

- Use historical trends to forecast demand and plan inventory.
- Implement real-time tracking for proactive delay management.
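The vendor scorecard recommended above could start from a simple per-vendor aggregation. This is a minimal sketch, not a finished scorecard: `df_demo` holds invented rows that reuse the notebook's column names, and the 80% on-time SLA is an assumed target.

```python
import pandas as pd

# Hypothetical shipment-level rows (illustrative values only) using the
# notebook's column names.
df_demo = pd.DataFrame(
    {
        "Vendor": ["V1", "V1", "V1", "V2", "V2", "V3"],
        "Delivery_Delay": [-2, 0, 3, 5, 7, -1],
        "Total_Cost": [1000.0, 1200.0, 1100.0, 900.0, 950.0, 2000.0],
    }
)

# One row per vendor: volume, on-time rate, average delay, and average cost.
scorecard = (
    df_demo.assign(On_Time=df_demo["Delivery_Delay"] <= 0)
    .groupby("Vendor")
    .agg(
        Shipments=("Delivery_Delay", "size"),
        On_Time_Rate=("On_Time", "mean"),
        Avg_Delay=("Delivery_Delay", "mean"),
        Avg_Cost=("Total_Cost", "mean"),
    )
)

# Assumed SLA: at least 80% of a vendor's shipments arrive on time.
SLA_ON_TIME_RATE = 0.80
scorecard["Meets_SLA"] = scorecard["On_Time_Rate"] >= SLA_ON_TIME_RATE

print(scorecard)
```

Vendors with very few shipments (such as `V3` here) can pass or fail the SLA on a single delivery, so a minimum-volume filter is worth considering before acting on the flag.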

# **Conclusion**

The FedEx Logistics Performance Analysis reveals critical insights into delivery performance, cost drivers, and supply chain efficiencies. Key findings include the dominance of air freight for timeliness, high operational costs in certain regions, and the reliability of key vendors like SCMS from RDC. By implementing the recommended strategies, FedEx can enhance delivery reliability, reduce costs, and strengthen its humanitarian supply chain operations, ultimately supporting better health outcomes in developing nations.
