# **Project Name**    -  FedEx Logistics Performance Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

1. Delivery Timeliness and Performance
A central theme was the analysis of delivery timelines. By creating a new feature representing the difference between the Scheduled Delivery Date and the Delivered to Client Date, we can measure on-time delivery performance. Visualizations such as bar charts and histograms reveal the distribution of delivery delays and lead times. This analysis will identify which Shipment Mode (e.g., Air vs. Truck) or Country has the most significant delays. These findings can point to specific logistical bottlenecks that need to be addressed to improve delivery timelines and meet customer expectations more consistently.

2. Cost-Effectiveness and Vendor Performance
To evaluate cost-effectiveness, the analysis delved into the relationship between Line Item Value, Pack Price, and Freight Cost (USD). By calculating cost per unit and cost per shipment, we can compare the financial performance of different Vendor INCO Terms and Manufacturing Sites. This could reveal if certain vendors consistently incur higher freight costs or if specific Incoterms lead to more economical shipments. For instance, a scatter plot of Line Item Value versus Freight Cost could help identify a correlation or reveal outliers that warrant further investigation, providing a clear path for cost reduction.

3. Identifying Bottlenecks and Operational Inefficiencies
The EDA also looked for patterns in shipment data that might indicate systemic issues. By aggregating data by Product Group and Country, we can create heatmaps to visualize high-traffic areas or regions with frequent delays. For example, a bar chart could compare the average delay time across different Manufacturing Site locations, flagging underperforming suppliers. The analysis of PQ First Sent to Client Date and PO Sent to Vendor Date can also shed light on internal processing delays. These insights are crucial for streamlining workflows, optimizing vendor relationships, and ultimately reducing operational friction.

Conclusion and Recommendations
The EDA on FedEx Logistics data provides a robust overview of the supply chain, moving beyond a simple spreadsheet view to a dynamic understanding of operational performance. The insights gained from this analysis, ranging from on-time delivery metrics to cost-per-shipment breakdowns, provide a clear roadmap for strategic decision-making. By leveraging these findings, FedEx Logistics can:

* **Prioritize Vendor Management:** Engage with underperforming vendors to improve delivery timelines and reduce costs.
* **Optimize Shipment Routes:** Reroute specific product shipments or use a different Shipment Mode to mitigate delays in bottleneck regions.
* **Enhance Cost Control:** Renegotiate freight agreements or vendor terms to align with the data-driven understanding of cost-effectiveness.


# **GitHub Link -**

https://github.com/Kedarlimbalkar/FedEx-Logistics-Performance-Analysis

# **Problem Statement**


FedEx Logistics operates a vast and intricate global supply chain, where inefficiencies can lead to costly delays and diminished customer satisfaction. In a competitive market defined by the rise of e-commerce and global distribution, optimizing these operations is crucial for maintaining a competitive edge.

The core business problem is to leverage a comprehensive dataset of logistics information—including purchase orders, shipment methods, vendor terms, and delivery schedules—to identify and address key operational challenges. We need to analyze this data to pinpoint bottlenecks, evaluate the cost-effectiveness of different shipment methods, and assess delivery timelines. The objective is to move beyond reactive problem-solving by providing a data-driven framework to streamline the supply chain, thereby reducing freight costs, improving delivery speed, and enhancing the overall customer experience. This project will deliver actionable insights that allow FedEx to proactively optimize its logistics and solidify its market leadership.

#### **Define Your Business Objective?**

Optimize logistics to reduce costs, improve delivery times, and enhance customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('SCMS_Delivery_History_Dataset.csv')

### Dataset First View

In [None]:
# Dataset First Look
# Display the first 5 rows of the DataFrame
print("First 5 rows:")
print(df.head())

# Display basic information about the DataFrame, including data types and non-null values
print("\nDataFrame Info:")
print(df.info())

# Display descriptive statistics for numerical columns
print("\nDescriptive statistics for numerical columns:")
print(df.describe())

# Display descriptive statistics for categorical columns
print("\nDescriptive statistics for categorical columns:")
print(df.describe(include=['object']))

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
# df was already loaded in the Dataset Loading cell above, so it is reused here

# Get the number of rows and columns
num_rows, num_cols = df.shape

# Print the results
print(f"The dataset has {num_rows} rows and {num_cols} columns.")

### Dataset Information

In [None]:
# Dataset Info
print(df.info())

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

The dataset contains 10,324 records and 33 features, providing a comprehensive view of FedEx's global logistics operations. A key finding is the presence of inconsistent and missing data, which will require significant cleaning. For instance, columns such as Weight (Kilograms) and Freight Cost (USD) are incorrectly stored as text (object type) and contain non-numerical values. Similarly, all date-related columns are strings and need to be converted to a proper datetime format. Several columns, including Shipment Mode and Dosage, also have missing values that must be handled. This initial assessment shows that a crucial first step in the project will be data preparation to ensure the integrity of any subsequent analysis.
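To see which text entries keep a column such as Weight (Kilograms) or Freight Cost (USD) from being numeric, a small helper can list them. This is a minimal sketch: the function name `non_numeric_values` is hypothetical, and the sample values are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def non_numeric_values(series: pd.Series) -> pd.Series:
    """Count the entries of a text column that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    # Keep entries that were present originally but failed to parse
    bad = series[parsed.isna() & series.notna()]
    return bad.value_counts()


# Toy data for illustration only (not taken from the real dataset)
sample = pd.Series(
    ["13", "358", "Weight Captured Separately", None, "Weight Captured Separately"]
)
print(non_numeric_values(sample))
```

In this notebook it could be called as `non_numeric_values(df['Weight (Kilograms)'])` before the columns are converted, to decide how each placeholder string should be treated.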

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

ID: A unique identifier for each record or line item in the dataset.

Project Code: An identifier for the specific project associated with the shipment.

PQ #: The number for the Price Quote (PQ), a preliminary quotation issued before an order is placed.

PO / SO #: The number for the Purchase Order or Sales Order, a commercial document confirming an order.

ASN/DN #: The number for the Advanced Shipping Notice or Delivery Note.

Country: The country where the products were delivered.

Managed By: The organization or entity managing the shipment.

Fulfill Via: Specifies the method of fulfillment, such as "Direct Drop" or "From RDC" (Regional Distribution Center).

Vendor INCO Term: The Incoterm (International Commercial Term) for the vendor agreement, which defines the responsibilities of the buyer and seller.

Shipment Mode: The method of transport used for the shipment (e.g., Air, Truck).

PQ First Sent to Client Date: The date the initial price quote (PQ) was sent to the client.

PO Sent to Vendor Date: The date the purchase order was sent to the vendor.

Scheduled Delivery Date: The planned date for the delivery to the client.

Delivered to Client Date: The actual date the shipment was delivered to the client.

Delivery Recorded Date: The date the delivery was officially recorded in the system.

Product Group: A high-level category for the product, such as 'ARV' or 'HRDT'.

Sub Classification: A more specific classification of the product (e.g., 'Adult', 'Pediatric', 'HIV test').

Vendor: The name of the company that supplied the goods.

Item Description: A detailed description of the product in the shipment.

Molecule/Test Type: The specific molecule or test type of the product.

Brand: The brand name of the product.

Dosage: The dosage strength of the medication (e.g., '150mg').

Dosage Form: The physical form of the medication (e.g., 'Tablet', 'Oral suspension').

Unit of Measure (Per Pack): The number of units contained in each pack.

Line Item Quantity: The total number of units shipped for that line item.

Line Item Value: The total monetary value of the line item.

Pack Price: The price per pack of the product.

Unit Price: The price per individual unit.

Manufacturing Site: The location where the product was manufactured.

First Line Designation: A categorical variable indicating if the product is a first-line treatment.

Weight (Kilograms): The total weight of the shipment in kilograms. This column contains both numerical and text values, such as "Weight Captured Separately."

Freight Cost (USD): The cost of shipping the products in US dollars. This also contains text values like "Freight Included in Commodity Cost."

Line Item Insurance (USD): The insurance cost for the specific line item in US dollars.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"No. of unique values in {col} is {df[col].nunique()}.")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Load the raw dataset (pandas was imported as pd in the Import Libraries cell)
dataset = pd.read_csv('SCMS_Delivery_History_Dataset.csv')

# Work on a copy so the raw data stays untouched
df = dataset.copy()

# Checking Shape of "Yes" and "No" values in 'First Line Designation'
print("No. of items with First Line Designation 'Yes':", len(df[df['First Line Designation'] == 'Yes']))
print("No. of items with First Line Designation 'No':", len(df[df['First Line Designation'] == 'No']))

# Assigning 'Yes' designation data to variable df_yes
df_yes = df[df['First Line Designation'] == 'Yes'].copy()

# Assigning 'No' designation data to variable df_no
df_no = df[df['First Line Designation'] == 'No'].copy()

print("\nShape of the original DataFrame:", df.shape)
print("Shape of the 'Yes' DataFrame:", df_yes.shape)
print("Shape of the 'No' DataFrame:", df_no.shape)

In [None]:
# Load the dataset
df = pd.read_csv('SCMS_Delivery_History_Dataset.csv')

print("--- Initial Dataset Info ---")
print(df.info())
print("\n")

# --- DATA WRANGLING PROCESS ---

# 1. Convert Date Columns to datetime objects
# The 'Date Not Captured' string will be converted to NaT (Not a Time)
date_cols = ['PQ First Sent to Client Date', 'PO Sent to Vendor Date',
             'Scheduled Delivery Date', 'Delivered to Client Date',
             'Delivery Recorded Date']

for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

print("--- After Date Conversion ---")
print(df[date_cols].info())
print("\n")


# 2. Clean and convert 'Weight (Kilograms)' and 'Freight Cost (USD)' to numeric
# errors='coerce' turns every non-numeric entry (e.g. 'Weight Captured Separately',
# 'Freight Included in Commodity Cost') into NaN, so no manual string replacement is needed
df['Weight (Kilograms)'] = pd.to_numeric(df['Weight (Kilograms)'], errors='coerce')
df['Freight Cost (USD)'] = pd.to_numeric(df['Freight Cost (USD)'], errors='coerce')

print("--- After Cleaning Numerical Columns ---")
print(df[['Weight (Kilograms)', 'Freight Cost (USD)']].info())
print("\n")


# 3. Handle missing values
# For 'Shipment Mode', we'll fill missing values with the mode (most frequent value)
df['Shipment Mode'] = df['Shipment Mode'].fillna(df['Shipment Mode'].mode()[0])

# For 'Dosage', we'll fill missing values with 'Unknown'
df['Dosage'] = df['Dosage'].fillna('Unknown')

# For 'Line Item Insurance (USD)', we'll fill with 0, assuming no cost if not recorded
df['Line Item Insurance (USD)'] = df['Line Item Insurance (USD)'].fillna(0)

# For 'Line Item Value', fill with 0 to make it consistent
df['Line Item Value'] = df['Line Item Value'].fillna(0)

# For 'Brand', we will fill with 'Unknown'
df['Brand'] = df['Brand'].fillna('Unknown')

print("--- After Handling Missing Values ---")
print(df.info())
print("\n")


# 4. Feature Engineering: Create a 'Delivery Delay (Days)' column
# Calculate the difference between Delivered and Scheduled dates
df['Delivery Delay (Days)'] = (df['Delivered to Client Date'] - df['Scheduled Delivery Date']).dt.days

print("--- After Feature Engineering ---")
print("New column 'Delivery Delay (Days)' has been created.")
print(df[['Scheduled Delivery Date', 'Delivered to Client Date', 'Delivery Delay (Days)']].head())
print("\n")


# 5. Check for and handle duplicates
num_duplicates = df.duplicated().sum()
if num_duplicates > 0:
    print(f"Found {num_duplicates} duplicate rows. Dropping them.")
    df.drop_duplicates(inplace=True)
else:
    print("No duplicate rows found.")

print("--- Final Wrangled Dataset Info ---")
print(df.info())
print("\nFinal shape of the dataset:", df.shape)

### What all manipulations have you done and insights you found?

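The manipulations revealed that the raw dataset had significant data quality issues. Date columns and key numerical columns (Weight (Kilograms), Freight Cost (USD)) were stored as strings mixed with placeholder text, making direct analysis impossible. The wrangling converted these columns to datetime and numeric types, filled missing values in Shipment Mode, Dosage, Brand, Line Item Insurance (USD), and Line Item Value, created the Delivery Delay (Days) feature, and checked for duplicate rows. The cleaned data now provides a more reliable foundation to analyze key performance indicators such as delivery timeliness and cost-effectiveness.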
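Building on the cleaned columns, two further KPI features could be derived for the cost and timeliness questions. The sketch below is only one possible approach: the function name `add_kpi_columns` and the two new column names are assumptions, not part of the original notebook, and it expects the date and numeric conversions above to have been applied already.

```python
import numpy as np
import pandas as pd


def add_kpi_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two illustrative KPI columns added."""
    out = df.copy()
    # Freight cost per kilogram; zero weights become NaN to avoid division by zero
    weight = out["Weight (Kilograms)"].replace(0, np.nan)
    out["Freight Cost per Kg (USD)"] = out["Freight Cost (USD)"] / weight
    # Days from sending the PO to the vendor until delivery to the client
    out["PO to Delivery (Days)"] = (
        out["Delivered to Client Date"] - out["PO Sent to Vendor Date"]
    ).dt.days
    return out
```

Returning a copy keeps the wrangled `df` unchanged, so the charts that follow are not affected unless the result is assigned back explicitly.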

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# 1. Analysis: Overall Delivery Delay Distribution
# Goal: Understand the overall delivery performance.
plt.figure(figsize=(10, 6))
sns.histplot(df['Delivery Delay (Days)'].dropna(), bins=50, kde=True)
plt.title('Distribution of Delivery Delay (Days)')
plt.xlabel('Delivery Delay (Days) (Positive = Late, Negative = Early)')
plt.ylabel('Frequency')
plt.savefig('overall_delivery_delay_distribution.png')
plt.show()
print("Analysis 1: Histogram of Delivery Delay (Days) created.")
print("This chart shows the frequency of shipments being early, on time, or late.")


##### 1. Why did you pick the specific chart?

**1. Overall Delivery Delay Distribution**
 A histogram is ideal for visualizing the frequency and distribution of a continuous variable like delivery delay.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that most shipments are delivered on time or slightly early, as the distribution is centered near zero days.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This insight confirms overall reliability, highlighting that improvement efforts can be targeted at the long tail of late deliveries.
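The reading of the histogram can be backed by numbers by computing the share of early, on-time, and late shipments. This is a minimal sketch; the helper name `delay_status_share` is hypothetical, and "on time" is assumed to mean a delay of exactly zero days.

```python
import numpy as np
import pandas as pd


def delay_status_share(delay_days: pd.Series) -> pd.Series:
    """Share of shipments delivered early, on time, or late (missing delays ignored)."""
    delays = delay_days.dropna()
    # Negative delay = early, zero = on time, anything else = late
    status = pd.Series(
        np.select([delays < 0, delays == 0], ["Early", "On time"], default="Late"),
        index=delays.index,
    )
    return status.value_counts(normalize=True)
```

In this notebook it could be called as `delay_status_share(df['Delivery Delay (Days)'])`; a tolerance window (for example, one or two days) might be a more realistic definition of "on time" depending on the service agreement.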

#### Chart - 2

In [None]:
# Chart - 2 visualization code

# 2. Analysis: Delivery Performance by Shipment Mode
# Goal: Compare the timeliness of different shipment methods.
avg_delay_by_mode = df.groupby('Shipment Mode')['Delivery Delay (Days)'].mean().sort_values()
plt.figure(figsize=(10, 6))
avg_delay_by_mode.plot(kind='bar')
plt.title('Average Delivery Delay by Shipment Mode')
plt.xlabel('Shipment Mode')
plt.ylabel('Average Delay (Days)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('avg_delivery_delay_by_mode.png')
plt.show()
print("Analysis 2: Bar chart of average Delivery Delay by Shipment Mode created.")
print("This chart compares the average delivery delay for Air, Ocean, and other shipment modes.")



##### 1. Why did you pick the specific chart?

**2. Delivery Performance by Shipment Mode**
 A bar chart is perfect for comparing a key metric (average delay) across distinct categories (shipment modes).

##### 2. What is/are the insight(s) found from the chart?

Ocean shipments have a much higher average delay than Air or Truck shipments, making them less reliable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This helps in logistics optimization by setting realistic expectations or re-evaluating the use of Ocean for time-sensitive deliveries.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# 3. Analysis: Top 10 Countries by Delivery Volume (Quantity)
# Goal: Identify the most active delivery destinations.
top_countries_by_quantity = df.groupby('Country')['Line Item Quantity'].sum().nlargest(10)
plt.figure(figsize=(12, 7))
top_countries_by_quantity.sort_values().plot(kind='barh', color='skyblue')
plt.title('Top 10 Countries by Total Line Item Quantity')
plt.xlabel('Total Line Item Quantity')
plt.ylabel('Country')
plt.tight_layout()
plt.savefig('top_10_countries_by_quantity.png')
plt.show()
print("Analysis 3: Horizontal bar chart of Top 10 Countries by Quantity created.")
print("This chart identifies the countries with the highest total shipment volume.")


##### 1. Why did you pick the specific chart?

**3. Top 10 Countries by Delivery Volume**
 A horizontal bar chart effectively ranks countries by a metric while allowing clear, untruncated labels.

##### 2. What is/are the insight(s) found from the chart?

South Africa and Uganda are the largest markets, receiving the highest volume of line item quantities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. The business can focus resources and develop targeted strategies for these key, high-volume regions.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# 4. Analysis: Top 10 Countries with Highest Average Freight Cost
# Goal: Pinpoint high-cost delivery destinations.
top_countries_by_freight = df.groupby('Country')['Freight Cost (USD)'].mean().nlargest(10)
plt.figure(figsize=(12, 7))
top_countries_by_freight.sort_values().plot(kind='barh', color='lightcoral')
plt.title('Top 10 Countries by Average Freight Cost (USD)')
plt.xlabel('Average Freight Cost (USD)')
plt.ylabel('Country')
plt.tight_layout()
plt.savefig('top_10_countries_by_avg_freight_cost.png')
plt.show()
print("Analysis 4: Horizontal bar chart of Top 10 Countries by Average Freight Cost created.")
print("This chart shows which countries have the highest average freight costs per shipment.")


##### 1. Why did you pick the specific chart?

**4. Top 10 Countries with Highest Average Freight Cost**
A horizontal bar chart is great for ranking and comparing countries based on a numerical metric like average freight cost.

##### 2. What is/are the insight(s) found from the chart?

Haiti and the Dominican Republic have the highest average freight costs per shipment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This identifies high-cost routes that can be analyzed for optimization or for adjusting pricing strategies.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# 5. Analysis: Relationship between Line Item Value and Freight Cost
# Goal: See if more valuable shipments incur higher freight costs.
plt.figure(figsize=(12, 7))
sns.scatterplot(x='Line Item Value', y='Freight Cost (USD)', data=df, alpha=0.5)
plt.title('Relationship between Line Item Value and Freight Cost (USD)')
plt.xlabel('Line Item Value (USD)')
plt.ylabel('Freight Cost (USD)')
plt.xscale('log')  # Use a log scale to better visualize the wide range of values
plt.yscale('log')
plt.tight_layout()
plt.savefig('line_item_value_vs_freight_cost.png')
plt.show()
print("Analysis 5: Scatter plot of Line Item Value vs. Freight Cost created.")
print("This chart visualizes the correlation between the value of goods and their shipping cost.")


##### 1. Why did you pick the specific chart?

**5. Relationship between Line Item Value and Freight Cost**
A scatter plot is used to show the relationship between two continuous variables, in this case, value and cost.

##### 2. What is/are the insight(s) found from the chart?

There is a positive correlation between Line Item Value and Freight Cost, but with high variance, implying other factors are also involved.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This insight can help validate pricing models and identify opportunities to reduce freight costs for valuable items.
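Because the scatter plot uses logarithmic axes, the strength of the relationship can be summarized with a correlation computed on the log-transformed values. The sketch below assumes that rows with missing or non-positive values should be left out (they cannot be shown on a log scale either); the function name `log_log_correlation` is hypothetical.

```python
import numpy as np
import pandas as pd


def log_log_correlation(df: pd.DataFrame, x_col: str, y_col: str) -> float:
    """Pearson correlation of log10(x) and log10(y), using rows where both are positive."""
    pair = df[[x_col, y_col]].dropna()
    # Non-positive values have no logarithm, so they are excluded
    pair = pair[(pair[x_col] > 0) & (pair[y_col] > 0)]
    return float(np.log10(pair[x_col]).corr(np.log10(pair[y_col])))
```

In this notebook it could be called as `log_log_correlation(df, 'Line Item Value', 'Freight Cost (USD)')`, which gives a single number to report alongside the visual impression of the chart.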

#### Chart - 6

In [None]:

# Chart - 6 visualization code

# 6. Analysis: Distribution of Products by Product Group
# Goal: Understand the product portfolio and distribution.
product_group_counts = df['Product Group'].value_counts()
plt.figure(figsize=(10, 8))
plt.pie(product_group_counts, labels=product_group_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of Shipments by Product Group')
plt.tight_layout()
plt.savefig('distribution_of_products_by_group.png')
plt.show()
print("Analysis 6: Pie chart of Product Group distribution created.")
print("This chart shows the proportion of each product group in the total shipments.")


##### 1. Why did you pick the specific chart?

**6. Distribution of Products by Product Group**
A pie chart is the best visualization for showing the proportion of each product group relative to the total portfolio.

##### 2. What is/are the insight(s) found from the chart?

The ARV product group dominates the portfolio, representing the vast majority of all shipments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This highlights business focus, allowing for specialized optimization of the logistics for this dominant product category.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# 7. Analysis: Top 10 Vendors by Line Item Value
# Goal: Identify the most valuable vendor relationships.
top_vendors_by_value = df.groupby('Vendor')['Line Item Value'].sum().nlargest(10)
plt.figure(figsize=(12, 7))
top_vendors_by_value.sort_values().plot(kind='barh', color='darkgreen')
plt.title('Top 10 Vendors by Total Line Item Value')
plt.xlabel('Total Line Item Value (USD)')
plt.ylabel('Vendor')
plt.tight_layout()
plt.savefig('top_10_vendors_by_value.png')
plt.show()
print("Analysis 7: Horizontal bar chart of Top 10 Vendors by Value created.")
print("This chart highlights the vendors that contribute the most value.")


##### 1. Why did you pick the specific chart?

**7. Top 10 Vendors by Line Item Value**
 A horizontal bar chart effectively ranks and compares vendors based on their total contribution to business value.

##### 2. What is/are the insight(s) found from the chart?

SCMS from RDC and Aurobindo Pharma are the key vendors, contributing the most to the total line item value.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This identifies high-value vendor relationships that are critical to manage and foster for business growth.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

# 8. Analysis: Delivery Performance by Vendor
# Goal: Evaluate vendor reliability in meeting deadlines.
top_vendors_by_delay = df.groupby('Vendor')['Delivery Delay (Days)'].mean().nlargest(10)
plt.figure(figsize=(12, 7))
top_vendors_by_delay.sort_values().plot(kind='barh', color='salmon')
plt.title('Top 10 Vendors with Highest Average Delivery Delay')
plt.xlabel('Average Delivery Delay (Days)')
plt.ylabel('Vendor')
plt.tight_layout()
plt.savefig('top_10_vendors_by_delay.png')
plt.show()
print("Analysis 8: Horizontal bar chart of Top 10 Vendors by Delay created.")
print("This chart pinpoints vendors that are potential bottlenecks.")


##### 1. Why did you pick the specific chart?

**8. Delivery Performance by Vendor**
A horizontal bar chart is ideal for ranking vendors by their average delivery delay.

##### 2. What is/are the insight(s) found from the chart?

Gilead Sciences and Medchem International are among the vendors with the highest average delivery delays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This identifies underperforming vendors, providing a clear starting point for performance improvement and risk mitigation.
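An average delay can be dominated by a single late shipment when a vendor has only a few records, so it may help to rank only vendors with a minimum number of shipments. This is a sketch under that assumption: the function name `rank_vendors_by_delay` and the default threshold of 30 shipments are illustrative choices, not values derived from the data.

```python
import pandas as pd


def rank_vendors_by_delay(
    df: pd.DataFrame, min_shipments: int = 30, top_n: int = 10
) -> pd.DataFrame:
    """Vendors with the highest mean delay among those with enough recorded shipments."""
    stats = df.groupby("Vendor")["Delivery Delay (Days)"].agg(["count", "mean"])
    # 'count' only includes shipments with a known delay
    eligible = stats[stats["count"] >= min_shipments]
    return eligible.nlargest(top_n, "mean")
```

In this notebook it could be called as `rank_vendors_by_delay(df)`; showing the shipment count next to the mean makes it easier to judge how much weight each vendor's ranking deserves.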

#### Chart - 9

In [None]:
# Chart - 9 visualization code

# 9. Analysis: Delivery Timelines over the Years
# Goal: Analyze if delivery performance has improved or worsened over time.
# Rows without a delivery date get NaN here and are excluded by groupby automatically
df['Year'] = df['Delivered to Client Date'].dt.year
avg_delay_by_year = df.groupby('Year')['Delivery Delay (Days)'].mean()
plt.figure(figsize=(12, 7))
avg_delay_by_year.plot(kind='line', marker='o', color='purple')
plt.title('Average Delivery Delay Over the Years')
plt.xlabel('Year')
plt.ylabel('Average Delivery Delay (Days)')
plt.grid(True)
plt.tight_layout()
plt.savefig('avg_delivery_delay_over_years.png')
plt.show()
print("Analysis 9: Line plot of Average Delivery Delay over the Years created.")
print("This chart shows long-term trends in delivery timeliness.")



##### 1. Why did you pick the specific chart?

**9. Delivery Timelines over the Years**
A line plot is the most effective chart for showing trends of a metric over a continuous period like years.

##### 2. What is/are the insight(s) found from the chart?

Delivery performance has fluctuated over the years without a clear, consistent trend of improvement.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Negative. This suggests a need for a sustained, strategic focus on improving delivery timelines, as performance isn't self-correcting.

#### Chart - 10

In [None]:
# Chart - 10 visualization code

# 10. Analysis: Relationship between Weight and Freight Cost
# Goal: Understand if freight cost is directly proportional to weight.
plt.figure(figsize=(12, 7))
sns.scatterplot(x='Weight (Kilograms)', y='Freight Cost (USD)', data=df, alpha=0.5)
plt.title('Relationship between Weight and Freight Cost')
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Freight Cost (USD)')
plt.xscale('log') # Log scale for better visualization of spread
plt.yscale('log')
plt.tight_layout()
plt.savefig('weight_vs_freight_cost.png')
plt.show()
print("Analysis 10: Scatter plot of Weight vs. Freight Cost created.")
print("This chart explores the correlation between a shipment's weight and its freight cost.")


##### 1. Why did you pick the specific chart?

**10. Relationship between Weight and Freight Cost**
A scatter plot is the right choice for visualizing the correlation between two continuous variables like weight and cost.

##### 2. What is/are the insight(s) found from the chart?

There is a clear positive correlation between Weight (Kilograms) and Freight Cost, but the variance highlights opportunities for cost optimization.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This confirms the direct impact of weight on cost, validating pricing models and indicating areas for negotiation.

#### Chart - 11

In [None]:
# Chart - 11 visualization code

# 11. Analysis: Breakdown of Shipments by Vendor INCO Term
# Goal: Understand the distribution of vendor agreements.
inco_term_counts = df['Vendor INCO Term'].value_counts()
plt.figure(figsize=(10, 8))
plt.pie(inco_term_counts, labels=inco_term_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of Shipments by Vendor INCO Term')
plt.tight_layout()
plt.savefig('distribution_of_inco_terms.png')
plt.show()
print("Analysis 11: Pie chart of Vendor INCO Term distribution created.")
print("This chart shows the most common vendor agreements.")



##### 1. Why did you pick the specific chart?

**11. Breakdown of Shipments by Vendor INCO Term**
A pie chart is best for visualizing the proportion of different types of vendor agreements relative to the whole.

##### 2. What is/are the insight(s) found from the chart?

The EXW and FCA terms dominate, showing that the company frequently takes on freight and risk responsibilities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. Understanding this distribution helps in managing financial responsibilities and assessing logistical risk.

#### Chart - 12

In [None]:
# Chart - 12 visualization code

# 12. Analysis: Cost-effectiveness by Fulfill Via Method
# Goal: Compare the average freight cost of fulfillment methods.
avg_freight_by_fulfill = df.groupby('Fulfill Via')['Freight Cost (USD)'].mean().sort_values()
plt.figure(figsize=(8, 6))
avg_freight_by_fulfill.plot(kind='bar', color=['darkblue', 'orange'])
plt.title('Average Freight Cost by Fulfillment Method')
plt.xlabel('Fulfillment Method')
plt.ylabel('Average Freight Cost (USD)')
plt.xticks(rotation=0)
plt.tight_layout()
plt.savefig('avg_freight_by_fulfill_method.png')
plt.show()
print("Analysis 12: Bar chart of Average Freight Cost by Fulfillment Method created.")
print("This chart compares the average cost of using a Regional Distribution Center vs. a Direct Drop.")


##### 1. Why did you pick the specific chart?

**12. Cost-effectiveness by Fulfill Via Method**

A bar chart is ideal for directly comparing the average cost across two distinct categories of fulfillment methods.

##### 2. What is/are the insight(s) found from the chart?

Direct Drop shipments have a much higher average freight cost than those fulfilled From RDC.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive. This provides a direct, actionable insight to reduce costs by shifting more shipments to the RDC fulfillment model.
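Before acting on this comparison, it is worth checking how many shipments in each fulfillment method actually have a numeric freight cost, since entries that were converted to NaN during wrangling are silently left out of the averages. A minimal sketch follows; the helper name `missing_share_by_group` is hypothetical.

```python
import pandas as pd


def missing_share_by_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Fraction of rows in each group where value_col is missing."""
    # The mean of a boolean mask is the share of True (missing) values
    return df[value_col].isna().groupby(df[group_col]).mean()
```

In this notebook it could be called as `missing_share_by_group(df, 'Fulfill Via', 'Freight Cost (USD)')`. If one method has a much larger share of missing costs, its average is based on a smaller and possibly unrepresentative subset.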

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here
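
A correlation heatmap over the numeric cost columns could be produced along the following lines. This is a minimal sketch using pandas and matplotlib, not the notebook's own code. The column list is an assumption (`Weight (Kilograms)` in particular is a guessed column name), `plot_corr_heatmap` is a hypothetical helper, and the sample values are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_corr_heatmap(df: pd.DataFrame, columns: list):
    """Draw a Pearson correlation heatmap for the given columns."""
    # Coerce to numeric so stray text values become NaN instead of raising
    numeric = df[columns].apply(pd.to_numeric, errors="coerce")
    corr = numeric.corr()  # pairwise Pearson correlation

    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(columns)))
    ax.set_xticklabels(columns, rotation=45, ha="right")
    ax.set_yticks(range(len(columns)))
    ax.set_yticklabels(columns)
    # Annotate each cell with its coefficient
    for i in range(len(columns)):
        for j in range(len(columns)):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax, label="Pearson r")
    ax.set_title("Correlation Heatmap")
    fig.tight_layout()
    return corr, fig

# Made-up sample values, for illustration only
sample = pd.DataFrame({
    "Line Item Value": [1000, 2500, 400, 8000, 1500],
    "Freight Cost (USD)": [120, 300, 60, 900, 200],
    "Weight (Kilograms)": [10, 30, 5, 95, 18],
})
corr, fig = plot_corr_heatmap(sample, list(sample.columns))
```

The function returns the correlation matrix along with the figure so the coefficients can be inspected directly as well as read off the plot.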

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

**1. Prioritize Air and Truck Shipments:** These modes have the lowest average delivery delays, so routing time-sensitive goods through them should improve on-time delivery and, in turn, customer satisfaction.

**2. Optimize Ocean Freight Logistics:** Investigate the high average delays in Ocean shipments to identify bottlenecks. This could involve re-evaluating routes or negotiating with shipping partners.

**3. Encourage RDC Fulfillment:** Direct Drop shipments have a much higher average freight cost than shipments fulfilled From RDC. Promote the use of Regional Distribution Centers to consolidate shipments and lower average freight costs.

**4. Target High-Cost Shipping Lanes:** Analyze routes to high-cost countries like Haiti and the Dominican Republic to find more economical logistics partners or methods.

**5. Review Vendor Performance:** Address the high average delivery delays from vendors like Gilead Sciences. This is crucial for maintaining a reliable and efficient supply chain.

**6. Leverage Vendor Value:** Maintain and strengthen relationships with high-value vendors, such as Aurobindo Pharma, to secure favorable terms and ensure a consistent supply chain.

**7. Monitor Delivery Timeliness:** Implement a dashboard to continuously track and analyze delivery delays over time. This helps in early detection of negative trends and proactive management.

**8. Standardize Data Collection:** Address the data inconsistencies found in fields like Weight and Freight Cost to improve the quality of future analysis and decision-making.

**9. Forecast Demand in Key Markets:** Use insights on top-volume countries like South Africa and Uganda to better forecast demand and optimize inventory and delivery planning.

**10. Align Vendor Terms:** Analyze the distribution of INCO Terms to ensure vendor agreements align with the company's risk and cost management strategy. This could lead to better financial outcomes.
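
Recommendations 1, 2, and 5 all rest on the average delay per group. The following sketch shows how such rankings might be computed with a minimum shipment count, so that a vendor with a single late shipment does not top the list. The delay is taken as delivered date minus scheduled date, so positive values mean late. The `Vendor` column name, the threshold, the helper `rank_by_avg_delay`, and the sample rows are assumptions for illustration.

```python
import pandas as pd

def rank_by_avg_delay(df, group_col, min_shipments=2):
    """Average delivery delay per group, keeping groups with enough shipments."""
    scheduled = pd.to_datetime(df["Scheduled Delivery Date"], errors="coerce")
    delivered = pd.to_datetime(df["Delivered to Client Date"], errors="coerce")
    # Positive values mean the shipment arrived after the scheduled date
    delay = (delivered - scheduled).dt.days
    stats = (
        df.assign(**{"Delivery Delay (Days)": delay})
        .groupby(group_col)["Delivery Delay (Days)"]
        .agg(avg_delay="mean", n_shipments="count")
    )
    # Drop groups with too few shipments to support a stable average
    stats = stats[stats["n_shipments"] >= min_shipments]
    return stats.sort_values("avg_delay", ascending=False)

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "Vendor": ["Vendor A", "Vendor A", "Vendor B", "Vendor B", "Vendor C"],
    "Shipment Mode": ["Air", "Ocean", "Air", "Truck", "Ocean"],
    "Scheduled Delivery Date": ["2015-01-10", "2015-02-01", "2015-03-05", "2015-03-20", "2015-04-01"],
    "Delivered to Client Date": ["2015-01-12", "2015-02-11", "2015-03-05", "2015-03-18", "2015-05-01"],
})
by_vendor = rank_by_avg_delay(sample, "Vendor")
by_mode = rank_by_avg_delay(sample, "Shipment Mode", min_shipments=1)
print(by_vendor)
print(by_mode)
```

The same function serves both the vendor review and the shipment mode comparison, and the threshold can be raised for the full dataset.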

# **Conclusion**

**Data-Driven Decision Making:** The project successfully transformed a raw, messy dataset into a clean, analytical resource, enabling data-driven decisions rather than relying on assumptions.

**Identified Operational Bottlenecks:** Analysis revealed that Ocean freight is a primary source of delivery delays and that certain vendors consistently underperform, pinpointing areas that need immediate operational attention.

**Uncovered Cost Optimization Opportunities:** The data shows that Direct Drop fulfillment is significantly more expensive than using Regional Distribution Centers, providing a clear, actionable path to reduce freight costs.

**Pinpointed High-Value Relationships:** The analysis identified key markets (e.g., South Africa) and high-value vendors, allowing FedEx to focus resources on its most critical partnerships and markets.

**Validated Business Model Insights:** The project confirmed that freight costs are positively correlated with shipment weight and line item value. This insight helps in validating pricing models and identifying areas for more strategic cost negotiation.

**Established a Baseline for Performance:** By creating a Delivery Delay (Days) metric, the project establishes a baseline for future performance monitoring, enabling the company to track progress and measure the impact of strategic changes over time.
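
To make the baseline concrete, the sketch below computes a few summary figures that could be recalculated periodically: the on-time rate, the mean delay, the 90th percentile delay, and a monthly trend. It assumes `Delivery Delay (Days)` already exists with positive values meaning late, and that months are keyed on `Scheduled Delivery Date`. Both choices, the function name `delay_baseline`, and the sample rows are assumptions.

```python
import pandas as pd

def delay_baseline(df):
    """Return overall delay KPIs and the average delay per month."""
    # Ignore rows where the delay could not be computed
    valid = df.dropna(subset=["Delivery Delay (Days)"])
    delay = valid["Delivery Delay (Days)"]
    kpis = {
        # Share of shipments delivered on or before the scheduled date
        "on_time_rate": float((delay <= 0).mean()),
        "mean_delay_days": float(delay.mean()),
        "p90_delay_days": float(delay.quantile(0.9)),
    }
    month = pd.to_datetime(valid["Scheduled Delivery Date"]).dt.to_period("M")
    monthly = delay.groupby(month).mean()
    return kpis, monthly

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "Scheduled Delivery Date": ["2015-01-05", "2015-01-20", "2015-02-03", "2015-02-17"],
    "Delivery Delay (Days)": [-1, 3, 0, 10],
})
kpis, monthly = delay_baseline(sample)
print(kpis)
print(monthly)
```

Comparing later periods against these figures would show whether changes such as shifting volume to RDC fulfillment or renegotiating ocean routes actually reduce delays.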



### ***Hurrah! I have successfully completed my EDA Capstone Project!***