<a href="https://colab.research.google.com/github/chiragpandey37/FedEx-Logistics-Analysis/blob/main/FedEx_Logistics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - FedEx Logistics







##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

### Project Summary

The Exploratory Data Analysis (EDA) of FedEx Logistics focuses on shipment and delivery patterns, aiming to optimize operations, reduce costs, and improve efficiency. The dataset includes information on shipment modes, delivery timelines, product details, costs, and vendor data. This analysis seeks to uncover trends that will enable FedEx to streamline its logistics processes, enhance cost management, and make data-driven decisions to improve its global operations.

### Key Analysis

1. **Shipment Distribution by Country**:
   The analysis examined how shipments are distributed across different countries, identifying high-traffic regions where FedEx can potentially streamline operations, increase shipment frequency, or dedicate resources to reduce delays.
   
2. **Shipment Mode Usage**:
   Air, sea, and road shipments were evaluated to understand the balance between speed and cost. The analysis provided insights into how each mode is used for different types of shipments, such as prioritizing air for urgent deliveries and road/sea for less time-sensitive items.

3. **Freight Cost vs. Shipment Weight**:
   A key focus was understanding how shipment weight correlates with freight costs. Identifying these patterns allows for better cost management and optimization of pricing models, leading to fairer and more accurate customer charges.

4. **Vendor and Product Analysis**:
   The performance of vendors and the impact of different product types on shipment volumes and costs were analyzed. This helped pinpoint key vendor partnerships and opportunities to improve relationships or negotiate better terms with suppliers who contribute more to overall shipping costs.

### Findings and Recommendations

1. **Optimize High-Traffic Routes**:
   Countries with the highest shipment volumes should be prioritized for route optimization. FedEx can introduce more efficient delivery lanes or consolidate shipments to reduce costs and improve delivery speeds.

2. **Leverage Mixed Shipment Modes**:
   To balance speed and cost, FedEx can implement a hybrid strategy. Prioritize air freight for critical deliveries but increase the use of sea and road transport for bulk shipments that are less time-sensitive, ensuring optimal resource usage and cost control.

3. **Adjust Pricing Based on Weight Patterns**:
   The positive (though not strictly linear) relationship between shipment weight and freight cost suggests an opportunity to refine FedEx’s pricing models. Introducing more precise weight thresholds will allow for a more transparent and competitive pricing strategy, benefiting both the company and its customers.

4. **Strengthen Vendor Relationships**:
   FedEx should focus on building stronger partnerships with vendors contributing to high shipment volumes. For vendors with high costs, exploring process improvements or renegotiating contract terms could lead to cost savings.

By implementing these recommendations, FedEx can improve efficiency, reduce costs, and enhance customer satisfaction, ultimately leading to stronger global logistics operations.

# **GitHub Link -**

https://github.com/chiragpandey37/FedEx-Logistics-Analysis

# **Problem Statement**


### Problem Statement

FedEx Logistics faces challenges in optimizing its global supply chain operations, specifically in balancing shipment costs, delivery speed, and operational efficiency. With a vast network of shipments across various countries and a range of products transported through different modes (air, sea, and road), FedEx must identify key areas of inefficiency, such as high freight costs, suboptimal delivery routes, and inconsistent vendor performance.

The objective of this Exploratory Data Analysis (EDA) is to uncover insights from shipment and delivery data that will help FedEx address these challenges. By analyzing shipment patterns, costs, and vendor impacts, the goal is to develop actionable recommendations that FedEx can implement to streamline its logistics, optimize costs, and improve customer satisfaction.

#### **Define Your Business Objective?**

The business objective of this project is to **optimize FedEx Logistics operations** by leveraging data analysis to gain insights into their supply chain processes. Specifically, the project aims to:

1. **Streamline supply chain operations:** Identify bottlenecks and inefficiencies to improve workflows and ensure timely delivery.
2. **Minimize freight costs:** Analyze shipment data to identify opportunities for cost reduction through optimized routing, carrier selection, and negotiation of favorable shipping rates.
3. **Improve customer satisfaction:** Gain a deeper understanding of customer requirements and expectations to enhance service quality and address issues proactively.

By achieving these objectives, FedEx Logistics can improve its overall efficiency, reduce costs, and enhance customer satisfaction in its global supply chain operations.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
# Colab only: opens a file picker; uploaded files are saved to the working directory (/content/)
from google.colab import files
upload=files.upload()

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv('/content/SCMS_Delivery_History_Dataset.csv')
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
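# Visualizing the missing values: share of null entries in each column
missing_share = df.isnull().mean()
plt.figure(figsize=(10,5))
sns.barplot(x=missing_share.index, y=missing_share.values)
# Rotate the x-axis labels by 90 degrees so the column names are readable
plt.xticks(rotation=90)
plt.ylabel('Fraction missing')
plt.title('Missing Values per Column')
plt.show()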

### What did you know about your dataset?

- The dataset contains information about FedEx's delivery history. It has roughly 10,000 rows (the exact count is given by `df.shape`) and 33 columns. The important columns cover:

 - Shipment Mode
 - Delivery Timelines
 - Product Details
 - Costs
 - Vendor Data

- There are some missing values in the dataset, notably in the Dosage, Line Item Insurance (USD) and Shipment Mode columns.

- The dataset contains information on various aspects of FedEx's logistics operations, such as shipment details, product information, delivery timelines, and associated costs. This dataset can be used to analyze trends and patterns in FedEx's delivery operations, identify areas for improvement, and optimize logistics strategies.

- The raw columns are of type object, integer and float. The date columns are read as object (strings) and are converted to datetime during data wrangling.
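A column that mixes numbers with free-text notes is read by pandas as `object`, which is why numeric columns sometimes need converting before analysis. The sketch below uses a small made-up Series (the text entry is a hypothetical placeholder, not a value taken from the dataset) to show how `pd.to_numeric(..., errors='coerce')` turns unparseable entries into NaN so the column becomes float:

```python
import pandas as pd

# Hypothetical raw values: numbers stored as strings, mixed with a text note and a gap
raw_weight = pd.Series(["1426", "10", "n/a - see note", "358", None])

print(raw_weight.dtype)  # object, because of the mixed content

# errors='coerce' converts anything that cannot be parsed (text, None) to NaN
weight_kg = pd.to_numeric(raw_weight, errors="coerce")

print(weight_kg.dtype)         # float64
print(weight_kg.isna().sum())  # 2: the text note and the missing value
```

The same call applied to a real column, e.g. `pd.to_numeric(df['Weight (Kilograms)'], errors='coerce')`, keeps every numeric entry and marks the rest as missing.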

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data=pd.DataFrame(df.columns)
data

In [None]:
# Dataset Describe

df.describe()

### Variables Description

The dataset has variables related to shipment details, product information, delivery timelines, and costs.

Some key variables include:

 - ID: Unique identifier for each shipment.

 - Project Code: Code assigned to the project.

 - PQ #: Purchase quotation number.

 - PO / SO #: Purchase order or sales order number.

 - ASN/DN #: Advance shipping notice or delivery note number.

 - Country: Destination country for the shipment.

 - Managed By: Entity managing the shipment (e.g., PMO - US).

 - Fulfill Via: Method of fulfillment (e.g., Direct Drop).

 - Vendor INCO Term: International commercial terms defining the responsibilities of the buyer and seller.

 - Shipment Mode: Mode of transportation (e.g., Air, Road).

 - PQ First Sent to Client Date: Date when the purchase quotation was first sent to the client.

 - PO Sent to Vendor Date: Date when the purchase order was sent to the vendor.

 - Scheduled Delivery Date: Planned delivery date.

 - Delivered to Client Date: Actual delivery date.

 - Delivery Recorded Date: Date of delivery recording.

 - Product Group: Category of the product (e.g., ARV).

 - Sub Classification: More specific product category.

 - Vendor: Supplier of the product.

 - Item Description: Detailed description of the item.

 - Molecule/Test Type: Type of molecule or test (if applicable).

 - Brand: Brand name of the product.

 - Dosage: Drug dosage (if applicable).

 - Dosage Form: Form of the drug (e.g., Tablet).

 - Unit of Measure (Per Pack): Number of units contained in one pack.

 - Line Item Quantity: Quantity of items in the shipment.

 - Line Item Value: Total value of the line item.

 - Pack Price: Price per pack.

 - Unit Price: Price per unit.

 - Manufacturing Site: Location of product manufacturing.

 - First Line Designation: Designation of the product (e.g., Yes/No).

 - Weight (Kilograms): Weight of the shipment.

 - Freight Cost (USD): Cost of shipping.

 - Line Item Insurance (USD): Insurance cost for the line item.
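If the descriptions above are accurate, the price columns should be related: Line Item Value would be close to Line Item Quantity × Pack Price, and Unit Price close to Pack Price ÷ Unit of Measure (Per Pack). These relationships are an assumption to verify, not something this notebook establishes. A minimal consistency check on made-up rows, with an arbitrary 5% tolerance, could look like this:

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({
    "Unit of Measure (Per Pack)": [30, 60, 240],
    "Line Item Quantity": [100, 50, 10],
    "Pack Price": [2.50, 6.00, 30.00],
    "Unit Price": [0.08, 0.10, 0.13],
    "Line Item Value": [250.0, 300.0, 400.0],
})

# Assumed relationships between the price and quantity columns
expected_value = toy["Line Item Quantity"] * toy["Pack Price"]
expected_unit_price = toy["Pack Price"] / toy["Unit of Measure (Per Pack)"]

# Rows within 5% (an arbitrary tolerance) of the expected values pass the check
value_ok = np.isclose(toy["Line Item Value"], expected_value, rtol=0.05)
unit_ok = np.isclose(toy["Unit Price"], expected_unit_price, rtol=0.05, atol=0.01)

# Rows whose value does not match quantity x pack price
print(toy.loc[~value_ok])
```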

### Check Unique Values for each variable.

In [None]:
df.columns

In [None]:
# Check Unique Values for each variable.
df.nunique()

In [None]:
#unique values in every column, stored as {column name: array of unique values}
unique_values = {col: df[col].unique() for col in df.columns}
unique_values

## 3. ***Data Wrangling***

### Data Wrangling Code

Converting data types and handling null values

In [None]:
#correct datatype: parse the date columns (entries that are not valid dates become NaT)
date_columns = ['PQ First Sent to Client Date', 'PO Sent to Vendor Date',
                'Scheduled Delivery Date', 'Delivered to Client Date',
                'Delivery Recorded Date']
for col in date_columns:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# make sure the insurance cost is stored as a number
df['Line Item Insurance (USD)'] = pd.to_numeric(df['Line Item Insurance (USD)'])

In [None]:
df.info()

In [None]:
# Write your code to make your dataset analysis ready.
df.isnull().sum()

In [None]:
#check how much percentage of null values in columns
df.isnull().sum()/len(df)*100

In [None]:
#delete column PO Sent to Vendor Date
df.drop(columns=['PO Sent to Vendor Date'],inplace=True)

In [None]:
#fill missing values of PQ First Sent to Client Date with the most frequent date (mode)
df['PQ First Sent to Client Date'] = df['PQ First Sent to Client Date'].fillna(df['PQ First Sent to Client Date'].mode()[0])

In [None]:
#less than 5% of values are missing in Line Item Insurance (USD) and Shipment Mode, so we drop those rows

df.dropna(subset=['Line Item Insurance (USD)','Shipment Mode'],inplace=True)
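The two strategies used here, filling a column with its mode and dropping the few rows where a column is empty, can be seen on a toy frame with made-up values. Assigning the `fillna` result back to the column, rather than calling `fillna(..., inplace=True)` on a selected column, avoids pandas' chained-assignment warning:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Dosage": ["300mg", np.nan, "300mg", "150mg", np.nan],
    "Shipment Mode": ["Air", "Truck", np.nan, "Air", "Air"],
})

# 1) Impute a categorical column with its most frequent value (mode)
toy["Dosage"] = toy["Dosage"].fillna(toy["Dosage"].mode()[0])

# 2) Drop rows whose Shipment Mode is missing (few rows, so little data is lost)
toy = toy.dropna(subset=["Shipment Mode"])

print(toy)
```

Mode imputation keeps every row but biases the column toward its most common value, so it suits columns where a missing entry matters less than losing the whole row.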

In [None]:
# verify that no missing Shipment Mode values remain after dropping rows above
df['Shipment Mode'].isnull().sum()

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
# data type of dosage
df['Dosage'].dtypes

In [None]:
df['Dosage'].mode()

In [None]:
#fill the missing values in Dosage with the most frequent dosage (mode)
df['Dosage'] = df['Dosage'].fillna(df['Dosage'].mode()[0])

In [None]:
#drop identifier and low-relevance columns that are not used in the analysis
df=df.drop(columns=['Project Code','PQ #','PO / SO #', 'ASN/DN #','Vendor INCO Term','PQ First Sent to Client Date','First Line Designation'])

In [None]:
df

In [None]:
#outliers of every column, detected with the IQR rule
def calculate_outliers(df, columns):
    """Return the IQR bounds and outlier values for each column.

    Note: each column is converted to numeric in place (non-numeric text becomes NaN),
    so this function also modifies df.
    """
    outliers = {}

    for col in columns:
        # Convert to numeric, forcing non-numeric entries to NaN (modifies df)
        df[col] = pd.to_numeric(df[col], errors='coerce')

        # Ignore NaN values when computing the quartiles
        values = df[col].dropna()

        Q1 = values.quantile(0.25)  # First quartile (25%)
        Q3 = values.quantile(0.75)  # Third quartile (75%)
        IQR = Q3 - Q1  # Interquartile range

        # Values more than 1.5 * IQR beyond the quartiles are treated as outliers
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR

        # Store the results in a dictionary
        outliers[col] = {
            'lower_bound': lower_bound,
            'upper_bound': upper_bound,
            'lower_outliers': values[values < lower_bound],
            'upper_outliers': values[values > upper_bound]
        }

    return outliers

# List of numeric columns
numeric_columns = ['Unit of Measure (Per Pack)', 'Line Item Quantity',
                   'Line Item Value', 'Pack Price', 'Unit Price',
                   'Weight (Kilograms)', 'Freight Cost (USD)',
                   'Line Item Insurance (USD)']

# Get the outliers for all numeric columns
outliers = calculate_outliers(df, numeric_columns)

# Display outlier summary (the lower bound can be negative for skewed, non-negative data)
for col, stats in outliers.items():
    print(f"Column: {col}")
    print(f"  Lower bound: {stats['lower_bound']}")
    print(f"  Upper bound: {stats['upper_bound']}")
    print(f"  Outliers below / above: {len(stats['lower_outliers'])} / {len(stats['upper_outliers'])}")
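The IQR rule flags any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. A worked example on nine made-up numbers makes the arithmetic concrete: with pandas' default linear interpolation, Q1 = 12 and Q3 = 16, so IQR = 4 and the bounds are 6 and 22, which leaves 120 as the only outlier.

```python
import pandas as pd

# Made-up values with one obvious extreme
values = pd.Series([10, 12, 12, 13, 14, 15, 16, 18, 120])

q1 = values.quantile(0.25)  # 12.0
q3 = values.quantile(0.75)  # 16.0
iqr = q3 - q1               # 4.0

# Bounds of the IQR rule
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 6.0, 22.0

outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())  # [120]
```

For non-negative, right-skewed columns such as prices, the lower bound often falls below zero, in which case no value can be a lower outlier.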



In [None]:
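# countries served by each shipment mode
from IPython.display import display
ship = pd.DataFrame(df.groupby('Shipment Mode')['Country'].unique().apply(list))
display(ship)
# number of distinct countries served by each shipment mode
df.groupby('Shipment Mode')['Country'].nunique().sort_values(ascending=False)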
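The cell above lists the countries per shipment mode and counts them. A cross-tabulation gives the shipment count for every country and mode pair in one table, from which the number of countries per mode follows directly. A sketch on made-up rows (not real shipment counts):

```python
import pandas as pd

# Made-up shipments for illustration only
toy = pd.DataFrame({
    "Country": ["Nigeria", "Nigeria", "Haiti", "Haiti", "Haiti", "Kenya"],
    "Shipment Mode": ["Air", "Truck", "Air", "Air", "Ocean", "Air"],
})

# Rows: countries, columns: shipment modes, cells: number of shipments
mode_by_country = pd.crosstab(toy["Country"], toy["Shipment Mode"])
print(mode_by_country)

# A mode serves a country if that cell is non-zero
countries_per_mode = (mode_by_country > 0).sum().sort_values(ascending=False)
print(countries_per_mode)
```

On the real data the equivalent call is `pd.crosstab(df['Country'], df['Shipment Mode'])`, which also works as input to `sns.heatmap` for a categorical-categorical chart.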


In [None]:
#which manufacturing sites supply each country
man=pd.DataFrame(df.groupby('Country')['Manufacturing Site'].unique().apply(list))
man


In [None]:
#number of distinct countries supplied by each manufacturing site
df.groupby('Manufacturing Site')['Country'].nunique().sort_values(ascending=False)

In [None]:
#number of orders per country, from most to least
df.groupby('Country')['ID'].count().sort_values(ascending=False)

In [None]:
#make a column with the difference between the scheduled delivery date and the actual delivery date
#positive values mean the shipment arrived early, negative values mean it arrived late
df['Scheduled Delivery Date']=pd.to_datetime(df['Scheduled Delivery Date'])
df['Delivered to Client Date']=pd.to_datetime(df['Delivered to Client Date'])
# add the difference in whole days as a new column
df['Delivery Time']=(df['Scheduled Delivery Date']-df['Delivered to Client Date']).dt.days
df['Delivery Time'].sort_values(ascending=False)
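`Delivery Time` is Scheduled Delivery Date minus Delivered to Client Date, so a positive number of days means the shipment arrived early and a negative number means it was late. The sketch below uses made-up dates and a hypothetical helper, `delivery_status`, to bucket the column into Early / On time / Late labels; rows with a missing date are labelled Unknown instead of being silently counted as late.

```python
import pandas as pd

# Made-up dates for illustration only (the last row has no scheduled date)
toy = pd.DataFrame({
    "Scheduled Delivery Date": pd.to_datetime(["2014-06-02", "2014-06-10", "2014-07-01", None]),
    "Delivered to Client Date": pd.to_datetime(["2014-05-30", "2014-06-10", "2014-07-15", "2014-08-01"]),
})

# Same sign convention as 'Delivery Time': positive = early, negative = late
toy["Delivery Time"] = (toy["Scheduled Delivery Date"] - toy["Delivered to Client Date"]).dt.days

def delivery_status(days):
    """Hypothetical helper: map a day difference to a status label."""
    if pd.isna(days):
        return "Unknown"
    if days > 0:
        return "Early"
    if days == 0:
        return "On time"
    return "Late"

toy["Delivery Status"] = toy["Delivery Time"].apply(delivery_status)
print(toy[["Delivery Time", "Delivery Status"]])
```

`toy["Delivery Status"].value_counts(normalize=True)` then gives the on-time rate directly.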

In [None]:
df

### What all manipulations have you done and insights you found?

### Data Manipulations Performed:

1. **Data Type Conversion:**
   - Several columns containing dates were converted from the `object` type to the `datetime` type. This ensures that date-related operations and calculations can be performed accurately.

2. **Handling Missing Values:**
   - The **PO Sent to Vendor Date** column was removed due to a high percentage of missing values.
   - Rows with missing values in the **Line Item Insurance (USD)** and **Shipment Mode** columns were removed, as they represented a small percentage of the data.
   - Missing values in the **PQ First Sent to Client Date** and **Dosage** columns were filled using the mode (most frequent value) of their respective columns.
   - Identifier and low-relevance columns (**Project Code**, **PQ #**, **PO / SO #**, **ASN/DN #**, **Vendor INCO Term**, **PQ First Sent to Client Date** and **First Line Designation**) were then dropped.

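3. **Outlier Detection:**
   - Potential outliers for numeric columns like **Unit of Measure (Per Pack)**, **Line Item Quantity**, etc., were detected using the **interquartile range (IQR)** method, which flags values more than 1.5 × IQR below the first quartile or above the third quartile. This helps identify unusual or extreme values for further investigation.
   - As part of this step, each numeric column was converted with `pd.to_numeric(errors='coerce')`, so any non-numeric entries became missing values.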

4. **Shipment Mode and Country Analysis:**
   - The relationship between **Shipment Mode** and **Country** was analyzed to determine which shipment modes are used in each country.
   - A count of unique countries served by each shipment mode was provided.

5. **Manufacturing Site and Country Analysis:**
   - The connection between **Manufacturing Site** and **Country** was explored to identify the locations of manufacturing sites.
   - The number of unique countries associated with each manufacturing site was calculated.

6. **Order Counts by Country:**
   - The number of orders (represented by unique order IDs) was calculated and sorted for each country to reveal which countries have the most and least orders.

7. **Delivery Time Difference:**
   - A new column, **Delivery Time**, was created by subtracting the **Delivered to Client Date** from the **Scheduled Delivery Date**, in days. Positive values mean the shipment arrived before its scheduled date; negative values mean it was late. This helps analyze delivery delays and early deliveries.


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# 1. Shipment Volume by Country
plt.figure(figsize=(10,6))
df['Country'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Shipment Volume by Country')
plt.xlabel('Country')
plt.ylabel('Number of Shipments')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart showing the number of shipments per destination country can reveal which countries receive the highest shipment volumes. This helps in understanding market focus and distribution patterns.

##### 2. What is/are the insight(s) found from the chart?

This bar chart presents the shipment volume for each country. The insight gained from this chart is that the highest shipment volume is for South Africa, followed by Nigeria and then Kenya.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can lead to better resource allocation, improved delivery times, and targeted marketing. However, FedEx should avoid over-reliance on a few markets and monitor emerging regions for potential growth.
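The risk of over-reliance on a few markets can be quantified as the share of all shipments that goes to the top few countries. A sketch on made-up counts for four hypothetical countries (A to D):

```python
import pandas as pd

# Made-up destination labels: A receives 5 shipments, B 3, C and D one each
countries = pd.Series(["A"] * 5 + ["B"] * 3 + ["C"] + ["D"])

# Fraction of shipments per country, largest first
share = countries.value_counts(normalize=True)
cumulative_share = share.cumsum()

top3_share = share.head(3).sum()
print(share.round(2))
print(f"Top 3 countries account for {top3_share:.0%} of shipments")
```

On the real data, `df['Country'].value_counts(normalize=True).head(3).sum()` gives the same figure for the three largest destinations.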

#### Chart - 2

In [None]:
# 2. Shipment Mode Distribution
plt.figure(figsize=(8,8))
df['Shipment Mode'].value_counts().plot(kind='pie', autopct='%1.1f%%', colors=sns.color_palette("Set3"))
plt.title('Shipment Mode Distribution')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart can provide a clear breakdown of the different transportation methods (e.g., air, sea, land). This will be useful in assessing the most commonly used shipment modes.

##### 2. What is/are the insight(s) found from the chart?

The pie chart reveals the following insights about shipment modes:

1. **Air is the most common mode of shipment**, accounting for a significant portion of the total shipments. This suggests that the majority of the deliveries are time-sensitive, as air transportation is typically faster, though more expensive.

2. **Ocean is the least used mode of shipment**, indicating that very few deliveries are done via this slower, but often cheaper method. This could mean that only certain types of goods or locations are served via ocean freight, or there is less demand for such deliveries in this dataset.

The prevalence of air shipments might suggest a focus on faster delivery times, which could impact the overall logistics costs, while the minor use of ocean shipping suggests a niche or limited usage for less urgent, bulk shipments.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
- **Air shipment dominance** suggests prioritizing fast deliveries, enhancing customer satisfaction and loyalty.
- **Strategic use of air for urgent deliveries** can help balance costs and improve profitability.

### Potential Negative Growth:
- **Over-reliance on air shipment** can lead to high operational costs, reducing profit margins.
- **Underutilization of ocean freight** means missed opportunities for cost savings on non-urgent or bulk shipments.

Balancing shipment modes is key to reducing costs and maintaining efficiency.
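Whether air really costs more than ocean or truck can be checked by comparing freight cost per kilogram across modes. The sketch below uses made-up shipments and computes a weighted ratio (total cost ÷ total weight per mode), which is less sensitive to tiny shipments than averaging per-shipment ratios:

```python
import pandas as pd

# Made-up shipments for illustration only
toy = pd.DataFrame({
    "Shipment Mode": ["Air", "Air", "Truck", "Truck", "Ocean"],
    "Weight (Kilograms)": [100, 200, 1000, 500, 4000],
    "Freight Cost (USD)": [1500, 2600, 4000, 2500, 6000],
})

# Total cost and weight per mode, then cost per kilogram
totals = toy.groupby("Shipment Mode")[["Freight Cost (USD)", "Weight (Kilograms)"]].sum()
totals["Cost per kg (USD)"] = totals["Freight Cost (USD)"] / totals["Weight (Kilograms)"]

print(totals.sort_values("Cost per kg (USD)", ascending=False))
```

On the real data, both columns need to be numeric first, and rows where either is missing should be dropped before summing.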

#### Chart - 3

In [None]:
# 3. Line Item Value Distribution
plt.figure(figsize=(10,6))
sns.histplot(df['Line Item Value'], bins=20, color='green', kde=True)
plt.title('Line Item Value Distribution')
plt.xlabel('Line Item Value (USD)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram of line item values helps to understand the spread and distribution of shipment costs. It can reveal whether there are many low-value or high-value shipments, aiding in cost analysis.

##### 2. What is/are the insight(s) found from the chart?

From the chart, here are the key insights:

1. **Right-skewed distribution:** Most of the line item values are clustered toward the lower end of the range, indicating that the majority of the items have low values. There is a sharp decline in frequency as the value increases.
   
2. **Presence of high-value outliers:** A long tail extends to the right, showing a few items with significantly higher values, but they occur infrequently.

In summary, most line items have low values, but there are some high-value outliers in the data.
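A right-skewed distribution has a positive skewness, and taking logarithms often makes it close to symmetric, which also makes the histogram easier to read. The sketch below uses synthetic log-normal data as a stand-in for Line Item Value (not the real column):

```python
import numpy as np
import pandas as pd

# Synthetic, strongly right-skewed values (log-normal), for illustration only
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=8, sigma=1.5, size=5_000))

raw_skew = values.skew()            # large and positive: long right tail
log_skew = np.log10(values).skew()  # close to 0: roughly symmetric

print(f"skewness (raw):   {raw_skew:.2f}")
print(f"skewness (log10): {log_skew:.2f}")
```

For the chart itself, `sns.histplot(..., log_scale=True)` draws the histogram on a log axis; zero values would need to be filtered out first, since the logarithm of 0 is undefined.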

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
- **High volume of low-value items** ensures consistent cash flow.
- **High-value outliers** boost revenue through premium or bulk sales.

### Potential Negative Growth:
- **Over-reliance on low-value items** may lead to small profit margins, especially if costs increase.
- **Underutilization of high-value items** means missing out on potential growth.

Balancing both segments can drive profitability, while neglecting them could hinder growth.

#### Chart - 4

In [None]:
# 4. Average Unit Price by Dosage Form
plt.figure(figsize=(10,6))
avg_unit_price = df.groupby('Dosage Form')['Unit Price'].mean().sort_values(ascending=False)
avg_unit_price.plot(kind='bar', color='lightgreen')
plt.title('Average Unit Price by Dosage Form')
plt.xlabel('Dosage Form')
plt.ylabel('Average Unit Price (USD)')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart of the average unit price for each dosage form ranks the dosage forms from most to least expensive per unit. This makes price differences between forms, such as injections versus tablets, easy to compare.

##### 2. What is/are the insight(s) found from the chart?

The chart showing **Average Unit Price by Dosage Form** provides the following insights:

1. **Injections have the highest average unit price** compared to other dosage forms, indicating they are the most expensive form of medication per unit. This may be due to the complexity of production or the nature of the products requiring injections.
   
2. **Test kits and ancillary items** are the next highest in terms of unit price, suggesting they may also be premium products or require specialized manufacturing.

3. **Tablets, capsules, and powders** have significantly lower average unit prices, making them the most affordable dosage forms. These might be produced at scale or are more standardized, resulting in lower costs.

Overall, this chart highlights a large price disparity between higher-cost dosage forms like injections and more common, lower-cost forms like tablets.
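An average can be pulled up by a few very expensive items, so it is worth comparing the mean with the median and the number of items per dosage form before concluding that one form is more expensive. A sketch on made-up unit prices:

```python
import pandas as pd

# Made-up unit prices; one injection is far more expensive than the others
toy = pd.DataFrame({
    "Dosage Form": ["Injection", "Injection", "Injection", "Tablet", "Tablet", "Tablet"],
    "Unit Price": [2.0, 2.5, 40.0, 0.05, 0.06, 0.07],
})

summary = (
    toy.groupby("Dosage Form")["Unit Price"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Here the injection mean (about 14.83) is almost six times its median (2.5), a sign that a single item drives the average.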

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
- **High-value products (injections, test kits)** boost revenue and profit margins.
- **Low-cost products (tablets, capsules)** ensure broad customer reach and stable sales.

### Potential Negative Growth:
- **Over-reliance on high-cost items** may limit market reach due to price sensitivity.
- **Low margins on affordable products** could lead to profitability challenges if costs rise.

Balancing both high-value and affordable products is key for sustained growth.

#### Chart - 5

In [None]:
# 5. Freight Cost vs Weight (Kilograms)
plt.figure(figsize=(10,6))
sns.scatterplot(x='Weight (Kilograms)', y='Freight Cost (USD)', data=df, hue='Product Group', palette='coolwarm')
plt.title('Freight Cost vs Weight')
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Freight Cost (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot shows the relationship between shipment weight and its freight cost. This is useful to assess whether heavier shipments incur proportionally higher costs or if there are exceptions.

##### 2. What is/are the insight(s) found from the chart?

The **Freight Cost vs Weight** scatter plot provides the following insights:

1. **Positive correlation (but not linear):**
   - There is a general trend where higher weight leads to higher freight costs, but the relationship is not strictly linear. As weight increases, the cost varies, with some high weights resulting in lower-than-expected costs.

2. **Cluster of low-weight, low-cost shipments:**
   - The majority of shipments are concentrated in the low-weight (below 20,000 kg) and low-cost (below $50,000) range, indicating that most products fall within this category.

3. **Outliers:**
   - There are several notable outliers, with some shipments having extremely high freight costs (above $150,000) even at lower weights, suggesting possible inefficiencies or specialized shipping needs.
   
4. **Product group differentiation:**
   - The color coding by product group (e.g., HRDT, ARV, etc.) shows that different product types may follow different cost-weight patterns, though no single group dominates all weight or cost categories.

Overall, there is a pattern of increasing freight cost with weight, but outliers and variation suggest other factors influence costs. This can point to optimization opportunities in logistics.
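A "positive but not linear" relationship with outliers is where Pearson correlation (linear association) and Spearman correlation (association of ranks, i.e. monotonic association) can disagree sharply. The sketch below uses made-up data, a perfectly proportional cost curve plus three low-weight, very high-cost shipments like the outliers described above, and computes Spearman as the Pearson correlation of the ranks so that no extra library is needed:

```python
import numpy as np
import pandas as pd

# Made-up data: freight cost proportional to weight...
weight = pd.Series(np.arange(1, 101, dtype=float))
freight = 10 * weight
# ...except three light shipments with extreme costs
freight.iloc[:3] = 200_000.0

pearson = weight.corr(freight)                     # linear association
spearman = weight.rank().corr(freight.rank())      # rank (monotonic) association

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

Three extreme points are enough to drag the Pearson coefficient far down while the rank correlation stays high. On the real data, drop rows where either column is missing before ranking.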

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
- **Optimize shipping costs:** Identifying outliers with high freight costs at low weights highlights opportunities to reduce inefficiencies and optimize logistics.
- **Focus on most frequent shipments:** The concentration of low-weight, low-cost shipments suggests that improving processes in this area could enhance cost efficiency and profitability.

### Potential Negative Growth:
- **Unaddressed high-cost outliers:** If the high freight costs for certain low-weight shipments aren't managed, it could lead to unnecessary expenses, impacting profit margins.
  
Addressing inefficiencies and optimizing shipping processes can drive positive growth, while ignoring outliers may lead to increased costs.
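One way to find the high-cost, low-weight shipments is to group shipments into weight bands and flag those whose freight cost per kilogram is far above the typical value for their band. In the sketch below the band edges (50 kg and 1,000 kg), the 3× multiplier and the data are all arbitrary illustrative choices, not values derived from the dataset:

```python
import pandas as pd

# Made-up shipments for illustration only
toy = pd.DataFrame({
    "Weight (Kilograms)": [5, 8, 12, 150, 180, 220, 2500, 3000],
    "Freight Cost (USD)": [200, 260, 5000, 1800, 2100, 2500, 9000, 11000],
})

# Hypothetical weight bands: (0, 50], (50, 1000], (1000, inf)
bins = [0, 50, 1000, float("inf")]
labels = ["<=50 kg", "50-1000 kg", ">1000 kg"]
toy["Weight Band"] = pd.cut(toy["Weight (Kilograms)"], bins=bins, labels=labels)

# Cost per kg and the median cost per kg of each row's band
toy["Cost per kg"] = toy["Freight Cost (USD)"] / toy["Weight (Kilograms)"]
band_median = toy.groupby("Weight Band", observed=True)["Cost per kg"].transform("median")

# Flag shipments costing more than 3x their band's median per kg
toy["High cost flag"] = toy["Cost per kg"] > 3 * band_median
print(toy)
```

The same bands could serve as a starting point for the weight thresholds discussed in the pricing recommendation.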

#### Chart - 6

In [None]:
# 6. Line Item Quantity by Vendor
plt.figure(figsize=(12,6))
df.groupby('Vendor')['Line Item Quantity'].sum().sort_values(ascending=False).plot(kind='bar', color='purple')
plt.title('Line Item Quantity by Vendor')
plt.xlabel('Vendor')
plt.ylabel('Total Line Item Quantity')
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart depicting line item quantity per vendor can reveal which vendors are responsible for supplying the most goods. This can help in identifying key vendors contributing to the overall supply chain.

##### 2. What is/are the insight(s) found from the chart?

The **Line Item Quantity by Vendor** chart shows the following insights:

1. **Highly Skewed Distribution:**
   - A few vendors (like Aurobindo from RDC and Mylan Laboratories) account for the vast majority of the total line item quantity. This indicates a highly concentrated supply chain where only a few vendors handle most of the product volume.

2. **Long Tail of Vendors:**
   - Many vendors contribute very small quantities, suggesting that a large number of suppliers are responsible for only a small portion of the overall procurement.

3. **Vendor Dependence:**
   - The business may be heavily reliant on a small number of key vendors, which could pose a risk in terms of supply chain disruptions if these vendors face issues.

This insight suggests potential opportunities for risk mitigation and vendor diversification.
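Vendor concentration can be summarised with two numbers: the share of total quantity held by the top vendors, and the Herfindahl-Hirschman Index (HHI), the sum of squared shares, which is 1 when a single vendor supplies everything and approaches 0 when supply is spread evenly. A sketch on made-up quantities for four hypothetical vendors:

```python
import pandas as pd

# Made-up total quantities per hypothetical vendor
quantity_by_vendor = pd.Series(
    {"Vendor A": 6000, "Vendor B": 3000, "Vendor C": 500, "Vendor D": 500}
)

shares = quantity_by_vendor / quantity_by_vendor.sum()  # 0.60, 0.30, 0.05, 0.05
top2_share = shares.nlargest(2).sum()                   # 0.90
hhi = (shares ** 2).sum()                               # 0.36 + 0.09 + 0.0025 + 0.0025

print(f"Top-2 share: {top2_share:.0%}")
print(f"HHI: {hhi:.3f}")
```

On the real data, `df.groupby('Vendor')['Line Item Quantity'].sum()` provides the input Series, and tracking the HHI over time would show whether vendor diversification is working.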

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
- **Focus on top vendors:** The high concentration of purchases from a few vendors allows for strong relationships and potential cost-saving negotiations.
- **Streamline procurement:** Prioritizing key vendors may enhance operational efficiency and reduce complexity.

### Potential Negative Growth:
- **Vendor dependency risk:** Over-reliance on a few vendors increases vulnerability to supply chain disruptions, potentially leading to delays or shortages.

Diversifying the vendor base while maintaining strong relationships with key suppliers is essential for growth and risk mitigation.

#### Chart - 7

In [None]:
# 7. Scheduled vs Actual Delivery Time (Line Chart)
df['Scheduled Delivery Date'] = pd.to_datetime(df['Scheduled Delivery Date'])
df['Delivered to Client Date'] = pd.to_datetime(df['Delivered to Client Date'])
df['Delivery Delay'] = (df['Delivered to Client Date'] - df['Scheduled Delivery Date']).dt.days

plt.figure(figsize=(10,6))
sns.lineplot(data=df, x='Scheduled Delivery Date', y='Delivery Delay', label='Delivery Delay (Days)', color='red')
plt.title('Scheduled vs Actual Delivery Time')
plt.ylabel('Delivery Delay (Days)')
plt.xlabel('Scheduled Delivery Date')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A line chart showing the comparison between scheduled and actual delivery times can provide insights into shipping performance. It can highlight whether there are delays and how often deliveries meet their planned schedules.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the "Scheduled vs Actual Delivery Time" Chart

**1. Delivery Delays:**
* **Frequent Delays:** The chart shows a significant number of positive values, indicating frequent delivery delays.
* **Magnitude of Delays:** Some delays are substantial, exceeding 100 days.
* **Year-Over-Year Fluctuations:** The magnitude and frequency of delays seem to vary across different years.

**2. Early Deliveries:**
* **Occasional:** Negative values suggest that deliveries were sometimes completed earlier than scheduled.
* **Less Frequent:** Early deliveries are less common compared to delays.

**3. Overall Trend:**
* **No Consistent Trend:** The chart doesn't reveal a clear overall trend in delivery delays or early deliveries over the years. It fluctuates significantly.

**Additional Insights (with further breakdowns):**
* **Seasonal Patterns:** Aggregating delays by month or quarter could reveal seasonal trends in late or early deliveries.
* **Product or Region-Specific Trends:** Grouping delays by columns already in the dataset, such as Country, Vendor, or Shipment Mode, would show where delays or early deliveries are most common.
* **Root Causes:** Correlating delays with other data points (e.g., order volume, shipment mode, vendor performance) might help identify the root causes of delays or early deliveries.

**Overall, the chart highlights a significant issue with delivery performance, indicating a need for further investigation and potential improvements in the supply chain or logistics processes.**
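A daily line is noisy, so the monthly breakdown suggested under Seasonal Patterns can be sketched with pandas. This is a minimal sketch, assuming the two date columns have been parsed to datetimes as in the Chart 7 cell; the helper name `delay_summary` and the four-row toy table are made up for illustration and are not FedEx data. For each calendar month of the scheduled date it reports the number of shipments, the mean delay, and the shares delivered late and early.

```python
import pandas as pd

def delay_summary(df: pd.DataFrame, freq: str = "M") -> pd.DataFrame:
    """Hypothetical helper: delivery-delay statistics per period of the scheduled date."""
    frame = pd.DataFrame({
        "period": df["Scheduled Delivery Date"].dt.to_period(freq),
        "delay": (df["Delivered to Client Date"] - df["Scheduled Delivery Date"]).dt.days,
    }).dropna()  # drop rows with a missing date so they do not count as "on time"
    g = frame.groupby("period")["delay"]
    return pd.DataFrame({
        "shipments": g.size(),
        "mean_delay_days": g.mean(),
        "share_late": g.apply(lambda d: (d > 0).mean()),   # delivered after schedule
        "share_early": g.apply(lambda d: (d < 0).mean()),  # delivered before schedule
    })

# Made-up toy data, only to show the shape of the output
toy = pd.DataFrame({
    "Scheduled Delivery Date": pd.to_datetime(["2006-06-02", "2006-06-20", "2006-07-05", "2006-07-10"]),
    "Delivered to Client Date": pd.to_datetime(["2006-06-05", "2006-06-18", "2006-07-05", "2006-07-30"]),
})
print(delay_summary(toy))
```

On the real data, `delay_summary(df)` (or `freq="Q"` for quarters) gives one row per period, which is easier to scan for seasonality than the daily line.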


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the chart can help create a positive business impact.**

By understanding the patterns in delivery delays and early deliveries, businesses can:

* **Improve customer satisfaction:** Reducing delays and ensuring timely deliveries can enhance customer satisfaction and loyalty.
* **Optimize logistics operations:** Identifying areas with frequent delays can help businesses optimize their logistics processes, reducing costs and improving efficiency.
* **Strengthen supplier relationships:** Analyzing the root causes of delays can lead to better communication and collaboration with suppliers.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Excessive focus on reducing delays:** If a business becomes overly focused on reducing delays without considering other factors (e.g., costs, customer needs), it could lead to increased costs or decreased customer satisfaction.
* **Ignoring early deliveries:** While early deliveries might seem like a positive, they could also indicate inefficiencies in the supply chain that need to be addressed to avoid unnecessary costs.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need for timely deliveries with other business objectives.**


#### Chart - 8

In [None]:


# Group data by Brand and Dosage Form, calculate counts, and create a DataFrame
count_of_dosage_forms = (
    df.groupby(["Brand", "Dosage Form"])
    .size()
    .to_frame(name="Count")
    .reset_index()
)

# Create a grouped bar chart (one bar per dosage form within each brand) with a custom color palette
plt.figure(figsize=(12, 6))
sns.barplot(
    x="Brand",
    y="Count",
    hue="Dosage Form",
    data=count_of_dosage_forms,
    palette="colorblind",  # Choose a color palette suitable for color blindness
)

# Customize labels, title, and legend
plt.xlabel("Brand", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.title("Dosage Forms Distribution by Brand", fontsize=14)
plt.xticks(rotation=45, ha="right", fontsize=10)  # Rotate x-axis labels for readability
plt.yticks(fontsize=10)
plt.legend(title="Dosage Form", fontsize=10, loc="upper right")  # Move legend to upper right corner

# Customize grid and axes for clarity
plt.grid(axis="y", linestyle="--", alpha=0.7)  # Add horizontal gridlines with transparency
plt.tight_layout()  # Adjust spacing between elements

plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the "Dosage Forms Distribution by Brand" data because it is well suited to comparing the frequency of different dosage forms across multiple brands.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Dosage Forms Distribution by Brand" Chart

**1. Dosage Form Dominance:**
* **Tablets:** Tablets are the most common dosage form across most brands, indicating their popularity or ease of manufacturing.
* **Variations:** Within the tablet category, different formulations (e.g., FDC, blister packaging) are also prevalent.

**2. Brand-Specific Preferences:**
* **Diverse Dosage Forms:** Some brands offer a wider variety of dosage forms, suggesting they cater to a broader range of patient needs or market segments.
* **Specializations:** Other brands may focus on specific dosage forms, potentially indicating a specialization in particular therapeutic areas.

**3. Niche Dosage Forms:**
* **Limited Usage:** Certain dosage forms, like delayed-release capsules or oral powders, are less common, suggesting they might have niche applications or be used for specific medical conditions.

**4. Brand-Specific Trends:**
* **Unique Formulations:** Some brands may have unique dosage form offerings that set them apart from competitors.
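Whether a brand is diverse or specialized is hard to judge by eye when many brands share one axis, so it can be quantified directly. The sketch below assumes the notebook's `Brand` and `Dosage Form` columns; the function name `dosage_form_breadth` and the toy rows are hypothetical. It counts the distinct dosage forms per brand and the share of that brand's rows taken by its most common form (a share near 1.0 suggests specialization).

```python
import pandas as pd

def dosage_form_breadth(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: how many dosage forms each brand uses and how concentrated it is."""
    counts = pd.crosstab(df["Brand"], df["Dosage Form"])  # rows = brands, cols = dosage forms
    result = pd.DataFrame({
        "distinct_forms": (counts > 0).sum(axis=1),
        "top_form": counts.idxmax(axis=1),
        "top_form_share": counts.max(axis=1) / counts.sum(axis=1),
    })
    return result.sort_values("distinct_forms", ascending=False)

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "Brand": ["A", "A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Dosage Form": ["Tablet", "Tablet", "Tablet", "Capsule",
                    "Tablet", "Tablet",
                    "Oral solution", "Oral solution", "Tablet", "Capsule"],
})
print(dosage_form_breadth(toy))
```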

**Additional Insights (if data were more granular):**
* **Market Trends:** Analyzing the distribution of dosage forms over time could reveal market trends or changes in patient preferences.
* **Therapeutic Area Analysis:** If the data includes information on the therapeutic areas for each brand, it would be possible to identify dosage forms commonly used in specific medical conditions.
* **Regulatory Factors:** Understanding the regulatory landscape for different dosage forms could provide insights into the factors driving their popularity or limitations.

**Overall, the chart provides valuable information about the dosage form preferences of different brands, which can be used to understand market trends, identify opportunities for product development, and inform marketing strategies.**


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Dosage Forms Distribution by Brand" chart can help create a positive business impact.**

By understanding the dosage form preferences of different brands, businesses can:

* **Optimize product portfolios:** Identifying the most popular dosage forms can help companies focus on developing and promoting products that meet market demand.
* **Identify niche opportunities:** Understanding the usage of less common dosage forms can reveal potential niche markets where businesses can differentiate themselves.
* **Improve marketing strategies:** Tailoring marketing messages to emphasize the benefits of specific dosage forms can enhance product appeal.
* **Inform regulatory compliance:** Knowledge of dosage form trends can help businesses anticipate regulatory changes and ensure compliance.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on popular dosage forms:** If a business becomes overly focused on the most popular dosage forms without considering other factors (e.g., therapeutic needs, regulatory requirements), it could miss out on opportunities in niche markets.
* **Ignoring regulatory changes:** Failing to adapt to changes in dosage form regulations could lead to product recalls, fines, or loss of market share.
* **Neglecting patient preferences:** Overlooking the preferences of patients for certain dosage forms could result in decreased customer satisfaction and market share.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need to meet market demand with other business objectives.**


#### Chart - 9

In [None]:
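# 9. Top 10 Vendors by Freight Cost
# Coerce to numeric so any non-numeric notes in the column become NaN and are skipped by sum()
freight = pd.to_numeric(df['Freight Cost (USD)'], errors='coerce')
top_vendors_freight = freight.groupby(df['Vendor']).sum().nlargest(10)
plt.figure(figsize=(10,6))
# Sort ascending so the largest bar is drawn at the top of the horizontal chart
top_vendors_freight.sort_values().plot(kind='barh', color='orange')
plt.title('Top 10 Vendors by Freight Cost')
plt.xlabel('Total Freight Cost (USD)')
plt.ylabel('Vendor')
plt.tight_layout()
plt.show()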

##### 1. Why did you pick the specific chart?

Graph Type: Horizontal bar chart

Reason: Identifies the vendors contributing the most to freight costs, which helps in understanding logistics costs associated with each vendor.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Top 10 Vendors by Freight Cost" Chart

**1. Vendor Dominance:**
* **SCMS from RDC** is the clear leader in terms of freight costs, significantly outspending other vendors.
* **Orgenics, Ltd** also stands out with a notably higher freight cost compared to most of the other vendors.

**2. Clustering:**
* **Mid-Tier Vendors:** The majority of vendors seem to fall within a similar range of freight costs, suggesting a relatively competitive landscape.

**3. Variance:**
* **Individual Variations:** While there is a general clustering, there are still variations among the vendors within the mid-tier, indicating differences in shipping practices, product characteristics, or supply chain strategies.

**Additional Insights (with further breakdowns):**
* **Product-Level Analysis:** Analyzing freight costs at the product level could reveal which products or categories contribute most to the overall costs.
* **Shipping Mode Analysis:** Breaking freight costs down by the Shipment Mode column (e.g., air, ocean, truck) could show how mode choice drives each vendor's costs.
* **Geographic Considerations:** Breaking freight costs down by Country could identify destinations with higher transportation costs.

**Overall, the chart highlights the significant disparity in freight costs between the top vendors and the rest of the group. Further analysis could uncover the underlying reasons for this variation and identify opportunities for cost reduction or negotiation.**
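Total freight cost mixes two things: how expensive a vendor's shipments are and how much the vendor ships. Dividing by shipped weight separates them, which matters before using these totals as leverage in rate negotiations. A minimal sketch, assuming the notebook's `Vendor`, `Freight Cost (USD)` and `Weight (Kilograms)` columns; non-numeric entries are coerced to NaN and dropped, and the helper name and toy rows are made up.

```python
import pandas as pd

def vendor_freight_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: total freight, total weight and freight per kg for each vendor."""
    frame = pd.DataFrame({
        "Vendor": df["Vendor"],
        "freight": pd.to_numeric(df["Freight Cost (USD)"], errors="coerce"),
        "weight": pd.to_numeric(df["Weight (Kilograms)"], errors="coerce"),
    }).dropna()
    frame = frame[frame["weight"] > 0]  # avoid division by zero
    totals = frame.groupby("Vendor")[["freight", "weight"]].sum()
    totals["freight_per_kg"] = totals["freight"] / totals["weight"]
    # Share of freight among rows that have both a numeric cost and a numeric weight
    totals["share_of_total_freight"] = totals["freight"] / totals["freight"].sum()
    return totals.sort_values("freight", ascending=False)

# Made-up toy data; the text entries stand in for non-numeric notes
toy = pd.DataFrame({
    "Vendor": ["V1", "V1", "V2", "V3", "V3"],
    "Freight Cost (USD)": [1000, 500, 300, "non-numeric note", 200],
    "Weight (Kilograms)": [100, 50, 10, 20, "non-numeric note"],
})
print(vendor_freight_profile(toy))
```

In the toy data V1 has the larger total but V2 is three times more expensive per kilogram, which is the kind of difference a total-only chart hides.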


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Top 10 Vendors by Freight Cost" chart can help create a positive business impact.**

By understanding the significant disparity in freight costs between the top vendors and the rest of the group, businesses can:

* **Negotiate better rates:** Identifying vendors with lower freight costs can provide leverage for negotiating better rates with existing suppliers.
* **Optimize shipping practices:** Analyzing the factors contributing to higher freight costs can lead to improvements in shipping methods, packaging, or routing.
* **Explore alternative suppliers:** If freight costs are excessively high, businesses can consider sourcing from alternative vendors with more competitive rates.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on cost reduction:** If a business becomes solely focused on reducing freight costs without considering other factors (e.g., delivery time, product quality), it could lead to compromised service levels or increased product damage.
* **Ignoring supplier relationships:** Prioritizing cost reduction over supplier relationships could damage long-term partnerships and lead to supply chain disruptions.
* **Neglecting product characteristics:** Failing to consider the impact of product characteristics (e.g., size, weight, fragility) on freight costs could result in suboptimal shipping solutions.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need for cost reduction with other business objectives.**


#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Count shipments (rows with an ID) for each Country / Shipment Mode pair, one column per mode
c_S = df.groupby(['Country', 'Shipment Mode'])['ID'].count().unstack(fill_value=0)
c_S.plot(kind='bar', stacked=True, figsize=(14, 6))
plt.xlabel('Country')
plt.ylabel('Number of Shipments')
plt.title('Shipments by Country and Shipment Mode')
plt.legend(title='Shipment Mode')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a stacked bar chart to visualize shipment counts by country and shipment mode because it shows both the total number of shipments per country (bar height) and how that total splits across shipment modes (colored segments).

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Country vs Shipment Mode" Chart

**1. Shipment Mode Preferences:**
* **Air:** Air is the most commonly used shipment mode across most countries, indicating a preference for speed or urgency.
* **Ocean:** Ocean shipping is used less frequently, suggesting it might be preferred for larger shipments or lower-value goods.
* **Truck:** Truck shipping is used in some countries, likely for regional or domestic shipments.

**2. Country-Specific Preferences:**
* **Diverse Usage:** Some countries exhibit a more diverse use of shipment modes, while others have a strong preference for a particular mode.
* **Regional Factors:** Geographic location, infrastructure, and trade relationships might influence these preferences.

**3. Shipment Volume by Country:**
* **Varied Across Countries:** Total bar height (the number of shipments) varies significantly across countries, suggesting different market sizes or demand levels in each region.
* **No Segment Breakdown:** The chart groups shipments only by country and shipment mode (the colors are shipment modes), so it cannot show the preferences of individual customer segments.

**4. Regional Trends:**
* **Clustering:** The chart might reveal regional trends, such as a preference for air shipping in certain regions or a higher use of ocean shipping in others.
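In a stacked count chart, countries with many shipments dominate the axis, so the mode mix of smaller countries is hard to compare. Row-normalized shares make the "Diverse Usage" point checkable. A minimal sketch, assuming the notebook's `Country` and `Shipment Mode` columns; the helper name and toy rows are hypothetical.

```python
import pandas as pd

def mode_share_by_country(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dominant shipment mode per country and how dominant it is."""
    # normalize="index" turns each country's counts into shares that sum to 1
    shares = pd.crosstab(df["Country"], df["Shipment Mode"], normalize="index")
    summary = pd.DataFrame({
        "dominant_mode": shares.idxmax(axis=1),
        "dominant_share": shares.max(axis=1),
        "modes_used": (shares > 0).sum(axis=1),
    })
    return summary.sort_values("dominant_share", ascending=False)

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "Country": ["X"] * 4 + ["Y"] * 2 + ["Z"] * 4,
    "Shipment Mode": ["Air", "Air", "Air", "Truck",
                      "Truck", "Truck",
                      "Air", "Ocean", "Truck", "Truck"],
})
print(mode_share_by_country(toy))
```

Countries near the top rely almost entirely on one mode; countries near the bottom use a more even mix.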

**Additional Insights (with further breakdowns):**
* **Product-Level Analysis:** Breaking the data down by Product Group could reveal whether certain products are more likely to be shipped using specific modes.
* **Pricing Considerations:** Comparing freight costs across shipment modes could show how cost drives mode preferences.
* **Time Sensitivity:** Comparing delivery times across shipment modes could reveal why certain modes are preferred in specific regions.

**Overall, the chart provides valuable information about shipment mode preferences and shipment volumes across different countries, which can be used to optimize shipping strategies, improve customer satisfaction, and identify opportunities for cost reduction.**


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Country vs Shipment Mode" chart can help create a positive business impact.**

By understanding shipment mode preferences and shipment volumes across different countries, businesses can:

* **Optimize shipping strategies:** Tailoring shipping methods to the specific needs of different regions and customers can improve delivery times, reduce costs, and enhance customer satisfaction.
* **Identify market opportunities:** Understanding shipment volumes in different countries can help businesses identify new market opportunities or expand into regions with high potential.
* **Improve customer service:** By aligning shipping options with customer expectations, businesses can enhance their overall customer experience.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on cost reduction:** If a business becomes solely focused on reducing shipping costs without considering other factors (e.g., delivery time, customer satisfaction), it could lead to compromised service levels and lost customers.
* **Ignoring regional preferences:** Failing to adapt shipping strategies to regional preferences could result in lower customer satisfaction and decreased market share.
* **Neglecting customer segment needs:** Ignoring the specific needs of different customer segments could lead to dissatisfaction and lost business.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need for cost-effective shipping with other business objectives.**


#### Chart - 11

In [None]:
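# 11. Top 10 Products by Shipment Volume (Horizontal Bar Chart)
# "Volume" here is the summed Line Item Quantity per item description
top_products = df.groupby('Item Description')['Line Item Quantity'].sum().nlargest(10)
plt.figure(figsize=(10,6))
# Sort ascending so the product with the largest quantity appears at the top
top_products.sort_values().plot(kind='barh', color='teal')
plt.title('Top 10 Products by Shipment Volume')
plt.xlabel('Total Line Item Quantity')
plt.ylabel('Product')
plt.tight_layout()  # leave room for long item descriptions
plt.show()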

##### 1. Why did you pick the specific chart?

A horizontal bar chart of the top 10 products by total shipped quantity (summed Line Item Quantity) shows which items are shipped in the largest amounts, which helps in demand analysis. Horizontal bars also leave room for the long item descriptions.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Top 10 Products by Shipment Volume" Chart

**1. Product Dominance:**
* **Lamivudine/Nevirapine/Zidovudine** combination is the clear leader in shipment volume, indicating strong demand or market share.
* **Lamivudine**-based products are well-represented in the top 10, suggesting their popularity or effectiveness in certain therapeutic areas.

**2. Combination Therapies:**
* **Prevalence:** Many of the top products are combination therapies, suggesting a preference for multi-drug regimens in treating specific conditions.

**3. Dosage Variations:**
* **Multiple Formulations:** Different dosage strengths and tablet counts for the same product are present, indicating variations in treatment regimens or patient needs.

**Overall, the chart highlights the dominance of certain product categories and the preference for combination therapies in the market. Further analysis could uncover the reasons for this dominance, including factors like therapeutic effectiveness, pricing, and regulatory approvals.**
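Ranking by summed quantity favors products shipped in a few very large orders, which is not the same as being shipped most often. A quick way to check is to put both rankings side by side. A minimal sketch, assuming the notebook's `Item Description` and `Line Item Quantity` columns; the helper name and toy rows are hypothetical.

```python
import pandas as pd

def top_products(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: compare ranking by total quantity with ranking by number of line items."""
    g = df.groupby("Item Description")["Line Item Quantity"]
    summary = pd.DataFrame({"total_quantity": g.sum(), "line_items": g.size()})
    # Rank 1 = largest; method="min" gives tied products the same (best) rank
    summary["quantity_rank"] = summary["total_quantity"].rank(ascending=False, method="min").astype(int)
    summary["line_item_rank"] = summary["line_items"].rank(ascending=False, method="min").astype(int)
    return summary.nlargest(n, "total_quantity")

# Made-up toy data: P1 is one huge order, P2 is many small ones
toy = pd.DataFrame({
    "Item Description": ["P1", "P2", "P2", "P2", "P3", "P3"],
    "Line Item Quantity": [50000, 100, 200, 300, 1000, 1000],
})
print(top_products(toy, n=3))
```

A product with a high quantity rank but a low line-item rank points to bulk orders rather than steady, frequent demand.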


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Top 10 Products by Shipment Volume" chart can help create a positive business impact.**

By understanding the product dominance and market trends, businesses can:

* **Focus on high-demand products:** Prioritizing the development and promotion of top-selling products can drive sales and increase market share.
* **Identify growth opportunities:** Analyzing the reasons for the success of certain products can reveal opportunities for developing similar or complementary products.
* **Optimize product portfolios:** Understanding the demand for different dosage forms and combination therapies can help businesses optimize their product offerings.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on top-selling products:** If a business becomes overly focused on the most popular products without considering market changes or emerging trends, it could miss out on new opportunities.
* **Ignoring product diversity:** Relying too heavily on a single product category can make a business vulnerable to market fluctuations or regulatory changes.
* **Neglecting product innovation:** Failing to invest in research and development to introduce new products or improve existing ones could lead to market share erosion.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need to focus on high-demand products with other business objectives, such as product innovation and diversification.**


#### Chart - 12

In [None]:

# 12. Plot product group distribution
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Product Group', palette='magma')
plt.title('Distribution of Product Groups')
plt.xlabel('Product Group')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart (a seaborn countplot) to visualize the "Distribution of Product Groups" data because it is well suited to comparing the frequency of different product groups.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Distribution of Product Groups" Chart

**1. Dominance of ARV:**
* **Overwhelming Majority:** ARV (Antiretroviral) products significantly outnumber other product groups, indicating their dominance in the dataset.

**2. Niche Product Groups:**
* **Smaller Counts:** HRDT (HIV Rapid Diagnostic Tests), ACT (Artemisinin-based Combination Therapies), MRDT (Malaria Rapid Diagnostic Tests), and ANTM (Antimicrobials) have considerably smaller counts, suggesting they are less prevalent or represent specific market segments.

**Overall, the chart highlights the dominance of ARV products in the dataset, while the other product groups appear to be niche or less frequently represented.**
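"Overwhelming majority" is easier to act on as a percentage. A minimal sketch that turns the counts behind the countplot into shares, assuming the notebook's `Product Group` column; the helper name and the ten toy rows are made up and do not reflect the real proportions.

```python
import pandas as pd

def group_shares(df: pd.DataFrame, column: str = "Product Group") -> pd.DataFrame:
    """Hypothetical helper: row count and percentage share for each category."""
    counts = df[column].value_counts()  # sorted from most to least frequent
    return pd.DataFrame({
        "count": counts,
        "percent": (counts / counts.sum() * 100).round(1),
    })

# Made-up toy data for illustration only
toy = pd.DataFrame({"Product Group": ["ARV"] * 8 + ["HRDT", "ACT"]})
print(group_shares(toy))
```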


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Distribution of Product Groups" chart can help create a positive business impact.**

By understanding the dominance of ARV products and the niche nature of other groups, businesses can:

* **Focus on high-demand products:** Prioritizing ARV products can drive sales and increase market share.
* **Identify niche opportunities:** Exploring the potential for growth in the smaller product groups can reveal niche market opportunities.
* **Optimize product portfolios:** Understanding the distribution of product groups can help businesses balance their product offerings and avoid over-reliance on a single category.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on ARV products:** If a business becomes overly focused on ARV products without considering market changes or emerging trends, it could miss out on opportunities in other product groups.
* **Ignoring niche markets:** Neglecting to invest in niche product groups could limit growth potential and reduce market diversity.
* **Failing to adapt to market changes:** If the market dynamics shift, and the demand for ARV products declines, businesses that are heavily reliant on this category could face challenges.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need to focus on high-demand products with other business objectives, such as product diversification and innovation.**


#### Chart - 13

In [None]:
# 13. Insurance Cost (USD) by Product Group
plt.figure(figsize=(12,6))
df.groupby('Product Group')['Line Item Insurance (USD)'].sum().sort_values(ascending=False).plot(kind='bar', color='magenta')
plt.title('Insurance Cost (USD) by Product Group')
plt.xlabel('Product Group')
plt.ylabel('Total Insurance Cost (USD)')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

- Graph Type: Bar chart

- Reason: Highlights the total insurance cost per product group, allowing an understanding of which product categories incur the most insurance costs.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the "Insurance Cost (USD) by Product Group" Chart

**1. Dominance of ARV:**
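* **Highest Cost:** ARV products account for the majority of insurance costs. Since ARV products also dominate the shipment count (Chart 12), this total largely reflects volume; on its own it does not show whether individual ARV shipments are higher-value or riskier.

**2. Niche Product Groups:**
* **Lower Costs:** HRDT, ACT, ANTM, and MRDT have significantly lower total insurance costs, which is consistent with their much smaller shipment counts rather than necessarily with lower value or risk.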

**Overall, the chart highlights the dominance of ARV products in terms of insurance costs, while the other product groups have relatively lower associated costs.**
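To tell volume apart from value or risk, insurance can be expressed as a percentage of the insured line-item value for each product group. A minimal sketch, assuming the notebook's `Product Group`, `Line Item Value` and `Line Item Insurance (USD)` columns; rows with a missing or non-numeric value are dropped, and the helper name and toy rows are hypothetical.

```python
import pandas as pd

def insurance_rate_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: total insurance and insurance as % of line-item value per group."""
    frame = pd.DataFrame({
        "group": df["Product Group"],
        "value": pd.to_numeric(df["Line Item Value"], errors="coerce"),
        "insurance": pd.to_numeric(df["Line Item Insurance (USD)"], errors="coerce"),
    }).dropna()
    totals = frame.groupby("group")[["value", "insurance"]].sum()
    totals["insurance_pct_of_value"] = totals["insurance"] / totals["value"] * 100
    return totals.sort_values("insurance", ascending=False)

# Made-up toy data: ARV has the larger total, HRDT the higher rate
toy = pd.DataFrame({
    "Product Group": ["ARV", "ARV", "HRDT"],
    "Line Item Value": [10000, 20000, 1000],
    "Line Item Insurance (USD)": [10, 20, 2],
})
print(insurance_rate_by_group(toy))
```

A group can lead the total-cost chart while having an ordinary or even lower rate per dollar insured, so the rate column is the better input for pricing and risk decisions.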


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Yes, the insights gained from the "Insurance Cost (USD) by Product Group" chart can help create a positive business impact.**

By understanding the dominance of ARV products in terms of insurance costs, businesses can:

* **Prioritize risk management:** Focus on risk management strategies for ARV products to mitigate potential financial losses.
* **Optimize pricing:** Set appropriate pricing for ARV products to account for the higher insurance costs.
* **Identify cost-saving opportunities:** Explore ways to reduce insurance costs for ARV products, such as implementing risk mitigation measures or negotiating with insurers.

**However, it's important to note that some insights might lead to negative growth if not addressed properly:**

* **Overemphasis on ARV products:** If a business becomes overly focused on ARV products without considering other product groups, it could miss out on opportunities in niche markets.
* **Neglecting other product groups:** Failing to address the insurance needs of other product groups could lead to increased costs and potential risks.
* **Ignoring market changes:** If the market dynamics shift, and the demand for ARV products declines, businesses that are heavily reliant on this category could face challenges.

**Therefore, it's crucial to analyze the insights holistically and implement strategies that balance the need to manage insurance costs for ARV products with other business objectives, such as product diversification and risk mitigation.**


#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Select numerical columns for correlation analysis
numeric_columns = df.select_dtypes(include=[np.number]).columns

# Calculate correlation matrix
corr_matrix = df[numeric_columns].corr()

# Plotting the heatmap
plt.figure(figsize=(12,8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5, annot_kws={"size": 10})
plt.title('Correlation Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is an effective way to visualize the relationships between numerical features in the dataset: each cell shows the Pearson correlation coefficient (from -1 to +1) between two columns.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the Correlation Heatmap

**Strong Positive Correlations:**
* **Line Item Quantity, Line Item Value, Weight (Kilograms), Freight Cost (USD), Line Item Insurance (USD):** These variables show strong positive correlations, indicating that as one increases, the others tend to increase as well.
* **Line Item Quantity, Line Item Value:** These two variables have a very strong positive correlation, suggesting that larger quantities often correspond to higher values.

**Strong Negative Correlations:**
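* **Delivery Time, Delivery Delay:** These variables have a strong negative correlation, which means that longer delivery times tend to go with smaller delivery delays (or earlier-than-scheduled arrivals). Because this is counterintuitive, it is worth confirming how Delivery Time is calculated before acting on it.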

**Other Notable Correlations:**
* **Pack Price, Unit Price:** There's a moderate positive correlation between these two variables, suggesting that higher pack prices often correspond to higher unit prices.

**Overall, the heatmap reveals that several variables, particularly those related to order size and value, are closely correlated. Understanding these relationships can help businesses optimize pricing, shipping, and insurance strategies.**
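Reading strong pairs off a large annotated heatmap is error-prone, so the same matrix can be listed as ranked pairs. A minimal sketch that, like the heatmap cell, uses pandas' default Pearson correlation on numeric columns only; the function name `top_correlations` and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd

def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Hypothetical helper: the n column pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes(include=[np.number]).corr()
    cols = list(corr.columns)
    # Visit each unordered pair once and skip the diagonal (self-correlation = 1)
    pairs = {
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if pd.notna(corr.loc[a, b])
    }
    ranked = pd.Series(pairs, dtype=float).sort_values(key=lambda v: v.abs(), ascending=False)
    return ranked.head(n)

# Made-up toy data; "name" is non-numeric and is ignored
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 0, 1, 0, 1],
    "name": ["p", "q", "r", "s", "t"],
})
print(top_correlations(toy, n=3))
```

Sorting by absolute value keeps strong negative pairs, such as the Delivery Time and Delivery Delay pair discussed above, next to the strong positive ones.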


#### Chart - 15 - Pair Plot

In [None]:
# 15. pair plot of variables
sns.pairplot(df)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot (also known as a scatterplot matrix) is a powerful tool for visualizing relationships between pairs of variables in a dataset. It helps you understand the distribution of individual features and the relationships between any two variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?

## Insights from the Pair Plot

**General Observations:**
* **Linear Relationships:** Several pairs of variables show linear relationships, suggesting a strong correlation.
* **Scattered Patterns:** Other pairs exhibit more scattered patterns, indicating weaker or no correlations.
* **Outliers:** Some plots have outliers that might influence the correlation analysis.

**Specific Relationships (based on visual inspection):**
* **Positive Correlations:**
    * Line Item Quantity vs. Line Item Value
    * Weight (Kilograms) vs. Freight Cost (USD)
    * Weight (Kilograms) vs. Line Item Insurance (USD)
* **Negative Correlations:**
    * Delivery Time vs. Delivery Delay (consistent with the correlation heatmap)

**Further Analysis:**
* **Quantifying Correlations:** To get a more precise understanding of the relationships, numerical correlation coefficients can be calculated.
* **Handling Outliers:** If outliers are deemed influential, they can be addressed using appropriate techniques (e.g., removal, transformation).

**Overall, the pair plot provides a visual overview of the relationships between different variables in the dataset. It helps identify potential correlations that can be further explored and quantified using statistical methods.**
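The two follow-up steps (quantifying correlations and handling outliers) can be combined in one check: compare Pearson correlation on raw values, Spearman rank correlation, and Pearson on log-transformed values. A minimal sketch; Spearman is computed as the Pearson correlation of ranks so no extra library is needed, the log step assumes non-negative columns such as weight, cost or quantity (it does not suit Delivery Delay, which can be negative), and the helper name and toy series are made up.

```python
import numpy as np
import pandas as pd

def correlation_checks(x: pd.Series, y: pd.Series) -> dict:
    """Hypothetical helper: raw Pearson vs. outlier-robust alternatives for one pair of columns."""
    pair = pd.DataFrame({
        "x": pd.to_numeric(x, errors="coerce"),
        "y": pd.to_numeric(y, errors="coerce"),
    }).dropna()
    return {
        "pearson": pair["x"].corr(pair["y"]),
        # Spearman = Pearson on ranks; insensitive to how extreme an outlier is
        "spearman": pair["x"].rank().corr(pair["y"].rank()),
        # log1p compresses large values; only meaningful for non-negative data
        "pearson_log1p": np.log1p(pair["x"]).corr(np.log1p(pair["y"])),
    }

# Made-up toy data: a perfectly monotonic relationship with one extreme outlier
x = pd.Series(range(1, 11), dtype=float)
y = x.copy()
y.iloc[-1] = 1000.0
checks = correlation_checks(x, y)
print({k: round(v, 3) for k, v in checks.items()})
```

When the raw Pearson value is much lower than the Spearman value, a few outliers are likely distorting the linear correlation, which is the situation the pair plot flags.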


## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


### Solutions to Business Objectives:

1. **Cost Optimization**:
   - Use freight cost and insurance data to negotiate better vendor rates and optimize shipping routes for high-cost products.

2. **Shipment Mode Efficiency**:
   - Analyze delivery times by shipment mode to prioritize faster or more cost-effective transportation options.

3. **Delivery Time Optimization**:
   - Identify vendors or regions with consistent delays and enforce stricter tracking to reduce late deliveries.

4. **Product and Vendor Performance**:
   - Focus on top-performing vendors and ensure high-demand products are always in stock to avoid delays.

5. **Profitability**:
   - Adjust pricing strategies based on value-to-weight ratios and implement tiered shipping charges for more profitable shipments.

6. **Customer Satisfaction**:
   - Improve on-time delivery and provide real-time updates to customers about potential delays.

7. **Inventory and Fulfillment**:
   - Optimize fulfillment methods based on shipment size and implement better inventory management for high-demand products.

8. **Sustainability**:
   - Shift to eco-friendly shipment modes and optimize packaging to reduce environmental impact.
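The Delivery Time Optimization recommendation (identify vendors or regions with consistent delays) can start from a late-delivery rate per group. A minimal sketch, assuming the notebook's `Scheduled Delivery Date` and `Delivered to Client Date` columns; the grouping column defaults to `Country` but could be `Vendor`, and the `min_shipments` threshold (to avoid ranking groups with only a handful of shipments), the helper name, and the toy rows are all hypothetical choices.

```python
import pandas as pd

def late_rate_by(df: pd.DataFrame, by: str = "Country", min_shipments: int = 20) -> pd.DataFrame:
    """Hypothetical helper: share of late deliveries and median delay per group."""
    delay = (pd.to_datetime(df["Delivered to Client Date"], errors="coerce")
             - pd.to_datetime(df["Scheduled Delivery Date"], errors="coerce")).dt.days
    frame = pd.DataFrame({by: df[by], "delay": delay}).dropna()
    g = frame.groupby(by)["delay"]
    summary = pd.DataFrame({
        "shipments": g.size(),
        "late_rate": g.apply(lambda d: (d > 0).mean()),
        "median_delay_days": g.median(),
    })
    # Drop small groups whose late rate would be unreliable
    summary = summary[summary["shipments"] >= min_shipments]
    return summary.sort_values("late_rate", ascending=False)

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C"],
    "Scheduled Delivery Date": ["2007-01-01"] * 6,
    "Delivered to Client Date": ["2007-01-06", "2007-01-01", "2006-12-31",
                                 "2007-01-11", "2007-01-04", "2007-02-20"],
})
print(late_rate_by(toy, by="Country", min_shipments=2))
```

Groups at the top of this table, with enough shipments to be meaningful, are the natural candidates for stricter tracking.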

# **Conclusion**

### Conclusion

By analyzing the FedEx Logistics data through targeted visualizations, we identified key opportunities for optimizing cost, improving delivery performance, enhancing customer satisfaction, and increasing overall operational efficiency. The insights highlight areas such as shipment mode selection, vendor performance, product profitability, and delivery accuracy as critical levers for improvement. Implementing solutions like better vendor negotiation, dynamic fulfillment, and smarter pricing strategies can help FedEx streamline logistics operations while maintaining high service levels and minimizing costs. This data-driven approach should help FedEx continue to meet customer demands while driving profitability and sustainability.
