In [None]:
# Supply Chain Risk & Profitability Analysis (SQL)

## Business Problem
Supply chains often prioritise cost and profitability without fully accounting
for operational risk. This project analyses supplier performance by combining
financial metrics with quality and reliability indicators to identify
high-profit but high-risk suppliers.

## Objective
Use SQL to identify suppliers that contribute significantly to gross profit
while presenting elevated operational risk, enabling more informed sourcing
and risk mitigation decisions.

## Technical Skills Demonstrated
- SQL aggregations (SUM, AVG)
- CASE WHEN logic
- GROUP BY aggregation at supplier level
- Dynamic KPI creation (supplier risk rate)
- NULL handling and data cleaning
- Translating business questions into SQL queries


In [6]:
import pandas as pd
import sqlite3
from IPython.display import display


# Load the cleaned CSV (Kaggle input path; change it when running elsewhere)
path = "/kaggle/input/supplychain/supply_chain_clean.csv"
df = pd.read_csv(path)
display(df.head())  # preview the first rows

# Normalise column names: trim, lowercase, drop "£", "%" -> "pct",
# spaces and hyphens -> "_"
df.columns = (
    df.columns
      .str.strip()
      .str.lower()
      .str.replace("£", "", regex=False)
      .str.replace("%", "pct", regex=False)
      .str.replace(" ", "_", regex=False)
      .str.replace("-", "_", regex=False)
)
print(df.columns)

# Monetary columns may be stored as text such as "£1,234.50":
# strip the symbols, then convert (unparseable values become NaN)
money_cols = [
    "price", "revenue_generated", "shipping_costs", "manufacturing_costs",
    "costs", "total_fulfilment_cost", "gross_profit"
]

for c in money_cols:
    if c in df.columns:
        cleaned = (
            df[c].astype(str)
                 .str.replace("£", "", regex=False)
                 .str.replace(",", "", regex=False)
        )
        df[c] = pd.to_numeric(cleaned, errors="coerce")

# Defect rates may be stored as text such as "4.85%": strip the sign and
# keep the value in percent (4.85, not 0.0485)
if "defect_rates" in df.columns:
    df["defect_rates"] = pd.to_numeric(
        df["defect_rates"].astype(str).str.replace("%", "", regex=False),
        errors="coerce",
    )

df.describe(include="all").transpose().head(20)


Index(['product_type', 'sku', 'price', 'availability',
       'number_of_products_sold', 'revenue_generated', 'stock_levels',
       'lead_times', 'order_quantities', 'shipping_times', 'shipping_carriers',
       'shipping_costs', 'supplier_name', 'location', 'production_volumes',
       'manufacturing_costs', 'inspection_results', 'defect_rates',
       'transportation_modes', 'costs', 'total_fulfilment_cost',
       'gross_profit', 'margin_pct', 'on_time_flag', 'quality_pass_flag'],
      dtype='object')
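The cleaning steps above can be mirrored in plain Python to see what each transformation does to a single value. This is a minimal sketch: `normalise_column` and `parse_number` are hypothetical helpers (not part of the notebook), the sample column names are illustrative, and `parse_number` simplifies the notebook by stripping `£`, `,` and `%` in one pass, returning NaN on failure in the spirit of `pd.to_numeric(..., errors="coerce")`.

```python
import math


def normalise_column(name: str) -> str:
    """Mirror the pandas .str chain: trim, lowercase, drop '£',
    '%' -> 'pct', spaces and hyphens -> '_'."""
    name = name.strip().lower()
    name = name.replace("£", "").replace("%", "pct")
    return name.replace(" ", "_").replace("-", "_")


def parse_number(value):
    """Strip '£', ',' and '%' and convert to float.
    Returns NaN when the text cannot be parsed (like errors="coerce")."""
    text = str(value).replace("£", "").replace(",", "").replace("%", "").strip()
    try:
        return float(text)
    except ValueError:
        return math.nan


print(normalise_column(" Revenue generated "))  # revenue_generated
print(normalise_column("Margin %"))             # margin_pct
print(parse_number("£1,234.50"))                # 1234.5
print(parse_number("4.85%"))                    # 4.85
print(parse_number("n/a"))                      # nan
```

Note that the defect rate keeps its percent scale (4.85 rather than 0.0485), which is why later queries compare `defect_rates > 3` and divide by 100 when a fraction is needed.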

In [7]:
conn = sqlite3.connect("supply_chain.db")

df.to_sql("supply_chain", conn, if_exists="replace", index=False)

# quick check
pd.read_sql("SELECT COUNT(*) AS rows FROM supply_chain", conn)


|   | rows |
|---|------|
| 0 | 135  |
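`pd.to_numeric(..., errors="coerce")` turns unparseable strings into NaN, and `to_sql` stores those as SQL NULLs. SQLite's `AVG` skips NULLs, so averages such as `avg_defect_rate_pct` are computed over non-missing rows only. A minimal in-memory sketch with hypothetical rows (the `demo` table and its values are illustrative, not taken from the dataset):

```python
import sqlite3

# Hypothetical rows: one defect rate failed to parse and became NULL.
rows = [("Supplier A", 2.0), ("Supplier A", None), ("Supplier A", 4.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (supplier_name TEXT, defect_rates REAL)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)

# COUNT(*) counts every row; COUNT(col) and AVG(col) ignore NULLs.
total_rows, non_null, avg_defect = conn.execute(
    "SELECT COUNT(*), COUNT(defect_rates), AVG(defect_rates) FROM demo"
).fetchone()
print(total_rows, non_null, avg_defect)  # 3 2 3.0

# Treating missing values as 0 instead changes the average.
avg_zero_filled = conn.execute(
    "SELECT AVG(COALESCE(defect_rates, 0)) FROM demo"
).fetchone()[0]
print(avg_zero_filled)  # 2.0
```

Whether to skip or zero-fill missing values is a modelling choice; skipping (the default) avoids pulling averages toward zero.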


In [33]:
q = """
SELECT
  supplier_name,
  SUM(gross_profit) AS total_gross_profit,
  AVG(on_time_flag) AS on_time_rate,
  AVG(quality_pass_flag) AS quality_pass_rate,
  AVG(high_defect_flag) AS high_defect_rate,
  AVG(supplier_risk_flag) AS supplier_risk_rate
FROM supply_chain
GROUP BY supplier_name
ORDER BY total_gross_profit DESC;
"""
import sqlite3

conn = sqlite3.connect("supply_chain.db")

pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table';",
    conn
)

df.to_sql(
    "supply_chain",
    conn,
    if_exists="replace",
    index=False
)
pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table';",
    conn
)
pd.read_sql(
    "SELECT * FROM supply_chain LIMIT 5;",
    conn
)

q = """
SELECT
  supplier_name,
  SUM(gross_profit) AS total_gross_profit,
  AVG(on_time_flag) AS on_time_rate,
  AVG(quality_pass_flag) AS quality_pass_rate,
  AVG(CASE WHEN inspection_results = 'Fail' THEN 1.0 ELSE 0.0 END) AS inspection_fail_rate,
  AVG(defect_rates) AS avg_defect_rate_pct
FROM supply_chain
WHERE supplier_name IS NOT NULL
GROUP BY supplier_name
ORDER BY total_gross_profit DESC;
"""
pd.read_sql(q, conn)


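|   | supplier_name | total_gross_profit | on_time_rate | quality_pass_rate | inspection_fail_rate | avg_defect_rate_pct |
|---|---------------|--------------------|--------------|-------------------|----------------------|---------------------|
| 0 | Supplier 1 | 140637.33 | 0.37037 | 0.481481 | 0.222222 | 1.803333 |
| 1 | Supplier 2 | 113094.84 | 0.409091 | 0.227273 | 0.363636 | 2.361818 |
| 2 | Supplier 5 | 99785.0 | 0.333333 | 0.166667 | 0.388889 | 2.665 |
| 3 | Supplier 3 | 90037.64 | 0.6 | 0.133333 | 0.2 | 2.466 |
| 4 | Supplier 4 | 75843.92 | 0.5 | 0.0 | 0.666667 | 2.338889 |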
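Averaging a 1.0/0.0 `CASE` indicator yields the share of rows that meet the condition, which is how `inspection_fail_rate` is built. A minimal in-memory sketch with hypothetical rows (the `demo` table and its values are illustrative) that cross-checks the `AVG(CASE ...)` form against an explicit count ratio:

```python
import sqlite3

# Hypothetical inspection outcomes for two suppliers (illustrative only).
rows = [
    ("Supplier A", "Pass"), ("Supplier A", "Fail"),
    ("Supplier A", "Pending"), ("Supplier A", "Fail"),
    ("Supplier B", "Pass"), ("Supplier B", "Pass"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (supplier_name TEXT, inspection_results TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)

query = """
SELECT
  supplier_name,
  -- mean of a 1/0 indicator = share of rows with inspection_results = 'Fail'
  AVG(CASE WHEN inspection_results = 'Fail' THEN 1.0 ELSE 0.0 END) AS fail_rate,
  -- the same quantity computed as an explicit count ratio
  SUM(CASE WHEN inspection_results = 'Fail' THEN 1 ELSE 0 END) * 1.0 / COUNT(*)
    AS fail_rate_check
FROM demo
GROUP BY supplier_name
ORDER BY supplier_name;
"""
for name, rate, check in conn.execute(query):
    print(name, rate, check)
# Supplier A 0.5 0.5
# Supplier B 0.0 0.0
```

Two consequences are worth keeping in mind: rows with any other status (here `'Pending'`) stay in the denominator as non-failures, and SQLite compares text with the case-sensitive `BINARY` collation by default, so a value stored as `'fail'` would not be counted.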


## Insights

Supplier 1 (BEST)
- Highest gross profit
- Best quality pass rate (48%)
- Second-lowest inspection failure rate (22%)
- Lowest average defect rate (1.80%)

Preferred supplier

Supplier 3
- Best on-time performance (60%)
- Weak quality pass rate (13%)
- Lowest inspection failure rate (20%), but the second-highest average defect rate (2.47%)

Operationally reliable, quality improvement needed

Supplier 4 (WORST)
- Lowest gross profit
- 0% quality pass rate
- 67% inspection failure rate
- Mid-range average defect rate (2.34%)

High-risk supplier: audit or replace

Supplier performance analysis shows that high profitability does not always correlate with operational reliability. Supplier 1 demonstrates the strongest balance of profit and quality, while Supplier 4 exhibits consistently poor quality outcomes despite moderate on-time delivery.

In [34]:
q = """
SELECT
  supplier_name,
  SUM(gross_profit) AS total_gross_profit,
  AVG(on_time_flag) AS on_time_rate,
  AVG(quality_pass_flag) AS quality_pass_rate,
  AVG(defect_rates) AS avg_defect_rate,
  (
    AVG(on_time_flag) * 0.3 +
    AVG(quality_pass_flag) * 0.3 +
    (1 - AVG(defect_rates)/100.0) * 0.4
  ) AS supplier_score
FROM supply_chain
WHERE supplier_name IS NOT NULL
GROUP BY supplier_name
ORDER BY supplier_score DESC;
"""
pd.read_sql(q, conn)


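|   | supplier_name | total_gross_profit | on_time_rate | quality_pass_rate | avg_defect_rate | supplier_score |
|---|---------------|--------------------|--------------|-------------------|-----------------|----------------|
| 0 | Supplier 1 | 140637.33 | 0.37037 | 0.481481 | 1.803333 | 0.648342 |
| 1 | Supplier 3 | 90037.64 | 0.6 | 0.133333 | 2.466 | 0.610136 |
| 2 | Supplier 2 | 113094.84 | 0.409091 | 0.227273 | 2.361818 | 0.581462 |
| 3 | Supplier 4 | 75843.92 | 0.5 | 0.0 | 2.338889 | 0.540644 |
| 4 | Supplier 5 | 99785.0 | 0.333333 | 0.166667 | 2.665 | 0.53934 |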


## Supplier Score Analysis

This query evaluates supplier performance using a **weighted supplier score** that combines delivery reliability, quality outcomes, and defect risk. Total gross profit is returned alongside the score for context but is not an input to it.

### Methodology
Each supplier is scored using a weighted combination of key operational KPIs:

- **On-time delivery rate (30%)** - measures delivery reliability
- **Quality pass rate (30%)** - measures inspection and quality performance
- **Defect rate (40%)** - penalises suppliers with higher defect levels

The supplier score is calculated as:

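Supplier Score =  
(0.3 × On-Time Rate) + (0.3 × Quality Pass Rate) + (0.4 × (1 − Avg Defect Rate / 100))

where the average defect rate is stored in percent (e.g. 1.80 for 1.80%), so dividing by 100 converts it to a fraction.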

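Because profit is not part of the score, a less profitable supplier can outrank a more profitable one: Supplier 3 (£90,037.64 gross profit) ranks above Supplier 2 (£113,094.84). Note also that average defect rates only range from 1.80% to 2.67%, so the defect term varies between roughly 0.389 and 0.393; despite its 40% weight, it barely separates suppliers, and the ranking is driven mainly by the on-time and quality pass rates.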
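The score can be recomputed outside SQL as a sanity check. This is a minimal sketch, assuming the rounded KPI values displayed in the score table above (copied by hand into a hypothetical `kpis` dict); `supplier_score` is an illustrative helper rather than part of the notebook, and the alternative weights `(0.6, 0.2, 0.2)` are a hypothetical sensitivity check, not a recommendation.

```python
# KPI values copied from the query output (on_time_rate, quality_pass_rate,
# avg_defect_rate in percent), rounded as displayed.
kpis = {
    "Supplier 1": (0.37037, 0.481481, 1.803333),
    "Supplier 2": (0.409091, 0.227273, 2.361818),
    "Supplier 3": (0.6, 0.133333, 2.466),
    "Supplier 4": (0.5, 0.0, 2.338889),
    "Supplier 5": (0.333333, 0.166667, 2.665),
}


def supplier_score(on_time_rate, quality_pass_rate, avg_defect_rate_pct,
                   weights=(0.3, 0.3, 0.4)):
    """Weighted score; the defect rate is in percent, so divide by 100."""
    w_on_time, w_quality, w_defect = weights
    return (w_on_time * on_time_rate
            + w_quality * quality_pass_rate
            + w_defect * (1 - avg_defect_rate_pct / 100.0))


scores = {name: supplier_score(*vals) for name, vals in kpis.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.6f}")

# The defect term alone barely varies between suppliers.
defect_terms = [0.4 * (1 - d / 100.0) for _, _, d in kpis.values()]
print(f"defect term range: {min(defect_terms):.4f} to {max(defect_terms):.4f}")

# Sensitivity check with hypothetical on-time-heavy weights.
alt = {name: supplier_score(*vals, weights=(0.6, 0.2, 0.2))
       for name, vals in kpis.items()}
alt_ranking = sorted(alt, key=alt.get, reverse=True)
print("alternative ranking:", alt_ranking)
```

With the default weights the recomputed scores match the table to the displayed precision and give the same order (Suppliers 1, 3, 2, 4, 5). Under the on-time-heavy weights Supplier 3 moves to the top, so the ranking depends materially on the chosen weights.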

### Output
The query returns:
- Total gross profit per supplier
- On-time delivery rate
- Quality pass rate
- Average defect rate
- Final weighted supplier score

Suppliers are ranked from highest to lowest score to support data-driven supplier selection and risk management decisions.


In [45]:
q = """
SELECT
  supplier_name,
  SUM(gross_profit) AS total_gross_profit,
  AVG(
    CASE 
      WHEN inspection_results = 'Fail' OR defect_rates > 3 THEN 1.0 
      ELSE 0.0 
    END
  ) AS supplier_risk_rate
FROM supply_chain
WHERE supplier_name IS NOT NULL
GROUP BY supplier_name
ORDER BY total_gross_profit DESC;
"""
pd.read_sql(q, conn)



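|   | supplier_name | total_gross_profit | supplier_risk_rate |
|---|---------------|--------------------|--------------------|
| 0 | Supplier 1 | 140637.33 | 0.444444 |
| 1 | Supplier 2 | 113094.84 | 0.545455 |
| 2 | Supplier 5 | 99785.0 | 0.5 |
| 3 | Supplier 3 | 90037.64 | 0.466667 |
| 4 | Supplier 4 | 75843.92 | 0.722222 |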


## Supplier Risk Rate Analysis

This analysis calculates a supplier risk rate based on operational quality
signals rather than relying on precomputed flags.

### Risk Definition
A record (one row of the dataset) is flagged as high risk if:
- The inspection result is recorded as **Fail**, or
- The defect rate exceeds **3%**

### Metric
Supplier Risk Rate =  
(Number of high-risk records) / (Total records per supplier)
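The row-level flag has two edge cases worth checking: the threshold is strict (`> 3`), and a NULL defect rate makes `defect_rates > 3` evaluate to NULL, so the `CASE` falls through to `ELSE` and the record counts as not risky while still sitting in the denominator. A minimal in-memory sketch with hypothetical rows (the `demo` table and values are illustrative) using the same expression as the query:

```python
import sqlite3

# Hypothetical rows: (supplier, inspection result, defect rate in percent).
rows = [
    ("Supplier A", "Fail", 1.0),   # flagged: inspection failed
    ("Supplier A", "Pass", 3.5),   # flagged: defect rate above 3%
    ("Supplier A", "Pass", 3.0),   # not flagged: threshold is strictly > 3
    ("Supplier A", "Pass", None),  # not flagged: NULL > 3 is NULL, CASE uses ELSE
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (supplier_name TEXT, inspection_results TEXT, defect_rates REAL)"
)
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)", rows)

query = """
SELECT
  supplier_name,
  AVG(CASE WHEN inspection_results = 'Fail' OR defect_rates > 3 THEN 1.0 ELSE 0.0 END)
    AS supplier_risk_rate
FROM demo
GROUP BY supplier_name;
"""
print(conn.execute(query).fetchall())  # [('Supplier A', 0.5)]
```

If missing defect rates should not dilute the rate, one option is to exclude those rows with `WHERE defect_rates IS NOT NULL` before aggregating.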

### Purpose
This metric highlights suppliers that may threaten supply chain reliability
despite contributing to overall profitability.

## Key Findings

- Supplier 1 demonstrates the strongest balance between profitability and
  operational reliability, combining the highest gross profit with the
  lowest supplier risk rate (44%).
- Supplier 4 has the highest supplier risk rate (72%), driven mainly by
  frequent inspection failures (67% of its records fail inspection).
- High profit does not guarantee low risk: Supplier 2, the second most
  profitable supplier, has a risk rate of 55%, and even Supplier 1 flags
  44% of its records, so profit alone is not a sufficient supplier
  selection metric.

## Recommendations

- Prioritise suppliers with high gross profit **and** low supplier risk rate.
- Conduct quality audits for suppliers with supplier risk rates exceeding 50%
  (currently Suppliers 2 and 4).
- Introduce supplier scorecards that combine profitability and operational
  reliability to guide sourcing decisions.
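The first two recommendations can be expressed as simple rules over the risk-rate output. This is a minimal sketch, assuming the figures displayed in the supplier risk rate table above and a hypothetical definition of "high gross profit" as strictly above the median supplier profit and "low risk" as a rate of 50% or less; neither cut-off is specified in the analysis.

```python
from statistics import median

# (total_gross_profit, supplier_risk_rate) copied from the query output.
suppliers = {
    "Supplier 1": (140637.33, 0.444444),
    "Supplier 2": (113094.84, 0.545455),
    "Supplier 3": (90037.64, 0.466667),
    "Supplier 4": (75843.92, 0.722222),
    "Supplier 5": (99785.0, 0.5),
}
AUDIT_THRESHOLD = 0.5  # "risk rates exceeding 50%"

# Assumption: "high profit" means strictly above the median supplier profit.
profit_cut = median(profit for profit, _ in suppliers.values())

prioritise = [name for name, (profit, risk) in suppliers.items()
              if profit > profit_cut and risk <= AUDIT_THRESHOLD]
audit = [name for name, (_, risk) in suppliers.items()
         if risk > AUDIT_THRESHOLD]

print("prioritise:", prioritise)  # ['Supplier 1']
print("audit:", audit)            # ['Supplier 2', 'Supplier 4']
```

Supplier 5 sits exactly on the 50% line and is therefore not flagged for audit under a strict "exceeding" rule; a scorecard would need to state explicitly how such boundary cases are handled.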


## Conclusion
This analysis demonstrates that high profitability does not always correlate
with operational reliability. By deriving supplier risk dynamically in SQL,
the project identifies the suppliers that warrant immediate attention, most
notably Supplier 4 (72% risk rate) and Supplier 2 (55%). This approach
supports more resilient supply chain decision-making
by balancing cost efficiency with operational risk.
