# Predicting Procurement Compliance Using KPI-Driven Machine Learning Models

**Author:** Brittany Dowdle
**Date:** 7/4/2025
**Objective:** This notebook is part of the overall [capstone project](https://github.com/Bdowdle4/Dowdle_Analytics_Capstone). The ultimate goal is to identify patterns and predictors of supplier non-compliance using machine learning techniques.

## Introduction
This project uses the [Procurement KPI Analysis Dataset](https://www.kaggle.com/datasets/shahriarkabir/procurement-kpi-analysis-dataset) to predict supplier compliance. The dataset includes purchase order records with key procurement attributes. To ensure data quality and model readiness, this notebook cleans and preprocesses the dataset in preparation for exploratory analysis and predictive modeling.

****

### Imports
The code cell below imports the Python libraries this notebook needs. *All imports should be at the top of the notebook.*

In [21]:
import pandas as pd
import numpy as np

### Load and Inspect the Data

In [22]:
# Load the dataset
df = pd.read_csv("C:/Users/Britt/Documents/44688/Procurement KPI Analysis Dataset.csv")

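# Preview the structure and details
# Note: df.info() prints its report directly and returns None,
# so wrapping it in print() also emits the word "None" in the output
print("Dataset Info:")
print(df.info(), '\n')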

print("Dataset Description:")
print(df.describe(), '\n')

print("First Few Rows:")
print(df.head(), '\n')

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 777 entries, 0 to 776
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   PO_ID             777 non-null    object 
 1   Supplier          777 non-null    object 
 2   Order_Date        777 non-null    object 
 3   Delivery_Date     690 non-null    object 
 4   Item_Category     777 non-null    object 
 5   Order_Status      777 non-null    object 
 6   Quantity          777 non-null    int64  
 7   Unit_Price        777 non-null    float64
 8   Negotiated_Price  777 non-null    float64
 9   Defective_Units   641 non-null    float64
 10  Compliance        777 non-null    object 
dtypes: float64(3), int64(1), object(7)
memory usage: 66.9+ KB
None 

Dataset Description:
          Quantity  Unit_Price  Negotiated_Price  Defective_Units
count   777.000000  777.000000        777.000000       641.000000
mean   1094.660232   58.283822         53.66072
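
The `info()` output above reports 690 non-null values for `Delivery_Date` and 641 for `Defective_Units` out of 777 rows, which means 87 and 136 missing entries respectively. One way to make those gaps explicit is to tabulate missing counts and percentages per column. The snippet below is a minimal sketch on a small toy frame; the `toy` DataFrame and its values are illustrative assumptions, not rows from the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the procurement data (illustrative values only)
toy = pd.DataFrame({
    "PO_ID": ["PO-1", "PO-2", "PO-3", "PO-4"],
    "Delivery_Date": ["2023-01-05", None, "2023-02-10", None],
    "Defective_Units": [3.0, np.nan, 0.0, 12.0],
})

# Count missing values per column and express them as a share of all rows
missing = toy.isna().sum().to_frame("missing_count")
missing["missing_pct"] = (missing["missing_count"] / len(toy) * 100).round(1)

print(missing)
```

Running the same two lines against `df` instead of `toy` would summarize the real dataset in one table.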

### Standardize Column Names

In [23]:
# Convert all column names to lowercase and strip leading/trailing whitespace
df.columns = df.columns.str.strip().str.lower()

# View column names
print(df.columns)


Index(['po_id', 'supplier', 'order_date', 'delivery_date', 'item_category',
       'order_status', 'quantity', 'unit_price', 'negotiated_price',
       'defective_units', 'compliance'],
      dtype='object')
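
Note that `str.strip()` only removes leading and trailing whitespace; spaces inside a name are left alone. This dataset's columns already use underscores, so that is sufficient here. The sketch below shows the difference on a hypothetical set of messier column names (the `cols` values are invented for illustration), with an optional extra step that replaces inner whitespace with underscores.

```python
import pandas as pd

# Hypothetical messy column names, not taken from the dataset
cols = pd.Index([" PO_ID ", "Order Date", "Unit_Price"])

# Same normalization as the cell above: trim the ends, then lowercase
cleaned = cols.str.strip().str.lower()

# Optional extra step: collapse inner whitespace into underscores
snake = cleaned.str.replace(r"\s+", "_", regex=True)

print(list(cleaned))
print(list(snake))
```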


### Convert Dates to Datetime Format

In [24]:
# Parse date fields
df['order_date'] = pd.to_datetime(df['order_date'], errors='coerce')
df['delivery_date'] = pd.to_datetime(df['delivery_date'], errors='coerce')
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 777 entries, 0 to 776
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   po_id             777 non-null    object        
 1   supplier          777 non-null    object        
 2   order_date        777 non-null    datetime64[ns]
 3   delivery_date     690 non-null    datetime64[ns]
 4   item_category     777 non-null    object        
 5   order_status      777 non-null    object        
 6   quantity          777 non-null    int64         
 7   unit_price        777 non-null    float64       
 8   negotiated_price  777 non-null    float64       
 9   defective_units   641 non-null    float64       
 10  compliance        777 non-null    object        
dtypes: datetime64[ns](2), float64(3), int64(1), object(5)
memory usage: 66.9+ KB
None
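
With `errors='coerce'`, any value that cannot be parsed becomes `NaT` instead of raising an error, so malformed dates can be silently merged with values that were already missing. In the output above the non-null counts are unchanged (777 and 690), which suggests nothing was coerced here. A general way to check is to compare missingness before and after parsing. The sketch below uses a small made-up Series (`raw`) rather than the real columns.

```python
import pandas as pd

# Made-up raw values: two valid dates, one malformed string, one missing entry
raw = pd.Series(["2023-01-15", "not a date", None, "2023-03-02"])

parsed = pd.to_datetime(raw, errors="coerce")

# Present in the raw data but NaT after parsing: these were coerced
coerced = raw.notna() & parsed.isna()

print(raw[coerced].tolist())
print(int(coerced.sum()))
```

To apply this to the notebook's data, the mask would need to be built from the original string column before it is overwritten by the parsed version.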
