# Path 1: PM Strategy Execution & Completion Analysis
## INSY6500 - PM Analysis Project

**Analyst:** Mike Moyer

**Research Question:** *How do different departments approach preventative maintenance? Does it reveal anything about their operational philosophy?*

**Approach:** This notebook follows the EDA workflow to analyze the relationship between different departments and their planned PM activities.

---

## Section 1 - Data Loading and Check

### 1.1 Setup & Data Loading

Import required libraries and load both datasets using the approach from the data loader template.

In [1]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

print("Libraries imported")

Libraries imported


### 1.2 Define Data Path

In [23]:
# Define data directory
DATA_DIR = Path('../data')

# Define file paths
FORECAST_FILE = DATA_DIR / '103ki_pm_forecast.csv'

### 1.3 Load Forecast Data 

Scheduled PM activities for the coming fiscal year.

In [24]:
# Load future workload forecast data
df_forecast = pd.read_csv(FORECAST_FILE, 
                        encoding='cp1252',
                        parse_dates = ['DUE_DATE'],
                        dtype={
                            'INTERVAL': 'category',
                            'JOB_TYPE': 'category',
                            'LABOR_CRAFT': 'category',
                            'PMSCOPETYPE': 'category',
                            'DEPT': 'category',
                            'DEPT_NAME': 'category',
                            'DEPT_TYPE' : 'category',
                            'PLANT' : 'category',
                            'LINE' : 'category',
                            'ZONENAME' : 'category',
                            'PROCESSNAME' : 'category'
                        })
print(f"Forecast data shape: {df_forecast.shape[0]:,} rows, {df_forecast.shape[1]} columns")

Forecast data shape: 131,397 rows, 22 columns


In [25]:
# Quick preview
print("First 3 rows of Forecast data:")
display(df_forecast.head(3))

First 3 rows of Forecast data:


Unnamed: 0,DUE_DATE,PMNUM,COUNTKEY,PMDESCRIPTION,INTERVAL,FORECASTJP,JOB_TYPE,LABOR_CRAFT,PLANNED_LABOR_HRS,TOTAL_MATERIAL_COST,...,PMSCOPETYPE,LOCATION,LOCATIONDESC,PLANT,DEPT,DEPT_NAME,DEPT_TYPE,LINE,ZONENAME,PROCESSNAME
0,2026-04-01,PM104088,2026-04-01-PM104088,LP DIE CAST MACHINE #4 PM - M,1-MONTHS,JP211217,INSPECTION,ESTMULT,1.75,,...,ASSET,3DCLCAA4XXMC,Casting Die Cast Machine 4 Machine,3,3DC,DIE CAST,DC,L,LP Casting,Casting Die Cast Machine 4
1,2026-04-01,PM104114,2026-04-01-PM104114,LPDC #1 PF DEBURR STATION 1M PM,1-MONTHS,JP211160,INSPECTION,ESTMULT,0.5,,...,ASSET,3DCLPF01XXPF,Pre-finish Line 1 Pre-finish,3,3DC,DIE CAST,DC,L,Pre-finish,Pre-finish Line 1
2,2026-04-01,PM104116,2026-04-01-PM104116,LPDC #1 PF INSPECTION STATION PM,1-MONTHS,JP211164,INSPECTION,ESTMULT,0.333333,,...,ASSET,3DCLPF01XXPF,Pre-finish Line 1 Pre-finish,3,3DC,DIE CAST,DC,L,Pre-finish,Pre-finish Line 1


In [26]:
# Quick data type / info & non-null check
print("\n Data Types and Non-Null Counts:")
df_forecast.info()


 Data Types and Non-Null Counts:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131397 entries, 0 to 131396
Data columns (total 22 columns):
 #   Column                  Non-Null Count   Dtype         
---  ------                  --------------   -----         
 0   DUE_DATE                131397 non-null  datetime64[ns]
 1   PMNUM                   131397 non-null  object        
 2   COUNTKEY                131397 non-null  object        
 3   PMDESCRIPTION           131397 non-null  object        
 4   INTERVAL                131397 non-null  category      
 5   FORECASTJP              131324 non-null  object        
 6   JOB_TYPE                131320 non-null  category      
 7   LABOR_CRAFT             104483 non-null  category      
 8   PLANNED_LABOR_HRS       130568 non-null  float64       
 9   TOTAL_MATERIAL_COST     2275 non-null    float64       
 10  TASK_COUNT              131397 non-null  int64         
 11  TOTAL_TASK_DESC_LENGTH  131293 non-null  float64       
 

---
### 1.4 Explore the Interval Column

In [35]:
print("="*80)
print("INTERVAL FIELD EXPLORATION")
print("="*80)

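# 1. Unique interval values and their frequency
# Note: sort_index() orders by interval label, so head(20) below shows the
# first 20 labels alphabetically rather than the 20 most frequent intervals.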
print("\n-> Unique Interval Values:")
interval_counts = df_forecast['INTERVAL'].value_counts().sort_index()
print(f"\n   Total unique intervals: {df_forecast['INTERVAL'].nunique()}")
print("\n   Top 20 most common intervals:")
display(interval_counts.head(20).to_frame())

# 2. What units are used?
print("\n-> Units Found in Interval Field:")
# Extract the unit part (everything after the dash)
units = df_forecast['INTERVAL'].str.extract(r'-([A-Z]+)$')[0].value_counts()
display(units.to_frame(name='count'))

# 3. Full sorted list for reference
# Extract numeric part
df_forecast['interval_number'] = df_forecast['INTERVAL'].str.extract(r'^(\d+)-')[0].astype(float)
df_forecast['interval_unit'] = df_forecast['INTERVAL'].str.extract(r'-([A-Z]+)$')[0]
print("\n-> Complete Interval List (sorted by unit then number):")
all_intervals = df_forecast.groupby(['interval_unit', 'interval_number']).size().reset_index(name='count')
all_intervals = all_intervals.sort_values(['interval_unit', 'interval_number'])
display(all_intervals)

# Clean up temporary columns
df_forecast = df_forecast.drop(['interval_number', 'interval_unit'], axis=1)

INTERVAL FIELD EXPLORATION

-> Unique Interval Values:

   Total unique intervals: 47

   Top 20 most common intervals:


INTERVAL,count
1-DAYS,5980
1-MONTHS,43056
1-WEEKS,34739
1-YEARS,4986
10-WEEKS,5
10-YEARS,21
12-MONTHS,320
12-WEEKS,5
13-WEEKS,128
14-DAYS,78



-> Units Found in Interval Field:


0,count
MONTHS,76063
WEEKS,38203
DAYS,10855
YEARS,6276



-> Complete Interval List (sorted by unit then number):


Unnamed: 0,interval_unit,interval_number,count
0,DAYS,1.0,5980
1,DAYS,6.0,156
2,DAYS,7.0,2843
3,DAYS,14.0,78
4,DAYS,21.0,68
5,DAYS,28.0,39
6,DAYS,32.0,11
7,DAYS,40.0,72
8,DAYS,45.0,1316
9,DAYS,60.0,120


In [41]:
# Save interval analysis for reference
OUTPUT_DIR = Path('../outputs')
OUTPUT_DIR.mkdir(exist_ok=True)

# Save as both pickle and CSV
all_intervals.to_pickle(OUTPUT_DIR / 'interval_analysis.pkl')
all_intervals.to_csv(OUTPUT_DIR / 'interval_analysis.csv', index=False)

#### Overlaps and Observations

1. Distribution by Unit:
   *  **DAYS:** 11 distinct values (10,855 total PMs)
   *  **WEEKS:** 10 distinct values (38,203 total PMs)
   *  **MONTHS:** 15 distinct values (76,063 total PMs)
   *  **YEARS:** 10 distinct values (6,276 total PMs)
2. Overlapping Unit / Frequencies:
   * **1-WEEKS (34,739 PMs) vs 7-DAYS (2,843 PMs)** - same frequency, different units
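   * **2-WEEKS (788 PMs) vs 14-DAYS (78 PMs)** - same frequency, different units
   * **4-WEEKS (1,417 PMs) vs 1-MONTHS (43,056 PMs)** - close but not identical (28 days vs about 30)
   * **52-WEEKS (3 PMs) vs 1-YEARS (4,986 PMs)** - close but not identical (364 days vs about 365)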
3. Unusual Intervals to Note:
    * 45-DAYS (1,316 PMs) - falls between monthly and bimonthly
    * 40-DAYS (72 PMs)
    * 32-DAYS (11 PMs)
    * 26-MONTHS (2 PMs)
    * 36-MONTHS (12 PMs)
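
The exact overlaps in item 2 can be checked mechanically. The following is a minimal sketch, assuming only the DAYS and WEEKS labels quoted above (1 week = 7 days is exact, so no approximation is involved). The names `counts`, `DAYS_PER_UNIT`, `by_days`, and `overlaps` are illustrative and are not part of the notebook.

```python
from collections import defaultdict

# PM counts copied from the observations above (DAYS and WEEKS labels only).
counts = {
    "1-DAYS": 5980,
    "7-DAYS": 2843,
    "14-DAYS": 78,
    "1-WEEKS": 34739,
    "2-WEEKS": 788,
}

# Exact day equivalents; MONTHS and YEARS are left out because they are approximate.
DAYS_PER_UNIT = {"DAYS": 1, "WEEKS": 7}

# Group labels by their length in days.
by_days = defaultdict(list)
for label, n in counts.items():
    number, unit = label.split("-")
    by_days[int(number) * DAYS_PER_UNIT[unit]].append((label, n))

# Keep only day counts reached by more than one label.
overlaps = {
    days: {
        "labels": [label for label, _ in entries],
        "total": sum(n for _, n in entries),
    }
    for days, entries in by_days.items()
    if len(entries) > 1
}

for days, info in sorted(overlaps.items()):
    print(days, info["labels"], info["total"])
```

Once merged, the 7-day group holds 37,582 PMs and the 14-day group holds 866, which is the motivation for normalizing labels before comparing departments.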

#### Parsing Strategy
To reconcile labels that differ in text but describe the same or a similar frequency, I will convert every interval to days by multiplying its numeric part by a days-per-unit factor. This puts all intervals on a single scale.

The conversion factors will be:

```python
conversion = {
    'DAYS': 1,
    'WEEKS': 7,
    'MONTHS': 30.42,  # Average month length
    'YEARS': 365.25   # Account for leap years
}
```
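
One way the conversion could be applied to a single label is sketched below. It assumes every label follows the `<number>-<UNIT>` pattern seen in the exploration output. The helper name `interval_to_days` and the choice to return `None` for unparseable labels are illustrative assumptions, not the notebook's final implementation.

```python
import re

# Conversion factors copied from the parsing strategy above.
CONVERSION = {
    "DAYS": 1,
    "WEEKS": 7,
    "MONTHS": 30.42,
    "YEARS": 365.25,
}

# Same shape as the regexes used in the exploration cell: number, dash, unit.
INTERVAL_PATTERN = re.compile(r"^(\d+)-([A-Z]+)$")


def interval_to_days(label):
    """Return the approximate length in days of a label such as '2-WEEKS'.

    Returns None when the label does not match the expected pattern or
    uses a unit that is not in CONVERSION (an assumption of this sketch).
    """
    match = INTERVAL_PATTERN.match(str(label))
    if match is None or match.group(2) not in CONVERSION:
        return None
    return int(match.group(1)) * CONVERSION[match.group(2)]


for label in ["1-WEEKS", "7-DAYS", "2-MONTHS", "1-YEARS"]:
    print(label, interval_to_days(label))
```

On the full dataset, something like `df_forecast['INTERVAL'].astype(str).map(interval_to_days)` would produce a numeric column; the name of that column is left open here.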

The frequency bands will be based on the table below (WORK IN PROGRESS):

|Band|Days|Note|
|---|---|---|
|Daily|< 4| |
|Weekly|4-17| |
|Biweekly|18-25| |
|Monthly|26-60| |
|Quarterly|61-120| |
|Semi-Annual|121-270| |
|Annual|271-545| |
|Multi-Year|546+| |
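
Because converted values can be fractional (2-MONTHS becomes 60.84 days, which sits between 60 and 61), the bands need a rule for values that fall between the listed whole-day ranges. The sketch below assumes each band is half-open, running from its lower edge up to (but not including) the next band's lower edge. That reading, and the names `BAND_EDGES`, `BAND_NAMES`, and `frequency_band`, are assumptions for illustration, since the table is still a work in progress.

```python
from bisect import bisect_right

# Lower edge (in days) of each band, read from the table above.
# Assumption: each band covers [lower edge, next lower edge).
BAND_EDGES = [0, 4, 18, 26, 61, 121, 271, 546]
BAND_NAMES = [
    "Daily",
    "Weekly",
    "Biweekly",
    "Monthly",
    "Quarterly",
    "Semi-Annual",
    "Annual",
    "Multi-Year",
]


def frequency_band(days):
    """Map an interval length in days to its frequency band name."""
    if days < 0:
        raise ValueError("interval length must be non-negative")
    # bisect_right counts the edges <= days; the last such edge opens the band.
    return BAND_NAMES[bisect_right(BAND_EDGES, days) - 1]


for days in [1, 7, 14, 21, 60.84, 365.25]:
    print(days, frequency_band(days))
```

Under these edges a 14-day interval lands in the Weekly band rather than Biweekly, which may be worth revisiting while the band limits are still being tuned.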
