# Data Loading & Initial Reconnaissance
## INSY6500 - PM Analysis Project

**Purpose:** Load the PM forecast and performance datasets and run an initial exploration of each

**Datasets:**
- `101ki_pm_performance.csv` - 12-month historical performance (April 2024 - March 2025)
- `103ki_pm_forecast.csv` - 12-month PM forecast (April 2026 - March 2027)

---

## 1. Setup & Imports

In [2]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

print("Libraries imported")

Libraries imported


## 2. Define Data Paths

In [3]:
# Define data directory
DATA_DIR = Path('../data')

# Define file paths
FORECAST_FILE = DATA_DIR / '103ki_pm_forecast.csv'
PERFORMANCE_FILE = DATA_DIR / '101ki_pm_performance.csv'
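
Because the paths above are relative to the notebook's working directory, a quick existence check before loading can turn a long `read_csv` traceback into a short message. The following is a minimal sketch; `missing_files` is a hypothetical helper, not part of the original notebook, and the paths simply repeat the ones defined above.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point at an existing regular file."""
    return [Path(p) for p in paths if not Path(p).is_file()]


# Same locations as DATA_DIR / FORECAST_FILE / PERFORMANCE_FILE above
DATA_DIR = Path('../data')
expected = [
    DATA_DIR / '103ki_pm_forecast.csv',
    DATA_DIR / '101ki_pm_performance.csv',
]

missing = missing_files(expected)
if missing:
    print("Missing files:", [str(p) for p in missing])
else:
    print("All data files found")
```

If either file is reported missing, the usual cause is that the notebook was started from a different directory than the one `'../data'` assumes.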

## 3. Load Data

### 3.1 Load Performance Data (101ki)
Historical performance metrics from the previous fiscal year (April 2024 - March 2025).

In [5]:
# Load historical performance data
df_performance = pd.read_csv(PERFORMANCE_FILE)
print(f"Performance data shape: {df_performance.shape[0]:,} rows, {df_performance.shape[1]} columns")

Performance data shape: 18,476 rows, 7 columns
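
Beyond the shape, a first reconnaissance pass usually looks at column types, missing values, and cardinality. The sketch below shows one way to collect these in a single table. `quick_profile` is a hypothetical helper, and the small `toy` frame with its column names is made up purely for illustration; it does not reflect the actual columns of `101ki_pm_performance.csv`.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, non-null count, null count, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "nulls": df.isna().sum(),
        "unique": df.nunique(),  # NaN/None are excluded by default
    })


# Illustrative data only (hypothetical columns, not from the real dataset)
toy = pd.DataFrame({
    "site": ["A", "B", None],
    "hours": [1.5, 2.0, 2.0],
})

profile = quick_profile(toy)
print(profile)
```

In the notebook, the same call would be `quick_profile(df_performance)`, with one row of output per column.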


### 3.2 Load Forecast Data (103ki)
Scheduled PM activities for the coming fiscal year (April 2026 - March 2027).

In [10]:
# Load future workload forecast data
# (cp1252 is passed explicitly instead of relying on pandas' UTF-8 default)
df_forecast = pd.read_csv(FORECAST_FILE, encoding='cp1252')
print(f"Forecast data shape: {df_forecast.shape[0]:,} rows, {df_forecast.shape[1]} columns")

Forecast data shape: 131,717 rows, 22 columns
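
The forecast file is read with `encoding='cp1252'`, while the performance file uses the pandas default (UTF-8). When the encoding of an export is not known in advance, one option is to try a short list of candidates in order. The following is a sketch under that assumption; `read_csv_with_fallback` is a hypothetical helper, and the temporary sample file exists only to demonstrate the fallback.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252"), **kwargs):
    """Try each encoding in order; return (DataFrame, encoding that worked)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    raise last_error


# Demonstration on a throwaway file containing a cp1252-encoded character
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("name\ncafé\n".encode("cp1252"))
    df_sample, used_encoding = read_csv_with_fallback(sample)

print(f"Decoded with: {used_encoding}")
```

Note that cp1252 accepts almost any byte sequence, so it works as a last resort but will not flag a file that is actually in some other single-byte encoding; spot-checking a few text columns after loading remains worthwhile.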
