# Facilities KPI Scorecard
This notebook collects the public-facing KPIs for the Department of General Services' Facilities Division. Its purpose is to keep, in a single public place, the entire pipeline for calculating the division's three Archibus-based KPIs. Those KPIs are:

1. % of Corrective Maintenance Work Requests Completed On Time
2. % of Preventive Maintenance Work Requests Completed On Time
3. Preventive Maintenance to Corrective Maintenance Ratio

Part of the intention of this notebook is to build _transparency_ and _reproducibility_ by capturing the entire end-to-end process used to calculate these metrics in one place.

## Setup

### Import packages

In [None]:
# workhorse modules
import pandas as pd
import numpy as np
from datetime import timedelta, datetime
import re
from pathlib import Path
import seaborn as sns

import datadotworld as dw

# local utility functions
from utils import (
    add_cm_benchmarks,
    add_fiscal_year,
    set_pd_params,
    tidy_up_df,
    cast_dtypes,
    glue_date_time,
    compute_days_to_completion,
    consolidate_prob_types,
    compute_pm_cm,
    compute_kpi_table,
)

### Set pandas options
This makes pandas print all rows and columns to the output when requested. The second line silences pandas' chained-assignment warning (`SettingWithCopyWarning`).

In [None]:
set_pd_params()

pd.options.mode.chained_assignment = None  # default='warn'

### Import the data from Data.World
Data is a copy of Archibus's `wrhwr` table. To see the exact query used to generate the input data, see `/sql/input_for_FMD_KPIs.sql`.

In [None]:
kpis_raw = dw.query(
    dataset_key="dgsbpio/auditfinding3", query="select * from wrhwr_01072021"
).dataframe

print(f"The work orders dataframe has {kpis_raw.shape[0]:,} rows.")

kpis_raw.sample(3, random_state=444)[["wr_id", "prob_type"]]

In [None]:
users = dw.query(
    dataset_key="dgsbpio/auditfinding3", query="select * from archibus_user_roles"
).dataframe

users = users.applymap(lambda x: x.strip() if isinstance(x, str) else x)

## Data cleaning

### Basic cleaning
- Strips whitespace from strings to facilitate matching
- Drops rows with no problem type
- Renames a few columns

In [None]:
wr_tidy = cast_dtypes(kpis_raw)
wr_tidy = tidy_up_df(wr_tidy)

print(f"The tidied work orders dataframe has {wr_tidy.shape[0]:,} rows.")

### Combine date and time columns to get timestamps
This takes the date from a date column and the time from a time column and combines them into a single timestamp.

This transformation allows us to know the time to completion with greater precision. 

In [None]:
# glue the date and time for request
wr_dt = glue_date_time(wr_tidy, "date_requested", "time_requested", "requested_dt")

# glue the date and time for completion
wr_dt = glue_date_time(wr_dt, "date_completed", "time_completed", "completed_dt")

# convert "date_closed" to datetime (this column has no time information)
wr_dt["date_closed"] = pd.to_datetime(wr_dt["date_closed"])
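
To illustrate the date-and-time gluing step, here is a minimal sketch of how such a helper could work. It is not the code in `utils.glue_date_time`, which is not shown in this notebook. The function name `glue_date_time_sketch` is hypothetical, and the sketch assumes the time column is stored as a full timestamp with a placeholder date, so only its time-of-day part is kept.

```python
import pandas as pd


def glue_date_time_sketch(df, date_col, time_col, out_col):
    """Hypothetical stand-in: combine a date column and a time column."""
    out = df.copy()
    # Keep only the calendar date from the date column.
    dates = pd.to_datetime(out[date_col]).dt.normalize()
    # Assumption: the time column carries a placeholder date, so we
    # subtract its own midnight to get the time of day as a timedelta.
    times = pd.to_datetime(out[time_col])
    time_of_day = times - times.dt.normalize()
    out[out_col] = dates + time_of_day
    return out


demo = pd.DataFrame(
    {
        "date_requested": ["2020-07-01 00:00:00", "2020-07-02 00:00:00"],
        "time_requested": ["1899-12-30 08:15:00", "1899-12-30 16:45:30"],
    }
)
demo = glue_date_time_sketch(
    demo, "date_requested", "time_requested", "requested_dt"
)
print(demo["requested_dt"])
```

Missing dates or times propagate as `NaT` in this sketch, so an incomplete row yields no timestamp rather than a misleading one.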

### Examine the cleaned data

In [None]:
wr_dt[
    [
        "wr_id",
        "problem_type",
        "requested_dt",
        "completed_dt",
        "date_closed",
        "status",
    ]
].sample(3, random_state=451)

## Data preparation

In [None]:
wr_joined = wr_dt.merge(
    users[["user_name", "role_name"]],
    how="left",
    left_on="requestor",
    right_on="user_name",
)
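
A left merge like the one above silently duplicates work requests if a user name appears more than once in the roles table. The sketch below shows one way to guard against that using the `validate` and `indicator` options of `DataFrame.merge`. The toy frames and their values are made up for illustration; only the column names come from this notebook.

```python
import pandas as pd

# Toy data; the names and roles are invented for this example.
work_requests = pd.DataFrame(
    {"wr_id": [1, 2, 3], "requestor": ["ADAMS", "BAKER", "NOBODY"]}
)
user_roles = pd.DataFrame(
    {"user_name": ["ADAMS", "BAKER"], "role_name": ["TECH", "SUPERVISOR"]}
)

joined = work_requests.merge(
    user_roles,
    how="left",
    left_on="requestor",
    right_on="user_name",
    validate="many_to_one",  # raises MergeError if user_name is not unique
    indicator=True,  # adds a "_merge" column describing each row's match
)

# Requests whose requestor has no entry in the roles table.
unmatched = int((joined["_merge"] == "left_only").sum())
print(f"{len(joined)} rows after the merge, {unmatched} without a role.")
```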

### Include days to completion

In [None]:
wr_durations = compute_days_to_completion(wr_joined)

In [None]:
wr_durations[
    [
        "wr_id",
        "role_name",
        "problem_type",
        "requested_dt",
        "completed_dt",
        "date_closed",
        "days_to_completion",
        "status",
    ]
].sample(3, random_state=446)
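
For readers without access to `utils`, the sketch below shows one plausible way to derive `days_to_completion` from the two timestamps. It assumes the duration runs from `requested_dt` to `completed_dt` and is expressed in fractional days; the actual helper may define it differently. The function name is hypothetical.

```python
import pandas as pd


def days_to_completion_sketch(df):
    """Hypothetical stand-in: elapsed days from request to completion."""
    out = df.copy()
    elapsed = out["completed_dt"] - out["requested_dt"]
    # Convert the timedelta to fractional days (86,400 seconds per day).
    out["days_to_completion"] = elapsed.dt.total_seconds() / 86_400
    return out


demo = pd.DataFrame(
    {
        "requested_dt": pd.to_datetime(["2020-07-01 08:00", "2020-07-03 09:00"]),
        # The second request is still open, so it has no completion time.
        "completed_dt": pd.to_datetime(["2020-07-02 20:00", None]),
    }
)
demo = days_to_completion_sketch(demo)
print(demo)
```

Open requests end up with a missing duration in this sketch, which matters later because a missing value never compares as on time.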

### Add fiscal year
Note that the function `entirely_within_fiscal_year()` keeps only those rows where the work order was requested and closed in the same fiscal year. __Other rows that straddle two fiscal years are dropped__.

For comparison, the function `add_fiscal_year()`, used below, derives the fiscal year from the request date or from the completion date and drops no rows.

In [None]:
wr_fy = add_fiscal_year(wr_durations, assign_fy_on="closure")

In [None]:
cond_fy = wr_fy["fiscal_year"].isin(range(2016, 2022))
wr_fy = wr_fy[cond_fy]
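
The difference between the two approaches described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the sketch assumes a fiscal year that starts on July 1 and is named for the calendar year in which it ends. The notebook itself does not state the fiscal calendar, so adjust `FY_START_MONTH` as needed.

```python
from datetime import datetime

FY_START_MONTH = 7  # assumption: the fiscal year begins on July 1


def fiscal_year_sketch(ts, start_month=FY_START_MONTH):
    """Return the fiscal year of a timestamp, named for its ending year."""
    return ts.year + 1 if ts.month >= start_month else ts.year


def within_one_fiscal_year_sketch(requested, closed):
    """True when request and closure fall in the same fiscal year."""
    return fiscal_year_sketch(requested) == fiscal_year_sketch(closed)


requested = datetime(2020, 6, 15)
closed = datetime(2020, 7, 10)

print(fiscal_year_sketch(requested))  # fiscal year of the request
print(fiscal_year_sketch(closed))  # fiscal year of the closure
print(within_one_fiscal_year_sketch(requested, closed))
```

A request like this one straddles two fiscal years, so a same-year filter would drop it, while assigning the year from a single date keeps it.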

## KPI: % PMs Completed On Time 
The goal here is to filter the data down to preventive maintenance only, and then show how many are completed before a given benchmark.

### Filter to PM only, and for relevant fiscal years only

In [None]:
pm_list = [
    "BUILDING INTERIOR INSPECTION",
    "BUILDING PM",
    "HVAC|PM",
    "INSEPCTION",
    "PREVENTIVE MAINT",
]


cond_pm = wr_fy["problem_type"].isin(pm_list)

# copy so that adding a column does not write to a slice of wr_fy
wr_pm = wr_fy[cond_pm].copy()
wr_pm["benchmark"] = 21

print(f"The filtered work orders dataframe has {wr_pm.shape[0]:,} rows.")

#### Compute the benchmark and add 'is_on_time' column

In [None]:
def compute_is_on_time(row):
    """Flag a work request as on time if it finished within its benchmark."""
    row["is_on_time"] = row["days_to_completion"] <= row["benchmark"]
    return row


pms_on_time = wr_pm.apply(compute_is_on_time, axis=1)
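
The row-wise `apply` above can be slow on large tables. As a sketch of an equivalent vectorized form, the comparison can be made on whole columns at once. The toy values below are made up; only the column names match the notebook.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        # The last request has no completion yet, so its duration is missing.
        "days_to_completion": [3.0, 21.0, 40.5, None],
        "benchmark": [21, 21, 21, 21],
    }
)

# Column-wise comparison; a missing duration compares as False.
demo["is_on_time"] = demo["days_to_completion"] <= demo["benchmark"]
print(demo)
```

Both forms treat a request with no completion time as not on time, because a comparison against a missing value evaluates to `False`.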

#### Group by fiscal year and get % on time

In [None]:
pm_compliance = compute_kpi_table(pms_on_time, "percent_PMs_on_time", "total_PMs")
pm_compliance
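
The grouping step can be pictured with the sketch below. It is an assumption about what a helper like `compute_kpi_table` does, namely taking the share of on-time rows and the row count per fiscal year. The real function lives in `utils` and may differ; `kpi_table_sketch` is a hypothetical name.

```python
import pandas as pd


def kpi_table_sketch(df, pct_col, count_col):
    """Hypothetical stand-in: percent on time and row count per fiscal year."""
    grouped = df.groupby("fiscal_year")["is_on_time"]
    table = pd.DataFrame(
        {
            # The mean of a boolean column is the share of True values.
            pct_col: grouped.mean() * 100,
            count_col: grouped.size(),
        }
    )
    return table.round(1)


demo = pd.DataFrame(
    {
        "fiscal_year": [2020, 2020, 2020, 2020, 2021, 2021],
        "is_on_time": [True, False, True, True, True, False],
    }
)
table = kpi_table_sketch(demo, "percent_PMs_on_time", "total_PMs")
print(table)
```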

## KPI: PM:CM Ratio

The two lists below contain exactly the same problem types used in last year's scorecard, so we would expect to replicate last year's results closely.

In [None]:
CM_list = [
    "BOILER",
    "CHILLERS",
    "COOLING TOWERS",
    "HVAC",
    "HVAC INFRASTRUCTURE",
    "HVAC|REPAIR",
]

PM_list = [
    "HVAC|PM",
    "PREVENTIVE MAINT",
]

### Filter to HVAC rows only

In [None]:
cond_cm = wr_fy["problem_type"].isin(CM_list)
cond_pm = wr_fy["problem_type"].isin(PM_list)

# copy so that adding a column does not write to a slice of wr_fy
wr_HVAC = wr_fy[cond_cm | cond_pm].copy()
wr_HVAC["is_pm"] = wr_HVAC["problem_type"].isin(PM_list)

print(f"We've gone from {len(wr_fy):,} rows to {len(wr_HVAC):,} rows.")

### Compute all PM/CM stats by fiscal year

In [None]:
pm_cm_results = compute_pm_cm(wr_HVAC, PM_list)

In [None]:
pm_cm_results
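
As a rough picture of the computation, the sketch below counts PM and CM rows per fiscal year and divides one by the other. It assumes the ratio is simply the PM count over the CM count; the agency's exact definition inside `compute_pm_cm` is not shown here. The column names `year`, `count_pm`, and `count_cm` follow the results table, while `pm_cm_ratio` and the function name are hypothetical.

```python
import pandas as pd


def pm_cm_sketch(df):
    """Hypothetical stand-in: PM and CM counts and their ratio per year."""
    grouped = df.groupby("fiscal_year")["is_pm"]
    count_pm = grouped.sum()  # True counts as 1, so this counts PM rows
    count_all = grouped.size()
    result = pd.DataFrame(
        {"count_pm": count_pm, "count_cm": count_all - count_pm}
    )
    # Assumption: ratio = PM count / CM count (infinite if a year has no CMs).
    result["pm_cm_ratio"] = result["count_pm"] / result["count_cm"]
    return result.rename_axis("year").reset_index()


demo = pd.DataFrame(
    {
        "fiscal_year": [2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021],
        "is_pm": [True, False, False, False, True, True, False, False],
    }
)
result = pm_cm_sketch(demo)
print(result)
```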

In [None]:
count_plot_data = pd.melt(
    pm_cm_results, id_vars=["year"], value_vars=["count_cm", "count_pm"]
)

sns.lineplot(data=count_plot_data, y="value", x="year", hue="variable")

sns.despine()

## KPI: Percent of CM Work Requests Completed On-Time
Here are the key facts needed to understand the agency's new method for computing this KPI:

- Only CM problem types are considered, so all PM work orders are dropped.
- Each work order is first assigned a "primary" problem type, which reduces the number of distinct problem types.
- Each primary problem type has a benchmark, which is then added to the work request's row.
- Finally, the work order is marked on time if its time to completion does not exceed its benchmark.

In [None]:
wr_cm = wr_fy.copy()

consolidated_wrs = wr_cm.apply(consolidate_prob_types, axis=1)

cond_cm = consolidated_wrs["primary"] != "PREVENTIVE"
consolidated_cms = consolidated_wrs[cond_cm]

print(
    f"Dropping preventive maintenance work orders takes us from {len(consolidated_wrs):,} rows to {len(consolidated_cms):,} rows."
)

In [None]:
cms_benchmarked = consolidated_cms.apply(add_cm_benchmarks, axis=1)

cms_benchmarked.sample(6, random_state=444)[
    ["problem_type", "primary", "benchmark", "days_to_completion"]
]
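
The consolidation and benchmarking steps can be sketched with two lookup tables. Everything in the mappings below is hypothetical: the real groupings and benchmark values live in `utils.consolidate_prob_types` and `utils.add_cm_benchmarks`, and the `"ELEVATOR"` type, the `"OTHER"` fallback, and the day counts are made up for illustration.

```python
import pandas as pd

# Hypothetical mapping from detailed problem types to primary types.
PRIMARY_TYPE_SKETCH = {
    "BOILER": "HVAC",
    "HVAC|REPAIR": "HVAC",
    "HVAC|PM": "PREVENTIVE",
    "PREVENTIVE MAINT": "PREVENTIVE",
}
# Hypothetical benchmarks, in days, for each primary type.
BENCHMARK_DAYS_SKETCH = {"HVAC": 30, "OTHER": 14}

demo = pd.DataFrame(
    {
        "problem_type": ["BOILER", "HVAC|PM", "ELEVATOR"],
        "days_to_completion": [10.0, 5.0, 20.0],
    }
)

# 1. Consolidate: unmapped types fall back to "OTHER" in this sketch.
demo["primary"] = demo["problem_type"].map(PRIMARY_TYPE_SKETCH).fillna("OTHER")

# 2. Keep corrective maintenance only.
cms = demo[demo["primary"] != "PREVENTIVE"].copy()

# 3. Attach each primary type's benchmark, then compare.
cms["benchmark"] = cms["primary"].map(BENCHMARK_DAYS_SKETCH)
cms["is_on_time"] = cms["days_to_completion"] <= cms["benchmark"]
print(cms)
```

A dictionary lookup keeps the groupings and benchmarks in one visible place, which makes them easy to review against the published methodology.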

In [None]:
cms_on_time = cms_benchmarked.apply(compute_is_on_time, axis=1)

In [None]:
cm_compliance = compute_kpi_table(cms_on_time, "Percent CMs on Time", "Count of CMs")
cm_compliance

In [None]:
# kpis_fy = add_fiscal_year(kpis_raw, assign_fy_on="closure")

# table_df = kpis_fy.groupby("fiscal_year")[["wr_id"]].count()
# table_df