# Facilities KPI Scorecard
This notebook serves as a repository of public-facing KPIs for the Department of General Services' Facilities Division. The purpose of this notebook is to have, in a single public place, the entire pipeline for calculating the division's three KPIs that are based in Archibus. Those KPIs are:

1. % of Corrective Maintenance Work Requests Completed On Time
2. % of Preventive Maintenance Work Requests Completed On Time
3. Preventive Maintenance to Corrective Maintenance Ratio

Part of the intention of this notebook is to build _transparency_ and _reproducibility_ by capturing the entire end-to-end process used to calculate these metrics in one place.

## Setup

### Import packages

In [1]:
# workhorse modules
import pandas as pd
import numpy as np
from datetime import timedelta, datetime
import re
from pathlib import Path
import seaborn as sns
import datadotworld as dw
import matplotlib.pyplot as plt
from faker import Faker

# local utility functions
from utils import (
    add_cm_benchmarks,
    add_fiscal_year,
    set_pd_params,
    tidy_up_df,
    cast_dtypes,
    glue_date_time,
    compute_days_to_completion,
    compute_days_open,
    consolidate_prob_types,
    compute_pm_cm,
    compute_pm_cm_by_month,
    compute_kpi_table,
    compute_kpi_table_by_month,
)

from vis_utils import set_plot_params, pointplot_with_barplot

### Set pandas options
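This makes pandas print all rows and columns to the output when requested. The same cell also applies the shared plot settings and silences pandas' chained-assignment warning.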

In [2]:
set_pd_params()
set_plot_params()
pd.options.mode.chained_assignment = None  # default='warn'

### Import the work order data from Data.World
This data is a copy of Archibus's `wrhwr` table stored at DGS's account on Data.World. To see the exact query used to generate the input data, see `/sql/input_for_FMD_KPIs.sql`.

In [3]:
kpis_raw = dw.query(
    dataset_key="dgsbpio/auditfinding3", query="select * from wrhwr_20210329"
).dataframe

In [4]:
print(f"The work orders dataframe has {kpis_raw.shape[0]:,} rows.")

kpis_raw.sample(3, random_state=444)[["wr_id", "prob_type", "date_requested"]]

The work orders dataframe has 107,236 rows.


Unnamed: 0,wr_id,prob_type,date_requested
60889,90036,HVAC,2017-06-27 00:00:00.000
63377,92431,ELEC/LIGHT,2017-08-17 00:00:00.000
67566,96618,,2017-11-27 12:54:42.950


In [5]:
kpis_raw.columns

Index(['wr_id', 'requestor', 'status', 'bl_id', 'prob_type', 'date_completed',
       'time_completed', 'date_assigned', 'time_assigned', 'date_requested',
       'time_requested', 'date_closed', 'description', 'work_team_id',
       'priority_label', 'cf_notes', 'cost_total', 'invoice_number',
       'po_number', 'supervisor'],
      dtype='object')

### Import the users table
We need this second table only to determine which work requests of type "OTHER" were requested by FMD staff and which were requested by a Service Request Liaison (SRL).

OTHERs requested internally will be given the primary problem type "OTHER-INTERNAL", while OTHERs requested by SRLs will be given the primary problem type "OTHER-EXTERNAL".

In [6]:
users = dw.query(
    dataset_key="dgsbpio/auditfinding3", query="select * from archibus_user_roles"
).dataframe

users = users.applymap(lambda x: x.strip() if isinstance(x, str) else x)

## Data cleaning
For the purposes of this project, DGS is keeping the cleaning stage very simple. We're not attempting to remove duplicates or outliers, both of which involve relatively complex operations. 

### Basic cleaning
This step:
- removes whitespace in strings to facilitate matching,
- drops rows with no problem type and rows created by a test,
- renames a few columns.

In [7]:
# make sure each column is of the correct data type
wr_tidy = cast_dtypes(kpis_raw)
# basic cleaning
wr_tidy = tidy_up_df(wr_tidy)

print(f"The tidied work orders dataframe has {len(wr_tidy):,} rows.")
print(f"By tidying the data, we have removed {len(kpis_raw) - len(wr_tidy):,} rows.")

The tidied work orders dataframe has 107,109 rows.
By tidying the data, we have removed 127 rows.


### Drop canceled and rejected work orders
The data comes to us with many canceled and rejected work orders. Those shouldn't count against us as not having been completed on time, so we drop them here.

In [8]:
# drop rows that were canceled
cond_valid = ~wr_tidy["status"].isin(["Can", "Rej"])

wr_valid = wr_tidy[cond_valid]

print(
    f"By dropping canceled work orders, we have removed {len(wr_tidy) - len(wr_valid):,} rows."
)

By dropping canceled work orders, we have removed 3,567 rows.


### Combine date and time columns to get timestamps
This takes the date from a date column and the time from a time column and combines them into a single timestamp.

This transformation allows us to know the time to completion with greater precision. 

In [9]:
# glue the date and time for request
wr_dt = glue_date_time(wr_valid, "date_requested", "time_requested", "requested_dt")

# glue the date and time for completion
wr_dt = glue_date_time(wr_dt, "date_completed", "time_completed", "completed_dt")

# convert "date closed" to a valid datetime (this column has no time information)
wr_dt["date_closed"] = pd.to_datetime(wr_dt["date_closed"])
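
To make the date-and-time gluing concrete, here is a minimal sketch of one way such a helper could work. It is not the code in `utils.glue_date_time`; the function name `glue_date_time_sketch` is hypothetical, and the sketch assumes the time column is stored as a full timestamp whose date part is a placeholder.

```python
import pandas as pd


def glue_date_time_sketch(df, date_col, time_col, dt_col):
    """Hypothetical stand-in for utils.glue_date_time (illustration only)."""
    df = df.copy()
    dates = pd.to_datetime(df[date_col], errors="coerce")
    times = pd.to_datetime(df[time_col], errors="coerce")
    # keep the calendar day from the date column and the clock time
    # from the time column, then add them together
    df[dt_col] = dates.dt.normalize() + (times - times.dt.normalize())
    return df


example = pd.DataFrame(
    {
        "date_requested": ["2017-06-27 00:00:00.000"],
        # assumed storage format: placeholder date plus the real clock time
        "time_requested": ["1899-12-30 09:55:07.000"],
    }
)
glued = glue_date_time_sketch(
    example, "date_requested", "time_requested", "requested_dt"
)
print(glued["requested_dt"].iloc[0])
```

Rows where either piece is missing end up as `NaT`, which keeps incomplete work requests distinguishable later in the pipeline.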

### Examine the cleaned data
The display below gives us a sense of what some key columns in the data now look like. 

In [10]:
wr_dt[
    [
        "wr_id",
        "problem_type",
        "requested_dt",
        "completed_dt",
        "date_closed",
        "status",
    ]
].sample(4, random_state=451)

Unnamed: 0,wr_id,problem_type,requested_dt,completed_dt,date_closed,status
51589,79755,SERV/PEST,2016-12-09 09:55:07,2016-12-22 10:03:25,2017-01-06 15:12:31.043,Clo
80276,110831,BATHROOM_FIXT,2018-09-13 16:18:51,2018-10-11 08:20:36,2020-01-24 13:56:46.653,Clo
38977,63674,OTHER,2016-03-22 10:45:32,2016-03-24 13:23:11,2016-04-03 15:57:12.443,Clo
37906,62592,PREVENTIVE MAINT,2016-03-01 12:30:06,2016-04-06 06:26:30,2016-04-06 06:38:30.340,Clo


## Data preparation

### Merge work requests and user data
Now we bring the data from the users table into the work request table. This lets us figure out the role of the user who requested each work request.

In [11]:
wr_joined = wr_dt.merge(
    users[["user_name", "role_name"]],
    how="left",
    left_on="requestor",
    right_on="user_name",
)

### Add days to completion
Next, we compute the duration of time between the request and completion, for each job. We store that number in a column called "days_to_completion". 

In [12]:
wr_durations = compute_days_open(wr_joined)
wr_durations = compute_days_to_completion(wr_durations)
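
As an illustration of the arithmetic behind `days_to_completion`, the sketch below subtracts two timestamp columns and expresses the result in fractional days. The helper name `days_between` is hypothetical and the real logic lives in `utils`; the first example row reuses the timestamps of work request 37940 shown in this notebook.

```python
import pandas as pd


def days_between(start, end):
    """Elapsed time between two timestamp Series, in fractional days."""
    return ((end - start).dt.total_seconds() / 86_400).round(2)


example = pd.DataFrame(
    {
        "requested_dt": pd.to_datetime(
            ["2015-06-15 09:28:35", "2021-03-01 08:00:00"]
        ),
        # the second request has not been completed yet
        "completed_dt": pd.to_datetime(["2015-06-16 13:23:11", None]),
    }
)
example["days_to_completion"] = days_between(
    example["requested_dt"], example["completed_dt"]
)
print(example)
```

The first row comes out at 1.16 days, and the open request keeps a missing value instead of a number.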

Let's check out three rows of the data, now that we've added some columns. Notice that `days_to_completion` is empty if the work request hasn't been marked completed yet. 

In [13]:
wr_durations[
    [
        "wr_id",
        "role_name",
        "problem_type",
        "requested_dt",
        "completed_dt",
        "date_closed",
        "days_to_completion",
        "status",
    ]
].sample(3, random_state=446)

Unnamed: 0_level_0,wr_id,role_name,problem_type,requested_dt,completed_dt,date_closed,days_to_completion,status
requested_dt,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2015-06-15 09:28:35,37940,SUPERVISOR - BOC,OTHER,2015-06-15 09:28:35,2015-06-16 13:23:11,2015-06-18 00:00:00.000,1.16,Clo
2016-01-04 15:43:32,58136,GATEKEEPER - BOC,ELEC/LIGHT,2016-01-04 15:43:32,2016-04-11 10:25:58,2016-04-11 10:28:41.910,97.78,Clo
2018-10-20 11:52:08,112354,SUPERVISOR - BOC,OTHER,2018-10-20 11:52:08,2019-04-29 14:18:30,2020-01-28 13:17:27.393,191.1,Clo


### Add fiscal year
There are different ways to assign a fiscal year to a work request. Currently, we're doing it by the fiscal year in which the work request was closed. Other options would include the fiscal year of the request. 


In [14]:
wr_fy = add_fiscal_year(wr_durations, assign_fy_on="closure")
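
For readers unfamiliar with fiscal-year labeling, here is a rough sketch of the idea. It assumes a fiscal year that starts on July 1 and is named for the calendar year in which it ends; that convention is an assumption made for illustration, and `fiscal_year_sketch` is a hypothetical name, not the implementation of `add_fiscal_year`.

```python
import pandas as pd


def fiscal_year_sketch(dates, fy_start_month=7):
    """Label each date with a fiscal year.

    Assumes the fiscal year begins on the first day of `fy_start_month`
    and is named for the calendar year in which it ends.
    """
    dates = pd.to_datetime(dates)
    # dates on or after the start month belong to the next year's label
    return dates.dt.year + (dates.dt.month >= fy_start_month).astype(int)


closed = pd.Series(pd.to_datetime(["2020-06-30", "2020-07-01", "2021-03-29"]))
fiscal_years = fiscal_year_sketch(closed).tolist()
print(fiscal_years)
```

Assigning on the request date instead of the closure date would only change which column is passed in.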

After assigning the fiscal year, we can drop work requests from FY15 and older. 

In [15]:
cond_fy = wr_fy["fiscal_year"].isin(range(2016, 2022))
wr_fy = wr_fy[cond_fy]

print(len(wr_fy))

66307


### Anonymize names

In [16]:
def split_names(row):
    if "." in row["supervisor"]:
        full_name = row["supervisor"].split(".")
        row["supervisor_fname"] = full_name[0]
        row["supervisor_lname"] = full_name[1]
    if row["supervisor"] == "NULL":
        row["supervisor_fname"] = "NULL"
        row["supervisor_lname"] = "NULL"
    return row


wr_names = wr_fy.apply(split_names, axis=1)

In [17]:
def replace_names(df, faker):
    # empty string values as nan
    df["supervisor"] = df["supervisor"].replace("NULL", np.nan)
    # name replacement dictionary
    fname_replacements = {
        name: faker.first_name().upper().replace(" ", "")
        for name in df["supervisor_fname"].unique()
        if name is not np.nan
    }
    lname_replacements = {
        name: faker.last_name().upper().replace(" ", "")
        for name in df["supervisor_lname"].unique()
        if name is not np.nan
    }

    # apply replacement
    df = df.replace(
        {"supervisor_fname": fname_replacements, "supervisor_lname": lname_replacements}
    )
    df["supervisor_anon"] = df["supervisor_fname"] + "." + df["supervisor_lname"]
    df = df.drop(columns=["supervisor_fname", "supervisor_lname"])
    df = df.replace(np.nan, "NULL")
    return df


fake = Faker()
wr_names = replace_names(wr_names, fake)

print(f"Length: {len(wr_names)}")

wr_names[["supervisor_anon"]].head()

Length: 66307


Unnamed: 0_level_0,supervisor_anon
requested_dt,Unnamed: 1_level_1
2013-10-23 10:13:32,BRITTANY.WARD
2013-10-23 13:45:19,ALISON.MILLER
2013-10-28 14:17:29,ALISON.MILLER
2013-10-30 08:35:21,KEVIN.SALAZAR
2013-11-20 13:20:58,ANGELA.JACOBSON


### Consolidate many problem types into primary types
As shown below, consolidating the problem types takes us from 110 problem types down to only 30. It is much more manageable to maintain 30 benchmarks than 110.

In [18]:
wr_primary = wr_names.copy()

wr_primary = wr_primary.apply(consolidate_prob_types, axis=1)

In [19]:
print(
    f"Consolidation takes us from {wr_primary['problem_type'].nunique()} problem types to only {wr_primary['primary'].nunique():,}."
)

Consolidation takes us from 110 problem types to only 30.
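
The sketch below illustrates the general shape of a consolidation step, including the split of "OTHER" by requestor role described earlier. Everything in it is hypothetical: the mapping entries, the `SRL_ROLE` label, and the function name are placeholders for illustration, while the real rules live in `consolidate_prob_types`.

```python
import pandas as pd

# hypothetical mapping for illustration; the real one is defined in utils
PRIMARY_TYPES = {
    "HVAC|PM": "PREVENTIVE",
    "PREVENTIVE MAINT": "PREVENTIVE",
    "HVAC|REPAIR": "HVAC",
    "CHILLERS": "HVAC",
}
# hypothetical role label for a Service Request Liaison
SRL_ROLE = "SERVICE REQUEST LIAISON"


def primary_type_sketch(problem_type, role_name):
    if problem_type == "OTHER":
        # assumption: any requestor who is not an SRL counts as internal
        return "OTHER-EXTERNAL" if role_name == SRL_ROLE else "OTHER-INTERNAL"
    # problem types without a mapping entry keep their own name
    return PRIMARY_TYPES.get(problem_type, problem_type)


example = pd.DataFrame(
    {
        "problem_type": ["HVAC|PM", "OTHER", "OTHER", "SERV/PEST"],
        "role_name": ["SUPERVISOR - BOC", SRL_ROLE, "SUPERVISOR - BOC", None],
    }
)
example["primary"] = [
    primary_type_sketch(p, r)
    for p, r in zip(example["problem_type"], example["role_name"])
]
print(example)
```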


### Anonymize names

In [20]:
def split_names(row):
    if "." in row["supervisor"]:
        full_name = row["supervisor"].split(".")
        row["supervisor_fname"] = full_name[0]
        row["supervisor_lname"] = full_name[1]
    if row["supervisor"] == "NULL":
        row["supervisor_fname"] = "NULL"
        row["supervisor_lname"] = "NULL"
    return row


wr_names = wr_fy.apply(split_names, axis=1).head(12)

In [21]:
def replace_names(df, faker):
    # empty string values as nan
    df[["supervisor_fname", "supervisor_lname"]] = df[
        ["supervisor_fname", "supervisor_lname"]
    ].replace("NULL", np.nan)
    # name replacement dictionary
    fname_replacements = {
        name: faker.first_name().upper().replace(" ", "")
        for name in df["supervisor_fname"].unique()
        if name is not np.nan
    }
    lname_replacements = {
        name: faker.last_name().upper().replace(" ", "")
        for name in df["supervisor_lname"].unique()
        if name is not np.nan
    }

    # apply replacement
    df = df.replace(
        {"supervisor_fname": fname_replacements, "supervisor_lname": lname_replacements}
    )
    df["supervisor_anon"] = df["supervisor_fname"] + "." + df["supervisor_lname"]
    df = df.drop(columns=["supervisor_fname", "supervisor_lname"])
    df["supervisor_anon"] = df["supervisor_anon"].replace(np.nan, "NULL")
    return df


fake = Faker()
wr_names = replace_names(wr_names, fake)

wr_names[["supervisor_anon"]].head()

Unnamed: 0_level_0,supervisor_anon
requested_dt,Unnamed: 1_level_1
2013-10-23 10:13:32,
2013-10-23 13:45:19,SANDRA.LONG
2013-10-28 14:17:29,SANDRA.LONG
2013-10-30 08:35:21,BRENDA.LONG
2013-11-20 13:20:58,DEVON.THOMPSON


## KPI: % PMs Completed On Time 
The goal here is to filter the data down to preventive maintenance only, and then show how many are completed on or before the benchmark (21 days).

### Filter to valid PM only, and for relevant fiscal years only
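After noticing that many PMs have the status "Can", meaning they were canceled, we filter those out. Canceled and rejected rows were already dropped during data cleaning, so the cell below only needs to filter on problem type.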

In [22]:
# this defines which problem types are considered PMs
pm_list = [
    "BUILDING INTERIOR INSPECTION",
    "BUILDING PM",
    "HVAC|PM",
    "INSEPCTION",
    "PREVENTIVE MAINT",
]

# filter data to PM types only
cond_pm = wr_primary["problem_type"].isin(pm_list)

# apply filter conditions
wr_pm = wr_primary[cond_pm]

print(f"The filtered PMs dataframe has {wr_pm.shape[0]:,} rows.")

The filtered PMs dataframe has 6,980 rows.


#### Compute the benchmark and add `is_on_time` column
Here we tell Python that PMs are on time if completed within 21 days.

Then the function `compute_is_on_time` compares the column `days_to_completion` to the benchmark and writes down whether the work request was completed on time. 

In [23]:
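def compute_is_on_time(row):
    # days_to_completion can hold "NULL" or NaN for requests that are not yet
    # completed, so coerce both values to numbers before comparing
    days = pd.to_numeric(row["days_to_completion"], errors="coerce")
    benchmark = pd.to_numeric(row["benchmark"], errors="coerce")
    # requests without a completion time or a benchmark are marked not on time
    row["is_on_time"] = (
        pd.notna(days) and pd.notna(benchmark) and int(days) <= int(benchmark)
    )
    return row


# assign benchmark
wr_pm["benchmark"] = 21

# add "is_on_time" column with performance data
pms_on_time = wr_pm.apply(compute_is_on_time, axis=1)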
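
The same comparison can also be written as a column operation, which avoids a row-by-row `apply`. This is a sketch on made-up rows, not a replacement for `compute_is_on_time`. It mirrors the `int()` truncation used there, so a request completed after 21.9 days still counts as on time against a 21-day benchmark.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame(
    {
        # "NULL" stands in for a request that has not been completed
        "days_to_completion": [1.16, 97.78, "NULL", 21.9],
        "benchmark": [21, 21, 21, 21],
    }
)

# non-numeric entries become NaN instead of raising an error
days = pd.to_numeric(example["days_to_completion"], errors="coerce")

# truncate to whole days, as int() does; comparisons with NaN are False,
# so incomplete requests are marked as not on time
example["is_on_time"] = np.trunc(days) <= example["benchmark"]
print(example)
```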

In [None]:
pms_on_time.columns

In [None]:
pms_on_time[
    [
        "wr_id",
        "problem_type",
        "requested_dt",
        "completed_dt",
        "days_to_completion",
        "is_on_time",
        "status",
    ]
].sample(3, random_state=446)

#### Group by fiscal year and get % on time
Now that we've stored all this information, we can group by the fiscal year to get each year's KPI, together with a count of how many PMs fall in that year (`total_PMs`).

In [None]:
pm_compliance = compute_kpi_table(pms_on_time, "percent_PMs_on_time", "total_PMs")
pm_compliance
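
For reference, a grouped KPI table of this kind can be sketched with a plain `groupby`. The function below is a hypothetical stand-in, not the code in `utils.compute_kpi_table`; it assumes the input has `fiscal_year`, `wr_id`, and a boolean `is_on_time` column.

```python
import pandas as pd


def kpi_table_sketch(df, label_for_kpi, label_for_totals):
    """Hypothetical stand-in for utils.compute_kpi_table."""
    table = (
        df.groupby("fiscal_year")
        .agg(
            **{
                # the mean of a boolean column is the share of True values
                label_for_kpi: ("is_on_time", "mean"),
                label_for_totals: ("wr_id", "count"),
            }
        )
        .reset_index()
    )
    table[label_for_kpi] = (table[label_for_kpi] * 100).round(2)
    return table


example = pd.DataFrame(
    {
        "wr_id": [1, 2, 3, 4],
        "fiscal_year": [2020, 2020, 2021, 2021],
        "is_on_time": [True, False, True, True],
    }
)
kpi_example = kpi_table_sketch(example, "percent_PMs_on_time", "total_PMs")
print(kpi_example)
```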

In [None]:
pointplot_with_barplot(
    pm_compliance,
    x="fiscal_year",
    point_y="percent_PMs_on_time",
    bar_y="total_PMs",
    ymax_bar=2_500,
    ylabel_point="Percent PMs On Time",
    ylabel_bar="Total PMs",
    title="Percent of PMs On Time By Fiscal Year",
)

## KPI: PM:CM Ratio

The two lists below contain the exact same problem types mentioned in last year's scorecard. So we would expect to be able to replicate last year's results closely.

In [None]:
CM_list = [
    "BOILER",
    "CHILLERS",
    "COOLING TOWERS",
    "HVAC",
    "HVAC INFRASTRUCTURE",
    "HVAC|REPAIR",
]

PM_list = [
    "HVAC|PM",
    "PREVENTIVE MAINT",
]

### Filter to HVAC rows only

In [None]:
cond_cm = wr_primary["problem_type"].isin(CM_list)
cond_pm = wr_primary["problem_type"].isin(PM_list)

wr_HVAC = wr_primary[cond_cm | cond_pm]
wr_HVAC["is_pm"] = wr_HVAC["problem_type"].isin(PM_list)

print(
    f"Filtering to HVAC requests only takes us from {len(wr_fy):,} rows to {len(wr_HVAC):,} rows."
)

### Compute all PM/CM stats by fiscal year
First we deploy a custom function that counts the PMs and CMs in each fiscal year and then calculates the PM:CM ratio.

In [None]:
pm_cm_results = compute_pm_cm(wr_HVAC, PM_list)

In [None]:
pm_cm_results
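
To show what goes into the ratio, here is a small sketch that counts PMs and CMs per fiscal year from a boolean `is_pm` column and divides one by the other. The function name is hypothetical and this is not the implementation of `compute_pm_cm`; the output column names follow the ones used in the plots in this notebook.

```python
import pandas as pd


def pm_cm_sketch(df):
    """Hypothetical stand-in for utils.compute_pm_cm."""
    grouped = df.groupby("fiscal_year")["is_pm"]
    table = pd.DataFrame(
        {"count_pm": grouped.sum(), "count_hvac": grouped.size()}
    )
    table["count_cm"] = table["count_hvac"] - table["count_pm"]
    # a year with zero CMs would produce an infinite ratio
    table["pm_cm_ratio"] = table["count_pm"] / table["count_cm"]
    return table.reset_index()


example = pd.DataFrame(
    {
        "fiscal_year": [2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021],
        "is_pm": [True, False, False, False, True, True, False, False],
    }
)
ratio_example = pm_cm_sketch(example)
print(ratio_example)
```

A ratio of 1.0 means one preventive work request for every corrective one, which corresponds to the "1:1" tick label in the chart.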

### Plot PM:CM ratio by fiscal year

In [None]:
pointplot_with_barplot(
    pm_cm_results,
    x="fiscal_year",
    point_y="pm_cm_ratio",
    bar_y="count_hvac",
    ymax_bar=5_000,
    ylabel_point="PM:CM Ratio",
    ylabel_bar="Total HVAC Work Requests",
    title="Preventive to Corrective Ratio By Fiscal Year",
    yticklabels=["0", "0", "0.25:1", "0.5:1", "0.75:1", "1:1"],
)

### Plot the number of PMs and CMs by fiscal year
We can get a little more insight into what's going on with the ratio by checking out the raw counts of preventive and corrective maintenance work requests by fiscal year. 

In [None]:
count_plot_data = pd.melt(
    pm_cm_results, id_vars=["fiscal_year"], value_vars=["count_cm", "count_pm"]
)

ax = sns.lineplot(data=count_plot_data, y="value", x="fiscal_year", hue="variable")
new_labels = ["Corrective", "Preventive"]
plt.legend(title="Number of Work Orders", loc="upper left", labels=new_labels)

ax.set(
    title="Volume of HVAC PMs And CMs By Fiscal Year",
    xlabel="Fiscal Year",
    ylim=(0, 3000),
)
sns.despine()

## KPI: Percent of CM Work Requests Completed On-Time
Here are the key facts needed to understand the agency's new method for computing this KPI:

- Only CM problem types are considered, so all PM work orders are dropped.
- The work orders are first assigned a "primary" problem type, which reduces the number of problem types.
- Each of these primary problem types has a benchmark, which is then added to the work request's row.
- Finally, the work order is determined to be on time by comparing its time to completion to its benchmark.

### Filter to get CMs only

In [None]:
cond_cm = wr_primary["primary"] != "PREVENTIVE"
consolidated_cms = wr_primary[cond_cm]

print(
    f"Dropping preventive maintenance work requests takes us from {len(wr_primary):,} rows to {len(consolidated_cms):,} rows."
)

### Apply benchmarks
Now we look up the benchmark for each work request and add it in a new column. The function `add_cm_benchmarks()` contains the list of benchmarks and problem types.

In [None]:
cms_benchmarked = consolidated_cms.apply(add_cm_benchmarks, axis=1)

cms_benchmarked.sample(6, random_state=444)[
    ["problem_type", "primary", "benchmark", "days_to_completion"]
]

### Compute whether requests are on time

In [None]:
cms_on_time = cms_benchmarked.apply(compute_is_on_time, axis=1)

### Group by fiscal year to get KPI

In [None]:
cm_compliance = compute_kpi_table(cms_on_time, "percent_CMs_on_time", "total_CMs")
cm_compliance

### Plot % CMs on time by fiscal year

In [None]:
pointplot_with_barplot(
    cm_compliance,
    x="fiscal_year",
    point_y="percent_CMs_on_time",
    bar_y="total_CMs",
    ymax_bar=15_000,
    ylabel_point="Percent CMs On Time",
    ylabel_bar="Total CMs",
    title="Percent of CMs On Time By Fiscal Year",
)

## Monthly figures for current FY
First, enter the date of the first day that is outside the range. For example, if we want to show monthly data up to the end of February 2021, set the variable `end_date` to `"03-01-2021"`.

In [None]:
end_date = "04-01-2021"
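
Because `end_date` is the first day outside the range, the comparison against it is a strict "less than". The short sketch below, using made-up closure timestamps, shows which rows such a filter keeps. Note that pandas parses this string month-first by default.

```python
import pandas as pd

# parsed month-first: April 1, 2021
end_date_parsed = pd.to_datetime("04-01-2021")

closed = pd.Series(
    pd.to_datetime(["2021-03-31 23:59:00", "2021-04-01 00:00:00"])
)

# strict comparison: the end date itself is excluded
in_range = closed < end_date_parsed
print(in_range.tolist())
```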

### PMs on Time

In [None]:
pms_on_time_current_fy = compute_kpi_table_by_month(
    pms_on_time,
    "percent_PMs_on_time",
    "total_PMs",
    end_date=end_date,
)
pms_on_time_current_fy

In [None]:
pointplot_with_barplot(
    pms_on_time_current_fy,
    x="year_month",
    point_y="percent_PMs_on_time",
    bar_y="total_PMs",
    ymax_bar=700,
    ylabel_point="% PMs Completed On Time (Red Lines)",
    ylabel_bar="Total PMs Closed (Grey Bars)",
    title="Percent of PMs Completed On Time By Month (FY21)",
)

### PM:CM Ratio

In [None]:
pm_cm_ratio_current_fy = compute_pm_cm_by_month(wr_HVAC, PM_list, end_date=end_date)
pm_cm_ratio_current_fy

In [None]:
pointplot_with_barplot(
    pm_cm_ratio_current_fy,
    x="year_month",
    point_y="pm_cm_ratio",
    bar_y="count_hvac",
    yaxis_freq=200,
    ymax_point=1_000,
    ymax_bar=1_200,
    ylabel_point="PM:CM Ratio (Red Lines)",
    ylabel_bar="Total HVAC Work Requests Closed (Grey Bars)",
    title="Preventive to Corrective Ratio By Month (FY21)",
    yticklabels=[
        "0",
        "0",
        "2:1",
        "4:1",
        "6:1",
        "8:1",
    ],
)

In [None]:
pm_cm_ratio_current_fy = pm_cm_ratio_current_fy.rename(
    columns={"count_cm": "HVAC CMs", "count_pm": "HVAC PMs"}
)

count_plot_data = pd.melt(
    pm_cm_ratio_current_fy, id_vars=["year_month"], value_vars=["HVAC CMs", "HVAC PMs"]
)


ax = sns.pointplot(
    data=count_plot_data, y="value", marker="o", x="year_month", hue="variable"
)
# new_labels = ["Corrective", "Preventive"]
# plt.legend(title="Number of Work Orders", loc="upper left", labels=new_labels)

ax.set(
    title="Volume of HVAC PMs And CMs By Month",
    xlabel="Month",
    ylabel="Number of Work Requests Closed",
    ylim=(0, 800),
)
sns.despine()

In [None]:
cms_on_time_current_fy = compute_kpi_table_by_month(
    cms_on_time,
    "percent_CMs_on_time",
    "total_CMs",
    end_date=end_date,
)

cms_on_time_current_fy

In [None]:
pointplot_with_barplot(
    cms_on_time_current_fy,
    x="year_month",
    point_y="percent_CMs_on_time",
    bar_y="total_CMs",
    ymax_bar=4_000,
    ylabel_point="% CMs Completed On Time (Red Lines)",
    ylabel_bar="Total CMs Closed (Grey Bars)",
    title="Percent of CMs Completed On Time By Month (FY21)",
)

### Percent backlog

In [None]:
wr_fy["is_backlog"] = wr_fy["days_open"] > 90

In [None]:
def compute_backlog_table_by_month(
    df, label_for_KPI=None, label_for_totals=None, current_fy=2021, end_date=None
):
    df = df.copy()
    try:
        end_date = pd.to_datetime(end_date)
    except Exception:
        print(f"Date string {end_date} cannot be converted to a date.")
    # filter to current fy
    cond_current_fy = df["fiscal_year"] == current_fy
    cond_end_date = df["date_closed"] < end_date
    df = df[cond_current_fy & cond_end_date]
    table_df = (
        df[["wr_id", "date_closed", "is_backlog"]]
        .resample("M", on="date_closed")
        .agg({"is_backlog": "mean", "wr_id": "count"})
    )
    table_df["year_month"] = table_df.index.strftime("%b-%y")
    table_df["is_backlog"] = table_df["is_backlog"].apply(lambda x: round(x * 100, 2))
    table_df = table_df.rename(
        columns={"is_backlog": label_for_KPI, "wr_id": label_for_totals}
    )
    return table_df


backlog_plot_data = compute_backlog_table_by_month(
    wr_fy,
    label_for_KPI="percent_backlog",
    label_for_totals="total_work_orders",
    end_date=end_date,
)

In [None]:
pointplot_with_barplot(
    backlog_plot_data,
    x="year_month",
    point_y="percent_backlog",
    bar_y="total_work_orders",
    ymax_bar=5_000,
    ylabel_point="% Backlog (Red Lines)",
    ylabel_bar="Total WRs Closed (Grey Bars)",
    title="Percent of WRs Closed that Were Older than 90 Days (FY21)",
)

## Backlog charts

For these charts we need `wr_durations` rather than `wr_fy`, because `wr_fy` assigns each work request to a fiscal year based on its closure date.

### Backlog by problem type

In [None]:
wr_primary[wr_primary["status"] == "Com"][["days_to_close", "status"]].sample(10)

In [None]:
def group_backlog_by_col(df, col="primary", backlog_length: int = 90, top_n: int = 10):
    df = df.copy()
    # filter to backlog
    cond_backlog = df["days_to_close"] >= backlog_length

    cond_status = df["status"].isin(["A", "AA", "Com", "I", "HC", "S"])
    df = df[cond_status & cond_backlog]
    counts_df = pd.DataFrame(
        df.groupby(col)["wr_id"].count().sort_values(ascending=False)
    )
    counts_df = counts_df.head(top_n)
    return counts_df.reset_index()


primary_df = group_backlog_by_col(
    backlog_df,
    col="primary",
    backlog_length=90,
)

In [None]:
ax = sns.barplot(data=primary_df, x="primary", y="wr_id", color="lightgreen")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment="right")
ax.set(title="Size of Backlog By Primary Problem Type (> 90 Days)", ylabel="Count of work requests")
sns.despine()

### Backlog by supervisor

In [None]:
supervisor_df = group_backlog_by_col(
    backlog_df, col="supervisor_anon", backlog_length=90
)

```
Clo    92499 0 Closed
Com     6238 1 Completed
Can     3515 0 Canceled
AA      2821 1 Assigned to Work Order
I        827   Issued and In Process
R        515   Rejected
HC       301   Unknown
S        167   Stopped
Rej      105   Rejected
HMI       83   Unknown
RMI       82
HL        33
HP        20
HI        11
HA         8
A          3
```

In [None]:
ax = sns.barplot(data=supervisor_df, x="supervisor_anon", y="wr_id", color="lightgreen")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment="right")
ax.set(title="Size of Backlog By Supervisor", ylabel="Count of work requests")
sns.despine()

### Backlog by status

In [None]:
status_df = group_backlog_by_col(backlog_df, col="status", backlog_length=90)
status_df["status"] = status_df["status"].map(
    {"Com": "Complete", "AA": "Assigned", "I": "Issued/in process", "S": "Stopped", "HC": "HC"}
)

In [None]:
ax = sns.barplot(data=status_df, x="status", y="wr_id", color="lightgreen")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment="right")
ax.set(title="Size of Backlog By Status (> 90 Days)", ylabel="Count of work requests")
sns.despine()

In [None]:
backlog_df.columns

In [None]:
violin_plot_data = backlog_df[backlog_df['status'].isin(["A", "AA", "Com", "I", "HC", "S"])]
ax = sns.violinplot(x="status", y="days_open", data=violin_plot_data)
ax.set(title="Backlog WRs with 'Completed' Status Are Older than Others")
sns.despine()