# Nurse staffing strategies for enhanced patient care

Analysis of the Centers for Medicare & Medicaid Services Nurse Staffing
Dataset

Matthew Bain  
2024-03-22

I analyze a medical staffing dataset and identify avenues to improve
work satisfaction among nurses and the quality of care provided at
United States medical institutions.

\[…\]

## Imports

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from great_tables import GT
from pandas.plotting import scatter_matrix

from src.stylesheet import customize_plots
from src.inspection import make_df, display, display2

## The dataset

### Load the data

We begin by exploring the data to get to know the features and patterns
on which we will base our analysis.

In [4]:
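# Read the raw file only once; re-running this cell reuses the loaded frame.
if 'data' not in locals():
    data = pd.read_csv(
        "../data/raw/PBJ_Daily_Nurse_Staffing_Q1_2024.zip",
        encoding='ISO-8859-1',
        low_memory=False
    )
else:
    print("data already loaded; skipping read.")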

### Inspect the data

In [18]:
GT(data.sample(10))

In [7]:
df = data.describe().round(1)
# display(Markdown(df.to_markdown()))
GT(df.reset_index())

### Group the features

We note that there are 91 records per provider, one for each work date
(`len(data["WorkDate"].unique())`), and 1,330,966 records in the table
overall. The following table, which collapses the daily records into one
row per provider, thus has 14,626 $\left( \frac{1330966}{91} \right)$ entries.
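The arithmetic above holds only if every provider reports exactly once per work date. A minimal sketch of that check on a toy frame follows; the provider IDs, dates, and the name `toy` are made up for illustration, and only the column names `PROVNUM` and `WorkDate` come from the dataset.

```python
import pandas as pd

# Toy stand-in for the staffing table: 3 hypothetical providers x 4 work dates.
dates = pd.date_range("2024-01-01", periods=4).strftime("%Y%m%d")
toy = pd.DataFrame(
    [(prov, date) for prov in ["A1", "B2", "C3"] for date in dates],
    columns=["PROVNUM", "WorkDate"],
)

n_days = toy["WorkDate"].nunique()      # records per provider
n_providers = toy["PROVNUM"].nunique()  # rows left after collapsing

# Assumption being tested: each provider has one record per work date,
# so the total row count is days * providers.
assert len(toy) == n_days * n_providers

# value_counts() on the provider key yields one row per provider,
# and each count equals the number of work dates.
per_provider = toy[["PROVNUM"]].value_counts()
assert len(per_provider) == len(toy) // n_days
assert (per_provider == n_days).all()

# The same division with the figures quoted above is exact.
assert 1_330_966 % 91 == 0 and 1_330_966 // 91 == 14_626
```

On the real data, the same three assertions with `data` in place of `toy` would confirm that no provider is missing days.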

In [8]:
df = data.loc[:, [
    "STATE",
    "COUNTY_NAME", "COUNTY_FIPS",
    "CITY",
    "PROVNAME", "PROVNUM",
    # "MDScensus"
]].value_counts()
# df.to_frame()
GT(df.reset_index().head(n=20))

In [17]:
GT(data[["CY_Qtr", "WorkDate", "MDScensus"]].head())

### Clean the data

## Explore the dataset

### Visualize distributions

### Visualize relationships

In [11]:
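# Staffing-hour features to compare pairwise
attributes = ["Hrs_RN", "Hrs_LPN_ctr", "Hrs_CNA", "Hrs_NAtrn", "Hrs_MedAide"]
n = len(attributes)

# Plot a random sample of 200 records so the individual points stay legible
fig, axs = plt.subplots(n, n, figsize=(8, 8))
scatter_matrix(
    data[attributes].sample(200),
    ax=axs, alpha=.7,
    hist_kwds=dict(bins=15, linewidth=0)
)
# Align the axis labels along the left column and the bottom row
fig.align_ylabels(axs[:, 0])
fig.align_xlabels(axs[-1, :])
# Shorten the tick marks on every panel
for ax in axs.flatten():
    ax.tick_params(axis='both', which='both', length=3.5)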

# save_fig("scatter_matrix_plot")

plt.show()

### Compare groups

## Engineer features

### Join geographical data

### Join seasonal data

## Analyze geography

## Analyze seasonality

## Model

## Extra visualizations

### Sparklines

In [85]:
# Plot sparklines of nurse hours worked across the 91 days, by state
(
    GT(gt_df.head(), rowname_col="STATE")
    .fmt_nanoplot(
        columns="lines",
        reference_line="mean",
        reference_area=["min", "q1"]
    )
    .tab_header(
        title="Nurse hours worked in the United States",
        subtitle="The top 5 busiest states",
    )
    .tab_stubhead(label="State")
    .cols_label(
        lines="Total hours worked over 91 days",
    )
)
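The cell above uses a frame `gt_df` that is not defined in this section. The sketch below shows one way such a frame could be built; it is not the author's actual construction. It assumes that the hours of interest are `Hrs_RN`, that hours are summed over providers for each state and day, and that the `lines` column holds each state's daily series as a space-separated string, a format that `fmt_nanoplot` accepts according to the great_tables documentation. The toy values and the names `toy`, `daily`, and `gt_df_sketch` are hypothetical.

```python
import pandas as pd

# Toy stand-in for `data`: two states, two providers each, two work dates.
toy = pd.DataFrame({
    "STATE":    ["AL", "AL", "AL", "AL", "TX", "TX", "TX", "TX"],
    "WorkDate": [20240101, 20240101, 20240102, 20240102] * 2,
    "Hrs_RN":   [10.0, 5.0, 8.0, 4.0, 30.0, 20.0, 25.0, 15.0],
})

# Total hours per state per day (summing over providers), in date order.
daily = (
    toy.groupby(["STATE", "WorkDate"])["Hrs_RN"]
    .sum()
    .reset_index()
    .sort_values(["STATE", "WorkDate"])
)

# One row per state: the daily series as a space-separated string,
# plus the overall total used to rank the states from busiest down.
gt_df_sketch = (
    daily.groupby("STATE")["Hrs_RN"]
    .agg(
        lines=lambda s: " ".join(f"{v:.1f}" for v in s),
        total="sum",
    )
    .sort_values("total", ascending=False)
    .reset_index()
)

print(gt_df_sketch)
```

Sorting by the total before calling `.head()` is what would make the first five rows "the top 5 busiest states"; if the sparklines are meant to show averages rather than totals, `.sum()` would be replaced by `.mean()` in the first aggregation.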

## Archive