<center><h1>Sales Opportunities from 2024 Quarter 2<br>Centers for Medicare &amp; Medicaid Services Data</h1>

*Prepared By: Chris Mims*

6 November 2024</center>

Clipboard Health's mission is to revolutionize the marketplace for healthcare talent by leading in reliability, affordability, and ease of use for both facilities and healthcare professionals. With that mission, Clipboard Health could make a strong impact for long-term care facilities and their residents. The Centers for Medicare & Medicaid Services (CMS) compiles and publishes a public data set that details the number of payroll hours worked by healthcare professionals on a daily basis at each of the participating facilities. Insights from the data contained in the [Q2 2024 CMS Payroll Based Journal Nurse Staffing report](https://data.cms.gov/quality-of-care/payroll-based-journal-daily-nurse-staffing/data) lead to the following recommendations.

## High Level Overview of Available Data

Of the 14564 facilities contained in the Q2 2024 CMS data, there are 5750 facilities that are not currently utilizing independent healthcare professionals (contractors).

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt 
from sklearn.preprocessing import StandardScaler
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
plt.style.use('ggplot')
pd.set_option('display.max_columns', 200)
import plotly.io as pio
pio.renderers.default='notebook'
import plotly.offline as poff
poff.init_notebook_mode()

In [None]:
pbj_data = pd.read_csv('PBJ_Daily_Nurse_Staffing_Q2_2024.csv', low_memory= False, encoding = 'latin-1')
pbj_data.columns = [x.lower() for x in pbj_data.columns.tolist()]

In [None]:
ctr_hrs_cols = [x for x in pbj_data.columns.tolist() if x.find('_ctr') >= 0]
ctr_hrs_df = pbj_data[['provnum'] + ctr_hrs_cols].copy()
# Total contractor hours per daily record, then per facility
ctr_hrs_df['ctr_hrs_ttl'] = ctr_hrs_df[ctr_hrs_cols].sum(axis = 1)
ctr_hrs_df = ctr_hrs_df.groupby('provnum', as_index = False).agg({'ctr_hrs_ttl':'sum'})
# Label each facility by whether it reported any contractor hours in the quarter
ctr_hrs_df['no_ctr_hrs'] = np.where(ctr_hrs_df['ctr_hrs_ttl'] == 0, 'No Contractor Hours', 'Has Contractor Hours')
# Count the facilities in each group
ctr_hrs_df = ctr_hrs_df.groupby('no_ctr_hrs').agg({'provnum':'nunique'})
ctr_hrs_df.insert(0, 'label', ctr_hrs_df.index)
ctr_hrs_df.reset_index(drop = True, inplace = True)
# ctr_hrs_df.head()

In [None]:
num_days = len(pbj_data['workdate'].unique().tolist())

In [None]:
pbj_ratios = pbj_data.groupby('provnum').agg(mdscensus_mean = ('mdscensus','mean'),
                                             mdscensus_sum = ('mdscensus','sum'),
                                             hrs_rndon = ('hrs_rndon','sum'),
                                             hrs_rndon_emp = ('hrs_rndon_emp','sum'),
                                             hrs_rndon_ctr = ('hrs_rndon_ctr','sum'),
                                             hrs_rnadmin = ('hrs_rnadmin','sum'),
                                             hrs_rnadmin_emp = ('hrs_rnadmin_emp','sum'),
                                             hrs_rnadmin_ctr = ('hrs_rnadmin_ctr','sum'),
                                             hrs_rn = ('hrs_rn','sum'),
                                             hrs_rn_emp = ('hrs_rn_emp','sum'),
                                             hrs_rn_ctr = ('hrs_rn_ctr','sum'),
                                             hrs_lpnadmin = ('hrs_lpnadmin','sum'),
                                             hrs_lpnadmin_emp = ('hrs_lpnadmin_emp','sum'),
                                             hrs_lpnadmin_ctr = ('hrs_lpnadmin_ctr','sum'),
                                             hrs_lpn = ('hrs_lpn','sum'),
                                             hrs_lpn_emp = ('hrs_lpn_emp','sum'),
                                             hrs_lpn_ctr = ('hrs_lpn_ctr','sum'),
                                             hrs_cna = ('hrs_cna','sum'),
                                             hrs_cna_emp = ('hrs_cna_emp','sum'),
                                             hrs_cna_ctr = ('hrs_cna_ctr','sum'),
                                             hrs_natrn = ('hrs_natrn','sum'),
                                             hrs_natrn_emp = ('hrs_natrn_emp','sum'),
                                             hrs_natrn_ctr = ('hrs_natrn_ctr','sum'),
                                             hrs_medaide = ('hrs_medaide','sum'),
                                             hrs_medaide_emp = ('hrs_medaide_emp','sum'),
                                             hrs_medaide_ctr = ('hrs_medaide_ctr','sum'))
pbj_ratios.insert(0, 'provnum', pbj_ratios.index)
pbj_ratios.reset_index(drop = True, inplace = True)
pbj_ratios_cols = pbj_ratios.columns.tolist()
emp_cols = [pbj_ratios_cols[i] for i in range(4, len(pbj_ratios_cols), 3)]
ctr_cols = [pbj_ratios_cols[i] for i in range(5, len(pbj_ratios_cols), 3)]
pbj_ratios['hrs_ttl'] = pbj_ratios[[pbj_ratios_cols[i] for i in range(3, len(pbj_ratios_cols), 3)]].apply( \
    lambda x: x.sum(), axis = 1)
pbj_ratios['hrs_ttl_emp'] = pbj_ratios[emp_cols].apply( \
    lambda x: x.sum(), axis = 1)
pbj_ratios['hrs_ttl_ctr'] = pbj_ratios[ctr_cols].apply( \
    lambda x: x.sum(), axis = 1)
all_pbj_cols = pbj_ratios.columns.tolist()
all_emp_cols = [pbj_ratios_cols[i] for i in range(4, len(pbj_ratios_cols), 3)]
all_ctr_cols = [pbj_ratios_cols[i] for i in range(5, len(pbj_ratios_cols), 3)]
# pbj_ratios.head()

In [None]:
no_ctr_locs = pbj_ratios.query('hrs_ttl_ctr == 0').provnum.tolist()
# print(len(no_ctr_locs))

In [None]:
pbj_none_ctr = pbj_ratios[pbj_ratios['hrs_ttl_ctr'] == 0].copy()
pbj_with_ctr = pbj_ratios[pbj_ratios['hrs_ttl_ctr'] != 0].copy()
hrs_avg_all_typs = pbj_ratios[['provnum', 'mdscensus_sum'] + [all_pbj_cols[i] for i in range(3, len(all_pbj_cols), 3)]].copy()
hrs_avg_all_typs.insert(3, 'hrs_avg_day_cens_rndon', hrs_avg_all_typs['hrs_rndon'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(5, 'hrs_avg_day_cens_rnadmin', hrs_avg_all_typs['hrs_rnadmin'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(7, 'hrs_avg_day_cens_rn', hrs_avg_all_typs['hrs_rn'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(9, 'hrs_avg_day_cens_lpnadmin', hrs_avg_all_typs['hrs_lpnadmin'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(11, 'hrs_avg_day_cens_lpn', hrs_avg_all_typs['hrs_lpn'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(13, 'hrs_avg_day_cens_cna', hrs_avg_all_typs['hrs_cna'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(15, 'hrs_avg_day_cens_natrn', hrs_avg_all_typs['hrs_natrn'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs.insert(17, 'hrs_avg_day_cens_medaide', hrs_avg_all_typs['hrs_medaide'] / hrs_avg_all_typs['mdscensus_sum'])
hrs_avg_all_typs['hrs_avg_day_cens_ttl'] = hrs_avg_all_typs['hrs_ttl'] / hrs_avg_all_typs['mdscensus_sum']
hrs_avg_all_typs_cols = hrs_avg_all_typs.columns.tolist()
hrs_avg_all_typs.drop([hrs_avg_all_typs_cols[i] for i in range(2, len(hrs_avg_all_typs_cols), 2)] + ['mdscensus_sum'], axis = 1, inplace = True)
# hrs_avg_all_typs.head()

In [None]:
def get_emp_title(str1):
    match str1:
        case str1 if str1 in ['hrs_rndon', 'hrs_rndon_emp', 'hrs_rndon_ctr']:
            return 'Director of Nursing'
        case str1 if str1 in ['hrs_rnadmin', 'hrs_rnadmin_emp', 'hrs_rnadmin_ctr']:
            return 'Registered Nurse Admin'
        case str1 if str1 in ['hrs_rn', 'hrs_rn_emp', 'hrs_rn_ctr']:
            return 'Registered Nurse'
        case str1 if str1 in ['hrs_lpnadmin', 'hrs_lpnadmin_emp', 'hrs_lpnadmin_ctr']:
            return 'Licensed Practical Nurse Admin'
        case str1 if str1 in ['hrs_lpn', 'hrs_lpn_emp', 'hrs_lpn_ctr']:
            return 'Licensed Practical Nurse'
        case str1 if str1 in ['hrs_cna', 'hrs_cna_emp', 'hrs_cna_ctr']:
            return 'Cert. Nursing Assistant'
        case str1 if str1 in ['hrs_natrn', 'hrs_natrn_emp', 'hrs_natrn_ctr']:
            return 'Nurse Aide in Training'
        case str1 if str1 in ['hrs_medaide', 'hrs_medaide_emp', 'hrs_medaide_ctr']:
            return 'Med Aide/Technician'
        case _:
            return pd.NA 

def get_emp_type(str2):
    if str2.find('_emp') > 0:
        return 'Employee'
    elif str2.find('_ctr') > 0:
        return 'Contractor'
    elif str2.find('census') > 0:
        return 'Census'
    else:
        return 'Total'
    
ratio_melt = pd.melt(pbj_ratios, id_vars = ['provnum'], value_vars = all_pbj_cols[3:], var_name = 'hrs_cat', value_name = 'total_hours')
ratio_melt['emp_title'] = ratio_melt[['hrs_cat']].apply(lambda x: get_emp_title(x['hrs_cat']), axis = 1)
ratio_melt['emp_type'] = ratio_melt[['hrs_cat']].apply(lambda x: get_emp_type(x['hrs_cat']), axis = 1)

In [None]:
def get_emp_title(str1):
    match str1:
        case str1 if str1.endswith('rndon'):
            return 'Director of Nursing'
        case str1 if str1.endswith('rnadmin'):
            return 'Registered Nurse Admin'
        case str1 if str1.endswith('natrn'):
            return 'Nurse Aide in Training'
        case str1 if str1.endswith('rn'):
            return 'Registered Nurse'
        case str1 if str1.endswith('lpnadmin'):
            return 'Licensed Practical Nurse Admin'
        case str1 if str1.endswith('lpn'):
            return 'Licensed Practical Nurse'
        case str1 if str1.endswith('cna'):
            return 'Cert. Nursing Assistant'
        case str1 if str1.endswith('medaide'):
            return 'Med Aide/Technician'
        case _:
            return 'Total' 

hrs_avg_all_typs_melt = pd.melt(hrs_avg_all_typs, id_vars = ['provnum'], value_vars = hrs_avg_all_typs.columns.tolist()[1:], var_name = 'hrs_cat', value_name = 'avg_hours')
hrs_avg_all_typs_melt['emp_title'] = hrs_avg_all_typs_melt[['hrs_cat']].apply(lambda x: get_emp_title(x['hrs_cat']), axis = 1)
hrs_avg_all_typs_melt['emp_type'] = hrs_avg_all_typs_melt[['hrs_cat']].apply(lambda x: get_emp_type(x['hrs_cat']), axis = 1)
box_plt_data = hrs_avg_all_typs_melt.copy()

In [None]:
prov_info = pd.read_csv('NH_ProviderInfo_Oct2024.csv', low_memory = False, encoding = 'latin-1')
prov_info.columns = [x.lower().replace(' ', '_') for x in prov_info.columns.tolist()]
prov_info.rename({'cms_certification_number_(ccn)':'provnum'}, axis = 1, inplace = True)
# prov_info.head()

In [None]:
def get_region(str1):
    match str1:
        case str1 if str1 in ['WA', 'OR', 'ID', 'MT', 'WY', 'CA', 'NV', 'UT', 'CO', 'AZ', 'NM', 'AK', 'HI']:
            return 'West'
        case str1 if str1 in ['ND', 'SD', 'MN', 'NE', 'KS', 'IA', 'MO', 'WI', 'IL', 'MI', 'IN', 'OH']:
            return 'Midwest'
        case str1 if str1 in ['OK', 'TX', 'AR', 'LA', 'MS', 'AL', 'GA', 'FL', 'SC', 'NC', 'TN', 'KY', 'WV', 'VA', 'MD', 'DE', 'DC']:
            return 'South'
        case str1 if str1 in ['PA', 'NJ', 'NY', 'CT', 'RI', 'MA', 'NH', 'VT', 'ME']:
            return 'Northeast'
        case _: 
            return pd.NA 

per_state = prov_info[prov_info['provnum'].isin(no_ctr_locs)].copy()
per_state = per_state[['provnum', 'state']].copy()
state_cnt = per_state.groupby('state').agg(count = ('provnum','nunique'))
state_cnt.insert(0, 'state', state_cnt.index)
state_cnt.reset_index(drop = True, inplace = True)
state_cnt['region'] = state_cnt[['state']].apply(lambda x: get_region(x['state']), axis = 1)
state_cnt.sort_values(['region','state'], inplace = True)
state_cnt.dropna(inplace = True)
# state_cnt.head()

In [None]:
plt1 = px.bar(ctr_hrs_df, y = 'provnum', x = 'label', color = 'label',
                   text = 'provnum')
plt1.update_layout(title = dict(text = 'Number of Facilities',
                                subtitle = dict(text = 'with/without Contractor Hours'), 
                                x = 0.5, xanchor = 'center'),
                   yaxis_title = 'Number of facilities',
                   xaxis_title = '',
                   showlegend = False,
                   width = 400, height = 800)
plt1.update_traces(hovertemplate = '<b>%{x}</b><br>Count: %{y}', textposition = 'outside')
# plt1.show()

In [None]:
plt2 = go.Figure()
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Director of Nursing"')['avg_hours'],
        name = 'Director of Nursing'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Registered Nurse Admin"')['avg_hours'],
        name = 'Registered Nurse Admin'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Registered Nurse"')['avg_hours'],
        name = 'Registered Nurse'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Licensed Practical Nurse Admin"')['avg_hours'],
        name = 'LPN Admin'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Licensed Practical Nurse"')['avg_hours'],
        name = 'Licensed Practical Nurse'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Cert. Nursing Assistant"')['avg_hours'],
        name = 'Cert. Nursing Assistant'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Nurse Aide in Training"')['avg_hours'],
        name = 'Nurse Aide in Training'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Med Aide/Technician"')['avg_hours'],
        name = 'Med Aide/Technician'
    )
)
plt2.add_trace(
    go.Box(
        x = box_plt_data.query('emp_title == "Total"')['avg_hours'],
        name = 'All Types'
    )
)
plt2.update_layout(title = dict(text = 'Distribution of Average Daily Healthcare Professional Hours per Resident',
                                subtitle = dict(text = 'by Profession Type Across Providers Not Utilizing Contractors'),
                                x = 0.5, xanchor = 'center'),
                   yaxis_title = '',
                   xaxis_title = 'Average Daily Hours per Resident',
                   showlegend = False, height = 800)
plt2.show()

In [None]:
plt3 = make_subplots(rows = 1, cols = 4,
                        specs = [[{'type':'table'},{'type':'table'},{'type':'table'},{'type':'table'}]],
                        subplot_titles = ['<b>Northeast</b>', '<b>Midwest</b>', '<b>South</b>', '<b>West</b>'],
                        horizontal_spacing = 0.001)

plt3.add_trace(
    go.Table(
        header = dict(values = ['State', 'Count']),
        cells = dict(values=[[f'<b>{x}</b>' for x in state_cnt.query('region == "Northeast"')['state'].tolist()],
                             state_cnt.query('region == "Northeast"')['count'].tolist()]),
        name = 'Northeast'
    ), row = 1, col = 1
)
plt3.add_trace(
    go.Table(
        header = dict(values = ['State', 'Count']),
        cells = dict(values = [[f'<b>{x}</b>' for x in state_cnt.query('region == "Midwest"')['state'].tolist()],
                               state_cnt.query('region == "Midwest"')['count'].tolist()]),
        name = 'Midwest'
    ), row = 1, col = 2
)
plt3.add_trace(
    go.Table(
        header = dict(values = ['State', 'Count']),
        cells = dict(values = [[f'<b>{x}</b>' for x in state_cnt.query('region == "South"')['state'].tolist()],
                               state_cnt.query('region == "South"')['count'].tolist()]),
        name = 'South'
    ), row = 1, col = 3
)
plt3.add_trace(
    go.Table(
        header = dict(values = ['State', 'Count']),
        cells = dict(values = [[f'<b>{x}</b>' for x in state_cnt.query('region == "West"')['state'].tolist()],
                               state_cnt.query('region == "West"')['count'].tolist()]),
        name = 'West'
    ), row = 1, col = 4
)
plt3.update_traces(cells_height = 25)
plt3.update_layout(width = 480, height = 660,
                      margin_b = 20,
                      margin_l = 20,
                      margin_r = 20,
                      margin_t = 80,
                      title = {'text': 'Number of Facilities Without Contractor Hours',
                               'x': 0.5, 'xanchor': 'center'})
plt3.show()

Given the large number of facilities that could be targeted, a more detailed analysis of the data could narrow the focus to a subset of facilities. This would reduce the resources needed and fits Clipboard Health's philosophy of deeply understanding each community and striving to make life easier for facilities, healthcare professionals, and patients alike. Breaking the facilities without contractor hours into regions as defined by the CDC, which can be found [here](https://www.cdc.gov/nchs/hus/sources-definitions/geographic-region.htm), shows each region's potential based on the number of facilities in each state. Now, knowing which geographic areas could be most beneficial and which states within those regions have the most such facilities, a closer look at other metrics might be helpful.

One metric that stood out while diving into the data is the distribution of hours across types of healthcare professional. Since the data spans the last completed quarter (Q2 2024) and facilities of varying sizes, the hours are best compared by averaging the number of hours per resident per day. From the boxplots below, the majority of hours reported come from Certified Nursing Assistants (CNAs), with a median of just over 2 hours per resident per day. CNAs, Licensed Practical Nurses (LPNs), and Registered Nurses (RNs) are the three most utilized positions.
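
As a minimal sketch of that normalization, using made-up numbers rather than CMS data: dividing a facility's total hours over the period by its total daily census (resident-days) gives hours per resident per day, which makes large and small facilities comparable. The frame `toy` and its values are hypothetical; only the column names mirror the PBJ file.

```python
import pandas as pd

# Hypothetical daily records for two facilities of very different size.
toy = pd.DataFrame({
    'provnum':   ['A', 'A', 'B', 'B'],
    'mdscensus': [100, 100, 20, 20],            # residents in the facility that day
    'hrs_cna':   [210.0, 190.0, 44.0, 36.0],    # CNA hours worked that day
})

# Sum hours and resident-days per facility, then divide.
per_fac = toy.groupby('provnum')[['mdscensus', 'hrs_cna']].sum()
per_fac['cna_hrs_per_resident_day'] = per_fac['hrs_cna'] / per_fac['mdscensus']
print(per_fac)
```

Although facility A reports five times as many CNA hours as facility B, both come out at the same hours per resident per day once census is taken into account.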

These insights give us high-level locations by region and state, the number of facilities in those locations, and the overall usage of each profession within facilities. Another dimension of this data is the ownership type of the facilities, which comes from the [Provider Information dataset](https://data.cms.gov/provider-data/dataset/4pq5-n9py). There are three top-level ownership types (for profit, non profit, and government), each with its own subtypes. Comparing the breakdown for all facilities with the breakdown for facilities that do not utilize contractors, there does not seem to be a notable difference between the proportions of ownership types.
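
One way to put the two breakdowns side by side is to compute the share of each top-level ownership type in both groups. The sketch below uses a small hypothetical frame (`toy_own`, including the made-up `uses_contractors` flag); it only assumes that ownership strings follow the "top level - subtype" pattern.

```python
import pandas as pd

# Hypothetical facility list, not CMS data.
toy_own = pd.DataFrame({
    'provnum': ['A', 'B', 'C', 'D', 'E', 'F'],
    'ownership_type': ['For profit - Corporation', 'For profit - Individual',
                       'Non profit - Corporation', 'Government - County',
                       'For profit - Corporation', 'Non profit - Church related'],
    'uses_contractors': [True, False, False, True, False, True],
})

# Keep only the top-level ownership type (the part before " - ").
toy_own['own_top'] = toy_own['ownership_type'].str.split(' - ').str[0]

# Share of each top-level type among all facilities and among the no-contractor subset.
share_all = toy_own['own_top'].value_counts(normalize=True)
share_no_ctr = toy_own.loc[~toy_own['uses_contractors'], 'own_top'].value_counts(normalize=True)

# Align the two on ownership type; a type absent from a group gets a share of 0.
comparison = pd.DataFrame({'all': share_all, 'no_contractors': share_no_ctr}).fillna(0)
print(comparison)
```

Reading the two columns row by row shows at a glance whether any ownership type is over- or under-represented among facilities without contractor hours.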

With the high-level analysis complete, a key metric still seems to be missing that could guide the sales team to the locations that would benefit most from Clipboard Health's offerings. The Provider Information dataset contains many metrics that rate each facility's performance. Looking into some of these ratings may give insight into which facilities could benefit from supplemental contractor hours.
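
A minimal sketch of how such a comparison might be set up: compute the full correlation matrix and keep only the block that pairs staffing-hours columns (rows) with rating columns (columns). The data and column names in `toy` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical facility-level metrics, not CMS data.
toy = pd.DataFrame({
    'overall_rating': [1, 2, 3, 4, 5],
    'staffing_rating': [1, 3, 2, 5, 4],
    'rn_hrs_per_resident_day': [0.3, 0.5, 0.4, 0.9, 0.8],
    'total_hrs_per_resident_day': [3.0, 3.4, 3.3, 4.2, 4.0],
})
rating_cols = ['overall_rating', 'staffing_rating']
hours_cols = ['rn_hrs_per_resident_day', 'total_hrs_per_resident_day']

# Pearson correlations, restricted to hours (rows) versus ratings (columns).
cross_corr = toy.corr().loc[hours_cols, rating_cols].round(2)
print(cross_corr)
```

Selecting rows and columns by name with `.loc` keeps the table readable even if the order of the source columns changes.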

In [None]:
ratings_cols = [
    'provnum',
    'ownership_type',
    'overall_rating',
    'health_inspection_rating',
    'qm_rating',
    'long-stay_qm_rating',
    'short-stay_qm_rating',
    'staffing_rating',
    'reported_lpn_staffing_hours_per_resident_per_day',
    'reported_rn_staffing_hours_per_resident_per_day',
    'reported_licensed_staffing_hours_per_resident_per_day',
    'reported_total_nurse_staffing_hours_per_resident_per_day',
    'total_number_of_nurse_staff_hours_per_resident_per_day_on_the_weekend',
    'registered_nurse_hours_per_resident_per_day_on_the_weekend'
]
no_ctr_prov_info = prov_info[prov_info['provnum'].isin(no_ctr_locs)].copy()
no_ctr_ratings = no_ctr_prov_info[ratings_cols].copy()

In [None]:
no_ctr_all_rate = no_ctr_ratings.copy()
no_ctr_all_rate.drop(['provnum','ownership_type'], axis = 1, inplace = True)
# astype returns a new frame, so assign the result back
no_ctr_all_rate = no_ctr_all_rate.astype('float')
no_ctr_all_rate_corr = no_ctr_all_rate.corr()
no_ctr_all_rate_corr.drop(ratings_cols[8:], axis = 1, inplace = True)
no_ctr_all_rate_corr.drop(ratings_cols[2:8], axis = 0, inplace = True)
no_ctr_all_rate_corr.insert(0, 'na', ['LPN Staffing', 'RN Staffing', 'Licensed Staffing', 'Total Nurse Staffing',
                                      'Weekend Nurse Staffing', 'Weekend RN Staffing'])
no_ctr_all_rate_corr.columns = ['Reported Hours per<br>Resident per Day', 'Overall<br>Rating', 
                                'Health Inspection<br>Rating', 'Quality Measure<br>Rating',
                                'Long-Stay Quality<br>Measure Rating', 'Short-Stay Quality<br>Measure Rating', 'Staff<br>Rating']
no_ctr_all_rate_corr.reset_index(drop = True, inplace = True)
for col in no_ctr_all_rate_corr.columns[1:]:
    no_ctr_all_rate_corr[col] = no_ctr_all_rate_corr[[col]].apply(lambda x: f'{x[col]: 0.2f}', axis = 1)
# no_ctr_all_rate_corr

In [None]:
prov_own_types = prov_info[['provnum', 'ownership_type']].copy()
prov_own_types[['own_out', 'own_in']] = prov_own_types['ownership_type'].str.split(' - ', expand = True)
prov_own_types['own_top_cnt'] = prov_own_types.groupby('ownership_type')['provnum'].transform('nunique')
prov_own_types['own_out_cnt'] = prov_own_types.groupby('own_out')['provnum'].transform('nunique')
prov_own_types['own_in_cnt'] = prov_own_types.groupby(['own_out', 'own_in'])['provnum'].transform('nunique')
prov_own_types.drop('provnum', axis = 1, inplace = True)
prov_own_types.drop_duplicates(inplace = True)
# prov_own_types.head()

In [None]:
no_ctr_own_types = prov_info[prov_info['provnum'].isin(no_ctr_locs)].copy()
no_ctr_own_types = no_ctr_own_types[['provnum', 'ownership_type']].copy()
no_ctr_own_types[['own_out', 'own_in']] = no_ctr_own_types['ownership_type'].str.split(' - ', expand = True)
no_ctr_own_types['own_top_cnt'] = no_ctr_own_types.groupby('ownership_type')['provnum'].transform('nunique')
no_ctr_own_types['own_out_cnt'] = no_ctr_own_types.groupby('own_out')['provnum'].transform('nunique')
no_ctr_own_types['own_in_cnt'] = no_ctr_own_types.groupby(['ownership_type'])['provnum'].transform('nunique')
no_ctr_own_types.drop('provnum', axis = 1, inplace = True)
no_ctr_own_types.drop_duplicates(inplace = True)
# no_ctr_own_types.head()

In [None]:
plt4 = go.Figure()

# Traces
# Full Breakdown
plt4.add_trace(
    go.Pie(labels = prov_own_types.ownership_type.tolist(), values = prov_own_types.own_in_cnt.tolist())
)
# Top Level Breakdown
plt4.add_trace(
    go.Pie(labels = prov_own_types.own_out.tolist(), values = prov_own_types.own_in_cnt.tolist())
)
# For Profit Breakout
plt4.add_trace(
    go.Pie(labels = prov_own_types.query('own_out == "For profit"').own_in.tolist(),
           values = prov_own_types.query('own_out == "For profit"').own_in_cnt.tolist())
)
plt4.add_trace(
    go.Pie(labels = prov_own_types.query('own_out == "Non profit"').own_in.tolist(),
           values = prov_own_types.query('own_out == "Non profit"').own_in_cnt.tolist())
)
plt4.add_trace(
    go.Pie(labels = prov_own_types.query('own_out == "Government"').own_in.tolist(),
           values = prov_own_types.query('own_out == "Government"').own_in_cnt.tolist())
)
plt4.update_layout(
    updatemenus = [
        dict(
            active = -1,
            buttons = list([
                dict(label = 'Full',
                     method = 'update',
                     args = [{'visible': [True, False, False, False, False]},
                             {'title': 'Facilities by Ownership Type'}]),
                dict(label = 'Top Level',
                     method = 'update',
                     args = [{'visible': [False, True, False, False, False]},
                             {'title': 'Facilities by Top Level Ownership Type'}]),
                dict(label = 'For Profit',
                     method = 'update',
                     args = [{'visible': [False, False, True, False, False]},
                             {'title': 'For Profit Facilities by Type'}]),
                dict(label = 'Non Profit',
                     method = 'update',
                     args = [{'visible': [False, False, False, True, False]},
                             {'title': 'Non Profit Facilities by Type'}]),
                dict(label = 'Government',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, True]},
                             {'title': 'Government Facilities by Type'}])
            ])
        )
    ]
)
# Set Title
plt4.update_layout(title_text = 'Facilities by Ownership Type')
plt4.show()

In [None]:
plt5 = go.Figure()

# Traces
# Full Breakdown
plt5.add_trace(
    go.Pie(labels = no_ctr_own_types.ownership_type.tolist(), values = no_ctr_own_types.own_in_cnt.tolist())
)
# Top Level Breakdown
plt5.add_trace(
    go.Pie(labels = no_ctr_own_types.own_out.tolist(), values = no_ctr_own_types.own_in_cnt.tolist())
)
# For Profit Breakout
plt5.add_trace(
    go.Pie(labels = no_ctr_own_types.query('own_out == "For profit"').own_in.tolist(),
           values = no_ctr_own_types.query('own_out == "For profit"').own_in_cnt.tolist())
)
plt5.add_trace(
    go.Pie(labels = no_ctr_own_types.query('own_out == "Non profit"').own_in.tolist(),
           values = no_ctr_own_types.query('own_out == "Non profit"').own_in_cnt.tolist())
)
plt5.add_trace(
    go.Pie(labels = no_ctr_own_types.query('own_out == "Government"').own_in.tolist(),
           values = no_ctr_own_types.query('own_out == "Government"').own_in_cnt.tolist())
)
plt5.update_layout(
    updatemenus = [
        dict(
            active = -1,
            buttons = list([
                dict(label = 'Full',
                     method = 'update',
                     args = [{'visible': [True, False, False, False, False]},
                             {'title': {'text': 'Facilities by Ownership Type',
                                        'subtitle': {'text' : 'without Utilizing Contractors'}}}]),
                dict(label = 'Top Level',
                     method = 'update',
                     args = [{'visible': [False, True, False, False, False]},
                             {'title': {'text': 'Facilities by Top Level Ownership Type',
                                        'subtitle': {'text' : 'without Utilizing Contractors'}}}]),
                dict(label = 'For Profit',
                     method = 'update',
                     args = [{'visible': [False, False, True, False, False]},
                             {'title': {'text': 'For Profit Facilities by Sub Type',
                                        'subtitle': {'text' : 'without Utilizing Contractors'}}}]),
                dict(label = 'Non Profit',
                     method = 'update',
                     args = [{'visible': [False, False, False, True, False]},
                             {'title': {'text': 'Non Profit Facilities by Sub Type',
                                        'subtitle': {'text' : 'without Utilizing Contractors'}}}]),
                dict(label = 'Government',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, True]},
                             {'title': {'text': 'Government Facilities by Sub Type',
                                        'subtitle': {'text' : 'without Utilizing Contractors'}}}])
            ])
        )
    ]
)
# Set Title
plt5.update_layout(title_text = 'Facilities by Ownership Type')
plt5.show()

In [None]:
plt6 = go.Figure(
    go.Table(
        header = dict(values = list(no_ctr_all_rate_corr.columns), align = 'center'),
        cells = dict(values = [no_ctr_all_rate_corr[x] for x in no_ctr_all_rate_corr.columns],
                     align = ['right', 'center', 'center', 'center', 'center', 'center', 'center'])
    )
)
plt6.show()

## Target facilities that: 
- Are currently <em>not</em> utilizing independent healthcare professionals (contractors).
- Have average nursing staff hours per resident per day below the state average.
- Have an overall rating below the state average.

After looking into the relationships between the number of reported hours and some of the rating metrics in the Provider Information dataset, there were no highly correlated pairings. The best pairings did show some positive, though weak, correlations between the average number of registered nurse or certified nurse assistant hours per resident per day and the overall and staffing ratings. Since these showed the highest correlation, a common-sense approach is to target facilities that have low average staffing hours per resident per day and low overall ratings. Because regulations vary from state to state, comparing each facility to others within the same state gives a fairer comparison. Taking the state averages of staff hours per resident per day and of overall rating, and then calculating the difference between each facility's metrics and its state averages, identifies the facilities whose differences are the most negative. These facilities would likely benefit the most from Clipboard Health's services.
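
The state-relative comparison can be sketched as follows. The facilities and values in `toy` are hypothetical, and for brevity the sketch uses a simple per-state mean of facility values, which is an assumption; a census-weighted state average (total hours divided by total resident-days) would follow the same pattern.

```python
import pandas as pd

# Hypothetical facilities in two states, not CMS data.
toy = pd.DataFrame({
    'provnum': ['A', 'B', 'C', 'D'],
    'state': ['TX', 'TX', 'OH', 'OH'],
    'overall_rating': [2, 4, 5, 3],
    'hrs_per_resident_day': [3.0, 4.0, 4.5, 3.5],
})

# Per-state means, broadcast back onto each facility's row.
state_mean = toy.groupby('state')[['overall_rating', 'hrs_per_resident_day']].transform('mean')
toy['rating_diff'] = toy['overall_rating'] - state_mean['overall_rating']
toy['hrs_diff'] = toy['hrs_per_resident_day'] - state_mean['hrs_per_resident_day']

# Keep facilities below their state average on both metrics.
targets = toy[(toy['rating_diff'] < 0) & (toy['hrs_diff'] < 0)]
print(targets[['provnum', 'state', 'rating_diff', 'hrs_diff']])
```

Facility D has more hours per resident per day than facility A in absolute terms, yet both are selected because each falls below the average of its own state.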

Below are two maps showing the locations of facilities whose average staff hours per resident per day and overall rating fall below their state's average for that metric, with the map on the left filtered by average staff hours per resident per day and the map on the right filtered by overall rating. In order to focus on facilities that would benefit the most from the addition of staff hours by contractors, only facilities that are below average in both metrics are shown. Each map can be filtered by the top-level ownership type, and clicking a legend entry hides or shows that group within the map. The size of each bubble is relative to the difference between that facility's metric and the state average: the further from the average, the larger the bubble. Hovering over a bubble shows a pop-up box with details about the facility, including its name, address, and a few metrics.

In [None]:
prov_sub = prov_info[['provnum', 'state', 'county/parish', 'ownership_type', 'overall_rating', 'staffing_rating']].copy()
prov_avgs = pd.merge(prov_sub, pbj_ratios, how = 'left', on = 'provnum')
prov_avgs_state = prov_avgs.groupby('state').agg(
    avg_overall_rating_st = ('overall_rating', 'mean'),
    avg_staffing_rating_st = ('staffing_rating', 'mean'),
    mdscensus_sum = ('mdscensus_sum', 'sum'),
    hrs_rndon = ('hrs_rndon', 'sum'),
    hrs_rnadmin = ('hrs_rnadmin', 'sum'),
    hrs_rn = ('hrs_rn', 'sum'),
    hrs_lpnadmin = ('hrs_lpnadmin', 'sum'),
    hrs_lpn = ('hrs_lpn', 'sum'),
    hrs_cna = ('hrs_cna', 'sum'),
    hrs_natrn = ('hrs_natrn', 'sum'),
    hrs_medaide = ('hrs_medaide', 'sum'),
    hrs_ttl = ('hrs_ttl', 'sum')
)
prov_avgs_state.insert(0, 'state', prov_avgs_state.index)
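# Exclude territories; errors = 'ignore' avoids a KeyError if either label is absent
prov_avgs_state.drop(['GU', 'PR'], axis = 0, inplace = True, errors = 'ignore')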
prov_avgs_state.reset_index(drop = True, inplace = True)
# State-level hours per resident per day: total hours divided by total resident-days
prov_avgs_state['avg_hrs_rn_st'] = prov_avgs_state['hrs_rn'] / prov_avgs_state['mdscensus_sum']
prov_avgs_state['avg_hrs_cna_st'] = prov_avgs_state['hrs_cna'] / prov_avgs_state['mdscensus_sum']
prov_avgs_state['avg_hrs_ttl_st'] = prov_avgs_state['hrs_ttl'] / prov_avgs_state['mdscensus_sum']
prov_avgs_state.drop(prov_avgs_state.columns.tolist()[3:-3], axis = 1, inplace = True)
st_avg_comp = pd.merge(prov_sub, hrs_avg_all_typs, how = 'left', on = 'provnum')
keep_cols = [st_avg_comp.columns.tolist()[i] for i in [0,1,2,3,4,5,8,11,14]]
st_avg_comp = st_avg_comp[keep_cols].copy()
st_avg_comp = pd.merge(st_avg_comp, prov_avgs_state, how = 'left', on = 'state')
st_avg_comp['oall_rate_diff'] = st_avg_comp['overall_rating'] - st_avg_comp['avg_overall_rating_st']
st_avg_comp['staff_rate_diff'] = st_avg_comp['staffing_rating'] - st_avg_comp['avg_staffing_rating_st']
st_avg_comp['avg_hrs_rn_diff'] = st_avg_comp['hrs_avg_day_cens_rn'] - st_avg_comp['avg_hrs_rn_st']
st_avg_comp['avg_hrs_cna_diff'] = st_avg_comp['hrs_avg_day_cens_cna'] - st_avg_comp['avg_hrs_cna_st']
st_avg_comp['avg_hrs_ttl_diff'] = st_avg_comp['hrs_avg_day_cens_ttl'] - st_avg_comp['avg_hrs_ttl_st']
st_avg_comp = st_avg_comp[st_avg_comp.columns.tolist()[:6] + st_avg_comp.columns.tolist()[-11:-8] + st_avg_comp.columns.tolist()[-5:]].copy()
st_avg_comp_no_ctr = st_avg_comp[st_avg_comp['provnum'].isin(no_ctr_locs)].copy()
target_audience = st_avg_comp_no_ctr[(st_avg_comp_no_ctr['oall_rate_diff'] < 0) &
                                     (st_avg_comp_no_ctr['avg_hrs_ttl_diff'] < 0)].copy()
prov_lat_lon = prov_info[['provnum', 'provider_name', 'provider_address', 'city/town', 'zip_code', 'latitude', 'longitude']].copy()
target_audience = pd.merge(target_audience, prov_lat_lon, how = 'left', on = 'provnum')
for col in ['provider_name', 'provider_address', 'city/town']:
    target_audience[col] = target_audience[[col]].apply(lambda x: x[col].title(), axis = 1)
target_audience['text'] = target_audience[['provider_name', 'provider_address', 'city/town', 'state', 'zip_code',
                                           'ownership_type', 'overall_rating', 'avg_overall_rating_st', 
                                           'staffing_rating', 'avg_staffing_rating_st', 'hrs_avg_day_cens_ttl']].apply( \
    lambda x: f'<b>{x.provider_name}</b><br>{x.provider_address}<br>{x['city/town']}, {x.state}  {x.zip_code}<br>' +
              f'Ownership Type: {x.ownership_type}<br>' +
              f'Overall Rating: {x.overall_rating} | State Avg: {x.avg_overall_rating_st: 0.2f}<br>' +
              f'Staffing Rating: {x.staffing_rating} | State Avg: {x.avg_staffing_rating_st: 0.2f}<br>' +
              f'Avg Hours/Resident/Day: {x.hrs_avg_day_cens_ttl: 0.2f}', axis = 1)
target_audience.insert(4, 'own_top', pd.NA) 
target_audience['own_top'] = target_audience[['ownership_type']].apply(lambda x: x['ownership_type'].split(' - ')[0], axis = 1)

In [None]:
colors = ['crimson', 'lightseagreen', 'royalblue']
plt7 = go.Figure() 
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience['longitude'],
        lat = target_audience['latitude'],
        text = target_audience['text'],
        marker = dict(size = abs(target_audience['avg_hrs_ttl_diff']) * 10),
        name = 'All'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Corporation"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Corporation"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Corporation"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Corporation"')['avg_hrs_ttl_diff']) * 10),
        name = 'For profit - Corporation'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Individual"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Individual"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Individual"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Individual"')['avg_hrs_ttl_diff']) * 10),
        name = 'For profit - Individual'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Limited Liability company"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Limited Liability company"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Limited Liability company"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Limited Liability company"')['avg_hrs_ttl_diff']) * 10),
        name = 'For profit - Limited Liability company'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Partnership"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Partnership"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Partnership"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Partnership"')['avg_hrs_ttl_diff']) * 10),
        name = 'For profit - Partnership'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - City"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - City"')['latitude'],
        text = target_audience.query('ownership_type == "Government - City"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - City"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - City'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - City/county"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - City/county"')['latitude'],
        text = target_audience.query('ownership_type == "Government - City/county"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - City/county"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - City/county'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - County"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - County"')['latitude'],
        text = target_audience.query('ownership_type == "Government - County"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - County"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - County'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - Federal"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - Federal"')['latitude'],
        text = target_audience.query('ownership_type == "Government - Federal"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - Federal"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - Federal'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - Hospital district"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - Hospital district"')['latitude'],
        text = target_audience.query('ownership_type == "Government - Hospital district"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - Hospital district"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - Hospital district'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - State"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - State"')['latitude'],
        text = target_audience.query('ownership_type == "Government - State"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - State"')['avg_hrs_ttl_diff']) * 10),
        name = 'Government - State'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Church related"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Church related"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Church related"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Church related"')['avg_hrs_ttl_diff']) * 10),
        name = 'Non profit - Church related'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Corporation"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Corporation"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Corporation"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Corporation"')['avg_hrs_ttl_diff']) * 10),
        name = 'Non profit - Corporation'
    )
)
plt7.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Other"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Other"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Other"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Other"')['avg_hrs_ttl_diff']) * 10),
        name = 'Non profit - Other'
    )
)
plt7.update_layout(title = 'Target Facilities by Average Hours per Resident per Day Variance',
                   geo = dict(scope = 'usa'))

plt7.update_layout(
    updatemenus = [
        dict(
            active = 0,
            buttons = list([
                dict(label = 'All',
                     method = 'update',
                     args = [{'visible': [True, False, False, False, False, False, False,
                                          False, False, False, False, False, False, False]},
                             {'title': {'text': 'Facilities with Below Average Staff Hours per Resident',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'For Profit',
                     method = 'update',
                     args = [{'visible': [False, True, True, True, True, False, False,
                                          False, False, False, False, False, False, False]},
                             {'title': {'text': 'For Profit Facilities with Below Average Staff Hours per Resident',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'Government',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, False, True, True,
                                          True, True, True, True, False, False, False]},
                             {'title': {'text': 'Government Facilities with Below Average Staff Hours per Resident',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'Non Profit',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, False, False, False,
                                          False, False, False, False, True, True, True]},
                             {'title': {'text': 'Non Profit Facilities with Below Average Staff Hours per Resident',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
            ])
        )
    ]
)

plt7.show()
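
The `visible` lists passed to the dropdown buttons above are written out by hand, so they must stay in step with the order in which the traces were added. As a minimal sketch (not the notebook's own code), the same masks could be derived from the trace names instead. `trace_names`, `visibility_mask`, and `masks` are hypothetical names introduced here, and the prefixes assume the ownership labels used above.

```python
# Trace names in the order they are added to the figure ("All" first).
trace_names = [
    'All',
    'For profit - Corporation', 'For profit - Individual',
    'For profit - Limited Liability company', 'For profit - Partnership',
    'Government - City', 'Government - City/county', 'Government - County',
    'Government - Federal', 'Government - Hospital district', 'Government - State',
    'Non profit - Church related', 'Non profit - Corporation', 'Non profit - Other',
]

def visibility_mask(names, prefix):
    """Return one boolean per trace: True where the trace name starts with `prefix`."""
    return [name.startswith(prefix) for name in names]

# One mask per dropdown button, keyed by the button label.
masks = {
    'All': visibility_mask(trace_names, 'All'),
    'For Profit': visibility_mask(trace_names, 'For profit'),
    'Government': visibility_mask(trace_names, 'Government'),
    'Non Profit': visibility_mask(trace_names, 'Non profit'),
}
```

Each mask could then be supplied as the `'visible'` entry in the matching button's `args`, which keeps the buttons correct if an ownership type is added or removed.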

In [None]:
plt8 = go.Figure() 
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience['longitude'],
        lat = target_audience['latitude'],
        text = target_audience['text'],
        marker = dict(size = abs(target_audience['oall_rate_diff']) * 10),
        name = 'All'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Corporation"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Corporation"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Corporation"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Corporation"')['oall_rate_diff']) * 10),
        name = 'For profit - Corporation'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Individual"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Individual"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Individual"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Individual"')['oall_rate_diff']) * 10),
        name = 'For profit - Individual'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Limited Liability company"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Limited Liability company"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Limited Liability company"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Limited Liability company"')['oall_rate_diff']) * 10),
        name = 'For profit - Limited Liability company'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "For profit - Partnership"')['longitude'],
        lat = target_audience.query('ownership_type == "For profit - Partnership"')['latitude'],
        text = target_audience.query('ownership_type == "For profit - Partnership"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "For profit - Partnership"')['oall_rate_diff']) * 10),
        name = 'For profit - Partnership'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - City"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - City"')['latitude'],
        text = target_audience.query('ownership_type == "Government - City"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - City"')['oall_rate_diff']) * 10),
        name = 'Government - City'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - City/county"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - City/county"')['latitude'],
        text = target_audience.query('ownership_type == "Government - City/county"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - City/county"')['oall_rate_diff']) * 10),
        name = 'Government - City/county'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - County"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - County"')['latitude'],
        text = target_audience.query('ownership_type == "Government - County"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - County"')['oall_rate_diff']) * 10),
        name = 'Government - County'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - Federal"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - Federal"')['latitude'],
        text = target_audience.query('ownership_type == "Government - Federal"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - Federal"')['oall_rate_diff']) * 10),
        name = 'Government - Federal'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - Hospital district"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - Hospital district"')['latitude'],
        text = target_audience.query('ownership_type == "Government - Hospital district"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - Hospital district"')['oall_rate_diff']) * 10),
        name = 'Government - Hospital district'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Government - State"')['longitude'],
        lat = target_audience.query('ownership_type == "Government - State"')['latitude'],
        text = target_audience.query('ownership_type == "Government - State"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Government - State"')['oall_rate_diff']) * 10),
        name = 'Government - State'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Church related"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Church related"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Church related"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Church related"')['oall_rate_diff']) * 10),
        name = 'Non profit - Church related'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Corporation"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Corporation"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Corporation"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Corporation"')['oall_rate_diff']) * 10),
        name = 'Non profit - Corporation'
    )
)
plt8.add_trace(
    go.Scattergeo(
        locationmode = 'USA-states',
        lon = target_audience.query('ownership_type == "Non profit - Other"')['longitude'],
        lat = target_audience.query('ownership_type == "Non profit - Other"')['latitude'],
        text = target_audience.query('ownership_type == "Non profit - Other"')['text'],
        marker = dict(size = abs(target_audience.query('ownership_type == "Non profit - Other"')['oall_rate_diff']) * 10),
        name = 'Non profit - Other'
    )
)
plt8.update_layout(title = 'Target Facilities by Overall Rating Variance',
                   geo = dict(scope = 'usa'))

plt8.update_layout(
    updatemenus = [
        dict(
            active = 0,
            buttons = list([
                dict(label = 'All',
                     method = 'update',
                     args = [{'visible': [True, False, False, False, False, False, False,
                                          False, False, False, False, False, False, False]},
                             {'title': {'text': 'Facilities with Below Average Overall Rating',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'For Profit',
                     method = 'update',
                     args = [{'visible': [False, True, True, True, True, False, False,
                                          False, False, False, False, False, False, False]},
                             {'title': {'text': 'For Profit Facilities with Below Average Overall Rating',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'Government',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, False, True, True,
                                          True, True, True, True, False, False, False]},
                             {'title': {'text': 'Government Facilities with Below Average Overall Rating',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
                dict(label = 'Non Profit',
                     method = 'update',
                     args = [{'visible': [False, False, False, False, False, False, False,
                                          False, False, False, False, True, True, True]},
                             {'title': {'text': 'Non Profit Facilities with Below Average Overall Rating',
                                        'subtitle':{'text':'(bubble size indicates variance from state average)'}}}]),
            ])
        )
    ]
)

plt8.show()

## Facilities with Available Beds 

The ability to add staff through Clipboard Health's platform enables facilities to fill available beds. The number of additional shifts a facility needs is computed by multiplying its number of available beds by the state average number of staff hours per resident per day, then dividing by a typical 8-hour shift. The sales team could then focus on facilities that fall within any chosen range of additional shifts needed to staff their open beds at the state average. Facilities that would require many additional staff, or that would need only short or infrequent additional coverage, would likely not benefit from filling available beds. Also, only facilities whose overall rating and average staff hours per resident per day are both above their state averages should be targeted, since a facility below these averages would likely benefit more from supplementing its current staffing to improve those metrics than from opening more beds.

Assumptions made:
- Same amount of staff is needed for each day of the week
- One shift = 8 hours
- Only CNA and RN shifts will be considered

Another calculation that could be done, if sales deems it to be necessary, is to find the correct mix of LPN, RN, and CNA shifts to better support filling any available beds. 
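
To make the arithmetic concrete, here is a minimal sketch of the calculation described above under the stated assumptions. `shifts_needed` is a hypothetical helper, and the facility figures are invented purely for illustration.

```python
import math

def shifts_needed(certified_beds, avg_residents_per_day, state_hprd, shift_hours=8):
    """Estimate the daily shifts needed to staff every open bed at the state average."""
    # Open beds: certified beds minus the average daily census, rounded up.
    open_beds = certified_beds - math.ceil(avg_residents_per_day)
    if open_beds <= 0:
        return 0.0
    # Staff hours needed per day, converted to shifts of `shift_hours` each.
    return open_beds * state_hprd / shift_hours

# Hypothetical facility: 120 certified beds, 111.4 residents per day on average,
# in a state averaging 4.0 staff hours per resident per day.
example = shifts_needed(120, 111.4, 4.0)
print(example)
```

In this invented case there are 8 open beds, which need 32 staff hours per day, or 4 eight-hour shifts.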

In [None]:
open_beds = prov_info[['provnum', 'state', 'overall_rating', 'number_of_certified_beds', 'average_number_of_residents_per_day']].copy()
open_beds['open_beds'] = open_beds[['number_of_certified_beds', 'average_number_of_residents_per_day']].apply( \
    lambda x: x['number_of_certified_beds'] - np.ceil(x['average_number_of_residents_per_day']), axis = 1)
open_beds.drop('average_number_of_residents_per_day', axis = 1, inplace = True)
open_beds = open_beds[open_beds['open_beds'] > 0].copy()
open_bed_hrs = pd.merge(open_beds, prov_avgs_state, how = 'left', on = 'state')
open_bed_hrs.drop('avg_staffing_rating_st', axis = 1, inplace = True)
open_bed_hrs['rn_hrs_need'] = open_bed_hrs['open_beds'] * open_bed_hrs['avg_hrs_rn_st']
open_bed_hrs['cna_hrs_need'] = open_bed_hrs['open_beds'] * open_bed_hrs['avg_hrs_cna_st']
open_bed_hrs['ttl_hrs_need'] = open_bed_hrs['open_beds'] * open_bed_hrs['avg_hrs_ttl_st']
open_bed_hrs['rn_shifts_need'] = open_bed_hrs['rn_hrs_need'] / 8
open_bed_hrs['cna_shifts_need'] = open_bed_hrs['cna_hrs_need'] / 8
open_bed_hrs['ttl_shifts_need'] = open_bed_hrs['ttl_hrs_need'] / 8
open_bed_hrs = pd.merge(open_bed_hrs, hrs_avg_all_typs[['provnum', 'hrs_avg_day_cens_ttl']], how = 'left', on = 'provnum')
open_bed_hrs = open_bed_hrs[(open_bed_hrs['overall_rating'] > open_bed_hrs['avg_overall_rating_st']) &
                            (open_bed_hrs['hrs_avg_day_cens_ttl'] > open_bed_hrs['avg_hrs_ttl_st']) & 
                            (open_bed_hrs['ttl_shifts_need'] <= 35)].copy()
bed_lat_lon = prov_info[['provnum', 'provider_name', 'provider_address', 'city/town', 'zip_code', 'latitude', 'longitude']].copy()
for col in ['provider_name', 'provider_address', 'city/town']:
    bed_lat_lon[col] = bed_lat_lon[[col]].apply(lambda x: x[col].title(), axis = 1)
open_bed_hrs = pd.merge(open_bed_hrs, bed_lat_lon, how = 'left', on = 'provnum')
open_bed_hrs['text'] = open_bed_hrs[['provider_name', 'provider_address', 'city/town', 'state', 'zip_code',
                                     'open_beds', 'overall_rating', 'avg_overall_rating_st', 
                                     'avg_hrs_ttl_st', 'rn_shifts_need', 'cna_shifts_need', 'ttl_shifts_need']].apply( \
    lambda x: f'<b>{x.provider_name}</b><br>{x.provider_address}<br>{x['city/town']}, {x.state}  {x.zip_code}<br>' +
              f'<em>Number of Open Beds: {int(x.open_beds)}</em><br>' +
              f'Overall Rating: {x.overall_rating} | State Avg: {x.avg_overall_rating_st: 0.2f}<br>' +
              f'State Avg Hours/Resident/Day: {x.avg_hrs_ttl_st: 0.2f}<br>' +
              f'RN Shifts Needed: {x.rn_shifts_need: .2f}<br>' +
              f'CNA Shifts Needed: {x.cna_shifts_need: .2f}<br>' +
              f'Total Shifts Needed: {x.ttl_shifts_need: .2f}', axis = 1)
ttl_grps = [(1,2),(2,3),(3,4),(4,5),(5,6)]
rn_grps = [(1,2),(2,3),(3,4),(4,5),(5,6)]
cna_grps = [(1,2),(2,3),(3,4),(4,5),(5,6)]

In [None]:
plt9 = go.Figure()

for i in range(len(ttl_grps)):
    grp = ttl_grps[i]
    df_sub = open_bed_hrs[(open_bed_hrs['ttl_shifts_need'] >= grp[0]) &
                          (open_bed_hrs['ttl_shifts_need'] < grp[1])]
    plt9.add_trace(
        go.Scattergeo(
            locationmode = 'USA-states',
            lon = df_sub['longitude'],
            lat = df_sub['latitude'],
            text = df_sub['text'],
            marker = dict(size = (1/df_sub['ttl_shifts_need'])*100,
                          sizemode = 'area'),
            name = f'{grp[0]} - {grp[1]}'
        )
    )
plt9.update_layout(
    title = dict(text = 'Facilities with Open Beds by <b>Total</b> Number of<br>Additional Shifts Needed to Fill All Open Beds',
                 subtitle = dict(text = '~Click on legend items to toggle traces~')),
                 showlegend = True,
                 geo = dict(scope = 'usa')
)
plt9.show()

In [None]:
plt10 = go.Figure()

for i in range(len(rn_grps)):
    grp = rn_grps[i]
    df_sub = open_bed_hrs[(open_bed_hrs['rn_shifts_need'] >= grp[0]) &
                          (open_bed_hrs['rn_shifts_need'] < grp[1])]
    plt10.add_trace(
        go.Scattergeo(
            locationmode = 'USA-states',
            lon = df_sub['longitude'],
            lat = df_sub['latitude'],
            text = df_sub['text'],
            marker = dict(size = (1/df_sub['rn_shifts_need'])*80,
                          sizemode = 'area'),
            name = f'{grp[0]} - {grp[1]}'
        )
    )
plt10.update_layout(
    title = dict(text = 'Facilities with Open Beds by Total Number of<br>Additional <b>RN</b> Shifts Needed to Fill All Open Beds',
                 subtitle = dict(text = '~Click on legend items to toggle traces~')),
                 showlegend = True,
                 geo = dict(scope = 'usa')
)
plt10.show()

In [None]:
plt11 = go.Figure()

for i in range(len(cna_grps)):
    grp = cna_grps[i]
    df_sub = open_bed_hrs[(open_bed_hrs['cna_shifts_need'] >= grp[0]) &
                          (open_bed_hrs['cna_shifts_need'] < grp[1])]
    plt11.add_trace(
        go.Scattergeo(
            locationmode = 'USA-states',
            lon = df_sub['longitude'],
            lat = df_sub['latitude'],
            text = df_sub['text'],
            marker = dict(size = (1/df_sub['cna_shifts_need'])*80,
                          sizemode = 'area'),
            name = f'{grp[0]} - {grp[1]}'
        )
    )
plt11.update_layout(
    title = dict(text = 'Facilities with Open Beds by Total Number of<br>Additional <b>CNA</b> Shifts Needed to Fill All Open Beds',
                 subtitle = dict(text = '~Click on legend items to toggle traces~')),
                 showlegend = True,
                 geo = dict(scope = 'usa')
)
plt11.show()
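
Each map above draws one trace per range in its group list, where a range `(lo, hi)` keeps facilities with `lo <= value < hi`. The following is a minimal sketch of that bucketing for a single value. `bucket_label` is a hypothetical helper that is not part of the notebook.

```python
# Same ranges as the group lists used for the maps.
groups = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

def bucket_label(value, groups):
    """Return the legend label of the half-open range [lo, hi) containing value, or None."""
    for lo, hi in groups:
        if lo <= value < hi:
            return f'{lo} - {hi}'
    return None

# Values below 1, or at 6 and above, match no range.
labels = [bucket_label(v, groups) for v in (0.5, 1.0, 2.75, 6.0)]
```

With these ranges, a facility needing fewer than 1 shift or 6 or more shifts falls into no group, so it would not appear on the maps even if it passes the earlier filters.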

## Facilities with Average Care Hours per Resident per Day Below CMS Requirements

The Centers for Medicare & Medicaid Services recently changed their guidelines on the number of hours of direct care per resident per day (HPRD) in a long-term care facility that receives Medicare or Medicaid funds. These new guidelines mandate that each resident receive a minimum of 3.48 HPRD, with 0.55 hours coming from a registered nurse, 2.45 hours coming from a certified nursing aide, and the remaining hours coming from any combination of nursing staff, including LPNs. Another part of the mandate is that a registered nurse must be on-site 24 hours a day, 7 days a week. CMS released this final rule on April 22nd, 2024, with a phased implementation. Non-rural facilities have two years to implement the 24/7 on-site RN and 3.48 total HPRD rules, and three years to implement the 0.55 RN/2.45 CNA HPRD rules. Rural facilities must implement these changes within three years and five years, respectively ("CMS releases final rule," 2024).

With these new rules in mind, the data contained in the Q2 2024 CMS Payroll Based Journal Nurse Staffing report can highlight the facilities that would currently be deficient. Clipboard Health can then use this information to reach out to those facilities and offer its services to help them comply with the new rules. Since four separate rules factor into whether a facility is in compliance, care must be taken when determining whether a facility is, in fact, deficient. Under a direct interpretation of the new rule, a facility is deficient if any of the four rules is not met.

To meet the HPRD rules, a facility's average total care hours must be at or above 3.48 HPRD, its average RN hours at or above 0.55 HPRD, and its average CNA hours at or above 2.45 HPRD. Furthermore, the facility must average at least 24 hours per day of RN and/or RN Director of Nursing hours, which is used here as a proxy for the 24/7 rule. There are exemptions that can be applied to these rules, but the data needed to determine whether a facility is exempt from some or all of them is not found within the datasets on the CMS website. Below is a map of all facilities that would be deficient if the new rules were in effect today. They are color coded by number of deficiencies.
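
As a minimal sketch of the four checks described above, the function below lists the requirements a single facility falls short of. `REQUIREMENTS` and `deficiencies` are hypothetical names introduced for illustration, the thresholds are the ones quoted in this section, and the facility values are invented.

```python
# Minimum values for each requirement, as summarized in this section.
REQUIREMENTS = {
    'RN HPRD': 0.55,
    'CNA HPRD': 2.45,
    'Total HPRD': 3.48,
    '24/7 RN On-Site': 24,
}

def deficiencies(rn_hprd, cna_hprd, ttl_hprd, rn_hrs_per_day):
    """Return the names of the requirements a facility falls short of."""
    observed = {
        'RN HPRD': rn_hprd,
        'CNA HPRD': cna_hprd,
        'Total HPRD': ttl_hprd,
        '24/7 RN On-Site': rn_hrs_per_day,
    }
    # A value exactly at the minimum counts as meeting the requirement.
    return [name for name, minimum in REQUIREMENTS.items() if observed[name] < minimum]

# Hypothetical facility values, for illustration only.
failed = deficiencies(rn_hprd=0.60, cna_hprd=2.10, ttl_hprd=3.40, rn_hrs_per_day=26.0)
print(failed)
```

A facility is treated as deficient when this list is non-empty, and its length gives the number of deficiencies.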

In [None]:
def get_rn_hprd(lst):
    if lst[2]/lst[0] >= 0.55:
        return lst[2]/lst[0]
    if lst[1]/lst[0] >= 0.55:
        return lst[1]/lst[0]
    else:
        return (lst[1] +lst[2]) / lst[0]
    
def get_rn_hpd(lst):
    if lst[1]/num_days >= 24:
        return lst[1]/num_days
    if lst[0]/num_days >= 24:
        return lst[0]/num_days
    else:
        return (lst[0] +lst[1]) / num_days
    
def num_def(lst):
    tmp = 0
    if lst[0] < 0.55:
        tmp += 1
    if lst[1] < 2.45:
        tmp += 1
    if lst[2] < 3.48:
        tmp += 1
    if lst[3] < 24:
        tmp += 1
    return tmp
    
def def_list(lst):
    tmp = []
    if lst[0] < 0.55:
        tmp.append('RN HPRD')
    if lst[1] < 2.45:
        tmp.append('CNA HPRD')
    if lst[2] < 3.48:
        tmp.append('Total HPRD')
    if lst[3] < 24:
        tmp.append('24/7 RN On-Site')
    return ','.join(tmp)

cms_reqs = pbj_ratios[['provnum', 'mdscensus_sum', 'hrs_rndon', 'hrs_rn', 'hrs_lpn', 'hrs_cna']].copy()
cms_reqs['rn_hprd'] = cms_reqs[['mdscensus_sum', 'hrs_rndon', 'hrs_rn']].apply( \
    lambda x: get_rn_hprd([x['mdscensus_sum'], x['hrs_rndon'], x['hrs_rn']]), axis = 1)
cms_reqs['cna_hprd'] = cms_reqs[['mdscensus_sum', 'hrs_cna']].apply( \
    lambda x: x['hrs_cna'] / x['mdscensus_sum'], axis = 1)
cms_reqs['ttl_hprd'] = cms_reqs[['mdscensus_sum', 'hrs_rndon', 'hrs_rn', 'hrs_lpn', 'hrs_cna']].apply( \
    lambda x: (x.hrs_rndon + x.hrs_rn + x.hrs_lpn + x.hrs_cna) / x.mdscensus_sum, axis = 1)
cms_reqs['rn_24_7'] = cms_reqs[['hrs_rndon', 'hrs_rn']].apply( \
    lambda x: get_rn_hpd([x['hrs_rndon'], x['hrs_rn']]), axis = 1)
cms_reqs.drop(cms_reqs.columns.tolist()[1:6], axis = 1, inplace = True)
# Keep only facilities that miss at least one of the four requirements.
# The CNA threshold (2.45) is compared against cna_hprd, not rn_hprd.
cms_reqs = cms_reqs[(cms_reqs['rn_hprd'] < 0.55) |
                    (cms_reqs['cna_hprd'] < 2.45) |
                    (cms_reqs['ttl_hprd'] < 3.48) |
                    (cms_reqs['rn_24_7'] < 24)].copy()
cms_reqs['num_defs'] = cms_reqs[['rn_hprd', 'cna_hprd', 'ttl_hprd', 'rn_24_7']].apply(
    lambda x: num_def([x['rn_hprd'], x['cna_hprd'], x['ttl_hprd'], x['rn_24_7']]), axis = 1)
cms_reqs['def_list'] = cms_reqs[['rn_hprd', 'cna_hprd', 'ttl_hprd', 'rn_24_7']].apply(
    lambda x: def_list([x['rn_hprd'], x['cna_hprd'], x['ttl_hprd'], x['rn_24_7']]), axis = 1)
cms_lat_lon = prov_info[['provnum', 'provider_name', 'provider_address', 'city/town', 'state', 'zip_code', 'latitude', 'longitude']].copy()
for col in ['provider_name', 'provider_address', 'city/town']:
    cms_lat_lon[col] = cms_lat_lon[col].str.title()
cms_reqs = pd.merge(cms_reqs, cms_lat_lon, how = 'left', on = 'provnum')
def hover_row(label, value, req, req_fmt = '.2f'):
    # One hover line, wrapped in <em> when the facility is below the requirement.
    row = f'{label:<18}{value:>10.2f}/{req:>12{req_fmt}}'
    return f'<em>{row}</em>' if value < req else row

# Build the hover text. Each row is formatted on its own so the conditional
# emphasis cannot swallow the header lines through operator precedence.
cms_reqs['text'] = cms_reqs[cms_reqs.columns.tolist()[1:]].apply(
    lambda x: f'<b>{x.provider_name}</b><br>{x.provider_address}<br>{x["city/town"]}, {x.state}  {x.zip_code}<br>' +
              f'<em>Number of Deficiencies: {int(x.num_defs)}</em><br>' +
              f'{"Metric":<18}{"Facility":>10}/{"Requirement":>12}<br>' +
              hover_row('RN HPRD:', x.rn_hprd, 0.55) + '<br>' +
              hover_row('CNA HPRD:', x.cna_hprd, 2.45) + '<br>' +
              hover_row('Total HPRD:', x.ttl_hprd, 3.48) + '<br>' +
              hover_row('24/7 RN On-Site:', x.rn_24_7, 24, 'd'), axis = 1)
# cms_reqs.head()

In [None]:
colors = ['royalblue','crimson','lightseagreen','orange']

plt12 = go.Figure()

for i in range(4):
    df = cms_reqs[cms_reqs['num_defs'] == i + 1]
    plt12.add_trace(
        go.Scattergeo(
            locationmode = 'USA-states',
            lon = df['longitude'],
            lat = df['latitude'],
            text = df['text'],
            marker = dict(size = (df['num_defs'])*25,
                          color = colors[i],
                          sizemode = 'area'),
            name = f'<b>{i + 1}</b> Deficiencies' if i > 0 else '<b>1</b> Deficiency'
        )
    )
plt12.update_layout(
    title = dict(text = 'Facilities with New CMS Rule Deficiencies by Number of Deficiencies',
                 subtitle = dict(text = '(bubble size based on number of deficiencies)<br>~Click on legend items to toggle traces~')),
    showlegend = True,
    geo = dict(scope = 'usa')
)
plt12.show()

## Conclusions

Even though no clear correlations were found in the data, some common-sense deductions can be used to focus on facilities that could benefit from Clipboard Health's services. Facilities that are below the state averages for both staff hours per resident per day and overall rating may need additional hours to raise these metrics. Facilities that already exceed the state averages in these two areas and have available beds may need new staff in order to fill those open beds. And with Clipboard Health's policy of not charging a facility for hiring a contracted employee, both the facility and the contractor could have a trial period without the stress of taking on a new employee or employer directly. Lastly, with CMS instituting new rules that will be phased in over the next two to five years, filling staffing gaps with contracted employees from Clipboard Health's marketplace could bring a facility into compliance quickly and easily. The suggestions in this document are still high-level and could be refined with a deeper dive into specific areas and their demographics.

## Citations

CMS releases final rule requiring minimum staffing standards for nursing homes. (2024, April 30). Retrieved from https://www.wsha.org/articles/cms-releases-final-rule-requiring-minimum-staffing-requirements-for-nursing-homes/

Clipboard Health Case Study © 2024 by Christopher Mims is licensed under CC BY-NC 4.0 