# Prepare DataFrames for Allocation Details

For experiments involving approximate optimality and finding optimal allocations, we need to compute a variety of things for each possible allocation. The purpose of this notebook is to pre-process and prepare the datasets which are used in this process. We will write our functions to be as general as possible, so as to create flexibility.

## Imports and Setup

### Import Packages

In [1]:
import sys
import os
from scipy.io import loadmat
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
%matplotlib inline

In [2]:
from matplotlib import colorbar
from matplotlib.colors import Normalize

In [3]:
plt.rcParams['figure.figsize'] = [20, 12]
sns.set_style('whitegrid')
plt.rcParams['font.size'] = 28.0
plt.rcParams['xtick.labelsize'] = 28.0
plt.rcParams['ytick.labelsize'] = 28.0

### Import Data

We import two families of datasets. 

1. Population data, supplied from the Census Planning Database. We preprocessed this in `dataset-characteristics.ipynb`.

2. Healthcare facility data, supplied from California Health and Human Services.

In [4]:
DATA_PATH = 'all-data/'

race_split_df_all = pd.read_csv(DATA_PATH + 'census-datasets/combined-al-cc/ethnicity_split_all.csv')
health_ins_split_df_all = pd.read_csv(DATA_PATH + 'census-datasets/combined-al-cc/health_ins_split_all.csv')
income_split_df_all = pd.read_csv(DATA_PATH + 'census-datasets/combined-al-cc/income_split_all.csv')

In [5]:
total_population_by_tract =\
        np.copy(health_ins_split_df_all[health_ins_split_df_all['Variable'] == 'No_Health_Ins_ACS_10_14']['Value']) +\
        np.copy(health_ins_split_df_all[health_ins_split_df_all['Variable'] == 'One_Plus_Health_Ins']['Value'])

In [6]:
# Load Facilities data.
al_fac_df = pd.read_csv(DATA_PATH + 'hospital-data/alameda-emergency-facilites.csv').iloc[:18, :].copy()
cc_fac_df = pd.read_csv(DATA_PATH + 'hospital-data/cc-healthcare-dataset.csv')
al_facilites_long_lats = np.array(al_fac_df[['LONGITUDE', 'LATITUDE']])
cc_facilites_long_lats = np.array(cc_fac_df[['LONGITUDE', 'LATITUDE']])
facilites_long_lats = np.vstack((al_facilites_long_lats, cc_facilites_long_lats))

In [10]:
# Load tract-facility distance matrix, computed via the Google Maps API. 
# Distances refer to the driving distance between pairs of locations. 
gmaps_distance_matrix = pd.read_csv(DATA_PATH + 'all_tract_facility_travel_distance_matrix.csv')

# Remove unnecessary first column.
gmaps_distance_matrix = gmaps_distance_matrix.iloc[:, 1:].copy()

# Convert to numpy array, and divide by 1000 to convert meters to kilometers. 
tract_facility_distance_matrix = np.array(gmaps_distance_matrix)[:, :] / 1000.0

In [12]:
tract_facility_distance_matrix.shape

(567, 25)

In [13]:
num_beds_arr = list(al_fac_df['TOTAL_NUMBER_BEDS']) + list(cc_fac_df['TOTAL_NUMBER_BEDS'])

In [14]:
sum(total_population_by_tract)

2640540.0

In [37]:
NUM_FACILITIES = len(num_beds_arr)

In [38]:
cur_facs_with_ab = [x for x in range(NUM_FACILITIES) if x < 13 or x > 16]

In [39]:
cur_facs_no_ab = [x for x in cur_facs_with_ab if x != 1]

## Setup functions for computing group effects

Once a particular set of facilities is opened, each census tract is assigned to the nearest open facility. Thus, computing the distance that members in each census tract have to travel is simple. From this, we can also easily compute total population travel distance. 

The next step is to compute the group effects for every method of grouping. Since we have three methods of grouping (race, health insurance, and income), we should write a method which computes the effect on each group under each grouping scheme. 

This gives a total of 10 effects to compute - 6 for race, 2 for health insurance, and 2 for income grouping. 

But we also have to account for the 2 ways of aggregating - averaging or not. This gives a total of 20 ways of computing group effects. 

In [40]:
def compute_group_effects(grouping_split_df, assigned_fac_distances): 
    '''
    Returns a dictionary with the group effects (both averaging and summing) from a 
    particular choice of facilities to open. 
    
    grouping_split_df: The Pandas Dataframe containing information about location and populations 
    of various groups. Example: Health insurance dataframe. 
    
    assigned_fac_distances: 1D Array indexed by census tract number. The i^th value is a float
    which is the distance (km) of the i^th tract to its assigned facility. 
    '''
    num_tracts = len(assigned_fac_distances)
    col_names = grouping_split_df.columns.values
    assert len(grouping_split_df) % num_tracts == 0, 'Number of tracts does not divide number of splits!'
    
    num_groups = len(grouping_split_df) / num_tracts
    
    population_count_col_name = col_names[-2]
    group_name_col_name = 'Variable'
    if 'Variable' not in col_names: 
        if 'variable' not in col_names: 
            assert False, 'No column to indicate name of group in grouping split df'
        group_name_col_name = 'variable'    
    
    out_dict = {}
    
    # rows of the grouping_split_df are grouped by lat/long pair. 
    # so the same census tract is repeated NUM_GROUPS times, and 
    # since we want to get all the tracts for a single group, must ensure
    # indices are spaced exactly NUM_GROUPS apart. 
    # example: [0, 6, 12, ..., 359 * 6]
    index_set = np.arange(num_tracts) * num_groups
    
    for i in range(int(num_groups)):
        population_counts = np.array(grouping_split_df.iloc[index_set + i][population_count_col_name])
        effect_sum = np.dot(population_counts, assigned_fac_distances)
        effect_avg = effect_sum / np.sum(population_counts)
        
        group_name = str(grouping_split_df.iloc[int(index_set[0] + i)][group_name_col_name])
        out_dict['{}_Sum'.format(group_name)] = effect_sum
        out_dict['{}_Avg'.format(group_name)] = effect_avg
    
    return out_dict

Let's test this out.

In [41]:
compute_group_effects(race_split_df_all, np.min(tract_facility_distance_matrix, axis=1))

{'NH_AIAN_alone_ACS_10_14_Sum': 34239.49603643779,
 'NH_AIAN_alone_ACS_10_14_Avg': 4.644605320820406,
 'NH_Asian_alone_ACS_10_14_Sum': 2817733.206803477,
 'NH_Asian_alone_ACS_10_14_Avg': 4.952924041836372,
 'NH_Blk_alone_ACS_10_14_Sum': 1325108.8128219028,
 'NH_Blk_alone_ACS_10_14_Avg': 4.543277708952991,
 'NH_NHOPI_alone_ACS_10_14_Sum': 966228.7606717693,
 'NH_NHOPI_alone_ACS_10_14_Avg': 7.907045629181408,
 'NH_SOR_alone_ACS_10_14_Sum': 1813345.4703301657,
 'NH_SOR_alone_ACS_10_14_Avg': 7.81406056836471,
 'NH_White_alone_ACS_10_14_Sum': 8894642.613336248,
 'NH_White_alone_ACS_10_14_Avg': 6.271154155266477}

In [42]:
def compute_effects(facs_to_open, tract_facility_distance_matrix): 
    '''
    facs_to_open: List of integers, corresponding to which facility numbers to open. 
    tract_facility_distance_matrix: Distance matrix between tracts and facilities. 
    Entry (i,j) is distance from tract i to facility j. 

    Returns: 
    assigned_fac_distances: List of length NUM_TRACTS (distance each tract must travel to nearest
    open facility)
    assigned_fac_numbers: List of length NUM_TRACTS. Each entry is an integer from facs_to_open,
    being the facililty number that this tract is assigned to. 
    '''
    distance_arr = tract_facility_distance_matrix[:, facs_to_open]
    assigned_facs = np.argmin(distance_arr, axis=1)
    # Go from relative to absolute indices
    assigned_fac_numbers = np.array([facs_to_open[i] for i in assigned_facs])
    
    # Get minimum distances from argmin array
    assigned_fac_distances = [distance_arr[i, assigned_facs[i]] for i in range(len(distance_arr))]
    
    return np.array(assigned_fac_distances), assigned_fac_numbers

Let's test the function.

In [43]:
compute_effects([1, 2, 3, 5, 7], tract_facility_distance_matrix)

(array([26.574, 24.903, 31.723, 24.19 , 24.095, 19.245, 17.581, 20.367,
        18.121, 21.144, 23.773, 15.515, 31.072, 19.6  , 16.076, 25.49 ,
        18.057, 14.285, 21.648, 14.365, 20.088, 15.156, 16.749, 14.266,
        18.771, 19.304, 13.483, 14.897, 19.673, 14.123, 22.059, 20.267,
        17.994, 13.766, 49.134, 14.099, 13.577, 14.483, 29.859, 10.964,
        10.869, 12.531, 16.682, 10.163, 12.123, 11.108, 13.28 ,  9.317,
         8.987,  7.149, 12.091, 12.89 ,  9.001,  6.965,  9.265,  8.418,
        15.386,  6.167,  8.557,  7.107, 10.582,  6.892,  4.679,  8.625,
         7.885,  6.04 ,  6.308,  5.611, 20.823,  6.43 ,  2.829,  6.864,
         2.121,  3.591,  0.933,  1.868,  2.633,  1.811,  1.281,  2.559,
         5.709, 52.41 ,  3.214,  4.533,  4.492,  3.593,  3.758,  2.598,
        41.168, 19.981,  3.916, 26.63 ,  3.585,  8.191,  4.924, 28.501,
         5.843, 30.692,  5.781,  5.2  , 38.851,  5.962, 10.703, 39.497,
        37.602, 39.014,  6.175,  6.62 , 28.849, 24.91 , 38.359, 

In [44]:
def compute_all_grouping_effects(grouping_dfs, facs_to_open, tract_facility_distance_matrix): 
    '''
    grouping_dfs: Iterable containing split_grouping_dfs. 
    Should include three dfs - Race, Health Insurance Status (Binarized), Income (Split by Poverty Level).
    
    facs_to_open: List of integers. Must be all in the range {0, 1, 2, .., num_facilities - 1}. 
    This list corresponds to the indices of facilities which are to be opened. 
    
    tract_facility_distance_matrix: Pairwise distance of each tract to each facility.
    
    Returns: 
    Dictionary with group level effects for Averaging vs Summing, for each grouping_df. 
    '''
    distances, indices = compute_effects(facs_to_open, tract_facility_distance_matrix)
    
    out_dict = {}
    total_indiv_dist = np.sum(np.multiply(distances, total_population_by_tract))
    out_dict = {
        'Total_Indiv_Dist': total_indiv_dist, 
        'Mean_Indiv_Dist': total_indiv_dist / np.sum(total_population_by_tract)
    }
    for grouping_df in grouping_dfs: 
        out_dict.update(compute_group_effects(grouping_df, distances))
    return out_dict

In [45]:
compute_all_grouping_effects((race_split_df_all, income_split_df_all, health_ins_split_df_all), [1, 2, 5, 7, 9], 
                            tract_facility_distance_matrix)

{'Total_Indiv_Dist': 52457647.745,
 'Mean_Indiv_Dist': 19.866257562846993,
 'NH_AIAN_alone_ACS_10_14_Sum': 76654.26712365527,
 'NH_AIAN_alone_ACS_10_14_Avg': 10.398190924516847,
 'NH_Asian_alone_ACS_10_14_Sum': 4978150.600099659,
 'NH_Asian_alone_ACS_10_14_Avg': 8.750438732660124,
 'NH_Blk_alone_ACS_10_14_Sum': 2429743.1620653193,
 'NH_Blk_alone_ACS_10_14_Avg': 8.330635069269578,
 'NH_NHOPI_alone_ACS_10_14_Sum': 3821296.055270147,
 'NH_NHOPI_alone_ACS_10_14_Avg': 31.271230480269416,
 'NH_SOR_alone_ACS_10_14_Sum': 7785884.83799745,
 'NH_SOR_alone_ACS_10_14_Avg': 33.550901743696514,
 'NH_White_alone_ACS_10_14_Sum': 33365874.776443772,
 'NH_White_alone_ACS_10_14_Avg': 23.524558922093952,
 'Prs_Blw_Pov_Lev_ACS_10_14_Sum': 5426740.896,
 'Prs_Blw_Pov_Lev_ACS_10_14_Avg': 17.429040460942566,
 'Above_poverty_level_Sum': 47030906.849,
 'Above_poverty_level_Avg': 20.19206211332925,
 'No_Health_Ins_ACS_10_14_Sum': 21492891.476375937,
 'No_Health_Ins_ACS_10_14_Avg': 28.6635862470067,
 'One_Plus_Hea

## Equity Metric Functions

Equity metrics are a function of the group effects. They are defined in the same way whether we use averaging for group effect computation or not (but note that the use of averaging means we are measuring something different! The resulting equity metric values will not be equal). 

We will compute equity metrics for each allocation, using the already computed group effects. To specify an equity metric, we must not only specify the metric being used, but also what grouping scheme the effects are from, and what aggregation method was used.

We are interested in metrics 1, 3, and 5 which we define below. These are representatives of three general equivalence classes of metrics we define in the paper - see the theory section.

**NOTE**: The numbering of metrics we use here is different than table 1 in the paper. This notebook (and our code in general) uses the number from Marsh and Schilling's 1994 paper "Equity Measurement in Facility Location."

In [81]:
def metric_1(effects_list): 
    # Min-Max
    return max(effects_list)

def metric_2(effects_list):
    mean_effect = np.mean(effects_list)
    return np.sum([(effect - mean_effect)**2 for effect in effects_list])

def metric_3(effects_list):
    # Mean absolute deviation
    mean_effect = np.mean(effects_list)
    return np.sum(np.abs(np.array(effects_list) - mean_effect))

def metric_4(effects_list):
    total = 0 
    for i in range(len(effects_list)): 
        for j in range(len(effects_list)): 
            total += np.abs(effects_list[i] - effects_list[j])
    return total

def metric_5(effects_list): 
    # Gini coefficient
    numerator = metric_4(effects_list)
    denom = 2 * len(effects_list) * sum(effects_list)
    return numerator / denom

## Capacity Excess Computation

Given a choice of facilities to open, we want to understand whether it places excessive strain on certain hospitals. By computing how much the allocation exceeds a "capacity limit" we can introdue a penalty in the objective term, so that the algorithm has an incentive to not exceed capacity too much. 

For details on how we compute capacity excess, see section 2 of the paper. Essentially, we treat the overall dataset person-to-bed ratio as a maximum. Then, depending on how many facilities are chosen to be open, we scale the capacity accordingly. Finally, we compute what the person-to-bed ratio of each facility is under the allocation chosen, and add up the deviations when a facility's capacity is exceeded. 

In [47]:
def parse_facs_to_open_str(x):
    y = x.replace('[', '').replace(']', '')
    nums = [int(z) for z in y.split(',')]
    return nums

In [48]:
def compute_global_pbr(num_beds_arr, current_open_facilities, total_population_by_tract): 
    '''
    Compute the global PBR (person-to-bed-ratio) amongst current open facilites. 
    This is just the number of people divide by the number of beds at currently open facilities. 
    
    num_beds_arr: List of integers. Number of beds at each facility. 
    current_open_facilities: List of integers. IDs of currently open facilities. 
    total_population_by_tract: List of integers. Population at each census tract. 
    '''
    total_population = np.sum(total_population_by_tract)
    total_beds = sum([num_beds_arr[x] for x in current_open_facilities])
    return total_population / float(total_beds)

In [57]:
def compute_capacity_deviation(tract_facility_distance_matrix, facs_to_open, num_beds_arr, 
                                total_population_by_tract, 
                               current_open_facilities = cur_facs_with_ab): 
    '''
    Returns the total capacity deviation when opening some collection of facilities. 
    
    tract_facility_distance_matrix: Table of census-facility pairwise distances. 
    facs_to_open: List of integers, with the ID of each facility to open. 
    num_beds_arr: 1D array of integers. Each entry gives the number of beds at that facility. 
    total_population_by_tract: 1D array of integers. Each entry gives the population of the corresponding
    census tract.
    '''
    assigned_facs_relative_indices = np.argmin(tract_facility_distance_matrix[:, facs_to_open], axis=1)
    assigned_facs_abs_indices = [facs_to_open[x] for x in assigned_facs_relative_indices]
    global_pbr = compute_global_pbr(num_beds_arr, current_open_facilities, total_population_by_tract)
    
    fac_loads = np.zeros(shape=(len(num_beds_arr,)))

    for i in range(len(total_population_by_tract)): 
        assigned_fac_num = assigned_facs_abs_indices[i]
        fac_loads[assigned_fac_num] += total_population_by_tract[i]
        
    assert np.isclose(sum(fac_loads), sum(total_population_by_tract)), 'Incorrect load calculation'
    total_deviation = 0.0
    
    # Rescale the person-bed-ratio by the ratio of number of facilities to open vs. 
    # number assumed to be open initially. 
    
    scaled_global_pbr = (global_pbr * len(num_beds_arr)) / len(facs_to_open)
    for fac_num in facs_to_open: 
        fac_beds_count = num_beds_arr[fac_num]
        fac_capacity = fac_beds_count * scaled_global_pbr
        local_dev = (fac_loads[fac_num] - fac_capacity)
        total_deviation += max(0.0, local_dev)
    return total_deviation

In [58]:
def compute_capacity_excess_for_df(facs_df, num_beds_arr): 
    '''
    A DataFrame tracking allocations and equity metrics. See `make_raw_effects_csv` in the notebook below. 
    
    Returns: The input dataframe, with a new column tracking capacity excess for each allocation. 
    '''
    deviations_arr = np.zeros(shape=(len(facs_df,)))
    for i in range(len(facs_df)): 
        facs_to_open_str = facs_df['Facs_To_Open'].iloc[i]
        if type(facs_to_open_str) == str: 
            facs_to_open = parse_facs_to_open_str(facs_to_open_str)
        elif type(facs_to_open_str) == list: 
            facs_to_open = facs_to_open_str
        deviation = compute_capacity_deviation(tract_facility_distance_matrix, facs_to_open, 
                                              num_beds_arr, total_population_by_tract)
        deviations_arr[i] = deviation
    facs_df['Capacity_Excess'] = deviations_arr
    return facs_df

### Check capacity excess

In [51]:
compute_global_pbr(num_beds_arr, cur_facs_with_ab, 
                   total_population_by_tract)

577.1672131147541

In [52]:
cur_facs_with_ab

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18, 19, 20, 21, 22, 23, 24]

In [53]:
compute_capacity_deviation(tract_facility_distance_matrix, cur_facs_with_ab, 
                   num_beds_arr, total_population_by_tract)

633140.1803278689

# Create DataFrames

Now that we have our helper functions, we can get started on building the main function: a method which generates a DataFrame for any experimental setup. An *experimental setup* is just a description of what kinds of options we want to consider. 

For example, suppose we want to know which facility ought to replace Alta Bates. Then our experimental setup would be the following: hold all open facilities open, except for Alta Bates, and then open one new facility. 

The DataFrame will track all the information relevant to this experiment.

Rows will correspond to all the unique choices we have. In this case, each one will correspond to the 20 fixed facilities, and one additional facility. 

Columns will track various things - the group effects, the values of the function under various equity metrics, and the capacity excess. 

## Create DataFrame with Group Effects

To start with, we create the DataFrame with just group effects. We will then pass this starting dataframe into another function which populates it with equity metric values and capacity excess.

In [54]:
from itertools import product, combinations

In [59]:
def make_raw_effects_csv(grouping_dfs, num_facs_to_open, fixed_fac_ids = [], 
                         excluded_fac_ids = [], savepath=None): 
    '''
    Makes the allocations DataFrame with just group effects in the columns.
    
    grouping_dfs: Iterable containing split_grouping_dfs. 
    Should include three dfs - Race, Health Insurance Status (Binarized), Income (Split by Poverty Level).
    
    num_facs_to_open: Integer. Number of new facilities to be open. 
    
    fixed_fac_ids: List of integers. The IDs of the facilities which are to be fixed as open. 
    Default setting is an empty list [], which means all facilities are opened from scratch. 
    
    excluded_fac_ids: List of integers. The IDs of facilities which are not valid candidate facilities. 
    Default setting is an empty list [], which means all facilities are candidates. 
    
    saveapth: String. The filepath to the directory where the output csv is saved. 
    '''
    blank_effects_dict = compute_all_grouping_effects(grouping_dfs, [1], tract_facility_distance_matrix)
    df_col_names = ['Facs_To_Open'] + list(blank_effects_dict.keys())
    
    # Initialize the empty dataframe. 
    facs_to_open_df = pd.DataFrame(columns=df_col_names)
    
    candidate_fac_ids = [x for x in range(NUM_FACILITIES) if x not in fixed_fac_ids + excluded_fac_ids]
    
    for fac_tuple in combinations(candidate_fac_ids, num_facs_to_open): 
        opened_facs = fixed_fac_ids + list(fac_tuple)
        results_dict = compute_all_grouping_effects(grouping_dfs, 
                                                    list(opened_facs), 
                                                    tract_facility_distance_matrix)
        results_dict['Facs_To_Open'] = opened_facs
        facs_to_open_df = facs_to_open_df.append(results_dict, ignore_index=True)
    if savepath: 
        facs_to_open_df.to_csv(savepath)
    return facs_to_open_df

Let's test this function with a realistic command - open 1 new facility if Alta Bates is closed and all others are held fixed. 

In [60]:
%%time
make_raw_effects_csv((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                     1, fixed_fac_ids=cur_facs_no_ab, 
                     excluded_fac_ids=[1], savepath=None)

CPU times: user 103 ms, sys: 19.7 ms, total: 123 ms
Wall time: 123 ms


Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,NH_White_alone_ACS_10_14_Sum,NH_White_alone_ACS_10_14_Avg,No_Health_Ins_ACS_10_14_Sum,No_Health_Ins_ACS_10_14_Avg,One_Plus_Health_Ins_Sum,One_Plus_Health_Ins_Avg,Prs_Blw_Pov_Lev_ACS_10_14_Sum,Prs_Blw_Pov_Lev_ACS_10_14_Avg,Above_poverty_level_Sum,Above_poverty_level_Avg
0,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17187650.0,6.509141,36687.910036,4.976734,3038642.0,5.34123,1377717.0,4.723649,971196.315497,...,9934470.0,7.004283,5504738.0,7.341289,11682910.0,6.179121,1730614.352,5.558207,15457030.0,6.636261
1,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16837970.0,6.376715,35617.110036,4.831479,2960644.0,5.204129,1362918.0,4.67291,969795.830136,...,9688316.0,6.830733,5465493.0,7.288951,11372480.0,6.014933,1678722.604,5.391546,15159250.0,6.508411
2,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17211990.0,6.518361,36740.459036,4.983862,3049496.0,5.36031,1382359.0,4.739567,971839.752497,...,9942427.0,7.009893,5509691.0,7.347895,11702300.0,6.189377,1734706.042,5.571348,15477290.0,6.644956
3,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16346210.0,6.19048,35592.905036,4.828196,2931680.0,5.153217,1368893.0,4.693398,969761.876194,...,9215212.0,6.497171,5406949.0,7.210875,10939260.0,5.785803,1688187.461,5.421944,14658020.0,6.293217


## Create DataFrame with Equity Metrics and Capacity Deviation

We created a function to create a basic "allocations dataframe" in the previous section. This dataframe includes total individual distance and group effects under each choice of facilities to open. 

Now, we want to compute all of the things we use to measure the goodness of an allocation. In particular, this includes the computation of equity metrics (which are functions of group effects), and capacity deviation. 

In [61]:
aggregation_type = ['Sum', 'Avg']
grouping_labels=['Race', 'Income', 'Health_Ins']
metric_num_to_func = {
    1: metric_1, 
    3: metric_3,
    5: metric_5
}

In [62]:
def get_uq_group_names(grouping_df): 
    '''
    grouping_df: A grouping DataFrame with tract level data. 
    Returns: List of strings, corresponding to the labels of the unique groups in GROUPING_DF. 
    '''
    if 'Variable' in grouping_df.columns.values: 
        return list(grouping_df['Variable'].unique())
    elif 'variable' in grouping_df.columns.values: 
        return list(grouping_df['variable'].unique())
    else: 
        raise AssertionError('no column for group names in {}'.format(grouping_df.columns.values))

In [63]:
def compute_equity_metric(fac_opening_df, equity_metric_func, aggregation_type, grouping_df_col_names):
    '''
    Computes an equity metric from a raw effects DataFrame. 
    
    Returns a 1D array of floats with the metric for each allocation. 
    '''
    df_effect_col_names = ['{}_{}'.format(col_name, aggregation_type) for col_name in grouping_df_col_names]
    return fac_opening_df[df_effect_col_names].apply(equity_metric_func, axis=1)

In [64]:
def make_allocations_df_with_equity_metrics(grouping_dfs, num_facs_to_open, num_beds_arr, 
                                            fixed_fac_ids = [], 
                                            excluded_fac_ids = [], 
                                            equity_metric_nums = [1, 3, 5], savepath=None):
    '''
    Makes the allocations DataFrame with just group effects in the columns.
    
    grouping_dfs: Iterable containing split_grouping_dfs. 
    Should include three dfs - Race, Health Insurance Status (Binarized), Income (Split by Poverty Level).
    
    num_facs_to_open: Integer. Number of new facilities to be open. 
    num_beds_arr: List of integers. Number of beds at each facility, indexed by facility ID. 
    
    fixed_fac_ids: List of integers. The IDs of the facilities which are to be fixed as open. 
    Default setting is an empty list [], which means all facilities are opened from scratch. 
    
    excluded_fac_ids: List of integers. The IDs of facilities which are not valid candidate facilities. 
    Default setting is an empty list [], which means all facilities are candidates. 
    
    saveapth: String. The filepath to the directory where the output csv is saved. 
    
    
    '''
    
    
    raw_df = make_raw_effects_csv(grouping_dfs, num_facs_to_open, fixed_fac_ids, 
                         excluded_fac_ids, savepath)
    
    # Compute capacity excess
    raw_df = compute_capacity_excess_for_df(raw_df, num_beds_arr)
    
    aggregation_type = ['Sum', 'Avg']
    grouping_labels=['Race', 'Income', 'Health_Ins']
    
    metric_num_to_func = {
        1: metric_1, 
        2: metric_2,
        3: metric_3,
        4: metric_4, 
        5: metric_5
    }
    
    grouping_label_to_col_names = {
        'Race': get_uq_group_names(race_split_df_all),
        'Income': get_uq_group_names(income_split_df_all), 
        'Health_Ins': get_uq_group_names(health_ins_split_df_all)
    }
    
    # Iterate through all possible combinations of metric, aggregation type, and grouping. 
    for tup in product(equity_metric_nums, aggregation_type, grouping_labels): 
        metric_num, aggr_type, grouping_type = tup[0], tup[1], tup[2]
        new_col_name = 'Metric_{}_{}_{}'.format(metric_num, aggr_type, grouping_type)
        metric_func = metric_num_to_func[metric_num]
        col_names = grouping_label_to_col_names[grouping_type]
        metric_val = compute_equity_metric(raw_df, metric_func, aggr_type, col_names)
        raw_df[new_col_name] = metric_val
    
    if savepath:
        raw_df.to_csv(savepath)
    return raw_df        

Once again, let's test our new function to see if it works under the same scenario. 

In [65]:
%xmode context

Exception reporting mode: Context


In [66]:
raw_df = make_raw_effects_csv((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                              1, fixed_fac_ids=cur_facs_no_ab, 
                     excluded_fac_ids=[1])

In [67]:
raw_df.head()

Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,NH_White_alone_ACS_10_14_Sum,NH_White_alone_ACS_10_14_Avg,No_Health_Ins_ACS_10_14_Sum,No_Health_Ins_ACS_10_14_Avg,One_Plus_Health_Ins_Sum,One_Plus_Health_Ins_Avg,Prs_Blw_Pov_Lev_ACS_10_14_Sum,Prs_Blw_Pov_Lev_ACS_10_14_Avg,Above_poverty_level_Sum,Above_poverty_level_Avg
0,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17187650.0,6.509141,36687.910036,4.976734,3038642.0,5.34123,1377717.0,4.723649,971196.315497,...,9934470.0,7.004283,5504738.0,7.341289,11682910.0,6.179121,1730614.352,5.558207,15457030.0,6.636261
1,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16837970.0,6.376715,35617.110036,4.831479,2960644.0,5.204129,1362918.0,4.67291,969795.830136,...,9688316.0,6.830733,5465493.0,7.288951,11372480.0,6.014933,1678722.604,5.391546,15159250.0,6.508411
2,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17211990.0,6.518361,36740.459036,4.983862,3049496.0,5.36031,1382359.0,4.739567,971839.752497,...,9942427.0,7.009893,5509691.0,7.347895,11702300.0,6.189377,1734706.042,5.571348,15477290.0,6.644956
3,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16346210.0,6.19048,35592.905036,4.828196,2931680.0,5.153217,1368893.0,4.693398,969761.876194,...,9215212.0,6.497171,5406949.0,7.210875,10939260.0,5.785803,1688187.461,5.421944,14658020.0,6.293217


In [68]:
out_df = make_allocations_df_with_equity_metrics((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                     1, num_beds_arr, fixed_fac_ids=cur_facs_no_ab, 
                     excluded_fac_ids=[1], savepath=None)

In [69]:
out_df.head()

Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,Metric_3_Sum_Health_Ins,Metric_3_Avg_Race,Metric_3_Avg_Income,Metric_3_Avg_Health_Ins,Metric_5_Sum_Race,Metric_5_Sum_Income,Metric_5_Sum_Health_Ins,Metric_5_Avg_Race,Metric_5_Avg_Income,Metric_5_Avg_Health_Ins
0,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17187650.0,6.509141,36687.910036,4.976734,3038642.0,5.34123,1377717.0,4.723649,971196.315497,...,6178171.0,7.791809,1.078054,1.162168,0.544406,0.399311,0.179727,0.1166,0.044203,0.042978
1,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16837970.0,6.376715,35617.110036,4.831479,2960644.0,5.204129,1362918.0,4.67291,969795.830136,...,5906984.0,7.904349,1.116865,1.274019,0.541372,0.400301,0.175407,0.120514,0.046927,0.047881
2,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17211990.0,6.518361,36740.459036,4.983862,3049496.0,5.36031,1382359.0,4.739567,971839.752497,...,6192610.0,7.761401,1.073608,1.158517,0.544272,0.399215,0.179892,0.116059,0.043942,0.04279
3,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",16346210.0,6.19048,35592.905036,4.828196,2931680.0,5.153217,1368893.0,4.693398,969761.876194,...,5532312.0,7.623057,0.871273,1.425072,0.532641,0.396723,0.169223,0.120208,0.037186,0.054824


## Save DataFrames for Later

Let's create some dataframes to save for later analysis. By changing the parameters of the `make_allocations_df_with_equity_metrics` function, you can create any other dataframes you'd like. 

In [72]:
# # Open facility to replace Alta Bates. A.B. is not an option. 

# make_allocations_df_with_equity_metrics((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
#                      1, num_beds_arr, fixed_fac_ids=[x for x in range(26) if (x != 1) and (x < 13 or x > 17)], 
#                      excluded_fac_ids=[1], 
#                      savepath='all-data/allocation-dfs/one_fac_replace_alta_bates_no_replacement.csv')

In [73]:
# make_allocations_df_with_equity_metrics((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
#                      1, num_beds_arr, fixed_fac_ids=[x for x in range(26) if (x != 1) and (x < 13 or x > 17)], 
#                      excluded_fac_ids=[], 
#                      savepath='all-data/allocation-dfs/one_fac_replace_alta_bates_with_replacement.csv')

### Two, Three facilities from scratch

In [70]:
%%time
make_allocations_df_with_equity_metrics((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                     2, num_beds_arr, fixed_fac_ids=[], 
                     excluded_fac_ids=[], 
                     savepath='all-data/allocation-dfs/two_facs_to_open_from_scratch.csv')

CPU times: user 6.54 s, sys: 64.5 ms, total: 6.61 s
Wall time: 6.92 s


Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,Metric_3_Sum_Health_Ins,Metric_3_Avg_Race,Metric_3_Avg_Income,Metric_3_Avg_Health_Ins,Metric_5_Sum_Race,Metric_5_Sum_Income,Metric_5_Sum_Health_Ins,Metric_5_Avg_Race,Metric_5_Avg_Income,Metric_5_Avg_Health_Ins
0,"[0, 1]",7.393915e+07,28.001524,161749.630070,21.941421,1.441645e+07,25.340793,4.410502e+06,15.121880,4.183643e+06,...,2.523015e+07,37.018849,5.813442,6.254470,0.556068,0.403678,0.170614,0.145933,0.056375,0.053270
1,"[0, 2]",7.539549e+07,28.553058,161843.582952,21.954166,1.444036e+07,25.382812,4.346477e+06,14.902361,4.342227e+06,...,2.521916e+07,40.592344,5.893095,6.850685,0.556496,0.403551,0.167246,0.155793,0.056015,0.057026
2,"[0, 3]",8.086605e+07,30.624815,165301.958168,22.423296,1.444280e+07,25.387111,4.348210e+06,14.908303,4.795147e+06,...,2.542526e+07,50.987084,6.011391,8.859902,0.561180,0.402501,0.157206,0.178106,0.053052,0.068072
3,"[0, 4]",7.839035e+07,29.687240,163298.234873,22.151490,1.447317e+07,25.440494,4.384124e+06,15.031440,4.533187e+06,...,2.542177e+07,45.443115,6.113929,7.867017,0.559687,0.403505,0.162149,0.165717,0.055884,0.062662
4,"[0, 5]",7.494205e+07,28.381337,116967.599744,15.866715,8.199728e+06,14.413228,3.639782e+06,12.479382,5.105939e+06,...,1.705509e+07,77.532456,4.048495,14.271165,0.578375,0.396921,0.113788,0.268479,0.037717,0.113392
5,"[0, 6]",7.334301e+07,27.775762,114683.490111,15.556875,7.478458e+06,13.145401,4.118212e+06,14.119733,5.157940e+06,...,1.559814e+07,76.770679,2.782911,14.984575,0.570186,0.392505,0.106337,0.266837,0.026045,0.120793
6,"[0, 7]",7.516526e+07,28.465867,129875.148628,17.617631,1.006381e+07,17.689861,3.458655e+06,11.858370,5.118736e+06,...,1.844330e+07,71.185679,4.047786,13.068175,0.566583,0.396874,0.122685,0.249543,0.037592,0.104415
7,"[0, 8]",7.144191e+07,27.055795,146739.260575,19.905256,1.208526e+07,21.243100,4.563028e+06,15.644832,4.791280e+06,...,2.102742e+07,49.734254,1.539721,9.163531,0.538010,0.388003,0.147164,0.184812,0.014544,0.078900
8,"[0, 9]",7.345311e+07,27.817458,117027.646242,15.874861,7.687407e+06,13.512685,4.221754e+06,14.474736,5.167035e+06,...,1.589232e+07,75.431450,2.615405,14.754915,0.566907,0.391863,0.108180,0.261095,0.024381,0.118972
9,"[0, 10]",7.307142e+07,27.672908,131145.833402,17.790000,1.007870e+07,17.716021,3.656890e+06,12.538041,5.030052e+06,...,1.802203e+07,66.894331,3.348565,12.618011,0.558736,0.394670,0.123318,0.239080,0.031718,0.103771


In [71]:
%%time
make_allocations_df_with_equity_metrics((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                     3, num_beds_arr, fixed_fac_ids=[], 
                     excluded_fac_ids=[], 
                     savepath='all-data/allocation-dfs/three_facs_to_open_from_scratch.csv')

CPU times: user 49.8 s, sys: 621 ms, total: 50.4 s
Wall time: 52 s


Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,Metric_3_Sum_Health_Ins,Metric_3_Avg_Race,Metric_3_Avg_Income,Metric_3_Avg_Health_Ins,Metric_5_Sum_Race,Metric_5_Sum_Income,Metric_5_Sum_Health_Ins,Metric_5_Avg_Race,Metric_5_Avg_Income,Metric_5_Avg_Health_Ins
0,"[0, 1, 2]",7.365147e+07,27.892578,160120.827070,21.720473,1.436382e+07,25.248278,4.311818e+06,14.783529,4.182325e+06,...,2.502012e+07,37.561066,5.897632,6.334305,0.556636,0.404076,0.169855,0.148996,0.057506,0.054119
1,"[0, 1, 3]",7.283682e+07,27.584061,157243.841984,21.330208,1.411927e+07,24.818408,4.109572e+06,14.090107,4.171747e+06,...,2.443644e+07,38.707998,5.927159,6.550085,0.558058,0.404434,0.167748,0.155345,0.058524,0.056468
2,"[0, 1, 4]",7.344911e+07,27.815945,158813.389070,21.543118,1.430199e+07,25.139592,4.271214e+06,14.644315,4.182100e+06,...,2.487118e+07,37.912232,5.930825,6.391589,0.556748,0.404261,0.169309,0.150752,0.058032,0.054729
3,"[0, 1, 5]",5.949877e+07,22.532805,99535.731920,13.502073,7.477388e+06,13.143521,3.039724e+06,10.422020,3.886872e+06,...,1.446774e+07,55.202274,4.142315,10.466853,0.568647,0.401205,0.121580,0.242625,0.049431,0.105538
4,"[0, 1, 6]",5.831517e+07,22.084563,97901.273061,13.280358,6.755647e+06,11.874868,3.512837e+06,12.044140,3.993548e+06,...,1.303410e+07,55.333858,2.951643,11.325712,0.558729,0.395985,0.111756,0.240417,0.035211,0.115421
5,"[0, 1, 7]",6.045352e+07,22.894377,113199.592974,15.355583,9.353416e+06,16.441144,2.874239e+06,9.854637,3.940095e+06,...,1.599622e+07,49.894934,4.326097,9.427587,0.563911,0.401738,0.132302,0.223881,0.050916,0.094537
6,"[0, 1, 8]",5.880248e+07,22.269111,130718.328436,17.732009,1.136664e+07,19.979930,3.984579e+06,13.661558,3.845972e+06,...,1.821873e+07,32.987568,2.029843,6.693512,0.531406,0.391565,0.154915,0.158012,0.023610,0.070562
7,"[0, 1, 9]",5.856968e+07,22.180946,100323.500647,13.608935,6.964760e+06,12.242439,3.617716e+06,12.403728,4.016701e+06,...,1.332603e+07,54.299589,2.814891,11.156254,0.554860,0.395284,0.113762,0.234154,0.033343,0.113418
8,"[0, 1, 10]",5.892954e+07,22.317230,114718.220878,15.561586,9.360171e+06,16.453019,3.073048e+06,10.536276,3.920065e+06,...,1.551080e+07,46.904613,3.727202,9.266446,0.554966,0.399455,0.131605,0.213398,0.044598,0.095259
9,"[0, 1, 11]",7.341929e+07,27.804648,158844.890070,21.547391,1.429638e+07,25.129730,4.271707e+06,14.646005,4.181803e+06,...,2.484297e+07,37.895650,5.907471,6.405860,0.556661,0.404183,0.169186,0.150732,0.057809,0.054866


### Make DFs of Alta Bates Replacement, with Fac. 4 Beds Rescaled

Sutter Health has announced that they plan to expand capacity at their nearby Summit Campus in Oakland to handle the displacement which closing Alta Bates will cause. We are interested in understanding just how much expansion is needed. Since Sutter Health has not released details, we will test a variety of expansions of the capacity of Fac. 17.

Our "replacement" experiment will have 5 options. 

Options 1 - 4 will open facilities 13 - 16, respectively. Facility 4 will remain open with no capacity improvements. 

Option 5 will expand facility 14 by some fixed amount. No new facility will be opened.

In [77]:
def rescale_beds(fac_numbers_to_scale, num_beds_arr, scaling_factor): 
    '''
    Returns a copy of num_beds_arr, but with certain facilities' bed numbers scaled. 
    
    fac_numbers_to_scale: List of integers. IDs of facilities whose beds to scale. 
    num_beds_arr: List of integers. Number of beds at each facility. 
    scaling_factor: Float. The factor by which to scale the number of beds at select facilities. 
    '''
    beds_arr_copy = np.copy(num_beds_arr)
    for fac_id in fac_numbers_to_scale: 
        beds_arr_copy[fac_id] = scaling_factor * num_beds_arr[fac_id]
    return beds_arr_copy

Now we need a custom function to create an allocation DF for this situation. 

In [98]:
def make_raw_effects_csv_custom(grouping_dfs, facs_to_open_lists, 
                                savepath=None): 
    '''
    Makes the allocations DataFrame with just group effects in the columns.
    
    grouping_dfs: Iterable containing split_grouping_dfs. 
    Should include three dfs - Race, Health Insurance Status (Binarized), Income (Split by Poverty Level).
    
    facs_to_open_lists: List of lists. Each element is a list of facility IDs (integers)
    to open. 
    savepath: String. The filepath to the directory where the output csv is saved. 
    '''
    blank_effects_dict = compute_all_grouping_effects(grouping_dfs, [1], tract_facility_distance_matrix)
    df_col_names = ['Facs_To_Open'] + list(blank_effects_dict.keys())
    
    # Initialize the empty dataframe. 
    facs_to_open_df = pd.DataFrame(columns=df_col_names)
    
    for fac_list in facs_to_open_lists: 
        results_dict = compute_all_grouping_effects(grouping_dfs, 
                                                    fac_list, 
                                                    tract_facility_distance_matrix)
        results_dict['Facs_To_Open'] = fac_list
        facs_to_open_df = facs_to_open_df.append(results_dict, ignore_index=True)
    if savepath: 
        facs_to_open_df.to_csv(savepath)
    return facs_to_open_df

In [79]:
def contains_new_fac(facs_to_open_list, cur_open_facs): 
    '''
    Returns whether facs_to_open_list contains a new facility or not. 
    
    facs_to_open_list: List of integers. IDs of the new facilities to open. 
    cur_open_facs: List of integers. IDs of the current open facilities.
    '''
    for item in facs_to_open_list: 
        if item not in cur_open_facs: 
            return True
    return False

In [82]:
def compute_capacity_excess_for_df_custom(facs_df, num_beds_arr, rescale_factor_fac_4 = 1.0): 
    '''
    A DataFrame tracking allocations and equity metrics. See `make_raw_effects_csv` in the notebook below. 
    
    rescale_factor_fac_4: Float. The factor by which to rescale the capacity of facility 4. 
    
    Returns: The input dataframe, with a new column tracking capacity excess for each allocation. 
    '''
    deviations_arr = np.zeros(shape=(len(facs_df,)))
    for i in range(len(facs_df)): 
        facs_to_open_str = facs_df['Facs_To_Open'].iloc[i]
        if type(facs_to_open_str) == str: 
            facs_to_open = parse_facs_to_open_str(facs_to_open_str)
        elif type(facs_to_open_str) == list: 
            facs_to_open = facs_to_open_str
        
        if contains_new_fac(facs_to_open, cur_facs_no_ab): 
            new_num_beds_arr = rescale_beds([4], np.array(num_beds_arr).copy(), rescale_factor_fac_4)
        else:
            new_num_beds_arr = np.array(num_beds_arr).copy()
        
        deviation = compute_capacity_deviation(tract_facility_distance_matrix, facs_to_open, 
                                              new_num_beds_arr, total_population_by_tract)
        deviations_arr[i] = deviation
    facs_df['Capacity_Excess'] = deviations_arr
    return facs_df

In [94]:
def make_allocations_df_with_equity_metrics_custom(grouping_dfs, facs_to_open_lists, num_beds_arr, 
                                                   rescale_factor_fac_4 = 1.0, 
                                            equity_metric_nums = [1, 3, 5], savepath=None):
    '''
    Makes the allocations DataFrame with just group effects in the columns.
    
    grouping_dfs: Iterable containing split_grouping_dfs. 
    Should include three dfs - Race, Health Insurance Status (Binarized), Income (Split by Poverty Level).
    
    num_facs_to_open: Integer. Number of new facilities to be open. 
    num_beds_arr: List of integers. Number of beds at each facility, indexed by facility ID. 
    rescale_factor_fac_4: Float. The factor by which to rescale the capacity of facility 4. 
    equity_metric_nums: List of integers. Which equity metrics to compute.
    
    saveapth: String. The filepath to the directory where the output csv is saved. 
    '''
    
    raw_df = make_raw_effects_csv_custom(grouping_dfs, facs_to_open_lists)
    
    # Compute capacity excess
    raw_df = compute_capacity_excess_for_df_custom(raw_df, num_beds_arr, rescale_factor_fac_4)
    
    aggregation_type = ['Sum', 'Avg']
    grouping_labels=['Race', 'Income', 'Health_Ins']
    
    metric_num_to_func = {
        1: metric_1, 
        2: metric_2,
        3: metric_3,
        4: metric_4, 
        5: metric_5
    }
    
    grouping_label_to_col_names = {
        'Race': get_uq_group_names(race_split_df_all),
        'Income': get_uq_group_names(income_split_df_all), 
        'Health_Ins': get_uq_group_names(health_ins_split_df_all)
    }
    
    # Iterate through all possible combinations of metric, aggregation type, and grouping. 
    for tup in product(equity_metric_nums, aggregation_type, grouping_labels): 
        metric_num, aggr_type, grouping_type = tup[0], tup[1], tup[2]
        new_col_name = 'Metric_{}_{}_{}'.format(metric_num, aggr_type, grouping_type)
        metric_func = metric_num_to_func[metric_num]
        col_names = grouping_label_to_col_names[grouping_type]
        metric_val = compute_equity_metric(raw_df, metric_func, aggr_type, col_names)
        raw_df[new_col_name] = metric_val
    
    if savepath:
        raw_df.to_csv(savepath)
    return raw_df

Let's test our new functions.

In [92]:
rescale_beds([4], num_beds_arr, 1.5)

array([135, 347, 190, 249, 604, 217, 341,  93, 167, 106, 130, 315, 216,
       159,  69,  99,  75, 145, 554, 233,  50, 167, 244, 123, 150])

We need a helper function to generate the lists of facilties to open.

In [87]:
def gen_one_fac_replacement_options(cur_facs_with_ab, cur_facs_no_ab): 
    unopened_facs = [x for x in range(NUM_FACILITIES) if x not in cur_facs_with_ab]
    
    # List of 5 empty lists. 
    out_lists = [[]] * 5
    
    for index, x in enumerate(unopened_facs): 
        out_lists[index] = sorted([x for x in cur_facs_no_ab] + [unopened_facs[index]])
    
    out_lists[-1] = [x for x in cur_facs_no_ab]
    return out_lists

In [96]:
options = gen_one_fac_replacement_options(cur_facs_with_ab, cur_facs_no_ab)

Now let's test the main function.

In [108]:
make_raw_effects_csv_custom((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                            options)

Unnamed: 0,Facs_To_Open,Total_Indiv_Dist,Mean_Indiv_Dist,NH_AIAN_alone_ACS_10_14_Sum,NH_AIAN_alone_ACS_10_14_Avg,NH_Asian_alone_ACS_10_14_Sum,NH_Asian_alone_ACS_10_14_Avg,NH_Blk_alone_ACS_10_14_Sum,NH_Blk_alone_ACS_10_14_Avg,NH_NHOPI_alone_ACS_10_14_Sum,...,NH_White_alone_ACS_10_14_Sum,NH_White_alone_ACS_10_14_Avg,No_Health_Ins_ACS_10_14_Sum,No_Health_Ins_ACS_10_14_Avg,One_Plus_Health_Ins_Sum,One_Plus_Health_Ins_Avg,Prs_Blw_Pov_Lev_ACS_10_14_Sum,Prs_Blw_Pov_Lev_ACS_10_14_Avg,Above_poverty_level_Sum,Above_poverty_level_Avg
0,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17...",17187650.0,6.509141,36687.910036,4.976734,3038642.0,5.34123,1377717.0,4.723649,971196.315497,...,9934470.0,7.004283,5504738.0,7.341289,11682910.0,6.179121,1730614.352,5.558207,15457030.0,6.636261
1,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 17...",16837970.0,6.376715,35617.110036,4.831479,2960644.0,5.204129,1362918.0,4.67291,969795.830136,...,9688316.0,6.830733,5465493.0,7.288951,11372480.0,6.014933,1678722.604,5.391546,15159250.0,6.508411
2,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 17...",17211990.0,6.518361,36740.459036,4.983862,3049496.0,5.36031,1382359.0,4.739567,971839.752497,...,9942427.0,7.009893,5509691.0,7.347895,11702300.0,6.189377,1734706.042,5.571348,15477290.0,6.644956
3,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 17...",16346210.0,6.19048,35592.905036,4.828196,2931680.0,5.153217,1368893.0,4.693398,969761.876194,...,9215212.0,6.497171,5406949.0,7.210875,10939260.0,5.785803,1688187.461,5.421944,14658020.0,6.293217
4,"[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18...",17228280.0,6.52453,36784.370036,4.989819,3052643.0,5.365841,1388420.0,4.760348,971946.626497,...,9949182.0,7.014656,5511092.0,7.349764,11717190.0,6.197252,1736590.515,5.5774,15491690.0,6.651141


In [109]:
test_out = make_allocations_df_with_equity_metrics_custom((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                                              options, num_beds_arr, rescale_factor_fac_4=3.0)

In [110]:
test_out['Facs_To_Open'].iloc[0]

[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21, 22, 23, 24]

In [111]:
test_out['Facs_To_Open'].iloc[-1]

[0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 18, 19, 20, 21, 22, 23, 24]

In [105]:
test_out['Capacity_Excess']

0    850326.993681
1    909358.390262
2    872625.993681
3    856885.111503
4    627708.393443
Name: Capacity_Excess, dtype: float64

As expected, the last option (rescaling facility 4) has a much lower capacity excess.

Let's save these dataframes for later analysis.

In [112]:
%%time
# Open facility to replace Alta Bates. A.B. is not an option. 
for scale_fac in np.linspace(1, 10, 10): 
    scaling_fac = np.round(scale_fac, 2)
    savepath = 'all-data/allocation-dfs/alta-bates-replacements-rescaled-beds/'
    savepath += 'one_fac_replace_alta_bates_no_replacement_fac_4_{}_rescale.csv'.format(scaling_fac)
    options = gen_one_fac_replacement_options(cur_facs_with_ab, cur_facs_no_ab)
    make_allocations_df_with_equity_metrics_custom((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                                              options, num_beds_arr, rescale_factor_fac_4=scaling_fac, 
                                                   savepath=savepath)
    print(scaling_fac)

1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
CPU times: user 2.12 s, sys: 68.7 ms, total: 2.19 s
Wall time: 2.32 s


Let's make some more dataframes with much larger scaling factors. 

In [113]:
%%time
# Open facility to replace Alta Bates. A.B. is not an option. 
for scale_fac in range(15, 205, 5): 
    scaling_fac = np.round(scale_fac, 2)
    savepath = 'all-data/allocation-dfs/alta-bates-replacements-rescaled-beds/'
    savepath += 'one_fac_replace_alta_bates_no_replacement_fac_4_{}_rescale.csv'.format(scaling_fac)
    options = gen_one_fac_replacement_options(cur_facs_with_ab, cur_facs_no_ab)
    make_allocations_df_with_equity_metrics_custom((race_split_df_all, health_ins_split_df_all, income_split_df_all), 
                                              options, num_beds_arr, rescale_factor_fac_4=scaling_fac, 
                                                   savepath=savepath)
    print(scaling_fac)

15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
105
110
115
120
125
130
135
140
145
150
155
160
165
170
175
180
185
190
195
200
CPU times: user 8.14 s, sys: 180 ms, total: 8.32 s
Wall time: 9.1 s
