# feature discovery to programmatically model a patient floor of a hospital

the problem of modeling a patient floor is well suited to a constraint programming model. however, we have some work to do to understand which constraints must be passed to the model. we want to be able to do this with the minimum level of information that we are likely to receive when we are asked to provide pricing for a project. we would expect to know who the client is, what the floor type is, and the number of beds on the floor. these will be our first few features. but we also need to say something about what the shape of the building is likely to be: whether it's roughly square or more of an elongated rectangle. we'll call this aspect_ratio (floorplate_x divided by floorplate_y), and add it to the little list of required features.
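
a minimal sketch of that required input, assuming a hypothetical `FloorRequest` record and `aspect_ratio` helper (both names are mine for illustration, not part of the notebook's code). the ratio is computed the way the notebook does further down, as floorplate_x / floorplate_y rounded to one decimal:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorRequest:
    """Hypothetical container for the minimum information known at pricing time."""
    client: str          # e.g. 'mc', 'swedish', 'sc', 'cf'
    floor_type: str      # e.g. 'ms', 'icu', 'pe'
    number_of_beds: int
    aspect_ratio: float  # floorplate_x / floorplate_y


def aspect_ratio(floorplate_x: float, floorplate_y: float) -> float:
    """floorplate_x divided by floorplate_y, rounded to one decimal."""
    return round(floorplate_x / floorplate_y, 1)


# mc_auburn from the dataframe below: floorplate_x=177.5, floorplate_y=90.6667
request = FloorRequest('mc', 'ms', 24, aspect_ratio(177.5, 90.6667))
print(request)
```

a square floor gives a ratio near 1, while a long, thin bar of patient rooms gives a larger number.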

so here goes. we'll start by setting up a dataframe with the features I mentioned above for a number of projects, as well as several targets we will want to solve for.

In [92]:
import pandas as pd
import numpy as np
import numpy.linalg as la
df = pd.DataFrame([['mc_auburn', 'mc', 'ms', 24, 14, 21.5, 16, 21.5, 177.5, 90.6667, 15405.5]
                   ,['mc_milgard', 'mc', 'ms', 24, 15.25, 22.5, 15.25, 22.5, 217.5, 102, 22991]
                   ,['mc_rainier', 'mc', 'ms', 18, 15.25, 22, 16.6875, 22, 130, 129, 16138] # these square floors are defo having an effect
                   ,['mc_rainier', 'mc', 'nicu', 24, 15.25, 14, 15.25, 14, 130, 129, 16138]
                   ,['mc_rainier', 'mc', 'ldrp', 16, 15.25, 22, 23.4167, 22, 130, 129, 16138]
                   ,['swedish_nw_tower', 'swedish', 'icu', 48, 16, 24, 19, 24, 451.875, 100.5, 43852]
                   ,['swedish_nw_tower', 'swedish', 'ms', 48, 16, 24, 19, 24, 451.875, 100.5, 43852]
                   ,['swedish_issaquah', 'swedish', 'ms', 36, 15, 21, 19.75, 21, 332.875, 80.6667, 26988.86]
                   ,['swedish_issaquah', 'swedish', 'icu', 36, 15, 21, 15, 21, 304.1458, 80, 24540.54]
                   ,['sc_river', 'sc', 'pe', 20, 15.5, 25.5, 16.75, 25.5, 187, 86.3334, 18514]
                   ,['sc_forrest_c', 'sc', 'icu', 28, 16, 24, 18.5, 25, 368, 83.75, 31780] # may be acu, but we're saying icu is the same thing
                   ,['sc_hospital_a', 'sc', 'pe', 32, 15.5, 24, 17.6, 26.4, 368.5, 84.3334, 34127]
                   ,['st_micheal_2', 'cf', 'icu', 18, 13.75, 20.8334, 14.75, 20.8334, 175, 72.3334, 12502]
                   ,['st_micheal_1', 'cf', 'icu', 24, 15.5, 22.9, 18.6, 22.9, 383.5, 76, 28953] # changed from ccu
                   ,['st_micheal_1', 'cf', 'ms', 32, 15.5, 22.9, 18.6, 22.9, 383.5, 76, 28953]]
                   , columns=['hospital', 'client', 'floor_type', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'ada_patient_room_width', 'ada_patient_room_length', 'floorplate_x', 'floorplate_y', 'floorplate_sqft'])
df = df.set_index('hospital')
df['patient_room_sqft'] = df['patient_room_length'] * df['patient_room_width']
df['ada_patient_room_sqft'] = df['ada_patient_room_length'] * df['ada_patient_room_width']
df = df[['client', 'floor_type', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'patient_room_sqft', 'ada_patient_room_width', 'ada_patient_room_length', 'ada_patient_room_sqft', 'floorplate_x', 'floorplate_y', 'floorplate_sqft']]
df

hospital,client,floor_type,number_of_beds,patient_room_width,patient_room_length,patient_room_sqft,ada_patient_room_width,ada_patient_room_length,ada_patient_room_sqft,floorplate_x,floorplate_y,floorplate_sqft
mc_auburn,mc,ms,24,14.0,21.5,301.0,16.0,21.5,344.0,177.5,90.6667,15405.5
mc_milgard,mc,ms,24,15.25,22.5,343.125,15.25,22.5,343.125,217.5,102.0,22991.0
swedish_nw_tower,swedish,icu,48,16.0,24.0,384.0,19.0,24.0,456.0,451.875,100.5,43852.0
swedish_nw_tower,swedish,ms,48,16.0,24.0,384.0,19.0,24.0,456.0,451.875,100.5,43852.0
swedish_issaquah,swedish,ms,36,15.0,21.0,315.0,19.75,21.0,414.75,332.875,80.6667,26988.86
swedish_issaquah,swedish,icu,36,15.0,21.0,315.0,15.0,21.0,315.0,304.1458,80.0,24540.54
sc_river,sc,pe,20,15.5,25.5,395.25,16.75,25.5,427.125,187.0,86.3334,18514.0
sc_forrest_c,sc,icu,28,16.0,24.0,384.0,18.5,25.0,462.5,368.0,83.75,31780.0
sc_hospital_a,sc,pe,32,15.5,24.0,372.0,17.6,26.4,464.64,368.5,84.3334,34127.0
st_micheal_2,cf,icu,18,13.75,20.8334,286.45925,14.75,20.8334,307.29265,175.0,72.3334,12502.0


In [93]:
df_show = df.copy()  # copy so the display-only column below is not also added to df
df_show['floorplate_aspect_ratio'] = round(df['floorplate_x'] / df['floorplate_y'],1)
df_show = df_show[['client', 'floor_type', 'number_of_beds', 'floorplate_aspect_ratio', 'floorplate_sqft']]
df_show

hospital,client,floor_type,number_of_beds,floorplate_aspect_ratio,floorplate_sqft
mc_auburn,mc,ms,24,2.0,15405.5
mc_milgard,mc,ms,24,2.1,22991.0
swedish_nw_tower,swedish,icu,48,4.5,43852.0
swedish_nw_tower,swedish,ms,48,4.5,43852.0
swedish_issaquah,swedish,ms,36,4.1,26988.86
swedish_issaquah,swedish,icu,36,3.8,24540.54
sc_river,sc,pe,20,2.2,18514.0
sc_forrest_c,sc,icu,28,4.4,31780.0
sc_hospital_a,sc,pe,32,4.4,34127.0
st_micheal_2,cf,icu,18,2.4,12502.0


next, some basic math to add a few columns: integer codes for client and floor type, aspect ratio to use as a feature, and sqft per bed to use as a target

In [94]:

df = df.copy()
floor_types, floor_codes = [(df['floor_type'] == 'ms'), (df['floor_type'] == 'ldrp'), (df['floor_type'] == 'nicu'), (df['floor_type'] == 'icu'), (df['floor_type'] == 'pe'), (df['floor_type'] == 'uk'), (df['floor_type'] == 'ccu')], [1,2,3,4,5,6,7]
df['floor_code'] = np.select(floor_types, floor_codes)
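# note: 'sc' is listed twice below; np.select takes the first match, so code 4 is never assigned and 'cf' becomes 5
clients, client_codes = [(df['client'] == 'mc'), (df['client'] == 'swedish'), (df['client'] == 'sc'), (df['client'] == 'sc'), (df['client'] == 'cf')], [1,2,3,4,5]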
df['client_code'] = np.select(clients, client_codes)
df['aspect_ratio'] = round(df['floorplate_x'] / df['floorplate_y'],1)
df['sqft_per_bed'] = round(df['floorplate_sqft'] / df['number_of_beds'])
df = df[['client', 'client_code', 'floor_type', 'floor_code', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'patient_room_sqft', 'ada_patient_room_width', 'ada_patient_room_length', 'ada_patient_room_sqft', 'floorplate_x', 'floorplate_y', 'aspect_ratio', 'floorplate_sqft', 'sqft_per_bed']]
df

hospital,client,client_code,floor_type,floor_code,number_of_beds,patient_room_width,patient_room_length,patient_room_sqft,ada_patient_room_width,ada_patient_room_length,ada_patient_room_sqft,floorplate_x,floorplate_y,aspect_ratio,floorplate_sqft,sqft_per_bed
mc_auburn,mc,1,ms,1,24,14.0,21.5,301.0,16.0,21.5,344.0,177.5,90.6667,2.0,15405.5,642.0
mc_milgard,mc,1,ms,1,24,15.25,22.5,343.125,15.25,22.5,343.125,217.5,102.0,2.1,22991.0,958.0
swedish_nw_tower,swedish,2,icu,4,48,16.0,24.0,384.0,19.0,24.0,456.0,451.875,100.5,4.5,43852.0,914.0
swedish_nw_tower,swedish,2,ms,1,48,16.0,24.0,384.0,19.0,24.0,456.0,451.875,100.5,4.5,43852.0,914.0
swedish_issaquah,swedish,2,ms,1,36,15.0,21.0,315.0,19.75,21.0,414.75,332.875,80.6667,4.1,26988.86,750.0
swedish_issaquah,swedish,2,icu,4,36,15.0,21.0,315.0,15.0,21.0,315.0,304.1458,80.0,3.8,24540.54,682.0
sc_river,sc,3,pe,5,20,15.5,25.5,395.25,16.75,25.5,427.125,187.0,86.3334,2.2,18514.0,926.0
sc_forrest_c,sc,3,icu,4,28,16.0,24.0,384.0,18.5,25.0,462.5,368.0,83.75,4.4,31780.0,1135.0
sc_hospital_a,sc,3,pe,5,32,15.5,24.0,372.0,17.6,26.4,464.64,368.5,84.3334,4.4,34127.0,1066.0
st_micheal_2,cf,5,icu,4,18,13.75,20.8334,286.45925,14.75,20.8334,307.29265,175.0,72.3334,2.4,12502.0,695.0


now we'll pull out just the features and the target we hope to use in our solution. I'm also taking the natural logs of the number of beds and the sqft per bed to tighten up the fit
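
written out, the model being fit is linear in the logged quantities. the coefficient symbols $x_1 \dots x_4$ are my notation, and there is no intercept term, which matches the four-column matrix A built a couple of cells further down:

```latex
\ln(\text{sqft\_per\_bed}) \approx
    x_1 \,\text{client\_code}
  + x_2 \,\text{floor\_code}
  + x_3 \,\ln(\text{number\_of\_beds})
  + x_4 \,\text{aspect\_ratio}
```

exponentiating both sides shows what the logs buy us: sqft per bed scales as a power of the bed count, $\text{number\_of\_beds}^{x_3}$, multiplied by exponential factors for the other three features.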

In [95]:
df_reduced = df.iloc[:,[1,3,4,13,-1]].copy()
df_reduced = df_reduced.reset_index(drop=True)
df_reduced['sqft_per_bed'] = np.log(df_reduced['sqft_per_bed'])
df_reduced['number_of_beds'] = np.log(df_reduced['number_of_beds'])
df_reduced

,client_code,floor_code,number_of_beds,aspect_ratio,sqft_per_bed
0,1,1,3.178054,2.0,6.464588
1,1,1,3.178054,2.1,6.864848
2,2,4,3.871201,4.5,6.817831
3,2,1,3.871201,4.5,6.817831
4,2,1,3.583519,4.1,6.620073
5,2,4,3.583519,3.8,6.52503
6,3,5,2.995732,2.2,6.830874
7,3,4,3.332205,4.4,7.034388
8,3,5,3.465736,4.4,6.971669
9,5,4,2.890372,2.4,6.543912


now we'll separate out the features into a matrix, A

In [96]:
A = df_reduced.iloc[:,0:4].to_numpy()
print(A)

[[1.         1.         3.17805383 2.        ]
 [1.         1.         3.17805383 2.1       ]
 [2.         4.         3.87120101 4.5       ]
 [2.         1.         3.87120101 4.5       ]
 [2.         1.         3.58351894 4.1       ]
 [2.         4.         3.58351894 3.8       ]
 [3.         5.         2.99573227 2.2       ]
 [3.         4.         3.33220451 4.4       ]
 [3.         5.         3.4657359  4.4       ]
 [5.         4.         2.89037176 2.4       ]
 [5.         4.         3.17805383 5.        ]
 [5.         1.         3.4657359  5.        ]]


here's vector b

In [97]:
b = df_reduced.iloc[:,4].to_numpy()
b

array([6.4645883 , 6.86484778, 6.81783057, 6.81783057, 6.62007321,
       6.52502966, 6.83087423, 7.03438793, 6.9716686 , 6.54391185,
       7.09506438, 6.80793494])

and here's x, the least squares solution, as well as x plugged back in to each row of A.

also, exp is taken of sqft_per_bed, number_of_beds, and the solution to undo the log we took previously
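
to make the least squares step concrete, here's a small sketch on made-up numbers (a toy system, not the hospital data). it checks that `la.lstsq` agrees with solving the normal equations directly, and that the leftover residual is orthogonal to the columns of the matrix:

```python
import numpy as np
import numpy.linalg as la

# toy over-determined system (made-up numbers): 5 equations, 2 unknowns
A_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0],
                  [1.0, 5.0]])
b_toy = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# library solver, the same call used on the real data
x_lstsq = la.lstsq(A_toy, b_toy, rcond=None)[0]

# normal equations: solve (A^T A) x = A^T b
x_normal = la.solve(A_toy.T @ A_toy, A_toy.T @ b_toy)

# at the least squares solution the residual is orthogonal to every column of A
residual = b_toy - A_toy @ x_lstsq

print(x_lstsq, x_normal)
print(A_toy.T @ residual)   # approximately zero
```

`la.lstsq` is generally the safer choice on real data because it does not form A.T @ A explicitly, which can be badly conditioned when features are nearly collinear.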

In [98]:
x = la.lstsq(A,b, rcond=None)[0]
# x = list(la.inv(A.T @ A) @ A.T @ b)   # the same result from the normal equations: inverse of (A transpose times A), times A transpose, times b. this is the linear least squares solution for an over-determined system
solution = [x @ A[i] for i in range(len(A))]
print(f'here\'s x: {x}')
df_solved = df_reduced.copy()
df_solved['sqft_per_bed'] = np.exp(df_solved['sqft_per_bed'])
df_solved['number_of_beds'] = np.exp(df_solved['number_of_beds'])
df_solved['solution'] = np.exp(solution)
df_solved['delta'] = round((df_solved['solution'] - df_solved['sqft_per_bed']) / df_solved['sqft_per_bed'], 2)
df_solved

here's x: [ 0.31018592  0.06860256  2.10405549 -0.3859983 ]


,client_code,floor_code,number_of_beds,aspect_ratio,sqft_per_bed,solution,delta
0,1,1,24.0,2.0,642.0,541.094255,-0.16
1,1,1,24.0,2.1,958.0,520.606072,-0.46
2,2,4,48.0,4.5,914.0,1484.764497,0.62
3,2,1,48.0,4.5,914.0,1208.582889,0.32
4,2,1,36.0,4.1,750.0,769.933434,0.03
5,2,4,36.0,3.8,682.0,1062.00214,0.56
6,3,5,20.0,2.2,926.0,835.108128,-0.1
7,3,4,28.0,4.4,1135.0,677.032077,-0.4
8,3,5,32.0,4.4,1066.0,960.331625,-0.1
9,5,4,18.0,2.4,695.0,1075.398385,0.55
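
to put one number on how far off the fit is, here's a sketch that summarizes the delta column. the values are copied from the ten rows displayed above; the summary statistics (mean and worst absolute relative error) are my choice, not something the notebook computes:

```python
import numpy as np

# values copied from the sqft_per_bed and solution columns of the table above
actual = np.array([642.0, 958.0, 914.0, 914.0, 750.0,
                   682.0, 926.0, 1135.0, 1066.0, 695.0])
fitted = np.array([541.094255, 520.606072, 1484.764497, 1208.582889, 769.933434,
                   1062.00214, 835.108128, 677.032077, 960.331625, 1075.398385])

rel_error = (fitted - actual) / actual        # same definition as the delta column
mean_abs_error = np.mean(np.abs(rel_error))   # average size of the miss
worst = np.max(np.abs(rel_error))             # largest single miss

print(f'mean absolute relative error: {mean_abs_error:.0%}')
print(f'worst relative error: {worst:.0%}')
```

keep in mind these are in-sample errors, measured on the same rows used to fit x, so they are an optimistic picture of how the model does on a new project.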


the next few steps repeat the original steps for a project that wasn't in the original set (mc_good_sam), as a held-out test

In [99]:
df2 = pd.DataFrame([['mc_good_sam', 'mc', 'ms', 40, 15.6458, 27.4375, 17.83334, 27.4375, 365, 109.4375, 35586]
                    ,['mc_good_sam', 'mc', 'icu', 40, 15.6458, 27.4375, 17.83334, 27.4375, 365, 109.4375, 35586]]
                  ,columns=['hospital', 'client', 'floor_type', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'ada_patient_room_width', 'ada_patient_room_length', 'floorplate_x', 'floorplate_y', 'floorplate_sqft'])
df2 = df2.set_index('hospital')
df2['patient_room_sqft'] = df2['patient_room_length'] * df2['patient_room_width']
df2['ada_patient_room_sqft'] = df2['ada_patient_room_length'] * df2['ada_patient_room_width']
df2 = df2[['client', 'floor_type', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'patient_room_sqft', 'ada_patient_room_width', 'ada_patient_room_length', 'ada_patient_room_sqft', 'floorplate_x', 'floorplate_y', 'floorplate_sqft']]
floor_types, floor_codes = [(df2['floor_type'] == 'ms'), (df2['floor_type'] == 'ldrp'), (df2['floor_type'] == 'nicu'), (df2['floor_type'] == 'icu'), (df2['floor_type'] == 'pe'), (df2['floor_type'] == 'uk'), (df2['floor_type'] == 'ccu')], [1,2,3,4,5,6,7]
df2['floor_code'] = np.select(floor_types, floor_codes)
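# note: 'sc' is listed twice below; np.select takes the first match, so code 4 is never assigned and 'cf' becomes 5
clients, client_codes = [(df2['client'] == 'mc'), (df2['client'] == 'swedish'), (df2['client'] == 'sc'), (df2['client'] == 'sc'), (df2['client'] == 'cf')], [1,2,3,4,5]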
df2['client_code'] = np.select(clients, client_codes)
df2['aspect_ratio'] = round(df2['floorplate_x'] / df2['floorplate_y'],1)
df2['sqft_per_bed'] = round(df2['floorplate_sqft'] / df2['number_of_beds'])
df2 = df2[['client', 'client_code', 'floor_type', 'floor_code', 'number_of_beds', 'patient_room_width', 'patient_room_length', 'patient_room_sqft', 'ada_patient_room_width', 'ada_patient_room_length', 'ada_patient_room_sqft', 'floorplate_x', 'floorplate_y', 'aspect_ratio', 'floorplate_sqft', 'sqft_per_bed']]
df2

hospital,client,client_code,floor_type,floor_code,number_of_beds,patient_room_width,patient_room_length,patient_room_sqft,ada_patient_room_width,ada_patient_room_length,ada_patient_room_sqft,floorplate_x,floorplate_y,aspect_ratio,floorplate_sqft,sqft_per_bed
mc_good_sam,mc,1,ms,1,40,15.6458,27.4375,429.281637,17.83334,27.4375,489.302266,365,109.4375,3.3,35586,890.0
mc_good_sam,mc,1,icu,4,40,15.6458,27.4375,429.281637,17.83334,27.4375,489.302266,365,109.4375,3.3,35586,890.0


In [100]:
df2_reduced = df2.iloc[:,[1,3,4,13,-1]].copy()
df2_reduced = df2_reduced.reset_index(drop=True)
df2_reduced['sqft_per_bed'] = np.log(df2_reduced['sqft_per_bed'])
df2_reduced['number_of_beds'] = np.log(df2_reduced['number_of_beds'])
df2_reduced

,client_code,floor_code,number_of_beds,aspect_ratio,sqft_per_bed
0,1,1,3.688879,3.3,6.791221
1,1,4,3.688879,3.3,6.791221


In [101]:
A2 = df2_reduced.iloc[:,0:4].to_numpy()
print(A2)

[[1.         1.         3.68887945 3.3       ]
 [1.         4.         3.68887945 3.3       ]]


In [102]:
print([x @ A2[i] for i in range(len(A2))])

[6.8666011547627575, 7.072408819910266]


In [103]:
solution2 = [x @ A2[i] for i in range(len(A2))]
df2_solved = df2_reduced.copy()
# decode the integer codes back to labels; code 3 is 'nicu', matching the encoding used earlier
floor_codes, floor_types = [(df2_solved['floor_code'] == 1),(df2_solved['floor_code'] == 2),(df2_solved['floor_code'] == 3),(df2_solved['floor_code'] == 4),(df2_solved['floor_code'] == 5),(df2_solved['floor_code'] == 6),(df2_solved['floor_code'] == 7)], ['ms', 'ldrp', 'nicu', 'icu', 'pe', 'uk', 'ccu']
df2_solved['floor_type'] = np.select(floor_codes, floor_types)
clients, client_codes = [(df2_solved['client_code'] == 1),(df2_solved['client_code'] == 2),(df2_solved['client_code'] == 3),(df2_solved['client_code'] == 4),(df2_solved['client_code'] == 5)], ['mc', 'swedish', 'sc', 'sc', 'cf']
df2_solved['client'] = np.select(clients, client_codes)
df2_solved['number_of_beds'] = np.exp(df2_solved['number_of_beds'])
df2_solved['sqft_per_bed'] = np.exp(df2_solved['sqft_per_bed'])
df2_solved['solution'] = np.exp(solution2)
df2_solved['delta'] = (df2_solved['solution'] - df2_solved['sqft_per_bed']) / df2_solved['sqft_per_bed']
df2_solved = df2_solved[['client', 'floor_type', 'number_of_beds', 'solution', 'delta']]
df2_solved

,client,floor_type,number_of_beds,solution,delta
0,mc,ms,40.0,959.681208,0.078293
1,mc,icu,40.0,1178.984578,0.324702
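
the held-out steps above boil down to one small function. this is a sketch, assuming a hypothetical helper named `predict_sqft_per_bed` (my name, not the notebook's) and hard-coding the coefficients printed by the least squares cell:

```python
import numpy as np

# coefficients printed by the least squares cell, in the order
# [client_code, floor_code, log(number_of_beds), aspect_ratio]
x_fit = np.array([0.31018592, 0.06860256, 2.10405549, -0.3859983])

# the encodings np.select effectively produces in this notebook
CLIENT_CODES = {'mc': 1, 'swedish': 2, 'sc': 3, 'cf': 5}
FLOOR_CODES = {'ms': 1, 'ldrp': 2, 'nicu': 3, 'icu': 4, 'pe': 5, 'uk': 6, 'ccu': 7}


def predict_sqft_per_bed(client, floor_type, number_of_beds, aspect_ratio):
    """Hypothetical helper: apply the fitted coefficients to one new floor."""
    row = np.array([CLIENT_CODES[client], FLOOR_CODES[floor_type],
                    np.log(number_of_beds), aspect_ratio])
    return float(np.exp(row @ x_fit))   # exp undoes the log taken on the target


for floor_type in ('ms', 'icu'):
    estimate = predict_sqft_per_bed('mc', floor_type, 40, 3.3)
    # per-bed estimate, then the implied floorplate area for 40 beds
    print(floor_type, round(estimate), round(estimate * 40))
```

the two calls differ only in floor_code (1 versus 4), so the entire gap between the ms and icu estimates comes from the single floor_code coefficient.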


as you can see, these results are not acceptable: the fit misses the original projects by anywhere from -46% to +62%, and the held-out project by 8% (ms) and 32% (icu). what does this mean? what have we learned?

well, it does not mean this problem cannot be solved.

the main thing we have learned is that this problem can be expressed as a math problem. this is very useful information

it may be the case that the linear least squares method that I used is not the most suitable approach, and that perhaps a non-linear function would perform better. 
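
one thing worth trying before reaching for a non-linear function: the integer codes treat client and floor type as if they were ordered quantities (so 'cf' counts as five times 'mc'). the sketch below swaps in one-hot encoding plus an intercept, which is my assumption about a possible next step and not something this notebook did. it uses the ten rows displayed in the tables above:

```python
import numpy as np
import pandas as pd

# the ten rows shown in the output tables above
rows = pd.DataFrame(
    [('mc', 'ms', 24, 2.0, 642.0), ('mc', 'ms', 24, 2.1, 958.0),
     ('swedish', 'icu', 48, 4.5, 914.0), ('swedish', 'ms', 48, 4.5, 914.0),
     ('swedish', 'ms', 36, 4.1, 750.0), ('swedish', 'icu', 36, 3.8, 682.0),
     ('sc', 'pe', 20, 2.2, 926.0), ('sc', 'icu', 28, 4.4, 1135.0),
     ('sc', 'pe', 32, 4.4, 1066.0), ('cf', 'icu', 18, 2.4, 695.0)],
    columns=['client', 'floor_type', 'number_of_beds', 'aspect_ratio', 'sqft_per_bed'])

# one 0/1 column per category; drop_first avoids duplicating the intercept
dummies = pd.get_dummies(rows[['client', 'floor_type']], drop_first=True, dtype=float)

features = pd.concat(
    [pd.Series(1.0, index=rows.index, name='intercept'),
     np.log(rows['number_of_beds']).rename('log_beds'),
     rows['aspect_ratio'],
     dummies],
    axis=1)

A_onehot = features.to_numpy()
b_onehot = np.log(rows['sqft_per_bed']).to_numpy()

# same solver as before, just a different design matrix
x_onehot = np.linalg.lstsq(A_onehot, b_onehot, rcond=None)[0]
fitted = np.exp(A_onehot @ x_onehot)
rel_error = (fitted - rows['sqft_per_bed']) / rows['sqft_per_bed']

print(features.columns.tolist())
print(rel_error.round(2).tolist())
```

with ten rows and eight coefficients this will fit the original projects far better than it deserves to, so the in-sample errors say very little. it would need the same held-out check as above, and more projects, before being trusted.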

also, more data certainly couldn't hurt.