**Demand for ICU beds to treat non-COVID patients with chronic conditions**



**Task 4: Unintended consequences from health resource re-allocation toward COVID-19**

In [None]:
from IPython.display import Image
Image(filename='../input/clarity-image/deep_dive.png')



The purpose of this notebook is to answer **Task 4**, which asks: **How has COVID-19 affected non-COVID-related healthcare availability (e.g. for cancer, cardiovascular disease, dialysis, etc. patients)?**
    
**My goal is to find out how a shortage of hospital resources, especially ICU beds, would affect non-COVID patients with chronic health conditions.**

This highlight from SHADAC caught my attention, so I decided to dive deeper starting from there.

In [None]:
from IPython.display import Image

Image(filename='../input/shadacpic/Screen Shot 2020-06-26 at 8.30.14 PM.png')


**And what is SHADAC, you ask?**

It is a multidisciplinary health policy research center with a focus on state health policy. SHADAC is supported by 
the Robert Wood Johnson Foundation and is affiliated with the Health Policy and Management Division of the School of 
Public Health at the University of Minnesota.

**To answer this, I needed data on one or more of the following:**
* What is the percentage of patients with one or more chronic health conditions per state?
* What is the demand and supply of ICU beds per state?
* Is it possible to make a reasonable connection between the first two?

---- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 
    
After checking the datasets on the Namara API and on Kaggle, I went to some of the original sources. It turned out that SHADAC (https://www.shadac.org/) publishes a state-level health comparison with data on the percentage of adults with chronic disease for the years 2005 to 2018.
    
**About the data:** The dataset **Prevalence of Diabetes, CVD and Asthma in Adults** represents the percent of adults who report having one or more of the following chronic conditions: diabetes, cardiovascular disease, heart attack, stroke and asthma, for the civilian non-institutionalized population 18 years and over. The percentages are based on US Census 2010 data, and according to the source the samples were chosen to reflect the population. The data was also recently updated. 
FYI: I do not think the data is available on Namara.

---- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 

**Data source:** http://statehealthcompare.shadac.org/Data
(A nice feature is that there are plenty of options for choosing data on different health conditions and for filtering by different attributes.)

---- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----

I decided to use the percentage of patients with chronic conditions in every state from 2005 to 2018 to forecast future values, specifically for the year 2020.

My hypothesis: if I can forecast the 2020 values and estimate the share of those patients who might need access to ICU beds in each state, then it makes sense to compare that against the number of available ICU beds in that state.
    

Luckily, this comparison was possible thanks to the Harvard Global Health Initiative (HGHI) data on hospital capacity by state.

So I used the following dataset from HGHI: hospital-capacity-by-state-40 population-
        

Enough talking... I hear you!

In [None]:
# Let's import all the necessary packages
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.api import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.graph_objects as go
from plotly.offline import iplot
import numpy as np
%matplotlib inline

In [None]:

df=pd.read_csv("../input/chronicdiseasepercentage/chronic_disease_2010 to 2018_transpose.csv", encoding='utf8',engine='python')
df.head()

Let's set 'Year' as index to plot a few states.



In [None]:
df= df.set_index('Year')

In [None]:
#Plot a few states to get a feel for the data
fig, axes = plt.subplots()
fig.suptitle('Chronic Disease Distribution', fontsize=14)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Percent of adults with chronic conditions', fontsize=14)
df['California'].plot(ax=axes, label='California', legend=True)
df['New York'].plot(ax=axes, label='New York',legend=True)
df['Alabama'].plot(ax=axes, label='Alabama',legend=True)
df['Minnesota'].plot(ax=axes, label='Minnesota',legend=True)

**Observation of the above plot:**

A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. A trend can also "change direction", for example when it goes from increasing to decreasing. 
At a high level, the patterns appear to show some trend and seasonal changes. 

But I want to bring to light what the **CDC (Centers for Disease Control and Prevention)** says about the trend:
    
It is always difficult to discern long-term trends by comparing one year to the next. Such comparisons will be 
especially difficult to make for 2010 and 2011, given the change in the methods that were used in surveying and later 
updating. Changes in the 2011 data are likely to show indications of somewhat higher occurrences of risk behaviors common to younger adults and to certain racial or ethnic minority groups.

**Based on the CDC's caution, I do not treat the year-to-year movement as a reliable trend.**

What about the validity of the data? The sample is representative of the population of each state.



In [None]:
# Let's decompose data points for California to check whether trend and seasonality exist

result_add = seasonal_decompose(df['California'].values, model='additive', period=1)
result_add.plot();


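**Observation:** The decomposition shows no seasonal component. That is expected for yearly data with `period=1`: there is no within-year cycle to extract, and the trend panel simply mirrors the observed series, so this plot alone cannot confirm or rule out a trend.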
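
To illustrate why a period of 1 leaves nothing for the seasonal component, here is a simplified sketch of additive decomposition in plain Python. It is not the statsmodels implementation: the function name `additive_decompose` and the sample values are hypothetical, and it only handles odd periods.

```python
def additive_decompose(values, period):
    """Simplified additive decomposition for an odd `period`.

    Trend is a centred moving average of width `period`; the seasonal
    component is the mean detrended value at each position in the cycle.
    Edge points without a full window are skipped.
    """
    half = period // 2
    trend = {}
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period
    detrended = {i: values[i] - t for i, t in trend.items()}
    seasonal = []
    for phase in range(period):
        vals = [d for i, d in detrended.items() if i % period == phase]
        seasonal.append(sum(vals) / len(vals))
    return trend, seasonal


# Hypothetical yearly percentages, not taken from the SHADAC data
yearly = [14.0, 15.0, 14.5, 16.0, 15.5]
trend, seasonal = additive_decompose(yearly, period=1)
print(list(trend.values()) == yearly)  # does the trend reproduce the input?
print(seasonal)                        # the single seasonal value
```

With a window of one observation, the moving average returns each point unchanged, so nothing is left over to average into a seasonal pattern.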

In [None]:
# Let's forecast future chronic disease values for New York, as per my hypothesis.

data1 = df['New York'].values.tolist()
# 'YS' is the year-start frequency ('AS' is a deprecated alias in recent pandas); see pd.date_range for more info
index1 = pd.date_range(start='2005', end='2018', freq='YS')
chronicdisease1 = pd.Series(data1, index1)

Let's fit **SimpleExpSmoothing** to model the chronic disease percentages in New York from 2005 to 2018.

Here we run three variants of simple exponential smoothing:
* In fit1, we do not use the automatic optimization but instead explicitly provide the model with 𝛼=0.2
* In fit2, as above, we choose 𝛼=0.6
* In fit3, we allow statsmodels to automatically find an optimized 𝛼 value for us. This is the recommended approach.
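
For reference, the three fits differ only in the smoothing parameter 𝛼 of the standard level recursion for simple exponential smoothing. The symbols $\ell_t$ (the smoothed level) and $y_t$ (the observed percentage in year $t$) are my own labels, not names used elsewhere in this notebook:

$$\ell_t = \alpha\, y_t + (1 - \alpha)\,\ell_{t-1}, \qquad 0 \le \alpha \le 1$$

A small 𝛼 such as 0.2 keeps most of the previous level, which gives a smoother line. A larger 𝛼 such as 0.6 follows recent observations more closely.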
 

In [None]:

plt.xlabel('Year', fontsize=14)
plt.ylabel('Percent of adults with chronic conditions', fontsize=14)
fit1 = SimpleExpSmoothing(chronicdisease1).fit(smoothing_level=0.2,optimized=False)#setting the smoothing_level manually
fcast1 = fit1.forecast(2).rename(r'$\alpha=0.2$')
fit2 = SimpleExpSmoothing(chronicdisease1).fit(smoothing_level=0.6,optimized=False)#setting the smoothing_level manually
fcast2 = fit2.forecast(2).rename(r'$\alpha=0.6$')
fit3 = SimpleExpSmoothing(chronicdisease1).fit()
fcast3 = fit3.forecast(2).rename(r'$\alpha=%s$'%fit3.model.params['smoothing_level'])
ax = chronicdisease1.plot(marker='o', color='black', figsize=(12,8))
fcast1.plot(marker='o', ax=ax, color='blue', legend=True)
fit1.fittedvalues.plot(marker='o', ax=ax, color='blue')
fcast2.plot(marker='o', ax=ax, color='red', legend=True)

fit2.fittedvalues.plot(marker='o', ax=ax, color='red')
fcast3.plot(marker='o', ax=ax, color='green', legend=True)
fit3.fittedvalues.plot(marker='o', ax=ax, color='green')
plt.title('Chronic disease distribution in New York from 2005 to 2018', fontsize=14);



The legend in the above plot is inaccurate, so use the color key below instead. 

**Legend:**
* Red line represents smoothing_level=0.6
* Blue line represents smoothing_level=0.2
* Black line represents the actual data for New York
* Green line represents smoothing_level that's automatically optimized by statsmodels


**How to interpret the result?**

Simple exponential smoothing has a "flat" forecast function. That is, all forecasts take the same value, equal to the last level component. Remember that these forecasts are only suitable if the time series has no trend or seasonal component.

Interpreting the point forecasts without accounting for their large uncertainty can be very misleading, so treat this as a rough estimate.
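
The flat forecast can be seen in a minimal plain-Python sketch of simple exponential smoothing. This is an illustration, not the statsmodels code: the function name `simple_exp_smoothing` and the sample values are hypothetical, and the level is initialised to the first observation, which may differ from what statsmodels estimates.

```python
def simple_exp_smoothing(values, alpha, horizon):
    """Return (smoothed levels, flat forecasts) for simple exponential smoothing.

    Assumption: the level starts at the first observation.
    """
    level = values[0]
    levels = [level]
    for y in values[1:]:
        # New level is a weighted average of the observation and the old level
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    # Every future step is forecast as the last level
    return levels, [level] * horizon


# Hypothetical yearly percentages, not taken from the SHADAC data
series = [14.0, 15.0, 14.5, 16.0, 15.5]

for alpha in (0.2, 0.6):
    levels, forecasts = simple_exp_smoothing(series, alpha, horizon=2)
    print(alpha, [round(f, 3) for f in forecasts])
```

Whatever 𝛼 is chosen, the two forecast steps are identical, which is why the forecast for 2020 equals the forecast for 2019.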

In [None]:
# Collect the values of each column (one list per location) to pass to the
# function forecast_values in the cell below.

def to_get_column_values_list(df):
    """Return a list containing one list of values per column of df."""
    return [df[element].values.tolist() for element in df.columns]

In [None]:
# Save the per-location values so they can be passed to the function forecast_values

column_val=to_get_column_values_list(df)

In [None]:
# Reusable code to forecast the percentage of chronic patients in the year 2020 for all 50 states

def forecast_values(df, arr, c):
    """Add a "2020" column to df with one forecast per series in arr.

    arr must be in the same row order as df. The argument c is unused.
    """
    # Start and end year remain the same for all the locations
    index = pd.date_range(start='2005', end='2018', freq='YS')
    df_result = []
    for data in arr:
        chronicdisease = pd.Series(data, index)
        fit_result = SimpleExpSmoothing(chronicdisease).fit()  # fit simple exponential smoothing
        forecast_result = fit_result.forecast(2)  # two steps ahead: 2019 and 2020
        df_result.append(forecast_result.values[1])  # keep the forecast for 2020

    df["2020"] = df_result  # save the result as a column in the dataframe

In [None]:
# The transposed dataset makes it easy to add the forecasted 2020 values as a column. I uploaded it as a file, so I read it from there.

df_transpose=pd.read_csv("../input/chronicdiseasepercentage1/df_transpose.csv", encoding='utf8',engine='python')


In [None]:
# Calling the above function

forecast_values(df_transpose, column_val, "Location")

In [None]:
#Let's now check the forecasted values for the year 2020, Voila!

df_transpose.head()

In [None]:
# Read the Harvard data on hospital capacity, which assumes that 40% of the population per state has contracted COVID

df_harvard_health=pd.read_csv("../input/harvardhealth/hospital-capacity-by-state-40-population-contracted.csv", encoding='utf8',engine='python')


In [None]:
#Checking the top 5 records

df_harvard_health.head()


**Harvard Global Health Initiative (HGHI)**

Source: Harvard Global Health Initiative (HGHI), updated in March 2020.
    
The data can be downloaded from Namara using the search query: Hospital Capacity by State: 40% population contracted
    
It was hard for me to find out what each field meant. You can save time by checking this link: https://globalepidemics.org/our-data-guide/


**A little about this data:** HGHI looks at hospital capacity in communities across the United States under three major perspectives: 

* 20% of population in a state has contracted the disease
* 40% of population in a state has contracted the disease
* 60% of population in a state has contracted the disease

For the purpose of this notebook, let us consider the data where 40% of the population in a state has contracted the disease. The data is based on a 50% reduction in occupancy. The Harvard data is also organized by Hospital Referral Region (HRR), a market within which people generally go to the same hospitals.

---- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 

**Why these three perspectives?**

Marc Lipsitch, head of the Harvard T.H. Chan School of Public Health’s Center for Communicable Disease Dynamics, has been running projections to estimate how many adults across the world will be infected before a vaccine hits the market (one won’t be available for at least a year) or before **herd immunity** kicks in, that is, when enough people have developed immunity to the virus from having caught it that it can no longer be transmitted easily. He concluded that between 20% and 60% of adults worldwide will ultimately get infected. 


**What I ended up using and how?** 

I used the following fields from the HGHI dataset above:
* state
* adult_population
* available_icu_beds
------------------------------------------------------------------------------------------------------------------------ 

**Reasoning?**

* Remember the percentage of patients with chronic disease per state for the year 2020 that we forecasted using the reusable code above? It is yearly data.
* I needed a meaningful base for comparing that percentage with the available beds. 
* So I went with the Harvard data on ICU bed availability for a 12-month period.

------------------------------------------------------------------------------------------------------------------------ 

**How?**

I used the following features:

* adult_population: the number of people over the age of 18 living within the HRR (Hospital Referral Region, a market within which people generally go to the same hospitals)
* state
* available_icu_beds: the number of ICU beds that are unoccupied on average

------------------------------------------------------------------------------------------------------------------------ 

**Data Transformation**

* number of chronic patients per state = forecasted percentage of chronic patients * adult_population 
* The CDC (Centers for Disease Control and Prevention) publishes that 4 in 10 Americans have one or more chronic diseases. Starting from that estimate, let's do some simple math and assume that 1 out of 4 patients with chronic disease needs an ICU bed. This ratio is a simplifying assumption, not a published figure. 

     number of chronic patients that need ICU beds = (1/4) * number of chronic patients per state

* gap in ICU bed capacity per state = number of chronic patients that need ICU beds per state - available_icu_beds per state

I applied the data transformation described above and uploaded the result (gap_icu_beds.csv) for further use.
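
As a rough illustration of the transformation, the three formulas can be sketched for a single state as follows. The function name `icu_gap` and the input values are hypothetical, and the sketch assumes the forecasted percentage is on a 0 to 100 scale; the scale used in the uploaded file may differ.

```python
def icu_gap(pct_chronic, adult_population, available_icu_beds, icu_share=0.25):
    """Return (patients needing ICU beds, gap in ICU beds) for one state.

    Assumptions: `pct_chronic` is on a 0-100 scale, and `icu_share` is the
    notebook's simplifying 1-in-4 ratio, not a clinical estimate.
    """
    chronic_patients = pct_chronic / 100 * adult_population
    need_icu = icu_share * chronic_patients
    return need_icu, need_icu - available_icu_beds


# Hypothetical inputs for illustration only (not rows from gap_icu_beds.csv)
need, gap = icu_gap(pct_chronic=20.0, adult_population=1_000_000,
                    available_icu_beds=500)
print(need, gap)
```

A positive gap means more chronic patients are assumed to need ICU beds than there are beds available on average.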

In [None]:
#The file after transformation with gap in ICU beds calculated looks like the following:

df_gap_icu=pd.read_csv("../input/gap-in-icu-beds/gap_icu_beds.csv", encoding='utf8',engine='python')

df_gap_icu.head()


In [None]:
# Plotting: Available ICU Beds Vs. Needed ICU Beds for non-COVID patients with one or more chronic health conditions

plt.rcParams["figure.figsize"] = (20,15)
labels = df_gap_icu["location"].values.tolist()
ICU_remain = df_gap_icu["available_icu_beds"].values.tolist()
Patients_need_ICU = df_gap_icu["num_need_ICU_beds"].values.tolist()

x = np.arange(len(labels))  # the label locations
width = 0.65  # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, ICU_remain, width=0.4, label='Available ICU Beds', color='g',align='center')
rects2 = ax.bar(x + width/2, Patients_need_ICU, width=0.4, label='Needed ICU Beds', color='r', align='center')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Number of ICU Beds', fontsize=14)
ax.set_title('Gap in hospital resources: ICU beds',fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=14, rotation = 45, ha="right")
ax.legend(prop={'size':14})
fig.tight_layout()



**Let us find the states that need 6,000 or more ICU beds over twelve months.**

In [None]:


df_gap_icu.loc[df_gap_icu["num_need_ICU_beds"]>=6000, "location"]
                          
                          

**US states that need 6,000 or more ICU beds over the next twelve months**

In [None]:

states=['California', 'Florida', 'New York', 'Pennsylvania', 'Texas']
# Values looked up per state, e.g. for the needed beds:
# df_gap_icu.loc[df_gap_icu['location']=='Texas', ["num_need_ICU_beds"]]
ICU_available = [3381, 2044, 1670, 1427, 2573]
ICU_need = [15841, 10355, 9127, 6411, 11370]

fig = go.Figure(data=[go.Bar(name='Have ICU Beds', x=states, y=ICU_available),
    go.Bar(name='Need ICU Beds', x=states, y=ICU_need)
])

fig.update_layout(barmode='group', title="Supply Vs. Demand in ICU Beds for twelve months", yaxis_title="Number of ICU Beds")
fig.show()

**More ICU Beds needed in the next 12 months**

In [None]:

data = [dict(type = 'choropleth',
            colorscale = 'Reds',
            locations=df_gap_icu['state'], # spatial coordinates
            z = df_gap_icu['gap_in_ICU_beds'], # data to be color-coded
            locationmode = 'USA-states', # set of locations that match entries in locations
            colorbar = {'title':"How many more ICU Beds are needed?"},
           )]

layout = dict(title = 'Gap in ICU Beds for a 12 month period',
              geo = dict(scope='usa', showlakes = True)) # limit map scope to the USA

fig = dict( data=data, layout=layout )

iplot( fig, filename='ICU-Beds-choropleth-map')  # iplot renders inline and returns None

**Observation of the above plot:**

Under these assumptions, **California, Texas, Florida,** and **New York** show the greatest need for ICU beds to treat non-COVID chronic patients in the coming months.

**Conclusion:**

In this notebook, we took a peek at the availability of hospital resources, especially ICU beds, for treating non-COVID patients with chronic conditions. It looks like many states will face a surge in demand for ICU beds to treat their chronic patients in the coming months.