# Predicting Admission to ICU for confirmed COVID-19 cases

![Coronavirus](https://img.webmd.com/dtmcms/live/webmd/consumer_assets/site_images/article_thumbnails/other/1800x1200_virus_3d_render_red_03_other.jpg?resize=*:350px)

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, Hubei, China, and has resulted in an ongoing pandemic. The first confirmed case has been traced back to 17 November 2019 in Hubei. As of 15 July 2020, more than 13.2 million cases have been reported across 188 countries and territories, resulting in more than 577,000 deaths. More than 7.37 million people have recovered.

Common symptoms include fever, cough, fatigue, shortness of breath, and loss of smell and taste. While the majority of cases result in mild symptoms, some progress to acute respiratory distress syndrome (ARDS) possibly precipitated by cytokine storm, multi-organ failure, septic shock, and blood clots. The time from exposure to onset of symptoms is typically around five days, but may range from two to fourteen days.

The virus is primarily spread between people during close contact, most often via small droplets produced by coughing, sneezing, and talking. The droplets usually fall to the ground or onto surfaces rather than travelling through air over long distances. Transmission may also occur through smaller droplets that are able to stay suspended in the air for longer periods of time. Less commonly, people may become infected by touching a contaminated surface and then touching their face. It is most contagious during the first three days after the onset of symptoms, although spread is possible before symptoms appear, and from people who do not show symptoms. The standard method of diagnosis is by real-time reverse transcription polymerase chain reaction (rRT-PCR) from a nasopharyngeal swab. Chest CT imaging may also be helpful for diagnosis in individuals where there is a high suspicion of infection based on symptoms and risk factors; however, guidelines do not recommend using CT imaging for routine screening.

Recommended measures to prevent infection include frequent hand washing, maintaining physical distance from others (especially from those with symptoms), quarantine (especially for those with symptoms), covering coughs, and keeping unwashed hands away from the face. The use of cloth face coverings such as a scarf or a bandana has been recommended by health officials in public settings to minimise the risk of transmissions, with some authorities requiring their use. Health officials also stated that medical-grade face masks, such as N95 masks, should only be used by healthcare workers, first responders, and those who directly care for infected individuals.

There are no vaccines nor specific antiviral treatments for COVID-19. Management involves the treatment of symptoms, supportive care, isolation, and experimental measures. The World Health Organization (WHO) declared the COVID‑19 outbreak a public health emergency of international concern (PHEIC) on 30 January 2020 and a pandemic on 11 March 2020. Local transmission of the disease has occurred in most countries across all six WHO regions.
[source](https://en.wikipedia.org/wiki/Coronavirus_disease_2019 "Wikipedia")

## There is an imminent shortage of ICU beds relative to the number of infected patients
![ICU](https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/AWG6I4CLAQI6VCQ73YKZPPTMXQ.jpg)

# Signs and symptoms of COVID-19
Fever is the most common symptom of COVID-19, but is highly variable in severity and presentation, with some older, immunocompromised, or critically ill people not having fever at all. In one study, only 44% of people had fever when they presented to the hospital, while 89% went on to develop fever at some point during their hospitalization.

Other common symptoms include cough, loss of appetite, fatigue, shortness of breath, sputum production, and muscle and joint pains. Symptoms such as nausea, vomiting, and diarrhoea have been observed in varying percentages. Less common symptoms include sneezing, runny nose, sore throat, and skin lesions. Some cases in China initially presented with only chest tightness and palpitations. A decreased sense of smell or disturbances in taste may occur. Loss of smell was a presenting symptom in 30% of confirmed cases in South Korea.

As is common with infections, there is a delay between the moment a person is first infected and the time he or she develops symptoms. This is called the incubation period. The typical incubation period for COVID‑19 is five or six days, but it can range from one to fourteen days with approximately ten percent of cases taking longer.

An early key to the diagnosis is the tempo of the illness. Early symptoms vary widely but infrequently include shortness of breath. Shortness of breath usually develops several days after initial symptoms. Shortness of breath that begins immediately along with fever and cough is more likely to be anxiety than COVID-19. The most critical days of illness tend to be those following the development of shortness of breath. A minority of cases do not develop noticeable symptoms at any point in time. These asymptomatic carriers tend not to get tested, and their role in transmission is not fully known. Preliminary evidence suggested they may contribute to the spread of the disease. In June 2020, a spokeswoman for the WHO said that asymptomatic transmission appears to be "rare," but the evidence for the claim was not released. The next day, WHO clarified that they had intended a narrow definition of "asymptomatic" that did not include pre-symptomatic or paucisymptomatic (weak symptoms) transmission and that up to 41% of transmission may be asymptomatic. Transmission without symptoms does occur.
[source](https://en.wikipedia.org/wiki/Coronavirus_disease_2019 "Wikipedia")

# Coronavirus global Heatmap
![Coronavirus heatmap](https://www.en.etemaaddaily.com/pages/health/coronavirusupdates/85global.png)

# Projections for ICU Bed Count in India
One of the reasons why a massive surge in cases is challenging is that it puts immense pressure on the health care system of the affected hotspot. We are already seeing this in Mumbai. A small percentage of COVID-19 patients need critical care in the ICU, so the more total cases there are, the greater the requirement for ICU beds.

With that in mind, let’s take a look at the ICU bed projections for key states and cities.

Our projections indicate that all states, including Maharashtra, will have an adequate number of ICU beds when the peak number of cases hits the states.

Maharashtra, Tamil Nadu, Uttar Pradesh and Rajasthan are projected to have an excess of 5351, 6484, 2648 and 3320 ICU beds, respectively, when they hit their peaks.

Meanwhile, Gujarat, West Bengal, Delhi and Madhya Pradesh are projected to have an excess of 2270, 1869, 795 and 1319 ICU beds, respectively, when they hit their peaks.

Chennai is projected to have 504 excess ICU beds at the time of its peak.

Mumbai is, not surprisingly, an exception to the trend, and is projected to fall short by 51 ICU beds when the peak hits the city.

The numbers for Mumbai and Maharashtra could take a turn for the worse looking at the surge in cases this week. Going ahead, we also need to keep a lookout for Uttar Pradesh and Bihar which are seeing a large influx of returning migrants.
[source](https://www.timesnownews.com/times-facts/article/times-fact-india-outbreak-report-what-the-latest-projections-say-about-the-icu-beds-in-your-state/595787 "Times Now News")

![toi](https://imgk.timesnownews.com/story/ICU_beds_new_ISTOClk.jpg?tr=w-600,h-450,fo-auto)

## In this model, I try to predict whether a confirmed COVID-19 patient will need ICU admission. Such a prediction can help medical staff plan ahead and optimize the use of medical resources.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # visual representation and EDA
import time

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/covid19/Kaggle_Sirio_Libanes_ICU_Prediction.xlsx


In [2]:
#Reading the data
data = pd.read_excel('/kaggle/input/covid19/Kaggle_Sirio_Libanes_ICU_Prediction.xlsx')
data

Unnamed: 0,PATIENT_VISIT_IDENTIFIER,AGE_ABOVE65,AGE_PERCENTIL,GENDER,DISEASE GROUPING 1,DISEASE GROUPING 2,DISEASE GROUPING 3,DISEASE GROUPING 4,DISEASE GROUPING 5,DISEASE GROUPING 6,...,TEMPERATURE_DIFF,OXYGEN_SATURATION_DIFF,BLOODPRESSURE_DIASTOLIC_DIFF_REL,BLOODPRESSURE_SISTOLIC_DIFF_REL,HEART_RATE_DIFF_REL,RESPIRATORY_RATE_DIFF_REL,TEMPERATURE_DIFF_REL,OXYGEN_SATURATION_DIFF_REL,WINDOW,ICU
0,0,1,60th,0,0.0,0.0,0.0,0.0,1.0,1.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,0-2,0
1,0,1,60th,0,0.0,0.0,0.0,0.0,1.0,1.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,2-4,0
2,0,1,60th,0,0.0,0.0,0.0,0.0,1.0,1.0,...,,,,,,,,,4-6,0
3,0,1,60th,0,0.0,0.0,0.0,0.0,1.0,1.0,...,-1.000000,-1.000000,,,,,-1.000000,-1.000000,6-12,0
4,0,1,60th,0,0.0,0.0,0.0,0.0,1.0,1.0,...,-0.238095,-0.818182,-0.389967,0.407558,-0.230462,0.096774,-0.242282,-0.814433,ABOVE_12,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1920,384,0,50th,1,0.0,0.0,0.0,0.0,0.0,0.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,0-2,0
1921,384,0,50th,1,0.0,0.0,0.0,0.0,0.0,0.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,2-4,0
1922,384,0,50th,1,0.0,0.0,0.0,0.0,0.0,0.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,4-6,0
1923,384,0,50th,1,0.0,0.0,0.0,0.0,0.0,0.0,...,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,-1.000000,6-12,0


In [3]:
data.shape

(1925, 231)

In [4]:
data.columns

Index(['PATIENT_VISIT_IDENTIFIER', 'AGE_ABOVE65', 'AGE_PERCENTIL', 'GENDER',
       'DISEASE GROUPING 1', 'DISEASE GROUPING 2', 'DISEASE GROUPING 3',
       'DISEASE GROUPING 4', 'DISEASE GROUPING 5', 'DISEASE GROUPING 6',
       ...
       'TEMPERATURE_DIFF', 'OXYGEN_SATURATION_DIFF',
       'BLOODPRESSURE_DIASTOLIC_DIFF_REL', 'BLOODPRESSURE_SISTOLIC_DIFF_REL',
       'HEART_RATE_DIFF_REL', 'RESPIRATORY_RATE_DIFF_REL',
       'TEMPERATURE_DIFF_REL', 'OXYGEN_SATURATION_DIFF_REL', 'WINDOW', 'ICU'],
      dtype='object', length=231)

### Data Preprocessing

In [5]:
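# Integer-encode every non-numeric column (e.g. AGE_PERCENTIL, WINDOW).
# Checking the column dtype avoids missing a text column whose first value is NaN.
definitions = {}  # column name -> original labels, so codes can be decoded later
for col in data.columns:
    if not pd.api.types.is_numeric_dtype(data[col]):
        codes, uniques = pd.factorize(data[col])
        data[col] = codes
        definitions[col] = uniques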
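
To make the encoding step concrete, here is a minimal sketch on a toy frame. The values mimic `AGE_PERCENTIL` and `WINDOW` from the output above; the `encode_text_columns` helper and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def encode_text_columns(frame):
    """Integer-encode every non-numeric column and return the lookup tables.

    Hypothetical helper for illustration only.
    """
    encoded = frame.copy()
    definitions = {}
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # factorize assigns codes in order of first appearance
            codes, uniques = pd.factorize(encoded[col])
            encoded[col] = codes
            definitions[col] = uniques
    return encoded, definitions

# Toy data (assumed), shaped like a few columns of the real dataset
toy = pd.DataFrame({
    "AGE_PERCENTIL": ["60th", "60th", "50th"],
    "WINDOW": ["0-2", "2-4", "0-2"],
    "ICU": [0, 0, 1],
})

encoded, definitions = encode_text_columns(toy)
print(encoded)

# Codes can be mapped back to the original labels through the lookup table
decoded_window = definitions["WINDOW"].take(encoded["WINDOW"].to_numpy())
print(list(decoded_window))
```

Because `pd.factorize` numbers labels by first appearance, `"60th"` receives code 0 and `"50th"` code 1 here, so the codes do not follow the numeric order of the percentiles.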

### Separating the features from the target and splitting the data into training and test sets

In [6]:
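from sklearn.model_selection import train_test_split
# Independent variables: every column except the last one (ICU)
X = data.iloc[:, :-1].values
# Dependent variable: the ICU label in the last column
y = data.iloc[:, -1].values
# Stratified 80/20 split keeps the ICU class ratio the same in both sets.
# Note: rows are split individually, so windows of one patient can land in both sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y)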
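
Each patient contributes several rows (one per `WINDOW`), as the output above shows for `PATIENT_VISIT_IDENTIFIER` 0 and 384, so a row-level split can put windows of the same patient in both sets. A possible alternative, sketched here on synthetic data, is to split by patient with scikit-learn's `GroupShuffleSplit`. The array names and sizes below are assumptions for illustration; this is not the split used in the notebook.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 20 patients with 5 time windows each (assumed sizes)
rng = np.random.default_rng(0)
n_patients, windows_per_patient = 20, 5
groups = np.repeat(np.arange(n_patients), windows_per_patient)
X_demo = rng.normal(size=(groups.size, 3))
y_demo = rng.integers(0, 2, size=groups.size)

# Hold out roughly 20% of the patients rather than 20% of the rows
splitter = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=0)
train_idx, test_idx = next(splitter.split(X_demo, y_demo, groups=groups))

# No patient appears on both sides of the split
shared_patients = set(groups[train_idx]) & set(groups[test_idx])
print(len(shared_patients))
```

With the real data, `groups` would be the `PATIENT_VISIT_IDENTIFIER` column. `GroupShuffleSplit` does not stratify on the label; `StratifiedGroupKFold` may be worth considering if the class balance needs to be preserved as well.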

### Scaling the data

In [7]:
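from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# np.nan_to_num replaces missing values with 0 before scaling.
# The scaler is fitted on the training set only and then reused on the test set.
X_train = scaler.fit_transform(np.nan_to_num(X_train))
X_test = scaler.transform(np.nan_to_num(X_test))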
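
Replacing every missing value with 0 is one option among several. As a hedged alternative, the sketch below chains a median `SimpleImputer` with the `StandardScaler` in a scikit-learn pipeline, so both the imputation values and the scaling statistics are learned from the training rows only. The tiny arrays are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data with missing entries (assumption, not the real dataset)
X_train_demo = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
X_test_demo = np.array([[np.nan, 6.0]])

# Median imputation followed by standardization, fitted on the training set
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_train_scaled = preprocess.fit_transform(X_train_demo)

# The test row is filled and scaled with the training medians, means and stds
X_test_scaled = preprocess.transform(X_test_demo)
print(X_train_scaled)
print(X_test_scaled)
```

In this toy case the training medians and means are 3 and 6, so the test row `[nan, 6.0]` becomes `[3.0, 6.0]` after imputation and maps to zeros after scaling.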

## Applying Random Forest Classifier

In [8]:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
# n_jobs=-1 uses all available cores; oob_score=True also stores an
# out-of-bag accuracy estimate in model.oob_score_
model = RandomForestClassifier(n_jobs=-1, n_estimators=200, criterion='entropy', oob_score=True)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)

In [9]:
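from sklearn.metrics import roc_curve, auc
print('accuracy ' + str(acc))
# Per-class precision, recall and F-score for labels 0 (no ICU) and 1 (ICU)
prfs = precision_recall_fscore_support(y_test, y_pred, labels=[0, 1])
# A ROC curve needs scores rather than hard labels, so use the predicted
# probability of the ICU class
fpr, tpr, _ = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)  # computed but not printed in this cell
print('precision:', prfs[0])
print('recall', prfs[1])
print('fscore', prfs[2])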

accuracy 0.8753246753246753
precision: [0.87261146 0.88732394]
recall [0.97163121 0.61165049]
fscore [0.91946309 0.72413793]
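
The printed figures are consistent with a 385-row test set (20% of the 1925 rows) whose confusion matrix has 274 true negatives, 8 false positives, 40 false negatives and 63 true positives. These counts are inferred from the printed metrics, not taken from the notebook output. Under that assumption, the sketch below recomputes each metric by hand.

```python
# Counts inferred from the printed metrics (an assumption, not notebook output)
tn, fp, fn, tp = 274, 8, 40, 63
total = tn + fp + fn + tp

accuracy = (tp + tn) / total

# Metrics for the ICU class (label 1)
precision_icu = tp / (tp + fp)
recall_icu = tp / (tp + fn)
f1_icu = 2 * precision_icu * recall_icu / (precision_icu + recall_icu)

# Accuracy of always predicting "no ICU", for comparison
majority_baseline = (tn + fp) / total

print(round(accuracy, 4), round(precision_icu, 4),
      round(recall_icu, 4), round(f1_icu, 4), round(majority_baseline, 4))
```

Read this way, a recall of about 0.61 for the ICU class corresponds to 40 of 103 ICU admissions being missed, and always predicting "no ICU" would already reach roughly 73% accuracy, so accuracy alone gives an optimistic picture on this imbalanced target.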


In [10]:
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

In [11]:
params = {
        'min_child_weight': [1, 5, 10],
        'gamma': [0.5, 1, 1.5, 2, 5],
        'subsample': [0.6, 0.8, 1.0],
        'colsample_bytree': [0.6, 0.8, 1.0],
        'max_depth': [3, 4, 5]
        }
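
To put the grid above in perspective, the sketch below counts its combinations with scikit-learn's `ParameterGrid` and draws five candidates with `ParameterSampler`, which is the kind of sampling a randomized search performs. The seed is an arbitrary choice for the illustration, and the drawn candidates are not claimed to be the ones evaluated in this notebook.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {
    'min_child_weight': [1, 5, 10],
    'gamma': [0.5, 1, 1.5, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'max_depth': [3, 4, 5],
}

# Size of the full grid: 3 * 5 * 3 * 3 * 3 combinations
n_total = len(ParameterGrid(params))
print(n_total)

# Draw 5 settings at random from the grid (seed chosen arbitrarily)
sampled = list(ParameterSampler(params, n_iter=5, random_state=1001))
for candidate in sampled:
    print(candidate)
```

Sampling 5 of 405 settings covers a little over 1% of the grid, so a larger `n_iter` may be worth trying when training time allows.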

## Using RandomizedSearchCV 

In [12]:
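from sklearn.model_selection import RandomizedSearchCV
folds = 3       # number of cross-validation folds
param_comb = 5  # number of parameter settings sampled from params
xgb = XGBClassifier(n_estimators=100)
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)

# The search runs on the full, unscaled X and y (XGBoost handles NaN natively),
# so the rows later used as the test set also take part in the search.
random_search = RandomizedSearchCV(xgb, param_distributions=params, n_iter=param_comb,
                                   scoring='roc_auc', n_jobs=4, cv=skf.split(X,y), verbose=3, random_state=1001)
random_search.fit(X, y)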

Fitting 3 folds for each of 5 candidates, totalling 15 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  15 out of  15 | elapsed:    9.1s finished


RandomizedSearchCV(cv=<generator object _BaseKFold.split at 0x7f7e874b7ed0>,
                   estimator=XGBClassifier(base_score=None, booster=None,
                                           colsample_bylevel=None,
                                           colsample_bynode=None,
                                           colsample_bytree=None, gamma=None,
                                           gpu_id=None, importance_type='gain',
                                           interaction_constraints=None,
                                           learning_rate=None,
                                           max_delta_step=None, max_depth=None,
                                           min_child_weight=None, missing...
                                           random_state=None, reg_alpha=None,
                                           reg_lambda=None,
                                           scale_pos_weight=None,
                                           subsample=None, tr
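
Once a randomized search has been fitted, the selected configuration can be read from its `best_params_`, `best_score_` and `best_estimator_` attributes (here that would be `random_search.best_params_` and so on). The sketch below shows the pattern on synthetic data, with scikit-learn's `GradientBoostingClassifier` standing in for XGBoost so that it runs without extra dependencies. The data, the stand-in model and the reduced grid are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic binary classification data (assumed, not the ICU dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Reduced, hypothetical grid for the stand-in model
demo_params = {'max_depth': [3, 4, 5], 'subsample': [0.6, 0.8, 1.0]}
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1001)

search = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    param_distributions=demo_params,
    n_iter=5,
    scoring='roc_auc',
    cv=skf,  # passing the splitter itself lets the search be refitted
    random_state=1001,
)
search.fit(X_demo, y_demo)

print(search.best_params_)   # the winning parameter setting
print(search.best_score_)    # its mean cross-validated ROC AUC
best_model = search.best_estimator_  # refitted on all of X_demo, y_demo
```

Reusing `best_estimator_`, or passing `best_params_` to a fresh classifier, keeps the final model tied to what the search actually selected.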

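# XGBoost model with explicit parameters

Note that `gamma=0` and `max_depth=6` lie outside the grid searched above, so the values below are not taken from the search; they appear to match XGBoost's defaults.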

In [13]:
xgb2 = XGBClassifier(base_score=0.5, booster='gbtree',
                                           colsample_bylevel=1,
                                           colsample_bynode=1,
                                           colsample_bytree=1, gamma=0,
                                           gpu_id=-1, importance_type='gain',
                                           interaction_constraints='',
                                           learning_rate=0.300000012,
                                           max_delta_step=0, max_depth=6,
                                           min_child_weight=1, missing=np.nan,
                                           num_parallel_tree=1, random_state=0,
                                           reg_alpha=0, reg_lambda=1,
                                           scale_pos_weight=1, subsample=1,
                                           tree_method='exact',
                                           validate_parameters=1,
                                           verbosity=None)
training_start = time.perf_counter()
xgb2.fit(X_train, y_train)
training_end = time.perf_counter()
prediction_start = time.perf_counter()
preds = xgb2.predict(X_test)
prediction_end = time.perf_counter()
acc_xgb = (preds == y_test).sum().astype(float) / len(preds)*100
xgb_train_time = training_end-training_start
xgb_prediction_time = prediction_end-prediction_start
print("XGBoost's prediction accuracy is: %3.2f" % (acc_xgb))
print("Time consumed for training: %4.3f" % (xgb_train_time))
print("Time consumed for prediction: %6.5f seconds" % (xgb_prediction_time))

XGBoost's prediction accuracy is: 90.39
Time consumed for training: 0.829
Time consumed for prediction: 0.00588 seconds


## XGBoost gave me a prediction accuracy of 90.39% on the held-out test split, compared with 87.53% for the random forest.
## Feel free to fork my kernel and modify it for better results.