# Predicting COVID-19 ICU Admission Using Neural Network

## License

The original dataset is licensed under **Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**.

The dataset is free to **share** and **adapt** under the following terms:

* credit to the original article is given and any changes are indicated (nothing has been changed as of 30/11/2021)
* material is not used for commercial purposes 

Credit:

* original material is published on [Kaggle](https://www.kaggle.com/) and accessible [here](https://www.kaggle.com/S%C3%ADrio-Libanes/covid19).

## Intro

This repository contains the source code and report for a seminar paper written for the *Machine Learning* course in the winter semester 2021/2022 at the Faculty of Computer and Information Science, University of Ljubljana.

The dataset contains anonymized data from Hospital Sírio-Libanês, São Paulo and Brasilia. 

### Context (*copied from the above-mentioned Kaggle article*)
COVID-19 pandemic impacted the whole world, overwhelming healthcare systems - unprepared for such intense and lengthy request for ICU beds, professionals, personal protection equipment and healthcare resources.
Brazil recorded first COVID-19 case on February 26 and reached community transmission on March 20.

## Task

Predict admission to the ICU of confirmed COVID-19 cases.
Based on the data available, is it feasible to predict which patients will need intensive care unit support?
The aim is to provide tertiary and quaternary hospitals with the most accurate answer, so ICU resources can be arranged or patient transfer can be scheduled (*copied from Kaggle article*).

## Dataset

Data has been cleaned and scaled by column according to Min Max Scaler. In total, there are 54 features (expanded when pertinent to the mean, median, max, min, diff and relative diff). 
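As a reminder of what column-wise min-max scaling does, here is a minimal sketch. The helper name `min_max_scale`, the target range parameters `low`/`high` and the sample values are assumptions made for illustration; the exact range used by the dataset authors is not stated here.

```python
import numpy as np

def min_max_scale(column, low=0.0, high=1.0):
    """Linearly rescale a 1-D array so its minimum maps to `low` and its maximum to `high`."""
    column = np.asarray(column, dtype=float)
    col_min, col_max = column.min(), column.max()
    if col_max == col_min:
        # Constant column: there is no spread to rescale, so map everything to `low`
        return np.full_like(column, low)
    unit = (column - col_min) / (col_max - col_min)  # now in [0, 1]
    return low + unit * (high - low)

# Made-up measurements, not taken from the dataset
heart_rate = [60.0, 75.0, 90.0, 120.0]
scaled = min_max_scale(heart_rate)
```

Because the scaling is applied per column, every feature ends up in the same numeric range regardless of its original unit.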

### Available Data

Features in the dataset can be grouped in four groups.



| Group | Amount of features |
| ----- | :------------------: |
| Demographics | 3 |
| Grouped diseases | 9 |
| Blood results | 36 |
| Vital signs | 6 |
| **Total**| **54** |



### Window Concept

Data for each patient has been grouped in five windows, each containing diagnostic results from the respective time window.



| Window      | Description |
| ----------- | ----------- |
| 0-2         | From 0 to 2 hours of the admission |
| 2-4         | From 2 to 4 hours of the admission |
| 4-6         | From 4 to 6 hours of the admission |
| 6-12        | From 6 to 12 hours of the admission |
| Above 12    | Above 12 hours from admission |



The Kaggle article warns against using data from any window in which the patient has already been admitted to the ICU (the **ICU** flag is 1). This means we need to manipulate our data a little. For example, consider the following two patient timelines:



| Window      | Patient admitted to ICU | Data can be used for modelling | Target variable |
| ----------- | :-----------: | :-----------: | :-----------: |
| 0-2         | False | True | 1 |
| 2-4         | False | True | 1 |
| 4-6         | False | True | 1 |
| 6-12        | True | False |  |
| Above 12    | True | False |  |



The patient is admitted to the ICU in the fourth time window (6-12 hours after the initial non-ICU admission). We can therefore use data from the first three time windows, with the target variable set to 1 (the patient is eventually admitted to the ICU ward).



| Window      | Patient admitted to ICU | Data can be used for modelling | Target variable |
| ----------- | :-----------: | :-----------: | :-----------: |
| 0-2         | False | True | 0 |
| 2-4         | False | True | 0 |
| 4-6         | False | True | 0 |
| 6-12        | False | True | 0 |
| Above 12    | False | True | 0 |



The patient is never admitted to the ICU, so we can use all time windows with the target variable set to 0.
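The two timelines above can be reproduced on a toy DataFrame. This is only a sketch of the labelling rule, not the repository's helper code: the column names `PATIENT_VISIT_IDENTIFIER`, `WINDOW` and `ICU` follow the dataset, while the window label `ABOVE_12` and the toy values are assumptions.

```python
import pandas as pd

windows = ["0-2", "2-4", "4-6", "6-12", "ABOVE_12"]  # label spelling is assumed

toy = pd.DataFrame({
    "PATIENT_VISIT_IDENTIFIER": [0] * 5 + [1] * 5,
    "WINDOW": windows * 2,
    # Patient 0 enters the ICU in window 6-12, patient 1 never does
    "ICU": [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
})

# TARGET is 1 on every row of a patient who has at least one ICU == 1 row
toy["TARGET"] = toy.groupby("PATIENT_VISIT_IDENTIFIER")["ICU"].transform("max")

# Rows recorded once the patient is already in the ICU must not be used
usable = toy[toy["ICU"] == 0]
```

Patient 0 keeps three usable rows labelled 1, and patient 1 keeps all five rows labelled 0, matching the two tables.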

### Null Values

The original Kaggle article gives the following guidance on missing values:

```
It is reasonable to assume that a patient who does not have a measurement recorded in a time window is clinically stable, potentially presenting vital signs and blood labs similar to neighboring windows. Therefore, one may fill the missing values using the next or previous entry. Attention to multicollinearity and zero variance issues in this data when choosing your algorithm.
```

We will fill missing values from neighbouring windows, as suggested in the snippet above.
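One way to read "next or previous entry" is to fill within each patient's own rows, so that values never leak from one patient to another. The sketch below illustrates that variant on made-up data; the column `LAB_VALUE` is hypothetical, and this is not necessarily how the notebook performs the fill.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "PATIENT_VISIT_IDENTIFIER": [0, 0, 0, 1, 1, 1],
    "LAB_VALUE": [np.nan, 0.2, np.nan, 0.7, np.nan, np.nan],  # hypothetical feature
})

filled = toy.copy()
# Within each patient: take the previous entry first, then the next one
# for leading gaps that have no previous entry
filled["LAB_VALUE"] = (
    toy.groupby("PATIENT_VISIT_IDENTIFIER")["LAB_VALUE"]
       .transform(lambda s: s.ffill().bfill())
)
```

A patient with no measurement at all for a feature would still be left with nulls under this scheme, so a final check for remaining missing values is still worthwhile.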

## Import

Before running the code, load all required modules by executing the code block below. This block also enables Jupyter's auto-reloading feature, so you don't need to re-import modules whenever you change them.

In [178]:
# In order to import from the python file without hassle, we add the current
# directory to the python path
import sys; sys.path.append(".")

# Auto-reload
%load_ext autoreload
%autoreload 2

# Utilities module
import src.utilities as util

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Data Preparation

Instructions on how to set up the environment are specified in the [README](https://github.com/JakobSkornik/covid19-admission/blob/main/README.md) file.

The original dataset is provided in a single *xlsx* file. Let us first import the dataset and store it in a single pandas DataFrame.

In [None]:
# Load Excel into DataFrame
dataset = util.load_xlsx("data/Kaggle_Sirio_Libanes_ICU_Prediction.xlsx")

# Print first 10 elements of DataFrame
dataset.head(10)

Let's take a look at the data types present in this dataset.

In [None]:
dataset.dtypes.unique()

The next step is to add the target variable to every row. If a patient has at least one positive value in the **ICU** column, the target variable for that patient is 1.

First we obtain the target variable for every patient.

In [None]:
# Create a df with PATIENT_ID/TARGET columns
patient_target_df = util.get_target_variables(dataset)
patient_target_df.head(10)

Now we append the target variable to each row of the **dataset** DataFrame.

In [None]:
dataset = util.append_target_variable(dataset, patient_target_df)
dataset.head()

Now we can remove the rows where the **ICU** column contains the value 1.

In [None]:
dataset = dataset[dataset.ICU != 1]
dataset.head()

Next, we remove the metadata column **PATIENT_VISIT_IDENTIFIER**, which contains the patient ID, and the **ICU** column, since every remaining row has the ICU value 0.

In [None]:
dataset = dataset.drop(["PATIENT_VISIT_IDENTIFIER", "ICU"], axis=1)
dataset.head()

We still have to deal with null values. As specified above, we will fill null values with neighbouring values. We can take advantage of the **pd.DataFrame.fillna** method.

In [None]:
dataset_backward_fill = dataset.fillna(method="bfill")
dataset_forward_fill = dataset.fillna(method="ffill")

# True only if no null values remain anywhere in the filled DataFrame
backward_filled = not dataset_backward_fill.isna().any().any()
forward_filled = not dataset_forward_fill.isna().any().any()

backward_filled, forward_filled

Both methods successfully filled the dataset. We can select either one, or keep both and compare the results later on.

In [None]:
dataset = dataset_backward_fill
dataset.head()

There are still whitespace characters in column names, so we replace them with underscores.

In [None]:
dataset.columns = dataset.columns.str.replace(" ", "_")
dataset.head()

We also need to encode the **AGE_PERCENTIL** column. We can see that there are 10 distinct values.

In [None]:
dataset.AGE_PERCENTIL.unique()

We can use the pandas.get_dummies method, which maps a single column with n possible values into n binary columns. In each row, the column corresponding to the original value contains 1 and the others contain 0.

For example the following column:

| Value |
| :----: |
| a |
| b |
| c |
| a |
| a |

will get mapped into:

| Value_a | Value_b | Value_c |
| :---: | :---: | :---: |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |
| 1 | 0 | 0 |
| 1 | 0 | 0 |
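The mapping in the two tables above can be reproduced directly with `pd.get_dummies`. This is a small standalone sketch on the toy column; the repository's `util.get_dummies` wrapper is not shown here and may differ in details such as the output dtype.

```python
import pandas as pd

toy = pd.DataFrame({"Value": ["a", "b", "c", "a", "a"]})

# One binary column per distinct value, named <column>_<value>;
# dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(toy, columns=["Value"], dtype=int)
```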

In [None]:
dataset = util.get_dummies(dataset, cols=["AGE_PERCENTIL"])
dataset.head()

The final non-numeric column is the **WINDOW** column. Using this column, we can create 6 different datasets: there are five windows, so each distinct value yields a separate dataset, and an additional dataset contains all time windows. I will show the example for window 2-4. Keep in mind that when we create a dataset for a window, all previous windows must be included as well.

After we extract a dataset for the desired window, the **WINDOW** column can be dropped.

All datasets will then be created and stored in a new object, with all required metadata using a helper method.

In [None]:
window_24_dataset = dataset[(dataset.WINDOW == "0-2") | (dataset.WINDOW == "2-4")]
window_24_dataset = window_24_dataset.drop("WINDOW", axis=1)
window_24_dataset.head()
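The cumulative selection shown in the cell above can be generalised to any window. The helper below is a hypothetical sketch, not part of `src.utilities`; the window labels in `WINDOW_ORDER` (in particular `ABOVE_12`) are assumed spellings.

```python
import pandas as pd

# Chronological order of the windows (label spellings are assumptions)
WINDOW_ORDER = ["0-2", "2-4", "4-6", "6-12", "ABOVE_12"]

def select_up_to(df, last_window):
    """Keep rows from `last_window` and every earlier window, then drop WINDOW."""
    keep = WINDOW_ORDER[: WINDOW_ORDER.index(last_window) + 1]
    return df[df["WINDOW"].isin(keep)].drop(columns="WINDOW")

toy = pd.DataFrame({
    "WINDOW": WINDOW_ORDER,
    "FEATURE": [0.1, 0.2, 0.3, 0.4, 0.5],  # made-up values
})
subset = select_up_to(toy, "2-4")
```

Using `isin` with an ordered list avoids chaining one comparison per window as the selected range grows.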

Let's make sure that the data types of the curated dataset are all numeric.

In [None]:
window_24_dataset.dtypes.unique()

We can now apply this data preparation again in a script, this time for all time windows and for each fill method separately. The final dictionary is structured as follows:

* **datasets**: *dict*
  * **ffill_datasets**: *dict*
    * **window_0_2**: *pd.DataFrame*
    * **window_2_4**: *pd.DataFrame*
    * **window_4_6**: *pd.DataFrame*
    * **window_6_12**: *pd.DataFrame*
    * **window_all**: *pd.DataFrame*
  * **bfill_datasets**: *dict*
    * . . .

In [151]:
datasets = util.get_datasets()

## Neural Network

Why would one want to design a neural network by hand? The answer is simple: to understand how such algorithms work at a deep level, and because it's fun.

The whole neural network is designed as a package that we can use in this notebook. We will start with a simple stochastic gradient descent backpropagation algorithm.

The first step is to design a simple neural network that's capable of approximating some basic boolean functions.
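To give a feel for what such a network does internally, here is a minimal NumPy sketch of a one-hidden-layer network trained with per-sample stochastic gradient descent and backpropagation on the AND function. It is an illustration under assumptions, not the `BasicNeuralNetwork` implementation: the tanh hidden activation, softmax output, cross-entropy loss, weight initialisation and learning rate are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])   # AND targets
Y = np.eye(2)[y]             # one-hot encoded targets

hidden = 5
W1 = rng.normal(0, 0.5, size=(2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.5, size=(hidden, 2))
b2 = np.zeros(2)

def forward(x):
    """Return hidden activations and softmax class probabilities."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return h, e / e.sum(axis=-1, keepdims=True)

def mean_loss():
    """Mean cross-entropy over the whole toy dataset."""
    _, p = forward(X)
    return float(-np.mean(np.log(p[np.arange(len(y)), y])))

initial_loss = mean_loss()
lr = 0.5
for epoch in range(500):
    for i in rng.permutation(len(X)):         # one sample at a time
        x, t = X[i], Y[i]
        h, p = forward(x)
        d_logits = p - t                      # softmax + cross-entropy gradient
        d_h = (W2 @ d_logits) * (1 - h ** 2)  # backpropagate through tanh
        W2 -= lr * np.outer(h, d_logits)
        b2 -= lr * d_logits
        W1 -= lr * np.outer(x, d_h)
        b1 -= lr * d_h

final_loss = mean_loss()
predictions = forward(X)[1].argmax(axis=1)
```

Comparing `initial_loss` with `final_loss` shows whether the updates are reducing the error, which is the same signal the training logs in this notebook report.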

In [1]:
# In order to import from the python file without hassle, we add the current
# directory to the python path
import sys; sys.path.append(".")

# Auto-reload
%load_ext autoreload
%autoreload 2

from src.neural_network.basic import BasicNeuralNetwork

In [74]:
# Create a neural network
basic5_AND = BasicNeuralNetwork(
    input_size=2,
    output_size=2,
    iterations=200,
    logs=True,
    log_frequency=20,
    alpha=1,
    alpha_decay=0.001,
    layer_size=5
)

# Create dataset
X = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
]

# AND target values
y = [0, 0, 0, 1]

# Train on the dataset
basic5_AND.learn(X, y)

# Evaluate model
basic5_AND_confusion_matrix = util.evaluate_custom(X, y, basic5_AND)

0: loss: 0.7362408378181899, accuracy: 0.75, learning_rate: 1.0
20: loss: 0.08632305650808583, accuracy: 1.0, learning_rate: 0.9803921568627451
40: loss: 0.02079537572234387, accuracy: 1.0, learning_rate: 0.9615384615384615
60: loss: 0.010269519290616, accuracy: 1.0, learning_rate: 0.9433962264150942
80: loss: 0.006620037406907475, accuracy: 1.0, learning_rate: 0.9259259259259258
100: loss: 0.0047786532429224884, accuracy: 1.0, learning_rate: 0.9090909090909091
120: loss: 0.003699604081862335, accuracy: 1.0, learning_rate: 0.8928571428571428
140: loss: 0.0030146116628634546, accuracy: 1.0, learning_rate: 0.8771929824561403
160: loss: 0.0025307760151538467, accuracy: 1.0, learning_rate: 0.8620689655172414
180: loss: 0.002182254365437768, accuracy: 1.0, learning_rate: 0.8474576271186441
Finished learning. Accuracy: 1.0.
RESULTS:

    TP: 1,
    TN: 3,
    FP: 0,
    FN: 0
    accuracy: 4 / 4 = 100.0%
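
The `learning_rate` values in the log above are consistent with an inverse-time decay schedule, `alpha / (1 + alpha_decay * iteration)`. The sketch below reproduces the logged numbers under that assumption; the formula is inferred from the output, not read from the package's source, and the function name is hypothetical.

```python
def decayed_learning_rate(alpha, alpha_decay, iteration):
    """Inverse-time decay: the rate shrinks as the iteration count grows."""
    return alpha / (1.0 + alpha_decay * iteration)

# Same alpha and alpha_decay as in the cell above
rates = {t: decayed_learning_rate(1.0, 0.001, t) for t in (0, 20, 100, 180)}
```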
    


The neural network seems to work for the AND function. Let's try XOR.

In [75]:
# Create a neural network
basic5_XOR = BasicNeuralNetwork(
    input_size=2,
    output_size=2,
    iterations=200,
    logs=True,
    log_frequency=20,
    alpha=1,
    alpha_decay=0.001,
    layer_size=5
)

# Create dataset
X = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
]

# XOR target values
y = [0, 1, 1, 0]

# Train on the dataset
basic5_XOR.learn(X, y)

# Evaluate model
basic5_XOR_confusion_matrix = util.evaluate_custom(X, y, basic5_XOR)

0: loss: 0.6645764462484502, accuracy: 0.75, learning_rate: 1.0
20: loss: 0.17532361130278246, accuracy: 1.0, learning_rate: 0.9803921568627451
40: loss: 0.045186423592068856, accuracy: 1.0, learning_rate: 0.9615384615384615
60: loss: 0.024914716906131618, accuracy: 1.0, learning_rate: 0.9433962264150942
80: loss: 0.017501967144621735, accuracy: 1.0, learning_rate: 0.9259259259259258
100: loss: 0.012952949764513205, accuracy: 1.0, learning_rate: 0.9090909090909091
120: loss: 0.010325487554211317, accuracy: 1.0, learning_rate: 0.8928571428571428
140: loss: 0.008632486974339897, accuracy: 1.0, learning_rate: 0.8771929824561403
160: loss: 0.007429856751153955, accuracy: 1.0, learning_rate: 0.8620689655172414
180: loss: 0.006409485392622069, accuracy: 1.0, learning_rate: 0.8474576271186441
Finished learning. Accuracy: 1.0.
RESULTS:

    TP: 2,
    TN: 2,
    FP: 0,
    FN: 0
    accuracy: 4 / 4 = 100.0%
    


Let's use our neural network on the actual dataset.

In [153]:
dataset = datasets["window_all"].copy()

X = dataset.drop("TARGET", axis=1).to_numpy()

y = dataset[["TARGET"]].to_numpy()
y = y.T[0]

In [81]:
# Create a neural network
basic5_covid = BasicNeuralNetwork(
    input_size=X.shape[1],
    output_size=2,
    iterations=2000,
    logs=True,
    log_frequency=200,
    alpha=1,
    alpha_decay=0.001,
    layer_size=5
)

basic5_covid.learn(X, y)
basic5_covid_confusion_matrix = util.evaluate_custom(X, y, basic5_covid)

0: loss: 7.745457238201585, accuracy: 0.3262411347517731, learning_rate: 1.0
200: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.8333333333333334
400: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.7142857142857143
600: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.625
800: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.5555555555555556
1000: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.5
1200: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.45454545454545453
1400: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.41666666666666663
1600: loss: 0.6314844496650722, accuracy: 0.6737588652482269, learning_rate: 0.3846153846153846
1800: loss: 0.6314844496650723, accuracy: 0.6737588652482269, learning_rate: 0.35714285714285715
Finished learning. Accuracy: 0.6737588652482269.
RESULTS:

    TP: 0,
    TN: 950,
    F

With just 5 neurons in the hidden layer, the network simply collapses to predicting the majority class. Let's try increasing the number of neurons and compare the results. Keep in mind that the neural network uses **Stochastic Gradient Descent** optimization (as in the previous examples).
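A quick way to recognise this failure mode is to compare the model's accuracy with the accuracy of always predicting the most frequent label. The sketch below uses made-up labels with 950 negatives and 460 positives; these counts are inferred from the logged accuracy and TN values above rather than read from the data, and the helper name is hypothetical.

```python
import numpy as np

def majority_class_accuracy(y):
    """Accuracy obtained by always predicting the most frequent label."""
    _, counts = np.unique(np.asarray(y), return_counts=True)
    return counts.max() / counts.sum()

# Synthetic labels with the class balance implied by the log (an inference)
y_toy = np.array([0] * 950 + [1] * 460)
baseline = majority_class_accuracy(y_toy)
```

A model whose accuracy equals this baseline has learned nothing beyond the class balance.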

In [159]:
# Create a neural network
basic64_covid = BasicNeuralNetwork(
    input_size=X.shape[1],
    output_size=2,
    iterations=20000,
    logs=True,
    log_frequency=2000,
    alpha=1,
    alpha_decay=0.001,
    layer_size=64
)

basic64_covid.learn(X, y)
basic64_covid_confusion_matrix = util.evaluate_custom(X, y, basic64_covid)

0: loss: 2.862640055254513, accuracy: 0.6716312056737589, learning_rate: 1.0
2000: loss: 0.33949805716523584, accuracy: 0.8191489361702128, learning_rate: 0.3333333333333333
4000: loss: 0.2568380210135458, accuracy: 0.8666666666666667, learning_rate: 0.2
6000: loss: 0.23237158410664194, accuracy: 0.8851063829787233, learning_rate: 0.14285714285714285
8000: loss: 0.21459464063490916, accuracy: 0.8971631205673759, learning_rate: 0.1111111111111111
10000: loss: 0.2041087678660211, accuracy: 0.9, learning_rate: 0.09090909090909091
12000: loss: 0.19730943899444436, accuracy: 0.9035460992907801, learning_rate: 0.07692307692307693
14000: loss: 0.19220216077105673, accuracy: 0.9070921985815603, learning_rate: 0.06666666666666667
16000: loss: 0.1884859330194672, accuracy: 0.9106382978723404, learning_rate: 0.058823529411764705
18000: loss: 0.18472467790284847, accuracy: 0.9099290780141844, learning_rate: 0.05263157894736842
Finished learning. Accuracy: 0.9141843971631206.
RESULTS:

    TP: 437,

We achieve an accuracy of about 91%, but there are a lot of false negatives, which matter a great deal given the nature of our problem. It is better to have higher sensitivity and catch every admission at the cost of a few false admissions than to misdiagnose patients who need intensive care. Let's try the AdaGrad optimizer and compare the results.
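The trade-off described above is usually quantified with sensitivity (the share of actual ICU admissions that the model catches) and specificity (the share of non-admissions it correctly rules out). The sketch below computes both from confusion-matrix counts; the function names and the example counts are made up for illustration and are not results from this notebook.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else float("nan")

# Made-up counts: 50 actual positives, 100 actual negatives
example_sensitivity = sensitivity(tp=40, fn=10)
example_specificity = specificity(tn=90, fp=10)
```

Reducing false negatives raises sensitivity, typically at the cost of more false positives and therefore lower specificity.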

In [160]:
# Create a neural network
adagrad64_covid = BasicNeuralNetwork(
    input_size=X.shape[1],
    output_size=2,
    iterations=20000,
    logs=True,
    log_frequency=2000,
    alpha=1,
    alpha_decay=0.001,
    layer_size=64,
    optimizer="adagrad"
)

adagrad64_covid.learn(X, y)
adagrad64_covid_confusion_matrix = util.evaluate_custom(X, y, adagrad64_covid)

0: loss: 2.716393144741217, accuracy: 0.6546099290780142, learning_rate: 1.0
2000: loss: 0.3792006875480715, accuracy: 0.8148936170212766, learning_rate: 0.3333333333333333
4000: loss: 0.3164302512050618, accuracy: 0.8560283687943262, learning_rate: 0.2
6000: loss: 0.29087622899170645, accuracy: 0.8702127659574468, learning_rate: 0.14285714285714285
8000: loss: 0.28636079598423053, accuracy: 0.8709219858156029, learning_rate: 0.1111111111111111
10000: loss: 0.2728250405552108, accuracy: 0.8815602836879433, learning_rate: 0.09090909090909091
12000: loss: 0.2807012215189236, accuracy: 0.875177304964539, learning_rate: 0.07692307692307693
14000: loss: 0.26603078069989117, accuracy: 0.8836879432624114, learning_rate: 0.06666666666666667
16000: loss: 0.2594751296869806, accuracy: 0.8858156028368794, learning_rate: 0.058823529411764705
18000: loss: 0.2561534230187364, accuracy: 0.8886524822695036, learning_rate: 0.05263157894736842
Finished learning. Accuracy: 0.8921985815602836.
RESULTS:

 

The AdaGrad optimizer reduced the number of false negatives considerably, although it increased the number of false positives. Let's try the Adam optimizer as well.
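For reference, the two adaptive update rules can be sketched in their textbook form as follows. This is an illustration on a one-dimensional quadratic, not the package's implementation: the function names, default hyperparameters (`lr`, `beta1`, `beta2`, `eps`) and the toy objective are assumptions.

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """AdaGrad: scale each step by the root of the accumulated squared gradients."""
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected running averages of the gradient and its square."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise f(w) = w^2 (gradient 2w), starting from w = 1.0
w_adagrad, cache = np.array([1.0]), np.zeros(1)
w_adam, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    w_adagrad, cache = adagrad_step(w_adagrad, 2 * w_adagrad, cache)
    w_adam, m, v = adam_step(w_adam, 2 * w_adam, m, v, t)
```

AdaGrad's accumulated cache only grows, so its effective step size shrinks over time, while Adam's exponential averages let the step size recover when gradients become small.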

In [161]:
# Create a neural network
adam64_covid = BasicNeuralNetwork(
    input_size=X.shape[1],
    output_size=2,
    iterations=20000,
    logs=True,
    log_frequency=2000,
    alpha=1,
    alpha_decay=0.001,
    layer_size=64,
    optimizer="adam"
)

adam64_covid.learn(X, y)
adam64_covid_confusion_matrix = util.evaluate_custom(X, y, adam64_covid)

0: loss: 10.18578294916905, accuracy: 0.32269503546099293, learning_rate: 1.0
2000: loss: 0.3911959045253613, accuracy: 0.8085106382978723, learning_rate: 0.3333333333333333
4000: loss: 0.3392113341770831, accuracy: 0.8390070921985816, learning_rate: 0.2
6000: loss: 0.3133103012013904, accuracy: 0.850354609929078, learning_rate: 0.14285714285714285
8000: loss: 0.30068975648881446, accuracy: 0.8659574468085106, learning_rate: 0.1111111111111111
10000: loss: 0.28714804843759195, accuracy: 0.8645390070921986, learning_rate: 0.09090909090909091
12000: loss: 0.2771837209642236, accuracy: 0.8765957446808511, learning_rate: 0.07692307692307693
14000: loss: 0.2707767626657338, accuracy: 0.8765957446808511, learning_rate: 0.06666666666666667
16000: loss: 0.2645394363731891, accuracy: 0.8780141843971632, learning_rate: 0.058823529411764705
18000: loss: 0.2626652150300176, accuracy: 0.8829787234042553, learning_rate: 0.05263157894736842
Finished learning. Accuracy: 0.8836879432624114.
RESULTS:

 

In [179]:
util.visualize_custom(adam64_covid_confusion_matrix)

AttributeError: module 'src.utilities' has no attribute 'visualize_custom'