# Prediction with Darts using One ICU Stay

This prediction approach is inspired by the workflow described [in this blog post](https://medium.com/unit8-machine-learning-publication/training-forecasting-models-on-multiple-time-series-with-darts-dc4be70b1844). For further information, see the [Darts documentation](https://unit8co.github.io/darts/generated_api/darts.html).

<ins>Darts provides two categories of models:</ins>

1. **Deep Learning Forecasting Models**:
 * Available approaches: RNN, TCN, N-BEATS and Transformer
 * "Global": can be trained on multiple series and can forecast future values of any series

2. **Non Neural-Net Forecasting Models**:
 * Available approaches: ARIMA, Exponential Smoothing, FFT, Prophet and Theta method
 * "Local": can only be trained on single time series and can forecast the future of only this series

Since we want to train our model on multiple series and are already familiar with RNNs, we use an `RNNModel` for our first attempt with Darts. We start by predicting a heart rate (HR) series in three ways: using only that series, using the corresponding blood pressure (NBPs) series as a second training series, or using the NBPs series as a covariate.

## Imports

**Note for Windows users:** Either use `pip install u8darts[torch]` to install the core and the neural network models of Darts, or follow the [instructions here](https://www.lucasmelin.com/getting-started-with-fbprophet-on-windows-10) to set up fbprophet first before executing `pip install darts`.

In [None]:
from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler, MissingValuesFiller
from darts.metrics import mape
from darts.models import RNNModel

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## Read and Preprocess Data

In [None]:
# Read cleaned chartevents
chartevents_subset = pd.read_parquet('../../data/chartevents_clean.parquet', engine='pyarrow')

# Extract heart rate series to predict
HR_series = chartevents_subset[(chartevents_subset['ITEMID'] == 220045)
                                       & (chartevents_subset['ICUSTAY_ID'] == 208809)]
HR_series = TimeSeries.from_dataframe(
    df=HR_series,
    time_col='CHARTTIME',
    value_cols=['VALUENUM_CLEAN'],
    freq='H' # can be any offset alias: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
)

# Extract blood pressure series for covariate use
NBPs_series = chartevents_subset[(chartevents_subset['ITEMID'] == 220179)
                                           & (chartevents_subset['ICUSTAY_ID'] == 208809)]
NBPs_series = TimeSeries.from_dataframe(
    df=NBPs_series,
    time_col='CHARTTIME',
    value_cols=['VALUENUM_CLEAN'],
    freq='H'
)

In [None]:
# Plot pure series
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_series.plot(label='Heart Rate')
NBPs_series.plot(label='Blood Pressure')

# Adjust texts
plt.legend()
plt.title('TimeSeries of Heart Rate and Blood Pressure', fontweight='bold')
plt.xlabel('Time')
plt.ylabel('Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_NBPs_as_timeseries.png', dpi=1200)

In [None]:
# Apply filler
# Method must be in ['linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'spline', 'polynomial', 'from_derivatives', 'piecewise_polynomial', 'pchip', 'akima', 'cubicspline']
filler = MissingValuesFiller()
HR_filled = filler.transform(HR_series, method='time')
NBPs_filled = filler.transform(NBPs_series, method='time')

# Plot filled series
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_filled.plot(label='Heart Rate')
NBPs_filled.plot(label='Blood Pressure')

# Adjust texts
plt.legend()
plt.title('Filled TimeSeries of Heart Rate and Blood Pressure', fontweight='bold')
plt.xlabel('Time')
plt.ylabel('Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_NBPs_as_timeseries_filled.png', dpi=1200)
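
To make the filling step more tangible, the following is a minimal sketch of linear gap filling in plain Python. It is not the Darts implementation; it assumes equally spaced (hourly) timestamps, in which case interpolating by time and by position should give the same values. The function name `fill_linear` and the sample readings are made up for illustration.

```python
import math

def fill_linear(values):
    """Fill NaN gaps by linear interpolation between the nearest known neighbours.

    Assumes equally spaced timestamps (e.g. hourly).
    Leading and trailing NaNs are left untouched in this sketch.
    """
    filled = list(values)
    # Positions of all observed (non-NaN) values
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        # Constant increment per time step inside the gap
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Hypothetical hourly heart rate readings with two missing hours
hr = [80.0, float('nan'), float('nan'), 92.0, 90.0]
print(fill_linear(hr))
```

Other methods from the list in the comment above (e.g. `'nearest'` or `'cubic'`) would fill the same gaps with different values, so the choice can influence what the model later learns.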

In [None]:
# Normalize both value series between 0 and 1
HR_scaler, NBPs_scaler = Scaler(), Scaler()
HR_scaled = HR_scaler.fit_transform(HR_filled)
NBPs_scaled = NBPs_scaler.fit_transform(NBPs_filled)

# Extract train and test data sets (ca. 80/20 division, looked up date manually)
HR_train, HR_test = HR_scaled.split_after(pd.Timestamp('2114-04-06'))
NBPs_train, NBPs_test = NBPs_scaled.split_after(pd.Timestamp('2114-04-06'))
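
Instead of looking up the split date manually, it could be derived from the series length. The sketch below assumes a regular hourly series; the helper `split_timestamp` and the start date are hypothetical and only serve as an illustration.

```python
from datetime import datetime, timedelta

def split_timestamp(start, n_points, train_fraction=0.8, step=timedelta(hours=1)):
    """Return the timestamp of the last training point of a regular series."""
    n_train = int(n_points * train_fraction)
    if not 0 < n_train < n_points:
        raise ValueError('train_fraction leaves an empty train or test set')
    # The first point sits at `start`, so the n-th point is (n - 1) steps later
    return start + (n_train - 1) * step

# Hypothetical series: 248 hourly points starting at an arbitrary date
start = datetime(2114, 3, 29, 10)
cut = split_timestamp(start, n_points=248)
print(cut)
```

The resulting timestamp could then be wrapped in `pd.Timestamp(...)` and passed to `split_after()` as in the cell above.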

## Create LSTM Models

In [None]:
print(len(HR_train)) # 182
print(len(HR_test))  # 66

print(len(NBPs_train)) # 182
print(len(NBPs_test))  # 65 -> not needed

In [None]:
### Create models
### (input and output lengths: https://unit8co.github.io/darts/examples/02-multi-time-series-and-covariates.html#Training-Process-(behind-the-scenes))

# Create model for training only with heart rate series
rnn_model_single = RNNModel(model='LSTM',
                            input_chunk_length=14, # 182 : 14 = 13 chunks
                            output_chunk_length=11 #  66 : 11 =  6 chunks
                            )

# Create model for training with both series
rnn_model_both = RNNModel(model='LSTM',
                          input_chunk_length=14,
                          output_chunk_length=11
                          )

# Create model for training with both series (but blood pressure series as covariate)
rnn_model_cov = RNNModel(model='LSTM',
                          input_chunk_length=14,
                          output_chunk_length=66 # otherwise, we do not predict whole test data
                          )
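
To get a feeling for how the chunk lengths relate to the amount of training data, here is a small sketch that cuts a series into (input, output) pairs with a window that slides one step at a time. This assumes the slicing described in the Darts example linked above; whether the installed Darts version builds its training samples exactly this way is an assumption. Both helper functions are hypothetical.

```python
def count_training_windows(series_length, input_chunk_length, output_chunk_length):
    """Number of (input, output) pairs when a window slides one step at a time."""
    window = input_chunk_length + output_chunk_length
    return max(series_length - window + 1, 0)

def sliding_windows(values, input_chunk_length, output_chunk_length):
    """Yield (input_chunk, output_chunk) pairs cut from one series."""
    window = input_chunk_length + output_chunk_length
    for start in range(len(values) - window + 1):
        split = start + input_chunk_length
        yield values[start:split], values[split:start + window]

# Lengths taken from the cells above: 182 training points
print(count_training_windows(182, 14, 11))  # settings of the first two models
print(count_training_windows(182, 14, 66))  # settings of the covariate model
```

Under this assumption, a longer `output_chunk_length` leaves fewer windows to train on, which is one trade-off of predicting the whole test range in a single chunk.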

## Train Models and Predict Series

In [None]:
### Train models
### (note: if the dataset does not fit in memory, the training data can be built from Sequences of TimeSeries and used with fit_from_dataset())

# Train only with heart rate series
rnn_model_single.fit(
    series=HR_train)

# Train with heart rate and blood pressure series
rnn_model_both.fit(
    series=[HR_train, NBPs_train])

# Train with heart rate series and blood pressure series as covariate
rnn_model_cov.fit(
    series=HR_train,
    covariates=NBPs_train)

### Predict heart rate series
### (note: forecast horizon "n" can be bigger than output_chunk_length if no covariates are used)

HR_predicted_single = rnn_model_single.predict(
    n=len(HR_test), # predict 66 values
    series=HR_train) # specifies what should be predicted (want to know what comes after HR_train)

HR_predicted_both = rnn_model_both.predict(
    n=len(HR_test),
    series=HR_train)

HR_predicted_cov = rnn_model_cov.predict(
    n=len(HR_test),
    series=HR_train,
    covariates=NBPs_train)
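
The note about the forecast horizon can be illustrated with a toy example. A model that emits `output_chunk_length` values per call can still reach a longer horizon `n` by feeding its own predictions back in as input. The sketch below is not the Darts implementation; `forecast_autoregressive` and `mean_model` are hypothetical stand-ins for a trained model.

```python
def forecast_autoregressive(history, predict_chunk, n, input_chunk_length):
    """Forecast n steps by repeatedly feeding predictions back as input."""
    extended = list(history)
    forecast = []
    while len(forecast) < n:
        # Predict the next chunk from the most recent input window
        chunk = predict_chunk(extended[-input_chunk_length:])
        forecast.extend(chunk)
        extended.extend(chunk)
    # The last chunk may overshoot the horizon, so cut to n values
    return forecast[:n]

def mean_model(window, output_chunk_length=11):
    """Toy stand-in for a trained model: repeats the mean of its input window."""
    mean = sum(window) / len(window)
    return [mean] * output_chunk_length

history = [float(v) for v in range(1, 15)]  # 14 hypothetical values
prediction = forecast_autoregressive(history, mean_model, n=66, input_chunk_length=14)
print(len(prediction))
```

With covariates this loop would also need covariate values for the time steps being predicted, which are not available here. That is presumably why the covariate model was given `output_chunk_length=66`.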

## Check Accuracy of Prediction with MAPE

In [None]:
# Look into predicted time series
print(HR_predicted_single)

In [None]:
### Calculate mean absolute percentage errors: MAPE < 20% is good

HR_mape_both = mape(HR_test, HR_predicted_both)
print(HR_mape_both)

HR_mape_cov = mape(HR_test, HR_predicted_cov)
print(HR_mape_cov) # better MAPE if fewer values are predicted

HR_mape_single = mape(HR_test, HR_predicted_single)
print(HR_mape_single)

In [None]:
# Plot prediction using only heart rate series
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_scaled.plot(label='Heart Rate - actual')
HR_predicted_single.plot(label='Heart Rate - predicted')

# Adjust texts
plt.legend()
plt.suptitle('Prediction of Heart Rate (LSTM Model and HR Series Only)', fontweight='bold')
plt.title(f'MAPE = {round(HR_mape_single, 2)}%')
plt.xlabel('Time')
plt.ylabel('Scaled Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_prediction_LSTM_single.png', dpi=1200)

In [None]:
# Plot prediction using both series
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_scaled.plot(label='Heart Rate - actual')
NBPs_scaled.plot(label='Blood Pressure - actual')
HR_predicted_both.plot(label='Heart Rate - predicted')

# Adjust texts
plt.legend()
plt.suptitle('Prediction of Heart Rate (LSTM Model and Both Series Used)', fontweight='bold')
plt.title(f'MAPE = {round(HR_mape_both, 2)}%')
plt.xlabel('Time')
plt.ylabel('Scaled Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_prediction_LSTM_both.png', dpi=1200)

In [None]:
# Plot prediction with covariate
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_scaled.plot(label='Heart Rate - actual')
NBPs_scaled.plot(label='Blood Pressure - actual')
HR_predicted_cov.plot(label='Heart Rate - predicted')

# Adjust texts
plt.legend()
plt.suptitle('Prediction of Heart Rate (LSTM Model and NBPs as Covariate)', fontweight='bold')
plt.title(f'MAPE = {round(HR_mape_cov, 2)}%')
plt.xlabel('Time')
plt.ylabel('Scaled Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_prediction_LSTM_cov.png', dpi=1200)

In [None]:
# Rescale
HR_predicted_cov_rescaled = HR_scaler.inverse_transform(HR_predicted_cov)
HR_test_rescaled = HR_scaler.inverse_transform(HR_test)
HR_mape_cov_rescaled = mape(HR_test_rescaled, HR_predicted_cov_rescaled)

# Plot rescaled prediction with covariate
sns.set_style('whitegrid')
plt.figure(figsize=(8,5))
HR_filled.plot(label='Heart Rate - actual')
NBPs_filled.plot(label='Blood Pressure - actual')
HR_predicted_cov_rescaled.plot(label='Heart Rate - predicted')

# Adjust texts
plt.legend()
plt.suptitle('Prediction of Heart Rate (LSTM Model and NBPs as Covariate)', fontweight='bold')
plt.title(f'MAPE = {round(HR_mape_cov_rescaled, 2)}%') # TODO: different MAPE than with scaled values?
plt.xlabel('Time')
plt.ylabel('Value')

plt.show()
#plt.savefig('../../plots/rnn/single_stay/HR_prediction_LSTM_cov_rescaled.png', dpi=1200)
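
Regarding the TODO in the cell above: MAPE divides each error by the actual value, so it changes when the values are shifted, as they are by min-max scaling. The following sketch shows this with made-up numbers. The `mape` and `min_max_scale` functions are simplified stand-ins written for this example, not the Darts functions, and the minimum and maximum are assumed values.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError('MAPE is undefined when an actual value is zero')
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def min_max_scale(values, low, high):
    """Map values linearly so that low becomes 0 and high becomes 1."""
    return [(v - low) / (high - low) for v in values]

# Hypothetical heart rates (bpm) and predictions
actual = [80.0, 90.0, 100.0]
predicted = [84.0, 90.0, 95.0]
low, high = 60.0, 110.0  # assumed min/max seen when fitting the scaler

mape_original = mape(actual, predicted)
mape_scaled = mape(min_max_scale(actual, low, high),
                   min_max_scale(predicted, low, high))
print(round(mape_original, 2), round(mape_scaled, 2))
```

The absolute errors are the same in both cases, but after scaling they are divided by much smaller actual values, so the percentage grows. For reporting, the MAPE on the rescaled values is probably the easier one to interpret.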

## Conclusion

Using only one ICU stay was mainly a test to learn about Darts. Since the results for such a small data set are reasonable, the next step is to use the NBPs series as a covariate with a larger subset of CHARTEVENTS, and therefore with more chunks.