**Data Preprocessing**:

Normalization: Scale continuous features to a similar range to help the CNN model train more efficiently.

Reshape Data: CNNs are typically used for spatial data (e.g., images), but we can reshape each patient's record into a format that a 1D CNN can process. Each record becomes a sequence of feature values with a single channel (similar to a grayscale image, which has one channel instead of three RGB channels), so the input has shape (samples, features, 1).
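The two preprocessing steps above can be sketched with plain NumPy. This is only an illustration on a tiny made-up array (the values and the `_cnn` variable names are assumptions, not the assignment data): the statistics are computed on the training rows only and reused for the test rows, and a trailing channel axis is then added.

```python
import numpy as np

# Hypothetical toy data: 3 training records and 1 test record, 2 features each.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# Standardize with statistics from the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population standard deviation (ddof=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

# Add a trailing channel axis: (samples, features) -> (samples, features, 1).
X_train_cnn = X_train_scaled[..., np.newaxis]
X_test_cnn = X_test_scaled[..., np.newaxis]

print(X_train_cnn.shape, X_test_cnn.shape)
```

After scaling, each training column has mean 0 and standard deviation 1, while the test column values depend on how far they sit from the training mean.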

**Feature Selection**:
Given the wide range of features, including lab test results and categorical data, we will initially use all available features. Based on the model's performance, we can later perform feature importance analysis to refine our selection.
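One way to do the feature importance analysis mentioned above is permutation importance: shuffle one feature at a time and measure how much the error rises. The sketch below is a minimal illustration on synthetic data, with a least-squares linear model standing in for the trained network; the data, the `predict` helper, and the stand-in model are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 samples, 3 features; only feature 0 drives the target.
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# Fit a least-squares linear model as a placeholder for the trained network.
X_design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)


def predict(features):
    """Predict with the fitted linear model (weights plus intercept)."""
    return np.column_stack([features, np.ones(len(features))]) @ coef


def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


baseline = mse(y, predict(X))

# Permutation importance: shuffle one column at a time and record the rise in error.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(mse(y, predict(X_perm)) - baseline)

most_important = int(np.argmax(importances))
print(importances, most_important)
```

Features whose shuffling barely changes the error are candidates for removal; in practice the scores would be computed on held-out data rather than on the training set.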

**Model Architecture**:
Input Layer: Shape the input layer to match the dimensions of our preprocessed data.

Convolutional Layers: These learn local patterns across adjacent positions in the input. We can experiment with different numbers of filters and kernel sizes.

Pooling Layers: Use pooling to reduce the dimensions of the data progressively.

Flatten Layer: Flatten the output from the convolutional and pooling layers to a 1D vector for the final prediction.

Dense Layers: After flattening, use one or more dense layers for prediction.

Output Layer: Since we are predicting a continuous value (the time from hospital admission to ICU discharge), the output layer should have a single neuron with no activation (i.e., a linear output).
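To see how the layers above change the data dimensions, the output lengths can be worked out by hand. The sketch below assumes 'valid' padding, a convolution stride of 1, and a pooling stride equal to the pool size (the Keras defaults); the feature count of 40 is a hypothetical value, not the size of our dataset.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_output_length(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


n_features = 40  # hypothetical number of input features
filters = 64     # each filter produces one output channel

after_conv = conv1d_output_length(n_features, kernel_size=2)   # 39 positions
after_pool = maxpool1d_output_length(after_conv, pool_size=2)  # 19 positions
flattened = after_pool * filters                                # 19 * 64 = 1216 values

print(after_conv, after_pool, flattened)
```

The flattened size is what the first dense layer receives, so it grows linearly with the number of filters and shrinks with larger pooling windows.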

**Compilation and Training**:
Loss Function: Use Mean Squared Error (MSE), a standard loss for regression problems such as this one.

Optimizer: An optimizer like Adam is typically a good starting point.

Metrics: Track metrics like Mean Absolute Error (MAE) to monitor performance during training.
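The difference between the two quantities above is easiest to see on a small example. The numbers below are made up for illustration. The sketch also shows that MSE grows with the square of the target's unit, so the same errors expressed in seconds instead of days give a far larger loss.

```python
import numpy as np

# Hypothetical targets and predictions, measured in days.
y_true_days = np.array([1.0, 2.5, 4.0, 10.0])
y_pred_days = np.array([1.5, 2.0, 5.0, 7.0])

errors = y_pred_days - y_true_days          # [0.5, -0.5, 1.0, -3.0]
mse_days = float(np.mean(errors ** 2))      # (0.25 + 0.25 + 1 + 9) / 4 = 2.625
mae_days = float(np.mean(np.abs(errors)))   # (0.5 + 0.5 + 1 + 3) / 4 = 1.25

# The same errors expressed in seconds.
seconds_per_day = 86_400
errors_seconds = errors * seconds_per_day
mse_seconds = float(np.mean(errors_seconds ** 2))     # scales by 86_400 ** 2
mae_seconds = float(np.mean(np.abs(errors_seconds)))  # scales by 86_400

print(mse_days, mae_days, mse_seconds, mae_seconds)
```

MSE penalizes the single 3-day miss much more heavily than MAE does, which is why tracking both gives a fuller picture.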

**Evaluation and Refinement**:
Depending on the initial results, refine the model by adjusting its architecture, tuning hyperparameters, or revisiting the feature selection process.

In [None]:
import pandas as pd
import numpy as np

In [None]:
from google.colab import drive
drive.mount('/content/drive')

train_data = pd.read_excel('/content/drive/MyDrive/SPH 6004/Assignment 2/static_train_df.xlsx')
test_data = pd.read_excel('/content/drive/MyDrive/SPH 6004/Assignment 2/static_test_df.xlsx')

Mounted at /content/drive


In [None]:
los_icu_stats = train_data['los_icu'].describe()

los_icu_stats

count    14289.000000
mean         4.861710
std          6.045013
min          1.000000
25%          1.800000
50%          2.880000
75%          5.260000
max        101.730000
Name: los_icu, dtype: float64

In [None]:
# Parse the datetime columns and convert them to Unix epoch seconds
train_data['hosp_admittime'] = pd.to_datetime(train_data['hosp_admittime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
train_data['hosp_dischtime'] = pd.to_datetime(train_data['hosp_dischtime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
train_data['icu_intime'] = pd.to_datetime(train_data['icu_intime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
train_data['icu_outtime'] = pd.to_datetime(train_data['icu_outtime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
# train_data['los_icu'] = pd.to_datetime(train_data['los_icu'])

# Calculate target variable in training data (seconds from hospital admission to ICU discharge)
train_data['time_to_icu_discharge'] = train_data['icu_outtime'] - train_data['hosp_admittime']

test_data['hosp_admittime'] = pd.to_datetime(test_data['hosp_admittime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
test_data['hosp_dischtime'] = pd.to_datetime(test_data['hosp_dischtime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
test_data['icu_intime'] = pd.to_datetime(test_data['icu_intime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
test_data['icu_outtime'] = pd.to_datetime(test_data['icu_outtime'], format='%m/%d/%y %H:%M').astype('int64') // 10**9
# test_data['los_icu'] = pd.to_datetime(test_data['los_icu'])

# Calculate target variable in test data
test_data['time_to_icu_discharge'] = test_data['icu_outtime'] - test_data['hosp_admittime']
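Because the cell above stores timestamps as Unix epoch seconds, the target is measured in seconds, which produces very large squared errors. As an alternative sketch, the target could be computed directly from the parsed datetimes and expressed in days. The two-row frame and the `time_to_icu_discharge_days` column name below are hypothetical; only the column names `hosp_admittime` and `icu_outtime` and the timestamp format come from the notebook.

```python
import pandas as pd

# Hypothetical two-row frame using the same timestamp format as the notebook.
df = pd.DataFrame({
    "hosp_admittime": ["01/05/21 08:00", "03/10/21 12:30"],
    "icu_outtime": ["01/07/21 20:00", "03/11/21 00:30"],
})

admit = pd.to_datetime(df["hosp_admittime"], format="%m/%d/%y %H:%M")
icu_out = pd.to_datetime(df["icu_outtime"], format="%m/%d/%y %H:%M")

# Subtracting datetimes yields a Timedelta; convert it to fractional days.
df["time_to_icu_discharge_days"] = (icu_out - admit).dt.total_seconds() / 86_400

print(df["time_to_icu_discharge_days"].tolist())
```

A target in days is on the same scale as `los_icu`, which makes the loss values easier to interpret.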

In [None]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14289 entries, 0 to 14288
Data columns (total 48 columns):
 #   Column                                                           Non-Null Count  Dtype  
---  ------                                                           --------------  -----  
 0   id                                                               14289 non-null  int64  
 1   hosp_admittime                                                   14289 non-null  int64  
 2   hosp_dischtime                                                   14289 non-null  int64  
 3   icu_intime                                                       14289 non-null  int64  
 4   icu_outtime                                                      14289 non-null  int64  
 5   los_icu                                                          14289 non-null  float64
 6   icu_death                                                        14289 non-null  int64  
 7   gender                                  

In [None]:
numerical_features = train_data.select_dtypes(include=['float64', 'int64']).columns.drop(['id', 'hosp_admittime', 'hosp_dischtime', 'icu_intime',  'icu_outtime', 'los_icu', 'time_to_icu_discharge', 'weight_admit', 'height'])
X_train = train_data[numerical_features]
y_train = train_data['time_to_icu_discharge']

X_test = test_data[numerical_features]
y_test = test_data['time_to_icu_discharge']

In [None]:
numerical_features

Index(['icu_death', 'gender', 'admission_age', 'charlson_score',
       'atrial_fibrillation', 'malignant_cancer', 'chf', 'ckd', 'cld', 'copd',
       'diabetes', 'hypertension', 'ihd', 'stroke', 'icu_outcome',
       'race_encode_African', 'race_encode_Asian', 'race_encode_Caucasian',
       'race_encode_Hispanic', 'race_encode_Not Specified',
       'race_encode_South American', 'admission_type_DIRECT EMER.',
       'admission_type_DIRECT OBSERVATION', 'admission_type_ELECTIVE',
       'admission_type_EU OBSERVATION', 'admission_type_EW EMER.',
       'admission_type_OBSERVATION ADMIT',
       'admission_type_SURGICAL SAME DAY ADMISSION', 'admission_type_URGENT',
       'first_careunit_Cardiac Vascular Intensive Care Unit (CVICU)',
       'first_careunit_Coronary Care Unit (CCU)',
       'first_careunit_Medical Intensive Care Unit (MICU)',
       'first_careunit_Medical/Surgical Intensive Care Unit (MICU/SICU)',
       'first_careunit_Neuro Intermediate', 'first_careunit_Neuro Stepdo

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
# Reshape X_train_scaled and X_test_scaled for CNN input: (samples, features, 1 channel)
X_train_reshaped = X_train_scaled.reshape((X_train_scaled.shape[0], X_train_scaled.shape[1], 1))
X_test_reshaped = X_test_scaled.reshape((X_test_scaled.shape[0], X_test_scaled.shape[1], 1))

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam

In [None]:
model = Sequential([
    Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(X_train_scaled.shape[1], 1)),
    MaxPooling1D(pool_size=2),
    Dropout(0.5),
    Flatten(),
    Dense(50, activation='relu', kernel_regularizer=l2(0.001)),  # L2 regularization
    Dropout(0.5),
    Dense(1)
])

model.compile(optimizer=Adam(), loss='mse')

In [None]:
model.fit(X_train_reshaped, y_train, epochs=100, verbose=1)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.src.callbacks.History at 0x7873751019c0>

In [None]:
test_loss = model.evaluate(X_test_reshaped, y_test)
print(f'Test Loss: {test_loss}')

Test Loss: 1.933967011101016e+18


In [None]:
predictions = model.predict(X_test_reshaped)



In [None]:
predictions

array([[nan],
       [nan],
       [nan],
       ...,
       [nan],
       [nan],
       [nan]], dtype=float32)
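NaN predictions like the ones above can have several causes. One worth checking, assuming some feature columns contain missing values, is whether NaNs reach the network: `StandardScaler` keeps NaNs in its output rather than removing them. The sketch below uses a made-up array to count missing values per column and fill them with the column median; median imputation is just one simple option, not necessarily the right choice for this dataset.

```python
import numpy as np

# Hypothetical feature matrix with one missing entry in the last column.
X = np.array([[0.1, 1.2, np.nan],
              [0.4, 0.9, 2.0],
              [0.3, 1.1, 4.0]])

# Count missing values per column to locate the problem features.
nan_per_column = np.isnan(X).sum(axis=0)

# Fill each missing entry with the median of the observed values in its column.
col_medians = np.nanmedian(X, axis=0)
X_filled = np.where(np.isnan(X), col_medians, X)

print(nan_per_column, X_filled[0, 2])
```

The same check applies to the target column, since a single NaN in `y_train` is enough to turn the loss and the weights into NaN during training.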