# Classify Expired Patients using Outlier Detection

In our analysis of the MIMIC-III dataset, we observed a strong imbalance in the target variable: patients who expired within 30 days (EXPIRE_FLAG_30D=1) made up only 9.4% of the study population. Such an imbalance is a problem for predictive modeling, because standard classifiers tend to favor the majority class and detect the minority class poorly. To work around this, we tested whether expired patients could be detected as outliers in the data distribution. The underlying assumption is that patients who expire within 30 days show unusual patterns or feature values compared with surviving patients, so that unsupervised outlier detection methods would flag them as anomalies.
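To make the imbalance concrete, the sketch below uses the class counts reported for this dataset (12,702 survivors and 1,321 expired patients) and compares them with a trivial baseline that always predicts survival. The variable names are illustrative only and are not part of the original notebook.

```python
# Class counts reported for this dataset.
n_survived = 12702
n_expired = 1321
n_total = n_survived + n_expired

# Share of patients with EXPIRE_FLAG_30D=1.
prevalence = n_expired / n_total

# A trivial classifier that always predicts "survived" (class 0)
# gets high accuracy but never detects an expired patient.
baseline_accuracy = n_survived / n_total
baseline_recall_expired = 0 / n_expired

print(f"Prevalence of EXPIRE_FLAG_30D=1: {prevalence:.1%}")
print(f"Always-survive accuracy: {baseline_accuracy:.1%}")
print(f"Always-survive recall on expired: {baseline_recall_expired:.1%}")
```

This is why accuracy alone is not a useful metric here, and why the cells that follow report the Matthews correlation coefficient and per-class precision and recall.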


In [32]:
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.ensemble import IsolationForest
from sklearn.metrics import matthews_corrcoef
from sklearn.metrics import classification_report

In [28]:
data=pd.read_csv("all_patients_data_cleaned.csv")
data=data.drop(columns=["SUBJECT_ID","HADM_ID","ICUSTAY_ID","LOS_HOSPITAL","LOS_ICU","HOSPITAL_EXPIRE_FLAG"])

In [20]:
from sklearn.preprocessing import StandardScaler

train_data=data.drop(columns=["EXPIRE_FLAG_30D"])
train_data=StandardScaler(with_std=True).fit_transform(train_data)
train_label=data["EXPIRE_FLAG_30D"]

# Isolation Forest

In [27]:
clf = IsolationForest(random_state=505,max_features=0.5,bootstrap=True,n_estimators=1000).fit(train_data)
y_pred=clf.predict(train_data)
# y_pred=(y_pred<np.percentile(y_pred,0.9))
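# Note: predict() returns +1 for inliers and -1 for outliers,
# so this line maps inliers (+1) to the positive class (EXPIRE_FLAG_30D=1).
y_pred=(y_pred==1)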
print(matthews_corrcoef(train_label,y_pred))
print(classification_report(train_label,y_pred))

-0.11718182932076321
              precision    recall  f1-score   support

           0       0.75      0.04      0.08     12702
           1       0.09      0.87      0.16      1321

    accuracy                           0.12     14023
   macro avg       0.42      0.46      0.12     14023
weighted avg       0.69      0.12      0.08     14023
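The label convention of scikit-learn's outlier detectors matters when reading a report like the one above: `IsolationForest.predict` returns +1 for inliers and -1 for outliers. The following is a minimal sketch on synthetic data (not MIMIC-III) of how the outlier label could be mapped to the positive class instead; the data, seed, and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data: a dense cluster plus a few far-away points.
cluster = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = rng.uniform(low=8.0, high=10.0, size=(5, 2))
X = np.vstack([cluster, far_points])

forest = IsolationForest(random_state=0).fit(X)
raw = forest.predict(X)  # values are +1 (inlier) or -1 (outlier)

# Treat "outlier" as the positive class.
flagged_as_outlier = (raw == -1)

print("labels returned by predict():", np.unique(raw))
print("points flagged as outliers:", int(flagged_as_outlier.sum()))
```

With this mapping, a positive prediction means "flagged as an outlier", which is the reading the outlier hypothesis calls for.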



# One-Class SVM

In [9]:
from sklearn.svm import OneClassSVM

In [22]:
SVM_outlier_clf=OneClassSVM(nu=0.1)
SVM_outlier_clf.fit(train_data)
y_pred=SVM_outlier_clf.predict(train_data)
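# Note: predict() returns +1 for inliers and -1 for outliers,
# so this line maps inliers (+1) to the positive class (EXPIRE_FLAG_30D=1).
y_pred=(y_pred==1)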
print(matthews_corrcoef(train_label,y_pred))
print(classification_report(train_label,y_pred))

-0.13120446874072692
              precision    recall  f1-score   support

           0       0.79      0.09      0.16     12702
           1       0.08      0.78      0.15      1321

    accuracy                           0.15     14023
   macro avg       0.44      0.43      0.15     14023
weighted avg       0.72      0.15      0.16     14023
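In scikit-learn, the `nu` parameter of `OneClassSVM` is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, so `nu=0.1` should flag roughly 10% of the training points as outliers, close to the 9.4% prevalence of expired patients. The sketch below checks this on synthetic Gaussian data (not MIMIC-III); the data and seed are assumptions, and the exact fraction will vary with the data and solver tolerance.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic, already standardized data used only for illustration.
X = rng.normal(size=(500, 2))

model = OneClassSVM(nu=0.1).fit(X)
raw = model.predict(X)  # +1 for inliers, -1 for outliers

# Fraction of training points that fall outside the learned boundary.
outlier_fraction = float(np.mean(raw == -1))
print(f"fraction flagged as outliers: {outlier_fraction:.3f}")
```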



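Based on the results above, outlier detection techniques do not reliably identify expired patients: the Matthews correlation coefficient is close to zero for both models (-0.117 for Isolation Forest and -0.131 for One-Class SVM). Note that scikit-learn's predict method returns +1 for inliers and -1 for outliers, and the code above maps +1 to the positive class, so the negative sign reflects this label mapping. With outliers (-1) mapped to the positive class instead, the coefficients would be +0.117 and +0.131, which still indicates only a weak association between being flagged as an outlier and expiring within 30 days.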
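For reference, the Matthews correlation coefficient can be computed directly from confusion-matrix counts, and its sign depends on which predicted label is treated as positive. The sketch below uses hypothetical counts (not taken from this analysis) and a helper named `mcc`, introduced here only for illustration, to show that complementing every prediction flips the sign while leaving the magnitude unchanged.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, chosen only for illustration.
tp, fp, fn, tn = 30, 70, 20, 880

original = mcc(tp, fp, fn, tn)

# Complementing every prediction swaps TP with FN and FP with TN.
inverted = mcc(tp=fn, fp=tn, fn=tp, tn=fp)

print(f"original: {original:.3f}")
print(f"inverted: {inverted:.3f}")
```

Because only the sign changes, the magnitude of the coefficient is what indicates how strongly the outlier flag is associated with the outcome.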