# Notebook: ssp.primary.ipynb – Baseline Sepsis Survival Prediction & Feature Analysis
#### ----------------------------------------------------------------------------------------------

### Objective:
This notebook establishes an initial machine learning pipeline to predict in-hospital survival outcome (Alive = 1, Dead = 0) in patients diagnosed with sepsis, using logistic regression as a baseline model on an imbalanced dataset. The primary goals are:

- Investigate data structure and class imbalance in sepsis survival outcomes.
- Train and evaluate a logistic regression model with appropriate handling of class imbalance (e.g., class weights, resampling).
- Interpret feature importance via model coefficients (on comparably scaled inputs) to identify which clinical and demographic features are most strongly associated with survival.
- Establish performance benchmarks (AUC-ROC, PR-AUC, F1-score, calibration) for comparison with future advanced models.
- Provide interpretable insights into key drivers of mortality risk (e.g., age, comorbidities, vital signs, lab values).
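
The goals above can be illustrated with a minimal sketch. It runs on synthetic data from `make_classification`, not the sepsis dataset, so the feature names, the roughly 10% Dead rate, and every printed number are placeholders. Class weighting is only one of the imbalance strategies mentioned above, and the notebook's actual pipeline may differ.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    f1_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data, roughly 10% Dead (0) and 90% Alive (1).
X, y = make_classification(
    n_samples=2000,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    weights=[0.1, 0.9],
    random_state=42,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling makes coefficient magnitudes comparable across features;
# class_weight="balanced" reweights samples inversely to class frequency.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

proba_alive = model.predict_proba(X_test)[:, 1]  # P(Alive = 1)
pred = model.predict(X_test)

# All scores below treat label 1 (Alive) as the positive class.
metrics = {
    "roc_auc": roc_auc_score(y_test, proba_alive),
    "pr_auc": average_precision_score(y_test, proba_alive),
    "f1": f1_score(y_test, pred),
    "brier": brier_score_loss(y_test, proba_alive),  # lower is better
}

# Coefficients are log-odds per standard deviation; exp() gives odds ratios.
coefs = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=feature_names
)
coef_table = pd.DataFrame({"coefficient": coefs, "odds_ratio": np.exp(coefs)})
coef_table = coef_table.reindex(coefs.abs().sort_values(ascending=False).index)

print(pd.Series(metrics).round(3))
print(coef_table.round(3))
```

Two caveats apply when reading such output. Because Alive is the majority class, PR-AUC and F1 computed for label 1 can look strong even when deaths are poorly detected, so scoring the Dead class as well (for example with `pos_label=0` in `f1_score`) may be more informative. Class weighting also tends to shift predicted probabilities away from the observed event rate, so calibration is worth checking separately.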

This serves as the foundational analysis for the Sepsis Survival Prediction (SSP) project, prioritizing transparency, reproducibility, and clinical interpretability before progressing to complex models.

In [1]:
# Standard library
import warnings

# Silence FutureWarning noise from library version mismatches
warnings.filterwarnings("ignore", category=FutureWarning)

# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Preprocessing and model selection
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Resampling for class imbalance
from imblearn.over_sampling import ADASYN, SMOTE

# Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Evaluation metrics
from sklearn.metrics import (
    RocCurveDisplay,
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)