# Evaluating and Comparing Clinical Prediction Models

Time estimate: **20** minutes


## Objectives
After completing this lab, you will be able to:

- Explain why model evaluation is critical in healthcare.
- Explain true positives, false positives, true negatives, and false negatives.
- Visualize model errors using a confusion matrix heatmap.
- Compute clinical evaluation metrics step by step.
- Compare models using patient-safety–focused reasoning.



## What you will do in this lab

In this lab, you will evaluate clinical prediction models by examining confusion matrices and computing key performance metrics step by step.

You will:

- Train two simple clinical prediction models.
- Examine where each model succeeds and fails.
- Visualize prediction errors in an intuitive format.
- Compute precision, recall (sensitivity), and specificity manually.
- Compare models based on clinical priorities.



## Overview
In healthcare, evaluating a prediction model is not about finding the highest score.
It is about understanding **who is helped**, **who is missed**, and **who might be unnecessarily alarmed**.

This lab deliberately slows down the evaluation process so that you can clearly understand and explain each step to clinical stakeholders.



## About the dataset/environment
You will work with a small, synthetic, de-identified dataset representing a
binary clinical outcome such as readmission or complication risk.


## Setup

In [None]:
# This cell prepares the environment and loads the dataset
# Everything here is synthetic and safe for learning

# Import pandas to work with tabular clinical data
import pandas as pd

# Import tools for model training and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Import visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Load a synthetic patient-level dataset
# Each row represents a patient
data = pd.read_csv("https://machine-learning-for-healthcare-applications-f276df.gitlab.io/labs/lab5/patient_features_with_outcome.csv")

# Display the dataset so you know what you are working with
data.head()


## Step 1: Review the modeling dataset

In this step, you will look at the structure and basic statistics of the data.
This is similar to reviewing a cohort definition or registry extract
before doing any analysis.

**Why this matters in healthcare:**  
If you misunderstand the data, every downstream conclusion becomes unreliable.


In [None]:
# Show column names, data types, and missing values
data.info()

# Show summary statistics to understand typical ranges
data.describe()
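
Beyond `info()` and `describe()`, it often helps to check how common the outcome is, because a rare outcome changes how every later metric should be read. The sketch below uses a small made-up table, not the lab dataset; only the column name `outcome` is taken from the lab, and the `age` column and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy cohort; values are invented for illustration only
toy = pd.DataFrame({
    "age": [54, 61, 47, 72, 66, 58, 80, 49],
    "outcome": [0, 0, 0, 1, 0, 0, 1, 0],
})

# Share of patients in each outcome class (the shares sum to 1.0)
prevalence = toy["outcome"].value_counts(normalize=True).sort_index()
print(prevalence)

# Number of missing values in each column
missing = toy.isna().sum()
print(missing)
```

Running the same two checks on `data` would show whether the positive class is rare and whether any column needs attention before modeling.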



## Step 2: Separate features and outcome

Here you will separate:
- Patient characteristics (inputs)
- The clinical outcome you want to predict

**Why this matters in healthcare:**  
If the inputs contain information derived from the outcome (data leakage), the model looks more accurate during evaluation than it will be in practice.


In [None]:
# Create the feature matrix by removing the outcome column
X = data.drop(columns=["outcome"])

# Create the target variable
y = data["outcome"]

# Display the first rows of each to confirm separation
X.head(), y.head()



## Step 3: Split data into training and test sets

You will reserve some patients for evaluation only.
The model will never see these patients during training.

**Why this matters in healthcare:**  
Performance on held-out patients is the closest available estimate of how the model will behave on future patients.


In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42
)

# Display the size of each split
X_train.shape, X_test.shape
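
When the outcome is uncommon, a purely random split can leave the training and test sets with different outcome rates. One option, shown here as a sketch on made-up data rather than as the lab's required approach, is the `stratify` argument of `train_test_split`, which keeps the outcome rate approximately equal in both sets. The arrays `X_toy` and `y_toy` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced outcome: 20 positives among 100 patients
y_toy = np.array([1] * 20 + [0] * 80)
X_toy = np.arange(100).reshape(-1, 1)  # placeholder single feature

# stratify=y_toy asks for the same class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

print("Train prevalence:", y_tr.mean())
print("Test prevalence:", y_te.mean())
```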



## Step 4: Train a logistic regression model

You will start with logistic regression, a commonly used and interpretable model.

**Why this matters in healthcare:**  
Interpretable models are easier to explain, audit, and trust.


In [None]:
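# Create a logistic regression model
# max_iter is raised above the default of 100 so the solver has room to converge on unscaled features
log_model = LogisticRegression(max_iter=1000)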

# Train the model using training data
log_model.fit(X_train, y_train)
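
Part of what makes logistic regression interpretable is that each coefficient can be read as an odds ratio after exponentiation. The following is a minimal sketch on synthetic data from `make_classification`; the feature names are hypothetical and the numbers say nothing about the lab dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 4 features, 2 of them informative
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2,
    n_redundant=0, random_state=0
)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]  # hypothetical names

model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# exp(coefficient) is the multiplicative change in the odds of the outcome
# for a one-unit increase in that feature, holding the others fixed
odds_ratios = np.exp(model.coef_[0])
for name, ratio in zip(feature_names, odds_ratios):
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio above 1 means the feature raises the predicted odds of the outcome, and a value below 1 means it lowers them. The size of a ratio depends on the feature's units, so features on different scales are not directly comparable.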



## Step 5: Train a decision tree model

Next, you will train a decision tree model that can capture nonlinear patterns.

**Why this matters in healthcare:**  
More flexible models can capture complex patterns, but they can also overfit the training data and generalize poorly to new patients.


In [None]:
# Create a decision tree classifier
tree_model = DecisionTreeClassifier(random_state=42)

# Train the model
tree_model.fit(X_train, y_train)
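
A fully grown decision tree can memorize its training data, which is one way flexibility turns into unreliable behavior. The sketch below, on synthetic data with some label noise, compares an unconstrained tree with one limited by `max_depth=3`; the depth value is an arbitrary choice for illustration, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; flip_y mislabels a share of the samples as noise
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3,
    n_redundant=0, flip_y=0.1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Unconstrained tree versus a depth-limited tree
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)

for name, tree in [("deep", deep_tree), ("shallow", shallow_tree)]:
    print(name, "depth:", tree.get_depth(),
          "train acc:", tree.score(X_tr, y_tr),
          "test acc:", tree.score(X_te, y_te))
```

A large gap between training and test accuracy for the deep tree is the usual sign of overfitting.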



## Step 6: Generate predictions

Each model now predicts outcomes for the test patients.

**Why this matters in healthcare:**  
These predictions drive downstream clinical decisions.


In [None]:
# Generate predictions from logistic regression
y_pred_log = log_model.predict(X_test)

# Generate predictions from decision tree
y_pred_tree = tree_model.predict(X_test)

# Display predictions
y_pred_log, y_pred_tree
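
`predict` returns hard labels, which for these binary classifiers corresponds to a probability cutoff of about 0.5. Clinical teams sometimes prefer a lower cutoff so that fewer high-risk patients are missed. The sketch below shows the idea on synthetic data using `predict_proba`; the 0.3 threshold is a hypothetical value chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2,
    n_redundant=0, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Column 1 of predict_proba holds the predicted probability of class 1
risk = model.predict_proba(X_demo)[:, 1]

# Flag patients at two different cutoffs
flag_default = (risk >= 0.5).astype(int)
flag_cautious = (risk >= 0.3).astype(int)  # hypothetical, more sensitive cutoff

print("Flagged at 0.5:", flag_default.sum())
print("Flagged at 0.3:", flag_cautious.sum())
```

Lowering the cutoff can only keep or increase the number of flagged patients, trading more false alarms for fewer missed cases.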



## Step 7: Build confusion matrices

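A confusion matrix cross-tabulates predicted outcomes against true outcomes. In scikit-learn's `confusion_matrix`, rows are the actual classes and columns are the predicted classes.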

**Why this matters in healthcare:**  
It shows exactly where the model helps or fails patients.


In [None]:
# Create confusion matrix for logistic regression
cm_log = confusion_matrix(y_test, y_pred_log)

# Create confusion matrix for decision tree
cm_tree = confusion_matrix(y_test, y_pred_tree)

# Display both confusion matrices
cm_log, cm_tree



## Step 8: Extract true and false predictions

Each confusion matrix contains four important quantities:

- **True Positive (TP):** High-risk patient correctly identified  
- **False Positive (FP):** Low-risk patient incorrectly flagged  
- **False Negative (FN):** High-risk patient missed by the model  
- **True Negative (TN):** Low-risk patient correctly reassured  

**Why this matters in healthcare:**  
- False negatives can delay care  
- False positives increase workload and anxiety


In [None]:
# Extract values from logistic regression confusion matrix
tn_l, fp_l, fn_l, tp_l = cm_log.ravel()

# Extract values from decision tree confusion matrix
tn_t, fp_t, fn_t, tp_t = cm_tree.ravel()

# Display extracted values
tn_l, fp_l, fn_l, tp_l, tn_t, fp_t, fn_t, tp_t
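
The order returned by `ravel()` is easy to get wrong, so it can help to verify it on a case small enough to count by hand. This sketch assumes the outcome is coded 0 for low risk and 1 for high risk, which the lab does not state explicitly; the ten labels are made up.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for 10 patients (1 = high risk, 0 = low risk)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# labels=[0, 1] fixes the row and column order: row 0 = actual 0, row 1 = actual 1
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

# Flattening row by row gives TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Counting by hand: two high-risk patients are caught (TP), one is missed (FN), one low-risk patient is flagged (FP), and six are correctly left unflagged (TN).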



## Step 9: Visualize the confusion matrix using a heatmap

A heatmap provides an intuitive visual summary of model performance.

**Why this matters in healthcare:**  
Visuals help communicate results to non-technical stakeholders.


In [None]:
# Convert confusion matrix into a labeled DataFrame
cm_df = pd.DataFrame(
    cm_log,
    index=["Actual Low Risk", "Actual High Risk"],
    columns=["Predicted Low Risk", "Predicted High Risk"]
)

# Create a heatmap visualization
plt.figure()
sns.heatmap(cm_df, annot=True, fmt="d")

# Add labels and title
plt.title("Confusion Matrix – Logistic Regression")
plt.xlabel("Predicted Outcome")
plt.ylabel("Actual Outcome")

# Display the plot
plt.show()
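
Raw counts can be hard to compare when one class is much larger than the other. A common alternative is to divide each row by its total so every cell becomes a rate within the actual class. The sketch below uses hypothetical counts, not results from the lab models.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual (low, high), columns = predicted (low, high)
cm_toy = np.array([[50, 10],
                   [5, 15]])

# Divide each row by its row total so each row sums to 1
cm_rates = cm_toy / cm_toy.sum(axis=1, keepdims=True)
print(cm_rates.round(2))
```

In the normalized matrix, the top-left cell is the specificity and the bottom-right cell is the sensitivity. Such a matrix could be passed to `sns.heatmap` with `fmt=".2f"` in place of `fmt="d"`.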



## Step 10: Compute clinical evaluation metrics

You will now compute evaluation metrics manually:

- **Precision:** Of the patients flagged as high-risk, what fraction truly were high-risk?
- **Recall / Sensitivity:** Of the truly high-risk patients, what fraction were detected?
- **Specificity:** Of the truly low-risk patients, what fraction were correctly ruled out?


In [None]:
# Logistic regression metrics
# Each ratio falls back to 0 when its denominator is 0 (for example, no patients were flagged)
precision_log = tp_l / (tp_l + fp_l) if (tp_l + fp_l) > 0 else 0
recall_log = tp_l / (tp_l + fn_l) if (tp_l + fn_l) > 0 else 0
specificity_log = tn_l / (tn_l + fp_l) if (tn_l + fp_l) > 0 else 0

# Decision tree metrics
precision_tree = tp_t / (tp_t + fp_t) if (tp_t + fp_t) > 0 else 0
recall_tree = tp_t / (tp_t + fn_t) if (tp_t + fn_t) > 0 else 0
specificity_tree = tn_t / (tn_t + fp_t) if (tn_t + fp_t) > 0 else 0

# Display all metrics
print("Logistic Regression Metrics")
print("Precision:", precision_log)
print("Recall (Sensitivity):", recall_log)
print("Specificity:", specificity_log)

print("\nDecision Tree Metrics")
print("Precision:", precision_tree)
print("Recall (Sensitivity):", recall_tree)
print("Specificity:", specificity_tree)
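
To check the manual formulas, it can be reassuring to run them on counts you choose yourself and compare against scikit-learn's built-in functions. The sketch below builds hypothetical labels with TP = 15, FN = 5, FP = 10, and TN = 70. It assumes 1 marks the high-risk class.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 20 high-risk patients followed by 80 low-risk patients
y_true = [1] * 20 + [0] * 80
# 15 high-risk caught, 5 missed, then 10 low-risk flagged, 70 correctly unflagged
y_pred = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 70

tp, fn, fp, tn = 15, 5, 10, 70

# Manual formulas, matching the definitions used in this step
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)

# Cross-check with scikit-learn; specificity is recall of the negative class
precision_sk = precision_score(y_true, y_pred)
recall_sk = recall_score(y_true, y_pred)
specificity_sk = recall_score(y_true, y_pred, pos_label=0)

print("Precision:", precision, precision_sk)
print("Recall (Sensitivity):", recall, recall_sk)
print("Specificity:", specificity, specificity_sk)
```

scikit-learn has no dedicated specificity function, so the sketch computes it as the recall of class 0.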




## Step 11: Compare models from a clinical perspective

You will compare both models side by side.

**Why this matters in healthcare:**  
The best model depends on whether missed cases or false alarms are more harmful.


In [None]:
# Create a comparison table
comparison_df = pd.DataFrame({
    "Metric": ["Precision", "Recall (Sensitivity)", "Specificity"],
    "Logistic Regression": [precision_log, recall_log, specificity_log],
    "Decision Tree": [precision_tree, recall_tree, specificity_tree]
})

# Display comparison
comparison_df
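
One way to make the clinical trade-off explicit is to assign a relative cost to each error type and compare the totals. The sketch below is illustrative only: the function `weighted_error_cost`, the error counts for the two models, and the cost weights are all hypothetical, and real weights would need to come from clinical stakeholders.

```python
def weighted_error_cost(fn, fp, fn_cost, fp_cost):
    """Total cost of errors given a per-error cost for each error type."""
    return fn * fn_cost + fp * fp_cost

# Hypothetical error counts for two models on the same test set
model_a = {"fn": 5, "fp": 10}   # misses more cases, raises fewer alarms
model_b = {"fn": 2, "fp": 20}   # misses fewer cases, raises more alarms

# Scenario 1: a missed case is treated as 5 times as harmful as a false alarm
cost_a_strict = weighted_error_cost(**model_a, fn_cost=5, fp_cost=1)
cost_b_strict = weighted_error_cost(**model_b, fn_cost=5, fp_cost=1)

# Scenario 2: both error types are treated as equally harmful
cost_a_equal = weighted_error_cost(**model_a, fn_cost=1, fp_cost=1)
cost_b_equal = weighted_error_cost(**model_b, fn_cost=1, fp_cost=1)

print("FN weighted 5x:", cost_a_strict, cost_b_strict)
print("Equal weights: ", cost_a_equal, cost_b_equal)
```

With these made-up numbers, model B has the lower cost when missed cases are weighted heavily, and model A has the lower cost when the two errors count equally. The preferred model changes with the clinical priority, not with the models themselves.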


## Congratulations!

You have successfully completed this lab on evaluating and comparing clinical prediction models. You practiced using confusion matrices and performance metrics such as recall, precision, and specificity, and visualized model performance to better understand how predictions are evaluated in healthcare.


## Authors
Ramesh Sannareddy

<br>

© SkillUp. All rights reserved.

Materials may not be reproduced in whole or in part without written permission from SkillUp.