# Building Temporal Features for an Early Warning Model

Time estimate: **20** minutes


## Objectives
After completing this lab, you will be able to:
- Describe why temporal patterns matter in early warning systems.
- Engineer rolling, baseline, and trend-based features from time-stamped clinical data.
- Aggregate longitudinal data into model-ready patient features.



## What you will do in this lab

In this lab, you will organize and transform time-stamped clinical measurements into model-ready patient features.

You will:

- Review longitudinal vital signs data.
- Organize measurements by patient and time.
- Create recent-value, rolling-average, and trend features.
- Aggregate temporal features at the patient level.
- Practice feature engineering through guided exercises.



## Overview
Early warning models are built to detect **clinical deterioration early**.
Single measurements are often insufficient; instead, clinicians look for
**patterns over time** such as rising heart rate or falling blood pressure.

In this lab, you will learn how to translate raw time-stamped clinical data
into temporal features that reflect patient trajectories.



## About the dataset/environment
You will work with a **synthetic, de-identified longitudinal dataset**
containing repeated vital sign measurements for hospitalized patients.
Each patient has multiple observations over time.


## Setup

In [None]:
# This cell prepares the environment and loads a synthetic longitudinal dataset
# All data is artificial and safe for instructional purposes

# Import pandas for handling time-based clinical data
import pandas as pd

# Import numpy for numerical calculations
import numpy as np

# Set a random seed so results are reproducible
np.random.seed(42)

# Load a synthetic longitudinal vital signs dataset
# Each row represents one measurement at a specific point in time
data = pd.read_csv("https://machine-learning-for-healthcare-applications-f276df.gitlab.io/labs/lab4/longitudinal_vitals1.csv")

# Display the dataset so you understand its structure
data.head()
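
If the CSV cannot be downloaded, a small stand-in frame can help you follow along. The sketch below assumes the dataset has the four columns used later in this lab (`patient_id`, `timestamp`, `heart_rate`, `systolic_bp`). The name `example`, the value ranges, and the four-hour spacing are hypothetical and are not taken from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: column names are assumed from the steps in this lab
rng = np.random.default_rng(42)
n_patients, n_obs = 3, 5

# One measurement every 4 hours for each patient (an assumption for illustration)
base = pd.Timestamp("2024-01-01 08:00")
hours = np.tile(np.arange(n_obs) * 4, n_patients)

example = pd.DataFrame({
    "patient_id": np.repeat(np.arange(1, n_patients + 1), n_obs),
    "timestamp": base + pd.to_timedelta(hours, unit="h"),
    "heart_rate": rng.integers(60, 120, size=n_patients * n_obs),
    "systolic_bp": rng.integers(90, 150, size=n_patients * n_obs),
})

print(example.head())
```

Each row is one measurement, so a patient with five observations contributes five rows.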



## Step 1: Review longitudinal clinical data

You will begin by reviewing how measurements are recorded over time.
Each patient appears multiple times, once for each observation.

**Why this matters in healthcare:**  
Early warning signals often appear only when trends are examined.


In [None]:
# Display column names, data types, and missing values
data.info()



In [None]:
# Display the first few rows to understand layout
data.head()
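
A quick way to confirm that each patient appears multiple times is to count rows and measure the observation span per patient. This is a minimal sketch on a toy frame; `toy` and `summary` are illustrative names, and the column names are assumed to match the lab dataset.

```python
import pandas as pd

# Toy longitudinal data: patient 1 has three rows, patient 2 has two
toy = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 12:00", "2024-01-01 20:00",
        "2024-01-01 09:00", "2024-01-01 15:00",
    ]),
    "heart_rate": [78, 84, 91, 66, 70],
})

# Count measurements and find the first and last observation per patient
summary = toy.groupby("patient_id").agg(
    n_measurements=("heart_rate", "count"),
    first_seen=("timestamp", "min"),
    last_seen=("timestamp", "max"),
)

# Length of the observation window in hours
summary["span_hours"] = (
    (summary["last_seen"] - summary["first_seen"]).dt.total_seconds() / 3600
)

print(summary)
```

Patients with very few rows or a very short span give temporal features less to work with, which is worth knowing before you engineer them.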


## Step 2: Sort data by patient and time

Temporal analysis requires correct chronological ordering.
You must ensure that each patient's records are sorted by time.

**Why this matters in healthcare:**  
Incorrect ordering can reverse trends and hide deterioration.


In [None]:
# Sort data by patient identifier and timestamp
data = data.sort_values(by=["patient_id", "timestamp"])

# Display sorted data
data
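
Sorting only gives chronological order when the time column is a real date-time type. If `timestamp` were stored as text, the sort would be alphabetical instead. The toy example below illustrates the risk; the date format shown is an assumption for illustration and may not match the lab file.

```python
import pandas as pd

# Toy data with timestamps stored as text in month/day/year form (hypothetical format)
toy = pd.DataFrame({
    "patient_id": [1, 1, 1],
    "timestamp": ["1/9/2024 08:00", "1/10/2024 08:00", "1/11/2024 08:00"],
    "heart_rate": [80, 95, 110],
})

# Sorting text compares characters, so "1/10" and "1/11" come before "1/9"
text_order = toy.sort_values(["patient_id", "timestamp"])["heart_rate"].tolist()

# Parsing to datetime first restores true chronological order
toy["timestamp"] = pd.to_datetime(toy["timestamp"], format="%m/%d/%Y %H:%M")
time_order = toy.sort_values(["patient_id", "timestamp"])["heart_rate"].tolist()

print("sorted as text:    ", text_order)
print("sorted as datetime:", time_order)
```

In the text-sorted version the heart rate appears to fall at the end, even though it rose steadily. Checking `data.dtypes` before sorting is a cheap way to catch this.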



## Step 3: Create most-recent measurement features

One of the simplest temporal features is the most recent value,
which often reflects the patient's current state.

**Why this matters in healthcare:**  
Recent measurements frequently drive clinical action.


In [None]:
# Extract the most recent heart rate for each patient
latest_hr = data.groupby("patient_id")["heart_rate"].last()

# Extract the most recent systolic blood pressure for each patient
latest_sbp = data.groupby("patient_id")["systolic_bp"].last()

# Display latest measurements
latest_hr.head()

In [None]:
latest_sbp.head()
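
One detail worth knowing: the grouped `.last()` in pandas skips missing values by default, so it returns the last recorded value rather than the value in the last row. The sketch below contrasts the two on toy data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Patient A's final heart rate reading is missing
toy = pd.DataFrame({
    "patient_id": ["A", "A", "A", "B", "B"],
    "heart_rate": [80.0, 90.0, np.nan, 70.0, 72.0],
})

# .last() returns the last non-missing value in each group
last_recorded = toy.groupby("patient_id")["heart_rate"].last()

# .tail(1) keeps the final row of each group, missing or not
last_row = (
    toy.groupby("patient_id").tail(1).set_index("patient_id")["heart_rate"]
)

print(last_recorded)
print(last_row)
```

Which behavior you want depends on the use case. Carrying forward the last recorded value is convenient, but it can hide the fact that the most recent reading was not taken.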


## Step 4: Create rolling average features

Rolling averages smooth out short-term noise
and highlight sustained changes.

**Why this matters in healthcare:**  
Sustained deterioration is often more clinically meaningful than isolated spikes.


In [None]:
# Compute rolling average heart rate over the last 3 measurements per patient
# (the first two rows for each patient are NaN because the window is not yet full)
data["hr_rolling_mean_3"] = (
    data.groupby("patient_id")["heart_rate"]
        .rolling(window=3)
        .mean()
        .reset_index(level=0, drop=True)
)

# Display updated data
data
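
To see what the rolling window does, it helps to work through a few numbers by hand. The sketch below uses a toy patient and also shows `min_periods=1`, a pandas option that returns a partial-window average instead of a missing value. The column names `strict` and `lenient` are illustrative.

```python
import pandas as pd

# One toy patient with four heart rate readings
toy = pd.DataFrame({
    "patient_id": [1, 1, 1, 1],
    "heart_rate": [80.0, 86.0, 92.0, 110.0],
})

grouped = toy.groupby("patient_id")["heart_rate"]

# Default behavior: no value until three readings are available
toy["strict"] = grouped.transform(lambda s: s.rolling(window=3).mean())

# min_periods=1 averages whatever readings exist so far
toy["lenient"] = grouped.transform(
    lambda s: s.rolling(window=3, min_periods=1).mean()
)

print(toy)
# strict:  NaN, NaN, 86.0, 96.0
# lenient: 80.0, 83.0, 86.0, 96.0
```

The third value is (80 + 86 + 92) / 3 = 86, and the fourth is (86 + 92 + 110) / 3 = 96. The single jump to 110 moves the rolling mean by only 10, which is the smoothing effect described above.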



## Step 5: Compute change-from-baseline features

You will compare current values to a patient's baseline
to detect deviation from their normal state.

**Why this matters in healthcare:**  
Deviation from baseline often signals emerging risk.


In [None]:
# Identify baseline heart rate as the first measurement per patient
baseline_hr = data.groupby("patient_id")["heart_rate"].transform("first")

# Compute change from baseline
data["hr_change_from_baseline"] = data["heart_rate"] - baseline_hr

# Display updated data
data
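
The toy example below shows why a baseline matters: two patients reach the same heart rate of 100, but only one of them has moved far from where they started. The relative change column is an optional extra added here for illustration and is not part of the lab's feature set.

```python
import pandas as pd

# Patient A starts at 60, patient B starts at 95; both end at 100
toy = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "heart_rate": [60, 100, 95, 100],
})

# Baseline is the first measurement per patient, repeated on every row
baseline = toy.groupby("patient_id")["heart_rate"].transform("first")

# Absolute change from baseline, as in the step above
toy["hr_change_from_baseline"] = toy["heart_rate"] - baseline

# Illustrative extra: change expressed as a percentage of baseline
toy["hr_pct_change_from_baseline"] = 100 * toy["hr_change_from_baseline"] / baseline

print(toy)
```

Patient A's latest reading is 40 beats per minute above baseline, while patient B's is only 5 above, even though the raw values are identical.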



## Step 6: Create trend features

Trend features capture the direction and speed of change.
Here, you will calculate differences between consecutive measurements.

**Why this matters in healthcare:**  
Rapid worsening can indicate impending deterioration.


In [None]:
# Compute heart rate change between consecutive measurements
data["hr_delta"] = (
    data.groupby("patient_id")["heart_rate"]
        .diff()
)

# Display updated data
data
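
A consecutive difference captures direction, but it ignores how much time passed between readings. If measurements are unevenly spaced, dividing by the elapsed time gives a rate. This is a sketch under the assumption that `timestamp` is a datetime column; `hours_elapsed` and `hr_delta_per_hour` are hypothetical column names.

```python
import pandas as pd

# Readings at 08:00, 09:00, and 13:00: the second gap is four times longer
toy = pd.DataFrame({
    "patient_id": [1, 1, 1],
    "timestamp": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 13:00",
    ]),
    "heart_rate": [80, 90, 100],
})

by_patient = toy.groupby("patient_id")

# Change between consecutive measurements, as in the step above
toy["hr_delta"] = by_patient["heart_rate"].diff()

# Hours between consecutive measurements
toy["hours_elapsed"] = by_patient["timestamp"].diff().dt.total_seconds() / 3600

# Rate of change in beats per minute per hour
toy["hr_delta_per_hour"] = toy["hr_delta"] / toy["hours_elapsed"]

print(toy)
```

Both steps show a rise of 10, but the first happened within one hour (10 per hour) and the second took four hours (2.5 per hour).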



## Step 7: Aggregate temporal features at the patient level

Early warning models typically operate on one row per patient.
You will now summarize each patient's recent trajectory.

**Why this matters in healthcare:**  
Models require concise summaries of patient trends.


In [None]:
# Aggregate temporal features to patient level
patient_features = data.groupby("patient_id").agg(
    latest_hr=("heart_rate", "last"),
    latest_sbp=("systolic_bp", "last"),
    avg_hr_rolling=("hr_rolling_mean_3", "last"),
    hr_change_baseline=("hr_change_from_baseline", "last"),
    avg_hr_trend=("hr_delta", "mean")
)

# Display patient-level features
patient_features
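
Patients with short histories can produce missing values at this stage. The sketch below runs a reduced version of the aggregation on toy data, where one patient has only two readings. The `n_hr_measurements` column is an illustrative addition that makes such cases easy to spot.

```python
import pandas as pd

# Patient 1 has four readings; patient 2 has only two
toy = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2],
    "heart_rate": [80.0, 86.0, 92.0, 110.0, 70.0, 74.0],
})

grouped = toy.groupby("patient_id")["heart_rate"]
toy["hr_rolling_mean_3"] = grouped.transform(lambda s: s.rolling(3).mean())
toy["hr_delta"] = grouped.diff()

# One row per patient, with a count of readings alongside the features
features = toy.groupby("patient_id").agg(
    n_hr_measurements=("heart_rate", "count"),
    latest_hr=("heart_rate", "last"),
    avg_hr_rolling=("hr_rolling_mean_3", "last"),
    avg_hr_trend=("hr_delta", "mean"),
)

print(features)
```

Patient 2 never fills a three-reading window, so `avg_hr_rolling` is missing for that patient. Many modeling libraries reject missing values, so you would need to decide how to handle them before training.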



## Step 8: Review feature table for modeling

You will review the final feature table to ensure it aligns
with clinical expectations and logic.

**Why this matters in healthcare:**  
Reviewing features helps prevent silent errors from entering models.


In [None]:
# Display structure and values of the final feature table
patient_features.info()
patient_features
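
A review can also include a few automated checks. The sketch below shows three: one row per patient, a count of missing values, and a plausibility range. The toy table and the bounds in `HR_PLAUSIBLE` are hypothetical values chosen for illustration and are not clinical guidance.

```python
import numpy as np
import pandas as pd

# Toy patient-level table with one missing value and one implausible value
features = pd.DataFrame(
    {
        "latest_hr": [110.0, 74.0, 400.0],
        "avg_hr_rolling": [96.0, np.nan, 120.0],
    },
    index=pd.Index([1, 2, 3], name="patient_id"),
)

# Illustrative bounds only; choose real limits with clinical input
HR_PLAUSIBLE = (20, 300)

# Check 1: each patient appears exactly once
assert features.index.is_unique

# Check 2: count missing values per feature
missing_per_feature = features.isna().sum()

# Check 3: list patients whose latest heart rate falls outside the bounds
out_of_range = features[~features["latest_hr"].between(*HR_PLAUSIBLE)]

print(missing_per_feature)
print(out_of_range)
```

Rows flagged this way are candidates for a closer look at the raw measurements, not automatic deletions.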


## Exercises

### Exercise 1: Compute the rolling average for systolic blood pressure
Compute the rolling average of systolic blood pressure for each patient, using a window size of 3, and store it in a new column.

In [None]:
# Your code goes here
# Compute rolling average for systolic blood pressure


<details>
<summary>Click here for a hint</summary>

Use a grouped rolling window, as you did for heart rate in Step 4.

</details>

<details>
<summary>Click here for solution</summary>

```python
data['sbp_rolling_mean_3'] = data.groupby('patient_id')['systolic_bp'].rolling(3).mean().reset_index(level=0, drop=True)
data.head()
```

</details>


### Exercise 2: Find the maximum heart rate for each patient
Extract the maximum heart rate recorded for each patient.

In [None]:
# Your code goes here


<details>
<summary>Click here for a hint</summary>

Group by `patient_id` and apply the `.max()` aggregation to the `heart_rate` column.

</details>

<details>
<summary>Click here for solution</summary>

```python
max_heart_rate_data = data.groupby('patient_id')['heart_rate'].max()
max_heart_rate_data.head()
```

</details>


### Exercise 3: Find the minimum systolic blood pressure for each patient
Extract the minimum systolic blood pressure recorded for each patient.

In [None]:
# Your code goes here


<details>
<summary>Click here for a hint</summary>

Group by `patient_id` and apply the `.min()` aggregation to the `systolic_bp` column.

</details>

<details>
<summary>Click here for solution</summary>

```python
min_sbp_data = data.groupby('patient_id')['systolic_bp'].min()
min_sbp_data.head()
```

</details>


### Exercise 4: Compute baseline for systolic blood pressure
Identify each patient's baseline systolic blood pressure, defined as their first recorded measurement.

In [None]:
# Your code goes here
# Identify baseline systolic blood pressure as the first measurement per patient


<details>
<summary>Click here for a hint</summary>

Use `.transform("first")` to get the baseline from the `systolic_bp` column.

</details>

<details>
<summary>Click here for solution</summary>

```python
baseline_sbp = data.groupby("patient_id")["systolic_bp"].transform("first")
baseline_sbp.head()
```

</details>

### Exercise 5: Compute change from baseline for systolic blood pressure
Compute the change in systolic blood pressure from the first recorded measurement for each patient.

In [None]:
# Your code goes here

# Compute change from the baseline


# Display updated data



<details>
<summary>Click here for a hint</summary>

Similar to how `hr_change_from_baseline` was computed, use `.transform("first")` to get the baseline and then subtract it from the `systolic_bp` column.

</details>

<details>
<summary>Click here for solution</summary>

```python
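# Recompute the baseline here so this cell also runs without Exercise 4
baseline_sbp = data.groupby("patient_id")["systolic_bp"].transform("first")
data["sbp_change_from_baseline"] = data["systolic_bp"] - baseline_sbp
data.head()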
```

</details>

## Congratulations!

You have successfully completed this lab on engineering temporal features
that support early warning and clinical risk detection. You practiced applying rolling, baseline, and trend-based operations to longitudinal data using a structured, step-by-step workflow.


## Authors
Ramesh Sannareddy

<br>

© SkillUp. All rights reserved.

Materials may not be reproduced in whole or in part without written permission from SkillUp.