
# Time–Frequency Analysis for Biomedical Time Series  

---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](
https://colab.research.google.com/github/ShamsaraE/time-series-medicine-biology-2026/blob/main/notebooks/Dominant_Frequency_FFT_STFT_Biomedical.ipynb
)

---

## Dominant Frequency (FFT) and Time-Varying Frequency (STFT) 

This notebook includes:

- Clear definitions (frequency, period, sampling rate, Nyquist)
- Mathematical foundation (DFT/FFT, sine/cosine representation, Euler relation)
- Dominant frequency estimation 
- Why we detrend and why we ignore frequency 0 (DC component)
- Biomedical interpretation (epidemiology, circadian, CGM, HRV)
- When FFT is not appropriate
- STFT intuition (“window-by-window”), spectrogram definition
- STFT mathematical definition, basis functions, conceptual differences
- STFT implementation + parameter effects (window size tradeoff)
- When STFT is not enough (motivation for wavelets)

---

## Learning objectives

1. Convert between **frequency** and **period** and interpret units correctly.
2. Compute an FFT-based spectrum and extract the **dominant frequency**.
3. Explain why **detrending** and skipping **frequency 0** are often necessary.
4. Explain and compute the **STFT** and interpret a **spectrogram**.
5. Choose FFT vs STFT for biomedical problems and understand limitations.


# 0. Quick Glossary with Concrete Examples

Let a discrete-time signal be $x_t$, for $t=0,1,\dots,N-1$.

Each definition is paired with a concrete biomedical example.

---

## 1 Sampling Interval $d$

**Definition**

Sampling interval $d$ = time between two consecutive measurements.

---

### Example 1 — CGM (Continuous Glucose Monitor)

CGM measures glucose every 5 minutes.

So:

$
d = 5 \text{ minutes}
$

In hours:

$
d = \frac{5}{60} = 0.0833 \text{ hours}
$

In days:

$
d = \frac{5}{1440} \approx 0.00347 \text{ days}
$


---

### Example 2 — Epidemiology

Weekly infection data:

$
d = 1 \text{ week}
$

Monthly hospital admissions:

$
d = 1 \text{ month}
$

---

## 2 Sampling Frequency $f_s$

**Definition**

$
f_s = \frac{1}{d}
$

It tells us how many samples we collect per unit time.

---

### CGM Example

If $d = 5$ minutes:

$
f_s = \frac{1}{5 \text{ minutes}}
$

In samples per hour:

$
f_s = \frac{60}{5} = 12 \text{ samples/hour}
$

In samples per day:

$
f_s = 12 \times 24 = 288 \text{ samples/day}
$

---

### Weekly epidemiology example

If data is weekly:

$
f_s = 1 \text{ sample/week}
$
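
The unit conversions above can be checked with a few lines of arithmetic. This is a minimal sketch for the CGM example; the variable names are illustrative and not part of the notebook's later cells.

```python
# Sampling interval of a CGM, expressed in several time units.
d_minutes = 5.0
d_hours = d_minutes / 60.0          # about 0.0833 hours
d_days = d_minutes / (60.0 * 24.0)  # about 0.00347 days

# Sampling frequency is the reciprocal of the sampling interval,
# so its unit is "samples per <unit of d>".
fs_per_hour = 1.0 / d_hours  # 12 samples/hour
fs_per_day = 1.0 / d_days    # 288 samples/day

print(f"d   = {d_hours:.4f} hours = {d_days:.5f} days")
print(f"f_s = {fs_per_hour:.0f} samples/hour = {fs_per_day:.0f} samples/day")
```

The same signal has different numerical values of $d$ and $f_s$ depending on the chosen time unit, so the unit should always be stated next to the number.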

---

## 3 Frequency $f$

**Definition**

Frequency = number of cycles per unit time.

Unit depends on your time unit:

- cycles/day  
- cycles/week  
- cycles/hour  

---

### Circadian rhythm example

One cycle per 24 hours:

$
f = \frac{1}{24} = 0.0417 \text{ cycles/hour}
$

If working in days:

$
f = 1 \text{ cycle/day}
$

---

### Annual epidemic cycle

One cycle per year:

If time unit = weeks:

$
f = \frac{1}{52} = 0.0192 \text{ cycles/week}
$

---

## 4 Period $P$

**Definition**

$
P = \frac{1}{f}
$

Period = time needed to complete one full cycle.

---

### Example

If FFT gives:

$
f = 0.0833 \text{ cycles/month}
$

Then:

$
P = \frac{1}{0.0833} \approx 12 \text{ months}
$

That means annual seasonality.
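
The frequency and period examples above can be reproduced numerically. The helper `period_from_frequency` is a hypothetical name introduced only for this illustration, and the annual example assumes 52 weeks per year, as in the text.

```python
def period_from_frequency(f):
    """Return the period P = 1/f, in the time unit that f is expressed per."""
    if f <= 0:
        raise ValueError("frequency must be positive to define a period")
    return 1.0 / f

# Circadian rhythm: one cycle per 24 hours.
f_circadian_per_hour = 1.0 / 24.0                   # about 0.0417 cycles/hour
f_circadian_per_day = f_circadian_per_hour * 24.0   # 1 cycle/day

# Annual cycle seen in weekly data (assumes 52 weeks per year).
f_annual_per_week = 1.0 / 52.0                      # about 0.0192 cycles/week

# A spectral peak at 0.0833 cycles/month corresponds to roughly 12 months.
P_months = period_from_frequency(0.0833)

print(f"circadian: {f_circadian_per_hour:.4f} cycles/hour = {f_circadian_per_day:.1f} cycles/day")
print(f"annual:    {f_annual_per_week:.4f} cycles/week")
print(f"peak at 0.0833 cycles/month -> period of {P_months:.1f} months")
```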

---

## 5 Nyquist Frequency $f_{Nyq}$

$
f_{Nyq} = \frac{f_s}{2}
$

It is the highest frequency that can be represented unambiguously at sampling frequency $f_s$.

A component above $f_{Nyq}$ is not simply lost → it appears at a false, lower frequency (aliasing).

---

### CGM Example

We said:

$
f_s = 12 \text{ samples/hour}
$

So:

$
f_{Nyq} = 6 \text{ cycles/hour}
$

That means:

You cannot detect oscillations faster than one cycle every 10 minutes.

---

###  Why this matters

If physiology oscillates faster than Nyquist frequency,  
FFT will misinterpret it.

This is called **aliasing**.
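
A small numerical illustration of aliasing for the CGM sampling rate follows. The 10 cycles/hour oscillation is a hypothetical value chosen only because it lies above the Nyquist frequency; it is not a claim about glucose physiology.

```python
import numpy as np

fs = 12.0          # samples/hour (one CGM reading every 5 minutes)
f_nyq = fs / 2.0   # 6 cycles/hour

n = np.arange(48)  # 4 hours of sample indices
t_hours = n / fs   # sample times in hours

f_true = 10.0            # cycles/hour, above Nyquist (hypothetical)
f_alias = fs - f_true    # 2 cycles/hour, below Nyquist

fast = np.cos(2 * np.pi * f_true * t_hours)
slow = np.cos(2 * np.pi * f_alias * t_hours)

# At the sampling instants the two oscillations are indistinguishable,
# so a spectrum of the sampled data would show a peak at 2 cycles/hour.
indistinguishable = np.allclose(fast, slow)
print(f"Nyquist = {f_nyq} cycles/hour; samples identical: {indistinguishable}")
```

Once the data are sampled, no analysis can tell the two apart, which is why the sampling rate has to be chosen with the fastest expected oscillation in mind.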

---

## 6 DC Component (Frequency = 0)

The frequency $f = 0$ term is the sum of all samples, that is, $N$ times the mean $\bar{x}$:

$
X_0 = \sum_{t=0}^{N-1} x_t = N \bar{x}
$

It represents the baseline level.

---

### Biomedical Interpretation

In infection data:

- DC component = average number of cases

In CGM:

- DC component = mean glucose level

It is NOT an oscillation.

This is why when detecting dominant oscillation we ignore $f=0$.
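
The relation between the zero-frequency bin and the mean can be verified directly. The sketch below uses a synthetic glucose-like series; the baseline of 110, the amplitude and the noise level are arbitrary illustrative values.

```python
import numpy as np
from scipy.fft import fft

rng = np.random.default_rng(0)
N = 288  # one day of 5-minute samples
t = np.arange(N)

# Synthetic series: baseline + one slow cycle + noise (illustrative values).
glucose = 110.0 + 15.0 * np.sin(2 * np.pi * t / N) + rng.normal(0.0, 3.0, N)

X = fft(glucose)
dc_sum = X[0].real    # zero-frequency bin equals the sum of the samples
dc_mean = dc_sum / N  # dividing by N recovers the mean

# Subtracting the mean removes the DC component (up to rounding error).
dc_after_centering = abs(fft(glucose - glucose.mean())[0])

print(f"X[0]/N = {dc_mean:.3f}, mean = {glucose.mean():.3f}")
print(f"|X[0]| after centering = {dc_after_centering:.2e}")
```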

---



In [None]:

# Imports (core scientific stack)
import numpy as np
import matplotlib.pyplot as plt

# FFT utilities
from scipy.fft import fft, fftfreq

# STFT utilities
from scipy.signal import stft

np.random.seed(42)  # reproducibility for teaching



# 1. FFT / DFT: What are we computing?

## 1.1 From complex exponentials to sine/cosine (the “second equality” idea)

A key identity (Euler's formula) is:

$
e^{i\theta} = \cos(\theta) + i\sin(\theta)
$

This matters because Fourier methods decompose signals into sums of **cosines and sines**.
So, you can represent an oscillatory signal as:

$
x_t \approx \sum_{k} \left(a_k \cos(2\pi f_k t) + b_k \sin(2\pi f_k t)\right)
$

- $a_k, b_k$ are coefficients (how much cosine/sine at frequency $f_k$).
- This is often the most intuitive form: **“a combination of sine and cosine waves.”**

## 1.2 The Discrete Fourier Transform (DFT)

For a length-$N$ signal, the DFT is:

$
X_k = \sum_{t=0}^{N-1} x_t \, e^{-i 2\pi kt/N}, \quad k=0,1,\dots,N-1
$

- $X_k$ is a complex number describing amplitude and phase at frequency bin $k$.
- The FFT is simply a fast algorithm for computing the DFT.
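
To connect the DFT formula with the sine/cosine picture, the sketch below evaluates the sum directly and compares it with `scipy.fft.fft`. The function `naive_dft` and the test signal are illustrative assumptions; the coefficient formulas $a_k = 2\,\mathrm{Re}(X_k)/N$ and $b_k = -2\,\mathrm{Im}(X_k)/N$ hold for a real signal and a bin with $0 < k < N/2$.

```python
import numpy as np
from scipy.fft import fft

def naive_dft(x):
    """Direct O(N^2) evaluation of X_k = sum_t x_t * exp(-i 2 pi k t / N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    k = np.arange(N).reshape(-1, 1)  # column of bin indices
    t = np.arange(N).reshape(1, -1)  # row of time indices
    W = np.exp(-2j * np.pi * k * t / N)
    return W @ x

# Test signal: 3*cos + 4*sin at bin k0 (exactly k0 cycles in N samples).
N, k0 = 64, 5
t = np.arange(N)
x = 3.0 * np.cos(2 * np.pi * k0 * t / N) + 4.0 * np.sin(2 * np.pi * k0 * t / N)

X = naive_dft(x)
matches_fft = np.allclose(X, fft(x))

# Recover the cosine and sine coefficients from the complex bin value.
a_k0 = 2.0 * X[k0].real / N
b_k0 = -2.0 * X[k0].imag / N

print(f"naive DFT matches FFT: {matches_fft}")
print(f"a = {a_k0:.3f}, b = {b_k0:.3f}, amplitude = {np.hypot(a_k0, b_k0):.3f}")
```

The direct sum is only practical for short signals; the FFT returns the same numbers with far less computation.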

## 1.3 Frequency bins and units

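If your sampling interval is $d$, then bin $k$ (for $k = 0, 1, \dots, \lfloor N/2 \rfloor$) corresponds to the frequency:

$
f_k = \frac{k}{N d}
$

For a real signal, the bins above $N/2$ hold the mirrored negative frequencies and carry no extra information.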

So the **frequency unit** is “cycles per unit time.”
The **period** is:

$
P_k = \frac{1}{f_k}
$

In biomedical work, always interpret frequency and period in meaningful units: cycles/day, cycles/week, cycles/hour, etc.
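
The bin-to-frequency mapping can be checked against `scipy.fft.fftfreq`. This sketch assumes monthly data with $N = 144$ samples, matching the simulation used later in the notebook.

```python
import numpy as np
from scipy.fft import fftfreq

N, d = 144, 1.0  # 144 monthly samples, d = 1 month

freqs = fftfreq(N, d=d)  # all N bins, positive half first, then negative half
k = np.arange(N // 2)    # bins 0 .. N/2 - 1 (non-negative frequencies)
manual = k / (N * d)     # f_k = k / (N d)

bins_match = np.allclose(freqs[:N // 2], manual)

# Bin 12 corresponds to 12 cycles in 144 months, i.e. a 12-month period.
f_12 = manual[12]
P_12 = 1.0 / f_12

print(f"bins match formula: {bins_match}")
print(f"bin 12: f = {f_12:.5f} cycles/month, P = {P_12:.1f} months")
print(f"bin spacing: {1.0 / (N * d):.5f} cycles/month")
```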



# 2. Dominant frequency: definition and biomedical meaning

## 2.1 Definition (dominant frequency)

Let $X(f)$ be the Fourier transform (DFT/FFT result). The magnitude spectrum is:

$
|X(f)|
$

The **dominant frequency** is:

$
f^* = \arg\max_{f>0} |X(f)|
$

And the **dominant period** is:

$
P^* = \frac{1}{f^*}
$

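We restrict to $f>0$ for two reasons: negative frequencies are redundant for real signals (they mirror the positive ones), and $f=0$ is the DC component, which describes the baseline rather than an oscillation.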

## 2.2 Biomedical interpretation examples

- **Epidemiology**: weekly influenza cases often have $P \approx 52$ weeks (annual seasonality).
- **Circadian biology**: hormone rhythms often have $P \approx 24$ hours.
- **CGM glucose**: may show a daily rhythm ($\approx 24$ h) plus meal-related patterns (shorter periods).
- **Cardiology / HRV**: frequency bands correspond to physiological regulation (respiratory, autonomic).

Dominant frequency answers:  
- “What is the strongest repeating cycle in this data (after removing slow drift)?”



# 3. FFT dominant frequency: step-by-step (stationary example)

We simulate a *biomedical-style* time series with:
- A slow trend (e.g., gradual increase in incidence)
- A seasonal component with **true period = 12** time units
- Random noise (measurement + biological variability)

Then we recover the period from the spectrum.


In [None]:

# ------------------------------
# Create a stationary seasonal signal + trend
# ------------------------------

N = 144              # number of samples (e.g., 144 months = 12 years monthly)
d = 1.0              # sampling interval (1 month per sample here)
t = np.arange(N)     # discrete time index: 0..N-1

# Trend: slow increase (e.g., increasing incidence due to demographic changes)
trend = 2.0 * t + 100

# Seasonal component: true period P=12 (e.g., annual seasonality in monthly data)
P_true = 12
seasonal = 20.0 * np.sin(2*np.pi * t / P_true)

# Noise: measurement noise + unobserved variability
noise = np.random.normal(loc=0.0, scale=5.0, size=N)

# Observed signal
x = trend + seasonal + noise

plt.figure()
plt.plot(t, x)
plt.title("Simulated biomedical time series (trend + seasonality + noise)")
plt.xlabel("time index t (months)")
plt.ylabel("value")
plt.grid(True)
plt.show()



# 4. Why detrending matters (and how)

FFT is best interpreted when the signal is approximately **stationary** (no strong drift).  
A strong trend injects energy into very low frequencies and can dominate the spectrum.

A simple approach: **linear detrending** using least squares:

$
\hat{T}_t = \hat{\beta}_0 + \hat{\beta}_1 t
$
$
x_t^{(d)} = x_t - \hat{T}_t
$

We do this before FFT when our goal is to detect periodic components.


In [None]:

# ------------------------------
# Linear detrending (simple approach)
# ------------------------------

# Fit a line: x ≈ beta1*t + beta0
beta1, beta0 = np.polyfit(t, x, deg=1)

# Estimated trend
trend_hat = beta1 * t + beta0

# Detrended signal (remove slow drift)
x_detrended = x - trend_hat

# Visualize
plt.figure()
plt.plot(t, x, label="original")
plt.plot(t, trend_hat, label="estimated trend")
plt.title("Original signal and estimated linear trend")
plt.xlabel("t")
plt.ylabel("value")
plt.legend()
plt.grid(True)
plt.show()

plt.figure()
plt.plot(t, x_detrended)
plt.title("Detrended signal (used for FFT)")
plt.xlabel("t")
plt.ylabel("value")
plt.grid(True)
plt.show()
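
As a cross-check on the `np.polyfit` approach above, `scipy.signal.detrend` with `type="linear"` performs the same least-squares line removal in one call. The sketch regenerates a series like the simulated one so that it runs on its own; the random generator and seed are illustrative choices, so the noise differs from the notebook's.

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(42)
t = np.arange(144)

# Trend + 12-month seasonality + noise (same structure as the simulation above).
x = 2.0 * t + 100 + 20.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 5.0, t.size)

# Manual detrending: fit a line, subtract it.
beta1, beta0 = np.polyfit(t, x, deg=1)
x_polyfit = x - (beta1 * t + beta0)

# One-call alternative from SciPy.
x_scipy = detrend(x, type="linear")

same_result = np.allclose(x_polyfit, x_scipy)
print(f"polyfit and scipy.signal.detrend agree: {same_result}")
print(f"mean after detrending: {x_scipy.mean():.2e}")
```

Because the fitted line includes an intercept, the detrended series has a mean of essentially zero, so its DC component is already negligible.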



# 5. Why we ignore frequency 0 (the DC component)

In the FFT output, the **frequency $f=0$** term is proportional to the **mean level** of the signal (it equals $N$ times the mean).

- It is called the **DC component** (direct current), by analogy to electronics.
- It does **not** represent oscillation; it represents baseline/offset.

So when searching for the dominant oscillation, we typically maximize over $f>0$.



# 6. Compute FFT spectrum and extract dominant frequency

Steps:
1. Compute FFT of detrended signal: $X_k$  
2. Convert bins to frequencies $f_k$ using sampling interval $d$  
3. Keep positive frequencies  
4. Compute magnitude $|X_k|$  
5. Dominant frequency = argmax magnitude (excluding $f=0$)  
6. Convert to period $P=1/f$

We will print the estimated dominant period and compare to the true period (12).


In [None]:

# ------------------------------
# FFT-based dominant frequency detection
# ------------------------------

# FFT values (complex)
X = fft(x_detrended)

# Corresponding frequency bins in cycles per unit time (here: cycles/month)
freqs = fftfreq(N, d=d)

# Keep only positive frequencies (for a real signal the negative half mirrors the positive half)
mask_pos = freqs > 0
freqs_pos = freqs[mask_pos]
mag_pos = np.abs(X[mask_pos])  # magnitude spectrum

# Plot spectrum
plt.figure()
plt.plot(freqs_pos, mag_pos)
plt.title("Magnitude spectrum |FFT(x_detrended)|")
plt.xlabel("frequency (cycles per month)")
plt.ylabel("magnitude")
plt.grid(True)
plt.show()

# Dominant frequency (largest magnitude)
idx_star = np.argmax(mag_pos)
f_star = freqs_pos[idx_star]
P_star = 1.0 / f_star

print(f"Estimated dominant frequency f*: {f_star:.6f} cycles/month")
print(f"Estimated dominant period P*: {P_star:.2f} months")
print(f"True period: {P_true} months")
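
To see why the detrending step matters, the same peak search can be run on the raw series and on the detrended one. This is a self-contained sketch: `dominant_period` is a hypothetical helper, and the series is regenerated with its own seed, so the noise differs from the cells above.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import detrend

def dominant_period(x, d=1.0):
    """Period of the largest-magnitude bin with f > 0 (hypothetical helper)."""
    freqs = rfftfreq(len(x), d=d)[1:]  # drop f = 0
    mag = np.abs(rfft(x))[1:]          # drop the DC bin
    return 1.0 / freqs[np.argmax(mag)]

rng = np.random.default_rng(42)
t = np.arange(144)
x = 2.0 * t + 100 + 20.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 5.0, t.size)

P_raw = dominant_period(x)                                  # trend still present
P_detrended = dominant_period(detrend(x, type="linear"))    # trend removed

print(f"raw series:       dominant period = {P_raw:.1f} months")
print(f"detrended series: dominant period = {P_detrended:.1f} months")
```

On the raw series the peak falls in the lowest non-zero bin (a period equal to the record length), which reflects the trend rather than any real cycle. After detrending, the 12-month seasonality is recovered.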



# 7. When FFT is NOT appropriate (or needs caution)

FFT assumes the oscillations are reasonably stable over time (stationary or close).  
In biomedical data, FFT can mislead when:

1. **Changing periodicity**: cycle length changes over time (e.g., interventions changing epidemic waves).
2. **Transient oscillations**: oscillation appears only for a short period (e.g., short-term relapse pattern).
3. **Strong non-stationarity**: abrupt shifts, regime changes, structural breaks.
4. **Irregular sampling**: missing data, variable sampling intervals (standard FFT assumes regular sampling).
5. **Short signals**: poor frequency resolution (the bin spacing is $1/(Nd)$, so few samples mean widely spaced bins); the spectrum also becomes noisy.
6. **Aliasing**: sampling too slowly to capture the biological oscillation (violating Nyquist).

If your core question is:
 “How does the dominant frequency change over time?”
then you want **STFT** (next section).
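
The short-signal limitation (item 5) can be made concrete by listing which periods the FFT bins can represent. The helper `representable_periods` and the record lengths of 30 and 144 months are illustrative assumptions.

```python
import numpy as np
from scipy.fft import rfftfreq

def representable_periods(N, d=1.0):
    """Periods (in units of d) that fall exactly on an FFT bin with f > 0."""
    freqs = rfftfreq(N, d=d)[1:]  # skip f = 0
    return 1.0 / freqs

periods_short = representable_periods(30)   # 2.5 years of monthly data
periods_long = representable_periods(144)   # 12 years of monthly data

has_12_short = bool(np.any(np.isclose(periods_short, 12.0)))
has_12_long = bool(np.any(np.isclose(periods_long, 12.0)))

print("N = 30:  longest periods on the grid:", np.round(periods_short[:4], 2))
print("N = 144: 12-month period on the grid:", has_12_long)
print("N = 30:  12-month period on the grid:", has_12_short)
```

With 30 monthly samples the nearest bins are at 15 and 10 months, so an annual cycle is smeared across both. With 144 samples it lands exactly on a bin.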



# 8. STFT: Short-Time Fourier Transform (time-varying frequency)

## 8.1 Core idea: “window-by-window” Fourier analysis

Instead of computing one spectrum for the entire signal:
- FFT gives: **entire signal → one frequency spectrum**

STFT does:
- Split the signal into overlapping windows
- Compute an FFT **inside each window**
- Stack the results over time

So STFT produces a 2D object:
$
|X(t,f)|
$
which describes how frequency content changes over time.

## 8.2 Spectrogram: what is it?

A **spectrogram** is a visualization of STFT magnitude:
- x-axis: time (window position)
- y-axis: frequency
- color/intensity: magnitude (energy)

It answers:
 “Which frequencies are active at which times?”



# 9. STFT mathematics and what changes conceptually

## 9.1 Mathematical definition

For discrete-time signal $x[n]$, STFT is:

$
STFT(\tau, f) = \sum_{n=-\infty}^{\infty} x[n]\; w[n-\tau]\; e^{-i2\pi f n}
$

Where:
- $w[\cdot]$ is a window function (e.g., Hann/Hamming) centered around $\tau$.
- The term $w[n-\tau]$ localizes the analysis around time $\tau$.

## 9.2 What changes compared to FFT?

FFT uses basis functions:
$
e^{i2\pi f t}
$
which extend over the entire time axis.

STFT uses *localized* basis functions:
$
w(t-\tau)\, e^{i2\pi f t}
$

So STFT is Fourier analysis that is localized in time.
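
The window-by-window idea can be written out by hand in a few lines. This is a simplified sketch, not the algorithm used by `scipy.signal.stft`: it applies no edge padding or scaling and measures phase relative to each window's start, so only the magnitudes are comparable in shape. The name `manual_stft` and the choices `nperseg=48`, `hop=24` are assumptions made for the illustration.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import get_window

def manual_stft(x, nperseg, hop, d=1.0):
    """Windowed FFT of overlapping frames (simplified, unscaled, no padding)."""
    w = get_window("hann", nperseg)                  # window function w[n]
    starts = np.arange(0, len(x) - nperseg + 1, hop) # frame start indices
    frames = np.stack([x[s:s + nperseg] * w for s in starts])
    Z = rfft(frames, axis=1).T                       # shape: (n_freqs, n_frames)
    freqs = rfftfreq(nperseg, d=d)
    centers = (starts + nperseg / 2) * d             # frame centers in time units
    return freqs, centers, Z

# Test signal: period 12 in the first half, period 6 in the second half.
rng = np.random.default_rng(42)
t = np.arange(300)
x = np.where(t < 150, np.sin(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 6))
x = 10.0 * x + rng.normal(0.0, 2.0, t.size)

freqs, centers, Z = manual_stft(x, nperseg=48, hop=24)

# Strongest non-DC frequency in each frame.
ridge = freqs[1:][np.argmax(np.abs(Z[1:, :]), axis=0)]

print("Z shape (n_freqs, n_frames):", Z.shape)
print(f"first frame: period = {1.0 / ridge[0]:.1f}")
print(f"last frame:  period = {1.0 / ridge[-1]:.1f}")
```

Each column of `Z` is an ordinary FFT of one windowed frame; stacking the columns gives the time–frequency picture.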



# 10. A non-stationary biomedical example (frequency changes over time)

We simulate a signal where:
- First half has period $P_1 = 12$
- Second half has period $P_2 = 6$

This mimics:
- an epidemic pattern that changes due to intervention,
- changes in behavior,
- changes in pathogen variants,
- or altered physiological rhythms over time.

We will compare FFT vs STFT on this example.


In [None]:

# ------------------------------
# Non-stationary signal: period changes over time
# ------------------------------

N2 = 300
d2 = 1.0                # sampling interval (arbitrary time units)
t2 = np.arange(N2)

P1, P2 = 12, 6

# First half: P1
x1 = 10.0 * np.sin(2*np.pi * t2[:150] / P1)

# Second half: P2
x2 = 10.0 * np.sin(2*np.pi * t2[150:] / P2)

# Concatenate + noise
x_ns = np.concatenate([x1, x2]) + np.random.normal(0, 2.0, size=N2)

plt.figure()
plt.plot(t2, x_ns)
plt.title("Non-stationary signal: period changes from 12 to 6")
plt.xlabel("time index")
plt.ylabel("value")
plt.grid(True)
plt.show()



# 11. FFT on the non-stationary signal (global-only view)

FFT will show you the overall frequency content aggregated across time.
It cannot tell you *when* each frequency occurs.

So you may see both frequencies present, but not their time localization.


In [None]:

# ------------------------------
# FFT on non-stationary signal
# ------------------------------

X_ns = fft(x_ns)
freqs_ns = fftfreq(N2, d=d2)

mask_pos_ns = freqs_ns > 0
freqs_ns_pos = freqs_ns[mask_pos_ns]
mag_ns_pos = np.abs(X_ns[mask_pos_ns])

plt.figure()
plt.plot(freqs_ns_pos, mag_ns_pos)
plt.title("FFT spectrum of non-stationary signal (global summary)")
plt.xlabel("frequency")
plt.ylabel("magnitude")
plt.grid(True)
plt.show()
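
The two peaks in the global spectrum can also be located programmatically. The sketch below regenerates a signal with the same structure (own seed, so different noise) and uses `scipy.signal.find_peaks`; the height threshold of half the maximum is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import find_peaks

rng = np.random.default_rng(42)
N, d = 300, 1.0
t = np.arange(N)

# Period 12 for t < 150, period 6 afterwards, plus noise.
x = np.where(t < 150, np.sin(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 6))
x = 10.0 * x + rng.normal(0.0, 2.0, N)

mag = np.abs(rfft(x))
freqs = rfftfreq(N, d=d)

# Local maxima of the magnitude spectrum above half of the global maximum.
peaks, _ = find_peaks(mag, height=0.5 * mag.max())
peak_periods = 1.0 / freqs[peaks]

print("peak frequencies:", np.round(freqs[peaks], 4))
print("peak periods:    ", np.round(peak_periods, 2))
```

Both periods appear in the list, but nothing in this output indicates that one belongs to the first half of the record and the other to the second half.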



# 12. STFT implementation (step-by-step)

We compute STFT using `scipy.signal.stft`.

Key parameters:

- `fs`: sampling frequency $f_s$. If the sampling interval is $d$, then $f_s = 1/d$.
- `nperseg`: window length (number of samples per window).
- `noverlap`: overlap between windows (often ~50% to get smooth time evolution).

Output:
- `f`: frequency axis
- `t`: time axis (window centers)
- `Zxx`: complex STFT values; magnitude is `np.abs(Zxx)`

We then plot a **spectrogram** (magnitude over time and frequency).


In [None]:

# ------------------------------
# STFT computation
# ------------------------------

fs = 1.0 / d2      # sampling frequency (samples per unit time)
nperseg = 50       # window size (tradeoff: time vs frequency resolution)
noverlap = 25      # overlap between windows

f_stft, t_stft, Zxx = stft(x_ns, fs=fs, nperseg=nperseg, noverlap=noverlap)

# Spectrogram = magnitude of STFT
S = np.abs(Zxx)

plt.figure()
plt.pcolormesh(t_stft, f_stft, S, shading='gouraud')
plt.title("STFT Spectrogram (time-varying frequency)")
plt.xlabel("time (window center)")
plt.ylabel("frequency (cycles per unit time)")
plt.colorbar(label="magnitude")
plt.ylim(0, 0.3)   # zoom to useful band for this synthetic example
plt.show()

print("STFT shapes:")
print("f_stft:", f_stft.shape, "| t_stft:", t_stft.shape, "| Zxx:", Zxx.shape)
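
Beyond the picture, the spectrogram can be reduced to a single curve: the strongest frequency in each window. This is a sketch under the same settings as above (`nperseg=50`, `noverlap=25`); the signal is regenerated with its own seed, and the names `ridge`, `early` and `late` are illustrative.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(42)
N, d = 300, 1.0
t = np.arange(N)
x = np.where(t < 150, np.sin(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 6))
x = 10.0 * x + rng.normal(0.0, 2.0, N)

f, tt, Z = stft(x, fs=1.0 / d, nperseg=50, noverlap=25)
S = np.abs(Z)

# Dominant non-DC frequency per window (row 0 is f = 0, so skip it).
ridge = f[1:][np.argmax(S[1:, :], axis=0)]

early = np.median(ridge[tt < 100])  # windows well before the change at t = 150
late = np.median(ridge[tt > 200])   # windows well after the change

print(f"early windows: f = {early:.3f} (true 1/12 = {1 / 12:.3f})")
print(f"late windows:  f = {late:.3f} (true 1/6  = {1 / 6:.3f})")
```

The estimates are quantized to the STFT frequency grid (spacing `fs / nperseg`), so they land on the nearest bin rather than on the exact true frequency.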



# 13. Window size tradeoff (the key teaching concept)

STFT has a fundamental tradeoff:

- Small window → good time localization, poor frequency resolution  
- Large window → good frequency resolution, poor time localization

This is often described as a time–frequency uncertainty principle:
$
\Delta t \cdot \Delta f \gtrsim \text{constant}
$

## Practical guidance (biomedical)
- **CGM** (rapid changes): smaller window might be needed to capture changing rhythms.
- **Epidemiology** (slow changes): larger window may be acceptable to estimate annual cycles.


In [None]:

# ------------------------------
# Demonstrate window size tradeoff by comparing two spectrograms
# ------------------------------

def plot_spectrogram(x, fs, nperseg, noverlap, title, fmax=None):
    f, tt, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    S = np.abs(Z)
    plt.figure()
    plt.pcolormesh(tt, f, S, shading='gouraud')
    plt.title(title)
    plt.xlabel("time")
    plt.ylabel("frequency")
    plt.colorbar(label="magnitude")
    if fmax is not None:
        plt.ylim(0, fmax)
    plt.show()

# Smaller window: better time resolution
plot_spectrogram(
    x_ns, fs=fs, nperseg=30, noverlap=15,
    title="Spectrogram with smaller window (nperseg=30): better time resolution",
    fmax=0.3
)

# Larger window: better frequency resolution
plot_spectrogram(
    x_ns, fs=fs, nperseg=100, noverlap=50,
    title="Spectrogram with larger window (nperseg=100): better frequency resolution",
    fmax=0.3
)
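
The tradeoff can also be put in numbers. As a rough guide (ignoring the extra smearing introduced by the window shape), the frequency bin spacing is `fs / nperseg` and each window spans `nperseg / fs` time units. The sketch tabulates both for the window sizes used in this notebook; the dictionary name `resolution` is illustrative.

```python
fs = 1.0  # samples per unit time, as in the example above

# Frequency gap between the two rhythms in the synthetic signal.
gap = 1.0 / 6.0 - 1.0 / 12.0  # about 0.0833 cycles per unit time

resolution = {}
for nperseg in (30, 50, 100):
    delta_f = fs / nperseg   # spacing between frequency bins
    delta_t = nperseg / fs   # time span covered by one window
    resolution[nperseg] = (delta_t, delta_f)
    print(f"nperseg={nperseg:3d}: window spans {delta_t:5.0f} time units, "
          f"bin spacing {delta_f:.4f}, gap = {gap / delta_f:.1f} bins")
```

The product of window span and bin spacing is the same for every row, which is the practical face of the uncertainty relation: a gain on one axis is paid for on the other.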



# 14. When should we use STFT? (decision guidance)

Use **FFT** when:
- You believe the dominant periodicity is stable (stationary-ish).
- You need a single global estimate of seasonality (e.g., annual cycle).

Use **STFT** when:
- Periodicity changes over time (non-stationary).
- You care about *when* a frequency appears/disappears.
- You suspect interventions / behavior / physiology changes the rhythm.

### Epidemiology example
- Variant changes, interventions, school terms → seasonality and wave frequency may change.
- STFT can reveal how a dominant frequency band strengthens or shifts through time.

### Physiology example (CGM, HR)
- Daily patterns change with sleep, meals, workdays vs weekends.
- STFT can show day-by-day rhythm variability.



# 15. When STFT is NOT enough

STFT still assumes that within each window, the signal is approximately stationary.

STFT can struggle when:
- Frequency changes very rapidly (within a window).
- Signal contains short bursts with sharp edges.
- You need multi-scale resolution: fine time for high frequencies, fine frequency for low frequencies.

In such cases, **Wavelet transforms** are often better for biomedical signals because they adapt resolution by scale.

