
# Session Project 02  
## Influenza Dynamics in Germany (RKI)  
### Age Groups, Temperature Coupling & Structural Break Analysis

---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ShamsaraE/time-series-medicine-biology-2026/blob/main/notebooks/02_Project_RKI_Student.ipynb)

---

## Dataset: RKI Influenza Surveillance (Germany)

Original column names (German) and their meaning:

| German Column     | English Meaning |
|------------------|----------------|
| Meldewoche       | Reporting week (ISO week, e.g. 2020-W01) |
| Region           | Region name |
| Region_Id        | Numeric region identifier |
| Altersgruppe     | Age group |
| Fallzahl         | Number of reported cases |
| Inzidenz         | Incidence (per 100,000) |

You will build weekly time series from **Fallzahl**.

---


## Part 0 — Load and Prepare Data (Provided)

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/robert-koch-institut/Influenzafaelle_in_Deutschland/main/IfSG_Influenzafaelle.tsv"
df = pd.read_csv(url, sep="\t")

# Keep only Germany
df = df[df["Region"] == "Deutschland"].copy()

# Convert ISO week to Monday date
df["date"] = pd.to_datetime(
    df["Meldewoche"] + "-1",
    format="%G-W%V-%u"
)

df = df.sort_values("date").set_index("date")

df.head()



# Part 1 — Construct Weekly Time Series

### Task 1.1
Construct a weekly time series of **total influenza cases** in Germany.

- Aggregate across age groups
- Ensure proper weekly frequency
- Plot the result

Questions:
- Are epidemic peaks symmetric?
- Do amplitudes remain constant over time?
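
As a hint for the aggregation step, here is a minimal sketch on a made-up toy table that only borrows the column names `Altersgruppe` and `Fallzahl` from the RKI file. The values, the variable names `toy` and `weekly_total`, and the choice to fill missing weeks with 0 are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Toy long-format table mimicking the RKI layout (all values are made up).
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-06", "2020-01-06", "2020-01-13", "2020-01-13", "2020-01-27"]
        ),
        "Altersgruppe": ["00-14", "15-59", "00-14", "15-59", "00-14"],
        "Fallzahl": [10, 20, 15, 25, 5],
    }
).set_index("date")

# Sum cases over all age groups within each reporting week.
weekly_total = toy.groupby(level="date")["Fallzahl"].sum()

# Enforce a regular Monday-anchored weekly index. Weeks missing from the
# table become NaN; filling with 0 assumes "no row" means "no cases".
weekly_total = weekly_total.asfreq("W-MON").fillna(0)

print(weekly_total)
```

Before summing the real data, it is worth inspecting `df["Altersgruppe"].unique()`: if the file also contains an all-ages row, adding every group together would count cases twice.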


In [None]:
# Your code here



### Task 1.2
Construct separate weekly time series for:

- 00-14  
- 15-59  
- 60+

Plot all series together.

Questions:
- Which age group peaks first?
- Which age group has highest variance?
- Which appears smoothest?
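
One possible way to obtain one column per age group is a pivot. The sketch below uses a made-up toy table; the age labels are taken from the task text and may not match the labels in the real file exactly, so treat them as an assumption and check `df["Altersgruppe"].unique()` first.

```python
import pandas as pd

# Toy long-format table (made-up values); column names follow the RKI file.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-06"] * 3 + ["2020-01-13"] * 3 + ["2020-01-20"] * 3
        ),
        "Altersgruppe": ["00-14", "15-59", "60+"] * 3,
        "Fallzahl": [10, 20, 5, 30, 45, 15, 25, 60, 40],
    }
)

# One row per week, one column per age group, on a regular weekly index.
by_age = toy.pivot_table(
    index="date", columns="Altersgruppe", values="Fallzahl", aggfunc="sum"
).asfreq("W-MON")

peak_week = by_age.idxmax()  # week of the maximum, per age group
variance = by_age.var()      # sample variance, per age group

print(by_age)
```

Calling `by_age.plot()` draws all columns on one axis. If the real file uses finer age bands than the three listed above, they would need to be mapped to the broader groups before pivoting.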


In [None]:
# Your code here



# Part 2 — Autocorrelation Structure

For:
- Total series
- Each age group

Compute:
- ACF (lags up to 120)
- PACF

Questions:
- Do you observe seasonal peaks near lag ≈ 52?
- Does short-lag ACF decay quickly or slowly?
- Which age group shows strongest persistence?
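
To make the two quantities concrete, the sketch below computes a sample ACF and PACF with plain NumPy on a synthetic series with a 52-week cycle. The helper names `sample_acf` and `sample_pacf` and the synthetic data are illustrative assumptions; in the notebook you may prefer `acf`/`pacf` from `statsmodels.tsa.stattools` or `plot_acf`/`plot_pacf` from `statsmodels.graphics.tsaplots`.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

def sample_pacf(x, max_lag):
    """PACF at lag k = last coefficient of a least-squares AR(k) fit."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    out = [1.0]
    for k in range(1, max_lag + 1):
        # Columns hold x[t-1], ..., x[t-k]; the target is x[t].
        X = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Synthetic weekly series: annual cycle plus noise (not real data).
rng = np.random.default_rng(0)
t = np.arange(520)
series = 10 + 5 * np.cos(2 * np.pi * t / 52) + rng.normal(0, 1, t.size)

acf = sample_acf(series, max_lag=120)
pacf = sample_pacf(series, max_lag=10)
print(acf[[1, 26, 52]], pacf[1:4])
```

For a strongly seasonal series like this one, the ACF is positive again around lag 52 and negative around lag 26, which is the pattern the first question asks you to look for.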


In [None]:
# Your code here



# Part 3 — Frequency Domain Analysis

Using a periodogram:

- Detect dominant frequency
- Convert frequency to period (in weeks)

Questions:
- Is the dominant period near 52 weeks?
- Do the dominant periods differ between age groups?
- Does the dominant period change during the COVID era?
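
A minimal periodogram sketch using only `numpy.fft`, applied to a synthetic series with a known 52-week cycle. The data and variable names are assumptions for illustration; `scipy.signal.periodogram` is an alternative you could use on the real series.

```python
import numpy as np

# Synthetic weekly series with an annual cycle (not real data).
rng = np.random.default_rng(1)
n = 520
t = np.arange(n)
series = 5 * np.cos(2 * np.pi * t / 52) + rng.normal(0, 1, n)

# Remove the mean so the zero-frequency bin does not dominate.
x = series - series.mean()

# Raw periodogram: squared magnitude of the FFT at non-negative frequencies.
power = np.abs(np.fft.rfft(x)) ** 2 / n
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per week (sampling step = 1 week)

# Dominant frequency, skipping the zero-frequency bin.
k = np.argmax(power[1:]) + 1
dominant_period = 1.0 / freqs[k]  # weeks per cycle

print(dominant_period)
```

Keep in mind that the frequency grid has spacing 1/n cycles per week. For a short window of roughly 100 to 150 weeks, neighbouring bins near the annual frequency correspond to periods that are many weeks apart, so an apparent shift in the dominant period can partly reflect this coarse resolution.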


In [None]:
# Your code here



# Part 4 — Temperature Coupling

Download daily mean temperature for Germany from Open-Meteo and aggregate it to weekly means.

Tasks:
1. Align temperature with the influenza series. `start_date` and `end_date` are already derived from the influenza index in the cell below; replace `flu` with the name of your own DataFrame.
2. Standardize both series
3. Compute cross-correlation for lags ±20 weeks

Questions:
- At what lag is correlation strongest?
- Does temperature lead flu or vice versa?
- Why might raw cross-correlation be misleading?


In [None]:
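import requests

# `flu` is a placeholder: replace it with your own weekly influenza DataFrame.
start_date = flu.index.min().strftime("%Y-%m-%d")
end_date = flu.index.max().strftime("%Y-%m-%d")

# Daily mean 2 m temperature at a single point in central Germany.
api_url = (
    "https://archive-api.open-meteo.com/v1/archive?"
    "latitude=51.1657&longitude=10.4515"
    f"&start_date={start_date}&end_date={end_date}"
    "&daily=temperature_2m_mean"
    "&timezone=Europe/Berlin"
)

response = requests.get(api_url, timeout=30)
response.raise_for_status()  # stop early on HTTP errors
weather_json = response.json()

temp_daily = pd.DataFrame(weather_json["daily"])
temp_daily["time"] = pd.to_datetime(temp_daily["time"])
temp_daily = temp_daily.set_index("time")

# The influenza index is labelled with the Monday that starts each ISO week
# (Monday to Sunday), so close and label each weekly bin on its left edge.
# Note: the last bin holds only the final Monday, because end_date is a Monday.
temp_weekly = (
    temp_daily["temperature_2m_mean"]
    .resample("W-MON", closed="left", label="left")
    .mean()
)

temp_weekly.plot(title="Weekly Mean Temperature (Germany, Open-Meteo)")
plt.show()

temp_weekly.head()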
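
Tasks 1 to 3 above can be sketched as follows on synthetic data. The series, the built-in 4-week lag, and the helper `cross_corr` are assumptions for illustration only. The sign convention used here is that a positive lag means the first series (temperature) leads the second (flu).

```python
import numpy as np
import pandas as pd

def cross_corr(x, y, max_lag):
    """Return lags and corr(x[t], y[t + lag]) for lag = -max_lag..max_lag.

    A positive lag means x leads y by that many steps.
    """
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    values = []
    for lag in lags:
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        values.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(values)

# Synthetic data: "flu" is minus the temperature of 4 weeks earlier, plus noise.
rng = np.random.default_rng(2)
n, true_lag = 520, 4
idx = pd.date_range("2012-01-02", periods=n, freq="W-MON")
t = np.arange(n + true_lag)
temp_full = 10 + 8 * np.cos(2 * np.pi * (t - 30) / 52) + rng.normal(0, 3, t.size)
temp = pd.Series(temp_full[true_lag:], index=idx, name="temp")
flu = pd.Series(-temp_full[:n] + rng.normal(0, 1, n), index=idx, name="flu")

# 1. Align on the shared weekly dates.
both = pd.concat([temp, flu], axis=1, join="inner").dropna()

# 2. Standardize to zero mean and unit variance.
z = (both - both.mean()) / both.std()

# 3. Cross-correlation for lags of -20..20 weeks.
lags, cc = cross_corr(z["temp"].to_numpy(), z["flu"].to_numpy(), max_lag=20)
best_lag = lags[np.argmax(np.abs(cc))]

print(best_lag)
```

The correlation coefficient itself does not depend on scale, so standardizing mainly helps when plotting both series on one axis.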



# Part 5 — Prewhitening

Fit an AR(p) model (e.g., p = 2) to both the influenza and the temperature series.

- Extract residuals
- Compute cross-correlation of residuals

Questions:
- How much does correlation magnitude change?
- What does this imply about seasonal confounding?
- Do age groups differ in residual coupling?
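
A rough sketch of the prewhitening idea, following the task's wording that each series gets its own AR(2) fit. The least-squares helper `ar_residuals` and the synthetic series are assumptions for illustration; on the real data, `AutoReg(series, lags=2).fit().resid` from `statsmodels.tsa.ar_model` is one alternative.

```python
import numpy as np

def ar_residuals(x, p):
    """Fit AR(p) with intercept by least squares and return the residuals."""
    x = np.asarray(x, dtype=float)
    # Design matrix: intercept, then x[t-1], ..., x[t-p]; target is x[t].
    X = np.column_stack(
        [np.ones(len(x) - p)] + [x[p - j : len(x) - j] for j in range(1, p + 1)]
    )
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Two synthetic series that share only an annual cycle (not real data).
rng = np.random.default_rng(3)
t = np.arange(520)
season = 5 * np.cos(2 * np.pi * t / 52)
flu_like = season + rng.normal(0, 1, t.size)
temp_like = -season + rng.normal(0, 1, t.size)

raw_corr = np.corrcoef(flu_like, temp_like)[0, 1]

res_flu = ar_residuals(flu_like, p=2)
res_temp = ar_residuals(temp_like, p=2)
resid_corr = np.corrcoef(res_flu, res_temp)[0, 1]

print(raw_corr, resid_corr)
```

Here the two noise terms are independent by construction, so any correlation that survives in the residuals comes from seasonal structure the AR(2) filter did not fully remove. The residual series are `p` observations shorter than the inputs, which matters when you align them for lagged cross-correlation.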


In [None]:
# Your code here



# Part 6 — Structural Break Analysis (COVID vs Post-COVID)

Define:

- COVID era: 2020-03-01 to 2022-06-30  
- Post-COVID: 2022-07-01 onward  

Tasks:
1. Compare mean and variance across eras
2. Compare ACF patterns
3. Compare dominant period

Questions:
- What changed during the COVID era?
- Why does the dominant period shift?
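
The era split itself can be done with label-based slicing on the date index. The sketch below uses a synthetic weekly series in which the COVID-era values are artificially scaled down; the data, the scaling factor, and the names `eras` and `summary` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series with an annual cycle (not real data).
rng = np.random.default_rng(4)
idx = pd.date_range("2015-01-05", "2024-12-30", freq="W-MON")
t = np.arange(len(idx))
cases = pd.Series(
    100 + 80 * np.cos(2 * np.pi * t / 52) + rng.normal(0, 5, len(idx)),
    index=idx,
)

# Artificially suppress the COVID era to mimic a structural break.
covid_mask = (idx >= pd.Timestamp("2020-03-01")) & (idx <= pd.Timestamp("2022-06-30"))
cases = cases.where(~covid_mask, cases * 0.1)

# Date-string slices on a DatetimeIndex include both endpoints.
eras = {
    "covid": cases.loc["2020-03-01":"2022-06-30"],
    "post_covid": cases.loc["2022-07-01":],
}

summary = pd.DataFrame(
    {
        name: {"mean": s.mean(), "variance": s.var(), "n_weeks": len(s)}
        for name, s in eras.items()
    }
).T

print(summary)
```

Any ACF or periodogram routine can then be applied to each slice in `eras` separately to compare autocorrelation patterns and dominant periods.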


In [None]:
# Your code here
