
# Seasonality Detection and Modeling 
## COVID-19 Weekly Deaths – Germany  

---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ShamsaraE/time-series-medicine-biology-2026/blob/main/notebooks/03_Assignment_Seasonality_COVID_Germany.ipynb)

---

You must:

1. Prepare the data correctly
2. Identify the dominant frequency using spectral analysis
3. Estimate the seasonal period as s = 1 / f_max
4. Model seasonality using harmonic regression (scikit-learn)
5. Evaluate performance on the last seasonal cycle
6. Compare against a seasonal naive baseline
7. Interpret the results



# 1. Load and Prepare Data


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import periodogram

country = "Germany"

owid_url = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"

df = pd.read_csv(owid_url, parse_dates=["date"])

df = df[df["location"] == country][[
    "date",
    "new_deaths"
]].dropna()

df = df[df["date"] >= "2020-03-01"].reset_index(drop=True)

df = df.set_index("date")

# Weekly aggregation (reduce weekday reporting noise)
ts = df["new_deaths"].resample("W-MON").sum().asfreq("W-MON")

ts.head()



# Task 1 – Visual Exploration

1. Plot the weekly time series.
2. Describe:
   - Is seasonality visible?
   - Are there structural breaks?
   - Is variance constant over time?
3. Write a short interpretation.


In [None]:
# Your code here


# Task 2 – Identify Dominant Frequency

1. Center the time series (remove mean).
2. Compute the periodogram.
3. Plot the periodogram (exclude zero frequency).
4. Identify the dominant frequency f_max.
5. Compute the seasonal period (in weeks, since f_max is in cycles per week for weekly data):

   s = 1 / f_max

6. Is s approximately 52 weeks?
7. Is seasonality strong or weak?
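The steps above can be sketched on a synthetic series before applying them to the real data. The series `y_demo`, its 52-week cycle, and the `peak_share` measure are assumptions made for illustration only; `peak_share` is one informal way to judge strength, not a standard test.

```python
import numpy as np
from scipy.signal import periodogram

# Synthetic weekly series (assumption for illustration): 208 weeks, 52-week cycle
rng = np.random.default_rng(0)
n_weeks = 208
t = np.arange(n_weeks)
y_demo = 100 + 30 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, n_weeks)

# Center the series so the zero-frequency term does not dominate
y_centered = y_demo - y_demo.mean()

# fs=1.0 means one sample per week, so frequencies are in cycles per week
freqs, power = periodogram(y_centered, fs=1.0)

# Skip index 0 (zero frequency) before searching for the peak
f_max = freqs[1:][np.argmax(power[1:])]
s_est = 1.0 / f_max

# Informal strength measure: share of total power carried by the peak
peak_share = power[1:].max() / power[1:].sum()

print(f"f_max = {f_max:.4f} cycles/week, s = {s_est:.1f} weeks")
print(f"share of power at the peak: {peak_share:.2f}")
```

The periodogram only evaluates frequencies k / n, so the estimated period is restricted to the values n / k. With a short series this grid is coarse near 52 weeks, which is worth keeping in mind when judging whether s is "approximately 52".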


In [None]:
# Your code here


# Task 3 – Variance Stabilization

1. Apply the transformation log(1 + y) to the weekly series.
2. Plot the transformed series.
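As a small illustration of the transform, the sketch below applies it to a handful of made-up counts (the values in `counts` are hypothetical). It shows why the `1 +` matters: weeks with zero deaths stay finite, and `np.expm1` maps values back to the original scale.

```python
import numpy as np

# Hypothetical weekly death counts, including a zero week
counts = np.array([0.0, 3.0, 50.0, 1200.0, 6000.0])

y_log = np.log1p(counts)      # log(1 + y); defined at y = 0
recovered = np.expm1(y_log)   # inverse transform back to the count scale

print(np.round(y_log, 3))
```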


In [None]:
# Your code here


# Task 4 – Harmonic Seasonal Regression

Using your estimated seasonal period s:

Construct:

$
\sin\left(\frac{2\pi t}{s}\right), \quad
\cos\left(\frac{2\pi t}{s}\right)
$

Model:

$
y_t = \beta_0 + \beta_1 \sin\left(\frac{2\pi t}{s}\right) + 
\beta_2 \cos\left(\frac{2\pi t}{s}\right) + \varepsilon_t
$

Use scikit-learn LinearRegression.

Interpret the coefficients.
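A minimal sketch of this model, fitted to a synthetic log-scale series with known coefficients so that the recovered values can be checked. The series, the period `s_demo`, and the true coefficients (5.0, 0.8, -0.5) are assumptions for illustration, not estimates from the COVID data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series (assumption): intercept 5.0, sine coef 0.8, cosine coef -0.5
rng = np.random.default_rng(1)
s_demo = 52.0
t = np.arange(156)
angle = 2 * np.pi * t / s_demo
y_demo = 5.0 + 0.8 * np.sin(angle) - 0.5 * np.cos(angle) + rng.normal(0, 0.1, t.size)

# Design matrix with one sine and one cosine column
X = np.column_stack([np.sin(angle), np.cos(angle)])

# LinearRegression fits the intercept (beta_0) by default
model = LinearRegression().fit(X, y_demo)
beta0 = model.intercept_
beta1, beta2 = model.coef_

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, beta2 = {beta2:.2f}")
```

Here `beta0` is the mean level of the series, while `beta1` and `beta2` jointly determine the size and timing of the seasonal swing; neither harmonic coefficient is very informative on its own.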


In [None]:
# Your code here


# Task 5 – Seasonal Amplitude

Compute:

$
A = \sqrt{\beta_1^2 + \beta_2^2}
$

Interpret the magnitude of A.
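The amplitude comes from rewriting the two harmonic terms as a single shifted sine, β₁ sin θ + β₂ cos θ = A sin(θ + φ) with φ = atan2(β₂, β₁). The sketch below checks this identity numerically; the coefficient values are hypothetical.

```python
import numpy as np

beta1, beta2 = 0.8, -0.5          # hypothetical fitted coefficients

A = np.hypot(beta1, beta2)        # sqrt(beta1**2 + beta2**2)
phi = np.arctan2(beta2, beta1)    # phase shift of the combined sine wave

# Verify that the two-term form equals the single shifted sine
theta = np.linspace(0, 2 * np.pi, 200)
two_terms = beta1 * np.sin(theta) + beta2 * np.cos(theta)
one_term = A * np.sin(theta + phi)

# On the log(1 + y) scale, peak and trough differ by 2A,
# so 1 + y changes by a factor of exp(2A) between them
peak_to_trough_ratio = np.exp(2 * A)

print(f"A = {A:.3f}, phi = {phi:.3f} rad")
```

Because the model is fitted on the log(1 + y) scale, A is most naturally read as a multiplicative effect: the fitted seasonal peak of 1 + y is exp(2A) times the fitted trough.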


In [None]:
# Your code here


# Task 6 – Out-of-Sample Evaluation

1. Reserve the last s weeks (s rounded to the nearest integer) as the test set.
2. Fit the model on the training set only.
3. Forecast the last seasonal cycle.
4. Compute the MAE on the log scale.
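One possible layout of this evaluation, shown on a synthetic log-scale series. The helper `harmonic_features`, the series `y_log`, and the period `s_demo` are assumptions introduced for this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def harmonic_features(t, s):
    """Return sine and cosine columns for time index t and period s (hypothetical helper)."""
    angle = 2 * np.pi * np.asarray(t) / s
    return np.column_stack([np.sin(angle), np.cos(angle)])

# Synthetic log-scale series (assumption for illustration)
rng = np.random.default_rng(2)
s_demo = 52
t = np.arange(208)
y_log = 5.0 + 0.8 * np.sin(2 * np.pi * t / s_demo) + rng.normal(0, 0.2, t.size)

# Hold out the last seasonal cycle; the split is chronological, never shuffled
h = int(round(s_demo))
t_train, t_test = t[:-h], t[-h:]
y_train, y_test = y_log[:-h], y_log[-h:]

# Fit on the training window, then predict using the test time indices
model = LinearRegression().fit(harmonic_features(t_train, s_demo), y_train)
y_pred = model.predict(harmonic_features(t_test, s_demo))

mae_model = np.mean(np.abs(y_test - y_pred))
print(f"test MAE (log scale): {mae_model:.3f}")
```

The time index continues from the training window into the test window, which keeps the phase of the sine and cosine terms aligned with the fitted model.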


In [None]:
# Your code here


# Task 7 – Seasonal Naive Baseline

Define the seasonal naive forecast:

$\hat{y}_t = y_{t-s}$

Compute its MAE on the same test set and on the same (log) scale.

Compare it with the MAE of the harmonic regression.
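A minimal sketch of the seasonal naive forecast on a toy series with period 4. The function name `seasonal_naive_forecast` and the toy values are assumptions for illustration. When the horizon equals s, each forecast is simply the value observed s steps earlier.

```python
import numpy as np

def seasonal_naive_forecast(y_train, s, horizon):
    """Repeat the last s training values to cover the forecast horizon."""
    y_train = np.asarray(y_train, dtype=float)
    last_cycle = y_train[-s:]
    reps = int(np.ceil(horizon / s))
    return np.tile(last_cycle, reps)[:horizon]

# Toy series: a period-4 pattern that rises by 1 each cycle
y_demo = np.array([1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6], dtype=float)
s_demo = 4

y_train, y_test = y_demo[:-s_demo], y_demo[-s_demo:]
y_naive = seasonal_naive_forecast(y_train, s_demo, horizon=len(y_test))

mae_naive = np.mean(np.abs(y_test - y_naive))
print(y_naive, mae_naive)
```

In this toy case the baseline copies the previous cycle and therefore misses the upward shift between cycles, which is the kind of error it will also make when consecutive COVID waves differ in size.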


In [None]:
# Your code here


# Task 8 – MASE Evaluation

Compute:

MASE = Model MAE / Seasonal Naive MAE

Interpret:

- MASE < 1
- MASE ≈ 1
- MASE > 1
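The ratio and its three cases can be sketched as follows. The function names, the tolerance `tol` used to decide what counts as "approximately 1", and the example MAE values are all assumptions for illustration. The ratio follows the definition given in this task, with both MAEs computed on the test set.

```python
def mase_ratio(mae_model, mae_naive):
    """Ratio of model MAE to seasonal naive MAE, as defined in this task."""
    if mae_naive <= 0:
        raise ValueError("seasonal naive MAE must be positive")
    return mae_model / mae_naive

def interpret_mase(mase, tol=0.05):
    """Map a MASE value to one of the three cases; tol is an arbitrary choice."""
    if mase < 1 - tol:
        return "model beats the seasonal naive baseline"
    if mase > 1 + tol:
        return "model is worse than the seasonal naive baseline"
    return "model and baseline perform about equally"

# Hypothetical MAE values on the log scale
mase = mase_ratio(0.16, 0.25)
print(f"MASE = {mase:.2f}: {interpret_mase(mase)}")
```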


In [None]:
# Your code here