# Standalone notebook: download UKHSA COVID-19 cases (England)

This notebook is a **standalone** version of the baseline workflow for observed data.

- It downloads UKHSA data using **plain `requests`** (no custom helper modules).
- It handles pagination by following the `next` link until there are no more pages.
- It saves a CSV and makes a simple time-series plot.

Output CSV:
- `../data/processed/observed/ukhsa_covid19_cases_by_day_england.csv`


In [None]:
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt
import requests


## 1) Download the dataset from the UKHSA API

This endpoint returns a paginated JSON response with:
- `results`: the rows on this page
- `next`: a URL for the next page (or `null` when finished)


In [None]:
url = (
    'https://api.ukhsa-dashboard.data.gov.uk/v2/themes/infectious_disease'
    '/sub_themes/respiratory/topics/COVID-19'
    '/geography_types/Nation/geographies/England'
    '/metrics/COVID-19_cases_casesByDay'
)

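rows = []
while url:
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors rather than parsing an error body
    data = response.json()
    rows += data['results']
    url = data['next']  # None (JSON null) on the last page, which ends the loop

len(rows), rows[0].keys()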
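
The pagination logic can be checked without touching the network. The sketch below is one possible way to do that, not part of the original workflow: `fetch_all_pages`, `get_json`, `max_pages`, and the two fake pages are hypothetical names and made-up values, with the fake pages shaped like the `results`/`next` payload described above.

```python
from typing import Callable, Optional


def fetch_all_pages(
    first_url: str,
    get_json: Callable[[str], dict],
    max_pages: int = 10_000,
) -> list:
    """Follow `next` links starting at first_url and collect every row in `results`."""
    rows: list = []
    url: Optional[str] = first_url
    pages = 0
    while url:
        # Guard against a response that keeps pointing at further pages forever.
        if pages >= max_pages:
            raise RuntimeError(f'stopped after {max_pages} pages; possible pagination loop')
        page = get_json(url)
        rows.extend(page['results'])
        url = page.get('next')  # None on the last page, which ends the loop
        pages += 1
    return rows


# Hypothetical two-page response with made-up values, for illustration only.
fake_pages = {
    'page-1': {'results': [{'date': '2020-03-01', 'metric_value': 5}], 'next': 'page-2'},
    'page-2': {'results': [{'date': '2020-03-02', 'metric_value': 8}], 'next': None},
}

demo_rows = fetch_all_pages('page-1', fake_pages.__getitem__)
print(len(demo_rows))
```

Against the real endpoint, the same function could be called with a small wrapper such as `lambda u: requests.get(u, timeout=60).json()` as `get_json`.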


## 2) Convert to a DataFrame and keep only what we need

For a simple plot of daily cases over time (used here as a proxy for the number infected), we only need:
- `date`
- `metric_value` (we rename it to `cases`)


In [None]:
df_raw = pd.DataFrame(rows)

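# Keep the two columns we need and drop rows where either value is missing
df = df_raw[['date', 'metric_value']].dropna().copy()
# Parse dates so sorting and plotting use real timestamps, not strings
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')
df = df.rename(columns={'metric_value': 'cases'})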

df.head()
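
Before saving, it may be worth checking that the series has one row per day. The sketch below runs on a small made-up frame (`demo`) rather than the downloaded data, and it assumes each date is expected to appear exactly once; the same two checks could be applied to `df`.

```python
import pandas as pd

# Made-up values for illustration only; 2020-03-03 is deliberately missing.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-04']),
    'cases': [5, 8, 13],
})

# Dates that appear more than once (assumed to be unexpected here).
duplicate_dates = demo.loc[demo['date'].duplicated(), 'date']

# Calendar days between the first and last date that have no row.
full_range = pd.date_range(demo['date'].min(), demo['date'].max(), freq='D')
missing_dates = full_range.difference(pd.DatetimeIndex(demo['date']))

print(len(duplicate_dates), list(missing_dates.strftime('%Y-%m-%d')))
```

A missing day is not necessarily an error, but it changes how a line plot should be read, since the line is drawn straight across the gap.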


## 3) Save the CSV (so the group can reuse it)


In [None]:
out_dir = Path('..') / 'data' / 'processed' / 'observed'
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / 'ukhsa_covid19_cases_by_day_england.csv'

df.to_csv(out_path, index=False)
out_path
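
Anyone reusing the CSV needs to parse the `date` column on load, because CSV stores dates as text. A minimal sketch of the round trip, using a temporary file and made-up values rather than the real output path:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-03-01', '2020-03-02']),
    'cases': [5, 8],
})

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'demo_cases.csv'  # hypothetical file name
    demo.to_csv(demo_path, index=False)

    # Without parse_dates, the 'date' column would come back as plain strings.
    reloaded = pd.read_csv(demo_path, parse_dates=['date'])

print(reloaded.dtypes)
```

The same `parse_dates=['date']` argument should work when reading `ukhsa_covid19_cases_by_day_england.csv`.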


## 4) Plot: daily cases over time (proxy for infections)


In [None]:
fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(df['date'], df['cases'], linewidth=1)
ax.set_title('England: confirmed COVID-19 cases by day (UKHSA)')
ax.set_xlabel('Date')
ax.set_ylabel('Cases')
ax.grid(True, alpha=0.3)
fig.tight_layout()
plt.show()
