# Tutorial: Using Swagger / OpenAPI (UKHSA Dashboard API)

This notebook is a hands-on walkthrough of *finding and using* an API through its Swagger/OpenAPI documentation.

We’ll use the **UKHSA Data Dashboard API** as a real example and finish by downloading a respiratory time series (RSV positivity) into pandas.

## What are Swagger and OpenAPI?
- **OpenAPI** is a machine-readable description of an API (endpoints, parameters, responses).
- **Swagger UI** is a website that reads the OpenAPI description and gives you an interactive browser for the API.
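
To make the distinction concrete, here is a minimal sketch of what an OpenAPI description looks like once it is parsed into Python. The `Toy API` title and the `/items/` path are made up for illustration and are not part of the UKHSA schema; only the top-level field names (`openapi`, `info`, `paths`) come from the OpenAPI format itself.

```python
import json

# A hand-written, hypothetical OpenAPI document (NOT the UKHSA schema).
toy_schema = {
    "openapi": "3.0.3",
    "info": {"title": "Toy API", "version": "1.0.0"},
    "paths": {
        "/items/": {
            "get": {
                "summary": "List items",
                "responses": {"200": {"description": "A list of items"}},
            }
        }
    },
}

# The machine-readable form is plain JSON text.
schema_text = json.dumps(toy_schema, indent=2)

# A tool such as Swagger UI reads that text and works from the parsed structure.
parsed = json.loads(schema_text)
print(sorted(parsed.keys()))
```

The real schema has the same overall shape, just with many more entries under `paths`.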

### Key links
- Swagger UI (human-friendly): `https://api.ukhsa-dashboard.data.gov.uk/api/swagger`
- OpenAPI schema (machine-friendly JSON): `https://api.ukhsa-dashboard.data.gov.uk/api/schema`

## What you’ll learn
1. How to find the OpenAPI schema URL from Swagger UI
2. How to inspect the schema to discover endpoints
3. How to call endpoints with Python
4. How to handle pagination (when results are spread across pages)


In [None]:
import json
from pprint import pprint

import pandas as pd
import requests


## Step 1 — Open Swagger UI (manual step)

Open this in a browser:

- `https://api.ukhsa-dashboard.data.gov.uk/api/swagger`

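Swagger UI lists the API’s endpoints; expand any of them to see its parameters and responses.

### How we find the schema URL
Swagger UI builds that page from the OpenAPI schema file, so the schema URL is usually linked near the page title (it also shows up in the browser’s network tab when the page loads). For the UKHSA dashboard API, the schema is at: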

- `https://api.ukhsa-dashboard.data.gov.uk/api/schema`


## Step 2 — Download the OpenAPI schema

The schema is just JSON. Let’s download it and inspect the top-level fields.

In [None]:
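SCHEMA_URL = 'https://api.ukhsa-dashboard.data.gov.uk/api/schema'
response = requests.get(SCHEMA_URL, timeout=60)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
schema = response.json()
list(schema.keys())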


In [None]:
schema['openapi'], schema['info']['title'], schema['info']['version']


## Step 3 — Discover endpoints from `paths`

In OpenAPI, `paths` is a mapping from URL path → HTTP methods (GET/POST/…).
Let’s list the available paths.

In [None]:
paths = sorted(schema['paths'].keys())
print('Number of paths:', len(paths))
for p in paths:
    print(p)
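
Listing the path strings alone hides which HTTP methods each path supports. The sketch below pairs every path with its methods. It runs against a small made-up schema so it works offline; `list_operations` and `toy_schema` are hypothetical names introduced here, and the same function should work on the real `schema` dict, assuming it follows the standard OpenAPI layout.

```python
# HTTP method names that OpenAPI allows as keys under a path.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from schema['paths'], sorted by path then method."""
    operations = []
    for path, path_item in schema.get("paths", {}).items():
        for key in path_item:
            # A path item can also hold non-method keys such as 'parameters'.
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hypothetical schema fragment, for illustration only.
toy_schema = {
    "paths": {
        "/items/": {"get": {}, "post": {}},
        "/items/{id}/": {"get": {}, "parameters": []},
    }
}

for method, path in list_operations(toy_schema):
    print(method, path)
```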


### Reading a path definition
Each path has one or more HTTP methods (often `get`). We can inspect one path to see:
- parameters
- response schemas
- docs text

Let’s inspect the `GET /v2/themes/` endpoint, which is the entry point for browsing datasets.

In [None]:
pprint(schema['paths']['/v2/themes/']['get'].keys())


In [None]:
print(schema['paths']['/v2/themes/']['get'].get('summary'))
print(schema['paths']['/v2/themes/']['get'].get('description'))
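
Parameters are often the most useful part of a path definition. In OpenAPI, each operation can carry a `parameters` list whose entries have a `name`, a location (`in`), an optional `required` flag and a `schema`. The sketch below prints a one-line summary per parameter. The operation shown is invented for illustration, and `describe_parameters` is a hypothetical helper, not something the API provides.

```python
def describe_parameters(operation: dict) -> list[str]:
    """Summarise each parameter of an OpenAPI operation as one readable line."""
    lines = []
    for param in operation.get("parameters", []):
        if "$ref" in param:
            # Parameters may be references to shared definitions; report them as-is.
            lines.append(f"reference: {param['$ref']}")
            continue
        required = "required" if param.get("required", False) else "optional"
        type_name = param.get("schema", {}).get("type", "unknown")
        lines.append(f"{param['name']} (in {param['in']}, {type_name}, {required})")
    return lines


# Hypothetical operation, for illustration only.
toy_operation = {
    "summary": "List items",
    "parameters": [
        {"name": "page", "in": "query", "required": False, "schema": {"type": "integer"}},
        {"name": "item_id", "in": "path", "required": True, "schema": {"type": "string"}},
    ],
}

for line in describe_parameters(toy_operation):
    print(line)
```

To try it on the real schema, pass in an operation such as `schema['paths']['/v2/themes/']['get']`; an operation without parameters simply yields an empty list.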


## Step 4 — Call an endpoint (themes → subthemes → topics → metrics)

The UKHSA dashboard API is hierarchical:

- theme
- sub_theme
- topic
- geography_type
- geography
- metric

We’ll browse that tree using the `v2` endpoints.
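
The browse URLs used in this notebook follow a regular pattern: each level’s collection name is followed by the value chosen at that level. A minimal sketch of a helper that assembles such a path is shown below. `browse_path` is a hypothetical convenience function, not part of the API; percent-encoding the values and leaving off a trailing slash are assumptions made here, not documented requirements.

```python
from urllib.parse import quote

# Collection names in nesting order, matching the hierarchy listed above.
LEVELS = ["themes", "sub_themes", "topics", "geography_types", "geographies", "metrics"]


def browse_path(*values: str) -> str:
    """Build a /v2 path from the values chosen so far.

    With fewer than six values, the path ends at the next level to list.
    With all six values, the path points at a single metric.
    """
    if len(values) > len(LEVELS):
        raise ValueError(f"expected at most {len(LEVELS)} values, got {len(values)}")
    parts = ["v2"]
    for level, value in zip(LEVELS, values):
        # Percent-encode each value so names containing spaces stay valid in a URL.
        parts += [level, quote(value, safe="")]
    if len(values) < len(LEVELS):
        parts.append(LEVELS[len(values)])
    return "/" + "/".join(parts)


print(browse_path())
print(browse_path("infectious_disease", "respiratory", "RSV"))
```

Building paths this way avoids typos in long URLs, at the cost of hiding the URL structure that this tutorial is trying to show, so the cells below spell the paths out in full.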

In [None]:
BASE = 'https://api.ukhsa-dashboard.data.gov.uk'

def get_json(path: str):
    r = requests.get(BASE + path, timeout=60)
    r.raise_for_status()
    return r.json()

themes = get_json('/v2/themes/')
themes


In [None]:
# Each theme is a dict with a name and a link
[t['name'] for t in themes]


We want respiratory infections, so we’ll use:

- theme: `infectious_disease`
- sub_theme: `respiratory`


In [None]:
subthemes = get_json('/v2/themes/infectious_disease/sub_themes/')
[s['name'] for s in subthemes]


In [None]:
topics = get_json('/v2/themes/infectious_disease/sub_themes/respiratory/topics')
[t['name'] for t in topics]


Now pick a topic, for example **RSV**, then discover what geography breakdowns are available.

In [None]:
geo_types = get_json('/v2/themes/infectious_disease/sub_themes/respiratory/topics/RSV/geography_types')
[g['name'] for g in geo_types]


For a first plot, the simplest choice is the national series:

- geography_type: `Nation`
- geography: `England`


In [None]:
geos = get_json('/v2/themes/infectious_disease/sub_themes/respiratory/topics/RSV/geography_types/Nation/geographies')
[g['name'] for g in geos]


Finally, list the available metrics (what you can plot).

In [None]:
metrics = get_json('/v2/themes/infectious_disease/sub_themes/respiratory/topics/RSV/geography_types/Nation/geographies/England/metrics')
[m['name'] for m in metrics]


## Step 5 — Download a metric dataset

We’ll download weekly RSV testing positivity:

- metric: `RSV_testing_positivityByWeek`

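Metric responses are paginated: each response holds one page of rows in `results`, together with a total `count` and a `next` URL that points to the following page.
Pagination is common in APIs because it keeps individual responses small.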

In [None]:
metric_url = (
    BASE
    + '/v2/themes/infectious_disease/sub_themes/respiratory/topics/RSV'
    + '/geography_types/Nation/geographies/England'
    + '/metrics/RSV_testing_positivityByWeek'
)
response = requests.get(metric_url, timeout=60)
response.raise_for_status()
first_page = response.json()
first_page.keys()


In [None]:
first_page['count'], first_page['next']


In [None]:
first_page['results'][0]  # a single row


### Pagination helper
We’ll write a small function that follows `next` links until it has all rows.

In [None]:
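def fetch_all_pages(url: str) -> list[dict]:
    """Follow `next` links starting at `url` and return the rows from every page."""
    rows = []
    next_url = url
    while next_url:
        r = requests.get(next_url, timeout=60)
        r.raise_for_status()  # stop on HTTP errors rather than parsing an error body
        payload = r.json()
        rows.extend(payload['results'])
        next_url = payload['next']  # falsy on the last page, which ends the loop
    return rows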

rows = fetch_all_pages(metric_url)
len(rows)
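
To check the loop’s logic without touching the network, the sketch below runs the same follow-`next` pattern against fake in-memory pages. The `fetch_page` callable, the `max_pages` safety limit and the `example.test` URLs are assumptions added for illustration; the payload shape (`count`, `next`, `results`) mirrors the first page inspected above.

```python
from typing import Callable


def fetch_all_pages_with(
    fetch_page: Callable[[str], dict], url: str, max_pages: int = 1000
) -> list[dict]:
    """Collect `results` from every page, using `fetch_page` to load each URL."""
    collected = []
    next_url = url
    pages_seen = 0
    while next_url:
        if pages_seen >= max_pages:
            # Guards against a server that keeps returning a `next` link forever.
            raise RuntimeError(f"gave up after {max_pages} pages")
        payload = fetch_page(next_url)
        collected.extend(payload["results"])
        next_url = payload.get("next")
        pages_seen += 1
    return collected


# Two fake pages standing in for real API responses (values are invented).
FAKE_PAGES = {
    "https://example.test/metric?page=1": {
        "count": 3,
        "next": "https://example.test/metric?page=2",
        "results": [{"metric_value": 1.0}, {"metric_value": 2.0}],
    },
    "https://example.test/metric?page=2": {
        "count": 3,
        "next": None,
        "results": [{"metric_value": 3.0}],
    },
}

fake_rows = fetch_all_pages_with(FAKE_PAGES.__getitem__, "https://example.test/metric?page=1")
print(len(fake_rows))
```

Passing the page loader in as an argument is one way to make pagination code testable; a useful sanity check on real data is comparing the number of rows collected with the `count` reported by the first page.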


Convert the rows to a DataFrame and plot positivity over time.

In [None]:
df = pd.DataFrame(rows)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')
df[['date', 'metric_value']].head()


In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df['date'], df['metric_value'])
ax.set_title('England: RSV testing positivity (weekly, UKHSA API)')
ax.set_xlabel('Date')
ax.set_ylabel('Positivity')
ax.grid(True, alpha=0.3)
fig.tight_layout()


## Exercises (try these)
1. Change the topic from `RSV` to `Influenza` and plot `influenza_testing_positivityByWeek`
2. Change the geography_type from `Nation` to `UKHSA Region` (if available) and plot multiple regions
3. Use a rolling average to smooth noisy time series
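
As a starting point for exercise 3, here is a minimal sketch of a rolling mean in pandas. It uses invented numbers rather than API data so it runs on its own; the 3-week window and the `smoothed` column name are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic weekly series (invented values, not UKHSA data).
toy = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="W-MON"),
        "metric_value": [1.0, 3.0, 2.0, 6.0, 4.0, 8.0],
    }
)

# Average each value with up to two preceding weeks.
# min_periods=1 keeps the first rows instead of turning them into NaN.
toy["smoothed"] = toy["metric_value"].rolling(window=3, min_periods=1).mean()
print(toy)
```

A wider window gives a smoother line but reacts more slowly to genuine changes, so it is worth plotting the raw and smoothed series together.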

Tip: use the browse endpoints to discover valid values, in this order:

`themes → sub_themes → topics → geography_types → geographies → metrics`
