# 6a. Exploring Indicator Metadata for QOF Prevalence

## Objective

This notebook uses the metadata CSVs generated by `06_api_exploration...` to perform a targeted search. The goal is to identify all indicators across our priority profiles that are related to **QOF (Quality and Outcomes Framework) prevalence**.

The result is a focused list of indicator IDs that the next notebook can use for a targeted data download covering the NHS South West region.

### 1. Setup and Load All Metadata Files

In [1]:
import pandas as pd
from pathlib import Path

# Set display options for better viewing
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_colwidth', 150)

METADATA_PATH = Path("../data/metadata")
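# Sort so the load order (and row order of the combined frame) is reproducible
meta_files = sorted(METADATA_PATH.glob("meta_*.csv"))
if not meta_files:
    raise FileNotFoundError(f"No meta_*.csv files found in {METADATA_PATH.resolve()}")

# Load and combine all metadata files into one master DataFrame
all_meta_list = []
for f in meta_files:
    # Profile key = third underscore-separated token of the file stem
    profile_key = f.stem.split('_')[2]
    df = pd.read_csv(f).assign(Profile=profile_key)
    all_meta_list.append(df)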

master_meta_df = pd.concat(all_meta_list, ignore_index=True)

print(f"Successfully loaded and combined {len(master_meta_df)} indicators from {len(meta_files)} profiles.")

Successfully loaded and combined 543 indicators from 7 profiles.
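The loading loop above gives unhelpful errors if a filename does not match the expected pattern (an `IndexError` from the split) or a file lacks a column used later, and `pd.concat` raises on an empty list. A minimal defensive variant, as a sketch: `load_metadata` and `REQUIRED_COLUMNS` are hypothetical names, and the demo filenames are made up to follow the same "third underscore-separated token" assumption as the cell above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Columns that later cells in this notebook rely on
REQUIRED_COLUMNS = {"Indicator id", "Indicator", "Unit"}


def load_metadata(metadata_dir, token_index=2):
    """Combine meta_*.csv files and tag each row with a profile key.

    Hypothetical helper: assumes the profile key is the token at `token_index`
    after splitting the file stem on underscores.
    """
    frames = []
    for f in sorted(Path(metadata_dir).glob("meta_*.csv")):
        parts = f.stem.split("_")
        if len(parts) <= token_index:
            raise ValueError(f"Cannot derive a profile key from {f.name!r}")
        df = pd.read_csv(f)
        missing = REQUIRED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"{f.name} is missing columns: {sorted(missing)}")
        frames.append(df.assign(Profile=parts[token_index]))
    if not frames:
        raise FileNotFoundError(f"No meta_*.csv files found in {metadata_dir}")
    return pd.concat(frames, ignore_index=True)


# Demo with two made-up files whose names follow the assumed pattern
with tempfile.TemporaryDirectory() as tmp:
    rows = {
        "meta_demo_Respiratory.csv": (90933, "Asthma: QOF prevalence"),
        "meta_demo_Cardio.csv": (280, "Atrial fibrillation: QOF prevalence"),
    }
    for name, (ind_id, ind_name) in rows.items():
        pd.DataFrame(
            {"Indicator id": [ind_id], "Indicator": [ind_name], "Unit": ["%"]}
        ).to_csv(Path(tmp) / name, index=False)
    demo = load_metadata(tmp)

print(demo[["Indicator id", "Profile"]])
```

Failing early with the offending filename makes it easier to spot a stray or misnamed export than a bare `IndexError` deep in the loop.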


### 2. Find All 'QOF prevalence' Indicators

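Next, we run a case-insensitive substring search on the 'Indicator' column for the phrase "QOF prevalence". An indicator can belong to several profiles, so the same indicator ID may appear more than once in the results.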
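The matching rules are easiest to see on a toy Series. This sketch uses made-up indicator names with `case=False` and `na=False` as in the search cell, plus `regex=False` so the phrase is matched literally rather than as a regular expression.

```python
import pandas as pd

# Toy indicator names (illustrative only, not taken from the metadata files)
names = pd.Series([
    "Asthma: QOF prevalence",
    "COPD: qof Prevalence",          # different casing still matches with case=False
    "Smoking prevalence in adults",  # lacks the 'QOF' token, so it does not match
    None,                            # missing name becomes False via na=False
])

mask = names.str.contains("QOF prevalence", case=False, regex=False, na=False)
print(mask.tolist())  # [True, True, False, False]
```

Because the match is on a substring, any indicator whose name contains the phrase anywhere is kept, not only names that equal it exactly.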

In [2]:
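# Phrase to match (case-insensitive substring match, not an exact match)
search_term = 'QOF prevalence'

# Filter the DataFrame; regex=False treats the term as a literal string,
# and na=False excludes rows with a missing Indicator name
qof_prevalence_indicators = master_meta_df[
    master_meta_df['Indicator'].str.contains(search_term, case=False, regex=False, na=False)
].copy()

# Ensure Indicator id is stored as an integer (e.g. if it was read as float)
qof_prevalence_indicators['Indicator id'] = qof_prevalence_indicators['Indicator id'].astype(int)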

# Sort for easier viewing
qof_prevalence_indicators.sort_values('Indicator', inplace=True)

print(f"Found {len(qof_prevalence_indicators)} indicators matching '{search_term}':")

# Display the results, focusing on the key columns
display(qof_prevalence_indicators[['Indicator id', 'Indicator', 'Unit', 'Profile']].reset_index(drop=True))

Found 45 indicators matching 'QOF prevalence':


|   | Indicator id | Indicator | Unit | Profile |
|---|---|---|---|---|
| 0 | 90933 | Asthma: QOF prevalence | % | Respiratory |
| 1 | 90933 | Asthma: QOF prevalence | % | GP |
| 2 | 280 | Atrial fibrillation: QOF prevalence | % | Cardio |
| 3 | 280 | Atrial fibrillation: QOF prevalence | % | GP |
| 4 | 273 | CHD: QOF prevalence | % | GP |
| 5 | 273 | CHD: QOF prevalence | % | Cardio |
| 6 | 273 | CHD: QOF prevalence | % | Dementia |
| 7 | 258 | CKD: QOF prevalence | % | Cardio |
| 8 | 258 | CKD: QOF prevalence | % | GP |
| 9 | 253 | COPD: QOF prevalence | % | Respiratory |
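The output shows the same indicator ID listed under several profiles (for example, 273, CHD, appears under GP, Cardio and Dementia), so the 45 matches contain fewer distinct indicators. A minimal sketch for collapsing them to one row per ID before downloading, assuming the column names shown above; `sample` reuses a few of the rows above, and `Profiles` is a hypothetical new column name.

```python
import pandas as pd


def unique_indicators(matches: pd.DataFrame) -> pd.DataFrame:
    """One row per Indicator id, listing every profile it appears in."""
    return (
        matches.groupby("Indicator id", as_index=False)
        .agg(
            Indicator=("Indicator", "first"),
            Unit=("Unit", "first"),
            Profiles=("Profile", lambda s: ", ".join(sorted(s.unique()))),
        )
        .sort_values("Indicator", ignore_index=True)
    )


# A few rows copied from the output above
sample = pd.DataFrame({
    "Indicator id": [90933, 90933, 280, 280, 273],
    "Indicator": ["Asthma: QOF prevalence"] * 2
    + ["Atrial fibrillation: QOF prevalence"] * 2
    + ["CHD: QOF prevalence"],
    "Unit": ["%"] * 5,
    "Profile": ["Respiratory", "GP", "Cardio", "GP", "GP"],
})

deduped = unique_indicators(sample)
indicator_ids = deduped["Indicator id"].tolist()
print(deduped)
print(indicator_ids)  # [90933, 280, 273]
```

Grouping on the ID alone (rather than ID plus name) guarantees one row per indicator even if a name differs slightly between profiles; `"first"` simply keeps the first name seen.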


### 3. Define Target NHS South West Geographies

With our target indicators identified, we now define the geographies for our future data pull: the seven NHS South West Integrated Care Boards (ICBs) as parent areas, and the small-area geographies within them (LSOAs, MSOAs and GP practices).

In [3]:
# Official ODS codes for the 7 South West ICBs
SW_ICB_CODES = {
    'QUY': 'NHS Bristol, North Somerset and South Gloucestershire',
    'QOX': 'NHS Bath and North East Somerset, Swindon and Wiltshire',
    'QT6': 'NHS Cornwall and Isles of Scilly',
    'QJK': 'NHS Devon',
    'QVV': 'NHS Dorset',
    'QR1': 'NHS Gloucestershire',
    'QSL': 'NHS Somerset'
}

# Area type IDs for the small-area geographies to download within each ICB
# in the next notebook (confirm against the API's list of area types)
TARGET_AREA_TYPE_IDS = {
    'LSOA': 3,      # 2021 LSOAs
    'MSOA': 170,    # 2021 MSOAs
    'GP': 7         # GP practices
}

print("--- Target NHS South West ICBs ---")
for code, name in SW_ICB_CODES.items():
    print(f"{code}: {name}")

print("\n--- Target Data Levels ---")
print(TARGET_AREA_TYPE_IDS)

--- Target NHS South West ICBs ---
QUY: NHS Bristol, North Somerset and South Gloucestershire
QOX: NHS Bath and North East Somerset, Swindon and Wiltshire
QT6: NHS Cornwall and Isles of Scilly
QJK: NHS Devon
QVV: NHS Dorset
QR1: NHS Gloucestershire
QSL: NHS Somerset

--- Target Data Levels ---
{'LSOA': 3, 'MSOA': 170, 'GP': 7}
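The two dictionaries above combine into a grid of downloads: 7 ICBs x 3 area types gives 21 parent/child combinations, each covering the chosen indicator IDs. A minimal sketch that enumerates that grid; `build_download_jobs` and the job keys are hypothetical names, not the parameters of any particular API, and whether the download API identifies ICBs by ODS codes or by another code system is an assumption to check in the next notebook.

```python
from itertools import product


def build_download_jobs(parent_codes, area_type_ids, indicator_ids):
    """Enumerate one job spec per (parent ICB, child area type) pair.

    Hypothetical helper: the dict keys are illustrative labels only.
    """
    return [
        {
            "parent_area_code": code,
            "child_area_type": label,
            "child_area_type_id": type_id,
            "indicator_ids": list(indicator_ids),
        }
        for code, (label, type_id) in product(parent_codes, area_type_ids.items())
    ]


jobs = build_download_jobs(["QJK", "QVV"], {"GP": 7, "MSOA": 170}, [90933, 280])
print(len(jobs))  # 2 ICBs x 2 area types = 4
print(jobs[0])
```

In the notebook itself, the call would be `build_download_jobs(SW_ICB_CODES, TARGET_AREA_TYPE_IDS, indicator_ids)`; iterating a dict yields its keys, so the ICB codes are passed through directly.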


## Next Steps

This notebook provides a clean, targeted list of QOF prevalence indicators. The next notebook, **07_downloading_sw_prevalence_data.ipynb**, will use this list and the defined geographies to perform a bulk download of all the data we need for our analysis.
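As written, the indicator list and geographies exist only in this notebook's memory. One way to hand them to the next notebook is to write them to disk; this is a sketch, and `save_handoff`, the output file names and the placeholder ICB entry in the demo are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def save_handoff(indicators: pd.DataFrame, icb_codes: dict, area_type_ids: dict, out_dir) -> tuple:
    """Write the indicator list (CSV) and target geographies (JSON) to out_dir.

    Hypothetical helper and file names; adjust to the project's layout.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "qof_prevalence_indicators.csv"
    json_path = out_dir / "sw_target_geographies.json"
    indicators.to_csv(csv_path, index=False)
    json_path.write_text(
        json.dumps({"icb_codes": icb_codes, "area_type_ids": area_type_ids}, indent=2)
    )
    return csv_path, json_path


# Demo in a temporary directory, reading both files back
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({
        "Indicator id": [90933, 280],
        "Indicator": ["Asthma: QOF prevalence", "Atrial fibrillation: QOF prevalence"],
    })
    csv_path, json_path = save_handoff(demo, {"ICB_CODE": "Example ICB"}, {"GP": 7}, tmp)
    reloaded_ids = pd.read_csv(csv_path)["Indicator id"].tolist()
    reloaded_geo = json.loads(json_path.read_text())

print(reloaded_ids, reloaded_geo)
```

Keeping the geographies in JSON preserves the code-to-name mapping and the integer area type IDs exactly, while the CSV stays easy to inspect by hand.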