## Commissioner-Level RTT Data Loader (Admitted, Non-Admitted, New)

This script loads RTT (Referral to Treatment) performance metrics at the commissioner level (National, Region, ICB) for the `Admitted`, `Non-Admitted`, and `New` pathway types into the `rtt_data` PostgreSQL table.

### Functionality:
- **User Input**: Specify the Excel file path, year, month, and the RTT pathway type (`Admitted`, `Non-Admitted`, or `New`).
- **Sheet Handling**: Iterates through a fixed set of sheets (`National`, `Region`, `ICB`) and drops the `NHS ENGLAND` total rows (code `-`) that are repeated in the `Region` and `ICB` sheets.
- **Transformation**: Melts the wide-format Excel data into long format, maps geo-levels and organization codes, and attaches relevant metadata.
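- **Validation**: Converts metric values with `pd.to_numeric(..., errors='coerce')`, so missing or malformed cells become `NaN` instead of raising an error.
- **Atomic Insert**: Concatenates all sheet results and appends them to `rtt_data` in a single transaction (`engine.begin()`), so a failure while reading any sheet or during the write leaves the table unchanged.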
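
As a minimal sketch of the sheet-handling step, the snippet below drops the repeated national total row from a Region-style frame. The helper name `drop_national_rows` and the toy rows are assumptions made for illustration; the column names and the `NHS ENGLAND` / `-` markers follow the loader code in this notebook.

```python
import pandas as pd

def drop_national_rows(df, name_col, code_col):
    """Remove rows that repeat the national total (name 'NHS ENGLAND', code '-')."""
    is_national = (df[name_col].str.upper() == 'NHS ENGLAND') & (df[code_col] == '-')
    return df[~is_national]

# Toy stand-in for the Region sheet; codes, names, and values are made up.
region = pd.DataFrame({
    'Region Code': ['-', 'R01', 'R02'],
    'Region Name': ['NHS England', 'Region A', 'Region B'],
    'Total 52 plus weeks': [30, 10, 20],
})

filtered = drop_national_rows(region, 'Region Name', 'Region Code')
print(filtered['Region Code'].tolist())
```

Matching on both the name and the `-` code keeps the filter narrow, so a genuine organisation row is unlikely to be removed by accident.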

This script streamlines the ingestion of commissioner-level activity metrics and ensures uniform schema alignment across pathway types.
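
The transformation and validation steps can be illustrated on a small frame. This is a sketch under assumptions: the organisation rows and values are invented, and only two of the metric columns are used. It shows `DataFrame.melt` turning one row per organisation into one row per organisation and metric, followed by numeric coercion.

```python
import pandas as pd

# Toy wide-format frame; the organisation rows and values are made up.
wide = pd.DataFrame({
    'org_code': ['R01', 'R02'],
    'org_name': ['Region A', 'Region B'],
    'Total number of completed pathways (all)': [1200, '-'],  # '-' marks a missing value
    'Total 52 plus weeks': [15, 7],
})

metrics = ['Total number of completed pathways (all)', 'Total 52 plus weeks']

# Wide to long: each metric column becomes a (metric, value) pair per organisation.
long_df = wide.melt(
    id_vars=['org_code', 'org_name'],
    value_vars=metrics,
    var_name='metric',
    value_name='value',
)

# Non-numeric cells such as '-' are coerced to NaN rather than raising.
long_df['value'] = pd.to_numeric(long_df['value'], errors='coerce')
print(long_df)
```

Two organisations and two metrics give four long-format rows, one of which carries `NaN` where the source cell held `-`.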


In [1]:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError


In [None]:
# === COMMISSIONER RTT DATA LOADER (Admitted, Non-Admitted, New) ===

# === USER INPUT ===
FILE_PATH = '../data/commissioner/NonAdmitted_apr2023_march_2024/NonAdmitted-Commissioner-12-Mar24.xlsx'
YEAR = 2024
MONTH = 3
PATHWAY_TYPE = 'Non-Admitted'  # 'Admitted', 'Non-Admitted', 'New'

# === METRIC LOOKUP TABLE ===
METRICS_LOOKUP = {
    'Admitted': [
        'Total number of completed pathways (all)',
        'Average (median) waiting time (in weeks)',
        '95th percentile waiting time (in weeks)',
        'Total 52 plus weeks'
    ],
    'Non-Admitted': [
        'Total number of completed pathways (all)',
        'Average (median) waiting time (in weeks)',
        '95th percentile waiting time (in weeks)',
        'Total 52 plus weeks'
    ],
    'New': [
        'Number of new RTT clock starts during the month'
    ]
}

# === DB CONNECTION ===
engine = create_engine("postgresql://postgres:<password>@localhost:5432/nhs_dashboard")

# === SHEET CONFIG ===
SHEETS = [
    ('National', 'National', 'NAT', 'NHS ENGLAND'),
    ('Region', 'Region', None, None),
    ('ICB', 'ICB', None, None)
]

# === PROCESS EACH SHEET ===
all_dfs = []

for sheet_name, geo_level, fixed_code, fixed_name in SHEETS:
    print(f"\n📄 Processing sheet: {sheet_name} for PATHWAY_TYPE: {PATHWAY_TYPE}")
    try:
        df = pd.read_excel(FILE_PATH, sheet_name=sheet_name, skiprows=13)
        # === REMOVE DUPLICATE NATIONAL ROWS from non-National sheets ===
        if sheet_name == 'Region':
            df = df[~((df['Region Name'].str.upper() == 'NHS ENGLAND') & (df['Region Code'] == '-'))]
        elif sheet_name == 'ICB':
            df = df[~((df['ICB Name'].str.upper() == 'NHS ENGLAND') & (df['ICB Code'] == '-'))]
    except Exception as e:
        print(f"Failed to load sheet '{sheet_name}': {e}")
        raise

    # Assign org_code and org_name
    if geo_level == 'National':
        df['org_code'] = fixed_code
        df['org_name'] = fixed_name
        df['region_code'] = None
    elif geo_level == 'Region':
        df['org_code'] = df['Region Code']
        df['org_name'] = df['Region Name']
        df['region_code'] = df['Region Code']
    elif geo_level == 'ICB':
        df['org_code'] = df['ICB Code']
        df['org_name'] = df['ICB Name']
        df['region_code'] = None

    df['treatment_function_code'] = df['Treatment Function Code']
    df['treatment_function'] = df['Treatment Function']

    melted = df.melt(
        id_vars=['org_code', 'org_name', 'region_code', 'treatment_function_code', 'treatment_function'],
        value_vars=METRICS_LOOKUP[PATHWAY_TYPE],
        var_name='metric',
        value_name='value'
    )

    melted['value'] = pd.to_numeric(melted['value'], errors='coerce')
    melted['year'] = YEAR
    melted['month'] = MONTH
    melted['pathway_type'] = PATHWAY_TYPE
    melted['geo_level'] = geo_level

    all_dfs.append(melted)

# === CONCAT + LOAD ===
final_df = pd.concat(all_dfs, ignore_index=True)
try:
    with engine.begin() as conn:
        final_df.to_sql('rtt_data', conn, if_exists='append', index=False)
    print(f"✅ Loaded {len(final_df)} rows from {FILE_PATH}")
except SQLAlchemyError as e:
    # The transaction is rolled back on error, so no partial rows remain.
    print(f"Operation aborted. Error: {e}")



📄 Processing sheet: National for PATHWAY_TYPE: Non-Admitted

📄 Processing sheet: Region for PATHWAY_TYPE: Non-Admitted

📄 Processing sheet: ICB for PATHWAY_TYPE: Non-Admitted
✅ Loaded 4800 rows from ../data/commissioner/NonAdmitted_apr2023_march_2024/NonAdmitted-Commissioner-12-Mar24.xlsx
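
One way to sanity-check a reported row total is to break the long-format frame down by `geo_level` and count the values that were coerced to `NaN`. The helper `summarise_load` below is hypothetical and not part of the original script, and the frame is toy data standing in for `final_df`.

```python
import pandas as pd

def summarise_load(final_df):
    """Return row counts and NaN-value counts per geo_level."""
    rows = final_df.groupby('geo_level').size()
    missing = final_df['value'].isna().groupby(final_df['geo_level']).sum()
    return pd.DataFrame({'rows': rows, 'missing': missing})

# Toy stand-in for final_df; the values are made up.
toy_final = pd.DataFrame({
    'geo_level': ['National', 'Region', 'Region', 'ICB', 'ICB', 'ICB'],
    'value': [1.0, 2.0, float('nan'), 3.0, 4.0, float('nan')],
})

summary = summarise_load(toy_final)
print(summary)
```

Running such a check before the write makes it easier to spot a sheet that contributed far fewer rows, or far more `NaN` values, than expected.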
