# Script Setup

# Brownsea Island Membership & Equity Analysis Pipeline

## PROJECT OVERVIEW:
This machine learning pipeline analyzes membership visit patterns to Brownsea Island, with a focus on socioeconomic equity and strategic intervention planning. It integrates school census, deprivation (IoD), postcode, and membership data to identify under-served communities and prioritize outreach.

## KEY OBJECTIVES:


1. Equity Analysis: Identify deprivation patterns and free school meal (FSM) rates across postcode districts
2. Visit Pattern Prediction: Train machine learning models to predict expected visit rates
3. Strategic Intervention: Classify districts into priority categories for targeted actions

## TECHNICAL STACK:
Python, scikit-learn, XGBoost, LightGBM, CatBoost, Plotly, Pandas, GeoJSON


In [33]:
!gdown --folder 'https://drive.google.com/drive/folders/1UcsjO_HTJll6BY5YK2acHd49N8lZn6ER?usp=drive_link' -O ./data

Retrieving folder contents
Processing file 1zoKM6ndwVAu6BRs97_N02l4ZzFeHuYVY 2024-2025_england_census.csv
Processing file 1g7EkERqRj_IT1cAvccZAeQpY94h7Avoi 2024-2025_england_school_information.csv
Processing file 1bV9WU9EkeABvSjbuEHLwCusIoiAM4AUX BI visiting members information by post code 2025.csv
Processing file 1Xj96KyaHUy6boLfGIsnPojZ3o9687JpD england_census.csv
Processing file 1x36xVY72mVR-c1Ymts6EfpzF8td8m5C6 england_school_information.csv
Processing file 1X2tEPzjvhNi53Rh2gnGww7aFZX-lLwqh File_7_-_All_IoD2019_Scores__Ranks__Deciles_and_Population_Denominators_3.csv
Processing file 12DgxpYaIm6stMENrexgEHlg_Mb21W60T File_7_IoD2025_All_Ranks_Scores_Deciles_Population_Denominators.csv
Processing file 1KFjGjudE8SC0wg017Y2DhRPdElhfpoPd imd2019lsoa.csv
Processing file 1umN5N2mki87a37Z-eW_I4R2CW5axTcmn Local_Authority_District_to_Region_(April_2025)_Lookup_in_EN_v2.csv
Processing file 1MGrgmq7C3SI4ZjTpcV6FxzbmyKjGk7Ha Local_Authority_Districts_May_2024_Boundaries_UK_BFC_8116196853881618

# Setup

In [35]:
# Install necessary libraries
!pip install -q catboost

In [34]:
# Import Libraries and Setup
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import lightgbm as lgb
import xgboost as xgb
import catboost as cb
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import warnings
import os
import subprocess
import json
import logging

from google.colab import files, drive
from IPython.display import display

# --- Set display options ---
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

warnings.filterwarnings('ignore')
pio.templates.default = "plotly_white"

# --- Setup Logging ---
logging.basicConfig(
    level=logging.INFO, format='[%(asctime)s] %(levelname)-8s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    force=True
)
LOG = logging.getLogger("Membership_Data")
LOG.info("Logging setup successfully")


[2025-11-06 15:54:40] INFO     - Logging setup successfully


# Definition of Constants

In [36]:
class DeprivationConstants:
    """Constants for deprivation calculations"""
    HIGH_DEPRIVATION_POPULATION = 40
    LOW_DEPRIVATION_POPULATION = 40
    HIGH_FSM_THRESHOLD = 15
    MEDIUM_FSM_THRESHOLD = 8
    MOST_DEPRIVED_DECILES = [1, 2, 3]
    MODERATELY_DEPRIVED_DECILES = [4, 5, 6, 7]
    LEAST_DEPRIVED_DECILES = [8, 9, 10]
    LOW_VISIT_RATE_THRESHOLD = 5
    HIGH_VISIT_RATE_THRESHOLD = 10

class InterventionConstants:
    """Constants for intervention thresholds"""
    QUICK_WIN_MAX_DISTANCE_KM = 30
    QUICK_WIN_MIN_GAP = 5
    STRATEGIC_DEPRIVATION_THRESHOLD = 40
    STRATEGIC_FSM_THRESHOLD = 15
    SCALE_POPULATION_THRESHOLD = 50000
    SCALE_MIN_GAP = 3
    SUCCESS_MAX_DISTANCE_KM = 20
    SUCCESS_MIN_VISIT_RATE = 8
    URGENT_ACTION_VISIT_RATE = DeprivationConstants.LOW_VISIT_RATE_THRESHOLD
    URGENT_ACTION_GAP = 5

class GeographicConstants:
    """Constants for geographic areas"""
    BCP_DORSET_POSTCODES = ['BH', 'DT', 'SP']
    DORSET_POSTCODE_AREAS = ['BH', 'DT', 'SP']
    BI_LATITUDE = 50.68900
    BI_LONGITUDE = -1.95732

class TierLabels:
    """Constants for tier labels"""
    HIGH_DEPRIVATION = 'High Deprivation'
    LOW_DEPRIVATION = 'Low Deprivation'
    MIXED_DEPRIVATION = 'Mixed Deprivation'
    HIGH_FSM = 'High FSM'
    MEDIUM_FSM = 'Medium FSM'
    LOW_FSM = 'Low FSM'
    HIGH_VISIT_RATE = 'High Visit Rate'
    MEDIUM_VISIT_RATE = 'Medium Visit Rate'
    LOW_VISIT_RATE = 'Low Visit Rate'
    MOST_DEPRIVED = 'most_deprived'
    MODERATELY_DEPRIVED = 'moderately_deprived'
    LEAST_DEPRIVED = 'least_deprived'

class InterventionTypes:
    """Constants for intervention type labels"""
    EQUITY_PRIORITY = 'Equity Priority (High Deprivation)'
    URGENT_ACTION = 'Urgent Action (High Need)'
    NEARBY_OPPORTUNITY = 'Nearby Opportunity (Quick Win)'
    SCALE_POTENTIAL = 'Scale Potential (Large Population)'
    MODEL_DISTRICT = 'Model District (Success Story)'
    STANDARD_MONITORING = 'Standard Monitoring'

class ModelConstants:
    """Constants for model configuration"""
    TEST_SIZE = 0.2
    RANDOM_STATE = 42
    N_SPLITS_CV = 5

class VisualizationConstants:
    """Constants for visualization settings"""
    GEOJSON_REPO_URL = "https://github.com/missinglink/uk-postcode-polygons.git"
    GEOJSON_LOCAL_PATH = "/content/uk-postcode-polygons"
    MAP_CENTER_LAT = 50.75
    MAP_CENTER_LON = -2.2
    MAP_ZOOM = 8
    MAP_OPACITY = 0.5

CONFIG = {
    "file_paths": {
        "school_info": '/content/data/2024-2025_england_school_information.csv',
        "school_census": '/content/data/2024-2025_england_census.csv',
        "imd_decile": '/content/data/File_7_IoD2025_All_Ranks_Scores_Deciles_Population_Denominators.csv',
        "lad_names": '/content/data/Local_Authority_Districts_May_2024_Boundaries_UK_BFC_8116196853881618041.csv',
        "lad_regions": '/content/data/Local_Authority_District_to_Region_(April_2025)_Lookup_in_EN_v2.csv',
        "membership": '/content/data/BI visiting members information by post code 2025.csv',
        "ons_data": '/content/data/ONSPD_August_2025.csv'
    },
    "BI_coordinates": {
        "latitude": GeographicConstants.BI_LATITUDE,
        "longitude": GeographicConstants.BI_LONGITUDE
    },
    "deprivation_categories": {
        "most_deprived": DeprivationConstants.MOST_DEPRIVED_DECILES,
        "moderately_deprived": DeprivationConstants.MODERATELY_DEPRIVED_DECILES,
        "least_deprived": DeprivationConstants.LEAST_DEPRIVED_DECILES
    },
    "deprivation_thresholds": {
        "population_most_deprived": DeprivationConstants.HIGH_DEPRIVATION_POPULATION,
        "population_least_deprived": DeprivationConstants.LOW_DEPRIVATION_POPULATION
    },
    "intersection_thresholds": {
        "high_fsm": DeprivationConstants.HIGH_FSM_THRESHOLD,
        "medium_fsm": DeprivationConstants.MEDIUM_FSM_THRESHOLD,
        "low_visit_rate": DeprivationConstants.LOW_VISIT_RATE_THRESHOLD,
        "high_visit_rate": DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD
    },
    "intervention_thresholds": {
        "quick_win_distance": InterventionConstants.QUICK_WIN_MAX_DISTANCE_KM,
        "quick_win_gap": InterventionConstants.QUICK_WIN_MIN_GAP,
        "strategic_deprivation": InterventionConstants.STRATEGIC_DEPRIVATION_THRESHOLD,
        "strategic_fsm": InterventionConstants.STRATEGIC_FSM_THRESHOLD,
        "scale_population": InterventionConstants.SCALE_POPULATION_THRESHOLD,
        "scale_gap": InterventionConstants.SCALE_MIN_GAP,
        "success_distance": InterventionConstants.SUCCESS_MAX_DISTANCE_KM,
        "success_visit_rate": InterventionConstants.SUCCESS_MIN_VISIT_RATE
    },
    "visualization": {
        "geojson_repo_url": VisualizationConstants.GEOJSON_REPO_URL,
        "geojson_local_path": VisualizationConstants.GEOJSON_LOCAL_PATH,
        "map_center_lat": VisualizationConstants.MAP_CENTER_LAT,
        "map_center_lon": VisualizationConstants.MAP_CENTER_LON,
        "map_zoom": VisualizationConstants.MAP_ZOOM,
        "dorset_postcode_areas": GeographicConstants.DORSET_POSTCODE_AREAS
    },
    "model_params": {
        "test_size": ModelConstants.TEST_SIZE,
        "random_state": ModelConstants.RANDOM_STATE,
        "n_splits_cv": ModelConstants.N_SPLITS_CV
    },
    "selected_features": [
        'distance_to_BI',
        'avg_imd_decile',
        'avg_income_score',
        'avg_fsm%',
        'avg_employment_score',
        'pop%_most_deprived',
        'pop%_moderately_deprived',
        'pop%_least_deprived'
    ],
    "output_files": {
        "ml_ready_data": 'ml_ready_district_data.csv',
        "three_way_intersection": 'three_way_intersection_analysis_v2.csv'
    }
}
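
Since every loader depends on the paths in `CONFIG["file_paths"]`, it can help to confirm they exist before any file is read. The sketch below is not part of the original pipeline: `find_missing_files` is a hypothetical helper, and the demo uses throwaway files in a temporary directory rather than the real `/content/data` folder.

```python
import os
import tempfile


def find_missing_files(file_paths: dict) -> dict:
    """Return the {key: path} entries whose path is not an existing file."""
    return {key: path for key, path in file_paths.items() if not os.path.isfile(path)}


# Demo with throwaway files (hypothetical names, not the project's data)
with tempfile.TemporaryDirectory() as tmp_dir:
    present = os.path.join(tmp_dir, "present.csv")
    absent = os.path.join(tmp_dir, "absent.csv")
    with open(present, "w") as handle:
        handle.write("a,b\n1,2\n")
    demo_missing = find_missing_files({"present": present, "absent": absent})

print(sorted(demo_missing))
```

In the notebook, this would presumably be called as `find_missing_files(CONFIG["file_paths"])` just before `load_data`, so that a missing download is reported by key rather than surfacing as an exception partway through loading.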


# Data Processing Functions

In [37]:
def get_deprivation_category(imd_decile: int) -> str:
    """Convert IMD decile to deprivation category"""
    if pd.isna(imd_decile):
        return 'unknown'
    imd_decile = int(imd_decile)
    if imd_decile in DeprivationConstants.MOST_DEPRIVED_DECILES:
        return TierLabels.MOST_DEPRIVED
    elif imd_decile in DeprivationConstants.MODERATELY_DEPRIVED_DECILES:
        return TierLabels.MODERATELY_DEPRIVED
    elif imd_decile in DeprivationConstants.LEAST_DEPRIVED_DECILES:
        return TierLabels.LEAST_DEPRIVED
    else:
        return 'unknown'

def get_deprivation_tier(row):
    """Get district-level deprivation tier based on population percentages"""
    if pd.isna(row['pop%_most_deprived']) or pd.isna(row['pop%_least_deprived']):
        return 'Unknown'
    if row['pop%_most_deprived'] >= DeprivationConstants.HIGH_DEPRIVATION_POPULATION:
        return TierLabels.HIGH_DEPRIVATION
    elif row['pop%_least_deprived'] >= DeprivationConstants.LOW_DEPRIVATION_POPULATION:
        return TierLabels.LOW_DEPRIVATION
    else:
        return TierLabels.MIXED_DEPRIVATION

def calculate_haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Calculate the haversine (great-circle) distance in kilometres between two points given in decimal degrees"""
    R = 6371  # mean Earth radius in km
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1)
    a = np.sin(dlat/2)**2 + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    return R * c

def get_outward_code(postcode: str) -> str:
    """Extract the outward code by dropping the 3-character inward code; returns None for missing postcodes"""
    if pd.isna(postcode):
        return None
    canonical_pc = "".join(str(postcode).upper().split())
    return canonical_pc[:-3] if len(canonical_pc) > 3 else canonical_pc

def load_school_data(school_info_path: str, school_census_path: str) -> pd.DataFrame:
    """Load and merge school data"""
    school_info = pd.read_csv(school_info_path, usecols=['URN', 'POSTCODE'])
    school_census = pd.read_csv(school_census_path, usecols=['URN', 'PNUMFSMEVER', 'NOR'])
    school_info['URN'] = school_info['URN'].astype(str)
    school_census['URN'] = school_census['URN'].astype(str)
    school_census['PNUMFSMEVER'] = pd.to_numeric(
        school_census['PNUMFSMEVER'].astype(str).str.replace('%', ''), errors='coerce'
    )
    school_census['NOR'] = pd.to_numeric(school_census['NOR'], errors='coerce')
    school_info['POSTCODE_CLEAN'] = school_info['POSTCODE'].str.replace(r'\s+', '', regex=True)
    return pd.merge(
        school_info.dropna(subset=['POSTCODE_CLEAN']),
        school_census[['URN', 'PNUMFSMEVER', 'NOR']],
        on='URN'
    )

def calculate_lsoa_fsm_rates(school_df: pd.DataFrame, postcode_to_lsoa_map: pd.DataFrame) -> pd.DataFrame:
    """Calculate weighted FSM averages by LSOA"""
    LOG.info("Calculating weighted FSM averages using school size")
    school_with_lsoa = pd.merge(school_df, postcode_to_lsoa_map, on='POSTCODE_CLEAN', how='left')
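    # Also drop schools with missing FSM% or roll size; otherwise their NOR would
    # inflate the denominator while contributing nothing to fsm_count
    school_with_lsoa = school_with_lsoa.dropna(subset=['lsoa21cd', 'PNUMFSMEVER', 'NOR'])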
    school_with_lsoa['fsm_count'] = (school_with_lsoa['PNUMFSMEVER'] / 100) * school_with_lsoa['NOR']
    lsoa_fsm_agg = school_with_lsoa.groupby('lsoa21cd').agg({
        'fsm_count': 'sum',
        'NOR': 'sum'
    }).reset_index()
    lsoa_fsm_agg['avg_fsm%'] = (lsoa_fsm_agg['fsm_count'] / lsoa_fsm_agg['NOR']) * 100
    lsoa_fsm_agg['avg_fsm%'] = lsoa_fsm_agg['avg_fsm%'].fillna(0)
    LOG.info(f"Calculated weighted average FSM for {len(lsoa_fsm_agg)} LSOAs")
    return lsoa_fsm_agg[['lsoa21cd', 'avg_fsm%']]

def load_and_clean_ons_data(ons_path: str) -> pd.DataFrame:
    """Load and clean ONS postcode data"""
    ons_cols = ['pcds', 'lsoa21cd', 'lad25cd', 'lat', 'long']
    ons_df = pd.read_csv(ons_path, dtype=str, usecols=ons_cols)
    ons_df.rename(columns={'pcds': 'POSTCODE'}, inplace=True)
    ons_df.dropna(subset=['POSTCODE', 'lsoa21cd', 'lad25cd'], inplace=True)
    ons_df['District'] = ons_df['POSTCODE'].apply(get_outward_code)
    ons_df['lat'] = pd.to_numeric(ons_df['lat'], errors='coerce')
    ons_df['long'] = pd.to_numeric(ons_df['long'], errors='coerce')
    ons_df['POSTCODE_CLEAN'] = ons_df['POSTCODE'].str.replace(r'\s+', '', regex=True)
    return ons_df

def load_data(file_paths: dict) -> dict:
    """Loads all raw data files"""
    LOG.info("Loading all raw data files...")
    dataframes = {}
    current_file = None
    try:
        current_file = 'imd_decile'
        dataframes['imd_decile'] = pd.read_csv(file_paths['imd_decile'])
        LOG.info("Loaded IMD decile and population data.")
        current_file = 'lad_names'
        dataframes['lad_names'] = pd.read_csv(file_paths['lad_names'])
        current_file = 'lad_regions'
        try:
            dataframes['lad_regions'] = pd.read_csv(file_paths['lad_regions'])
        except Exception:
            dataframes['lad_regions'] = pd.read_excel(file_paths['lad_regions'])
        current_file = 'membership'
        dataframes['membership'] = pd.read_csv(file_paths['membership'])
        current_file = 'ons_data'
        dataframes['ons_df'] = load_and_clean_ons_data(file_paths['ons_data'])
        current_file = 'school_data'
        school_df = load_school_data(file_paths['school_info'], file_paths['school_census'])
        postcode_to_lsoa_map = dataframes['ons_df'][['POSTCODE_CLEAN', 'lsoa21cd']].drop_duplicates()
        dataframes['lsoa_fsm'] = calculate_lsoa_fsm_rates(school_df, postcode_to_lsoa_map)
        LOG.info("All source data loaded successfully.")
        return dataframes
    except FileNotFoundError as e:
        # e.filename always exists on OSError but may be None, so fall back explicitly
        missing_file = e.filename or file_paths.get(current_file, 'Unknown File')
        LOG.error(f"Error: Could not find a required data file: {missing_file}", exc_info=True)
        raise
    except Exception as e:
        LOG.error(f"An error occurred during data loading (loading '{current_file}'): {e}", exc_info=True)
        raise

def clean_imd_data(imd_decile: pd.DataFrame) -> pd.DataFrame:
    """Clean and prepare IMD data"""
    imd_cols_to_keep = {
        'LSOA code (2021)': 'lsoa21cd',
        'Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs)': 'imd_decile',
        'Income Score (rate)': 'income_score',
        'Employment Score (rate)': 'employment_score',
        'Total population: mid 2022': 'Population'
    }
    missing_imd_cols = [col for col in imd_cols_to_keep if col not in imd_decile.columns]
    if missing_imd_cols:
        LOG.warning(f"IMD Decile file missing expected columns: {missing_imd_cols}")
        imd_cols_to_keep = {k: v for k, v in imd_cols_to_keep.items() if k in imd_decile.columns}
    LOG.info(f"Using IMD columns: {imd_cols_to_keep}")
    imd_decile_clean = imd_decile[list(imd_cols_to_keep.keys())].rename(columns=imd_cols_to_keep)
    imd_decile_clean['lsoa21cd'] = imd_decile_clean['lsoa21cd'].astype(str).str.strip()
    imd_decile_clean['deprivation_category'] = imd_decile_clean['imd_decile'].apply(get_deprivation_category)
    return imd_decile_clean

def clean_membership_data(membership_df: pd.DataFrame) -> pd.DataFrame:
    """Clean membership data"""
    membership_df.columns = membership_df.columns.str.strip()
    membership_df.rename(columns={'Primary Supporter Postal District': 'District'}, inplace=True)
    if 'Visits' not in membership_df.columns:
        LOG.error("Membership file is missing the 'Visits' column.")
        raise KeyError("Missing 'Visits' column in membership file.")
    # Drop missing districts before casting: astype(str) would turn NaN into the
    # literal string 'nan', which dropna() no longer recognises as missing
    membership_clean = membership_df[['District', 'Visits']].dropna(subset=['District']).copy()
    membership_clean['District'] = membership_clean['District'].astype(str).str.strip().str.upper()
    LOG.info(f"Cleaned membership data. Found {len(membership_clean)} visit records.")
    return membership_clean

def clean_lad_data(lad_names: pd.DataFrame, lad_regions: pd.DataFrame) -> tuple:
    """Clean LAD names and regions data"""
    lad_names.columns = lad_names.columns.str.lower()
    lad_names.rename(columns={'lad24cd': 'lad25cd', 'lad24nm': 'Authority_Name'}, inplace=True)
    lad_cols_to_merge = ['lad25cd', 'Authority_Name']
    lad_regions.columns = lad_regions.columns.str.lower()
    rename_cols_region = {'lad25cd': 'lad25cd_merge', 'rgn25nm': 'Region_Name'}
    if 'lad25cd' in lad_regions.columns and 'rgn25nm' in lad_regions.columns:
        lad_regions.rename(columns=rename_cols_region, inplace=True)
        region_cols_to_merge = ['lad25cd_merge', 'Region_Name']
    else:
        LOG.warning("Region data not available. Region filtering will be disabled.")
        region_cols_to_merge = []
    return lad_cols_to_merge, region_cols_to_merge

def build_lsoa_master_data(ons_df: pd.DataFrame, imd_decile_clean: pd.DataFrame,
                          lsoa_fsm: pd.DataFrame, lad_names: pd.DataFrame,
                          lad_regions: pd.DataFrame) -> pd.DataFrame:
    """Build LSOA-level master data"""
    lsoa_df = ons_df.drop_duplicates(subset=['lsoa21cd']).drop(
        columns=['POSTCODE', 'POSTCODE_CLEAN', 'District']
    )
    lsoa_data = pd.merge(lsoa_df, imd_decile_clean, on='lsoa21cd', how='left')
    lsoa_data = pd.merge(lsoa_data, lsoa_fsm, on='lsoa21cd', how='left')
    lad_cols_to_merge, region_cols_to_merge = clean_lad_data(lad_names, lad_regions)
    lsoa_data = pd.merge(lsoa_data, lad_names[lad_cols_to_merge], on='lad25cd', how='left')
    if region_cols_to_merge:
        lsoa_data = pd.merge(lsoa_data, lad_regions[region_cols_to_merge],
                           left_on='lad25cd', right_on='lad25cd_merge', how='left')
    lsoa_data['Region_Name'] = lsoa_data.get('Region_Name', 'Unknown')
    lsoa_data['Authority_Name'] = lsoa_data.get('Authority_Name', 'Unknown')
    LOG.info(f"LSOA-level data created with {len(lsoa_data)} rows.")
    return lsoa_data

def clean_and_merge(data: dict) -> tuple:
    """Builds LSOA-level data with categories of deprivation"""
    LOG.info("Building LSOA-level master data...")
    ons_df = data['ons_df'].copy()
    district_lsoa_map = ons_df[['District', 'lsoa21cd']].drop_duplicates().dropna()
    LOG.info(f"Created District-to-LSOA map with {len(district_lsoa_map)} links.")
    imd_decile_clean = clean_imd_data(data['imd_decile'])
    membership_clean = clean_membership_data(data['membership'])
    lsoa_data = build_lsoa_master_data(
        ons_df, imd_decile_clean, data['lsoa_fsm'],
        data['lad_names'], data['lad_regions']
    )
    return lsoa_data, district_lsoa_map, membership_clean, ons_df

def calculate_district_aggregations(district_lsoa_data: pd.DataFrame, selected_features: list) -> dict:
    """Calculate district-level aggregations"""
    base_aggs = {}
    if 'avg_imd_decile' in selected_features:
        base_aggs['imd_decile'] = 'mean'
    if 'avg_income_score' in selected_features:
        base_aggs['income_score'] = 'mean'
    if 'avg_fsm%' in selected_features:
        base_aggs['avg_fsm%'] = 'mean'
    if 'avg_employment_score' in selected_features:
        base_aggs['employment_score'] = 'mean'
    if 'distance_to_BI' in selected_features:
        base_aggs['lat'] = 'mean'
        base_aggs['long'] = 'mean'
    if 'Population' in district_lsoa_data.columns:
        base_aggs['Population'] = 'sum'
    else:
        LOG.error("'Population' column not found in LSOA data.")
        district_lsoa_data['Population'] = 0
        base_aggs['Population'] = 'sum'
    base_aggs['Region_Name'] = [lambda x: x.mode().iloc[0] if not x.mode().empty else 'Unknown']
    base_aggs['Authority_Name'] = [lambda x: x.mode().iloc[0] if not x.mode().empty else 'Unknown']
    base_aggs['lsoa21cd'] = 'count'
    return base_aggs

def calculate_deprivation_percentages(district_lsoa_data: pd.DataFrame, district_features: pd.DataFrame) -> pd.DataFrame:
    """Calculate population-weighted deprivation percentages"""
    if 'Population' not in district_lsoa_data.columns or district_lsoa_data['Population'].sum() <= 0:
        LOG.warning("Could not calculate deprivation percentages. 'Population' column missing or all zero.")
        district_features['pop%_most_deprived'] = 0
        district_features['pop%_moderately_deprived'] = 0
        district_features['pop%_least_deprived'] = 0
        return district_features
    deprivation_population = district_lsoa_data.groupby(
        ['District', 'deprivation_category']
    )['Population'].sum().unstack(fill_value=0)
    total_population = deprivation_population.sum(axis=1)
    if TierLabels.MOST_DEPRIVED in deprivation_population.columns:
        district_features['pop%_most_deprived'] = (
            deprivation_population[TierLabels.MOST_DEPRIVED] / total_population
        ) * 100
    else:
        district_features['pop%_most_deprived'] = 0
    if TierLabels.MODERATELY_DEPRIVED in deprivation_population.columns:
        district_features['pop%_moderately_deprived'] = (
            deprivation_population[TierLabels.MODERATELY_DEPRIVED] / total_population
        ) * 100
    else:
        district_features['pop%_moderately_deprived'] = 0
    if TierLabels.LEAST_DEPRIVED in deprivation_population.columns:
        district_features['pop%_least_deprived'] = (
            deprivation_population[TierLabels.LEAST_DEPRIVED] / total_population
        ) * 100
    else:
        district_features['pop%_least_deprived'] = 0
    cols_to_fill = ['pop%_most_deprived', 'pop%_moderately_deprived', 'pop%_least_deprived']
    for col in cols_to_fill:
        if col in district_features.columns:
            district_features[col] = district_features[col].fillna(0)
    LOG.info("Added population-weighted deprivation percentages.")
    return district_features

def calculate_district_distances(district_features: pd.DataFrame, reserve_coords: dict) -> pd.DataFrame:
    """Calculate distances to Brownsea Island"""
    # NOTE: districts with no centroid are placed at (0, 0), which yields a very large distance
    district_features[['avg_lat', 'avg_long']] = district_features[['avg_lat', 'avg_long']].fillna(0)
    district_features['distance_to_BI'] = calculate_haversine_distance(
        reserve_coords['latitude'], reserve_coords['longitude'],
        district_features['avg_lat'], district_features['avg_long']
    )
    district_features = district_features.drop(columns=['avg_lat', 'avg_long'])
    return district_features

def engineer_features(lsoa_data: pd.DataFrame, district_lsoa_map: pd.DataFrame,
                     membership_df: pd.DataFrame, reserve_coords: dict,
                     selected_features: list) -> pd.DataFrame:
    """Aggregates LSOA data to District level"""
    LOG.info("Engineering district-level features...")
    district_lsoa_data = pd.merge(district_lsoa_map, lsoa_data, on='lsoa21cd')
    base_aggs = calculate_district_aggregations(district_lsoa_data, selected_features)
    district_features = district_lsoa_data.groupby('District').agg(base_aggs)
    if isinstance(district_features.columns, pd.MultiIndex):
        district_features.columns = ['_'.join(col).strip() for col in district_features.columns.values]
    rename_map = {
        'imd_decile_mean': 'avg_imd_decile',
        'income_score_mean': 'avg_income_score',
        'avg_fsm%_mean': 'avg_fsm%',
        'employment_score_mean': 'avg_employment_score',
        'lat_mean': 'avg_lat',
        'long_mean': 'avg_long',
        'Region_Name_<lambda>': 'Region_Name',
        'Authority_Name_<lambda>': 'Authority_Name',
        'lsoa21cd_count': 'lsoa_count',
        'Population_sum': 'Population'
    }
    for col in ['avg_fsm%', 'Region_Name', 'Authority_Name', 'Population']:
        if col in district_features.columns:
            rename_map[col] = col
    district_features.rename(columns=rename_map, inplace=True)
    district_features = calculate_deprivation_percentages(district_lsoa_data, district_features)
    if 'distance_to_BI' in selected_features:
        district_features = calculate_district_distances(district_features, reserve_coords)
    ml_dataset = pd.merge(district_features.reset_index(), membership_df, on='District', how='left')
    ml_dataset['is_member'] = ~ml_dataset['Visits'].isna()
    ml_dataset['Visits'] = ml_dataset['Visits'].fillna(0)
    for col in selected_features:
        if col in ml_dataset.columns:
            if pd.api.types.is_numeric_dtype(ml_dataset[col]):
                ml_dataset[col] = ml_dataset[col].fillna(0)
    LOG.info(f"District-level features created. Total districts: {len(ml_dataset)}")
    return ml_dataset.reset_index(drop=True)
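
The `distance_to_BI` feature relies on the haversine formula working for both scalars and pandas Series. The sketch below re-declares the formula under the hypothetical name `haversine_km` so it runs standalone, and checks it against two easy cases; the two centroids are made-up illustrative values, not rows from the data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same value as R in calculate_haversine_distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or array-likes in decimal degrees."""
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1)
    a = (np.sin(dlat / 2) ** 2
         + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(dlon / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


BI_LAT, BI_LON = 50.68900, -1.95732  # Brownsea Island, from GeographicConstants

# A point is zero kilometres from itself
d_self = haversine_km(BI_LAT, BI_LON, BI_LAT, BI_LON)

# One degree of latitude on a 6371 km sphere is 6371 * pi / 180 km
d_one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)

# A missing centroid filled with (0, 0) ends up far from Dorset
d_origin = haversine_km(BI_LAT, BI_LON, 0.0, 0.0)

# Vectorised call, as in calculate_district_distances (made-up centroids)
centroids = pd.DataFrame({"avg_lat": [50.72, 51.07], "avg_long": [-1.88, -1.79]})
distances = haversine_km(BI_LAT, BI_LON, centroids["avg_lat"], centroids["avg_long"])

print(d_self, round(d_one_degree, 2), round(d_origin))
print(distances.round(1).tolist())
```

The `d_origin` case matters because `calculate_district_distances` fills missing coordinates with 0 before computing distances, so any district without a centroid would look extremely remote to the models.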

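`calculate_lsoa_fsm_rates` weights each school's FSM percentage by its number on roll (`NOR`) rather than averaging the percentages directly. A toy example with made-up schools and LSOA codes illustrates how far the two can diverge; it is a sketch of the idea, not the notebook's function.

```python
import pandas as pd

# Made-up schools: two in one LSOA with very different sizes, one in another
schools = pd.DataFrame({
    "lsoa21cd": ["LSOA_A", "LSOA_A", "LSOA_B"],
    "PNUMFSMEVER": [50.0, 10.0, 20.0],  # FSM percentage per school
    "NOR": [100, 900, 300],             # number of pupils on roll
})

# Convert each percentage to a pupil count, then aggregate counts and roll sizes
schools["fsm_count"] = schools["PNUMFSMEVER"] / 100 * schools["NOR"]
agg = schools.groupby("lsoa21cd").agg(
    fsm_count=("fsm_count", "sum"),
    NOR=("NOR", "sum"),
)
agg["weighted_fsm%"] = agg["fsm_count"] / agg["NOR"] * 100

# Unweighted mean of the school percentages, for comparison
agg["simple_mean_fsm%"] = schools.groupby("lsoa21cd")["PNUMFSMEVER"].mean()

print(agg)
```

For `LSOA_A` the weighted figure is 14% (140 of 1,000 pupils), while the simple mean of the two school percentages is 30%, because the small school with a high rate counts as much as the large one.
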

# Modeling Functions

In [38]:
def create_model_pipelines() -> dict:
    """Create model pipelines"""
    return {
        "Random Forest": RandomForestRegressor(
            random_state=ModelConstants.RANDOM_STATE,
            n_jobs=-1
        ),
        "LightGBM": lgb.LGBMRegressor(
            random_state=ModelConstants.RANDOM_STATE
        ),
        "XGBoost": xgb.XGBRegressor(
            random_state=ModelConstants.RANDOM_STATE,
            n_jobs=-1
        ),
        "CatBoost": cb.CatBoostRegressor(
            random_state=ModelConstants.RANDOM_STATE,
            verbose=0
        ),
        "Gradient Boosting": GradientBoostingRegressor(
            random_state=ModelConstants.RANDOM_STATE
        ),
        "Ridge Regression": Ridge(
            random_state=ModelConstants.RANDOM_STATE
        )
    }

from typing import Any  # `any` is the builtin function, not a type

def evaluate_single_model(name: str, model: Any, X: pd.DataFrame, y_log: pd.Series,
                         cv: KFold) -> dict:
    """Evaluate a single model"""
    try:
        pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('model', model)
        ])
        r2_scores = cross_val_score(
            pipeline, X, y_log, cv=cv, scoring='r2', n_jobs=-1
        )
        mse_scores = cross_val_score(
            pipeline, X, y_log, cv=cv, scoring='neg_mean_squared_error', n_jobs=-1
        )
        return {
            "Model": name,
            "Mean R2": np.mean(r2_scores),
            "Std R2": np.std(r2_scores),
            "Mean RMSE": np.mean(np.sqrt(-mse_scores))
        }
    except Exception as e:
        LOG.error(f"Error during cross-validation for {name}: {e}")
        return {
            "Model": name,
            "Mean R2": np.nan,
            "Std R2": np.nan,
            "Mean RMSE": np.nan
        }

def train_and_evaluate(X: pd.DataFrame, y: pd.Series, params: dict):
    """Trains multiple models and evaluates them"""
    LOG.info("Training and evaluating models...")
    y_log = np.log1p(y)
    models = create_model_pipelines()
    results_list = []
    trained_pipelines = {}
    cv = KFold(
        n_splits=params['n_splits_cv'],
        shuffle=True,
        random_state=params['random_state']
    )
    for name, model in models.items():
        LOG.info(f"Running cross-validation for {name}...")
        result = evaluate_single_model(name, model, X, y_log, cv)
        results_list.append(result)
        if not pd.isna(result['Mean R2']):
            try:
                pipeline = Pipeline([
                    ('scaler', StandardScaler()),
                    ('model', model)
                ])
                pipeline.fit(X, y_log)
                trained_pipelines[name] = pipeline
                LOG.info(f"{name} - Mean R2: {result['Mean R2']:.4f}")
            except Exception as e:
                LOG.error(f"Error training pipeline for {name}: {e}")
                trained_pipelines[name] = None
        else:
            trained_pipelines[name] = None
    return results_list, trained_pipelines

def extract_feature_importance(model) -> np.ndarray:
    """Extract feature importance from model"""
    if hasattr(model, 'feature_importances_'):
        return model.feature_importances_
    elif hasattr(model, 'coef_'):
        return np.abs(model.coef_)
    else:
        return None

def create_feature_importance_chart(importance_df: pd.DataFrame, best_model_name: str):
    """Create feature importance visualization"""
    top_5_df = importance_df.head(5)
    fig = px.bar(
        top_5_df.sort_values(by='Importance', ascending=True),
        x='Importance',
        y='Feature',
        orientation='h',
        title=f'<b>Top 5 Important Features for {best_model_name}</b>',
        text='Importance_Pct',
        labels={'Importance': 'Importance Score', 'Feature': 'Feature'}
    )
    fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')
    fig.update_layout(
        yaxis_title=None,
        xaxis_title='Feature Importance Score',
        height=max(400, len(top_5_df) * 50)
    )
    fig.show()

def analyze_feature_importance(pipeline, feature_names, best_model_name):
    """Extracts and displays feature importance"""
    LOG.info("Analyzing feature importance...")
    try:
        model = pipeline.named_steps['model']
        importances = extract_feature_importance(model)
        if importances is not None:
            importance_df = pd.DataFrame({
                'Feature': feature_names,
                'Importance': importances
            }).sort_values(by='Importance', ascending=False)
            importance_df['Importance_Pct'] = (
                importance_df['Importance'] / importance_df['Importance'].sum()
            ) * 100
            print("-"*65)
            print(f"Feature Importance for {best_model_name}")
            print("-"*65)
            create_feature_importance_chart(importance_df, best_model_name)
    except Exception as e:
        LOG.error(f"Error analyzing feature importance: {e}")
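
Because `train_and_evaluate` fits every model on `np.log1p(y)`, the reported RMSE is in log space, and predictions need `np.expm1` to get back to visit counts. The sketch below uses synthetic data and a single Ridge pipeline to show that round trip. It also uses scikit-learn's `cross_validate` to collect both metrics in one cross-validation pass; that is an alternative to the two `cross_val_score` calls above, not the notebook's method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: three features and a non-negative, skewed target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
log_signal = 1.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=200)
y = np.expm1(np.clip(log_signal, 0, None))

# Same transform as train_and_evaluate
y_log = np.log1p(y)

pipeline = Pipeline([("scaler", StandardScaler()), ("model", Ridge())])
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# One CV pass, two metrics (both measured on the log scale)
scores = cross_validate(pipeline, X, y_log, cv=cv,
                        scoring=("r2", "neg_mean_squared_error"))
mean_r2 = scores["test_r2"].mean()
mean_rmse_log = np.sqrt(-scores["test_neg_mean_squared_error"]).mean()

# Fit on all data, then invert the transform to return to the original units
pipeline.fit(X, y_log)
pred_visits = np.expm1(pipeline.predict(X))

print(f"Mean R2 (log scale): {mean_r2:.3f}")
print(f"Mean RMSE (log scale): {mean_rmse_log:.3f}")
```

Any gap computed between predicted and actual visits should compare values on the same scale, either both transformed with `np.log1p` or both in raw counts after `np.expm1`.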

# Intervention Analysis Functions

In [39]:
def check_intervention_conditions(row) -> dict:
    """Check all intervention conditions"""
    thresholds = CONFIG['intervention_thresholds']
    return {
        'quick_win': all([
            row['distance_to_BI'] <= thresholds['quick_win_distance'],
            row['visits_gap'] >= thresholds['quick_win_gap']
        ]),
        'strategic_equity': all([
            row['pop%_most_deprived'] >= thresholds['strategic_deprivation'],
            row['avg_fsm%'] >= thresholds['strategic_fsm']
        ]),
        'scale_opportunity': all([
            row['Population'] >= thresholds['scale_population'],
            row['visits_gap'] >= thresholds['scale_gap']
        ]),
        'success_story': all([
            row['distance_to_BI'] <= thresholds['success_distance'],
            row['visits_per_1000'] >= thresholds['success_visit_rate'],
            row['avg_fsm%'] < DeprivationConstants.HIGH_FSM_THRESHOLD
        ])
    }

def check_urgent_action_condition(row) -> bool:
    """Check urgent action condition"""
    return (
        row['avg_fsm%'] >= DeprivationConstants.HIGH_FSM_THRESHOLD and
        row['visits_per_1000'] <= InterventionConstants.URGENT_ACTION_VISIT_RATE and
        row['visits_gap'] >= InterventionConstants.URGENT_ACTION_GAP
    )

def diagnose_intervention_type(row):
    """Dynamically assign intervention type"""
    conditions = check_intervention_conditions(row)
    urgent_action = check_urgent_action_condition(row)
    intervention_type = None
    priority_reason = ""
    if conditions['strategic_equity']:
        intervention_type = InterventionTypes.EQUITY_PRIORITY
        priority_reason = "Highest priority: High deprivation + high FSM"
    elif urgent_action:
        intervention_type = InterventionTypes.URGENT_ACTION
        priority_reason = "Critical equity gap: High FSM + low visit rate"
    elif conditions['quick_win']:
        intervention_type = InterventionTypes.NEARBY_OPPORTUNITY
        priority_reason = "High potential: Nearby with significant membership gap"
    elif conditions['scale_opportunity']:
        intervention_type = InterventionTypes.SCALE_POTENTIAL
        priority_reason = "Large population with growth opportunity"
    elif conditions['success_story']:
        intervention_type = InterventionTypes.MODEL_DISTRICT
        priority_reason = "High performance: Good visit rate nearby"
    else:
        intervention_type = InterventionTypes.STANDARD_MONITORING
        priority_reason = "Balanced performance: Meets expectations"
    diagnosis = {
        'intervention_type': intervention_type,
        'priority_reason': priority_reason,
        'conditions_met': {k: v for k, v in conditions.items() if v},
        'all_conditions': conditions,
        'urgent_action': urgent_action
    }
    return intervention_type, diagnosis

def filter_bcp_dorset_districts(data: pd.DataFrame) -> pd.DataFrame:
    """Filter for BCP and Dorset districts"""
    return data[
        data['District'].str.contains(
            '|'.join([f'^{area}' for area in GeographicConstants.BCP_DORSET_POSTCODES]),
            na=False
        )
    ].copy()

def add_model_predictions(analysis_df: pd.DataFrame, trained_pipeline, X: pd.DataFrame) -> pd.DataFrame:
    """Add model predictions to analysis data"""
    if trained_pipeline is None:
        analysis_df['predicted_visit_rate'] = np.nan
        analysis_df['performance_gap'] = np.nan
        return analysis_df
    try:
        # Predict only for rows present in both frames; X.loc raises KeyError on missing labels
        common_index = analysis_df.index.intersection(X.index)
        X_filtered = X.loc[common_index]
        if not X_filtered.empty:
            predictions_log = trained_pipeline.predict(X_filtered)
            # Invert the log1p scale and align predictions to analysis_df by index
            predictions = pd.Series(np.expm1(np.ravel(predictions_log)), index=X_filtered.index)
            analysis_df['predicted_visit_rate'] = predictions
            analysis_df['performance_gap'] = analysis_df['predicted_visit_rate'] - analysis_df['visits_per_1000']
        else:
            analysis_df['predicted_visit_rate'] = np.nan
            analysis_df['performance_gap'] = np.nan
        analysis_df['predicted_visit_rate'] = analysis_df['predicted_visit_rate'].fillna(0)
        analysis_df['performance_gap'] = analysis_df['performance_gap'].fillna(0)
    except Exception as e:
        LOG.error(f"Error generating model predictions: {e}")
        analysis_df['predicted_visit_rate'] = np.nan
        analysis_df['performance_gap'] = np.nan
    return analysis_df

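def calculate_visit_metrics(data: pd.DataFrame) -> pd.DataFrame:
    """Calculate visits per 1,000 residents and the shortfall against a target rate"""
    data = data.copy()
    # Treat zero or non-numeric populations as missing to avoid division by zero
    data['Population'] = pd.to_numeric(data['Population'], errors='coerce').replace(0, np.nan)
    data['visits_per_1000'] = (data['Visits'] / data['Population']) * 1000
    # Districts without a usable population are given a rate of 0
    data['visits_per_1000'] = data['visits_per_1000'].fillna(0)
    target_visit_rate = 10  # visits per 1,000 residents
    data['visits_gap_raw'] = target_visit_rate - data['visits_per_1000']
    # Districts at or above the target have no gap
    data['visits_gap'] = data['visits_gap_raw'].clip(lower=0)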
    return data

def assign_intervention_types(data: pd.DataFrame) -> pd.DataFrame:
    """Assign intervention types to all districts"""
    data = data.copy()
    intervention_results = []
    diagnosis_info = []
    for _, row in data.iterrows():
        intervention_type, diagnosis = diagnose_intervention_type(row)
        intervention_results.append(intervention_type)
        diagnosis_info.append(diagnosis)
    data['intervention_type'] = intervention_results
    data['intervention_diagnosis'] = diagnosis_info
    return data

def get_fsm_tier(row) -> str:
    """Categorize FSM tier"""
    if pd.isna(row['avg_fsm%']):
        return 'Unknown'
    if row['avg_fsm%'] >= DeprivationConstants.HIGH_FSM_THRESHOLD:
        return TierLabels.HIGH_FSM
    elif row['avg_fsm%'] >= DeprivationConstants.MEDIUM_FSM_THRESHOLD:
        return TierLabels.MEDIUM_FSM
    else:
        return TierLabels.LOW_FSM

def get_visit_rate_tier(row) -> str:
    """Categorize visit rate tier"""
    if pd.isna(row['visits_per_1000']):
        return 'Unknown'
    visit_rate = row['visits_per_1000']
    if visit_rate >= DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD:
        return TierLabels.HIGH_VISIT_RATE
    elif visit_rate >= DeprivationConstants.LOW_VISIT_RATE_THRESHOLD:
        return TierLabels.MEDIUM_VISIT_RATE
    else:
        return TierLabels.LOW_VISIT_RATE

def apply_categorizations(data: pd.DataFrame) -> pd.DataFrame:
    """Apply all categorizations to data"""
    data = data.copy()
    data['deprivation_tier'] = data.apply(get_deprivation_tier, axis=1)
    data['fsm_tier'] = data.apply(get_fsm_tier, axis=1)
    data['visit_rate_tier'] = data.apply(get_visit_rate_tier, axis=1)
    data['intersection_segment'] = (
        data['deprivation_tier'] + " + " +
        data['fsm_tier'] + " + " +
        data['visit_rate_tier']
    )
    return data

def create_priority_matrix_categories() -> pd.DataFrame:
    """Create priority action matrix categories"""
    matrix_categories = [
        {
            "Quadrant": "Urgent Action",
            "Need Criteria": "FSM ≥15% OR Deprivation ≥40%",
            "Visit Rate Range": f"≤{DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} visits/1000",
            "Description": "High-need areas with very low engagement",
            "Strategic Focus": "Immediate equity interventions"
        },
        {
            "Quadrant": "High Priority",
            "Need Criteria": "FSM ≥15% OR Deprivation ≥40%",
            "Visit Rate Range": f"{DeprivationConstants.LOW_VISIT_RATE_THRESHOLD}-{DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD} visits/1000",
            "Description": "High-need areas with moderate engagement",
            "Strategic Focus": "Targeted outreach and retention"
        },
        {
            "Quadrant": "Maintain",
            "Need Criteria": "FSM <15% AND Deprivation <40%",
            "Visit Rate Range": f"≥{DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} visits/1000",
            "Description": "Lower-need areas with good engagement",
            "Strategic Focus": "Sustain current performance"
        },
        {
            "Quadrant": "Growth Opportunity",
            "Need Criteria": "FSM <15% AND Deprivation <40%",
            "Visit Rate Range": f"≤{DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} visits/1000",
            "Description": "Lower-need areas with growth potential",
            "Strategic Focus": "Expansion and awareness campaigns"
        }
    ]
    return pd.DataFrame(matrix_categories).set_index("Quadrant")

def create_intervention_categories() -> pd.DataFrame:
    """Create intervention strategy categories"""
    intervention_categories = [
        {
            "Intervention Type": InterventionTypes.EQUITY_PRIORITY,
            "Priority Level": "Highest",
            "Key Conditions": "≥40% most deprived + ≥15% FSM",
            "Strategic Goal": "Address deepest social inequality",
            "Resource Allocation": "Maximum investment"
        },
        {
            "Intervention Type": InterventionTypes.URGENT_ACTION,
            "Priority Level": "Very High",
            "Key Conditions": f"≥15% FSM + ≤{DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} visits/1000 + ≥5 gap",
            "Strategic Goal": "Immediate support for underserved high-need areas",
            "Resource Allocation": "High investment"
        },
        {
            "Intervention Type": InterventionTypes.NEARBY_OPPORTUNITY,
            "Priority Level": "High",
            "Key Conditions": "≤30km distance + ≥5 visits gap",
            "Strategic Goal": "Rapid growth in accessible underperforming areas",
            "Resource Allocation": "Moderate investment"
        },
        {
            "Intervention Type": InterventionTypes.SCALE_POTENTIAL,
            "Priority Level": "Medium",
            "Key Conditions": "≥50,000 population + ≥3 visits gap",
            "Strategic Goal": "Maximize impact through scale",
            "Resource Allocation": "Strategic investment"
        },
        {
            "Intervention Type": InterventionTypes.MODEL_DISTRICT,
            "Priority Level": "Learning",
            "Key Conditions": f"≤20km distance + ≥{DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD} visits/1000 + <15% FSM",
            "Strategic Goal": "Learn and replicate successful patterns",
            "Resource Allocation": "Study and documentation"
        },
        {
            "Intervention Type": InterventionTypes.STANDARD_MONITORING,
            "Priority Level": "Routine",
            "Key Conditions": "Balanced performance: Meets baseline expectations",
            "Strategic Goal": "Maintain performance and monitor changes",
            "Resource Allocation": "Minimal intervention"
        }
    ]
    df = pd.DataFrame(intervention_categories).set_index("Intervention Type")
    return df.drop(columns=['Resource Allocation'])
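
The first-match-wins ordering in `diagnose_intervention_type` can be sketched as an ordered rule list. The thresholds, labels, and the `classify` helper below are hypothetical stand-ins chosen for illustration; the notebook reads its real values from `CONFIG['intervention_thresholds']` and the constants classes, and only three of its rules are mirrored here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical thresholds for illustration only
THRESHOLDS = {
    "deprivation_pct": 40.0,
    "fsm_pct": 15.0,
    "low_visit_rate": 5.0,
    "gap": 5.0,
    "nearby_km": 30.0,
}

Rule = Tuple[str, Callable[[Dict[str, float]], bool]]

# Ordered rules: earlier entries take precedence over later ones
RULES: List[Rule] = [
    ("Equity Priority",
     lambda r: r["pop%_most_deprived"] >= THRESHOLDS["deprivation_pct"]
     and r["avg_fsm%"] >= THRESHOLDS["fsm_pct"]),
    ("Urgent Action",
     lambda r: r["avg_fsm%"] >= THRESHOLDS["fsm_pct"]
     and r["visits_per_1000"] <= THRESHOLDS["low_visit_rate"]
     and r["visits_gap"] >= THRESHOLDS["gap"]),
    ("Nearby Opportunity",
     lambda r: r["distance_to_BI"] <= THRESHOLDS["nearby_km"]
     and r["visits_gap"] >= THRESHOLDS["gap"]),
]


def classify(row: Dict[str, float]) -> str:
    """Return the label of the first rule the row satisfies."""
    for label, rule in RULES:
        if rule(row):
            return label
    return "Standard Monitoring"


# Made-up districts, not real data
districts = {
    "A": {"pop%_most_deprived": 45, "avg_fsm%": 18, "visits_per_1000": 2,
          "visits_gap": 8, "distance_to_BI": 10},
    "B": {"pop%_most_deprived": 10, "avg_fsm%": 20, "visits_per_1000": 2,
          "visits_gap": 8, "distance_to_BI": 50},
    "C": {"pop%_most_deprived": 10, "avg_fsm%": 5, "visits_per_1000": 4,
          "visits_gap": 6, "distance_to_BI": 12},
    "D": {"pop%_most_deprived": 10, "avg_fsm%": 5, "visits_per_1000": 12,
          "visits_gap": 0, "distance_to_BI": 80},
}
labels = {name: classify(row) for name, row in districts.items()}
print(labels)
```

District A satisfies all three rules but is labelled by the first one, which is why the order of the `elif` chain matters: reordering the rules changes the assignment for any district that meets more than one condition.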

# Visualization Functions

In [40]:
def create_priority_matrix_plot(data: pd.DataFrame) -> go.Figure:
    """Create priority action matrix scatter plot"""
    if 'impact_score' not in data.columns:
        data['impact_score'] = (
            data['pop%_most_deprived'] * 0.4 +
            data['avg_fsm%'] * 0.3 +
            (100 - data['visits_per_1000'] * 10) * 0.3
        )
    def get_quadrant(row):
        high_need = (row['avg_fsm%'] >= DeprivationConstants.HIGH_FSM_THRESHOLD or
                    row['pop%_most_deprived'] >= DeprivationConstants.HIGH_DEPRIVATION_POPULATION)
        low_visit_rate = (row['visits_per_1000'] <= CONFIG['intersection_thresholds']['low_visit_rate'])
        # Need level picks the pair of quadrants, engagement picks within it, so
        # high-need districts are never labelled 'Growth Opportunity' and a rate
        # exactly at the low threshold is treated as low in both branches
        if high_need:
            return 'Urgent Action' if low_visit_rate else 'High Priority'
        return 'Growth Opportunity' if low_visit_rate else 'Maintain'
    data['quadrant'] = data.apply(get_quadrant, axis=1)
    fig = px.scatter(
        data,
        x='avg_fsm%',
        y='visits_per_1000',
        size='Population',
        color='quadrant',
        hover_name='District',
        hover_data=['Authority_Name', 'impact_score', 'distance_to_BI', 'pop%_most_deprived', 'avg_fsm%'],
        title='<b>BCP & Dorset Districts: Priority Action Matrix</b><br>Size=Population, Color=Priority Quadrant (FSM OR Deprivation)',
        labels={
            'avg_fsm%': 'Free School Meal % →',
            'visits_per_1000': 'Visit Rate (per 1000) ↓',
            'quadrant': 'Priority Quadrant'
        },
        color_discrete_map={
            'Urgent Action': 'red',
            'High Priority': 'orange',
            'Maintain': 'green',
            'Growth Opportunity': 'blue'
        }
    )
    for i, row in data.iterrows():
        fig.add_annotation(
            x=row['avg_fsm%'],
            y=row['visits_per_1000'],
            text=row['District'],
            showarrow=False,
            yshift=12,
            font=dict(size=10, color="black"),
            bgcolor=None,
            bordercolor=None,
            borderwidth=0,
            borderpad=0,
            opacity=1.0
        )
    fig.add_hline(y=DeprivationConstants.LOW_VISIT_RATE_THRESHOLD, line_dash="dash", line_color="orange",
                   annotation_text=f"Medium Threshold ({DeprivationConstants.LOW_VISIT_RATE_THRESHOLD})")
    fig.add_hline(y=DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD, line_dash="dash", line_color="green",
                   annotation_text=f"High Threshold ({DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD})")
    fig.add_vline(x=DeprivationConstants.HIGH_FSM_THRESHOLD, line_dash="dash", line_color="red",
                   annotation_text="FSM Threshold")
    fig.add_annotation(x=25, y=DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD + 2,
                      text="HIGH NEED + MODERATE ENGAGEMENT", showarrow=False,
                      font=dict(size=12, color="orange"), bgcolor="rgba(255,255,255,0.9)")
    fig.add_annotation(x=25, y=DeprivationConstants.LOW_VISIT_RATE_THRESHOLD - 2,
                      text="HIGH NEED + LOW ENGAGEMENT", showarrow=False,
                      font=dict(size=12, color="darkred"), bgcolor="rgba(255,255,255,0.9)")
    fig.add_annotation(x=5, y=DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD + 2,
                      text="LOW NEED + GOOD ENGAGEMENT", showarrow=False,
                      font=dict(size=12, color="green"), bgcolor="rgba(255,255,255,0.9)")
    fig.add_annotation(x=5, y=DeprivationConstants.LOW_VISIT_RATE_THRESHOLD - 2,
                      text="LOW NEED + GROWTH OPPORTUNITY", showarrow=False,
                      font=dict(size=12, color="blue"), bgcolor="rgba(255,255,255,0.9)")
    fig.add_annotation(x=20, y=CONFIG['intersection_thresholds']['high_visit_rate'] + 5,
                      text="HIGH NEED = FSM ≥15% OR Deprivation ≥40%",
                      showarrow=False, font=dict(size=10, color="black"),
                      bgcolor="lightyellow")
    fig.update_xaxes(range=[0, max(data['avg_fsm%'].max() * 1.1, 30)])
    fig.update_yaxes(range=[0, max(data['visits_per_1000'].max() * 1.1,
                                  CONFIG['intersection_thresholds']['high_visit_rate'] * 1.2)])
    return fig

def create_intervention_treemap(data: pd.DataFrame) -> go.Figure:
    """Create intervention strategy treemap"""
    intervention_types = data['intervention_type'].unique()
    color_palette = px.colors.qualitative.Plotly
    intervention_color_map = {}
    for i, intervention_type in enumerate(intervention_types):
        intervention_color_map[intervention_type] = color_palette[i % len(color_palette)]
    fig = px.treemap(
        data,
        path=['intervention_type', 'Authority_Name', 'District'],
        values='Population',
        color='intervention_type',
        color_discrete_map=intervention_color_map,
        title='<b>Intervention Strategy Map</b><br>Size=Population, Color=Intervention Type',
        hover_data={
            'visits_per_1000': ':.2f',
            'visits_gap': ':.2f',
            'distance_to_BI': ':.1f',
            'avg_fsm%': ':.1f',
            'pop%_most_deprived': ':.1f',
            'Population': ':,.0f'
        }
    )
    fig.update_layout(
        margin=dict(t=60, l=25, r=25, b=25),
        font=dict(size=14)
    )
    fig.update_traces(
        texttemplate="<b>%{label}</b>",
        textposition="middle center",
        hovertemplate=(
            "<b>%{label}</b><br>"
            "Intervention: %{color}<br>"
            "Population: %{value:,.0f}<br>"
            "Visits/1000: %{customdata[0]:.2f}<br>"
            "Membership Gap: %{customdata[1]:.2f}<br>"
            "Distance: %{customdata[2]:.1f} km<br>"
            "FSM: %{customdata[3]:.1f}%<br>"
            "Most Deprived: %{customdata[4]:.1f}%<br>"
            "<extra></extra>"
        )
    )
    return fig

def create_gap_analysis_chart(data: pd.DataFrame) -> go.Figure:
    """Create performance gap analysis chart"""
    if 'performance_gap' not in data.columns or 'predicted_visit_rate' not in data.columns:
        LOG.warning("Missing required columns for gap analysis")
        return None

    gap_data = data[['District', 'visits_per_1000', 'predicted_visit_rate', 'performance_gap']].copy()
    gap_data = gap_data.sort_values('performance_gap', ascending=False)

    fig = go.Figure()

    # Add actual visits
    fig.add_trace(go.Bar(
        x=gap_data['District'],
        y=gap_data['visits_per_1000'],
        name='Actual Visit Rate',
        marker_color='lightblue'
    ))

    # Add predicted visits
    fig.add_trace(go.Bar(
        x=gap_data['District'],
        y=gap_data['predicted_visit_rate'],
        name='Predicted Visit Rate',
        marker_color='lightcoral'
    ))

    # Add gap annotations
    for i, row in gap_data.iterrows():
        if row['performance_gap'] > 0:
            fig.add_annotation(
                x=row['District'],
                y=max(row['visits_per_1000'], row['predicted_visit_rate']) + 0.5,
                text=f"+{row['performance_gap']:.1f}",
                showarrow=False,
                font=dict(size=10, color="red")
            )

    fig.update_layout(
        title='<b>Performance Gap Analysis: Actual vs Predicted Visit Rates</b>',
        xaxis_title='District',
        yaxis_title='Visits per 1000 People',
        barmode='group',
        height=600,
        showlegend=True
    )

    return fig

def create_equity_gap_visualization(data: pd.DataFrame) -> go.Figure:
    """Create visualization of deprivation-based disparities"""
    equity_data = data.groupby('deprivation_tier').agg({
        'visits_per_1000': 'mean',
        'Population': 'sum',
        'District': 'count'
    }).reset_index()

    fig = px.bar(
        equity_data,
        x='deprivation_tier',
        y='visits_per_1000',
        title='<b>Equity Gap: Visit Rates by Deprivation Level</b>',
        labels={'visits_per_1000': 'Average Visits per 1000 People', 'deprivation_tier': 'Deprivation Tier'},
        color='deprivation_tier',
        color_discrete_map={
            TierLabels.HIGH_DEPRIVATION: 'red',
            TierLabels.MIXED_DEPRIVATION: 'orange',
            TierLabels.LOW_DEPRIVATION: 'green'
        }
    )

    fig.update_traces(marker_line_width=0)
    fig.update_layout(showlegend=False)

    return fig

def create_capacity_planning_visualization(data: pd.DataFrame) -> go.Figure:
    """Visualize intervention workload distribution"""
    workload_data = data.groupby('intervention_type').agg({
        'District': 'count',
        'Population': 'sum'
    }).reset_index()

    fig = px.sunburst(
        workload_data,
        path=['intervention_type'],
        values='Population',
        title='<b>Intervention Workload Distribution</b><br>Size indicates population coverage',
        hover_data=['District'],
        width=800,
        height=600
    )

    fig.update_layout(
        margin=dict(t=100, l=50, r=50, b=50),
        font=dict(size=14)
    )

    # Increase text size for better visibility
    fig.update_traces(
        textinfo='label+value+percent parent',
        insidetextorientation='radial',
        textfont=dict(size=12)
    )

    return fig

def prepare_visualization_data(lsoa_master_df: pd.DataFrame, district_lsoa_map: pd.DataFrame,
                              ml_dataset: pd.DataFrame) -> pd.DataFrame:
    """Prepare data for visualizations"""
    try:
        print("Preparing visualization data...")
        district_lsoa_data = pd.merge(district_lsoa_map, lsoa_master_df, on='lsoa21cd')

        # Get modal IMD decile for each district
        modal_deciles = district_lsoa_data.dropna(subset=['imd_decile']).groupby('District')['imd_decile'].apply(
            lambda x: x.mode().iloc[0] if not x.mode().empty else None
        ).reset_index().rename(columns={'imd_decile': 'modal_imd_decile'})

        # Get visit data
        district_visits = ml_dataset[['District', 'Visits', 'visits_per_1000', 'Population']].drop_duplicates()

        # Merge everything
        district_summary = pd.merge(district_visits, modal_deciles.dropna(subset=['modal_imd_decile']), on='District', how='inner')
        district_summary.dropna(subset=['modal_imd_decile'], inplace=True)

        if all(col in ml_dataset.columns for col in ['pop%_most_deprived', 'pop%_least_deprived']):
            deprivation_cols = ['District', 'pop%_most_deprived', 'pop%_least_deprived']
            if 'pop%_moderately_deprived' in ml_dataset.columns:
                deprivation_cols.append('pop%_moderately_deprived')
            deprivation_segments = ml_dataset[deprivation_cols].drop_duplicates()
            district_summary = pd.merge(district_summary, deprivation_segments, on='District', how='left')

            def assign_population_segment(row):
                if pd.isna(row['pop%_most_deprived']) or pd.isna(row['pop%_least_deprived']):
                    return 'Unknown'
                if row['pop%_most_deprived'] >= DeprivationConstants.HIGH_DEPRIVATION_POPULATION:
                    return 'Most Deprived'
                elif row['pop%_least_deprived'] >= DeprivationConstants.LOW_DEPRIVATION_POPULATION:
                    return 'Least Deprived'
                else:
                    return 'Mixed'

            district_summary['deprivation_segment'] = district_summary.apply(assign_population_segment, axis=1)

        return district_summary

    except Exception as e:
        LOG.error(f"Error preparing data for visualizations: {e}")
        print(f"Debug: Available columns in ml_dataset: {list(ml_dataset.columns)}")
        return None

def load_geojson_data(viz_config: dict) -> dict:
    """Load GeoJSON data for maps"""
    if not os.path.exists(viz_config['geojson_local_path']):
        LOG.info("Cloning GeoJSON repository...")
        subprocess.run(['git', 'clone', viz_config['geojson_repo_url'], viz_config['geojson_local_path']])
    else:
        LOG.info("GeoJSON repository already exists locally.")
    combined_geojson = {"type": "FeatureCollection", "features": []}
    geojson_base_path = os.path.join(viz_config['geojson_local_path'], 'geojson')
    for area in viz_config['dorset_postcode_areas']:
        geojson_path = os.path.join(geojson_base_path, f"{area}.geojson")
        try:
            with open(geojson_path) as f:
                data = json.load(f)
                combined_geojson['features'].extend(data['features'])
        except FileNotFoundError:
            LOG.warning(f"GeoJSON file not found for {area}. Skipping.")
    if not combined_geojson['features']:
        LOG.error("No GeoJSON features loaded. Cannot create maps.")
        return None
    return combined_geojson

def create_choropleth_map(district_summary: pd.DataFrame, combined_geojson: dict,
                         color_column: str, title: str, color_scale: str) -> go.Figure:
    """Create a choropleth map with district labels"""

    def get_centroid(feature):
        """Approximate a feature's centroid as the mean of its outer-ring vertices (first polygon only)"""
        if feature['geometry']['type'] == 'Polygon':
            coords = feature['geometry']['coordinates'][0]
        elif feature['geometry']['type'] == 'MultiPolygon':
            coords = feature['geometry']['coordinates'][0][0]
        else:
            return None
        lats = [coord[1] for coord in coords]
        lons = [coord[0] for coord in coords]
        return {'lat': sum(lats) / len(lats), 'lon': sum(lons) / len(lons)}

    # Calculate centroids for all districts
    district_centroids = {}
    for feature in combined_geojson['features']:
        district_name = feature['properties'].get('name')
        if district_name:
            centroid = get_centroid(feature)
            if centroid:
                district_centroids[district_name] = centroid

    # Add centroid coordinates to the data
    district_summary = district_summary.copy()
    district_summary['centroid_lat'] = district_summary['District'].map(
        lambda x: district_centroids.get(x, {}).get('lat', 0)
    )
    district_summary['centroid_lon'] = district_summary['District'].map(
        lambda x: district_centroids.get(x, {}).get('lon', 0)
    )

    # Determine if the color column is categorical or numeric
    is_categorical = not pd.api.types.is_numeric_dtype(district_summary[color_column])

    if is_categorical:
        unique_categories = district_summary[color_column].unique()
        if color_column == 'deprivation_segment':
            color_discrete_map = {
                'Most Deprived': '#d73027',
                'Mixed': '#fdae61',
                'Least Deprived': '#1a9850'
            }
        else:
            default_colors = px.colors.qualitative.Set3
            color_discrete_map = {cat: default_colors[i % len(default_colors)]
                                for i, cat in enumerate(unique_categories)}

        fig = px.choropleth_mapbox(
            district_summary,
            geojson=combined_geojson,
            locations='District',
            featureidkey="properties.name",
            color=color_column,
            color_discrete_map=color_discrete_map,
            mapbox_style="carto-positron",
            zoom=VisualizationConstants.MAP_ZOOM,
            center={"lat": VisualizationConstants.MAP_CENTER_LAT, "lon": VisualizationConstants.MAP_CENTER_LON},
            opacity=VisualizationConstants.MAP_OPACITY,
            title=f'<b>{title}</b><br><sub>Postcode districts labeled</sub>',
            labels={color_column: title.split(' of ')[-1]},
            hover_data=['Population', 'visits_per_1000'] + (
                ['deprivation_segment'] if 'deprivation_segment' in district_summary.columns else []
            )
        )
    else:
        fig = px.choropleth_mapbox(
            district_summary,
            geojson=combined_geojson,
            locations='District',
            featureidkey="properties.name",
            color=color_column,
            color_continuous_scale=color_scale,
            range_color=(0, district_summary[color_column].quantile(0.95)),
            mapbox_style="carto-positron",
            zoom=VisualizationConstants.MAP_ZOOM,
            center={"lat": VisualizationConstants.MAP_CENTER_LAT, "lon": VisualizationConstants.MAP_CENTER_LON},
            opacity=VisualizationConstants.MAP_OPACITY,
            title=f'<b>{title}</b><br><sub>Postcode districts labeled</sub>',
            labels={color_column: title.split(' of ')[-1]},
            hover_data=['Population', 'visits_per_1000'] + (
                ['deprivation_segment'] if 'deprivation_segment' in district_summary.columns else []
            )
        )

    # Add district labels as text annotations
    for i, row in district_summary.iterrows():
        if not pd.isna(row['centroid_lat']) and row['centroid_lat'] != 0:
            fig.add_trace(
                go.Scattermapbox(
                    lat=[row['centroid_lat']],
                    lon=[row['centroid_lon']],
                    mode='text',
                    text=[row['District']],
                    textfont=dict(size=12, color='black', weight='bold'),
                    showlegend=False,
                    hoverinfo='skip'
                )
            )

    fig.update_layout(
        margin={"r": 0, "t": 60, "l": 0, "b": 0},
    )

    if not is_categorical:
        fig.update_layout(coloraxis_colorbar=dict(title=title.split(' of ')[-1]))

    return fig

def create_choropleth_visualizations(data: pd.DataFrame, lsoa_master_df: pd.DataFrame,
                                   district_lsoa_map: pd.DataFrame, viz_config: dict):
    """Create and display choropleth maps for the analysis"""
    print("\nGeographic Analysis: Choropleth Maps")
    print("-"*65)

    # Prepare data
    district_summary = prepare_visualization_data(lsoa_master_df, district_lsoa_map, data)
    if district_summary is None:
        print("Could not prepare data for choropleth maps")
        return

    # Filter for BCP & Dorset districts
    dorset_districts_summary = district_summary[
        district_summary['District'].str.contains(
            '|'.join(f'^{area}' for area in viz_config['dorset_postcode_areas']),
            na=False, case=False
        )
    ]

    if dorset_districts_summary.empty:
        LOG.info("No Dorset districts found for choropleth maps")
        return

    # Load GeoJSON data
    combined_geojson = load_geojson_data(viz_config)
    if combined_geojson is None:
        LOG.info("Could not load GeoJSON data for maps")
        return

    # Create and display maps
    visits_map = create_choropleth_map(
        dorset_districts_summary, combined_geojson, 'Visits',
        'Geographic Distribution of Member Visits', 'viridis'
    )
    visits_map.show()

    visit_rate_map = create_choropleth_map(
        dorset_districts_summary, combined_geojson, 'visits_per_1000',
        'Geographic Distribution of Visit Rate (per 1000 people)', 'blues'
    )
    visit_rate_map.show()

    if 'deprivation_segment' in dorset_districts_summary.columns:
        deprivation_map = create_choropleth_map(
            dorset_districts_summary, combined_geojson, 'deprivation_segment',
            'Geographic Distribution of Deprivation Categories', None
        )
        deprivation_map.show()

    LOG.info("Choropleth maps displayed successfully!")
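
The label placement in `create_choropleth_map` relies on a vertex-average centroid. The sketch below reproduces that idea on a toy square. Note one deliberate difference, which is an assumption rather than the notebook's behaviour: GeoJSON rings repeat their first point at the end, so this version drops the closing vertex before averaging. The `approximate_centroid` name and the toy features are hypothetical.

```python
from typing import Dict, Optional


def approximate_centroid(feature: dict) -> Optional[Dict[str, float]]:
    """Mean of the outer-ring vertices of the first polygon in a GeoJSON feature."""
    geometry = feature["geometry"]
    if geometry["type"] == "Polygon":
        ring = geometry["coordinates"][0]
    elif geometry["type"] == "MultiPolygon":
        ring = geometry["coordinates"][0][0]
    else:
        return None
    # Drop the duplicated closing vertex so it is not counted twice
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    lons = [point[0] for point in ring]  # GeoJSON order is [lon, lat]
    lats = [point[1] for point in ring]
    return {"lat": sum(lats) / len(lats), "lon": sum(lons) / len(lons)}


# Toy 2x2 square, not a real postcode district
square = {
    "type": "Feature",
    "properties": {"name": "TOY1"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
    },
}
point_feature = {
    "type": "Feature",
    "properties": {"name": "TOY2"},
    "geometry": {"type": "Point", "coordinates": [1, 1]},
}

centroid = approximate_centroid(square)
print(centroid)
print(approximate_centroid(point_feature))
```

A vertex average is adequate for placing text labels but is not an area-weighted centroid: densely digitised stretches of coastline pull it toward themselves, and for a `MultiPolygon` only the first polygon is considered.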

# Analysis & Reporting Functions

In [41]:
def display_analysis_statistics(data: pd.DataFrame):
    """Display comprehensive analysis statistics"""
    print("-"*65)
    print("Analysis Statistics")
    print("-"*65)

    # Intervention type distribution
    intervention_dist = data['intervention_type'].value_counts()
    print("Intervention Type Distribution:")
    for intervention, count in intervention_dist.items():
        print(f"  {intervention}: {count} districts")
    print()

    # Deprivation tiers
    deprivation_dist = data['deprivation_tier'].value_counts()
    print("Deprivation Tiers:")
    for tier, count in deprivation_dist.items():
        print(f"  {tier}: {count} districts")
    print()

    # FSM tiers
    fsm_dist = data['fsm_tier'].value_counts()
    print("FSM Tiers:")
    for tier, count in fsm_dist.items():
        print(f"  {tier}: {count} districts")
    print()

    # Visit rate tiers
    visit_dist = data['visit_rate_tier'].value_counts()
    print("Visit Rate Tiers:")
    for tier, count in visit_dist.items():
        print(f"  {tier}: {count} districts")
    print()

    # Visit rate thresholds reminder
    print("Visit Rate Thresholds:")
    print(f"  Low Visit Rate: < {DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} visits/1000")
    print(f"  Medium Visit Rate: {DeprivationConstants.LOW_VISIT_RATE_THRESHOLD} - {DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD} visits/1000")
    print(f"  High Visit Rate: ≥ {DeprivationConstants.HIGH_VISIT_RATE_THRESHOLD} visits/1000")
    print()

    # Additional performance stats
    print("Performance Highlights:")
    top_3 = data.nlargest(3, 'visits_per_1000')[['District', 'visits_per_1000']]
    bottom_3 = data.nsmallest(3, 'visits_per_1000')[['District', 'visits_per_1000']]

    print("Top 3 Districts by Visit Rate:")
    for _, row in top_3.iterrows():
        print(f"  {row['District']}: {row['visits_per_1000']:.2f} visits/1000")

    print("\nBottom 3 Districts by Visit Rate:")
    for _, row in bottom_3.iterrows():
        print(f"  {row['District']}: {row['visits_per_1000']:.2f} visits/1000")

    # Equity gap stats
    high_dep_avg = data[data['deprivation_tier'] == TierLabels.HIGH_DEPRIVATION]['visits_per_1000'].mean()
    low_dep_avg = data[data['deprivation_tier'] == TierLabels.LOW_DEPRIVATION]['visits_per_1000'].mean()
    equity_gap = low_dep_avg - high_dep_avg

    print(f"\nEquity Gap (Low vs High Deprivation): {equity_gap:.2f} visits/1000")

    # Geographic coverage
    unique_authorities = data['Authority_Name'].nunique()
    print(f"\nGeographic Coverage: {unique_authorities} unique local authorities")

def create_executive_summary_dashboard(data: pd.DataFrame):
    """Create executive summary dashboard with key metrics"""
    print("="*150)
    print("\n4. EXECUTIVE SUMMARY DASHBOARD")
    print("="*150)

    # Key metrics
    total_districts = len(data)
    total_population = data['Population'].sum()
    avg_visit_rate = data['visits_per_1000'].mean()
    total_visits = data['Visits'].sum()

    # Intervention distribution
    intervention_counts = data['intervention_type'].value_counts()

    # Equity metrics
    high_deprivation_districts = len(data[data['deprivation_tier'] == TierLabels.HIGH_DEPRIVATION])
    high_fsm_districts = len(data[data['fsm_tier'] == TierLabels.HIGH_FSM])

    print(f"Total Districts Analyzed: {total_districts}")
    print(f"Total Population Coverage: {total_population:,.0f}")
    print(f"Average Visit Rate: {avg_visit_rate:.2f} visits/1000")
    print(f"Total Member Visits: {total_visits:,.0f}")
    print(f"High Deprivation Districts: {high_deprivation_districts}")
    print(f"High FSM Districts: {high_fsm_districts}")
    print("\n")

    # Top priorities
    urgent_actions = data[data['intervention_type'] == InterventionTypes.URGENT_ACTION]
    equity_priorities = data[data['intervention_type'] == InterventionTypes.EQUITY_PRIORITY]

    if not urgent_actions.empty:
        print("4.1. Top Urgent Action Districts:")
        print("-"*65)
        display(urgent_actions[['District', 'Authority_Name', 'avg_fsm%', 'visits_per_1000']].head(5))
        print("\n")

    if not equity_priorities.empty:
        print("4.2. Equity Priority Districts:")
        print("-"*65)
        display(equity_priorities[['District', 'Authority_Name', 'pop%_most_deprived', 'avg_fsm%']])
        print("\n")

def create_comparative_analysis(data: pd.DataFrame):
    """Compare visit rates across deprivation tiers and list the top and bottom districts"""
    print("\n4.3. Comparative Analysis: Similar Districts")
    print("-"*65)

    # Group by deprivation tier and compare performance
    comparison_data = data.groupby('deprivation_tier').agg({
        'visits_per_1000': ['mean', 'std', 'count'],
        'avg_fsm%': 'mean',
        'Population': 'sum'
    }).round(2)

    comparison_data.columns = ['avg_visit_rate', 'std_visit_rate', 'district_count', 'avg_fsm%', 'total_population']
    display(comparison_data)
    print("\n")

    # Compare top vs bottom performers
    top_performers = data.nlargest(3, 'visits_per_1000')[['District', 'Authority_Name', 'visits_per_1000', 'deprivation_tier']]
    bottom_performers = data.nsmallest(3, 'visits_per_1000')[['District', 'Authority_Name', 'visits_per_1000', 'deprivation_tier']]

    print("4.4. Top 3 Performing Districts:")
    print("-"*65)
    display(top_performers)
    print("\n4.5. Bottom 3 Performing Districts:")
    print("-"*65)
    display(bottom_performers)
    print("\n")

def create_quick_wins_highlight(data: pd.DataFrame):
    """Highlight quick win opportunities"""
    print("\n4.6. Quick Win Opportunities")
    print("-"*65)

    quick_wins = data[data['intervention_type'] == InterventionTypes.NEARBY_OPPORTUNITY]
    if not quick_wins.empty:
        display(quick_wins[['District', 'Authority_Name', 'distance_to_BI', 'visits_gap', 'visits_per_1000']].round(2))
    else:
        print("No quick win opportunities identified.")
    print("\n")

def create_geographic_clustering_analysis(data: pd.DataFrame, combined_geojson: dict):
    """Show regional patterns for coordinated interventions"""
    print("\n4.7. Geographic Clustering Analysis")
    print("-"*65)

    # Group by Authority_Name to identify regional patterns
    regional_analysis = data.groupby('Authority_Name').agg({
        'District': 'count',
        'Population': 'sum',
        'visits_per_1000': 'mean',
        'avg_fsm%': 'mean',
        'intervention_type': lambda x: x.mode().iloc[0] if not x.mode().empty else 'Mixed'
    }).round(2)

    regional_analysis = regional_analysis.rename(columns={
        'District': 'num_districts',
        'Population': 'total_population',
        'visits_per_1000': 'avg_visit_rate',
        'intervention_type': 'primary_intervention'
    })

    display(regional_analysis)
    print("\n")
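
The per-authority roll-up in `create_geographic_clustering_analysis` can be tried on a toy frame. The sketch below uses pandas named aggregation instead of the dict-plus-rename pattern; the helper name `most_common` and the sample values are illustrative assumptions, not part of the pipeline.

```python
import pandas as pd

def most_common(series: pd.Series, fallback: str = "Mixed") -> str:
    """Return the most frequent value, or `fallback` when the series has no mode."""
    modes = series.mode()  # modes come back sorted, so ties resolve alphabetically
    return modes.iloc[0] if not modes.empty else fallback

# Hypothetical sample data, shaped like the analysis frame
toy = pd.DataFrame({
    "Authority_Name": ["A", "A", "A", "B"],
    "District": ["A1", "A2", "A3", "B1"],
    "Population": [1000, 2000, 3000, 4000],
    "visits_per_1000": [2.0, 4.0, 6.0, 1.0],
    "intervention_type": ["Urgent", "Standard", "Standard", "Urgent"],
})

# One output column per keyword: (source column, aggregation)
regional = toy.groupby("Authority_Name").agg(
    num_districts=("District", "count"),
    total_population=("Population", "sum"),
    avg_visit_rate=("visits_per_1000", "mean"),
    primary_intervention=("intervention_type", most_common),
)
print(regional)
```

When two intervention types tie within an authority, `mode()` returns both in sorted order and the first one is reported, so the "primary" label is not always a strict majority.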

# Export Functions

In [47]:
def export_analysis_results(data: pd.DataFrame):
    """Provide one-click export capabilities"""
    print("="*150)
    print("5. EXPORT OPTIONS")
    print("="*150)

    # Create comprehensive export data
    export_data = data[[
        'District', 'Authority_Name', 'Region_Name', 'intervention_type',
        'deprivation_tier', 'fsm_tier', 'visit_rate_tier',
        'Population', 'Visits', 'visits_per_1000', 'visits_gap',
        'avg_fsm%', 'pop%_most_deprived', 'pop%_moderately_deprived', 'pop%_least_deprived',
        'distance_to_BI'
    ]].copy()

    # Add calculated fields for export
    if 'performance_gap' in data.columns:
        export_data['performance_gap'] = data['performance_gap']
    if 'predicted_visit_rate' in data.columns:
        export_data['predicted_visit_rate'] = data['predicted_visit_rate']

    # Generate timestamp for filenames
    from datetime import datetime
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Export options
    try:
        # CSV Export
        csv_filename = f"membership_analysis_export_{timestamp}.csv"
        export_data.to_csv(csv_filename, index=False)
        print(f"CSV exported: {csv_filename}")

        # Excel Export with multiple sheets
        excel_filename = f"membership_analysis_export_{timestamp}.xlsx"
        with pd.ExcelWriter(excel_filename, engine='openpyxl') as writer:
            # Main data sheet
            export_data.to_excel(writer, sheet_name='District Analysis', index=False)

            # Summary statistics sheet
            summary_data = data.groupby('intervention_type').agg({
                'District': 'count',
                'Population': 'sum',
                'visits_per_1000': 'mean',
                'visits_gap': 'sum'
            }).round(2)
            summary_data.to_excel(writer, sheet_name='Summary Statistics')

            # Priority districts sheet
            priority_districts = data[data['intervention_type'].isin([
                InterventionTypes.URGENT_ACTION,
                InterventionTypes.EQUITY_PRIORITY
            ])]
            if not priority_districts.empty:
                priority_districts.to_excel(writer, sheet_name='Priority Districts', index=False)

        print(f"Excel exported: {excel_filename}")

        # JSON Export
        json_filename = f"membership_analysis_export_{timestamp}.json"
        export_data.to_json(json_filename, orient='records', indent=2)
        print(f"JSON exported: {json_filename}")

        print("\nAll exports completed successfully!")
        print(f"Dataset contains {len(export_data)} districts exported.")

        # Offer download in Colab
        try:
            from google.colab import files
            print("\nDownload options:")
            print(f"files.download('{csv_filename}')  # For CSV")
            print(f"files.download('{excel_filename}')  # For Excel")
            print(f"files.download('{json_filename}')  # For JSON")
        except ImportError:
            print("\n(Download functionality available in Google Colab)")

    except Exception as e:
        print(f"Export failed: {e}")
        print("Please ensure you have the required packages installed:")
        print("!pip install openpyxl")

    return export_data
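
`export_analysis_results` indexes a fixed column list, which raises a `KeyError` if any column is absent. A minimal sketch of a more tolerant variant follows. It keeps only the columns that exist and writes timestamped CSV and JSON files. The function name `export_frame` and the toy data are assumptions for illustration; the Excel sheet logic is omitted.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

def export_frame(df: pd.DataFrame, wanted_cols: list, out_dir: str,
                 stem: str = "membership_analysis_export") -> dict:
    """Write the available subset of `wanted_cols` to timestamped CSV and JSON files."""
    available = [c for c in wanted_cols if c in df.columns]
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    subset = df[available]
    csv_path = Path(out_dir) / f"{stem}_{timestamp}.csv"
    json_path = Path(out_dir) / f"{stem}_{timestamp}.json"
    subset.to_csv(csv_path, index=False)
    subset.to_json(json_path, orient="records", indent=2)
    return {"csv": csv_path, "json": json_path, "columns": available}

# Hypothetical two-row frame; 'distance_to_BI' is deliberately missing
toy = pd.DataFrame({"District": ["X1", "X2"], "visits_per_1000": [1.5, 2.5]})
with tempfile.TemporaryDirectory() as tmp:
    paths = export_frame(toy, ["District", "visits_per_1000", "distance_to_BI"], tmp)
    round_trip = pd.read_csv(paths["csv"])
    json_rows = pd.read_json(paths["json"], orient="records")

print(paths["columns"])
print(len(round_trip), len(json_rows))
```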

# 3-Way Analysis Function

In [43]:
def analyze_three_way_intersection(ml_dataset: pd.DataFrame, trained_pipeline, X: pd.DataFrame,
                                  lsoa_master_df: pd.DataFrame, district_lsoa_map: pd.DataFrame):
    """
    Comprehensive Analysis: Three-way intersection of Deprivation × FSM Rates × visit rate
    Focused on BCP and Dorset districts
    """
    print("="*150)
    print("3. 3-WAY INTERSECTION ANALYSIS (DEPRIVATION + VISIT RATE + FSM)")
    print("="*150)

    # Load and prepare data
    analysis_df = filter_bcp_dorset_districts(ml_dataset)

    # Apply modeling and calculations
    analysis_df = add_model_predictions(analysis_df, trained_pipeline, X)
    analysis_df = calculate_visit_metrics(analysis_df)

    # Apply intervention framework
    analysis_df = assign_intervention_types(analysis_df)

    # Apply categorizations
    analysis_df = apply_categorizations(analysis_df)

    # Display statistics after analysis header
    print("\n")
    display_analysis_statistics(analysis_df)

    # Create strategic framework tables
    print("-"*120)
    print("3.1 Intervention Action Plan and Priority Definitions")
    print("-"*120)
    print("\n3.1.1 Priority Action Ranking")
    print("-"*100)
    matrix_df = create_priority_matrix_categories()
    display(matrix_df)

    print("\n3.1.2 Intervention Strategy Ranking")
    print("-"*100)
    intervention_df = create_intervention_categories()
    display(intervention_df)

    bcp_dorset_df = analysis_df.copy()
    if not bcp_dorset_df.empty:
        numeric_cols = ['visits_per_1000', 'avg_fsm%', 'pop%_most_deprived',
                        'pop%_moderately_deprived', 'pop%_least_deprived',
                        'distance_to_BI', 'performance_gap']
        # Round display values to 2 significant figures (the "g" format), e.g. 17.18 -> 17.0
        for col in numeric_cols:
            if col in bcp_dorset_df.columns:
                bcp_dorset_df[col] = bcp_dorset_df[col].apply(lambda x: float(f"{x:.2g}") if pd.notna(x) else x)

    print("-"*120)
    print("3.2 Strategic Visualizations")
    print("-"*120)

    if not bcp_dorset_df.empty:
        # 1. Priority matrix plot
        fig2 = create_priority_matrix_plot(bcp_dorset_df)
        fig2.show()

        # 2. Intervention strategy map
        fig3 = create_intervention_treemap(bcp_dorset_df)
        fig3.show()

        # Gap Analysis
        gap_fig = create_gap_analysis_chart(bcp_dorset_df)
        if gap_fig:
            gap_fig.show()

        # Equity Gap Visualization
        equity_fig = create_equity_gap_visualization(bcp_dorset_df)
        equity_fig.show()

        # Capacity Planning
        capacity_fig = create_capacity_planning_visualization(bcp_dorset_df)
        capacity_fig.show()

        # Geographic visualizations
        create_choropleth_visualizations(ml_dataset, lsoa_master_df, district_lsoa_map, CONFIG['visualization'])

        # Executive Summary
        create_executive_summary_dashboard(bcp_dorset_df)

        # Comparative Analysis
        create_comparative_analysis(bcp_dorset_df)

        # Quick Wins Highlight
        create_quick_wins_highlight(bcp_dorset_df)

        # Geographic Clustering
        create_geographic_clustering_analysis(bcp_dorset_df, load_geojson_data(CONFIG['visualization']))

        # District details with intervention types
        print("\n4.8. District Intervention Summary")
        print("-"*65)
        display(bcp_dorset_df[['District', 'Authority_Name', 'intervention_type', 'avg_fsm%',
                              'pop%_most_deprived', 'pop%_moderately_deprived', 'pop%_least_deprived',
                              'visits_per_1000', 'quadrant']].round(2))

        # Segment analysis
        print("\n4.9. Segment Analysis: Deprivation × FSM × Visit Rate")
        print("-"*65)
        segment_analysis = bcp_dorset_df.groupby(['deprivation_tier', 'fsm_tier', 'visit_rate_tier']).agg({
            'visits_per_1000': 'mean',
            'District': 'count',
            'Population': 'sum',
            'Visits': 'sum'
        }).round(2).reset_index()
        display(segment_analysis[['deprivation_tier', 'fsm_tier', 'visit_rate_tier',
                                'visits_per_1000', 'District', 'Population', 'Visits']])

        # Export results (final section of the report)
        export_data = export_analysis_results(bcp_dorset_df)

    else:
        LOG.warning("No BCP or Dorset districts found for strategic analysis")

    # Save results
    LOG.info("Saving BCP & Dorset Analysis Results")
    try:
        output_cols = [
            'District', 'Authority_Name', 'Region_Name', 'intersection_segment',
            'deprivation_tier', 'fsm_tier', 'visit_rate_tier',
            'Population', 'Visits', 'visits_per_1000', 'avg_fsm%',
            'pop%_most_deprived', 'pop%_moderately_deprived', 'pop%_least_deprived',
            'distance_to_BI', 'intervention_type'
        ]
        available_cols = [col for col in output_cols if col in analysis_df.columns]
        intersection_output = analysis_df[available_cols].copy()
        if 'performance_gap' in analysis_df.columns:
            intersection_output['performance_gap'] = analysis_df['performance_gap']
        if 'impact_score' in analysis_df.columns:
            intersection_output['impact_score'] = analysis_df['impact_score']
        numeric_cols = ['visits_per_1000', 'avg_fsm%', 'pop%_most_deprived',
                        'pop%_moderately_deprived', 'pop%_least_deprived',
                        'distance_to_BI', 'performance_gap', 'impact_score']
        for col in numeric_cols:
            if col in intersection_output.columns:
                intersection_output[col] = intersection_output[col].round(2)
        intersection_output.to_csv(CONFIG['output_files']['three_way_intersection'], index=False)
        LOG.info(f"BCP & Dorset three-way intersection analysis saved to '{CONFIG['output_files']['three_way_intersection']}'")
    except Exception as e:
        LOG.error(f"Error saving output file: {e}")
    return analysis_df
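
The display copy `bcp_dorset_df` is rounded with `float(f"{x:.2g}")`, which keeps two significant figures rather than two decimal places. This is why the tables in the output show values such as `21.0`, `2.9` and `0.95`. The sketch below isolates that behaviour; the helper name `to_sig_figs` is an assumption for illustration.

```python
import math

def to_sig_figs(value: float, digits: int = 2) -> float:
    """Round to `digits` significant figures using the 'g' format specifier."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return value  # leave missing values untouched
    return float(f"{value:.{digits}g}")

# Compare significant-figure rounding with ordinary two-decimal rounding
for raw in (4.08, 17.18, 0.946, 128258):
    print(raw, "->", to_sig_figs(raw), "| round(2):", round(raw, 2))
```

With two significant figures, `17.18` becomes `17.0` and `128258` becomes `130000.0`, so statistics computed from the rounded frame are approximate. If two decimal places were intended, `round(2)` would be the closer fit.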

# Core Pipeline Functions

In [44]:
def display_deprivation_definitions():
    """Display deprivation definitions"""
    imd_decile_definitions = [
        {
            "IMD Decile Range": "1-3",
            "Deprivation Level": "Most Deprived",
            "National Percentile": "Most deprived 30%",
            "Description": "Areas with highest levels of multiple deprivation"
        },
        {
            "IMD Decile Range": "4-7",
            "Deprivation Level": "Moderately Deprived",
            "National Percentile": "Middle 40%",
            "Description": "Areas with average to moderate deprivation levels"
        },
        {
            "IMD Decile Range": "8-10",
            "Deprivation Level": "Least Deprived",
            "National Percentile": "Least deprived 30%",
            "Description": "Affluent areas with lowest deprivation levels"
        }
    ]
    district_tier_definitions = [
        {
            "District Tier": "High Deprivation",
            "Definition": "≥40% of population live in 'Most Deprived' LSOAs",
            "Calculation": "most deprived ≥ 40% of population",
            "Purpose": "Identifies districts with concentrated deprivation"
        },
        {
            "District Tier": "Low Deprivation",
            "Definition": "≥40% of population live in 'Least Deprived' LSOAs",
            "Calculation": "least deprived ≥ 40% of population",
            "Purpose": "Identifies predominantly affluent districts"
        },
        {
            "District Tier": "Mixed Deprivation",
            "Definition": "Neither High nor Low Deprivation thresholds met",
            "Calculation": "most deprived < 40% AND least deprived < 40%",
            "Purpose": "Districts with balanced or diverse deprivation profiles"
        }
    ]
    print("\n1. LSOA-level Classification (Based on IMD Deciles)")
    print("Each Lower Layer Super Output Area (LSOA) is categorized by its national IMD rank\n")
    imd_df = pd.DataFrame(imd_decile_definitions)
    display(imd_df.style.set_properties(**{'text-align': 'left'})
            .set_table_styles([{'selector': 'th', 'props': [('text-align', 'left')]}]))
    print("\n2. District-level Tiers")
    print("Districts are categorized by the deprivation profile of their population\n")
    tier_df = pd.DataFrame(district_tier_definitions)
    display(tier_df.style.set_properties(**{'text-align': 'left'})
            .set_table_styles([{'selector': 'th', 'props': [('text-align', 'left')]}]))
    print("\nRelationship between LSOA-level Classification and District-level Tiers:")
    print("• LSOA-level: Each small area gets a deprivation category based on IMD decile")
    print("• District-level: We calculate the % of district population living in each LSOA category")
    print("• District tiers are based on these population-weighted percentages\n")
    print("Example:")
    print("• A 'High Deprivation' district might have: 45% in Most Deprived, 30% in Moderate, 25% in Least")
    print("• A 'Mixed Deprivation' district might have: 25% in Most Deprived, 50% in Moderate, 25% in Least")
    print("• A 'Low Deprivation' district might have: 10% in Most Deprived, 30% in Moderate, 60% in Least")
    import sys
    sys.stdout.flush()

def select_best_model(results_df: pd.DataFrame, trained_pipelines: dict) -> tuple:
    """Select the best performing model"""
    best_model_name = results_df.index[0]
    if (pd.isna(results_df.loc[best_model_name, 'Mean R2']) or
        best_model_name not in trained_pipelines or
        trained_pipelines[best_model_name] is None):
        LOG.error(f"Best model '{best_model_name}' failed during training or has NaN R2.")
        for model_name in results_df.index[1:]:
            if (not pd.isna(results_df.loc[model_name, 'Mean R2']) and
                model_name in trained_pipelines and
                trained_pipelines[model_name] is not None):
                best_model_name = model_name
                LOG.warning(f"Falling back to next best model: {best_model_name}")
                break
        else:
            LOG.error("No valid models were successfully trained.")
            raise ValueError("All models failed training.")
    return best_model_name, trained_pipelines[best_model_name]

def prepare_modeling_data(ml_dataset: pd.DataFrame) -> tuple:
    """Prepare data for modeling"""
    for col in CONFIG['selected_features']:
        if col not in ml_dataset.columns:
            LOG.warning(f"Feature '{col}' not found in engineered data. Filling with 0.")
            ml_dataset[col] = 0
    X = ml_dataset[CONFIG['selected_features']]
    y = ml_dataset['visits_per_1000']
    LOG.info(f"Using features: {list(X.columns)}")
    # No hold-out split is made here: train_and_evaluate cross-validates on the
    # full feature matrix and target, so both are returned unsplit.
    return X, y

def execute_data_pipeline() -> tuple:
    """Execute data loading and processing pipeline"""
    raw_data = load_data(CONFIG['file_paths'])
    lsoa_master_df, district_lsoa_map, membership_df, ons_df = clean_and_merge(raw_data)
    ml_dataset = engineer_features(
        lsoa_master_df, district_lsoa_map, membership_df,
        CONFIG['BI_coordinates'], CONFIG['selected_features']
    )
    ml_dataset['Population'] = pd.to_numeric(ml_dataset['Population'], errors='coerce').replace(0, np.nan)
    ml_dataset['visits_per_1000'] = (ml_dataset['Visits'] / ml_dataset['Population']) * 1000
    ml_dataset['visits_per_1000'] = ml_dataset['visits_per_1000'].fillna(0)
    ml_dataset.to_csv(CONFIG['output_files']['ml_ready_data'], index=False)
    LOG.info(f"Consolidated data saved to '{CONFIG['output_files']['ml_ready_data']}'")
    return ml_dataset, lsoa_master_df, district_lsoa_map

def execute_modeling_pipeline(ml_dataset: pd.DataFrame) -> tuple:
    """Execute modeling pipeline"""
    LOG.info("Preparing data for modeling...")
    X, y = prepare_modeling_data(ml_dataset)
    cv_results_list, trained_pipelines = train_and_evaluate(X, y, CONFIG['model_params'])
    results_df = pd.DataFrame(cv_results_list).sort_values('Mean R2', ascending=False)
    results_df = results_df.set_index('Model')
    print("="*150)
    print("1. MODEL PERFORMANCE DETAILS")
    print("="*150)
    print("\nModel Performance Comparison")
    print("-"*65)
    display(results_df)
    print("\n\n")
    best_model_name, final_model_pipeline = select_best_model(results_df, trained_pipelines)
    LOG.info(f"Best performing model: {best_model_name} with R2: {results_df.loc[best_model_name, 'Mean R2']:.4f}")
    analyze_feature_importance(final_model_pipeline, X.columns, best_model_name)
    return X, final_model_pipeline
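
The rate calculation in `execute_data_pipeline` coerces `Population` to numeric, treats zero as missing, and fills the resulting gaps with 0. A minimal sketch on toy data follows; the function name `visits_per_thousand` and the sample values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def visits_per_thousand(visits: pd.Series, population: pd.Series) -> pd.Series:
    """Visits per 1,000 residents; zero or non-numeric populations yield 0."""
    pop = pd.to_numeric(population, errors="coerce").replace(0, np.nan)
    return (visits / pop * 1000).fillna(0)

# Hypothetical rows: a valid population, a zero, and an unparseable value
toy = pd.DataFrame({"Visits": [10, 5, 3], "Population": ["2000", 0, "n/a"]})
toy["visits_per_1000"] = visits_per_thousand(toy["Visits"], toy["Population"])
print(toy)
```

One consequence of this design is that a district with an unknown population is indistinguishable from one with no visits, since both end up at 0 visits/1000.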

# Main Function

In [45]:
def main():
    """Main function to run the Machine Learning pipeline"""
    LOG.info("Starting Pipeline for Brownsea Island Membership Visiting Data Analysis")
    ml_dataset = None
    final_model_pipeline = None
    X = None
    lsoa_master_df = None
    district_lsoa_map = None
    try:
        ml_dataset, lsoa_master_df, district_lsoa_map = execute_data_pipeline()
        X, final_model_pipeline = execute_modeling_pipeline(ml_dataset)
        print()
        print("="*150)
        print("2. DEPRIVATION DEFINITIONS")
        print("="*150)
        display_deprivation_definitions()
        print("\n")
        import time
        time.sleep(1)
    except Exception as e:
        LOG.error(f"An error occurred in pipeline: {e}", exc_info=True)
        LOG.error("Pipeline Failed")
        return
    try:
        if ml_dataset is not None and final_model_pipeline is not None and X is not None:
            # Pass all required parameters to the analysis function
            ml_dataset = analyze_three_way_intersection(ml_dataset, final_model_pipeline, X,
                                                       lsoa_master_df, district_lsoa_map)
            LOG.info("Pipeline Complete")
        else:
            LOG.error("Cannot proceed with three-way intersection analysis - required data is missing")
            LOG.error("Pipeline Failed - missing data for analysis")
    except Exception as e:
        LOG.error(f"An error occurred in analysis: {e}", exc_info=True)
        LOG.error("Pipeline Failed during analysis")

# Run Pipeline

In [48]:
if __name__ == '__main__':
    main()

[2025-11-06 16:00:47] INFO     - Starting Pipeline for Brownsea Island Membership Visiting Data Analysis
[2025-11-06 16:00:47] INFO     - Loading all raw data files...
[2025-11-06 16:00:47] INFO     - Loaded IMD decile and population data.
[2025-11-06 16:01:18] INFO     - Calculating weighted FSM averages using school size
[2025-11-06 16:01:20] INFO     - Calculated weighted average FSM for 17124 LSOAs
[2025-11-06 16:01:20] INFO     - All source data loaded successfully.
[2025-11-06 16:01:20] INFO     - Building LSOA-level master data...
[2025-11-06 16:01:21] INFO     - Created District-to-LSOA map with 66212 links.
[2025-11-06 16:01:21] INFO     - Using IMD columns: {'LSOA code (2021)': 'lsoa21cd', 'Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs)': 'imd_decile', 'Income Score (rate)': 'income_score', 'Employment Score (rate)': 'employment_score', 'Total population: mid 2022': 'Population'}
[2025-11-06 16:01:21] INFO     - Cleaned membership data. Fou

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000474 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 3103, number of used features: 8
[LightGBM] [Info] Start training from score 0.120380


[2025-11-06 16:01:51] INFO     - XGBoost - Mean R2: 0.7357
[2025-11-06 16:01:51] INFO     - Running cross-validation for CatBoost...
[2025-11-06 16:02:31] INFO     - CatBoost - Mean R2: 0.7412
[2025-11-06 16:02:31] INFO     - Running cross-validation for Gradient Boosting...
[2025-11-06 16:02:41] INFO     - Gradient Boosting - Mean R2: 0.7685
[2025-11-06 16:02:41] INFO     - Running cross-validation for Ridge Regression...
[2025-11-06 16:02:42] INFO     - Ridge Regression - Mean R2: 0.1950


1. MODEL PERFORMANCE DETAILS

Model Performance Comparison
-----------------------------------------------------------------


Model,Mean R2,Std R2,Mean RMSE
Gradient Boosting,0.768484,0.026104,0.119365
LightGBM,0.759471,0.028837,0.121598
Random Forest,0.756781,0.015176,0.122398
CatBoost,0.741242,0.031352,0.126163
XGBoost,0.735652,0.030996,0.127372
Ridge Regression,0.195015,0.017416,0.222684
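
`select_best_model` walks this ranking from the top and takes the first model that has both a valid score and a trained pipeline. The sketch below shows that fallback in isolation. The helper name `pick_best` is an assumption, and the failed Gradient Boosting pipeline is a hypothetical scenario chosen to exercise the fallback; it did not occur in this run.

```python
import math

def pick_best(ranked: list, pipelines: dict) -> str:
    """Return the first model (best-first order) with a finite R2 and a fitted pipeline."""
    for name, mean_r2 in ranked:
        if not math.isnan(mean_r2) and pipelines.get(name) is not None:
            return name
    raise ValueError("All models failed training.")

# Scores taken from the table above, already sorted best-first
ranked = [("Gradient Boosting", 0.768484), ("LightGBM", 0.759471), ("Random Forest", 0.756781)]

# Hypothetical: pretend the top model's pipeline failed to fit
pipelines = {"Gradient Boosting": None, "LightGBM": object(), "Random Forest": object()}
print(pick_best(ranked, pipelines))
```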


[2025-11-06 16:02:42] INFO     - Best performing model: Gradient Boosting with R2: 0.7685
[2025-11-06 16:02:42] INFO     - Analyzing feature importance...





-----------------------------------------------------------------
Feature Importance for Gradient Boosting
-----------------------------------------------------------------



2. DEPRIVATION DEFINITIONS

1. LSOA-level Classification (Based on IMD Deciles)
Each Lower Layer Super Output Area (LSOA) is categorized by its national IMD rank



,IMD Decile Range,Deprivation Level,National Percentile,Description
0,1-3,Most Deprived,Most deprived 30%,Areas with highest levels of multiple deprivation
1,4-7,Moderately Deprived,Middle 40%,Areas with average to moderate deprivation levels
2,8-10,Least Deprived,Least deprived 30%,Affluent areas with lowest deprivation levels



2. District-level Tiers
Districts are categorized by the deprivation profile of their population



,District Tier,Definition,Calculation,Purpose
0,High Deprivation,≥40% of population live in 'Most Deprived' LSOAs,most deprived ≥ 40% of population,Identifies districts with concentrated deprivation
1,Low Deprivation,≥40% of population live in 'Least Deprived' LSOAs,least deprived ≥ 40% of population,Identifies predominantly affluent districts
2,Mixed Deprivation,Neither High nor Low Deprivation thresholds met,most deprived < 40% AND least deprived < 40%,Districts with balanced or diverse deprivation profiles



Relationship between LSOA-level Classification and District-level Tiers:
• LSOA-level: Each small area gets a deprivation category based on IMD decile
• District-level: We calculate the % of district population living in each LSOA category
• District tiers are based on these population-weighted percentages

Example:
• A 'High Deprivation' district might have: 45% in Most Deprived, 30% in Moderate, 25% in Least
• A 'Mixed Deprivation' district might have: 25% in Most Deprived, 50% in Moderate, 25% in Least
• A 'Low Deprivation' district might have: 10% in Most Deprived, 30% in Moderate, 60% in Least
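
The district-level rule above can be written as a small classifier. This is a sketch under stated assumptions: the function name `district_tier` is hypothetical, and checking the "most deprived" share first is an assumed tie-break for a district where both shares reach 40%, a case the definitions do not address.

```python
def district_tier(pct_most: float, pct_least: float, threshold: float = 40.0) -> str:
    """Classify a district from the share of its population in most/least deprived LSOAs."""
    if pct_most >= threshold:      # assumption: High takes precedence over Low
        return "High Deprivation"
    if pct_least >= threshold:
        return "Low Deprivation"
    return "Mixed Deprivation"

# The three example profiles listed above: (most %, moderate %, least %)
examples = [(45, 30, 25), (25, 50, 25), (10, 30, 60)]
for most, moderate, least in examples:
    print((most, moderate, least), "->", district_tier(most, least))
```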


3. 3-WAY INTERSECTION ANALYSIS (DEPRIVATION + VISIT RATE + FSM)


-----------------------------------------------------------------
Analysis Statistics
-----------------------------------------------------------------
Intervention Type Distribution:
  Urgent Action (High Need): 18 districts
  Standard Monitoring: 18 districts
  Nearby Opportunity (Quick Win): 5 districts
  Equity Priority 

Quadrant,Need Criteria,Visit Rate Range,Description,Strategic Focus
Urgent Action,FSM ≥15% OR Deprivation ≥40%,≤5 visits/1000,High-need areas with very low engagement,Immediate equity interventions
High Priority,FSM ≥15% OR Deprivation ≥40%,5-10 visits/1000,High-need areas with moderate engagement,Targeted outreach and retention
Maintain,FSM <15% AND Deprivation <40%,≥5 visits/1000,Lower-need areas with good engagement,Sustain current performance
Growth Opportunity,FSM <15% AND Deprivation <40%,≤5 visits/1000,Lower-need areas with growth potential,Expansion and awareness campaigns
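
The quadrant criteria can be expressed as a two-step decision: first need, then engagement. The sketch below is one reading of the table, not the pipeline's own implementation. The function name `quadrant` is hypothetical, and two boundary choices are assumptions because the table leaves them open: a rate of exactly 5 counts as low, and high-need districts above 10 visits/1000 are labelled High Priority.

```python
def quadrant(fsm_pct: float, pct_most_deprived: float, visits_per_1000: float) -> str:
    """Assign a priority-matrix quadrant from need and engagement."""
    high_need = fsm_pct >= 15 or pct_most_deprived >= 40
    low_visits = visits_per_1000 <= 5   # assumption: exactly 5 counts as low
    if high_need:
        return "Urgent Action" if low_visits else "High Priority"
    return "Growth Opportunity" if low_visits else "Maintain"

# Rows taken from the District Intervention Summary (section 4.8)
print(quadrant(26.0, 53.0, 0.95))   # BH1
print(quadrant(25.0, 15.0, 6.8))    # BH12
print(quadrant(13.0, 0.0, 9.7))     # BH14
```

Note that the values displayed in section 4.8 are rounded, so a district shown at exactly 15.0% FSM may sit on either side of the threshold in the unrounded data.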



3.1.2 Intervention Strategy Ranking
----------------------------------------------------------------------------------------------------


Intervention Type,Priority Level,Key Conditions,Strategic Goal
Equity Priority (High Deprivation),Highest,≥40% most deprived + ≥15% FSM,Address deepest social inequality
Urgent Action (High Need),Very High,≥15% FSM + ≤5 visits/1000 + ≥5 gap,Immediate support for underserved high-need areas
Nearby Opportunity (Quick Win),High,≤30km distance + ≥5 visits gap,Rapid growth in accessible underperforming areas
Scale Potential (Large Population),Medium,"≥50,000 population + ≥3 visits gap",Maximize impact through scale
Model District (Success Story),Learning,≤20km distance + ≥10 visits/1000 + <15% FSM,Learn and replicate successful patterns
Standard Monitoring,Routine,Balanced performance: Meets baseline expectations,Maintain performance and monitor changes


------------------------------------------------------------------------------------------------------------------------
3.2 Strategic Visualizations
------------------------------------------------------------------------------------------------------------------------



Geographic Analysis: Choropleth Maps
-----------------------------------------------------------------
Preparing visualization data...


[2025-11-06 16:02:44] INFO     - GeoJSON repository already exists locally.


[2025-11-06 16:02:45] INFO     - Choropleth maps displayed successfully!


4. EXECUTIVE SUMMARY DASHBOARD
Total Districts Analyzed: 48
Total Population Coverage: 1,540,437
Average Visit Rate: 4.08 visits/1000
Total Member Visits: 6,274
High Deprivation Districts: 4
High FSM Districts: 28
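
The reported average visit rate is the unweighted mean of the per-district rates (`data['visits_per_1000'].mean()`), so a small district counts as much as a large one. The pooled rate, total visits over total population, weights each district by its size. A quick check using the two totals printed above:

```python
# Totals copied from the executive summary output
total_visits = 6274
total_population = 1_540_437

# Population-weighted (pooled) rate per 1,000 residents
pooled_rate = total_visits / total_population * 1000
print(f"Pooled rate: {pooled_rate:.2f} visits/1000")
```

Here the pooled figure (about 4.07) is close to the unweighted mean of 4.08. The two can diverge noticeably when district populations and visit rates are correlated, so it may be worth reporting both.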


4.1. Top Urgent Action Districts:
-----------------------------------------------------------------


,District,Authority_Name,avg_fsm%,visits_per_1000
191,BH10,"Bournemouth, Christchurch and Poole",21.0,2.9
205,BH23,"Bournemouth, Christchurch and Poole",19.0,4.4
206,BH24,New Forest,16.0,4.1
207,BH25,New Forest,19.0,2.8
214,BH8,"Bournemouth, Christchurch and Poole",17.0,1.6




4.2. Equity Priority Districts:
-----------------------------------------------------------------


,District,Authority_Name,pop%_most_deprived,avg_fsm%
190,BH1,"Bournemouth, Christchurch and Poole",53.0,26.0
192,BH11,"Bournemouth, Christchurch and Poole",49.0,31.0
213,BH7,"Bournemouth, Christchurch and Poole",42.0,20.0





4.3. Comparative Analysis: Similar Districts
-----------------------------------------------------------------


deprivation_tier,avg_visit_rate,std_visit_rate,district_count,avg_fsm%,total_population
High Deprivation,2.51,1.29,4,22.5,128258.0
Low Deprivation,5.51,2.86,14,16.07,453626.0
Mixed Deprivation,3.62,2.56,30,17.29,958553.0
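
The equity gap printed by `display_analysis_statistics` is the Low Deprivation mean minus the High Deprivation mean. As a worked example using the tier means from the table above (the figures are approximate, since the table is built from display-rounded values):

```python
# Mean visits per 1,000 by deprivation tier, copied from the table above
tier_means = {
    "High Deprivation": 2.51,
    "Low Deprivation": 5.51,
    "Mixed Deprivation": 3.62,
}

# Absolute gap and relative ratio between the least and most deprived tiers
equity_gap = tier_means["Low Deprivation"] - tier_means["High Deprivation"]
ratio = tier_means["Low Deprivation"] / tier_means["High Deprivation"]
print(f"Equity gap: {equity_gap:.2f} visits/1000, ratio: {ratio:.2f}x")
```

On these figures, Low Deprivation districts average roughly 3 more visits per 1,000 residents than High Deprivation districts, a little over twice the rate.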




4.4. Top 3 Performing Districts:
-----------------------------------------------------------------


,District,Authority_Name,visits_per_1000,deprivation_tier
195,BH14,"Bournemouth, Christchurch and Poole",9.7,Low Deprivation
196,BH15,"Bournemouth, Christchurch and Poole",9.6,Mixed Deprivation
197,BH16,Dorset,9.0,Mixed Deprivation



4.5. Bottom 3 Performing Districts:
-----------------------------------------------------------------


,District,Authority_Name,visits_per_1000,deprivation_tier
2602,SP9,Wiltshire,0.63,Mixed Deprivation
2594,SP11,Test Valley,0.65,Mixed Deprivation
2593,SP10,Test Valley,0.81,Low Deprivation





4.6. Quick Win Opportunities
-----------------------------------------------------------------


,District,Authority_Name,distance_to_BI,visits_gap,visits_per_1000
201,BH2,"Bournemouth, Christchurch and Poole",6.6,8.77,1.2
208,BH3,"Bournemouth, Christchurch and Poole",7.4,7.5,2.5
211,BH5,"Bournemouth, Christchurch and Poole",8.4,7.47,2.5
212,BH6,"Bournemouth, Christchurch and Poole",11.0,5.86,4.1
2599,SP6,New Forest,29.0,7.27,2.7


[2025-11-06 16:02:45] INFO     - GeoJSON repository already exists locally.





4.7. Geographic Clustering Analysis
-----------------------------------------------------------------


Authority_Name,num_districts,total_population,avg_visit_rate,avg_fsm%,primary_intervention
"Bournemouth, Christchurch and Poole",18,609356.0,4.77,17.18,Standard Monitoring
Dorset,19,538086.0,4.74,18.37,Standard Monitoring
New Forest,3,74481.0,3.2,16.67,Urgent Action (High Need)
Test Valley,2,112725.0,0.73,23.5,Urgent Action (High Need)
Wiltshire,6,205789.0,1.47,13.1,Standard Monitoring





4.8. District Intervention Summary
-----------------------------------------------------------------


,District,Authority_Name,intervention_type,avg_fsm%,pop%_most_deprived,pop%_moderately_deprived,pop%_least_deprived,visits_per_1000,quadrant
190,BH1,"Bournemouth, Christchurch and Poole",Equity Priority (High Deprivation),26.0,53.0,35.0,12.0,0.95,Urgent Action
191,BH10,"Bournemouth, Christchurch and Poole",Urgent Action (High Need),21.0,32.0,45.0,23.0,2.9,Urgent Action
192,BH11,"Bournemouth, Christchurch and Poole",Equity Priority (High Deprivation),31.0,49.0,38.0,13.0,4.1,Urgent Action
193,BH12,"Bournemouth, Christchurch and Poole",Standard Monitoring,25.0,15.0,58.0,27.0,6.8,High Priority
194,BH13,"Bournemouth, Christchurch and Poole",Standard Monitoring,15.0,9.1,34.0,57.0,7.6,High Priority
195,BH14,"Bournemouth, Christchurch and Poole",Model District (Success Story),13.0,0.0,51.0,49.0,9.7,Maintain
196,BH15,"Bournemouth, Christchurch and Poole",Standard Monitoring,25.0,15.0,62.0,23.0,9.6,High Priority
197,BH16,Dorset,Standard Monitoring,18.0,21.0,44.0,35.0,9.0,High Priority
198,BH17,"Bournemouth, Christchurch and Poole",Standard Monitoring,17.0,7.4,43.0,50.0,7.5,High Priority
199,BH18,"Bournemouth, Christchurch and Poole",Model District (Success Story),15.0,0.0,27.0,73.0,9.0,High Priority
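
The three `pop%_*` columns describe shares of each district's population, so they should sum to roughly 100 per row. The sketch below is a quick sanity check on the ten rows shown above; the 0.5-point tolerance is an assumption chosen to absorb display rounding, not a threshold from the pipeline.

```python
import pandas as pd

# Deprivation shares copied from the District Intervention Summary above.
shares = pd.DataFrame(
    {
        "District": ["BH1", "BH10", "BH11", "BH12", "BH13",
                     "BH14", "BH15", "BH16", "BH17", "BH18"],
        "pop%_most_deprived": [53.0, 32.0, 49.0, 15.0, 9.1, 0.0, 15.0, 21.0, 7.4, 0.0],
        "pop%_moderately_deprived": [35.0, 45.0, 38.0, 58.0, 34.0, 51.0, 62.0, 44.0, 43.0, 27.0],
        "pop%_least_deprived": [12.0, 23.0, 13.0, 27.0, 57.0, 49.0, 23.0, 35.0, 50.0, 73.0],
    }
)

share_cols = ["pop%_most_deprived", "pop%_moderately_deprived", "pop%_least_deprived"]
shares["total"] = shares[share_cols].sum(axis=1)

# Flag rows whose shares deviate from 100 by more than the assumed tolerance.
TOLERANCE = 0.5
off = shares[(shares["total"] - 100).abs() > TOLERANCE]
print(off if not off.empty else "All rows sum to ~100%")
```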



4.9. Segment Analysis: Deprivation × FSM × Visit Rate
-----------------------------------------------------------------


Unnamed: 0,deprivation_tier,fsm_tier,visit_rate_tier,visits_per_1000,District,Population,Visits
0,High Deprivation,High FSM,Low Visit Rate,2.52,3,107733.0,240.0
1,High Deprivation,Medium FSM,Low Visit Rate,2.5,1,20525.0,52.0
2,Low Deprivation,High FSM,Low Visit Rate,3.34,5,180479.0,567.0
3,Low Deprivation,High FSM,Medium Visit Rate,7.3,2,65972.0,483.0
4,Low Deprivation,Medium FSM,Low Visit Rate,2.35,2,40454.0,94.0
5,Low Deprivation,Medium FSM,Medium Visit Rate,8.22,5,166721.0,1444.0
6,Mixed Deprivation,High FSM,Low Visit Rate,2.36,13,437950.0,967.0
7,Mixed Deprivation,High FSM,Medium Visit Rate,8.18,5,156523.0,1295.0
8,Mixed Deprivation,Low FSM,Low Visit Rate,1.2,1,20252.0,25.0
9,Mixed Deprivation,Low FSM,Medium Visit Rate,5.5,1,15509.0,85.0
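
In this table `District` is a count of districts per segment, and `visits_per_1000` does not always equal `Visits / Population × 1000`. For single-district segments the two agree (25 / 20252 × 1000 ≈ 1.23 against the reported 1.2), but for the first segment the pooled rate is 240 / 107733 × 1000 ≈ 2.23 against the reported 2.52. That pattern is consistent with the reported figure being an average of district-level rates rather than a pooled rate; this is an inference from the numbers, not something the output states. The sketch below computes the pooled rate alongside the reported one, with `pooled_rate` and `difference` as illustrative column names.

```python
import pandas as pd

# Values copied from the segment table above (row order preserved).
segments = pd.DataFrame(
    {
        "visits_per_1000": [2.52, 2.5, 3.34, 7.3, 2.35, 8.22, 2.36, 8.18, 1.2, 5.5],
        "District": [3, 1, 5, 2, 2, 5, 13, 5, 1, 1],  # number of districts
        "Population": [107733, 20525, 180479, 65972, 40454,
                       166721, 437950, 156523, 20252, 15509],
        "Visits": [240, 52, 567, 483, 94, 1444, 967, 1295, 25, 85],
    }
)

# Pooled rate: total visits over total population, per 1,000 residents.
segments["pooled_rate"] = (segments["Visits"] / segments["Population"] * 1000).round(2)

# Positive values mean the reported (apparently averaged) rate exceeds the pooled one.
segments["difference"] = (segments["visits_per_1000"] - segments["pooled_rate"]).round(2)
print(segments)
```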


EXPORT OPTIONS
CSV exported: membership_analysis_export_20251106_160245.csv


[2025-11-06 16:02:45] INFO     - Saving BCP & Dorset Analysis Results
[2025-11-06 16:02:45] INFO     - BCP & Dorset three-way intersection analysis saved to 'three_way_intersection_analysis_v2.csv'
[2025-11-06 16:02:45] INFO     - Pipeline Complete


Excel exported: membership_analysis_export_20251106_160245.xlsx
JSON exported: membership_analysis_export_20251106_160245.json

All exports completed successfully!
Dataset contains 48 districts exported.

Download options:
files.download('membership_analysis_export_20251106_160245.csv')  # For CSV
files.download('membership_analysis_export_20251106_160245.xlsx')  # For Excel
files.download('membership_analysis_export_20251106_160245.json')  # For JSON
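
The export step can be sketched as a small helper that writes one DataFrame to timestamped CSV and JSON files. This is a minimal illustration, not the pipeline's own exporter: `export_frame` and its parameters are hypothetical, and the Excel export is left out because it needs an extra engine such as `openpyxl`. Writing the CSV with `index=False` also avoids the `Unnamed: 0` column that appears when a saved index is read back. The `files.download(...)` calls printed above only work in Google Colab, after `from google.colab import files`.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd


def export_frame(df: pd.DataFrame, out_dir, stem: str = "membership_analysis_export") -> dict:
    """Write df to timestamped CSV and JSON files; return the paths (hypothetical helper)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(out_dir)
    paths = {
        "csv": out_dir / f"{stem}_{stamp}.csv",
        "json": out_dir / f"{stem}_{stamp}.json",
    }
    # index=False keeps the row index out of the file, so no "Unnamed: 0" on reload.
    df.to_csv(paths["csv"], index=False)
    df.to_json(paths["json"], orient="records")
    return paths


# Two rows taken from the tables in this output, used only as a demo frame.
demo = pd.DataFrame({"District": ["BH14", "SP9"], "visits_per_1000": [9.7, 0.63]})

with tempfile.TemporaryDirectory() as tmp:
    paths = export_frame(demo, tmp)
    reloaded = pd.read_csv(paths["csv"])
    json_rows = pd.read_json(paths["json"], orient="records")

print(list(reloaded.columns))
```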
