# Pre-processing park datasets

This notebook combines a range of open data to create a basis for further parks-based geospatial methods.

**Note: you need to have downloaded the data listed in the [Acquiring open parks data notebook](00_Acquiring_open_parks_data.ipynb).**

## Acknowledgements

The algorithms and code here were originally conceptualised and developed by Fran Pontin, and extended and refactored by Maeve Murphy Quinlan. Descriptions and test written and edited by both Fran Pontin and Maeve Murphy Quinlan.

## Datasets used

This notebook takes in the following datasets:

1. Local Authority District (LAD) boundaries, specifically for Bradford
2. Ordnance Survey 'OS Open Greenspace data'
3. Openstreetmap Greenspace data
4. Webscraped Bradford district parks

## Local Authoirty Approach

This analysis is designed to be run on a local authority by local authority basis as data availability tends to vary by local authority. We use [May 2024 Local Authority District (LAD) data](https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=https://geoportal.statistics.gov.uk/datasets/ons::local-authority-districts-may-2024-boundaries-uk-bfe-2/about&ved=2ahUKEwi83bT-pJGOAxWBUUEAHSA-NZsQFnoECAoQAQ&usg=AOvVaw3nYML7UR9GdX1gnUYTH8uz), however this can be updated as and when boundaries change. It is also possible to edit the code to change the geographies used by choosing a different shapefile boundary - e.g. focus on an entire unitary authority or metropolitan area, or focus on a specific town or village.

## OS Open Greenspace

In Great Britain the Ordanace Survey provide 'OS Open Greenspace'-  'Covering a range of greenspaces in urban and rural areas including playing fields, sports’ facilities, play areas and allotments.' The data is provided as polygons of the greenspace areas themseleves and entry points to the greenspace site as point data. 

The data can be downloaded here: https://osdatahub.os.uk/downloads/open/OpenGreenspace


## Bradford Parks

In Bradford https://bradforddistrictparks.org/ provides data on formally recognised parks and greenspaces in Bradford. We web-scraped the parks website to get further information on parks faciltiies and define 'formal' aprks and greenspaces within Bradford.


In [41]:
import pandas as pd
import geopandas as gpd
import fiona
import osmnx as ox
from shapely.ops import unary_union
import re
import folium
from difflib import SequenceMatcher
from pathlib import Path


In [42]:
# Import safer_parks functions
from safer_parks import (
    subset_to_LAD,
    merge_touching_or_intersecting_polygons_condense,
    clean_and_deduplicate,
    match_parks_to_greenspace
)

In [43]:
# Define data paths and config set up

LOCAL_AUTHORITY_NAME = 'Bradford'  # Change this for different areas
DATA_DIR = '../data/data_downloads' # This path is not tracked by version control

OUTPUT_DIR = '../data/results/testing' # for saving out resulting processed datasets

# File paths
OPGS_PATH = f"{DATA_DIR}/opgrsp_gpkg_gb/Data/opgrsp_gb.gpkg"
LAD_PATH = f"{DATA_DIR}/Local_Authority_Districts_May_2024_Boundaries_UK_BFE_-5285665645137523111.geojson"
PARKS_DATA_PATH = f"{DATA_DIR}/bradford_district_parks.geojson"

print(f"Configured for: {LOCAL_AUTHORITY_NAME}")
print(f"Data directory: {DATA_DIR}")

Configured for: Bradford
Data directory: ../data/data_downloads


## 1. Load LAD Boundaries

Load the Local Authority Disctrict boundaries to define the study area.

In [44]:
# Load Local Authority District boundaries
print("Loading Local Authority District boundaries...")
LAD_2024 = gpd.read_file(LAD_PATH)

print(f"Loaded {len(LAD_2024)} local authority districts")
print(f"Available LADs include: {sorted(LAD_2024['LAD24NM'].unique())[:10]}...")

# Check if our target LAD exists
if LOCAL_AUTHORITY_NAME in LAD_2024['LAD24NM'].values:
    print(f"✓ Found {LOCAL_AUTHORITY_NAME} in the dataset")
    target_lad = LAD_2024[LAD_2024['LAD24NM'] == LOCAL_AUTHORITY_NAME]
    print(f"Area: {target_lad.geometry.area.iloc[0]/1000000:.2f} km²")
else:
    print(f"✗ {LOCAL_AUTHORITY_NAME} not found in dataset")
    print("Available LADs:", sorted(LAD_2024['LAD24NM'].unique()))

Loading Local Authority District boundaries...
Loaded 361 local authority districts
Available LADs include: ['Aberdeen City', 'Aberdeenshire', 'Adur', 'Amber Valley', 'Angus', 'Antrim and Newtownabbey', 'Ards and North Down', 'Argyll and Bute', 'Armagh City, Banbridge and Craigavon', 'Arun']...
✓ Found Bradford in the dataset
Area: 0.00 km²


## 2. Load OS Open Greenspace Data

Load the Ordnance Survey data and pre-process it.

In [45]:
# Check available layers in the OS Open Greenspace file
print("Available layers in OS Open Greenspace:")
layers = fiona.listlayers(OPGS_PATH)
for layer in layers:
    print(f"  - {layer}")

Available layers in OS Open Greenspace:
  - access_point
  - greenspace_site


In [46]:
# Load OS Open Greenspace data
print("\nLoading OS Open Greenspace data...")
os_greenspace_access = gpd.read_file(OPGS_PATH, layer='access_point')
os_greenspace_site = gpd.read_file(OPGS_PATH, layer='greenspace_site')

print(f"Loaded {len(os_greenspace_access)} access points")
print(f"Loaded {len(os_greenspace_site)} greenspace sites")


Loading OS Open Greenspace data...
Loaded 342348 access points
Loaded 161954 greenspace sites


### Subset the OS Greenspace data to the target LAD

- Reduce analysis time and just focus on the Bradford district

In [47]:
# Subset OS greenspace data to the target LAD
print(f"\nSubsetting OS greenspace data to {LOCAL_AUTHORITY_NAME}...")

greenspace_sites_lad = subset_to_LAD(
    LAD_gdf=LAD_2024, 
    LAD_column_name='LAD24NM', 
    LAD_name=LOCAL_AUTHORITY_NAME, 
    data_to_subset=os_greenspace_site
)

greenspace_access_lad = subset_to_LAD(
    LAD_gdf=LAD_2024, 
    LAD_column_name='LAD24NM', 
    LAD_name=LOCAL_AUTHORITY_NAME, 
    data_to_subset=os_greenspace_access
)

print(f"OS greenspace sites in {LOCAL_AUTHORITY_NAME}: {len(greenspace_sites_lad)}")
print(f"OS greenspace access points in {LOCAL_AUTHORITY_NAME}: {len(greenspace_access_lad)}")

# Display sample data
print("\nSample OS greenspace data:")
print(greenspace_sites_lad[['id', 'distinctive_name_1', 'function']].head())


Subsetting OS greenspace data to Bradford...
OS greenspace sites in Bradford: 1022
OS greenspace access points in Bradford: 2364

Sample OS greenspace data:
                                       id       distinctive_name_1  \
1    0A0C1723-E2A4-4FA1-B127-F73A7954FB7B  Shay Grange Crematorium   
151  2BE88DCC-827A-7FED-E063-AAEFA00A0EDD                     None   
163  2BE88DCC-83BB-7FED-E063-AAEFA00A0EDD                     None   
605  2BE88DCC-D52D-7FED-E063-AAEFA00A0EDD                     None   
754  2BE88DCC-F082-7FED-E063-AAEFA00A0EDD              Attock Park   

                  function  
1    Public Park Or Garden  
151  Public Park Or Garden  
163             Play Space  
605  Other Sports Facility  
754             Play Space  


In [48]:
# greenspace_sites_lad.explore()

## 3. Load in Open Street Map data

Extract greenspace data from OpenStreetMap for the same area.

In [56]:
# Define OSM tags for greenspaces
osm_tags = {
    'leisure': ['park'],
    'landuse': ['forest', 'meadow'],
    'natural': ['wood', 'scrub']
}

print("Extracting OSM greenspace data...")
print(f"Tags to extract: {osm_tags}")

# Get the target LAD geometry
target_lad_geom = LAD_2024.loc[LAD_2024['LAD24NM'] == LOCAL_AUTHORITY_NAME, :]

try:
    # Extract OSM features
    gdf = ox.features_from_polygon(
        target_lad_geom.to_crs("EPSG:4326").union_all(), 
        osm_tags
    )
    
    # Filter the geometries to include only the specified types
    osm_greenspace = gdf[
        gdf['leisure'].isin(['park']) |
        gdf['landuse'].isin(['forest', 'meadow']) | 
        gdf['natural'].isin(['wood', 'scrub'])
    ].reset_index()
    
    print(f"Extracted {len(osm_greenspace)} OSM greenspace features")
    
    # Display sample data
    print("\nSample OSM greenspace data:")
    available_cols = [col for col in ['name', 'leisure', 'landuse', 'natural'] if col in osm_greenspace.columns]
    print(osm_greenspace[available_cols].head())
    
except Exception as e:
    print(f"Error extracting OSM data: {e}")
    osm_greenspace = gpd.GeoDataFrame()

Extracting OSM greenspace data...
Tags to extract: {'leisure': ['park'], 'landuse': ['forest', 'meadow'], 'natural': ['wood', 'scrub']}
Extracted 3161 OSM greenspace features

Sample OSM greenspace data:
                           name leisure landuse natural
0  Greengates Recreation Ground    park     NaN     NaN
1                           NaN     NaN     NaN    wood
2                           NaN     NaN     NaN    wood
3                           NaN     NaN     NaN    wood
4                           NaN    park     NaN     NaN


## 4. Combine OS and OSM Greenspace Data

Merge the OS and OSM datasets and consolidate touching/intersecting polygons.

In [50]:
if len(osm_greenspace) > 0:
    # Prepare OS greenspace data for joining
    os_gs_to_join = greenspace_sites_lad[['distinctive_name_1', 'function', 'id', 'geometry']].copy()
    os_gs_to_join.columns = ['Name (OS)', 'Type (OS)', 'OS ID', 'geometry']

    # Prepare OSM greenspace data for joining
    # Combine OSM tags into one value
    osm_greenspace['Type (OSM)'] = osm_greenspace[['natural', 'leisure', 'landuse']].apply(
        lambda x: ', '.join(sorted(x.dropna().astype(str))), axis=1
    )

    # Select relevant columns for OSM data
    osm_cols = ['name', 'Type (OSM)', 'id', 'geometry']
    available_osm_cols = [col for col in osm_cols if col in osm_greenspace.columns]
    osm_gs_to_join = osm_greenspace[available_osm_cols].copy()
    
    # Ensure we have the right column names
    if 'name' in osm_gs_to_join.columns:
        osm_gs_to_join = osm_gs_to_join.rename(columns={'name': 'Name (OSM)'})
    else:
        osm_gs_to_join['Name (OSM)'] = None
        
    if 'id' in osm_gs_to_join.columns:
        osm_gs_to_join = osm_gs_to_join.rename(columns={'id': 'OSM ID'})
    else:
        osm_gs_to_join['OSM ID'] = None

    # Ensure both datasets have the same CRS
    osm_gs_to_join = osm_gs_to_join.to_crs(os_gs_to_join.crs)

    # Combine OS and OSM data
    print("Combining OS and OSM greenspace data...")
    combined_greenspace = pd.concat([os_gs_to_join, osm_gs_to_join], ignore_index=True)
    
    print(f"Combined dataset: {len(combined_greenspace)} features")
    print(f"  - OS features: {len(os_gs_to_join)}")
    print(f"  - OSM features: {len(osm_gs_to_join)}")
    
else:
    print("No OSM data available, using only OS data")
    combined_greenspace = os_gs_to_join.copy()

Combining OS and OSM greenspace data...
Combined dataset: 4183 features
  - OS features: 1022
  - OSM features: 3161


In [51]:
# Merge touching or intersecting polygons
print("Merging touching or intersecting polygons...")
print(f"Before merging: {len(combined_greenspace)} features")

os_osm_greenspace_lad = merge_touching_or_intersecting_polygons_condense(combined_greenspace)

print(f"After merging: {len(os_osm_greenspace_lad)} features")

# Clean up the combined attributes
print("Cleaning combined attributes...")
for col in ['Name (OS)', 'Name (OSM)', 'Type (OS)', 'Type (OSM)', 'OS ID', 'OSM ID']:
    if col in os_osm_greenspace_lad.columns:
        os_osm_greenspace_lad[col] = os_osm_greenspace_lad[col].apply(
            lambda x: clean_and_deduplicate(x, separator=',') if pd.notnull(x) else x
        )

print("✓ Greenspace data processing complete")

Merging touching or intersecting polygons...
Before merging: 4183 features
After merging: 1598 features
Cleaning combined attributes...
✓ Greenspace data processing complete


In [52]:
# os_osm_greenspace_lad.explore()

## 5. Load Parks Data

Load in the point parks dataset to match official names to the greenspace polygons.

In [53]:
# Load parks data
print(f"Loading parks data from {PARKS_DATA_PATH}...")

try:
    parks_data = gpd.read_file(PARKS_DATA_PATH)
    print(f"Loaded {len(parks_data)} parks")
    
    # Display basic info about the parks data
    print("\nParks data columns:", list(parks_data.columns))
    
    if 'Park Name' in parks_data.columns:
        print(f"\nSample park names:")
        print(parks_data['Park Name'].head().tolist())
    
    # Check geometry type
    geom_types = parks_data.geometry.geom_type.value_counts()
    print(f"\nGeometry types: {dict(geom_types)}")
    
except Exception as e:
    print(f"Error loading parks data: {e}")
    parks_data = gpd.GeoDataFrame()

Loading parks data from ../data/data_downloads/bradford_district_parks.geojson...
Loaded 71 parks

Parks data columns: ['Park Name', 'Park URL', 'Location', 'Opening Hours', 'Latitude', 'Longitude', 'Car parking', 'River', 'Walking routes', 'Cycle route', 'Accessibility route', 'Pond', 'Open 24 hours a day, all year round', '5-a-side football pitch', 'Bowling green', "Children's play area", 'Fitness equipment', 'Multi use games area', 'Picnic area', 'Basketball court', 'Cricket pitch', 'Skate park', 'Full size football pitch', 'Wildflower area', 'NA', 'Dog free area', 'BMX track', 'Horse route', 'Nature reserve', 'Public sculpture', 'Angling', 'Café', 'Toilet', 'Visitor centre', 'Walled garden', 'Tennis court', 'Table tennis', 'Bandstand', 'Play area for ages 0-3', 'Aviary', 'Sensory garden', 'Glass house', 'Museum', 'Boating lake', 'Miniature railway', 'Park features str', 'geometry']

Sample park names:
['Hirst Woods', 'Hebers Ghyll Woodlands', 'Esholt Woods', 'Chellow Dean Woods', '

# Greenspace matching

This section of the notebook performs comprehensive matching between parks (point data) and greenspace polygons using multiple strategies.

## Matching Strategy Overview

Hierarchical Matching: Use multiple matching strategies in order of confidence

1. Exact name matching
2. Fuzzy Name Matching: Handle variations in park names  
3. Multi-distance Spatial Matching Try different buffer sizes systematically, starting at 0m
4. Quality Assessment: Score and validate matches
5. Manual Review Flagging: Identify cases needing human review

Because this is an automated process, there *will* be instances where the algorithm does not work perfectly. Human review will always be needed.

## Configuration of matching parameters etc.

- Set a fuzzy matching "minimum similarity" between 0-1, where 1 is exact.
- Set a series of spatial buffers to be tested incrementally.

In [54]:
# Configuration 

# Matching parameters
NAME_THRESHOLD = 0.85  # Minimum similarity for fuzzy name matching (0-1)
SPATIAL_BUFFERS = [0, 10, 15]  # Buffer distances to try (meters)

In [55]:
print("Starting comprehensive park-greenspace matching...")

# Prepare parks data - ensure we have a clean index
parks_clean = parks_data.copy().reset_index(drop=True).reset_index()
greenspace_gdf = os_osm_greenspace_lad.copy()

# Run the matching algorithm
matched_parks, unmatched_parks, match_quality = match_parks_to_greenspace(
    parks_gdf=parks_clean,
    greenspace_gdf=greenspace_gdf,
    name_threshold=NAME_THRESHOLD,
    spatial_buffers=SPATIAL_BUFFERS
)

Starting comprehensive park-greenspace matching...
Starting with 71 parks to match...

=== Strategy 1: Exact OS Name Matching ===
Found 22 exact OS name matches

=== Strategy 2: Exact OSM Name Matching ===
Found 10 exact OSM name matches

=== Strategy 3: Fuzzy Name Matching ===
Found 4 fuzzy name matches

=== Strategy 4: Spatial Matching ===
Trying 0m buffer...
Found 12 matches with 0m buffer
Trying 10m buffer...
Found 4 matches with 10m buffer
Trying 15m buffer...

=== FINAL RESULTS ===
Successfully matched: 52 parks
Unmatched: 19 parks

Match method summary:
method
exact_name_os     22
spatial_0m        12
exact_name_osm    10
fuzzy_name_osm     4
spatial_10m        4
Name: count, dtype: int64

Average match quality: 0.990

Unmatched parks: ['Hebers Ghyll Woodlands', 'Esholt Woods', 'Black Carr Woods', 'Branken Bank Avenue', 'Dane Hill Drive', 'Fairfield Recreation Ground', 'Spring Mill Street Recreation Ground', 'Odsal Village Green', 'Upper Wyke Recreation Ground', 'Scholemoor Play

In [57]:
if len(match_quality) > 0:
    print("=== MATCH QUALITY ANALYSIS ===")
    
    # Method distribution
    print("Matches by method:")
    method_counts = match_quality['method'].value_counts()
    for method, count in method_counts.items():
        print(f"  {method}: {count} ({count/len(match_quality)*100:.1f}%)")
    
    # Quality scores
    print(f"\nMatch quality statistics:")
    print(f"  Mean quality: {match_quality['match_quality'].mean():.3f}")
    print(f"  Median quality: {match_quality['match_quality'].median():.3f}")
    print(f"  Min quality: {match_quality['match_quality'].min():.3f}")
    print(f"  Max quality: {match_quality['match_quality'].max():.3f}")
    
    # Quality by method
    print(f"\nAverage quality by method:")
    quality_by_method = match_quality.groupby('method')['match_quality'].mean().sort_values(ascending=False)
    for method, avg_quality in quality_by_method.items():
        print(f"  {method}: {avg_quality:.3f}")
    
else:
    print("No matches found - check your data and parameters")

=== MATCH QUALITY ANALYSIS ===
Matches by method:
  exact_name_os: 22 (42.3%)
  spatial_0m: 12 (23.1%)
  exact_name_osm: 10 (19.2%)
  fuzzy_name_osm: 4 (7.7%)
  spatial_10m: 4 (7.7%)

Match quality statistics:
  Mean quality: 0.990
  Median quality: 1.000
  Min quality: 0.870
  Max quality: 1.000

Average quality by method:
  exact_name_os: 1.000
  exact_name_osm: 1.000
  spatial_0m: 1.000
  spatial_10m: 0.950
  fuzzy_name_osm: 0.914


In [58]:
# Display detailed match information
if len(matched_parks) > 0:
    print("=== SAMPLE MATCHED PARKS ===")
    
    # Select columns to display
    display_cols = ['Park Name', 'match_method', 'match_quality']
    name_cols = [col for col in matched_parks.columns if 'Name (' in col]
    display_cols.extend(name_cols)
    
    available_display_cols = [col for col in display_cols if col in matched_parks.columns]
    
    sample_matches = matched_parks[available_display_cols].head(10)
    for _, row in sample_matches.iterrows():
        print(f"\nPark: {row['Park Name']}")
        print(f"  Method: {row['match_method']}")
        print(f"  Quality: {row['match_quality']:.3f}")
        for col in name_cols:
            if col in row and pd.notna(row[col]):
                print(f"  {col}: {row[col]}")

=== SAMPLE MATCHED PARKS ===

Park: Russell Hall Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Russell Hall Park
  Name (OSM): Russell Hall Park

Park: Greenwood Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Greenwood Park
  Name (OSM): Greenwood Park

Park: Grange Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Grange Park
  Name (OSM): Grange Park

Park: Foxhill Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Foxhill Park
  Name (OSM): Foxhill Park

Park: Knowles Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Knowles Park
  Name (OSM): Knowles Park

Park: Seymour Street Recreation Ground
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Seymour Street Recreation Ground
  Name (OSM): Seymour Park

Park: Eccleshill Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Eccleshill Park
  Name (OSM): 

Park: Cross Roads Park
  Method: exact_name_os
  Quality: 1.000
  Name (OS): Cross Roads Park
  Name (OSM): Cross Roads 

In [62]:
# Show unmatched parks for manual review
if len(unmatched_parks) > 0:
    print("=== UNMATCHED PARKS (for manual review) ===")
    print(f"Total unmatched: {len(unmatched_parks)}")
    
    if 'Park Name' in unmatched_parks.columns:
        print("\nUnmatched park names:")
        for name in sorted(unmatched_parks['Park Name'].tolist()):
            print(f"  - {name}")
else:
    print("✓ All parks successfully matched!")

=== UNMATCHED PARKS (for manual review) ===
Total unmatched: 19

Unmatched park names:
  - Black Carr Woods
  - Branken Bank Avenue
  - City Park
  - Claremont Recreation Ground
  - Dane Hill Drive
  - Eastburn Playing Field
  - Esholt Woods
  - Fairfield Recreation Ground
  - Grove Park
  - Hebers Ghyll Woodlands
  - Margaret McMillan Tower
  - Odsal Village Green
  - Scholemoor Play Area
  - Spring Mill Street Recreation Ground
  - The Delph Eccleshill
  - Tong Play Area
  - Trident Play Area
  - Upper Wyke Recreation Ground
  - War Memorials


In [60]:
# explore the datasets

matched_parks.explore()

In [72]:
# Create output directory based on previous provided path
output_dir = Path(OUTPUT_DIR)
output_dir.mkdir(parents=True, exist_ok=True)

print("Creating comprehensive parks dataset...")

# Prepare matched parks (with polygon geometries)
matched_for_combining = None
if len(matched_parks) > 0:
    matched_for_combining = matched_parks.copy()
    matched_for_combining['match_status'] = 'matched'
    print(f"Matched parks: {len(matched_for_combining)} (with polygon geometries)")

# Prepare unmatched parks (with original point geometries)
unmatched_for_combining = None
if len(unmatched_parks) > 0:
    unmatched_for_combining = unmatched_parks.copy()
    unmatched_for_combining['match_status'] = 'unmatched'
    # Add missing columns to align with matched parks structure
    unmatched_for_combining['match_method'] = 'none'
    unmatched_for_combining['match_quality'] = 0.0
    
    # Add empty columns for greenspace attributes if they don't exist
    greenspace_cols = ['Name (OS)', 'Name (OSM)', 'Type (OS)', 'Type (OSM)', 'OS ID', 'OSM ID']
    for col in greenspace_cols:
        if col not in unmatched_for_combining.columns:
            unmatched_for_combining[col] = None
    
    print(f"Unmatched parks: {len(unmatched_for_combining)} (with original point geometries)")

# Combine matched and unmatched parks
all_parks_list = []
if matched_for_combining is not None:
    all_parks_list.append(matched_for_combining)
if unmatched_for_combining is not None:
    all_parks_list.append(unmatched_for_combining)

if len(all_parks_list) > 0:
    # Ensure all datasets have the same CRS before combining
    target_crs = "EPSG:4326"
    aligned_parks = []
    
    for parks_df in all_parks_list:
        aligned_parks.append(parks_df.to_crs(target_crs))
    
    # Combine all parks into one GeoDataFrame
    all_parks_gdf = gpd.GeoDataFrame(pd.concat(aligned_parks, ignore_index=True))
    
    # Save comprehensive parks dataset
    all_parks_output = output_dir / f"{LOCAL_AUTHORITY_NAME.lower()}_all_parks_comprehensive.geojson"
    all_parks_gdf.to_file(all_parks_output, driver="GeoJSON")
    print(f"✓ Saved comprehensive parks dataset: {all_parks_output}")
    
    # Also save as CSV for analysis (without geometry)
    all_parks_csv = output_dir / f"{LOCAL_AUTHORITY_NAME.lower()}_all_parks_data.csv"
    all_parks_df = all_parks_gdf.drop(columns=['geometry'])
    all_parks_df.to_csv(all_parks_csv, index=False)
    print(f"✓ Saved parks data table: {all_parks_csv}")
    
    # Summary statistics
    print(f"\n=== COMPREHENSIVE DATASET SUMMARY ===")
    print(f"Total parks: {len(all_parks_gdf)}")
    print(f"Matched parks (polygons): {len(all_parks_gdf[all_parks_gdf['match_status'] == 'matched'])}")
    print(f"Unmatched parks (points): {len(all_parks_gdf[all_parks_gdf['match_status'] == 'unmatched'])}")
    
    # Show geometry types
    geom_types = all_parks_gdf.geometry.geom_type.value_counts()
    print(f"Geometry types: {dict(geom_types)}")

else:
    print("No parks data to save")

# Save match quality report
if len(match_quality) > 0:
    quality_output = output_dir / f"{LOCAL_AUTHORITY_NAME.lower()}_match_quality.csv"
    match_quality.to_csv(quality_output, index=False)
    print(f"✓ Saved match quality report: {quality_output}")

print(f"\n=== RESULTS SAVED ===")
print(f"Output directory: {output_dir}")
print(f"Files created:")
if 'all_parks_gdf' in locals() and len(all_parks_gdf) > 0:
    print(f"  - {all_parks_output.name} (comprehensive parks with mixed geometries)")
    print(f"  - {all_parks_csv.name} (parks data table)")
if len(match_quality) > 0:
    print(f"  - {quality_output.name} (match quality analysis)")

Creating comprehensive parks dataset...
Matched parks: 52 (with polygon geometries)
Unmatched parks: 19 (with original point geometries)
✓ Saved comprehensive parks dataset: ../data/results/testing/bradford_all_parks_comprehensive.geojson
✓ Saved parks data table: ../data/results/testing/bradford_all_parks_data.csv

=== COMPREHENSIVE DATASET SUMMARY ===
Total parks: 71
Matched parks (polygons): 52
Unmatched parks (points): 19
Geometry types: {'Polygon': np.int64(49), 'Point': np.int64(19), 'MultiPolygon': np.int64(3)}
✓ Saved match quality report: ../data/results/testing/bradford_match_quality.csv

=== RESULTS SAVED ===
Output directory: ../data/results/testing
Files created:
  - bradford_all_parks_comprehensive.geojson (comprehensive parks with mixed geometries)
  - bradford_all_parks_data.csv (parks data table)
  - bradford_match_quality.csv (match quality analysis)


In [71]:
# Saving out other data

# Save LAD boundary
lad_output = output_dir / f"{LOCAL_AUTHORITY_NAME.lower()}_boundary.geojson"
target_lad_geom.to_crs("EPSG:4326").to_file(lad_output, driver="GeoJSON")
print(f"✓ Saved LAD boundary: {lad_output}")

✓ Saved LAD boundary: ../data/results/testing/bradford_boundary.geojson
