# Network analysis data

This notebook takes a network analysis dataset created by Melissa Barrientos from OSM data using the Space Syntax toolkit and converts it from a shapefile into GeoJSON format. It also reprojects the data to a coordinate reference system suitable for display on the dashboard, and saves each of the following metrics as a separate, smaller file:

OSM_highway_analysis – TD400: Created by the author from OSM data, developed using Space Syntax toolkit. Serves as a proxy for escape routes.

OSM_highway_analysis – NACH400: Created by the author from OSM data, developed using Space Syntax toolkit. Choice analysis within a 400 m radius serves as a proxy for routes with higher potential to be chosen for movement within a local, 5-minute walk.

OSM_highway_analysis – NACH800: Created by the author from OSM data, developed using Space Syntax toolkit. Choice analysis within an 800 m radius serves as a proxy for routes with higher potential to be chosen for movement within a local, 10-minute walk.

OSM_highway_analysis – NACH2000: Created by the author from OSM data, developed using Space Syntax toolkit. Choice analysis within a 2000 m radius serves as a proxy for routes with higher potential to be chosen for movement on a more global scale, such as short car or bike trips.

OSM_highway_analysis – NAIN400: Created by the author from OSM data, developed using Space Syntax toolkit. Integration analysis within a 400 m radius serves as a proxy for areas with higher potential to concentrate movement and activity within a local, 5-minute walk.

OSM_highway_analysis – NAIN800: Created by the author from OSM data, developed using Space Syntax toolkit. Integration analysis within an 800 m radius serves as a proxy for areas with higher potential to concentrate movement and activity within a local, 10-minute walk.

OSM_highway_analysis – NAIN2000: Created by the author from OSM data, developed using Space Syntax toolkit. Integration analysis within a 2000 m radius serves as a proxy for areas with higher potential to concentrate movement and activity on a more global scale, such as short car or bike trips.
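
The link between an analysis radius and a walk time can be checked with simple arithmetic. The sketch below assumes a walking speed of about 80 m per minute (roughly 4.8 km/h); that speed is an illustrative assumption and is not stated in the dataset description.

```python
# Assumed walking speed: about 80 m per minute (roughly 4.8 km/h).
# This value is an illustration, not a figure taken from the dataset.
WALK_SPEED_M_PER_MIN = 80.0


def walk_minutes(radius_m: float, speed_m_per_min: float = WALK_SPEED_M_PER_MIN) -> float:
    """Return the approximate time in minutes needed to walk `radius_m` metres."""
    if speed_m_per_min <= 0:
        raise ValueError("speed must be positive")
    return radius_m / speed_m_per_min


for radius in (400, 800, 2000):
    print(f"{radius} m: about {walk_minutes(radius):.0f} min on foot")
```

Under that assumption the 400 m and 800 m radii come out at 5 and 10 minutes, while 2000 m is closer to 25 minutes on foot, which fits its description as a scale for short car or bike trips.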


In [1]:
import geopandas as gpd
import os
import pandas as pd
from pathlib import Path

# Set up paths
data_dir = Path('../data/network_analysis')
output_dir = Path('../data/processed/network_analysis')
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Input directory: {data_dir}")
print(f"Output directory: {output_dir}")

Input directory: ../data/network_analysis
Output directory: ../data/processed/network_analysis


In [9]:
# load the shapefile and check structure
shapefile_path = data_dir / 'OSM_highway_analysis.shp'
gdf = gpd.read_file(shapefile_path)

print(f"Original CRS: {gdf.crs}")
print(f"Shape: {gdf.shape}")
print(f"Columns: {list(gdf.columns)}")
gdf.head()

Original CRS: EPSG:27700
Shape: (66319, 40)
Columns: ['ref', 'x1', 'y1', 'x2', 'y2', 'angCONN', 'CONN', 'segid', 'segLEN', 'CH', 'CHr1200m', 'CHr2000m', 'CHr400m', 'CHr800m', 'INT', 'INTr1200m', 'INTr2000m', 'INTr400m', 'INTr800m', 'NC', 'NCr1200m', 'NCr2000m', 'NCr400m', 'NCr800m', 'TD', 'TDr1200m', 'TDr2000m', 'TDr400m', 'TDr800m', 'NACH', 'NACHr1200m', 'NACHr2000m', 'NACHr400m', 'NACHr800m', 'NAIN', 'NAINr1200m', 'NAINr2000m', 'NAINr400m', 'NAINr800m', 'geometry']


Unnamed: 0,ref,x1,y1,x2,y2,angCONN,CONN,segid,segLEN,CH,...,NACHr1200m,NACHr2000m,NACHr400m,NACHr800m,NAIN,NAINr1200m,NAINr2000m,NAINr400m,NAINr800m,geometry
0,0,417218.270067,434688.491793,417249.825361,434701.635735,3.021487,3,0,34.183327,0.0,...,0.515952,0.56108,0.600396,0.633673,0.715803,0.571899,0.659701,0.636659,0.549916,"LINESTRING (417218.27 434688.492, 417249.825 4..."
1,1,417249.825361,434701.635735,417294.530653,434590.83838,3.072928,3,0,119.47643,0.0,...,0.488279,0.533034,0.574039,0.585151,0.679011,0.559242,0.57505,0.622529,0.520572,"LINESTRING (417249.825 434701.636, 417294.531 ..."
2,2,417322.904088,434487.625141,417381.56191,434511.105572,1.991772,2,1,63.182835,0.0,...,0.0,0.0,0.0,0.0,0.619646,0.496149,0.505211,0.553909,0.489951,"LINESTRING (417322.904 434487.625, 417381.562 ..."
3,3,417381.56191,434511.105572,417394.216004,434479.854374,2.956092,4,1,33.715923,1025717.0,...,1.078744,1.069848,1.147125,1.0968,0.663912,0.564586,0.637652,0.678844,0.559867,"LINESTRING (417381.562 434511.106, 417394.216 ..."
4,4,417394.216004,434479.854374,417424.741796,434493.830155,2.662013,4,1,33.573002,545574.0,...,0.993414,0.969901,1.050881,1.010915,0.618034,0.493197,0.506402,0.583491,0.495488,"LINESTRING (417394.216 434479.854, 417424.742 ..."


In [5]:
# Convert from British National Grid (EPSG:27700) to WGS84 (EPSG:4326)
if gdf.crs != 'EPSG:4326':
    print(f"Converting from {gdf.crs} to EPSG:4326 (WGS84)")
    gdf_wgs84 = gdf.to_crs('EPSG:4326')
    print(f"New CRS: {gdf_wgs84.crs}")
else:
    print("Data is already in WGS84")
    gdf_wgs84 = gdf.copy()

# Check the bounds in the new CRS
print(f"Bounds (WGS84): {gdf_wgs84.total_bounds}")

Converting from EPSG:27700 to EPSG:4326 (WGS84)
New CRS: EPSG:4326
Bounds (WGS84): [-1.8563382 53.7507514 -1.6798496 53.8374088]
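
A quick way to confirm that the reprojection did what was intended is to test whether the bounds fall inside valid longitude and latitude ranges. The helper below is a minimal sketch in plain Python; the function name is hypothetical, and the second set of bounds simply reuses the British National Grid coordinates of the first segment shown earlier.

```python
def bounds_look_like_lonlat(bounds):
    """Return True if (minx, miny, maxx, maxy) fits inside lon/lat degree ranges."""
    minx, miny, maxx, maxy = bounds
    lon_ok = -180.0 <= minx <= maxx <= 180.0
    lat_ok = -90.0 <= miny <= maxy <= 90.0
    return lon_ok and lat_ok


# Bounds printed above after conversion to EPSG:4326
wgs84_bounds = (-1.8563382, 53.7507514, -1.6798496, 53.8374088)

# Coordinates of the first segment in the original EPSG:27700 data (metres)
bng_bounds = (417218.27, 434688.49, 417249.83, 434701.64)

print(bounds_look_like_lonlat(wgs84_bounds))  # True
print(bounds_look_like_lonlat(bng_bounds))    # False
```

With a GeoDataFrame, the same check could be applied to `gdf_wgs84.total_bounds`.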


In [7]:
# Look for columns that match the metrics in Mel's data spreadsheet
# Column names available in the shapefile, for reference:
# ['ref', 'x1', 'y1', 'x2', 'y2',
# 'angCONN', 'CONN', 'segid', 'segLEN',
# 'CH', 'CHr1200m', 'CHr2000m', 'CHr400m',
# 'CHr800m', 'INT', 'INTr1200m', 'INTr2000m',
# 'INTr400m', 'INTr800m',
# 'NC', 'NCr1200m', 'NCr2000m', 'NCr400m', 'NCr800m',
# 'TD', 'TDr1200m', 'TDr2000m', 'TDr400m', 'TDr800m', 'NACH',
# 'NACHr1200m', 'NACHr2000m', 'NACHr400m', 'NACHr800m', 'NAIN', 'NAINr1200m', 'NAINr2000m', 'NAINr400m', 'NAINr800m', 'geometry']
metrics_to_find = ['TDr400m', 'NACHr400m', 'NACHr800m', 'NACHr2000m', 'NAINr400m', 'NAINr800m', 'NAINr2000m']

print("Available columns:")
for col in gdf_wgs84.columns:
    print(f"  {col}")

print("\nLooking for metric columns:")
found_metrics = {}
for metric in metrics_to_find:
    # Look for exact matches or close matches
    matching_cols = [col for col in gdf_wgs84.columns if metric.lower() in col.lower()]
    if matching_cols:
        found_metrics[metric] = matching_cols[0]  # Take the first match
        print(f"  {metric}: Found as '{matching_cols[0]}'")
    else:
        print(f"  {metric}: NOT FOUND")

print(f"\nFound {len(found_metrics)} out of {len(metrics_to_find)} metrics")

Available columns:
  ref
  x1
  y1
  x2
  y2
  angCONN
  CONN
  segid
  segLEN
  CH
  CHr1200m
  CHr2000m
  CHr400m
  CHr800m
  INT
  INTr1200m
  INTr2000m
  INTr400m
  INTr800m
  NC
  NCr1200m
  NCr2000m
  NCr400m
  NCr800m
  TD
  TDr1200m
  TDr2000m
  TDr400m
  TDr800m
  NACH
  NACHr1200m
  NACHr2000m
  NACHr400m
  NACHr800m
  NAIN
  NAINr1200m
  NAINr2000m
  NAINr400m
  NAINr800m
  geometry

Looking for metric columns:
  TDr400m: Found as 'TDr400m'
  NACHr400m: Found as 'NACHr400m'
  NACHr800m: Found as 'NACHr800m'
  NACHr2000m: Found as 'NACHr2000m'
  NAINr400m: Found as 'NAINr400m'
  NAINr800m: Found as 'NAINr800m'
  NAINr2000m: Found as 'NAINr2000m'

Found 7 out of 7 metrics
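
The loop above takes the first column whose name contains the metric name. All seven metrics were found under their exact names here, but a substring test depends on column order, because a name such as `CHr400m` is also contained in `NACHr400m`. The sketch below contrasts the two approaches on a hypothetical column order; both helper names are illustrative.

```python
def first_substring_match(metric, columns):
    """Mimic the loop above: return the first column whose name contains `metric`."""
    hits = [col for col in columns if metric.lower() in col.lower()]
    return hits[0] if hits else None


def exact_match(metric, columns):
    """Return `metric` only if a column with exactly that name exists."""
    return metric if metric in columns else None


# Hypothetical column order in which the normalised column comes first
columns = ['NACHr400m', 'CHr400m', 'geometry']

print(first_substring_match('CHr400m', columns))  # NACHr400m
print(exact_match('CHr400m', columns))            # CHr400m
```

Exact matching is the safer choice when the column names are already known, as they are for this shapefile.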


In [8]:
# Save the complete dataset as GeoJSON
complete_output_path = output_dir / 'OSM_highway_analysis_complete.geojson'
gdf_wgs84.to_file(complete_output_path, driver='GeoJSON')
print(f"Complete dataset saved to: {complete_output_path}")
print(f"File size: {complete_output_path.stat().st_size / (1024*1024):.2f} MB")

Complete dataset saved to: ../data/processed/network_analysis/OSM_highway_analysis_complete.geojson
File size: 68.00 MB


In [10]:
# Create separate files for each metric
# Keep geometry, the specific metric, and any identifier columns
essential_columns = ['geometry']  # Add any other columns that every output file should keep
# For now: just geometry

# Identifier columns are the same for every metric, so find them once.
# Note: 'id' is a substring test, so it also matches names such as 'segid'.
id_terms = ['id', 'fid', 'objectid', 'osm_id', 'segid']
id_columns = [col for col in gdf_wgs84.columns
              if any(id_term in col.lower() for id_term in id_terms)]

for metric_name, column_name in found_metrics.items():
    # Combine the column lists, removing duplicates while preserving order
    columns_to_keep = list(dict.fromkeys(essential_columns + [column_name] + id_columns))
    
    # Create subset
    metric_gdf = gdf_wgs84[columns_to_keep].copy()
    
    # Rename the metric column to a standard name
    metric_gdf = metric_gdf.rename(columns={column_name: 'value'})
    
    # Add metadata
    metric_gdf['metric'] = metric_name
    
    # Save as GeoJSON
    output_path = output_dir / f'OSM_highway_analysis_{metric_name}.geojson'
    metric_gdf.to_file(output_path, driver='GeoJSON')
    
    file_size = output_path.stat().st_size / (1024*1024)
    print(f"{metric_name}: Saved to {output_path.name} ({file_size:.2f} MB)")

TDr400m: Saved to OSM_highway_analysis_TDr400m.geojson (15.27 MB)
NACHr400m: Saved to OSM_highway_analysis_NACHr400m.geojson (15.89 MB)
NACHr800m: Saved to OSM_highway_analysis_NACHr800m.geojson (15.90 MB)
NACHr2000m: Saved to OSM_highway_analysis_NACHr2000m.geojson (15.97 MB)
NAINr400m: Saved to OSM_highway_analysis_NAINr400m.geojson (16.00 MB)
NAINr800m: Saved to OSM_highway_analysis_NAINr800m.geojson (16.01 MB)
NAINr2000m: Saved to OSM_highway_analysis_NAINr2000m.geojson (16.08 MB)
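
If the per-metric files need to be smaller still, one option is to round the coordinates before writing. Six decimal places of a degree correspond to roughly 0.1 m, which is usually enough for a web map. The sketch below uses only the standard library and a made-up feature; how much it saves depends on how many decimals the writer emitted in the first place, which has not been checked for these files.

```python
import json


def round_coords(coords, ndigits=6):
    """Recursively round a GeoJSON coordinate array to `ndigits` decimals."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]


# Made-up feature with the same properties as the per-metric files
feature = {
    "type": "Feature",
    "properties": {"value": 0.600396, "metric": "NACHr400m"},
    "geometry": {
        "type": "LineString",
        "coordinates": [
            [-1.7391234567891, 53.8012345678912],
            [-1.7386543219876, 53.8013456789123],
        ],
    },
}

# Build a copy with rounded coordinates, leaving the original untouched
compact = {
    **feature,
    "geometry": {
        **feature["geometry"],
        "coordinates": round_coords(feature["geometry"]["coordinates"]),
    },
}

print(len(json.dumps(feature)), "characters before rounding")
print(len(json.dumps(compact)), "characters after rounding")
```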


In [12]:
# Create summary statistics for each metric
# Not intended for analysis, but a useful check that processing has worked correctly
summary_stats = {}

for metric_name, column_name in found_metrics.items():
    stats = gdf_wgs84[column_name].describe()
    summary_stats[metric_name] = stats
    
    print(f"\n{metric_name} ({column_name}) Statistics:")
    print(f"  Count: {stats['count']}")
    print(f"  Mean: {stats['mean']:.4f}")
    print(f"  Std: {stats['std']:.4f}")
    print(f"  Min: {stats['min']:.4f}")
    print(f"  Max: {stats['max']:.4f}")

# Save summary statistics
# summary_df = pd.DataFrame(summary_stats).T
# summary_output_path = output_dir / 'network_analysis_summary_statistics.csv'
# summary_df.to_csv(summary_output_path)
# print(f"\nSummary statistics saved to: {summary_output_path}")


TDr400m (TDr400m) Statistics:
  Count: 66319.0
  Mean: 993.5918
  Std: 850.9472
  Min: -1.0000
  Max: 5779.4375

NACHr400m (NACHr400m) Statistics:
  Count: 66319.0
  Mean: 1.0105
  Std: 0.3615
  Min: 0.0000
  Max: 3.5746

NACHr800m (NACHr800m) Statistics:
  Count: 66319.0
  Mean: 0.9771
  Std: 0.3484
  Min: 0.0000
  Max: 2.5466

NACHr2000m (NACHr2000m) Statistics:
  Count: 66319.0
  Mean: 0.9388
  Std: 0.3375
  Min: 0.0000
  Max: 2.4977

NAINr400m (NAINr400m) Statistics:
  Count: 66319.0
  Mean: 0.9137
  Std: 0.4332
  Min: 0.2177
  Max: 11.9657

NAINr800m (NAINr800m) Statistics:
  Count: 66319.0
  Mean: 0.8066
  Std: 0.2933
  Min: 0.2218
  Max: 8.6653

NAINr2000m (NAINr2000m) Statistics:
  Count: 66319.0
  Mean: 0.7689
  Std: 0.2220
  Min: 0.2006
  Max: 4.5722
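
The minimum of -1 for TDr400m stands out from the other metrics, whose minima are zero or positive. It may be a placeholder for segments without a computed value, but that is an assumption and the source does not confirm it. A minimal sketch for counting such values, using made-up numbers:

```python
def count_below(values, threshold=0.0):
    """Count how many entries in `values` are strictly below `threshold`."""
    return sum(1 for v in values if v < threshold)


# Made-up sample; with the real data the equivalent check would be
# (gdf_wgs84['TDr400m'] < 0).sum()
sample = [993.6, -1.0, 5779.4375, 0.0, -1.0]

print(f"{count_below(sample)} of {len(sample)} values are negative")
```

If the count turns out to be non-trivial, it would be worth deciding how the dashboard should display those segments.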


In [14]:
# Check that the expected files have been created, and that their sizes are acceptable for GitHub

# List all created files and their sizes
created_files = list(output_dir.glob('OSM_highway_analysis_*.geojson'))
print("Created files:")
for file_path in created_files:
    file_size = file_path.stat().st_size / (1024*1024)
    print(f"  {file_path.name}: {file_size:.2f} MB")

print(f"\nTotal files created: {len(created_files)}")
print(f"Output directory: {output_dir}")

Created files:
  OSM_highway_analysis_complete.geojson: 68.00 MB
  OSM_highway_analysis_NAINr800m.geojson: 16.01 MB
  OSM_highway_analysis_NACHr800m.geojson: 15.90 MB
  OSM_highway_analysis_TDr400m.geojson: 15.27 MB
  OSM_highway_analysis_NACHr400m.geojson: 15.89 MB
  OSM_highway_analysis_NACHr2000m.geojson: 15.97 MB
  OSM_highway_analysis_NAINr2000m.geojson: 16.08 MB
  OSM_highway_analysis_NAINr400m.geojson: 16.00 MB

Total files created: 8
Output directory: ../data/processed/network_analysis
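
The size check can be made explicit by comparing each file against GitHub's limits: a warning for files larger than 50 MiB and a hard block above 100 MiB. The sketch below classifies sizes against those two thresholds; the function and constant names are illustrative.

```python
# GitHub warns about files over 50 MiB and rejects files over 100 MiB
GITHUB_WARN_BYTES = 50 * 1024 * 1024
GITHUB_BLOCK_BYTES = 100 * 1024 * 1024


def classify_size(num_bytes):
    """Return 'blocked', 'warning', or 'ok' for a file of `num_bytes` bytes."""
    if num_bytes > GITHUB_BLOCK_BYTES:
        return "blocked"
    if num_bytes > GITHUB_WARN_BYTES:
        return "warning"
    return "ok"


# Sizes in MB as reported in the listing above
reported_mb = {
    "OSM_highway_analysis_complete.geojson": 68.00,
    "OSM_highway_analysis_NAINr2000m.geojson": 16.08,
}

for name, size_mb in reported_mb.items():
    status = classify_size(int(size_mb * 1024 * 1024))
    print(f"{name}: {status}")
```

On these figures the complete file falls in the warning range, while the per-metric files are well below it. For files on disk, `file_path.stat().st_size` can be passed to `classify_size` directly.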
