<div style="padding: 25px; background-color: #f4ecf7; border-radius: 12px; border: 2px solid #8e44ad;">
    <h1 style="color: #5b2c6f; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;">Step 03: The Intelligence</h1>
    <h2 style="color: #8e44ad;">Graph Analytics, Structural Hole Detection, and 3D Visualisation</h2>
    <p style="font-size: 1.1em; color: #2c3e50;">
        The enriched H3 grid from Step 02 becomes a <strong>spatial graph</strong>. We compute
        network centrality metrics, aggregate POIs by business role, and score each hexagon
        using a heuristic that combines population, demographics, and competitive landscape.
    </p>
    <hr style="border-color: #8e44ad;">
    <table style="width:100%; border-collapse: collapse; margin-top: 10px; font-size: 0.95em;">
        <tr style="background: #5b2c6f; color: white;">
            <th style="padding: 8px; text-align: left;">Stage</th>
            <th style="padding: 8px; text-align: left;">Source / Consumer</th>
            <th style="padding: 8px; text-align: left;">Data / Steps</th>
        </tr>
        <tr><td style="padding: 8px;"><strong>Input</strong></td><td>Notebooks 01 &amp; 02</td><td><code>camden_h3_grid.parquet</code> + <code>camden_pois.geojson</code></td></tr>
        <tr style="background: #f9f0fc;"><td style="padding: 8px;"><strong>Process</strong></td><td>This notebook</td><td>Graph &rarr; centrality &rarr; POI aggregation &rarr; scoring &rarr; 3D map</td></tr>
        <tr><td style="padding: 8px;"><strong>Output</strong></td><td>Analyst / Report</td><td>Interactive Pydeck HTML + updated parquet with all features</td></tr>
    </table>
    <p style="margin-top: 15px; color: #2c3e50;">
        <strong>Canonical scoring formula:</strong>
        $Score_H = Pop_H \times D_H + \alpha \cdot S_H + \beta \cdot A_H - \gamma \cdot C_H$
    </p>
    <p style="color: #2c3e50;"><strong>Theoretical basis:</strong> an adaptation of Burt's Structural Hole Theory (1992).
    Burt's structural holes are gaps between unconnected parts of a social network; here, hexagons with strong demand signals
    but little or no existing supply are treated as <em>structural holes</em> in the retail network.</p>
</div>

<div style="margin-top: 30px;">
    <h2 style="color: #2e86c1; border-bottom: 2px solid #2e86c1; padding-bottom: 10px;">Concept: Spatial Graphs and Centrality</h2>
    <p>We model Camden as a <strong>graph</strong> where:</p>
    <ul>
        <li><strong>Nodes</strong> = H3 hexagons (carrying population, demographics, and POI counts).</li>
        <li><strong>Edges</strong> = adjacency links between touching hexagons, weighted by Inverse Distance: $w_{uv} = \frac{1}{d(u,v) + 1}$</li>
    </ul>
    <p>From this graph we compute four <strong>graph metrics</strong> (three centrality measures plus the local clustering coefficient) that quantify each hexagon's structural position:</p>
    <table style="width:100%; border-collapse: collapse; margin: 15px 0; font-size: 0.95em;">
        <tr style="background: #2e86c1; color: white;">
            <th style="padding: 8px;">Metric</th>
            <th style="padding: 8px;">Intuition</th>
            <th style="padding: 8px;">Business Meaning</th>
        </tr>
        <tr>
            <td style="padding: 8px;"><strong>Degree</strong></td>
            <td>How many neighbours?</td>
            <td>Interior vs. boundary &mdash; interior hexes have 6 neighbours</td>
        </tr>
        <tr style="background: #eaf2f8;">
            <td style="padding: 8px;"><strong>Betweenness</strong></td>
            <td>How often on shortest paths?</td>
            <td>Transit corridors &mdash; high foot traffic pass-through</td>
        </tr>
        <tr>
            <td style="padding: 8px;"><strong>Closeness</strong></td>
            <td>Average distance to all others?</td>
            <td>Central vs. peripheral location within Camden</td>
        </tr>
        <tr style="background: #eaf2f8;">
            <td style="padding: 8px;"><strong>Clustering</strong></td>
            <td>How interconnected are neighbours?</td>
            <td>Neighbourhood cohesion &mdash; tightly knit areas</td>
        </tr>
    </table>
    <p>These metrics feed directly into the ML pipeline (<code>camden_predictive_model.ipynb</code>)
    as features for the binary classification model.</p>
</div>
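
<p>As a rough illustration of the edge weighting and the degree metric described above, the following sketch uses plain Python on a hypothetical three-hexagon strip. The hexagon labels, the 175&nbsp;m spacing, and the helper names are assumptions made for the example, not values taken from the Camden grid.</p>

```python
def inverse_distance_weight(d_metres: float) -> float:
    """Edge weight w = 1 / (d + 1); the +1 keeps the weight finite when d = 0."""
    return 1.0 / (d_metres + 1.0)


def degree_centrality(adjacency: dict) -> dict:
    """Degree divided by (n - 1), the same normalisation NetworkX applies."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical strip of three hexagons: a - b - c
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
spacing_m = 175.0  # assumed centroid spacing, for illustration only

weight = inverse_distance_weight(spacing_m)
centrality = degree_centrality(adjacency)

print(f"edge weight at {spacing_m:.0f} m: {weight:.5f}")
print(centrality)
```

<p>Because the weight shrinks as distance grows, it behaves as a measure of tie strength rather than as a travel cost.</p>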

In [None]:
import os
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

import h3
import geopandas as gpd
import networkx as nx
import pandas as pd
import numpy as np
import pydeck as pdk

# H3 version guard
assert int(h3.__version__.split('.')[0]) >= 4, f"H3 v4+ required, got {h3.__version__}"

# Load enriched master grid from Step 02
h3_grid = gpd.read_parquet("data/outputs/camden_h3_grid.parquet")

# Load categorised POIs from Step 01
pois = gpd.read_file("data/processed/camden_pois.geojson")

print(f"Grid: {len(h3_grid)} hexagons, CRS: {h3_grid.crs}")
print(f"POIs: {len(pois)} features, roles: {pois['role'].value_counts().to_dict()}")

<div style="margin-top: 30px;">
    <h2 style="color: #117a65; border-bottom: 2px solid #117a65; padding-bottom: 10px;">Task 1: Building the Spatial Graph + Centrality</h2>
    <p>We construct the graph in two steps:</p>
    <ol>
        <li><strong>Nodes:</strong> One per hexagon, carrying population and centroid coordinates (EPSG:27700 &mdash; metres).</li>
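        <li><strong>Edges:</strong> Between each hex and those of its immediate neighbours that are present in the grid (up to 6; H3 v4 <code>grid_disk(cell, 1)</code> returns the cell itself plus its ring-1 neighbours, so the cell is skipped),
        weighted by Inverse Distance: $w = 1/(d+1)$ where $d$ is Euclidean distance in metres.</li>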
    </ol>
    <p>After construction, we compute four centrality metrics that capture each hexagon's
    structural role in the urban network.</p>
</div>
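
<p>One point worth checking when choosing edge attributes: NetworkX's shortest-path-based measures (betweenness and closeness) treat the attribute they are given as a cost to minimise. The sketch below uses a small hand-written Dijkstra in plain Python, on a hypothetical three-node graph with made-up distances, to show how the preferred route flips when inverse-distance weights are used as costs instead of raw distances. The node names and helper functions are illustrative assumptions.</p>

```python
import heapq


def shortest_path(edges, source, target, cost_key):
    """Dijkstra over an undirected edge list; returns the node sequence."""
    graph = {}
    for u, v, attrs in edges:
        graph.setdefault(u, []).append((v, attrs[cost_key]))
        graph.setdefault(v, []).append((u, attrs[cost_key]))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, step in graph[node]:
            new_cost = cost + step
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                previous[nbr] = node
                heapq.heappush(queue, (new_cost, nbr))

    # Walk back from target to source
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def make_edge(u, v, d):
    """Attach both the raw distance and the inverse-distance weight."""
    return (u, v, {"distance": d, "weight": 1.0 / (d + 1.0)})


# Hypothetical layout: a direct 100 m link versus a 1,000 m detour via "far"
edges = [
    make_edge("home", "shop", 100.0),
    make_edge("home", "far", 500.0),
    make_edge("far", "shop", 500.0),
]

by_distance = shortest_path(edges, "home", "shop", cost_key="distance")
by_weight = shortest_path(edges, "home", "shop", cost_key="weight")
print(by_distance)
print(by_weight)
```

<p>With <code>distance</code> as the cost the direct link wins; with <code>weight</code> as the cost the long detour looks cheaper, because two small inverse-distance values sum to less than one large one.</p>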

In [None]:
G = nx.Graph()

# Grid is in BNG (EPSG:27700), so centroid distances are in metres
for idx, row in h3_grid.iterrows():
    G.add_node(row['h3_index'],
               population=row['population'],
               x=row.geometry.centroid.x,
               y=row.geometry.centroid.y)

# H3 v4: grid_disk replaces k_ring
for h3_idx in h3_grid['h3_index']:
    neighbors = h3.grid_disk(h3_idx, 1)
    for neighbor in neighbors:
        if neighbor != h3_idx and G.has_node(neighbor):
            p1 = (G.nodes[h3_idx]['x'], G.nodes[h3_idx]['y'])
            p2 = (G.nodes[neighbor]['x'], G.nodes[neighbor]['y'])
            d = ((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)**0.5
            G.add_edge(h3_idx, neighbor, distance=d, weight=1/(d+1))

print(f"Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# === Compute Graph Centrality Metrics ===
print("Computing centrality metrics...")
deg = nx.degree_centrality(G)
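# Path-based metrics interpret the edge attribute as a cost, so use metres here,
# not the inverse-distance weight (which would make long edges look "short")
bet = nx.betweenness_centrality(G, weight='distance')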
clo = nx.closeness_centrality(G, distance='distance')
clu = nx.clustering(G, weight='weight')

h3_grid['degree_centrality'] = h3_grid['h3_index'].map(deg)
h3_grid['betweenness_centrality'] = h3_grid['h3_index'].map(bet)
h3_grid['closeness_centrality'] = h3_grid['h3_index'].map(clo)
h3_grid['clustering_coeff'] = h3_grid['h3_index'].map(clu)

print("\nCentrality metrics added:")
for col in ['degree_centrality', 'betweenness_centrality', 'closeness_centrality', 'clustering_coeff']:
    print(f"  {col}: mean={h3_grid[col].mean():.4f}, max={h3_grid[col].max():.4f}")

<div style="margin-top: 30px;">
    <h2 style="color: #884ea0; border-bottom: 2px solid #884ea0; padding-bottom: 10px;">Task 2: Detecting Structural Holes (Heuristic Site Score)</h2>
    <p>Borrowing from Burt (1992), we call a hexagon a <strong>Structural Hole</strong> when it shows strong demand signals but little or no existing supply.
    We quantify this with a heuristic score per hexagon:</p>
    <p style="text-align: center; font-size: 1.1em;">
        $$Score_H = Pop_H \times D_H + \alpha \cdot S_H + \beta \cdot A_H - \gamma \cdot C_H$$
    </p>
    <table style="width:100%; border-collapse: collapse; margin: 15px 0; font-size: 0.95em;">
        <tr style="background: #884ea0; color: white;">
            <th style="padding: 8px;">Symbol</th><th style="padding: 8px;">Meaning</th><th style="padding: 8px;">Value</th>
        </tr>
        <tr><td style="padding: 8px;">$Pop_H$</td><td>LandScan population count</td><td>From zonal stats</td></tr>
        <tr style="background: #f4ecf7;"><td style="padding: 8px;">$D_H$</td><td>Demand proxy: <code>level4_perc / 100</code></td><td>[0, 1]</td></tr>
        <tr><td style="padding: 8px;">$S_H$, $A_H$, $C_H$</td><td>Counts of synergy, anchor, and competitor POIs in the hexagon</td><td><code>n_synergy</code>, <code>n_anchor</code>, <code>n_competitor</code></td></tr>
        <tr style="background: #f4ecf7;"><td style="padding: 8px;">$\alpha$</td><td>Synergy weight (gyms, universities, offices)</td><td>5</td></tr>
        <tr><td style="padding: 8px;">$\beta$</td><td>Anchor weight (transit stations)</td><td>3</td></tr>
        <tr style="background: #f4ecf7;"><td style="padding: 8px;">$\gamma$</td><td>Competitor penalty (cafes, coffee shops)</td><td>15</td></tr>
    </table>
    <p><strong>Note:</strong> This is an <em>interpretable heuristic</em>. The ML notebook
    (<code>camden_predictive_model.ipynb</code>) learns optimal weights from the data using supervised classification.
    POIs are classified using the <code>role</code> column created in Notebook 01.</p>
</div>
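
<p>To make the formula concrete, here is a minimal sketch that scores two hypothetical hexagons with the weights from the table. The function name and the input numbers are assumptions chosen for illustration; they are not Camden data.</p>

```python
ALPHA, BETA, GAMMA = 5, 3, 15  # synergy, anchor, and competitor weights


def site_score(population, level4_perc, n_synergy, n_anchor, n_competitor):
    """Score_H = Pop_H * D_H + alpha * S_H + beta * A_H - gamma * C_H."""
    demand_index = level4_perc / 100.0  # D_H, scaled to [0, 1]
    return (population * demand_index
            + ALPHA * n_synergy
            + BETA * n_anchor
            - GAMMA * n_competitor)


# Hypothetical hexagons (illustrative numbers only)
underserved = site_score(population=1000, level4_perc=40,
                         n_synergy=2, n_anchor=1, n_competitor=0)
saturated = site_score(population=200, level4_perc=10,
                       n_synergy=0, n_anchor=0, n_competitor=3)

print(underserved)  # 400 + 10 + 3 - 0
print(saturated)    # 20 + 0 + 0 - 45
```

<p>The second hexagon ends up negative: three competitors cost 45 points, which outweighs its small demand term. This is why the visualisation step rescales scores before mapping them to colours.</p>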

In [None]:
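# === Spatial join POIs to hexagons using the role column ===
# GeoJSON is normally WGS84 while the grid is in BNG, so align the CRS before joining
pois = pois.to_crs(h3_grid.crs)
hex_pois = gpd.sjoin(pois, h3_grid[['h3_index', 'geometry']], how="inner", predicate="within")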

# Count POIs by role per hexagon
role_counts = hex_pois.groupby(['h3_index', 'role']).size().unstack(fill_value=0)
role_counts.columns = [f'n_{c.lower()}' for c in role_counts.columns]

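# Merge counts into master grid; drop stale count columns first so that re-running
# on a previously saved grid does not produce _x/_y suffixed duplicates
count_cols = ['n_competitor', 'n_synergy', 'n_anchor', 'n_other']
stale_cols = [c for c in h3_grid.columns if c in count_cols or c in role_counts.columns]
h3_grid = h3_grid.drop(columns=stale_cols)
h3_grid = h3_grid.merge(role_counts, left_on='h3_index', right_index=True, how='left')
for col in count_cols:
    if col not in h3_grid.columns:
        h3_grid[col] = 0
    h3_grid[col] = h3_grid[col].fillna(0).astype(int)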

print(f"POI counts per hex - Competitors: {h3_grid['n_competitor'].sum()}, "
      f"Synergy: {h3_grid['n_synergy'].sum()}, Anchors: {h3_grid['n_anchor'].sum()}")

# === Canonical Site Score ===
# Demand proxy: degree-level education normalised to [0,1]
h3_grid['demand_index'] = h3_grid['level4_perc'] / 100.0

ALPHA, BETA, GAMMA = 5, 3, 15
h3_grid['site_score'] = (
    h3_grid['population'] * h3_grid['demand_index']
    + ALPHA * h3_grid['n_synergy']
    + BETA  * h3_grid['n_anchor']
    - GAMMA * h3_grid['n_competitor']
)

print(f"\nSite Score range: [{h3_grid['site_score'].min():.1f}, {h3_grid['site_score'].max():.1f}]")
print(f"Mean: {h3_grid['site_score'].mean():.1f}")
print(f"Positive-score hexes: {(h3_grid['site_score'] > 0).sum()} / {len(h3_grid)}")

<div style="margin-top: 30px;">
    <h2 style="color: #1a5276; border-bottom: 2px solid #1a5276; padding-bottom: 10px;">Visualisation: Interactive 3D Mapping</h2>
    <p>We use <strong>Pydeck</strong> to render a 3D extruded hex map over Camden:</p>
    <ul>
        <li><strong>Colour:</strong> <span style="color: green; font-weight: bold;">Green</span> = high score (opportunity) &rarr;
        <span style="color: red; font-weight: bold;">Red</span> = low score (saturated / low demand)</li>
        <li><strong>Height:</strong> Proportional to normalised score &mdash; tall towers are prime locations.</li>
        <li><strong>Tooltip:</strong> Shows score, population, competitor count, and synergy count on hover.</li>
    </ul>
    <p><strong>Normalisation:</strong> Scores are min-max scaled to [0, 1] to handle negative values
    (hexes with high competitor counts can have negative raw scores). RGB values are pre-computed
    in Python to avoid JavaScript expression errors in Pydeck.</p>
</div>
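
<p>The normalisation and colour mapping described above can be sketched in plain Python as follows. The function name, the <code>eps</code> parameter, and the sample scores are assumptions for illustration; the fixed blue value of 50 mirrors the red-to-green ramp used for the map.</p>

```python
def score_to_rgb(scores, eps=1e-9):
    """Min-max scale scores to [0, 1], then map low -> red and high -> green."""
    s_min, s_max = min(scores), max(scores)
    colours = []
    for s in scores:
        norm = (s - s_min) / (s_max - s_min + eps)  # eps avoids division by zero
        colours.append((int((1 - norm) * 255), int(norm * 255), 50))
    return colours


# Hypothetical raw scores, including a negative one
colours = score_to_rgb([-30.0, 20.0, 70.0])
for rgb in colours:
    print(rgb)

# If every score is identical, eps keeps the division defined and all hexes map to red
flat = score_to_rgb([5.0, 5.0])
print(flat)
```

<p>Note that the small <code>eps</code> term means the top-scoring hexagon normalises to just under 1, so its green channel may truncate to 254 rather than 255. The difference is not visible on the map.</p>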

In [None]:
# Prepare for Pydeck (needs WGS84)
viz_df = h3_grid.to_crs(epsg=4326).copy()

# Normalise score to [0, 1] for colour mapping (handles negatives)
s_min = viz_df['site_score'].min()
s_max = viz_df['site_score'].max()
viz_df['score_norm'] = (viz_df['site_score'] - s_min) / (s_max - s_min + 1e-9)

# Build colour columns: green for high score, red for low
viz_df['r'] = ((1 - viz_df['score_norm']) * 255).astype(int)
viz_df['g'] = (viz_df['score_norm'] * 255).astype(int)
viz_df['b'] = 50

layer = pdk.Layer(
    "H3HexagonLayer",
    viz_df[['h3_index', 'site_score', 'score_norm', 'r', 'g', 'b',
            'n_competitor', 'n_synergy', 'population']],
    pickable=True,
    stroked=True,
    filled=True,
    extruded=True,
    get_hexagon="h3_index",
    get_fill_color="[r, g, b, 180]",
    get_elevation="score_norm * 500",
    elevation_scale=1,
)

view_state = pdk.ViewState(latitude=51.54, longitude=-0.14, zoom=12, pitch=50, bearing=0)
tooltip = {
    "html": "<b>Score:</b> {site_score}<br>"
            "<b>Pop:</b> {population}<br>"
            "<b>Competitors:</b> {n_competitor}<br>"
            "<b>Synergy:</b> {n_synergy}",
    "style": {"background": "rgba(0,0,0,0.7)", "color": "white", "font-size": "12px"}
}

r = pdk.Deck(layers=[layer], initial_view_state=view_state, tooltip=tooltip)
r.to_html("data/outputs/camden_site_potential.html")

# Also save the enriched grid (now with centrality + scores + POI counts)
h3_grid.to_parquet("data/outputs/camden_h3_grid.parquet")

print("3D map saved: data/outputs/camden_site_potential.html")
print(f"Updated grid saved: data/outputs/camden_h3_grid.parquet ({len(h3_grid.columns)} columns)")
print("\nPipeline complete. Proceed to camden_predictive_model.ipynb for ML analysis.")