# Week 3: Random Networks & Small Worlds

**Learning objectives** — After this lab you should be able to:

- Explain the Erdos-Renyi model and its Poisson degree distribution
- Describe the giant component phase transition at avg_degree = 1
- Explain the "six degrees of separation" phenomenon
- Define the small-world property (high clustering + short paths)
- Build and explore Watts-Strogatz graphs at varying rewiring probabilities
- Reproduce the classic C(p)/L(p) vs p plot
- Explain Kleinberg's navigable small-world model and greedy routing
- Test whether real networks exhibit the small-world property

In the 1960s, psychologist Stanley Milgram ran a famous experiment: he asked randomly chosen
people in Nebraska to forward a letter to a target person in Boston by passing it through
personal acquaintances. Most letters never arrived, but those that did took, on average,
only about **six steps**.

This "six degrees of separation" idea suggests that even very large networks can have
surprisingly **short paths**. But how is this compatible with the tight friend-groups
(high clustering) we see in real social networks? This week we build up from the simplest
random graph model to increasingly realistic small-world models.

In [None]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from netsci.loaders import load_graph
from netsci.utils import SEED, graph_summary, small_world_table
from netsci.models import greedy_route
from netsci import viz, models

---
## 1. Datasets

We use two real-world networks this week:

- **US Airports** (500 nodes, ~2,980 edges) -- flight routes between airports. An obvious small-world candidate: you can fly between almost any two cities with just 1-2 layovers, yet airports tend to cluster regionally.
- **EU Email** (1,005 nodes, ~25,571 edges) -- email exchange between members of a European research institution. Email acts as an information-flow proxy: departments form tight clusters, but interdepartmental messages create shortcuts.

In [None]:
G_air = load_graph("airports")
graph_summary(G_air)
print()
G_email = load_graph("email")
graph_summary(G_email)

**Try it yourself**: How many connected components does the email network have? Use `nx.number_connected_components()` on the undirected version. Is the airport network fully connected?

In [None]:
# YOUR CODE HERE
n_components_email = nx.number_connected_components(G_email.to_undirected())
n_components_air = nx.number_connected_components(G_air)

assert n_components_email == 20, (
    "Hint: nx.number_connected_components(G_email.to_undirected())"
)
assert n_components_air == 1, "Hint: nx.number_connected_components(G_air)"
print(f"Email network: {n_components_email} connected component(s)")
print(f"Airport network: {n_components_air} connected component(s)")
print(
    f"Airport network is {'fully connected' if n_components_air == 1 else 'NOT fully connected'}!"
)

---
## 2. Six Degrees of Separation

Let's check: how many hops does it actually take to get between two random airports?

In [None]:
apl_air = nx.average_shortest_path_length(G_air)
diam_air = nx.diameter(G_air)
print(f"Airports: {G_air.number_of_nodes()} nodes")
print(f"  Average shortest path length: {apl_air:.2f}")
print(f"  Diameter: {diam_air}")
print(f"\nEven among {G_air.number_of_nodes()} airports, the average separation is only ~{apl_air:.1f} hops!")

In [None]:
# Shortest path highlighted on the airport network
nodes_air = list(G_air.nodes())
# Pick an arbitrary pair: the first and last airports in the node list
src, tgt = nodes_air[0], nodes_air[-1]
path = nx.shortest_path(G_air, src, tgt)
path_edges = list(zip(path[:-1], path[1:]))

with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(10, 7))
    pos = nx.spring_layout(G_air, seed=SEED)
    # Draw all edges in gray
    nx.draw_networkx_edges(G_air, pos, ax=ax, edge_color="#dddddd", width=0.3)
    # Highlight path edges in red
    nx.draw_networkx_edges(
        G_air, pos, ax=ax, edgelist=path_edges, edge_color="#D65F5F", width=3.0
    )
    # Draw all nodes small
    nx.draw_networkx_nodes(
        G_air, pos, ax=ax, node_size=10, node_color="#4878CF", alpha=0.5
    )
    # Highlight path nodes
    nx.draw_networkx_nodes(
        G_air, pos, ax=ax, nodelist=path, node_size=80, node_color="#D65F5F"
    )
    # Label endpoints
    nx.draw_networkx_labels(G_air, pos, {src: src, tgt: tgt}, ax=ax, font_size=9)
    ax.set_title(
        f"Only {len(path) - 1} hops between {src} and {tgt} in {G_air.number_of_nodes()} airports",
        fontsize=13,
    )
    ax.axis("off")
    fig.tight_layout()
    plt.show()

**Visualizing "six degrees"**: The red path above shows the shortest route between two airports. Despite the network having 500 nodes, most pairs are only 2-3 hops apart. The gray tangle of edges is what makes this possible — there are many alternative routes, and hub airports act as express connectors.
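
**From scratch**: To make "hops" concrete, here is a minimal breadth-first search sketch in plain Python (standard library only). The toy hub-and-spoke graph and the `bfs_hops` helper are illustrative assumptions, not part of `netsci`. For unweighted graphs, `nx.shortest_path` gives equivalent hop counts.

```python
from collections import deque

def bfs_hops(adj, source):
    """Return {node: hop count from source} for an unweighted graph."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:  # the first visit is along a shortest route
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

# Hypothetical toy network: two hubs linked to each other, three spokes each
toy = {
    "HUB1": ["HUB2", "a", "b", "c"],
    "HUB2": ["HUB1", "x", "y", "z"],
    "a": ["HUB1"], "b": ["HUB1"], "c": ["HUB1"],
    "x": ["HUB2"], "y": ["HUB2"], "z": ["HUB2"],
}
hops_from_a = bfs_hops(toy, "a")
print(hops_from_a["b"])  # 2: a -> HUB1 -> b
print(hops_from_a["x"])  # 3: a -> HUB1 -> HUB2 -> x
```

Even though no spoke is directly linked to any other spoke, every pair is at most three hops apart because the hubs act as connectors.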

---
## 3. The Erdos-Renyi Random Model

Before we can define what makes a network "small-world," we need a **random baseline** to compare against. The Erdos-Renyi (ER) model is the simplest possible random graph: flip a biased coin for every pair of nodes — heads means they get an edge, tails means they don't.

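The edge probability p is chosen to hit a target average degree: p ≈ ⟨k⟩ / (n − 1). Because each of a node's n − 1 possible edges is an independent coin flip, its degree is binomially distributed, which for large sparse graphs is well approximated by a **Poisson** distribution (bell-shaped around ⟨k⟩). Nearly all nodes therefore have similar degree, and there are no hubs.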
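
**From scratch**: The coin-flip description can be sketched in a few lines of plain Python. This is an illustration only: `er_edges` and `poisson_pmf` are hypothetical helpers, and the choice p = avg_degree / (n − 1) is an assumption about how the average degree maps to p (it may differ slightly from what `models.erdos_renyi` does internally).

```python
import math
import random
from collections import Counter

def er_edges(n, avg_degree, seed=0):
    """Flip a biased coin for every node pair; p is set so E[degree] = avg_degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, avg_degree = 500, 6
edges = er_edges(n, avg_degree, seed=1)

# Count how many edges touch each node
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

mean_deg = 2 * len(edges) / n
print(f"mean degree = {mean_deg:.2f}")
for k in (2, 6, 10, 14):
    observed = sum(1 for node in range(n) if degree[node] == k) / n
    print(f"k={k:2d}  observed={observed:.3f}  Poisson={poisson_pmf(k, avg_degree):.3f}")
```

The observed fractions should track the Poisson column closely near k = 6 and fall away quickly on both sides.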

In [None]:
G_er = models.erdos_renyi(n=500, avg_degree=6)
graph_summary(G_er)
print(f"Max degree: {max(d for _, d in G_er.degree())}")

In [None]:
viz.plot_degree_dist(G_er, title="Erdos-Renyi (n=500, avg_deg=6)")

In [None]:
viz.plot_degree_dist(G_er, log=True, title="Erdos-Renyi (log-log) — no fat tail")

**Reading the log-log plot**: The ER degree distribution curves *downward* on a log-log scale. This is the signature of a Poisson tail, which falls off even faster than an exponential. Many real networks, such as the airports network, instead trace a roughly straight line (a power law). We'll explore this gap in Week 4.

### Phase Transition in ER Graphs

Something interesting happens as we increase the edge probability: a **giant connected component** suddenly appears.

In [None]:
# ER phase transition concept: 3 panels at different average degrees
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
phase_params = [
    (0.5, "Below threshold\n⟨k⟩ = 0.5"),
    (1.0, "At threshold\n⟨k⟩ = 1.0"),
    (3.0, "Above threshold\n⟨k⟩ = 3.0"),
]

for ax, (avg_d, label) in zip(axes, phase_params):
    G_phase = models.erdos_renyi(100, avg_d)
    pos = nx.spring_layout(G_phase, seed=SEED, k=0.5)
    # Color giant component red, rest gray
    components = sorted(nx.connected_components(G_phase), key=len, reverse=True)
    giant = components[0]
    colors = ["#D65F5F" if n in giant else "#CCCCCC" for n in G_phase.nodes()]
    sizes = [40 if n in giant else 20 for n in G_phase.nodes()]
    nx.draw_networkx_nodes(G_phase, pos, ax=ax, node_color=colors, node_size=sizes)
    nx.draw_networkx_edges(
        G_phase, pos, ax=ax, edge_color="#999999", width=0.5, alpha=0.5
    )
    gc_frac = len(giant) / G_phase.number_of_nodes()
    ax.set_title(f"{label}\nGiant component: {gc_frac:.0%}", fontsize=11)
    ax.axis("off")

fig.suptitle(
    "ER Phase Transition: Giant Component Emergence (red = giant component)",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**The phase transition in action**: Below ⟨k⟩ = 1 (left), the network is a collection of tiny isolated fragments. At the threshold (center), a single giant component begins to emerge. Above threshold (right), most nodes belong to one connected giant component.

In [None]:
# Phase transition: vary average degree, track giant component size
n_phase = 500
avg_degrees = np.linspace(0.1, 3.0, 20)
gcc_sizes = []

for avg_d in avg_degrees:
    G_tmp = models.erdos_renyi(n_phase, avg_d)
    gcc = max(nx.connected_components(G_tmp), key=len)
    gcc_sizes.append(len(gcc) / n_phase)

with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(avg_degrees, gcc_sizes, "o-", markersize=5)
    ax.axvline(1.0, color="red", linestyle="--", alpha=0.5, label="avg_degree = 1")
    ax.set_xlabel("Average degree")
    ax.set_ylabel("Fraction in giant component")
    ax.set_title("ER Phase Transition")
    ax.legend()
    fig.tight_layout()
    plt.show()

**The red dashed line at avg_degree = 1**: Below this threshold, the network is fragmented into many tiny components. Above it, a single **giant component** appears and grows quickly, absorbing most nodes by avg_degree ≈ 3. This is one of the most celebrated results in random graph theory: the threshold itself is sharp (no giant component below it, one above it), even though the component's size grows continuously from zero as avg_degree passes 1.
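
**From formula to code**: In the large-n limit, the fraction S of nodes in the giant component of an ER graph satisfies the self-consistency equation S = 1 − exp(−⟨k⟩ · S). The sketch below solves it by simple fixed-point iteration. The helper name `giant_fraction` and the iteration count are illustrative choices, and the result is the infinite-size prediction, so a simulation with n = 500 will only match it approximately.

```python
import math

def giant_fraction(avg_degree, iterations=2000):
    """Iterate S <- 1 - exp(-<k> * S) starting from S = 1 (large-n ER limit)."""
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-avg_degree * s)
    return s

for c in (0.5, 1.5, 2.0, 3.0):
    print(f"<k> = {c:.1f}  ->  S = {giant_fraction(c):.3f}")
```

For ⟨k⟩ below 1 the iteration collapses to S = 0, while above 1 it settles on a positive value, which is the theoretical counterpart of the simulated curve.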

---
## 4. The Small-World Property

A network has the **small-world property** if it has:
1. **High clustering** (much higher than a random graph of the same size)
2. **Short average path length** (comparable to a random graph of the same size)

We can now use the ER model from Section 3 as our baseline. Let's test this for the airports network by comparing it to random Erdos-Renyi graphs with the same number of nodes and the same expected number of edges.
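
**Back-of-the-envelope baseline**: Before simulating, it helps to know roughly what an ER graph should give. Two standard approximations are C_rand ≈ p (any two neighbors of a node are linked with probability p) and L_rand ≈ ln n / ln ⟨k⟩. The sketch below plugs in the approximate airport sizes quoted in Section 1 (500 nodes, ~2,980 edges). The helper `er_baseline` is hypothetical, and both formulas are rough large-n estimates rather than exact values.

```python
import math

def er_baseline(n, m):
    """Approximate clustering and path length for an ER graph with n nodes, m edges."""
    p = 2 * m / (n * (n - 1))       # edge density
    avg_degree = 2 * m / n
    c_rand = p                      # a pair of neighbors is linked with probability p
    l_rand = math.log(n) / math.log(avg_degree)  # rough large-n estimate
    return c_rand, l_rand

c_rand, l_rand = er_baseline(n=500, m=2980)
print(f"C_rand ~ {c_rand:.4f}, L_rand ~ {l_rand:.2f}")
```

If the simulated ER averages land far from these estimates, that is a hint to double-check the density calculation.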

In [None]:
# Compute properties for airports
C_air = nx.average_clustering(G_air)
L_air = nx.average_shortest_path_length(G_air)

# Generate a random graph with same n and avg degree
n = G_air.number_of_nodes()
m = G_air.number_of_edges()
p_er = 2 * m / (n * (n - 1))

# Average over a few random graphs for stability
rng = np.random.default_rng(SEED)
C_rand_list, L_rand_list = [], []
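for i in range(5):
    G_rand = nx.erdos_renyi_graph(n, p_er, seed=int(rng.integers(1_000_000)))
    # Path length is only defined on a connected graph, so fall back to the
    # largest component if a sample happens to be disconnected
    if not nx.is_connected(G_rand):
        G_rand = G_rand.subgraph(max(nx.connected_components(G_rand), key=len))
    C_rand_list.append(nx.average_clustering(G_rand))
    L_rand_list.append(nx.average_shortest_path_length(G_rand))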

C_rand = np.mean(C_rand_list)
L_rand = np.mean(L_rand_list)

print(f"{'':20s} {'Airports':>10s} {'Random ER':>10s}")
print(f"{'Clustering (C)':20s} {C_air:10.4f} {C_rand:10.4f}")
print(f"{'Avg path length (L)':20s} {L_air:10.2f} {L_rand:10.2f}")
print(f"\nC_real / C_random = {C_air / C_rand:.1f}x  (>> 1 means high clustering)")
print(f"L_real / L_random = {L_air / L_rand:.2f}  (~1 means short paths)")
sigma = (C_air / C_rand) / (L_air / L_rand)
if sigma > 1:
    print(f"\nAirports IS a small world (\u03c3 = {sigma:.1f}): high clustering, short paths.")
else:
    print(f"\nAirports is NOT a small world (\u03c3 = {sigma:.1f}).")

---
## 5. The Watts-Strogatz Model

In 1998, Watts and Strogatz proposed a simple model that produces small-world networks.

Start with a **ring lattice** (each node connected to its *k* nearest neighbors), then
**rewire** each edge with probability *p*.

- `p = 0`: perfect lattice — high clustering, long paths
- `p = 1`: random graph — low clustering, short paths
- `p ~ 0.01-0.1`: **sweet spot** — high clustering AND short paths
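
**From scratch**: The two steps (build a ring lattice, then rewire) can be sketched in plain Python. This is a simplified illustration in the spirit of the Watts-Strogatz procedure, not the code behind `nx.watts_strogatz_graph` or `models.watts_strogatz`. The helpers `ring_lattice` and `rewire` are hypothetical names, and k is assumed to be even and smaller than n.

```python
import random

def ring_lattice(n, k):
    """Each node links to its k nearest ring neighbors: k/2 on each side."""
    return {frozenset((u, (u + j) % n)) for u in range(n) for j in range(1, k // 2 + 1)}

def rewire(n, k, p, seed=0):
    """With probability p, move the far end of each lattice edge to a random node."""
    rng = random.Random(seed)
    edges = ring_lattice(n, k)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            old = frozenset((u, (u + j) % n))
            if old in edges and rng.random() < p:
                # Avoid self-loops and duplicate edges
                candidates = [w for w in range(n)
                              if w != u and frozenset((u, w)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(candidates))))
    return edges

n, k = 20, 4
lattice = rewire(n, k, p=0.0)
small_world = rewire(n, k, p=0.1, seed=1)
print(len(lattice), len(small_world))  # rewiring moves edges, it never adds or removes them
print(len(small_world - lattice), "edges are not part of the original lattice")
```

Because every rewiring removes one edge and adds one, the average degree stays at k for all p. Only the *placement* of edges changes.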

In [None]:
# Watts-Strogatz rewiring progression: p=0, p=0.1, p=1.0
fig, axes = plt.subplots(1, 3, figsize=(14, 4.5))
p_demo = [0, 0.1, 1.0]
labels = ["p = 0\n(regular lattice)", "p = 0.1\n(small world)", "p = 1.0\n(random)"]

for ax, p, label in zip(axes, p_demo, labels):
    G_demo = nx.watts_strogatz_graph(12, 4, p, seed=SEED)
    pos = nx.circular_layout(G_demo)

    # Identify which edges were rewired (compare to p=0 baseline)
    G_base = nx.watts_strogatz_graph(12, 4, 0, seed=SEED)
    base_edges = set(frozenset(e) for e in G_base.edges())

    regular = [(u, v) for u, v in G_demo.edges() if frozenset((u, v)) in base_edges]
    rewired = [(u, v) for u, v in G_demo.edges() if frozenset((u, v)) not in base_edges]

    nx.draw_networkx_edges(
        G_demo, pos, edgelist=regular, ax=ax, edge_color="#999999", width=1.0
    )
    nx.draw_networkx_edges(
        G_demo,
        pos,
        edgelist=rewired,
        ax=ax,
        edge_color="#D65F5F",
        width=2.0,
        style="dashed",
    )
    nx.draw_networkx_nodes(G_demo, pos, ax=ax, node_color="#4878CF", node_size=120)
    ax.set_title(label, fontsize=11)
    ax.axis("off")

fig.suptitle(
    "Watts-Strogatz Rewiring: gray = original, red dashed = rewired", fontsize=12
)
fig.tight_layout()
plt.show()

In [None]:
# Visualize WS graphs at different rewiring probabilities
p_values = [0, 0.01, 0.1, 0.5, 1.0]
fig, axes = plt.subplots(1, 5, figsize=(20, 4))

for ax, p in zip(axes, p_values):
    G_ws = nx.watts_strogatz_graph(30, 4, p, seed=SEED)
    pos = nx.circular_layout(G_ws)
    nx.draw_networkx(
        G_ws,
        pos,
        ax=ax,
        node_color="#4878CF",
        node_size=40,
        edge_color="#cccccc",
        width=0.5,
        with_labels=False,
        alpha=0.9,
    )
    C = nx.average_clustering(G_ws)
    ax.set_title(f"p = {p}\nC = {C:.2f}")
    ax.axis("off")

fig.suptitle("Watts-Strogatz: from lattice (p=0) to random (p=1)", fontsize=14)
fig.tight_layout()
plt.show()

**What to notice**: At p = 0.01 the graph still *looks* like a ring -- almost all edges are local. Yet even these few rewired shortcuts dramatically reduce path lengths. This is the key insight: structure can look highly ordered while distances behave as if the network were random.

In [None]:
# Compute C and L at each p
print(f"{'p':>6s} {'C':>8s} {'L':>8s}")
print("-" * 24)
for p in p_values:
    G_ws = models.watts_strogatz(200, 6, p, seed=SEED)
    C = nx.average_clustering(G_ws)
    L = nx.average_shortest_path_length(G_ws)
    print(f"{p:6.2f} {C:8.4f} {L:8.2f}")

**The asymmetry**: Path length (L) is highly sensitive to the *first few* rewirings -- a handful of shortcuts create "express lanes" across the ring. Clustering (C) is robust -- it takes *many* rewirings to disrupt the local triangles. This asymmetry is what creates the small-world regime.
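
**Worked numbers**: A little arithmetic shows where the asymmetry comes from. The sketch below uses two standard approximations: a ring lattice with k neighbors has clustering 3(k − 2) / (4(k − 1)), and a triangle survives rewiring only if none of its three edges is moved, so C(p)/C(0) ≈ (1 − p)³ (this ignores the few triangles that rewiring creates by chance). The function names are illustrative.

```python
def lattice_clustering(k):
    """Clustering coefficient of a ring lattice with k neighbors per node (k even)."""
    return 3 * (k - 2) / (4 * (k - 1))

def expected_shortcuts(n, k, p):
    """Each of the n*k/2 edges is rewired independently with probability p."""
    return p * n * k / 2

def clustering_ratio(p):
    """Approximate C(p)/C(0): all three edges of a triangle must stay in place."""
    return (1 - p) ** 3

n, k = 200, 6
print(f"C(0) = {lattice_clustering(k):.2f}")
for p in (0.001, 0.01, 0.1, 0.5):
    print(f"p={p:<5}  shortcuts ~ {expected_shortcuts(n, k, p):6.1f}  "
          f"C(p)/C(0) ~ {clustering_ratio(p):.3f}")
```

At p = 0.01 the ring already carries about six shortcuts, enough to cut path lengths substantially, while roughly 97% of the clustering is still intact.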

---
## 6. The Sweet Spot: C(p) and L(p) vs p

The classic Watts-Strogatz figure normalizes clustering and path length by their values at p=0.
The key insight: **path length drops dramatically with just a tiny bit of rewiring**, while
**clustering stays high** until much more rewiring occurs.

**From formula to code**: The sweep below normalizes both metrics by their values at p = 0:

$$\frac{C(p)}{C(0)} \qquad \text{and} \qquad \frac{L(p)}{L(0)}$$

This puts both curves on a common 0-to-1 scale so we can compare their *rates of change*. At p = 0 both ratios equal 1.0 by definition; as p → 1 both approach their random-graph values. The key question is: **at what p does each curve start to drop?** The code first builds the p = 0 baseline graph, records `C0` and `L0`, then divides each subsequent measurement by these baselines.

In [None]:
# Sweep over p values (log-spaced)
n_ws, k_ws = 200, 6
p_sweep = np.logspace(-4, 0, 15)

# Get baseline at p=0
G0 = models.watts_strogatz(n_ws, k_ws, 0, seed=SEED)
C0 = nx.average_clustering(G0)
L0 = nx.average_shortest_path_length(G0)

C_list, L_list = [], []
for p in p_sweep:
    G_ws = models.watts_strogatz(n_ws, k_ws, p, seed=SEED)
    C_list.append(nx.average_clustering(G_ws) / C0)
    L_list.append(nx.average_shortest_path_length(G_ws) / L0)

# Plot the classic WS figure
with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(p_sweep, C_list, "o-", label="C(p) / C(0)", markersize=5)
    ax.plot(p_sweep, L_list, "s-", label="L(p) / L(0)", markersize=5)
    ax.set_xscale("log")
    ax.set_xlabel("Rewiring probability p")
    ax.set_ylabel("Normalized value")
    ax.set_title("The Small-World Sweet Spot")
    ax.legend()
    ax.set_ylim(0, 1.1)
    fig.tight_layout()
    plt.show()

**The small-world regime**: Look at the gap between the two curves in the range p ~ 0.001 to p ~ 0.1. In this zone, L has already dropped to near its random-graph value while C remains close to its lattice value. This is the "sweet spot" -- networks with both short paths and high clustering. Most real social networks live in this regime.

**Before you tweak**: The cell below lets you change *n* (network size) and *k* (number of neighbors). Predict: if you increase k from 6 to 10, will the transition zone shift left, right, or stay the same?

In [None]:
# ---- TWEAK: Change n and k, re-run the sweep ----
n_tw = 200  # <-- change me (try 100, 500)
k_tw = 6  # <-- change me (try 4, 10)

G0_tw = models.watts_strogatz(n_tw, k_tw, 0, seed=SEED)
C0_tw = nx.average_clustering(G0_tw)
L0_tw = nx.average_shortest_path_length(G0_tw)

C_tw, L_tw = [], []
for p in p_sweep:
    G_tw = models.watts_strogatz(n_tw, k_tw, p, seed=SEED)
    C_tw.append(nx.average_clustering(G_tw) / C0_tw)
    L_tw.append(nx.average_shortest_path_length(G_tw) / L0_tw)

with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(p_sweep, C_tw, "o-", label="C(p)/C(0)", markersize=5)
    ax.plot(p_sweep, L_tw, "s-", label="L(p)/L(0)", markersize=5)
    ax.set_xscale("log")
    ax.set_xlabel("Rewiring probability p")
    ax.set_ylabel("Normalized value")
    ax.set_title(f"WS Sweep (n={n_tw}, k={k_tw})")
    ax.legend()
    ax.set_ylim(0, 1.1)
    fig.tight_layout()
    plt.show()

**What you should see**: Increasing *k* or *n* shifts the drop in L to the *left*. A ring with n nodes and k neighbors has n·k/2 edges, so the expected number of shortcuts is p·n·k/2, and a bigger or denser ring gets its first few shortcuts at a smaller p. The normalized clustering curve barely moves, because C(p)/C(0) depends mainly on the chance that a triangle keeps all three of its edges, roughly (1 − p)³. The result is a wider gap between the two curves at intermediate p. The qualitative shape (L drops first, C drops later) holds across these settings.

---
## 7. Kleinberg's Navigable Small Worlds

Watts-Strogatz showed that short paths *exist* in small-world networks. But Milgram's experiment revealed something deeper: people can actually *find* those short paths using only local information — they don't have a map of the whole network!

This distinction is crucial. In a random graph, short paths exist but no one can find them without a global map — you'd need to run BFS on the entire network. In Milgram's experiment, people used a simple heuristic: forward the letter to whoever seems "closer" to the target (by geography, profession, or social circle). Somehow, this greedy strategy works.

In 2000, Jon Kleinberg asked: **what network structure allows decentralized (greedy) search to find short paths?**

### The Model

Start with a 2D grid where each node connects to its local neighbors. Then add **long-range links** with probability proportional to distance^(-r):

$$P(\text{link to } v) \;\propto\; d(u,v)^{-r}$$

The exponent *r* controls the reach of long-range links:
- **r = 0**: uniform random links (like adding random shortcuts)
- **r = 2**: optimal for greedy routing — links are "tuned" to the grid dimension
- **r >> 2**: links are almost all local (no useful shortcuts)
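
**From formula to code**: The rule P(link to v) ∝ d(u,v)^(−r) can be sketched as weighted sampling over all other grid nodes. This is an illustration in plain Python, not the implementation of `models.kleinberg_grid`. The helpers `long_range_weights` and `sample_long_range` are hypothetical, and distances are Manhattan distances on an n × n grid.

```python
import random

def manhattan(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def long_range_weights(u, n, r):
    """Unnormalized weights d(u, v)^(-r) for every other node v on an n x n grid."""
    return {
        (i, j): manhattan(u, (i, j)) ** (-r)
        for i in range(n) for j in range(n) if (i, j) != u
    }

def sample_long_range(u, n, r, rng):
    """Draw one long-range contact for u with probability proportional to its weight."""
    weights = long_range_weights(u, n, r)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[v] for v in nodes], k=1)[0]

rng = random.Random(0)
u, n = (7, 7), 15
for r in (0, 2, 4):
    draws = [manhattan(u, sample_long_range(u, n, r, rng)) for _ in range(2000)]
    print(f"r={r}: mean link length = {sum(draws) / len(draws):.2f}")
```

The mean link length should shrink as r grows: r = 0 spreads links over the whole grid, while r = 4 keeps almost all of them next door.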

In [None]:
# Kleinberg grids at r=0, r=2, r=4
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
r_values = [0, 2, 4]
r_labels = ["r = 0\n(uniform random)", "r = 2\n(optimal)", "r = 4\n(nearly local)"]

for ax, r_val, label in zip(axes, r_values, r_labels):
    G_k, pos_k = models.kleinberg_grid(n=15, r=r_val, q=1, seed=SEED)

    # Separate local edges from long-range edges
    local_edges = []
    long_edges = []
    for u, v in G_k.edges():
        dist = abs(u[0] - v[0]) + abs(u[1] - v[1])
        if dist <= 1:
            local_edges.append((u, v))
        else:
            long_edges.append((u, v))

    nx.draw_networkx_edges(
        G_k, pos_k, edgelist=local_edges, ax=ax, edge_color="#CCCCCC", width=0.5
    )
    nx.draw_networkx_edges(
        G_k,
        pos_k,
        edgelist=long_edges,
        ax=ax,
        edge_color="#D65F5F",
        width=1.5,
        alpha=0.7,
    )
    nx.draw_networkx_nodes(G_k, pos_k, ax=ax, node_size=30, node_color="#4878CF")
    ax.set_title(f"{label}\n{len(long_edges)} long-range links", fontsize=11)
    ax.axis("off")

fig.suptitle(
    "Kleinberg Model: Local Grid + Long-Range Links (red)",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**What each r value means**: At r=0 (left), long-range links jump uniformly across the grid — some reach very far, but they don't "know" about distance. At r=2 (center), links span a wide range of distances with the right balance — some reach far, but most connect moderately distant nodes. At r=4 (right), almost all links stay close to the source — essentially just thickening the local neighborhood.

In [None]:
# Demo: run greedy routing on 50 random pairs at each r value
rng_k = np.random.default_rng(SEED)
n_grid = 15
n_pairs = 50

print(f"{'r':>3s} {'Mean path':>12s} {'Success rate':>14s}")
print("-" * 32)
for r_val in [0, 1, 2, 3, 4]:
    G_k, pos_k = models.kleinberg_grid(n=n_grid, r=r_val, q=1, seed=SEED)
    nodes_k = list(G_k.nodes())
    lengths = []
    successes = 0
    for _ in range(n_pairs):
        s, t = tuple(rng_k.choice(len(nodes_k), size=2, replace=False))
        path = greedy_route(G_k, nodes_k[s], nodes_k[t], pos_k)
        if path is not None:
            lengths.append(len(path) - 1)
            successes += 1
    mean_len = np.mean(lengths) if lengths else float("inf")
    print(f"{r_val:3d} {mean_len:12.1f} {successes / n_pairs:14.0%}")

In [None]:
# Step-by-step greedy routing: show the decision at each hop
G_step, pos_step = models.kleinberg_grid(n=10, r=2, q=1, seed=SEED)
source_step, target_step = (0, 0), (9, 9)

# Trace greedy route with decision details
path_step = [source_step]
current = source_step
visited = {source_step}
decisions = []
for _ in range(G_step.number_of_nodes()):
    if current == target_step:
        break
    neighbors = [n for n in G_step.neighbors(current) if n not in visited]
    if not neighbors:
        break
    # Compute distances for all neighbors
    dists = {
        n: abs(n[0] - target_step[0]) + abs(n[1] - target_step[1]) for n in neighbors
    }
    best = min(dists, key=dists.get)
    decisions.append(
        (current, best, dict(sorted(dists.items(), key=lambda x: x[1])[:4]))
    )
    visited.add(best)
    path_step.append(best)
    current = best

# Show first 4 steps as panels
n_panels = min(4, len(decisions))
fig, axes = plt.subplots(1, n_panels, figsize=(4.5 * n_panels, 4.5))
if n_panels == 1:
    axes = [axes]

for idx, ax in enumerate(axes):
    current_node = decisions[idx][0]
    chosen = decisions[idx][1]
    neighbor_dists = decisions[idx][2]

    # Draw grid lightly
    nx.draw_networkx_edges(G_step, pos_step, ax=ax, edge_color="#EEEEEE", width=0.3)
    nx.draw_networkx_nodes(G_step, pos_step, ax=ax, node_size=15, node_color="#DDDDDD")

    # Draw path so far
    partial_path = path_step[: idx + 1]
    if len(partial_path) > 1:
        path_edges = list(zip(partial_path[:-1], partial_path[1:]))
        nx.draw_networkx_edges(
            G_step,
            pos_step,
            edgelist=path_edges,
            ax=ax,
            edge_color="#999999",
            width=2.0,
        )
        nx.draw_networkx_nodes(
            G_step,
            pos_step,
            nodelist=partial_path[:-1],
            ax=ax,
            node_size=30,
            node_color="#999999",
        )

    # Highlight current node
    nx.draw_networkx_nodes(
        G_step,
        pos_step,
        nodelist=[current_node],
        ax=ax,
        node_size=100,
        node_color="#D65F5F",
        edgecolors="black",
        linewidths=1.5,
    )

    # Highlight candidate neighbors with distance labels
    candidates = list(neighbor_dists.keys())
    cand_colors = ["#6ACC65" if n == chosen else "#FFD700" for n in candidates]
    nx.draw_networkx_nodes(
        G_step,
        pos_step,
        nodelist=candidates,
        ax=ax,
        node_size=60,
        node_color=cand_colors,
        edgecolors="black",
        linewidths=0.8,
    )
    # Label distances
    for n, d in neighbor_dists.items():
        ax.annotate(
            f"d={d}",
            xy=pos_step[n],
            xytext=(5, 5),
            textcoords="offset points",
            fontsize=7,
            color="#333333",
        )

    # Highlight target
    nx.draw_networkx_nodes(
        G_step,
        pos_step,
        nodelist=[target_step],
        ax=ax,
        node_size=100,
        node_color="#4878CF",
        edgecolors="black",
        linewidths=1.5,
    )

    # Arrow from current to chosen
    ax.annotate(
        "",
        xy=pos_step[chosen],
        xytext=pos_step[current_node],
        arrowprops=dict(arrowstyle="-|>", color="#D65F5F", lw=2),
    )

    ax.set_title(
        f"Step {idx + 1}: at {current_node}\nChoose {chosen} (d={neighbor_dists[chosen]})",
        fontsize=10,
    )
    ax.axis("off")

fig.suptitle(
    "Greedy Routing Step-by-Step (r=2): red=current, green=chosen, blue=target",
    fontsize=12,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**How greedy routing decides**: At each step (panel), the current node (red) evaluates all its neighbors and picks the one closest to the target (blue) by Manhattan distance. The chosen neighbor is green; alternatives are yellow with their distances shown. Notice how long-range links (when available) let the algorithm make big jumps early on, while local grid edges handle the final approach. This is why r=2 works — the long-range links provide useful shortcuts at every distance scale.
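
**From scratch**: The decision rule fits in a few lines. The sketch below is a strict variant written for illustration: it gives up as soon as no neighbor is closer to the target. It is not the code of `netsci.models.greedy_route`, whose internals may differ, and `grid_adjacency` and `greedy_route_sketch` are hypothetical helpers.

```python
def manhattan(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def grid_adjacency(n):
    """4-neighbor n x n grid as {node: set(neighbors)}."""
    adj = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1)):
            v = (i + di, j + dj)
            if v in adj:
                adj[(i, j)].add(v)
                adj[v].add((i, j))
    return adj

def greedy_route_sketch(adj, source, target):
    """Always step to the neighbor closest to target; return None if none is closer."""
    path = [source]
    while path[-1] != target:
        current = path[-1]
        best = min(adj[current], key=lambda v: manhattan(v, target))
        if manhattan(best, target) >= manhattan(current, target):
            return None  # stuck: no neighbor makes progress
        path.append(best)
    return path

adj = grid_adjacency(10)
plain = greedy_route_sketch(adj, (0, 0), (9, 9))

# Add one hypothetical long-range link from the source to (6, 7)
adj[(0, 0)].add((6, 7))
adj[(6, 7)].add((0, 0))
shortcut = greedy_route_sketch(adj, (0, 0), (9, 9))
print(len(plain) - 1, len(shortcut) - 1)  # 18 6
```

On the bare grid the route takes 18 hops, the full Manhattan distance. A single well-placed long-range link cuts it to 6, using nothing but each node's local view of its own neighbors.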

In [None]:
# Visualize a single greedy route at r=0, r=2, r=4
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

for ax, r_val, label in zip(axes, [0, 2, 4], ["r = 0", "r = 2 (optimal)", "r = 4"]):
    G_k, pos_k = models.kleinberg_grid(n=15, r=r_val, q=1, seed=SEED)
    nodes_k = list(G_k.nodes())

    # Route between the first and last nodes in the node list (far apart on the grid)
    source, target = nodes_k[0], nodes_k[-1]
    path = greedy_route(G_k, source, target, pos_k)

    # Draw grid
    nx.draw_networkx_edges(G_k, pos_k, ax=ax, edge_color="#EEEEEE", width=0.3)
    nx.draw_networkx_nodes(G_k, pos_k, ax=ax, node_size=20, node_color="#CCCCCC")

    if path is not None:
        path_edges = list(zip(path[:-1], path[1:]))
        nx.draw_networkx_edges(
            G_k, pos_k, edgelist=path_edges, ax=ax, edge_color="#D65F5F", width=2.5
        )
        nx.draw_networkx_nodes(
            G_k, pos_k, nodelist=path, ax=ax, node_size=40, node_color="#D65F5F"
        )
        nx.draw_networkx_nodes(
            G_k,
            pos_k,
            nodelist=[source, target],
            ax=ax,
            node_size=100,
            node_color=["#6ACC65", "#4878CF"],
        )
        ax.set_title(f"{label}\nGreedy path: {len(path) - 1} hops", fontsize=11)
    else:
        ax.set_title(f"{label}\nGreedy routing STUCK", fontsize=11)
    ax.axis("off")

fig.suptitle(
    "Greedy Routing: green = source, blue = target, red = path",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

In [None]:
# Sweep r from 0 to 5, measure mean greedy path length
r_sweep = np.linspace(0, 5, 11)
mean_lengths = []
success_rates = []
n_grid = 15
n_pairs = 50
rng_sweep = np.random.default_rng(SEED)

for r_val in r_sweep:
    G_k, pos_k = models.kleinberg_grid(n=n_grid, r=r_val, q=1, seed=SEED)
    nodes_k = list(G_k.nodes())
    lengths = []
    successes = 0
    for _ in range(n_pairs):
        s, t = tuple(rng_sweep.choice(len(nodes_k), size=2, replace=False))
        path = greedy_route(G_k, nodes_k[s], nodes_k[t], pos_k)
        if path is not None:
            lengths.append(len(path) - 1)
            successes += 1
    mean_lengths.append(np.mean(lengths) if lengths else float("nan"))
    success_rates.append(successes / n_pairs)

with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(r_sweep, mean_lengths, "o-", markersize=6)
    ax.axvline(
        2.0, color="red", linestyle="--", alpha=0.5, label="r = 2 (grid dimension)"
    )
    ax.set_xlabel("Clustering exponent r")
    ax.set_ylabel("Mean greedy path length")
    ax.set_title("Kleinberg Sweep: Optimal Routing at r = grid dimension")
    ax.legend()
    fig.tight_layout()
    plt.show()

**The minimum at r = 2**: Greedy routing is most efficient when the clustering exponent matches the grid dimension (d = 2 for our 2D grid). At this sweet spot, the distribution of link distances is **scale-invariant** — there are roughly equal numbers of links at every distance scale, giving greedy search useful shortcuts at every stage of navigation.
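
**Worked numbers**: Scale invariance can be checked with simple arithmetic. On an unbounded 2D grid, exactly 4d nodes lie at Manhattan distance d, and each carries weight d^(−r). The sketch below sums that weight over distance "octaves" [2^j, 2^(j+1)). The helper `octave_mass` is hypothetical, and the unbounded grid is a simplifying assumption that ignores boundary effects on a finite grid.

```python
def octave_mass(r, j):
    """Total link weight at distances d in [2^j, 2^(j+1)) on an unbounded 2D grid."""
    # 4*d nodes sit at Manhattan distance d, each with weight d^(-r)
    return sum(4 * d * d ** (-r) for d in range(2**j, 2 ** (j + 1)))

for r in (0, 2, 4):
    masses = [octave_mass(r, j) for j in range(1, 6)]
    print(f"r={r}: " + "  ".join(f"{m:10.3f}" for m in masses))
```

For r = 2 the mass per octave is nearly constant, so a long-range link is about equally likely to land at any distance scale. For r = 0 the mass roughly quadruples with each octave (links are mostly very long), and for r = 4 it shrinks by roughly the same factor (links are mostly very short).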

Kleinberg proved that greedy routing achieves O(log² n) path length at r = d, while any other r value leads to polynomial path lengths. This is why r = 2 is special — it's the only exponent where decentralized search is efficient.

**Why this matters for real networks**: One proposed explanation for the navigability of social networks is that friendships encode geographic and social distance with approximately the right decay exponent. When you forward a letter toward a target, you choose friends who are "closer" in some social dimension, and this works if your friendship distribution has the Kleinberg property.

**Think about it**: What value of r would be optimal for greedy routing on a 1D ring (instead of a 2D grid)? (Answer: r = 1, matching the dimension.)

---
## 8. Real Networks are Small Worlds

The **small-world coefficient** is defined as σ = (C/C_rand) / (L/L_rand). A network with σ > 1
is considered a small world.

**From formula to code**: The small-world coefficient combines both comparisons into a single number:

$$\sigma = \frac{C_{\text{real}} \;/\; C_{\text{random}}}{L_{\text{real}} \;/\; L_{\text{random}}}$$

The numerator captures how much *more* clustered the real network is compared to a random graph with the same size and density. The denominator captures how much *longer* its paths are. A true small world has a large numerator (clustering far exceeds random) and a denominator near 1 (paths are comparable), so σ >> 1. The function below computes this by generating several Erdős–Rényi baselines and averaging their C and L to get stable estimates.
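
**Worked numbers**: The arithmetic behind σ is a one-liner. The sketch below is not the implementation of `small_world_table`. The helper `small_world_sigma` and the input values are hypothetical, chosen only to show how the two ratios combine.

```python
def small_world_sigma(c_real, l_real, c_rand, l_rand):
    """sigma = (C_real / C_rand) / (L_real / L_rand)."""
    return (c_real / c_rand) / (l_real / l_rand)

# Hypothetical numbers: clustering 10x the random baseline, paths 1.2x as long
sigma = small_world_sigma(c_real=0.50, l_real=3.0, c_rand=0.05, l_rand=2.5)
print(f"sigma = {sigma:.2f}")  # sigma = 8.33
```

Here clustering is 10 times the random baseline and paths are only 1.2 times as long, so σ = 10 / 1.2 ≈ 8.3, comfortably above 1.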

In [None]:
# Email network is directed — convert to undirected for APL
G_email_u = G_email.to_undirected()

# Use largest connected component for both
G_air_cc = G_air.subgraph(max(nx.connected_components(G_air), key=len)).copy()
G_email_cc = G_email_u.subgraph(max(nx.connected_components(G_email_u), key=len)).copy()

In [None]:
small_world_table(G_air_cc, "US Airports")
small_world_table(G_email_cc, "EU Email")

---
## Summary

| Concept | Key insight |
|---------|-------------|
| **Erdos-Renyi model** | Simplest random graph — Poisson degree distribution, no hubs |
| **Phase transition** | Giant component appears when avg_degree > 1 |
| **Six degrees** | Even large networks have short average paths |
| **Small-world property** | High clustering AND short paths (relative to random) |
| **Watts-Strogatz model** | Ring lattice + tiny rewiring = small world |
| **Sweet spot** | L drops fast, C drops slowly as p increases |
| **Kleinberg model** | Greedy routing optimal when r = grid dimension |
| **Navigability** | Short paths exist AND can be found with local info when r = d |
| **Sigma** | C_ratio / L_ratio > 1 indicates small-world |

Next week: **Scale-Free Networks, Hubs & Resilience** — why do some nodes have hundreds of connections?