# Week 2: Network Properties

**Learning objectives** — After this lab you should be able to:

- Interpret degree distributions (including log-log plots)
- Compute and explain shortest paths, diameter, and average path length
- Understand the clustering coefficient and what it measures
- Compare three centrality measures: degree, betweenness, and closeness
- Identify important nodes using centrality-based visualizations
- Recognize the adjacency, degree, and Laplacian matrix representations of a graph

Last week we learned the vocabulary of networks. This week we learn how to **measure** them.
How do we decide which node is most important? How tightly knit is a network?
How far apart are two random nodes? These questions are answered by network *properties*.

In [None]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from netsci.loaders import load_graph
from netsci.utils import SEED, graph_summary
from netsci import viz

---
## 1. Datasets

This week we use three networks with very different structures:
- **Facebook**: a SNAP ego network, i.e. one user's friend list plus all the connections *among* those friends. With 334 nodes and ~2,852 edges, it is a dense social graph where friend-of-a-friend links are very common. Expect high clustering and a fat-tailed degree distribution.
- **US Power Grid**: compiled by Watts & Strogatz (1998), this network of 4,941 substations and transmission lines is sparse and nearly planar. Engineering constraints (each substation connects to only a few neighbors) produce a narrow degree distribution with no extreme hubs.
- **Yeast Protein Interactions**: mapped by Jeong et al. (2001), a landmark paper from Barabási's group reporting that protein networks are scale-free. Nodes are proteins in yeast (*S. cerevisiae*), edges are physical interactions. With 1,870 proteins and 2,277 interactions, it gives us a **biological** comparison point alongside the social (Facebook) and infrastructure (Power Grid) networks.

In [None]:
G_fb = load_graph("facebook")
graph_summary(G_fb)
print()
G_pg = load_graph("powergrid")
graph_summary(G_pg)
print()
G_pr = load_graph("protein")
graph_summary(G_pr)

---
## 2. Degree Distribution

Think of an airport: its **degree** is the number of routes it serves.
Most small airports have just a handful of routes, but major hubs like Atlanta or Chicago have hundreds.

The **degree distribution** shows how degrees are spread across all nodes.

**From concept to code**: The plot below tallies how many nodes share each degree value *k* and displays the resulting frequency. On a **log-log** scale, a roughly straight line suggests a power law $P(k) \propto k^{-\gamma}$, in which a few extreme hubs coexist with many low-degree nodes. A sharply peaked histogram, by contrast, signals a narrow distribution with no extreme hubs.

In [None]:
viz.plot_degree_dist(G_fb, title="Facebook — Degree Distribution")

In [None]:
viz.plot_degree_dist(G_fb, log=True, title="Facebook — Degree Distribution (log-log)")

**Reading the log-log plot**: The points roughly follow a straight line. This is the visual signature of a **fat-tailed** (heavy-tailed) distribution, which many real-world networks share and which is often modeled as a **power law** (“scale-free”). A straight-looking trend is suggestive rather than conclusive, especially for a network of only 334 nodes. What it does tell us is that a few nodes have very high degree ("hubs") while the vast majority have low degree. In Facebook terms: most people have a modest number of friends, but a few “social connectors” have many more.
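To make the tallying step concrete, here is a minimal sketch that builds a degree distribution by hand. It uses only the standard library and a hypothetical six-node edge list (not one of this week's datasets); the internals of `viz.plot_degree_dist` are not shown in this lab and may differ.

```python
from collections import Counter

# Hypothetical toy edge list: node 0 is a small hub, the rest have few links.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (4, 5)]

# Degree of each node = number of edges touching it.
degree = Counter()
for u, v in toy_edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: how many nodes share each degree value k,
# and the corresponding fraction P(k).
n = len(degree)
counts = Counter(degree.values())
p_k = {k: counts[k] / n for k in sorted(counts)}

for k, p in p_k.items():
    print(f"k = {k}: {counts[k]} node(s), P(k) = {p:.2f}")
```

Plotting `p_k` with both axes on a logarithmic scale gives the log-log view discussed above.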

Compare this to the power grid:

In [None]:
viz.plot_degree_dist(G_pg, title="Power Grid — Degree Distribution")

The power grid is very different: most nodes have degree 2-3 (think of a power line running through substations), and the distribution falls off sharply after degree 3–4, so there are essentially no hubs.

**The contrast**: This reflects an engineering constraint: each substation connects only to its geographic neighbors. Social networks have no such constraint, so popularity can keep compounding.

---
## 3. Paths and Shortest Paths

Imagine you want to fly from a small regional airport to another small airport on the opposite coast.
You need to take connecting flights — the **shortest path** is the route with the fewest layovers.

Key concepts:
- **Shortest path length**: fewest edges between two nodes
- **Diameter**: the longest shortest path in the graph (the "worst case")
- **Average shortest path length (APL)**: the typical separation between nodes

In [None]:
# Shortest path example in Facebook network
nodes_fb = list(G_fb.nodes())
path = nx.shortest_path(G_fb, source=nodes_fb[0], target=nodes_fb[-1])
print(f"Shortest path from {nodes_fb[0]} to {nodes_fb[-1]}: {path}")
print(f"Length: {len(path) - 1} hops")

In [None]:
# Diameter and average path length (Facebook is small enough to compute exactly)
diameter_fb = nx.diameter(G_fb)
apl_fb = nx.average_shortest_path_length(G_fb)
print(f"Facebook diameter: {diameter_fb}")
print(f"Facebook average path length: {apl_fb:.2f}")
print(
    f"\nEven with {G_fb.number_of_nodes()} people, the average separation is only ~{apl_fb:.1f} hops!"
)
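As a worked check on the definitions of diameter and average path length, consider a hypothetical path graph with four nodes, 1 - 2 - 3 - 4 (an illustration only, not one of this week's datasets). There are $\binom{4}{2} = 6$ node pairs: three at distance 1, two at distance 2, and one at distance 3.

$$\ell = \frac{1}{\binom{n}{2}} \sum_{i<j} d(i,j) = \frac{(1+1+1) + (2+2) + 3}{6} = \frac{10}{6} \approx 1.67, \qquad \text{diameter} = \max_{i<j} d(i,j) = d(1,4) = 3$$

The diameter is set by the single worst pair, while the average path length reflects the typical pair, which is why the two can differ substantially in large networks.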

### How shortest paths are found: BFS

**Breadth-First Search (BFS)** explores a graph layer by layer outward from a starting node.
It naturally finds shortest paths in unweighted graphs.

In [None]:
# BFS from a starting node — count how many nodes are at each distance
source = nodes_fb[0]
lengths = nx.single_source_shortest_path_length(G_fb, source)
max_dist = max(lengths.values())

print(f"BFS from node {source}:")
for d in range(max_dist + 1):
    count = sum(1 for v in lengths.values() if v == d)
    print(f"  Distance {d}: {count} nodes")

**Almost everyone within 3–4 hops**: The BFS output shows that starting from a single node, you can reach nearly the entire Facebook network within 3–4 steps. This is the computational version of “six degrees of separation” — and in dense social networks, it’s often closer to *two or three* degrees.
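The layer-by-layer exploration described above can be sketched in a few lines. This is a minimal standard-library version on a hypothetical five-node graph, shown to illustrate the idea; it is not a claim about how NetworkX implements `single_source_shortest_path_length` internally.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {node: hop distance from source} for every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()        # oldest node first, so layers are visited in order
        for w in adj[u]:
            if w not in dist:      # the first visit is always via a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# Hypothetical toy graph as an adjacency dict.
toy_adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

toy_dist = bfs_distances(toy_adj, "a")

# Group nodes by distance, mirroring the per-distance counts printed above.
layers = {}
for node, d in toy_dist.items():
    layers.setdefault(d, []).append(node)
for d in sorted(layers):
    print(f"Distance {d}: {sorted(layers[d])}")
```

Because a node's distance is fixed the first time it is reached, each node and edge is processed only once.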

### Bridges: Critical Connections

A **bridge** is an edge whose removal would disconnect the graph — it's the only path between two parts of the network. Bridges represent structurally critical infrastructure: in the power grid, a bridge is a transmission line with no backup route.

In [None]:
# Find bridges in the Power Grid
bridges = list(nx.bridges(G_pg))
print(f"Power Grid has {len(bridges)} bridges out of {G_pg.number_of_edges()} edges")
print(
    f"That's {len(bridges) / G_pg.number_of_edges():.1%} of all edges — surprisingly many!"
)

# Visualize a bridge in a small subgraph
# Pick a bridge and extract its local neighborhood
bridge_u, bridge_v = bridges[0]
neighborhood = set(
    nx.single_source_shortest_path_length(G_pg, bridge_u, cutoff=2).keys()
)
neighborhood |= set(
    nx.single_source_shortest_path_length(G_pg, bridge_v, cutoff=2).keys()
)
G_local = G_pg.subgraph(neighborhood).copy()

with plt.style.context("seaborn-v0_8-muted"):
    fig, ax = plt.subplots(figsize=(7, 5))
    pos = nx.spring_layout(G_local, seed=SEED)
    # Draw all edges gray
    nx.draw_networkx_edges(G_local, pos, ax=ax, edge_color="#cccccc", width=1.0)
    # Highlight the bridge in red
    nx.draw_networkx_edges(
        G_local,
        pos,
        ax=ax,
        edgelist=[(bridge_u, bridge_v)],
        edge_color="#D65F5F",
        width=3.0,
    )
    # Color bridge endpoints
    colors = [
        "#D65F5F" if n in (bridge_u, bridge_v) else "#4878CF" for n in G_local.nodes()
    ]
    nx.draw_networkx_nodes(G_local, pos, ax=ax, node_color=colors, node_size=100)
    ax.set_title(
        "A bridge edge (red) in the Power Grid\nRemoving it disconnects these two subgraphs",
        fontsize=12,
    )
    ax.axis("off")
    fig.tight_layout()
    plt.show()
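The definition of a bridge translates directly into a brute-force test: delete each edge in turn and check whether the graph is still connected. The sketch below does this on a hypothetical toy graph using only the standard library. It assumes the input graph is connected to begin with and runs one traversal per edge, so it is for illustration on small graphs only; `nx.bridges` is the practical choice.

```python
def is_connected(nodes, edges):
    """Depth-first search connectivity check on an undirected edge list."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(nodes)

def brute_force_bridges(nodes, edges):
    """An edge is a bridge if the graph is disconnected without it."""
    return [
        e for e in edges
        if not is_connected(nodes, [f for f in edges if f != e])
    ]

# Hypothetical toy graph: triangle 0-1-2 joined to triangle 3-4-5 by edge (2, 3).
toy_nodes = [0, 1, 2, 3, 4, 5]
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

toy_bridges = brute_force_bridges(toy_nodes, toy_edges)
print(f"Bridges: {toy_bridges}")
```

Edges inside either triangle are not bridges because the other two sides of the triangle provide a backup route.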

---
## 4. Clustering Coefficient

Here's a simple question: **are your friends also friends with each other?**

The **clustering coefficient** of a node measures exactly this. If you have 5 friends and they all know each other, your clustering coefficient is 1.0. If none of them know each other, it's 0.0.

Formally, for a node with degree $k$:

$$C_i = \frac{\text{number of edges among neighbors}}{\binom{k}{2}} = \frac{2 \times \text{triangles}}{k(k-1)}$$

In [None]:
# Concept diagram: clustering coefficient = 1.0, 0.5, 0.0 for a center node
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))

# C = 1.0: center node with all neighbors connected to each other
G1 = nx.Graph()
G1.add_edges_from(
    [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
)
c1 = nx.clustering(G1, 0)

# C = 0.5: center node with half the neighbor pairs connected
G2 = nx.Graph()
G2.add_edges_from([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)])
c2 = nx.clustering(G2, 0)

# C = 0.0: star — no neighbors connected to each other
G3 = nx.star_graph(4)
c3 = nx.clustering(G3, 0)

for ax, G, c_val, label in zip(
    axes,
    [G1, G2, G3],
    [c1, c2, c3],
    ["All neighbors connected", "Some neighbors connected", "No neighbors connected"],
):
    pos = nx.spring_layout(G, seed=SEED)
    colors = ["#6ACC65" if n == 0 else "#B47CC7" for n in G.nodes()]
    sizes = [400 if n == 0 else 250 for n in G.nodes()]
    nx.draw_networkx(
        G,
        pos,
        ax=ax,
        node_color=colors,
        node_size=sizes,
        edge_color="#999999",
        width=1.5,
        with_labels=True,
        font_size=9,
        font_color="white",
    )
    ax.set_title(f"C(green node) = {c_val:.2f}\n{label}", fontsize=11)
    ax.axis("off")

fig.suptitle(
    "Clustering Coefficient: How Connected Are Your Neighbors?",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**Reading the panels**: The green center node has 4 neighbors (purple) in each case. When all 6 possible neighbor pairs are connected (left), clustering = 1.0 — a perfect clique. When 3 of 6 pairs are connected (middle), clustering = 0.5. When no neighbors know each other (right, a "star"), clustering = 0.0. Real nodes fall somewhere on this spectrum.

The formula above translates to a simple recipe: for each node, (1) list its neighbors, (2) count how many pairs of neighbors are connected, (3) divide by the total possible pairs. Let’s verify with a toy example first, then compare real networks.

In [None]:
# Toy example: a triangle has perfect clustering
triangle = nx.Graph()
triangle.add_edges_from([(0, 1), (1, 2), (0, 2)])

for node in triangle.nodes():
    print(f"Node {node}: clustering = {nx.clustering(triangle, node):.2f}")

print(f"\nAverage clustering: {nx.average_clustering(triangle):.2f}")
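The three-step recipe above (list neighbors, count linked pairs, divide by possible pairs) can also be written out without NetworkX. The following is a minimal sketch using the same edges as the middle panel (`G2`) of the concept diagram; the helper name `local_clustering` is made up for this illustration.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to check; treated as 0 by convention
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

# Same edges as G2 in the concept diagram: node 0 is the center.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

for node in sorted(toy_adj):
    print(f"Node {node}: clustering = {local_clustering(toy_adj, node):.2f}")
```

For the center node, 3 of the 6 possible neighbor pairs are linked, which reproduces the 0.5 shown in the diagram.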

**Predict before you run**: The next cell compares average clustering in Facebook (a dense social network) vs the US Power Grid (a sparse infrastructure network). Before you execute it, predict: which network will have higher clustering, and by roughly how much? Consider whether the "friends of friends become friends" dynamic that drives clustering in social life has any counterpart in how power lines are routed.

In [None]:
# Compare clustering: Facebook (social) vs Power Grid (infrastructure)
cc_fb = nx.average_clustering(G_fb)
cc_pg = nx.average_clustering(G_pg)
print(f"Facebook average clustering:   {cc_fb:.4f}")
print(f"Power Grid average clustering: {cc_pg:.4f}")
print(f"\nFacebook is {cc_fb / cc_pg:.0f}x more clustered than the Power Grid.")
print("Social networks form tight friend groups; power grids are tree-like.")

**What 60% clustering means**: In the Facebook network, roughly 60% of the time, two of your friends also know each other. In a random graph of the same size and density, this number would be around 5% (equal to the density). The enormous gap tells us that social networks are not random — friendships are highly **transitive** (friends of friends tend to become friends).
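The 5% baseline can be checked with a few lines of arithmetic. In a random graph where every pair of nodes is linked independently with the same probability, two of your neighbors are linked with exactly that probability, so expected clustering equals density. The sketch below uses the node and edge counts quoted in the dataset description (the edge count is approximate) and the rounded 0.6 clustering value from the text.

```python
n = 334    # Facebook nodes, from the dataset description
m = 2852   # Facebook edges (approximate, as quoted in the dataset description)

possible_pairs = n * (n - 1) // 2   # number of distinct node pairs
density = m / possible_pairs        # fraction of pairs that are actually linked

observed_clustering = 0.6           # rounded value discussed in the text
print(f"Possible pairs: {possible_pairs}")
print(f"Density:        {density:.4f}")
print(f"Observed clustering is about {observed_clustering / density:.0f}x the random baseline")
```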

**Try it yourself**: Compute the clustering coefficient of node 0 in the Facebook network by hand using the formula, then verify with `nx.clustering()`. Fill in the cell below.

In [None]:
# Step 1: count neighbors and edges among them
node = 0
neighbors = list(G_fb.neighbors(node))
k = len(neighbors)
# Count edges among neighbors
neighbor_set = set(neighbors)
edges_among = sum(
    1 for u in neighbors for v in G_fb.neighbors(u) if v in neighbor_set and v > u
)

# YOUR CODE HERE — compute clustering using the formula C = 2*edges / (k*(k-1))
C_manual = 2 * edges_among / (k * (k - 1))
C_nx = nx.clustering(G_fb, node)
assert abs(C_manual - C_nx) < 1e-6, (
    f"Hint: C = 2 * {edges_among} / ({k} * {k - 1}) = {C_nx:.6f}"
)
print(f"Node {node}: {k} neighbors, {edges_among} edges among them")
print(f"C_manual = {C_manual:.6f}, C_networkx = {C_nx:.6f}")

---
## 5. Centrality Measures

"Importance" in a network can mean different things. Here are three common centrality measures:

| Centrality | Analogy | Definition |
|-----------|---------|--------|
| **Degree** | The popular person (most friends) | Number of connections |
| **Betweenness** | The bridge between groups (broker) | Fraction of shortest paths passing through the node |
| **Closeness** | The person closest to everyone | Inverse of average distance to all other nodes |

In [None]:
# Concept diagram: two structural roles in networks
fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

# --- Panel 1: Degree Hub (wheel graph — center connects to all) ---
G_hub = nx.wheel_graph(8)  # node 0 = center, nodes 1-7 = rim
pos_hub = nx.spring_layout(G_hub, seed=SEED)
colors_hub = ["#D65F5F" if n == 0 else "#4878CF" for n in G_hub.nodes()]
sizes_hub = [500 if n == 0 else 200 for n in G_hub.nodes()]
nx.draw_networkx(
    G_hub,
    pos_hub,
    ax=axes[0],
    node_color=colors_hub,
    node_size=sizes_hub,
    edge_color="#999999",
    width=1.2,
    with_labels=False,
)
axes[0].set_title("Degree Hub\nconnected to many nodes", fontsize=11)
axes[0].axis("off")

# --- Panel 2: Betweenness Bridge (two cliques joined by one node) ---
G_bridge = nx.Graph()
for i in range(5):  # clique A: nodes 0-4
    for j in range(i + 1, 5):
        G_bridge.add_edge(i, j)
for i in range(6, 11):  # clique B: nodes 6-10
    for j in range(i + 1, 11):
        G_bridge.add_edge(i, j)
G_bridge.add_edges_from([(2, 5), (5, 8)])  # node 5 = bridge

pos_bridge = nx.spring_layout(G_bridge, seed=42)
colors_bridge = ["#D65F5F" if n == 5 else "#4878CF" for n in G_bridge.nodes()]
sizes_bridge = [500 if n == 5 else 200 for n in G_bridge.nodes()]
nx.draw_networkx(
    G_bridge,
    pos_bridge,
    ax=axes[1],
    node_color=colors_bridge,
    node_size=sizes_bridge,
    edge_color="#999999",
    width=1.2,
    with_labels=False,
)
axes[1].set_title("Betweenness Bridge\nconnects otherwise-separate groups", fontsize=11)
axes[1].axis("off")

fig.suptitle("Two Structural Roles in Networks", fontsize=13, fontweight="bold")
fig.tight_layout()
plt.show()

A key question for this section: **do the same nodes appear “important” under all three measures?** If degree, betweenness, and closeness agree, importance is one-dimensional. If they disagree, different nodes play different structural roles — some are locally popular (degree hubs), some bridge distant communities (betweenness brokers), and some are simply close to everyone (closeness centers).

**From formula to code**: Each centrality has a precise definition that NetworkX computes for us:

- **Degree centrality**: $C_D(v) = \frac{\deg(v)}{n - 1}$ — fraction of all other nodes that $v$ connects to. Highest for hubs with many direct links.
- **Closeness centrality**: $C_C(v) = \frac{n - 1}{\sum_u d(v,u)}$ — reciprocal of average distance to all other nodes. Highest for nodes in the "geographic center" of the network.
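- **Betweenness centrality**: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those that pass through $v$. It adds up, over all *other* pairs, the fraction of shortest paths that go through $v$; NetworkX divides by the number of such pairs by default (`normalized=True`), so scores lie between 0 and 1. Highest for bridge nodes (see diagram above).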
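To see the degree and closeness formulas in action, here is a minimal standard-library sketch on a hypothetical five-node path graph. The function names are made up for this illustration, and the closeness version assumes a connected graph (every node reachable from every other).

```python
from collections import deque

def hop_distances(adj, source):
    """BFS hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def degree_centrality_sketch(adj):
    """C_D(v) = deg(v) / (n - 1)."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality_sketch(adj):
    """C_C(v) = (n - 1) / sum of distances from v (connected graphs only)."""
    n = len(adj)
    return {v: (n - 1) / sum(hop_distances(adj, v).values()) for v in adj}

# Hypothetical path graph 0 - 1 - 2 - 3 - 4.
toy_path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

toy_deg = degree_centrality_sketch(toy_path)
toy_clo = closeness_centrality_sketch(toy_path)
for v in toy_path:
    print(f"Node {v}: degree = {toy_deg[v]:.2f}, closeness = {toy_clo[v]:.2f}")
```

The middle node scores highest on closeness because its distances to the others sum to 6, compared with 10 for an end node.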

The cell below calls `nx.degree_centrality`, `nx.betweenness_centrality`, and `nx.closeness_centrality` to compute all three at once.

In [None]:
# Compute all three centralities on Facebook
deg_cent = nx.degree_centrality(G_fb)
# Exact betweenness (k=None); `seed` only has an effect when sampling with `k`
bet_cent = nx.betweenness_centrality(G_fb, seed=SEED)
clo_cent = nx.closeness_centrality(G_fb)

In [None]:
# Top 5 nodes by each centrality
for name, cent in [
    ("Degree", deg_cent),
    ("Betweenness", bet_cent),
    ("Closeness", clo_cent),
]:
    top5 = sorted(cent, key=cent.get, reverse=True)[:5]
    print(f"Top 5 by {name}: {top5}")

In [None]:
# Visualize: color and size by each centrality
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
pos = nx.spring_layout(G_fb, seed=SEED)

centralities = [
    ("Degree Centrality", deg_cent),
    ("Betweenness Centrality", bet_cent),
    ("Closeness Centrality", clo_cent),
]

for ax, (name, cent) in zip(axes, centralities):
    values = [cent[n] for n in G_fb.nodes()]
    sizes = [300 * v / max(values) + 10 for v in values]
    nx.draw_networkx(
        G_fb,
        pos,
        ax=ax,
        node_color=values,
        cmap=plt.cm.YlOrRd,
        node_size=sizes,
        edge_color="#cccccc",
        width=0.3,
        with_labels=False,
        alpha=0.9,
    )
    ax.set_title(name)
    ax.axis("off")

fig.suptitle("Facebook — Three Centrality Measures", fontsize=14)
fig.tight_layout()
plt.show()

Notice that the three measures often highlight *different* nodes:
- A node with many connections (high degree) may not sit between communities (low betweenness).
- A node that bridges two clusters may have moderate degree but very high betweenness.
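The two-clique graph from the concept diagram (`G_bridge`) makes this disagreement easy to verify by hand. The sketch below rebuilds the same edges with the standard library and computes betweenness by brute force: a BFS from every node records distances and shortest-path counts, and a node $v$ lies on a shortest $s$-$t$ path exactly when $d(s,v) + d(v,t) = d(s,t)$. It assumes a connected graph and is far slower than `nx.betweenness_centrality`, so treat it as an illustration only.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness_sketch(adj):
    """Normalized betweenness for a connected, undirected, unweighted graph."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    scores = {}
    for v in nodes:
        dist_v, sigma_v = info[v]
        total = 0.0
        for s, t in combinations(nodes, 2):
            if v in (s, t):
                continue
            dist_s, sigma_s = info[s]
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
        scores[v] = total / ((n - 1) * (n - 2) / 2)
    return scores

# Same structure as G_bridge: clique 0-4, clique 6-10, node 5 links 2 and 8.
toy_edges = list(combinations(range(5), 2)) + list(combinations(range(6, 11), 2))
toy_edges += [(2, 5), (5, 8)]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

toy_degree = {v: len(toy_adj[v]) for v in toy_adj}
toy_bet = betweenness_sketch(toy_adj)
print(f"Node 5: degree = {toy_degree[5]}, betweenness = {toy_bet[5]:.3f}")
print(f"Node 2: degree = {toy_degree[2]}, betweenness = {toy_bet[2]:.3f}")
```

Node 5 has the lowest degree in the graph (2), yet all 25 cross-clique pairs route through it, giving it the highest betweenness.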

---
## 6. Tweak & Observe

In [None]:
# ---- TWEAK: Change which centrality measure is used for coloring ----
centrality = deg_cent  # <-- change me: deg_cent, bet_cent, or clo_cent

values = [centrality[n] for n in G_fb.nodes()]
sizes = [300 * v / max(values) + 10 for v in values]
viz.draw_graph(
    G_fb, node_color=values, node_size=sizes, title="Facebook — colored by centrality"
)

In [None]:
# Compare: do the same nodes appear in the top 5 across measures?
top_deg = set(sorted(deg_cent, key=deg_cent.get, reverse=True)[:5])
top_bet = set(sorted(bet_cent, key=bet_cent.get, reverse=True)[:5])
top_clo = set(sorted(clo_cent, key=clo_cent.get, reverse=True)[:5])

print(f"Overlap (degree & betweenness): {top_deg & top_bet}")
print(f"Overlap (degree & closeness):   {top_deg & top_clo}")
print(f"Overlap (all three):            {top_deg & top_bet & top_clo}")

**Low overlap, different roles**: If the same nodes dominated all three rankings, centrality would be a single concept. But the limited overlap reveals that “importance” is multi-dimensional: a node can be a local hub (high degree) without being a global bridge (high betweenness). In the power grid, this distinction is critical — the most-connected substation is not necessarily the one whose failure would split the network.

---
## 7. PageRank & Eigenvector Centrality

The three centrality measures above treat each neighbor equally. But in reality, **being connected to an important node should make you more important**. This recursive idea leads to two powerful measures:

- **Eigenvector centrality**: a node's importance is proportional to the sum of its neighbors' importances. This is the eigenvector of the adjacency matrix corresponding to the largest eigenvalue.
- **PageRank**: Google's algorithm for ranking web pages. It adds a "damping factor" α (typically 0.85) — with probability α, a random surfer follows a link; with probability 1-α, they jump to a random page. This prevents dead-ends and ensures all nodes get a baseline score.
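The random-surfer description can be turned into a short iterative computation. The following is a minimal sketch, not NetworkX's implementation: it assumes an undirected graph in which every node has at least one neighbor (so there are no dead ends to handle), and the function name and the toy star graph are made up for illustration.

```python
def pagerank_sketch(adj, alpha=0.85, tol=1e-10, max_iter=500):
    """Iterate the random-surfer update until the scores stop changing."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # With probability 1 - alpha the surfer jumps to a uniformly random node.
        new = {v: (1 - alpha) / n for v in adj}
        # With probability alpha the surfer follows a random link from its node,
        # so each node splits its current score evenly among its neighbors.
        for u, nbrs in adj.items():
            share = alpha * rank[u] / len(nbrs)
            for w in nbrs:
                new[w] += share
        if sum(abs(new[v] - rank[v]) for v in adj) < tol:
            return new
        rank = new
    return rank

# Hypothetical star graph: node 0 in the center, four leaves.
toy_star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

toy_pr = pagerank_sketch(toy_star)
for v, score in toy_pr.items():
    print(f"Node {v}: PageRank = {score:.4f}")
```

Each leaf has a single link, so it passes its whole followed-link share to the center; the center ends up with roughly 0.48 of the total score and each leaf with roughly 0.13.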

In [None]:
# Compute eigenvector centrality and PageRank on Facebook
eig_cent = nx.eigenvector_centrality(G_fb, max_iter=1000)
pr_cent = nx.pagerank(G_fb, alpha=0.85)

# Top 5 by each
print("Top 5 by Eigenvector Centrality:")
for n in sorted(eig_cent, key=eig_cent.get, reverse=True)[:5]:
    print(f"  Node {n}: {eig_cent[n]:.4f}")

print("\nTop 5 by PageRank:")
for n in sorted(pr_cent, key=pr_cent.get, reverse=True)[:5]:
    print(f"  Node {n}: {pr_cent[n]:.4f}")

In [None]:
# Compare top-5 across all five centrality measures
top5_eig = set(sorted(eig_cent, key=eig_cent.get, reverse=True)[:5])
top5_pr = set(sorted(pr_cent, key=pr_cent.get, reverse=True)[:5])

print("Top-5 comparison across centrality measures:")
print(f"  Degree:       {sorted(top_deg)}")
print(f"  Betweenness:  {sorted(top_bet)}")
print(f"  Closeness:    {sorted(top_clo)}")
print(f"  Eigenvector:  {sorted(top5_eig)}")
print(f"  PageRank:     {sorted(top5_pr)}")
print(f"\nOverlap (Degree & PageRank): {top_deg & top5_pr}")
print(f"Overlap (Degree & Eigenvector): {top_deg & top5_eig}")
print(f"Overlap (all five): {top_deg & top_bet & top_clo & top5_eig & top5_pr}")

**PageRank vs Degree**: PageRank often agrees with degree centrality in undirected networks, but the two can differ because each neighbor splits its score evenly among its own links: a link from a low-degree neighbor passes on a larger share than a link from a hub. The real power of PageRank shows on **directed** networks like the web, where in-links from authoritative pages count far more than in-links from obscure ones.

In [None]:
# Visualize: color nodes by PageRank
pr_values = [pr_cent[n] for n in G_fb.nodes()]
pr_sizes = [300 * v / max(pr_values) + 10 for v in pr_values]  # same scaling as the earlier centrality plots
viz.draw_graph(
    G_fb,
    node_color=pr_values,
    node_size=pr_sizes,
    title="Facebook — colored by PageRank",
)

---
## 8. Degree Correlations & Assortativity

A natural question: **do popular nodes connect to other popular nodes?**

The **assortativity coefficient** *r* measures this:
- *r* > 0 → **assortative**: high-degree nodes connect to high-degree nodes (typical in social networks)
- *r* < 0 → **disassortative**: hubs connect to many low-degree nodes (typical in biological networks)
- *r* ≈ 0 → no degree correlation
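One common way to define *r* is as the Pearson correlation between the degrees found at the two ends of each edge, counting every edge in both directions. The sketch below computes it that way with the standard library on a hypothetical star graph, along with the average neighbor degree for each degree value. The function name is made up for this illustration, and the code assumes the graph is not regular (if every node had the same degree, the variance in the denominator would be zero).

```python
from collections import Counter, defaultdict

def degree_assortativity_sketch(edges):
    """Pearson correlation of the degrees at either end of each edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Count each undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    mean = sum(xs) / len(xs)            # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var, deg

# Hypothetical star graph: hub 0 linked to four leaves.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
toy_r, toy_deg = degree_assortativity_sketch(toy_edges)

# Average neighbor degree, grouped by the node's own degree k.
neighbor_degs = defaultdict(list)
for u, v in toy_edges:
    neighbor_degs[toy_deg[u]].append(toy_deg[v])
    neighbor_degs[toy_deg[v]].append(toy_deg[u])
toy_knn = {k: sum(vals) / len(vals) for k, vals in neighbor_degs.items()}

print(f"r = {toy_r:+.2f}")
print(f"Average neighbor degree by k: {dict(sorted(toy_knn.items()))}")
```

A star is the extreme disassortative case: the only hub connects exclusively to degree-1 nodes, so the average neighbor degree falls as *k* rises.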

The classic visualization is the **average neighbor degree** plot: for each degree value *k*, compute the average degree of neighbors of nodes with degree *k*.

In [None]:
# Average neighbor degree vs node degree for three networks
from collections import defaultdict

fig, axes = plt.subplots(1, 3, figsize=(16, 4.5))
networks = [("Facebook", G_fb), ("Power Grid", G_pg), ("Protein", G_pr)]

for ax, (name, G) in zip(axes, networks):
    knn = nx.average_neighbor_degree(G)
    degrees = dict(G.degree())

    # Group nodes by degree, then average the neighbor degree within each group
    deg_to_knn = defaultdict(list)
    for node in G.nodes():
        deg_to_knn[degrees[node]].append(knn[node])

    k_vals = sorted(deg_to_knn.keys())
    knn_means = [np.mean(deg_to_knn[k]) for k in k_vals]

    ax.scatter(k_vals, knn_means, s=25, alpha=0.7, edgecolors="white", linewidth=0.5)
    ax.set_xlabel("Node degree (k)")
    ax.set_ylabel("Avg neighbor degree ⟨k_nn⟩")
    r = nx.degree_assortativity_coefficient(G)
    ax.set_title(f"{name}\nr = {r:.3f}")

fig.suptitle("Average Neighbor Degree vs Node Degree", fontsize=13, fontweight="bold")
fig.tight_layout()
plt.show()

In [None]:
# Compute assortativity coefficient for all three networks
for name, G in [("Facebook", G_fb), ("Power Grid", G_pg), ("Protein", G_pr)]:
    r = nx.degree_assortativity_coefficient(G)
    label = (
        "assortative" if r > 0.05 else ("disassortative" if r < -0.05 else "neutral")
    )
    print(f"{name:12s}: r = {r:+.4f}  ({label})")

**Reading the patterns**:
- **Facebook** (r > 0): the upward trend in ⟨k_nn⟩ means popular people befriend other popular people. This is *assortative mixing* — like attracts like.
- **Power Grid** (r ≈ 0): no clear trend — substations connect based on geography, not degree preference.
- **Protein** (r < 0): the downward trend means hubs connect predominantly to low-degree nodes. This is *disassortative mixing* — hubs serve as connectors between many peripheral proteins.

**Why it matters**: Assortative networks tend to be more robust to targeted attacks (their hubs form a resilient, interconnected core), while disassortative networks tend to fragment more easily when hubs are removed.

---
## 9. Matrix Representations

Every graph has a natural matrix representation. These matrices are the foundation of spectral methods, random walks, and (in more advanced courses) graph neural networks.

Three key matrices:
- **Adjacency matrix (A)**: $A_{ij} = 1$ if nodes $i$ and $j$ are connected, 0 otherwise
- **Degree matrix (D)**: diagonal matrix where $D_{ii} = \deg(i)$
- **Laplacian matrix (L = D - A)**: encodes the graph's "diffusion" structure — how things flow through the network
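The three definitions can be checked by hand on a tiny example. This is a minimal sketch using plain Python lists and a hypothetical four-node graph (a triangle with one extra node attached), chosen so that every entry is easy to verify.

```python
# Hypothetical toy graph: triangle 0-1-2 with node 3 attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i][j] = 1 if i and j are connected (symmetric for undirected graphs).
A_toy = [[0] * n for _ in range(n)]
for i, j in toy_edges:
    A_toy[i][j] = 1
    A_toy[j][i] = 1

# Degree matrix: each node's degree is its row sum in A, placed on the diagonal.
deg = [sum(row) for row in A_toy]
D_toy = [[deg[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Laplacian: L = D - A, entry by entry.
L_toy = [[D_toy[i][j] - A_toy[i][j] for j in range(n)] for i in range(n)]

for name, M in [("A", A_toy), ("D", D_toy), ("L", L_toy)]:
    print(name)
    for row in M:
        print("  ", row)
```

Every row of the Laplacian sums to zero, because the degree on the diagonal is cancelled by one $-1$ for each neighbor.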

In [None]:
# Visualize the three matrices for the Karate Club
import seaborn as sns

G_karate = load_graph("karate")

A = nx.to_numpy_array(G_karate)
degrees_k = np.array([G_karate.degree(n) for n in G_karate.nodes()])
D = np.diag(degrees_k)
L = D - A

fig, axes = plt.subplots(1, 3, figsize=(16, 4.5))

for ax, matrix, title, cmap in zip(
    axes,
    [A, D, L],
    ["Adjacency (A)", "Degree (D)", "Laplacian (L = D - A)"],
    ["Blues", "Oranges", "RdBu_r"],
):
    sns.heatmap(
        matrix,
        ax=ax,
        cmap=cmap,
        square=True,
        cbar_kws={"shrink": 0.7},
        xticklabels=False,
        yticklabels=False,
    )
    ax.set_title(title, fontsize=12)

fig.suptitle(
    "Karate Club — Three Matrix Representations", fontsize=13, fontweight="bold"
)
fig.tight_layout()
plt.show()

**Reading the matrices**:
- **Adjacency (A)**: The block structure (visible as darker squares along the diagonal) reflects the two Karate Club factions — members within the same faction are more densely connected. Each row/column is a node; a colored cell means an edge exists.
- **Degree (D)**: A diagonal matrix where the value on the diagonal is the node's degree. Nodes 0 and 33 (the instructor and president) have the largest values — they are the hubs.
- **Laplacian (L = D - A)**: Diagonal entries are positive (node degree), off-diagonal entries are negative (where edges exist). The Laplacian encodes how "different" each node is from its neighbors — it's the discrete analog of the second derivative, and its eigenvalues reveal the graph's connectivity structure.

**Why matrices matter**: Many network algorithms are secretly matrix operations. PageRank is the dominant eigenvector of a modified adjacency matrix. Spectral clustering uses the Laplacian's eigenvectors to find communities. Random walks on graphs correspond to powers of the stochastic matrix $D^{-1}A$. Understanding these representations opens the door to a rich toolbox of linear algebra methods for network analysis.
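As one concrete instance of the random-walk connection, the sketch below builds $D^{-1}A$ for a hypothetical three-node path and applies a single step of the walk. It also checks a standard fact about undirected graphs: a distribution proportional to node degree is left unchanged by the walk. The code uses plain lists and assumes every node has at least one neighbor.

```python
# Hypothetical path graph 0 - 1 - 2 as an adjacency dict.
toy_adj = {0: [1], 1: [0, 2], 2: [1]}
toy_nodes = sorted(toy_adj)

# Transition matrix P = D^{-1} A: from node i, move to each neighbor
# with probability 1 / deg(i).
P = [
    [(1 / len(toy_adj[i]) if j in toy_adj[i] else 0.0) for j in toy_nodes]
    for i in toy_nodes
]

def walk_step(dist, P):
    """One random-walk step: new[j] = sum over i of dist[i] * P[i][j]."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]

# Degree-proportional distribution: pi[i] = deg(i) / (2 * number of edges).
total_degree = sum(len(toy_adj[i]) for i in toy_nodes)
pi = [len(toy_adj[i]) / total_degree for i in toy_nodes]

after_one_step = walk_step(pi, P)
print(f"Before: {pi}")
print(f"After:  {after_one_step}")
```

Applying `walk_step` repeatedly corresponds to taking powers of the matrix, which is the sense in which multi-step random walks are matrix powers.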

---
## Summary

| Measure | What it tells you | Facebook | Power Grid | Protein |
|---------|-------------------|----------|------------|--------|
| **Degree distribution** | How connections are spread | Fat-tailed (hubs exist) | Narrow (no hubs) | Fat-tailed (scale-free) |
| **Average path length** | Typical separation | Short (~2) | Longer | — |
| **Clustering coefficient** | Friend-of-friend density | High (0.6+) | Low (0.08) | Moderate |
| **Degree centrality** | Most connected | Hub nodes | Not very useful | Hub proteins |
| **Betweenness centrality** | Bridges between groups | Community connectors | Critical substations | Pathway bottlenecks |
| **Closeness centrality** | Closest to everyone | Central, well-connected | Geographic centers | Central in the interaction network |
| **PageRank** | Recursive importance | Similar to degree (undirected) | — | — |
| **Eigenvector centrality** | Important neighbors | Hub-adjacent nodes | — | — |
| **Assortativity** | Do hubs connect to hubs? | Assortative (r > 0) | Neutral (r ≈ 0) | Disassortative (r < 0) |
| **Matrix representations** | A, D, L encode structure | Block structure in A reveals communities | Sparse A, tree-like L | — |

Next week: **Small Worlds** — why is it that everyone is only "six degrees" apart?