# Week 2: Network Properties — Assignment

**Learning objectives** — In this assignment you will:

- Rank nodes by degree centrality
- Compute betweenness centrality using NetworkX
- Implement local clustering coefficient from scratch
- Implement closeness centrality from scratch
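- Measure the correlation between centrality measures
- Implement PageRank by power iteration from scratch
- Implement the degree assortativity coefficient from scratch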

## Grading

| Section | Part | Function | Points |
|---------|------|----------|--------|
| 1 | Degree Centrality | `top_k_by_degree(G, k)` | 10 |
| 2 | Betweenness | `compute_betweenness(G, normalized)` | 15 |
| 3 | Clustering | `local_clustering(G, node)` | 20 |
| 4 | Closeness | `closeness_centrality(G, node)` | 10 |
| 5 | Correlation | `centrality_correlation(G)` | 15 |
| 6 | PageRank | `compute_pagerank(G, alpha, max_iter)` | 10 |
| 7 | Assortativity | `degree_assortativity(G)` | 10 |
| — | Written Questions | — | 10 |
| | **Total** | | **100** |

## Before You Start

This assignment builds on the Week 2 lab. Make sure you are comfortable with:

- **Degree distribution** — how to read linear and log-log plots (Lab Section 2)
- **Clustering coefficient** — the formula C = 2·triangles / (k(k-1)) and what it measures (Lab Section 4)
- **Three centrality measures** — degree (popularity), betweenness (brokerage), closeness (proximity) and how they can disagree (Lab Section 5)
- **PageRank** — recursive importance via random walks, damping factor α (Lab Section 6)
- **Assortativity** — do hubs connect to hubs? The degree correlation coefficient r (Lab Section 7)

You will implement clustering and closeness **from scratch** — review the formulas in the lab before starting Sections 3-4. Sections 6-7 require implementing PageRank and assortativity from scratch.

In [None]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from netsci.loaders import load_graph
from netsci.utils import SEED

In [None]:
G_karate = load_graph("karate")
G_fb = load_graph("facebook")

---
## Section 1: Degree Centrality (10 pts)

Return the top-*k* nodes by degree as a list of `(node, degree)` tuples, sorted from highest to lowest degree.
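
To illustrate the expected output shape only, the sketch below ranks a plain dictionary of made-up scores rather than a graph. The names `top_k_pairs` and `toy_scores` are hypothetical and do not come from the lab; adapting the idea to node degrees is left to you.

```python
def top_k_pairs(scores, k):
    """Return the k (key, value) pairs with the largest values, descending."""
    # sorted() is stable, so tied values keep their original insertion order.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Slicing past the end is safe: it simply returns every pair.
    return ranked[:k]


# Made-up scores for illustration only.
toy_scores = {"a": 3, "b": 7, "c": 5, "d": 7}
top2 = top_k_pairs(toy_scores, 2)
print(top2)
```

Each element of the result is a `(key, value)` tuple, which is the same shape the validation cell expects for `(node, degree)`.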

In [None]:
def top_k_by_degree(G, k=5):
    """Return the top-k nodes by degree.

    Parameters
    ----------
    G : nx.Graph
    k : int

    Returns
    -------
    list of (node, degree) tuples, sorted descending by degree.
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_top5 = top_k_by_degree(G_karate, 5)
assert len(_top5) == 5
# Degrees should be descending
assert all(_top5[i][1] >= _top5[i + 1][1] for i in range(len(_top5) - 1))
# Node 33 or 0 should be in top 5 (they are the highest-degree nodes)
_top_nodes = {n for n, d in _top5}
assert 33 in _top_nodes or 0 in _top_nodes
print(f"Top 5 in Karate: {_top5}")
print("Section 1 passed!")

---
## Section 2: Betweenness Centrality (15 pts)

Compute betweenness centrality for all nodes using NetworkX's implementation.
Return a dictionary mapping each node to its betweenness centrality value.
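
A small worked example may help with the `normalized` flag. In the path graph 0 - 1 - 2 - 3 (so $n = 4$), node 1 lies on the shortest paths for the pairs (0, 2) and (0, 3), so its unnormalized betweenness is 2. Applying the usual undirected normalization factor $2 / ((n-1)(n-2))$ should give:

$$\frac{2}{(n-1)(n-2)} \times 2 = \frac{2}{3 \cdot 2} \times 2 = \frac{2}{3}$$

The end nodes 0 and 3 lie on no shortest path between other pairs, so their betweenness is 0 with or without normalization.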

In [None]:
def compute_betweenness(G, normalized=True):
    """Compute betweenness centrality for all nodes.

    Parameters
    ----------
    G : nx.Graph
    normalized : bool, default True
        If True, normalize by 2/((n-1)(n-2)) for undirected graphs.

    Returns
    -------
    dict mapping node -> float
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_bet = compute_betweenness(G_karate, normalized=True)
_bet_nx = nx.betweenness_centrality(G_karate, normalized=True)
assert isinstance(_bet, dict)
assert len(_bet) == G_karate.number_of_nodes()
for node in G_karate.nodes():
    assert abs(_bet[node] - _bet_nx[node]) < 1e-6, f"Mismatch at node {node}"
print("Section 2 passed!")

---
## Section 3: Local Clustering Coefficient (20 pts)

Implement the **local clustering coefficient** for a single node **from scratch** (do not call `nx.clustering`).

Recall: for a node $v$ with degree $k_v$, the clustering coefficient is:

$$C_v = \frac{2 \times |\text{edges among neighbors of } v|}{k_v (k_v - 1)}$$

If $k_v < 2$, return 0.0 (no triangle is possible).
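
As a hand check of the formula, consider a hypothetical node $v$ with three neighbors $a$, $b$, $c$ (so $k_v = 3$), where the only edge among the neighbors is $a$-$b$:

$$C_v = \frac{2 \times 1}{3 \times (3 - 1)} = \frac{2}{6} = \frac{1}{3}$$

If all three neighbor pairs were connected, the numerator would be $2 \times 3 = 6$ and $C_v$ would equal 1, the maximum possible value.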

In [None]:
def local_clustering(G, node):
    """Compute the local clustering coefficient of a node from scratch.

    Parameters
    ----------
    G : nx.Graph
    node : any hashable

    Returns
    -------
    float
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
for node in G_karate.nodes():
    _mine = local_clustering(G_karate, node)
    _nx_val = nx.clustering(G_karate, node)
    assert abs(_mine - _nx_val) < 1e-6, f"Node {node}: got {_mine}, expected {_nx_val}"
print("All 34 nodes match nx.clustering — Section 3 passed!")

---
## Section 4: Closeness Centrality (10 pts)

Implement **closeness centrality** for a single node **from scratch** (do not call `nx.closeness_centrality`).

$$C_C(v) = \frac{n - 1}{\sum_{u \neq v} d(v, u)}$$

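where $d(v, u)$ is the shortest path length from $v$ to $u$, and $n$ is the number of nodes. This formula assumes $G$ is connected, which holds for the Karate Club graph used in the validation cell.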

You may use `nx.shortest_path_length` to get distances.
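
As a hand check of the formula (assuming a connected graph), take the path graph 0 - 1 - 2 - 3 with $n = 4$. Node 0 is at distances 1, 2, 3 from the other nodes, and node 1 is at distances 1, 1, 2:

$$C_C(0) = \frac{3}{1 + 2 + 3} = \frac{1}{2}, \qquad C_C(1) = \frac{3}{1 + 1 + 2} = \frac{3}{4}$$

Note that when called with a `source` and no `target`, `nx.shortest_path_length` returns a dictionary that also contains the source itself at distance 0; this entry adds nothing to the sum.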

In [None]:
def closeness_centrality(G, node):
    """Compute closeness centrality for a single node from scratch.

    Parameters
    ----------
    G : nx.Graph
    node : any hashable

    Returns
    -------
    float
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
for node in G_karate.nodes():
    _mine = closeness_centrality(G_karate, node)
    _nx_val = nx.closeness_centrality(G_karate, node)
    assert abs(_mine - _nx_val) < 1e-6, f"Node {node}: got {_mine}, expected {_nx_val}"
print("All 34 nodes match nx.closeness_centrality — Section 4 passed!")

---
## Section 5: Centrality Correlation (15 pts)

Compute the **Pearson correlation** between degree centrality and betweenness centrality across all nodes.

Use `np.corrcoef` or the formula directly. Return a single float.
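
For reference, "the formula directly" here means the standard Pearson definition $r = \mathrm{cov}(x, y) / (\sigma_x \sigma_y)$. The sketch below applies it to two made-up sequences using only the standard library. The names `pearson_r`, `toy_x`, and `toy_y` are hypothetical, and this is an illustration of the formula rather than the graded function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and both variances share the same 1/n factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant sequence has zero variance; r is undefined in that case
    # and this line raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)


# Made-up data: nearly, but not exactly, linear.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 6.0, 9.0]
r = pearson_r(toy_x, toy_y)
print(f"r = {r:.4f}")
```

`np.corrcoef(x, y)` returns a 2×2 correlation matrix; the off-diagonal entry `[0, 1]` is the same quantity.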

In [None]:
def centrality_correlation(G):
    """Compute Pearson r between degree and betweenness centrality.

    Parameters
    ----------
    G : nx.Graph

    Returns
    -------
    float  (Pearson correlation coefficient)
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_r = centrality_correlation(G_karate)
assert isinstance(_r, float)
# Degree and betweenness are positively correlated in most real networks
assert _r > 0.3, f"Expected positive correlation, got {_r}"
assert _r <= 1.0

# Verify against direct computation
_deg = nx.degree_centrality(G_karate)
_bet = nx.betweenness_centrality(G_karate)
_nodes = list(G_karate.nodes())
_d = [_deg[n] for n in _nodes]
_b = [_bet[n] for n in _nodes]
_expected_r = float(np.corrcoef(_d, _b)[0, 1])
assert abs(_r - _expected_r) < 1e-6, f"Got {_r}, expected {_expected_r}"
print(f"Pearson r (degree vs betweenness) = {_r:.4f}")
print("Section 5 passed!")

---
## Section 6: PageRank from Scratch (10 pts)

Implement PageRank using the **power iteration** method:

1. Initialize: every node gets score 1/n
2. Repeat for `max_iter` iterations, computing every new score from the previous iteration's scores:
   - For each node v: new_score(v) = (1 - alpha) / n + alpha * sum(score(u) / degree(u) for u in neighbors of v)
3. Return the final scores as a dictionary

The damping factor `alpha` (default 0.85) is the probability that the random surfer follows a link; with probability 1 - alpha the surfer jumps to a uniformly random node instead.
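
To make the update rule concrete, the sketch below performs a single update on a hypothetical three-node path, stored as a plain adjacency dictionary rather than an `nx.Graph`. The names `pagerank_step` and `toy_adj` are illustrative assumptions, and the sketch assumes every node has at least one neighbor.

```python
def pagerank_step(adj, scores, alpha=0.85):
    """One synchronous update; adj maps node -> list of neighbors."""
    n = len(adj)
    # Read only the old values in `scores` while building the new dict.
    return {
        v: (1 - alpha) / n
        + alpha * sum(scores[u] / len(adj[u]) for u in adj[v])
        for v in adj
    }


# Hypothetical toy graph: the path 0 - 1 - 2.
toy_adj = {0: [1], 1: [0, 2], 2: [1]}
start = {v: 1 / len(toy_adj) for v in toy_adj}
after_one = pagerank_step(toy_adj, start)
print(after_one)
```

After one step the middle node holds about 0.617 and each end node about 0.192. The scores still sum to 1, which is a useful invariant to check in your own implementation.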

In [None]:
def compute_pagerank(G, alpha=0.85, max_iter=100):
    """Compute PageRank using power iteration.

    Parameters
    ----------
    G : nx.Graph
    alpha : float, default 0.85
        Damping factor.
    max_iter : int, default 100
        Number of power iterations.

    Returns
    -------
    dict mapping node -> float (PageRank score)
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_pr = compute_pagerank(G_karate, alpha=0.85, max_iter=100)
_pr_nx = nx.pagerank(G_karate, alpha=0.85, max_iter=100)
assert isinstance(_pr, dict)
assert len(_pr) == G_karate.number_of_nodes()
# Check values are close to NetworkX (tolerance for convergence differences)
for node in G_karate.nodes():
    assert abs(_pr[node] - _pr_nx[node]) < 0.005, (
        f"Node {node}: got {_pr[node]:.6f}, expected {_pr_nx[node]:.6f}"
    )
# Sum should be approximately 1
assert abs(sum(_pr.values()) - 1.0) < 0.01, (
    f"Sum = {sum(_pr.values()):.4f}, expected ~1.0"
)
print(f"Top 3 by PageRank: {sorted(_pr, key=_pr.get, reverse=True)[:3]}")
print("Section 6 passed!")

---
## Section 7: Degree Assortativity (10 pts)

Implement the **degree assortativity coefficient** from scratch using Pearson correlation of degrees at edge endpoints.

For each edge (u, v) in an undirected graph, include **both** orderings: (deg(u), deg(v)) and (deg(v), deg(u)). This symmetrization matches Newman's original formula. Then use `np.corrcoef` to compute the Pearson correlation coefficient between the sequence of first entries and the sequence of second entries.
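
The symmetrization step can be sketched on a hypothetical toy star graph, written as a plain edge list so the two degree sequences are easy to inspect. The names `toy_edges`, `deg`, `x`, and `y` are illustrative only.

```python
# Hypothetical toy graph: a star with center 1 and leaves 0, 2, 3.
toy_edges = [(0, 1), (1, 2), (1, 3)]

# Degree of each node, counted from the edge list.
deg = {}
for u, v in toy_edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

# Each undirected edge contributes both orderings of its endpoint degrees.
x, y = [], []
for u, v in toy_edges:
    x += [deg[u], deg[v]]
    y += [deg[v], deg[u]]

print(x)  # [1, 3, 3, 1, 3, 1]
print(y)  # [3, 1, 1, 3, 1, 3]
```

Every pair sums to 4, so the two sequences are perfectly anti-correlated and their Pearson correlation is -1. This matches the intuition that a star is maximally disassortative: the hub connects only to leaves.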

In [None]:
def degree_assortativity(G):
    """Compute degree assortativity coefficient from scratch.

    Parameters
    ----------
    G : nx.Graph

    Returns
    -------
    float (Pearson correlation of degrees at edge endpoints)
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_r = degree_assortativity(G_karate)
_r_nx = nx.degree_assortativity_coefficient(G_karate)
assert isinstance(_r, float)
assert abs(_r - _r_nx) < 0.05, f"Got {_r:.4f}, expected {_r_nx:.4f}"

# Test on Facebook too
_r_fb = degree_assortativity(G_fb)
_r_fb_nx = nx.degree_assortativity_coefficient(G_fb)
assert abs(_r_fb - _r_fb_nx) < 0.05, (
    f"Facebook: got {_r_fb:.4f}, expected {_r_fb_nx:.4f}"
)

print(f"Karate:   r = {_r:.4f} (nx: {_r_nx:.4f})")
print(f"Facebook: r = {_r_fb:.4f} (nx: {_r_fb_nx:.4f})")
print("Section 7 passed!")

---
## Written Questions (10 pts)

### Question 1 (5 pts)

In the US power grid, what does it mean for a node to have high **betweenness centrality**?
What would happen to the network if that node (substation) failed?

*Hints to guide your thinking:*
- *Betweenness counts how many shortest paths pass **through** a node. In a sparse, tree-like network like the power grid, what happens when a node on the "trunk" is removed?*
- *Think about the difference between a node with many local connections (high degree) vs. a node that sits on the only path between two regions (high betweenness).*
- *Consider real infrastructure: if a key highway interchange fails, what happens to traffic flow?*

**Your Answer:**



### Question 2 (5 pts)

Can a node have **high degree** but **low betweenness**? Describe a network structure where this happens and explain why.

*Hints to guide your thinking:*
- *Imagine a "star" subgraph: one central node connected to 50 leaf nodes, all of which also connect to a separate hub. The central node has high degree, but do shortest paths between other nodes need to pass through it?*
- *Betweenness is about being on shortest paths **between other pairs**. A node can be popular (many friends) yet replaceable (all its friends also know each other directly).*
- *Think about the Karate Club: node 33 has the highest degree (17) — does it also have the highest betweenness? Check and explain why or why not.*

**Your Answer:**

