# Week 5: Community Detection — Assignment

**Learning objectives** — In this assignment you will:

- Implement modularity from scratch
- Apply Louvain community detection with different resolutions
- Measure partition quality using purity
- Build community-based node coloring
- Analyze community size distributions

## Grading

| Section | Part | Function | Points |
|---------|------|----------|--------|
| 1 | Modularity | `compute_modularity(G, partition)` | 25 |
| 2 | Louvain Detection | `detect_communities(G, resolution)` | 20 |
| 3 | Partition Purity | `partition_purity(detected, ground_truth)` | 20 |
| 4 | Community Coloring | `color_by_community(G, partition)` | 15 |
| 5 | Size Distribution | `community_size_dist(partition)` | 10 |
| — | Written Questions | — | 10 |
| | **Total** | | **100** |

## Before You Start

This assignment builds on the Week 5 lab. Make sure you are comfortable with:

- **Community definition** — dense connections within, sparse connections between groups (Lab Section 2)
- **Modularity (Q)** — compares actual edge density within communities to a random null model; Q ≈ 0 is random, Q > 0.3 is meaningful (Lab Section 3)
- **Louvain algorithm** — greedy modularity maximization with local moves + aggregation (Lab Section 4)
- **Resolution parameter** — controls community granularity; higher resolution → more, smaller communities (Lab Section 10)
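
To refresh the last bullet, the effect of the resolution parameter can be observed directly. The sketch below is a minimal illustration, not part of the graded work: it assumes the copy of Zachary's karate club that ships with networkx (`nx.karate_club_graph()`) rather than the course loader, and a seed of 0 chosen arbitrarily for repeatability.

```python
import networkx as nx

# Zachary's karate club ships with networkx, so no course loader is needed here.
G = nx.karate_club_graph()

# Run Louvain at a few resolutions and record how many communities come back.
# seed=0 is an arbitrary choice that makes each run repeatable.
counts = {}
for resolution in (0.5, 1.0, 2.0):
    communities = nx.community.louvain_communities(G, resolution=resolution, seed=0)
    counts[resolution] = len(communities)
    print(f"resolution={resolution}: {len(communities)} communities")
```

Comparing the printed counts across resolutions is a quick way to check the "higher resolution, more and smaller communities" rule of thumb on a graph you already know.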

Section 1 asks you to implement modularity **from scratch**. Before starting, review the formula $Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$ in the lab.

In [None]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from netsci.loaders import load_graph
from netsci.utils import SEED

In [None]:
G_karate = load_graph("karate")
G_football = load_graph("football")

# Ground truth for karate
gt_karate = [
    {n for n in G_karate.nodes() if G_karate.nodes[n].get("club") == "Mr. Hi"},
    {n for n in G_karate.nodes() if G_karate.nodes[n].get("club") == "Officer"},
]

---
## Section 1: Modularity from Scratch (25 pts)

Implement the modularity formula:

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where:
- $A_{ij}$ is the adjacency matrix entry
- $k_i$ is the degree of node $i$
- $m$ is the total number of edges
- $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are in the same community, and 0 otherwise
- the sum runs over all ordered pairs $(i, j)$, including $i = j$

**Do not use** `nx.community.modularity`.
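
Before running on the real graphs, it can help to have a value you can verify by hand. The sketch below works through a hypothetical toy graph (two triangles joined by one bridge edge, not one of the course datasets) using the equivalent per-community form $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ is the sum of its node degrees. It is plain arithmetic with hand-counted values, not a general implementation.

```python
# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
# Partition: each triangle is one community.
m = 7               # 3 + 3 triangle edges + 1 bridge edge

# Both communities look the same, so count one and double it.
internal_edges = 3  # L_c: edges with both endpoints inside the triangle
degree_sum = 7      # d_c: degrees 2 + 2 + 3 (the bridge endpoint has degree 3)

Q_per_community = internal_edges / m - (degree_sum / (2 * m)) ** 2
Q = 2 * Q_per_community

print(f"Q = {Q:.6f}")  # 2 * (3/7 - 1/4) = 5/14, i.e. Q = 0.357143
```

If your `compute_modularity` is given this graph and partition, it should return the same value, 5/14.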

In [None]:
def compute_modularity(G, partition):
    """Compute modularity of a partition from scratch.

    Parameters
    ----------
    G : nx.Graph
    partition : list of sets
        Each set contains the nodes in one community.

    Returns
    -------
    float
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_Q = compute_modularity(G_karate, gt_karate)
_Q_nx = nx.community.modularity(G_karate, gt_karate)
assert abs(_Q - _Q_nx) < 1e-6, f"Got {_Q}, expected {_Q_nx}"
print(f"Karate GT modularity: {_Q:.6f} (expected: {_Q_nx:.6f})")

# Also test with Louvain output
_louv = list(nx.community.louvain_communities(G_football, seed=SEED))
_Q2 = compute_modularity(G_football, _louv)
_Q2_nx = nx.community.modularity(G_football, _louv)
assert abs(_Q2 - _Q2_nx) < 1e-6, f"Football: got {_Q2}, expected {_Q2_nx}"
print(f"Football Louvain modularity: {_Q2:.6f}")
print("Section 1 passed!")

---
## Section 2: Community Detection (20 pts)

Wrap `nx.community.louvain_communities` so that the resolution parameter is configurable.
Return the communities as a list of sets, and pass `seed=SEED` so that results are reproducible.

In [None]:
def detect_communities(G, resolution=1.0):
    """Detect communities using the Louvain algorithm.

    Parameters
    ----------
    G : nx.Graph
    resolution : float

    Returns
    -------
    list of sets
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_comms = detect_communities(G_football, resolution=1.0)
assert isinstance(_comms, list)
assert all(isinstance(c, set) for c in _comms)
# Every node should be assigned to exactly one community
_all_nodes = set()
for c in _comms:
    _all_nodes |= c
assert _all_nodes == set(G_football.nodes())
assert sum(len(c) for c in _comms) == G_football.number_of_nodes()

_Q = nx.community.modularity(G_football, _comms)
assert _Q > 0.55, f"Football modularity {_Q} too low"
print(f"Football: {len(_comms)} communities, Q = {_Q:.4f}")
print("Section 2 passed!")

---
## Section 3: Partition Purity (20 pts)

**Purity** measures how well detected communities match ground truth.
For each detected community, find the ground-truth community it overlaps with most.
Purity is the fraction of the $N$ nodes whose ground-truth community is the majority one within their detected community:

$$\text{purity} = \frac{1}{N} \sum_{c \in \text{detected}} \max_{t \in \text{truth}} |c \cap t|$$
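
A small worked case makes the formula concrete. The sketch below uses a hypothetical six-node example (made up for illustration, not taken from the course data) and spells out each overlap by hand rather than providing a reusable function.

```python
# Hypothetical toy partitions over nodes 1..6.
detected = [{1, 2, 3}, {4, 5, 6}]
ground_truth = [{1, 2, 4}, {3, 5, 6}]
N = 6

# Detected community {1, 2, 3}: shares 2 nodes with {1, 2, 4} and 1 with {3, 5, 6}.
best_first = max(len(detected[0] & ground_truth[0]),
                 len(detected[0] & ground_truth[1]))

# Detected community {4, 5, 6}: shares 1 node with {1, 2, 4} and 2 with {3, 5, 6}.
best_second = max(len(detected[1] & ground_truth[0]),
                  len(detected[1] & ground_truth[1]))

purity = (best_first + best_second) / N
print(f"purity = {purity:.4f}")  # (2 + 2) / 6 = 0.6667
```

Nodes 3 and 4 are the two "misplaced" nodes here, which is why the purity is 4/6 rather than 1.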

In [None]:
def partition_purity(detected, ground_truth):
    """Compute purity of detected partition against ground truth.

    Parameters
    ----------
    detected : list of sets
    ground_truth : list of sets

    Returns
    -------
    float (between 0 and 1)
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
# Perfect partition should have purity 1.0
_p_perfect = partition_purity(gt_karate, gt_karate)
assert abs(_p_perfect - 1.0) < 1e-6, (
    f"Perfect partition purity should be 1.0, got {_p_perfect}"
)

# Louvain on karate should have high purity
_louvain_k = list(nx.community.louvain_communities(G_karate, seed=SEED))
_p_louvain = partition_purity(_louvain_k, gt_karate)
assert _p_louvain > 0.8, f"Louvain purity on karate should be > 0.8, got {_p_louvain}"
print(f"Perfect purity: {_p_perfect:.4f}")
print(f"Louvain purity on karate: {_p_louvain:.4f}")
print("Section 3 passed!")

---
## Section 4: Community Coloring (15 pts)

Given a graph and a partition, return a list of integer community labels
in the same order as `G.nodes()`. Nodes in the first set of `partition` get label 0, nodes in the second set get label 1, and so on.

Do not use `partition_to_labels` — implement the mapping from scratch.
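
The ordering requirement is easy to get wrong, because `G.nodes()` follows insertion order rather than sorted order. The sketch below shows the expected output on a hypothetical three-node graph; the labels are written out by hand, so it illustrates the target format without implementing the mapping.

```python
import networkx as nx

# Hypothetical toy graph; nodes are inserted in the order c, a, b.
G = nx.Graph()
G.add_edges_from([("c", "a"), ("a", "b")])
partition = [{"a", "b"}, {"c"}]

node_order = list(G.nodes())   # ['c', 'a', 'b'], not alphabetical
expected_labels = [1, 0, 0]    # hand-written: c is in set 1, a and b are in set 0

# Each label must point at the set that actually contains the node.
for node, label in zip(node_order, expected_labels):
    assert node in partition[label]

print(node_order, expected_labels)
```

A list in this form can be passed as `node_color` to `nx.draw`, which is what makes the `G.nodes()` ordering matter.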

In [None]:
def color_by_community(G, partition):
    """Map each node to its community index.

    Parameters
    ----------
    G : nx.Graph
    partition : list of sets

    Returns
    -------
    list of int — one per node in G.nodes() order
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
_colors = color_by_community(G_karate, gt_karate)
assert isinstance(_colors, list)
assert len(_colors) == G_karate.number_of_nodes()
assert all(isinstance(c, int) for c in _colors)
# Check that each node is labeled with the index of its community
_nodes = list(G_karate.nodes())
for i, comm in enumerate(gt_karate):
    for n in comm:
        idx = _nodes.index(n)
        assert _colors[idx] == i, f"Node {n} should be community {i}"
print("Section 4 passed!")

---
## Section 5: Community Size Distribution (10 pts)

Return a dictionary mapping community size to the number of communities of that size.
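
To make the target format concrete, the sketch below shows the expected dictionary for a hypothetical eight-node partition. The dictionary is written out by hand, and the two checks are invariants that any correct result should satisfy.

```python
# Hypothetical toy partition: two communities of size 3 and one of size 2.
partition = [{0, 1, 2}, {3, 4, 5}, {6, 7}]

# Hand-written expected result: size -> number of communities of that size.
expected_dist = {3: 2, 2: 1}

# Invariant 1: sizes weighted by their counts add up to the number of nodes.
total_nodes = sum(size * count for size, count in expected_dist.items())
# Invariant 2: the counts add up to the number of communities.
total_communities = sum(expected_dist.values())

print(total_nodes, total_communities)  # 8 3
```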

In [None]:
def community_size_dist(partition):
    """Compute the community size distribution.

    Parameters
    ----------
    partition : list of sets

    Returns
    -------
    dict mapping size (int) -> count (int)
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# --- Validation ---
# Karate ground truth has two communities
_dist = community_size_dist(gt_karate)
assert isinstance(_dist, dict)
assert sum(size * count for size, count in _dist.items()) == G_karate.number_of_nodes()
print(f"Karate GT size distribution: {_dist}")

# Football Louvain
_louv_fb = list(nx.community.louvain_communities(G_football, seed=SEED))
_dist_fb = community_size_dist(_louv_fb)
assert (
    sum(size * count for size, count in _dist_fb.items())
    == G_football.number_of_nodes()
)
print(f"Football Louvain size distribution: {_dist_fb}")
print("Section 5 passed!")

---
## Written Questions (10 pts)

### Question 1 (5 pts)

If two runs of Louvain on the same graph both produce modularity Q > 0.35 but assign
some nodes to different communities, are they equally "correct"?
What does this tell you about the nature of community detection?

*Hints to guide your thinking:*
- *Modularity measures partition quality, but multiple partitions can have similar Q values. What does this say about uniqueness?*
- *Think about "bridge" nodes that sit between communities — which community should they belong to?*
- *Consider: is community detection more like classification (one right answer) or clustering (multiple valid groupings)? What are the implications for interpreting results?*

**Your Answer:**



### Question 2 (5 pts)

Imagine running two different community detection algorithms on the Game of Thrones (GoT) network.
One gives 5 communities with Q = 0.42, the other gives 12 communities with Q = 0.38.
Which partition is "better"? What other factors besides modularity should you consider?

*Hints to guide your thinking:*
- *Q = 0.42 with 5 communities vs Q = 0.38 with 12 communities — is the Q difference (0.04) large or small in the context of typical Q values?*
- *More communities can capture finer structure but may also "overfit" to noise. Fewer communities are simpler but may merge distinct groups.*
- *Beyond modularity, consider: do the detected communities make narrative sense? Can you interpret what each community represents? Are community sizes balanced or heavily skewed?*

**Your Answer:**

