# Week 5: Community Detection

**Learning objectives** — After this lab you should be able to:

- Explain what a community is in a network (dense within, sparse between)
- Motivate community detection with real-world applications
- Compute and interpret modularity
- Apply the Louvain algorithm (bottom-up) to detect communities
- Apply the Girvan-Newman algorithm (top-down) and contrast the two approaches
- Evaluate detected communities against ground truth using NMI
- Visualize communities with both static and interactive tools

People naturally form groups — friend circles, departments, clubs.
In network science, these groups are called **communities**: subsets of nodes that are
densely connected to each other but sparsely connected to the rest of the network.

This week we learn how to **find** communities automatically and **evaluate** how good
the detected groups are.

**Why does community detection matter?** It is one of the most widely applied techniques in network science:
- **Social media**: identifying echo chambers, coordinated inauthentic behavior, and interest groups for content recommendation
- **Biology**: discovering protein complexes and functional modules in protein interaction networks
- **Fraud detection**: finding rings of accounts that transact suspiciously among themselves but rarely with outsiders
- **Epidemiology**: tracing contact clusters during disease outbreaks (connecting to Week 6)
- **Neuroscience**: mapping functional brain regions from connectivity data

In each case, the core question is the same: *which groups of nodes are more tightly connected internally than expected by chance?*

In [None]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
from networkx.algorithms.community import girvan_newman
from sklearn.metrics import normalized_mutual_info_score
from netsci.loaders import load_graph
from netsci.utils import SEED, graph_summary, partition_to_labels
from netsci import viz

---
## 1. Datasets

We use three networks this week, each offering a different perspective on community detection:

- **Zachary's Karate Club** (34 nodes): the "hello world" of community detection. Wayne Zachary studied this club in the early 1970s (his paper appeared in 1977) and documented the real split into two factions, giving us **ground truth** to validate algorithms against.
- **College Football** (115 nodes): NCAA Division I-A teams connected by games played. The natural communities are **conferences** (SEC, Big Ten, etc.), providing another ground-truth benchmark.
- **Game of Thrones** (796 nodes): character interactions from the books. There is **no ground truth**, so we must discover communities and interpret them narratively. This is the more realistic scenario.

In [None]:
G_karate = load_graph("karate")
graph_summary(G_karate)
print()
G_football = load_graph("football")
graph_summary(G_football)
print()
G_got = load_graph("got")
graph_summary(G_got)

---
## 2. What are Communities?

We start with a toy example that contrasts clear community structure with a random graph,
then turn to the Karate Club, where we already know the ground-truth split: the club
divided into two factions after a dispute.

In [None]:
# Community concept: clear communities (left) vs random graph (right)
fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

# Left: two 5-node cliques connected by a single bridge
G_comm = nx.Graph()
G_comm.add_edges_from(
    [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
)  # clique A
G_comm.add_edges_from(
    [(5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]
)  # clique B
G_comm.add_edge(4, 5)  # single bridge

colors_comm = ["#4878CF"] * 5 + ["#D65F5F"] * 5
pos_comm = nx.spring_layout(G_comm, seed=SEED)
nx.draw_networkx(
    G_comm,
    pos_comm,
    ax=axes[0],
    node_color=colors_comm,
    node_size=250,
    edge_color="#999999",
    width=1.2,
    with_labels=True,
    font_size=9,
    font_color="white",
)
axes[0].set_title("Clear communities\n(dense within, sparse between)", fontsize=11)
axes[0].axis("off")

# Right: random graph with similar density
G_rand = nx.erdos_renyi_graph(10, 0.45, seed=SEED)
pos_rand = nx.spring_layout(G_rand, seed=SEED)
nx.draw_networkx(
    G_rand,
    pos_rand,
    ax=axes[1],
    node_color="#8C8C8C",
    node_size=250,
    edge_color="#999999",
    width=1.2,
    with_labels=True,
    font_size=9,
    font_color="white",
)
axes[1].set_title("No community structure\n(edges spread uniformly)", fontsize=11)
axes[1].axis("off")

fig.suptitle("What makes a community?", fontsize=13)
fig.tight_layout()
plt.show()

In [None]:
# Color by known faction
color_map = []
for node in G_karate.nodes():
    club = G_karate.nodes[node].get("club", "Mr. Hi")
    color_map.append("#D65F5F" if club == "Mr. Hi" else "#4878CF")

viz.draw_graph(
    G_karate, node_color=color_map, title="Karate Club — ground truth factions"
)

Notice: members of the same faction are mostly connected to each other (dense **within**),
with fewer connections between factions (sparse **between**). That's what a community is.
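
To put numbers on "dense within, sparse between", here is a minimal sketch in plain Python that counts edges inside and across groups for the two-clique toy graph drawn earlier. The helper name `edge_density_report` is an illustrative assumption and is not part of `netsci`.

```python
from itertools import combinations

# Two 5-node cliques joined by one bridge (same toy graph as above).
group_a = set(range(0, 5))
group_b = set(range(5, 10))
edges = list(combinations(sorted(group_a), 2))
edges += list(combinations(sorted(group_b), 2))
edges.append((4, 5))  # single bridge


def edge_density_report(edges, group_a, group_b):
    """Return (within_density, between_density) for a two-group split.

    Hypothetical helper for illustration only.
    """
    # An edge is "within" when both endpoints fall in the same group.
    within = sum(1 for u, v in edges if (u in group_a) == (v in group_a))
    between = len(edges) - within
    # Number of possible node pairs inside the groups and across them.
    possible_within = sum(len(g) * (len(g) - 1) // 2 for g in (group_a, group_b))
    possible_between = len(group_a) * len(group_b)
    return within / possible_within, between / possible_between


within_density, between_density = edge_density_report(edges, group_a, group_b)
print(f"within-group density:  {within_density:.2f}")
print(f"between-group density: {between_density:.2f}")
```

Every possible pair inside a clique is connected, while only 1 of the 25 possible cross-group pairs is, which is the signature of a clean community split.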

---
## 3. Modularity

**Modularity** (Q) measures the quality of a partition. It compares the fraction of edges
within communities to what you'd expect in a random graph with the same degree sequence.

- Q close to 0: no better than random
- Q below 0: fewer within-community edges than the random baseline predicts
- Q close to 1: very strong community structure
- Typical real-world values: 0.3 to 0.7

**Intuition behind the formula**: Modularity compares what you *observe* (actual edge density within communities) to what you'd *expect* by chance (if edges were randomly rewired keeping degrees fixed). A positive Q means the communities have more internal edges than a random baseline — the higher Q, the more "real" the community structure. A random partition gives Q ≈ 0; strong communities yield Q ≈ 0.3-0.7.
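
The "observed minus expected" idea can be written per community as Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the total edge count. The sketch below applies that formula by hand to the two-clique toy graph; `modularity_by_hand` is an illustrative name, and the graph is assumed to be unweighted and undirected.

```python
from itertools import combinations

# Toy graph: two 5-node cliques joined by a single bridge edge.
edges = list(combinations(range(0, 5), 2))
edges += list(combinations(range(5, 10), 2))
edges.append((4, 5))
partition = [set(range(0, 5)), set(range(5, 10))]


def modularity_by_hand(edges, partition):
    """Q = sum over communities of (L_c / m) - (d_c / (2m))**2.

    L_c: edges with both ends inside community c
    d_c: total degree of the nodes in c
    m:   total number of edges (unweighted, undirected graph assumed)
    """
    m = len(edges)
    q = 0.0
    for community in partition:
        l_c = sum(1 for u, v in edges if u in community and v in community)
        d_c = sum((u in community) + (v in community) for u, v in edges)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


q_toy = modularity_by_hand(edges, partition)
print(f"Q for the two-clique split: {q_toy:.4f}")
```

Here each clique holds 10 of the 21 edges and half of the total degree, so Q = 2 × (10/21 − 1/4) ≈ 0.45. Putting all ten nodes in one community gives Q = 0, because the observed and expected fractions both equal 1.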

**Predict before you run**: The Karate Club has 34 nodes and 78 edges, split into two factions. Given that modularity ranges from -0.5 to ~1.0, where do you expect the ground-truth Q to fall — below 0.3, between 0.3-0.5, or above 0.5?

In [None]:
# Build the ground-truth partition as a list of sets
gt_partition = [
    {n for n in G_karate.nodes() if G_karate.nodes[n].get("club") == "Mr. Hi"},
    {n for n in G_karate.nodes() if G_karate.nodes[n].get("club") == "Officer"},
]

Q_gt = nx.community.modularity(G_karate, gt_partition)
print(f"Modularity of ground-truth partition: {Q_gt:.4f}")

In [None]:
# ---- TWEAK: Try a random partition and see modularity drop ----
rng = np.random.default_rng(SEED)
nodes = list(G_karate.nodes())
rng.shuffle(nodes)
half = len(nodes) // 2
random_partition = [set(nodes[:half]), set(nodes[half:])]

Q_rand = nx.community.modularity(G_karate, random_partition)
print(f"Modularity of random partition: {Q_rand:.4f}")
# Report the difference: a ratio is meaningless when Q_rand is near zero or negative
print(f"Gap (ground truth minus random): {Q_gt - Q_rand:.4f}")

**The gap**: The ground-truth partition achieves Q ≈ 0.37, while a random split gives Q near 0. A value of 0.37 may seem modest, but modularity rarely exceeds 0.7 even for the strongest community structures. The gap between random and ground truth confirms that the Karate Club factions are a real structural feature, not an artifact.

---
## 4. Louvain Algorithm

The **Louvain algorithm** is a fast, greedy method that maximizes modularity.
It works by iteratively moving nodes between communities to increase Q.

**How Louvain works** (simplified):

1. **Local moves** — start with each node in its own community. Greedily move each node to the neighboring community that increases modularity the most. Repeat until no move improves Q.
2. **Aggregation** — collapse each community into a single "super-node," creating a smaller network. Go back to step 1.

This two-phase loop repeats until modularity stops improving. It's fast (near-linear time) and usually finds high-quality partitions, though it's not guaranteed to find the global optimum.
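
To see the two-phase loop in action, NetworkX exposes `louvain_partitions`, which yields the partition reached after each aggregation level. The sketch below is a minimal illustration: it uses `nx.karate_club_graph()` as a stand-in for `load_graph("karate")` (assumed equivalent) and an arbitrary seed of 42 rather than the course's `SEED`.

```python
import networkx as nx

# Stand-in for load_graph("karate"); assumes NetworkX's bundled copy is equivalent.
G = nx.karate_club_graph()

# Each yielded partition is the result of one more local-moves + aggregation round.
levels = list(nx.community.louvain_partitions(G, seed=42))
for depth, partition in enumerate(levels, start=1):
    q = nx.community.modularity(G, partition)
    print(f"Level {depth}: {len(partition)} communities, Q = {q:.3f}")
```

Because each level only merges the communities of the previous one, the community count can shrink but never grow from one level to the next.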

In [None]:
# Louvain step-by-step: initial → first pass → final partition
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
pos_karate = nx.spring_layout(G_karate, seed=SEED)

palette = [
    "#4878CF",
    "#D65F5F",
    "#6ACC65",
    "#B47CC7",
    "#C4AD66",
    "#77BEDB",
    "#E8A06B",
    "#8C8C8C",
]

# Panel 1: Each node in its own community (initial state)
colors_init = [palette[i % len(palette)] for i in range(G_karate.number_of_nodes())]
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[0],
    node_color=colors_init,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
axes[0].set_title(
    f"Step 0: Each node = own community\n({G_karate.number_of_nodes()} communities)",
    fontsize=11,
)
axes[0].axis("off")

# Panel 2: Intermediate-like state, approximated by a separate high-resolution run
# (this is not Louvain's actual first pass)
comms_mid = nx.community.louvain_communities(G_karate, resolution=2.0, seed=SEED)
node_to_mid = {}
for i, comm in enumerate(comms_mid):
    for n in comm:
        node_to_mid[n] = i
colors_mid = [palette[node_to_mid[n] % len(palette)] for n in G_karate.nodes()]
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[1],
    node_color=colors_mid,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
Q_mid = nx.community.modularity(G_karate, comms_mid)
axes[1].set_title(
    f"Intermediate (simulated, resolution=2.0): {len(comms_mid)} communities\n"
    f"Q = {Q_mid:.3f}",
    fontsize=11,
)
axes[1].axis("off")

# Panel 3: Final partition (default resolution)
comms_final = nx.community.louvain_communities(G_karate, resolution=1.0, seed=SEED)
node_to_final = {}
for i, comm in enumerate(comms_final):
    for n in comm:
        node_to_final[n] = i
colors_final = [palette[node_to_final[n] % len(palette)] for n in G_karate.nodes()]
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[2],
    node_color=colors_final,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
Q_final = nx.community.modularity(G_karate, comms_final)
axes[2].set_title(
    f"Final: {len(comms_final)} communities\nQ = {Q_final:.3f}", fontsize=11
)
axes[2].axis("off")

fig.suptitle(
    "Louvain Algorithm: From Individual Nodes to Communities",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**Watching Louvain converge**: Starting from 34 singleton communities (left), the algorithm greedily merges nodes with their best-fitting neighbors. The intermediate state (center, simulated with high resolution) shows small clusters forming. The final partition (right) finds 3-4 communities that maximize modularity. Each merge step increases Q — the algorithm stops when no further move improves the score.

In [None]:
# Detect communities with Louvain
louvain_karate = nx.community.louvain_communities(G_karate, seed=SEED)
Q_louvain = nx.community.modularity(G_karate, louvain_karate)

print(f"Louvain found {len(louvain_karate)} communities")
print(f"Modularity: {Q_louvain:.4f}")
for i, comm in enumerate(louvain_karate):
    print(f"  Community {i}: {sorted(comm)}")

In [None]:
# Visualize detected communities
node_to_comm = {}
for i, comm in enumerate(louvain_karate):
    for n in comm:
        node_to_comm[n] = i

palette = [
    "#4878CF",
    "#D65F5F",
    "#6ACC65",
    "#B47CC7",
    "#C4AD66",
    "#77BEDB",
    "#E8A06B",
    "#8C8C8C",
]
colors = [palette[node_to_comm[n] % len(palette)] for n in G_karate.nodes()]

viz.draw_graph(G_karate, node_color=colors, title="Karate — Louvain communities")

**Comparing to ground truth**: Louvain typically finds 3-4 communities in the Karate Club, while the actual split was into 2 factions. The extra communities usually correspond to tightly-knit subgroups *within* a faction. Notice that **bridge nodes** (like node 2) sometimes end up in a different community than ground truth — these are the structurally ambiguous members who had connections to both sides.

**Try it yourself**: Find the 3 nodes with highest betweenness centrality in the Karate Club. Are they bridge nodes between the two factions? Check by comparing to the ground-truth faction labels.

In [None]:
# YOUR CODE HERE
bet = nx.betweenness_centrality(G_karate)
top3 = sorted(bet, key=bet.get, reverse=True)[:3]

assert set(top3) == {0, 33, 32}, (
    "Hint: sort nodes by betweenness centrality in descending order"
)
for n in top3:
    faction = G_karate.nodes[n].get("club", "Mr. Hi")
    cross_edges = sum(
        1 for nb in G_karate.neighbors(n) if G_karate.nodes[nb].get("club") != faction
    )
    print(
        f"Node {n}: betweenness={bet[n]:.3f}, faction={faction}, "
        f"cross-faction edges={cross_edges}"
    )

In [None]:
# Run Louvain on football
louvain_fb = nx.community.louvain_communities(G_football, seed=SEED)
Q_fb = nx.community.modularity(G_football, louvain_fb)
print(f"Football: Louvain found {len(louvain_fb)} communities, Q = {Q_fb:.4f}")

# Color by detected community
node_to_comm_fb = {}
for i, comm in enumerate(louvain_fb):
    for n in comm:
        node_to_comm_fb[n] = i

colors_fb = [palette[node_to_comm_fb[n] % len(palette)] for n in G_football.nodes()]
viz.draw_graph(
    G_football, node_color=colors_fb, title="College Football — Louvain communities"
)

---
## 5. Girvan-Newman: Top-Down Community Detection

Louvain works **bottom-up** — it starts with individual nodes and merges them into communities. The **Girvan-Newman algorithm** takes the opposite approach: it starts with the entire network and progressively **removes edges** to reveal communities.

**How Girvan-Newman works**:

1. Compute **edge betweenness** — the number of shortest paths passing through each edge.
2. Remove the edge with highest betweenness (it's the most "bridge-like" edge connecting different communities).
3. Recompute edge betweenness on the remaining graph.
4. Repeat until the desired number of communities emerges.

This top-down approach produces a **dendrogram** — a hierarchy of community splits. It's slower than Louvain (O(m²n) vs near-linear) but provides a complementary perspective: instead of asking "which nodes belong together?" it asks "which edges hold different groups together?"
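
A single Girvan-Newman step can be sketched on a toy graph where the answer is easy to check by eye: two 5-node cliques joined by one bridge. The example below uses NetworkX's `edge_betweenness_centrality`; the toy graph is an assumption chosen for illustration, not one of this week's datasets.

```python
from itertools import combinations

import networkx as nx

# Two 5-node cliques joined by the single bridge (4, 5).
G = nx.Graph()
G.add_edges_from(combinations(range(0, 5), 2))
G.add_edges_from(combinations(range(5, 10), 2))
G.add_edge(4, 5)

# Steps 1-2: find the edge that carries the most shortest paths.
betweenness = nx.edge_betweenness_centrality(G, normalized=False)
top_edge = max(betweenness, key=betweenness.get)
print(f"Highest-betweenness edge: {tuple(sorted(top_edge))}")

# Remove it and check whether the graph falls apart into communities.
G.remove_edge(*top_edge)
components = [sorted(c) for c in nx.connected_components(G)]
print(f"Components after removal: {components}")
```

Every shortest path between the two cliques must cross the bridge, so it has the highest betweenness, and removing it separates the graph into the two cliques in one step. Real networks usually need many removals before the first split appears.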

**Predict before you run**: Girvan-Newman removes the highest-betweenness edge at each step. In the Karate Club, the two faction leaders (nodes 0 and 33) are not directly connected, so which edge do you think gets removed first: one attached to a leader, or one between two ordinary members? Think about which edge carries the most shortest paths between the two factions.

In [None]:
# Girvan-Newman on Karate Club

# Run Girvan-Newman and collect the first split into 2 communities
gn_iter = girvan_newman(G_karate)
gn_2 = next(gn_iter)  # first split = 2 communities
gn_karate = [set(c) for c in gn_2]

Q_gn = nx.community.modularity(G_karate, gn_karate)
print(f"Girvan-Newman found {len(gn_karate)} communities, Q = {Q_gn:.4f}")
print(f"Louvain found {len(louvain_karate)} communities, Q = {Q_louvain:.4f}")
print("\nGirvan-Newman communities:")
for i, comm in enumerate(gn_karate):
    print(f"  Community {i}: {sorted(comm)}")

In [None]:
# Visualize: Louvain (bottom-up) vs Girvan-Newman (top-down) vs Ground Truth
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
pos_karate = nx.spring_layout(G_karate, seed=SEED)

palette = [
    "#4878CF",
    "#D65F5F",
    "#6ACC65",
    "#B47CC7",
    "#C4AD66",
    "#77BEDB",
    "#E8A06B",
    "#8C8C8C",
]

# Panel 1: Ground truth
colors_gt = []
for node in G_karate.nodes():
    club = G_karate.nodes[node].get("club", "Mr. Hi")
    colors_gt.append("#D65F5F" if club == "Mr. Hi" else "#4878CF")
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[0],
    node_color=colors_gt,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
axes[0].set_title(f"Ground truth (2 factions)\nQ = {Q_gt:.3f}", fontsize=11)
axes[0].axis("off")

# Panel 2: Girvan-Newman
node_to_gn = {}
for i, comm in enumerate(gn_karate):
    for n in comm:
        node_to_gn[n] = i
colors_gn = [palette[node_to_gn[n] % len(palette)] for n in G_karate.nodes()]
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[1],
    node_color=colors_gn,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
axes[1].set_title(
    f"Girvan-Newman ({len(gn_karate)} communities)\nQ = {Q_gn:.3f}", fontsize=11
)
axes[1].axis("off")

# Panel 3: Louvain
colors_louvain = [palette[node_to_comm[n] % len(palette)] for n in G_karate.nodes()]
nx.draw_networkx(
    G_karate,
    pos_karate,
    ax=axes[2],
    node_color=colors_louvain,
    node_size=120,
    edge_color="#cccccc",
    width=0.5,
    with_labels=True,
    font_size=7,
    font_color="white",
)
axes[2].set_title(
    f"Louvain ({len(louvain_karate)} communities)\nQ = {Q_louvain:.3f}", fontsize=11
)
axes[2].axis("off")

fig.suptitle(
    "Top-Down (Girvan-Newman) vs Bottom-Up (Louvain) vs Ground Truth",
    fontsize=13,
    fontweight="bold",
)
fig.tight_layout()
plt.show()

**Two approaches, two perspectives**: Girvan-Newman's 2-way split closely matches the actual faction structure — it naturally recovers the two factions because it removes the bridge edges connecting them. Louvain finds more communities because it maximizes modularity, which can be higher with finer-grained partitions. Neither answer is "wrong" — they highlight different levels of resolution in the community hierarchy.

**The tradeoff**: Girvan-Newman is more interpretable (you can trace which edges were removed and why) but much slower — O(m²n) vs Louvain's near-linear time. For large networks (>10,000 nodes), Girvan-Newman becomes impractical, but for small networks with ground truth, it often gives cleaner results.

**The sweet spot**: Modularity rises as Girvan-Newman splits the network, peaks at some optimal number of communities, then declines as real communities get fragmented. The peak tells us the "natural" number of communities according to the modularity criterion. Note that the peak may not align with the ground-truth number — modularity is a structural measure, not a social one.
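
One way to look for that peak is to score every level of the Girvan-Newman hierarchy with modularity. The sketch below does this for the Karate Club, using `nx.karate_club_graph()` as a stand-in for `load_graph("karate")` (assumed equivalent) and an arbitrary cutoff of 10 levels to keep the sweep short.

```python
from itertools import islice

import networkx as nx
from networkx.algorithms.community import girvan_newman

# Stand-in for load_graph("karate"); assumes NetworkX's bundled copy is equivalent.
G = nx.karate_club_graph()

# Each item from girvan_newman has one more community than the previous one.
max_levels = 10  # arbitrary cutoff for illustration
scores = []
for communities in islice(girvan_newman(G), max_levels):
    partition = [set(c) for c in communities]
    scores.append((len(partition), nx.community.modularity(G, partition)))

for k, q in scores:
    print(f"{k:2d} communities: Q = {q:.4f}")

best_k, best_q = max(scores, key=lambda item: item[1])
print(f"Modularity peaks at {best_k} communities (Q = {best_q:.4f}) within this sweep")
```

The reported peak is only the best level within the cutoff; raising `max_levels` checks whether a later split scores higher.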

---
## 6. Label Propagation

An alternative: **Label Propagation** assigns each node the label that most of its neighbors have.
It's faster but non-deterministic.

**Non-determinism by design**: Label Propagation works by repeatedly asking each node: "what label do most of my neighbors have?" Ties are broken randomly, which means different runs can produce different partitions. This is not a bug — it reflects genuine ambiguity in the community structure. Nodes with mixed neighborhood labels sit at community boundaries where multiple groupings are equally valid.

**Predict before you run**: Will Label Propagation achieve higher or lower modularity than Louvain? Consider that Louvain explicitly maximizes modularity, while Label Propagation optimizes a different objective (local label consensus).

In [None]:
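# Use the asynchronous variant, which breaks ties randomly (seeded here for
# reproducibility); label_propagation_communities is semi-synchronous and deterministic.
lpa_karate = list(nx.community.asyn_lpa_communities(G_karate, seed=SEED))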
Q_lpa = nx.community.modularity(G_karate, lpa_karate)
print(f"Label Propagation: {len(lpa_karate)} communities, Q = {Q_lpa:.4f}")
print(f"Louvain Q = {Q_louvain:.4f}")
print(f"{'Louvain' if Q_louvain >= Q_lpa else 'LPA'} has higher modularity")
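
To see the run-to-run variation directly, the sketch below repeats asynchronous label propagation with several seeds. It uses NetworkX's `asyn_lpa_communities`, with `nx.karate_club_graph()` as a stand-in for `load_graph("karate")` (assumed equivalent); the seed values are arbitrary.

```python
import networkx as nx

# Stand-in for load_graph("karate"); assumes NetworkX's bundled copy is equivalent.
G = nx.karate_club_graph()

# asyn_lpa_communities breaks ties at random, so the seed can change the result.
results = []
for seed in range(5):  # seeds chosen arbitrarily for illustration
    communities = list(nx.community.asyn_lpa_communities(G, seed=seed))
    q = nx.community.modularity(G, communities)
    results.append((seed, communities, q))
    sizes = sorted((len(c) for c in communities), reverse=True)
    print(f"seed={seed}: {len(communities)} communities, sizes={sizes}, Q={q:.3f}")
```

If the community counts or sizes differ between seeds, the nodes that switch groups are the boundary members described above.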

---
## 7. Evaluating Against Ground Truth: NMI

Modularity tells us about partition *quality* but not *accuracy*. When we have ground truth (Karate Club factions, Football conferences), we can directly measure how well a detected partition matches it.

**Normalized Mutual Information (NMI)** is the standard metric:
- **NMI = 1.0**: the detected and true partitions are identical (up to label permutation)
- **NMI = 0.0**: the two partitions share no information — knowing one tells you nothing about the other
- NMI handles different numbers of communities gracefully — Louvain finding 4 communities can still score high if they are clean sub-splits of the 2 true factions
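
These properties are easy to check on tiny label vectors. The sketch below implements NMI by hand in plain Python, dividing mutual information by the arithmetic mean of the two entropies (which we believe matches the scikit-learn default). The function names are illustrative assumptions, not library API.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label vectors of equal length."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (illustrative sketch)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions are a single group
    return mutual_information(a, b) / ((h_a + h_b) / 2)


same_up_to_names = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
clean_sub_split = nmi([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 2, 2, 3, 3])
print(f"same partition, renamed labels: {same_up_to_names:.3f}")
print(f"unrelated partitions:           {unrelated:.3f}")
print(f"4 clean sub-splits of 2 groups: {clean_sub_split:.3f}")
```

Renaming labels leaves NMI at 1, and unrelated partitions score 0. A clean 4-way sub-split of 2 true groups scores 2/3: high, but below 1, because the finer partition carries extra entropy that the ground truth does not explain.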

In [None]:
# Ground-truth labels for Karate Club
gt_labels = partition_to_labels(G_karate, gt_partition)

# Compare all three algorithms
louvain_labels = partition_to_labels(G_karate, louvain_karate)
gn_labels = partition_to_labels(G_karate, gn_karate)
lpa_labels = partition_to_labels(G_karate, lpa_karate)

print("Karate Club — NMI vs ground truth:")
print(
    f"  Girvan-Newman (2 comms): NMI = {normalized_mutual_info_score(gt_labels, gn_labels):.3f}"
)
print(
    f"  Louvain ({len(louvain_karate)} comms):       NMI = {normalized_mutual_info_score(gt_labels, louvain_labels):.3f}"
)
print(
    f"  Label Propagation:      NMI = {normalized_mutual_info_score(gt_labels, lpa_labels):.3f}"
)

In [None]:
# NMI on the Football network (ground truth = conferences)
gt_fb_partition = []
conf_dict = nx.get_node_attributes(G_football, "conference")
conf_groups = defaultdict(set)
for node, conf in conf_dict.items():
    conf_groups[conf].add(node)
gt_fb_partition = list(conf_groups.values())

gt_fb_labels = partition_to_labels(G_football, gt_fb_partition)
louvain_fb_labels = partition_to_labels(G_football, louvain_fb)

# Also run Girvan-Newman on football (use best modularity split)
gn_fb_iter = girvan_newman(G_football)
best_gn_fb = None
best_gn_fb_q = -1
for comms in gn_fb_iter:
    comms = [set(c) for c in comms]
    q = nx.community.modularity(G_football, comms)
    if q > best_gn_fb_q:
        best_gn_fb_q = q
        best_gn_fb = comms
    if len(comms) > 20:
        break
gn_fb_labels = partition_to_labels(G_football, best_gn_fb)

print("Football — NMI vs ground truth (conferences):")
print(
    f"  Girvan-Newman ({len(best_gn_fb)} comms): NMI = {normalized_mutual_info_score(gt_fb_labels, gn_fb_labels):.3f}"
)
print(
    f"  Louvain ({len(louvain_fb)} comms):       NMI = {normalized_mutual_info_score(gt_fb_labels, louvain_fb_labels):.3f}"
)

**NMI vs Modularity**: Notice that the algorithm with the highest modularity doesn't always have the highest NMI. Modularity measures structural quality of a partition; NMI measures agreement with external labels. A partition can be structurally excellent (high Q) while disagreeing with human-defined ground truth. This is a fundamental tension in community detection — "structural communities" and "functional communities" don't always coincide.

---
## 8. Interactive Visualization with PyVis

Static plots are useful, but to explore communities interactively (zooming, dragging,
hovering over nodes) we can use PyVis.

In [None]:
viz.draw_pyvis(
    G_football,
    node_color=node_to_comm_fb,
    title="College Football — Louvain Communities",
    filename="football_communities.html",
)

---
## 9. Game of Thrones

Let's find communities in the GoT interaction network. Which characters cluster together?

**Predict before you run**: If you've read the first book (or watched the show), which characters do you expect to cluster together? Think about the major storylines — the Starks in Winterfell, the Lannisters in King's Landing, Daenerys across the sea. Will the algorithm recover these narrative clusters?

In [None]:
louvain_got = nx.community.louvain_communities(G_got, seed=SEED)
Q_got = nx.community.modularity(G_got, louvain_got)
print(f"GoT: {len(louvain_got)} communities, Q = {Q_got:.4f}")

# Show largest communities
sorted_comms = sorted(louvain_got, key=len, reverse=True)
for i, comm in enumerate(sorted_comms[:5]):
    members = sorted(comm)[:8]
    suffix = f"... (+{len(comm) - 8} more)" if len(comm) > 8 else ""
    print(f"  Community {i} ({len(comm)} members): {members}{suffix}")

In [None]:
# Color by community
node_to_comm_got = {}
for i, comm in enumerate(louvain_got):
    for n in comm:
        node_to_comm_got[n] = i

colors_got = [palette[node_to_comm_got[n] % len(palette)] for n in G_got.nodes()]
viz.draw_graph(
    G_got, node_color=colors_got, title="Game of Thrones — Louvain communities"
)

In [None]:
# Interactive — zoom in and explore!
viz.draw_pyvis(
    G_got,
    node_color=node_to_comm_got,
    title="Game of Thrones — Communities",
    filename="got_communities.html",
)

---
## 10. Tweak & Observe: Resolution Parameter

**Predict before you tweak**: The resolution parameter controls the "zoom level" of community detection. Resolution > 1 penalizes large communities, encouraging the algorithm to find more, smaller groups. Resolution < 1 does the opposite — it merges small communities into larger ones.
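
Resolution enters the modularity formula as a multiplier on the "expected" term: Q = Σ_c [ L_c / m − resolution × (d_c / 2m)² ]. The sketch below, a toy illustration rather than part of the football analysis, uses the `resolution` argument of `nx.community.modularity` to compare keeping two cliques separate against merging them into one community.

```python
from itertools import combinations

import networkx as nx

# Two 5-node cliques joined by a single bridge.
G = nx.Graph()
G.add_edges_from(combinations(range(0, 5), 2))
G.add_edges_from(combinations(range(5, 10), 2))
G.add_edge(4, 5)

split = [set(range(0, 5)), set(range(5, 10))]
merged = [set(range(10))]

# A larger resolution penalizes big communities more heavily.
for resolution in (0.05, 0.5, 1.0, 2.0):
    q_split = nx.community.modularity(G, split, resolution=resolution)
    q_merged = nx.community.modularity(G, merged, resolution=resolution)
    winner = "split" if q_split > q_merged else "merged"
    print(
        f"resolution={resolution:4.2f}: split={q_split:.3f}, "
        f"merged={q_merged:.3f} -> {winner}"
    )
```

At a very low resolution the single merged community scores higher, while at the default of 1.0 the two-clique split wins. This is the same lever Louvain pulls when you change its `resolution` argument.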

Before running the sweep below: at resolution = 2.0, will the number of communities roughly double, or change by less than that?

In [None]:
# ---- TWEAK: Change the resolution parameter ----
resolution = 1.0  # <-- change me (try 0.5, 1.0, 1.5, 2.0)

comms = nx.community.louvain_communities(G_football, resolution=resolution, seed=SEED)
Q = nx.community.modularity(G_football, comms)
print(f"Resolution = {resolution}: {len(comms)} communities, Q = {Q:.4f}")
print("Higher resolution → more, smaller communities")

In [None]:
# Sweep resolution values
resolutions = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
print(f"{'Resolution':>12s} {'Communities':>12s} {'Modularity':>12s}")
for r in resolutions:
    c = nx.community.louvain_communities(G_football, resolution=r, seed=SEED)
    q = nx.community.modularity(G_football, c)
    print(f"{r:12.2f} {len(c):12d} {q:12.4f}")

**Resolution as a zoom level**: The sweep shows that higher resolution yields more communities. Modularity (computed here at the standard resolution of 1, so the rows are comparable) first rises as finer partitions capture more structure, then falls once real communities start being split. There is no single "correct" resolution; the choice depends on whether you want broad groupings (organizational departments) or fine-grained clusters (project teams within departments).

---
## Summary

| Concept | Key insight |
|---------|-------------|
| **Community** | Group of nodes densely connected internally, sparse externally |
| **Applications** | Social media, biology, fraud detection, epidemiology, neuroscience |
| **Modularity (Q)** | Measures partition quality (0 = random, 0.3-0.7 = good) |
| **Louvain** | Bottom-up greedy algorithm that maximizes Q (fast, near-linear) |
| **Girvan-Newman** | Top-down edge removal by betweenness (interpretable, slower) |
| **Label Propagation** | Alternative: faster but non-deterministic |
| **NMI** | Measures agreement between detected and true communities (0 = none, 1 = perfect) |
| **Resolution** | Controls community granularity in Louvain |

Next week: **Network Dynamics** — how epidemics and information spread through networks.