# Marvel Social Network  

This project explores the Marvel Universe as a social network.  
Characters are represented as nodes, and their co-appearances in comics form the edges.  

The analysis applies network science techniques to study connectivity,  
identify central characters, and detect community structures among heroes.  

In [33]:
import pandas as pd
import networkx as nx
from pathlib import Path
import community as community_louvain  # provided by the python-louvain package
from collections import Counter

## Load dataset

The dataset includes two files:  
- `edges.csv` -> links heroes to the comics they appear in  
- `nodes.csv` -> lists each entity as either a hero or a comic  

Below is a preview of the first few rows to confirm structure.  

In [34]:
# Load raw CSVs
edges = pd.read_csv("../data/edges.csv")
nodes = pd.read_csv("../data/nodes.csv")

# Preview data
display(edges.head())
display(nodes.head())

|   | hero | comic |
|---|------|-------|
| 0 | 24-HOUR MAN/EMMANUEL | AA2 35 |
| 1 | 3-D MAN/CHARLES CHAN | AVF 4 |
| 2 | 3-D MAN/CHARLES CHAN | AVF 5 |
| 3 | 3-D MAN/CHARLES CHAN | COC 1 |
| 4 | 3-D MAN/CHARLES CHAN | H2 251 |


|   | node | type |
|---|------|------|
| 0 | 2001 10 | comic |
| 1 | 2001 8 | comic |
| 2 | 2001 9 | comic |
| 3 | 24-HOUR MAN/EMMANUEL | hero |
| 4 | 3-D MAN/CHARLES CHAN | hero |


## Build bipartite graph -> project to hero–hero network

The dataset links heroes to the comics they appear in.  
To explore how characters are connected, I first build a bipartite graph (heroes <-> comics).  
This is then projected into a hero–hero network, where edge weights reflect the number of shared comic appearances.  
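
To make the projection step concrete, here is a minimal pure-Python sketch of how shared appearances turn into edge weights. The hero and comic names are hypothetical toy data, not rows from the dataset, and the loop only illustrates the idea; the notebook itself relies on NetworkX for the real projection.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hero -> comic appearances (toy data, not from the dataset)
appearances = [
    ("HERO A", "COMIC 1"),
    ("HERO B", "COMIC 1"),
    ("HERO A", "COMIC 2"),
    ("HERO B", "COMIC 2"),
    ("HERO C", "COMIC 2"),
]

# Group heroes by comic (the comic side of the bipartite graph)
cast = {}
for hero, comic in appearances:
    cast.setdefault(comic, set()).add(hero)

# Every pair of heroes sharing a comic gains one unit of edge weight
weights = Counter()
for heroes_in_comic in cast.values():
    for pair in combinations(sorted(heroes_in_comic), 2):
        weights[pair] += 1

for (u, v), w in sorted(weights.items()):
    print(u, v, w)
```

HERO A and HERO B share two comics, so their edge gets weight 2; every other pair shares one.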

In [35]:
# Expected schema:
#  - edges: hero, comic
#  - nodes: node, type in {hero, comic}

# Check columns for sanity
print("Edges columns:", list(edges.columns))
print("Nodes columns:", list(nodes.columns))

# Build bipartite graph (heroes <-> comics)
B = nx.Graph()

heroes = set(nodes.loc[nodes["type"].str.lower() == "hero", "node"])
comics = set(nodes.loc[nodes["type"].str.lower() == "comic", "node"])

B.add_nodes_from(heroes, bipartite="hero")
B.add_nodes_from(comics, bipartite="comic")
# zip over the two columns yields the same (hero, comic) pairs as a row-wise apply, but faster
B.add_edges_from(zip(edges["hero"], edges["comic"]))

print(f"Bipartite graph: {B.number_of_nodes():,} nodes, {B.number_of_edges():,} edges")
print(f"#heroes={len(heroes):,}, #comics={len(comics):,}")

# Projected hero–hero network (edges weighted by number of shared comics)
G = nx.algorithms.bipartite.weighted_projected_graph(B, heroes)

print(f"Hero graph: {G.number_of_nodes():,} nodes, {G.number_of_edges():,} edges")

# Sample edges with weights
list(G.edges(data=True))[:5]

Edges columns: ['hero', 'comic']
Nodes columns: ['node', 'type']
Bipartite graph: 19,091 nodes, 96,104 edges
#heroes=6,439, #comics=12,651
Hero graph: 6,440 nodes, 171,644 edges


[('CADUCEUS', 'GENII/JASON KIMBALL', {'weight': 8}),
 ('CADUCEUS', 'VALKIN', {'weight': 1}),
 ('CADUCEUS', 'HARGEN', {'weight': 1}),
 ('CADUCEUS', 'THOR/DR. DONALD BLAK', {'weight': 1}),
 ('CADUCEUS', 'JUNIPER', {'weight': 2})]
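
The printed counts do not quite add up: 6,439 heroes plus 12,651 comics is 19,090, yet the bipartite graph reports 19,091 nodes and the projection reports 6,440. One possible explanation is that `edges.csv` mentions a name that `nodes.csv` does not list, since `add_edges_from` silently creates any node it has not seen before. The sketch below shows one way to check for that. `find_unlisted` is a hypothetical helper, not part of the notebook, and the toy names are made up.

```python
def find_unlisted(edge_pairs, hero_names, comic_names):
    """Return (heroes, comics) that appear in edges but not in the node lists."""
    hero_names, comic_names = set(hero_names), set(comic_names)
    missing_heroes, missing_comics = set(), set()
    for hero, comic in edge_pairs:
        if hero not in hero_names:
            missing_heroes.add(hero)
        if comic not in comic_names:
            missing_comics.add(comic)
    return missing_heroes, missing_comics


# Toy data (hypothetical names): "HERO X" is used in an edge but never declared
toy_edges = [("HERO A", "COMIC 1"), ("HERO X", "COMIC 1")]
missing_heroes, missing_comics = find_unlisted(toy_edges, {"HERO A"}, {"COMIC 1"})
print("Unlisted heroes:", missing_heroes)
print("Unlisted comics:", missing_comics)
```

In the notebook, the same check could be run as `find_unlisted(zip(edges["hero"], edges["comic"]), heroes, comics)`. Any name it returns would need to be reconciled (for example, a spelling mismatch between the two files) before the counts can be trusted.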

## Compute centrality metrics

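On the hero–hero graph `G`, compute:
- **degree_centrality**: fraction of the other heroes a character is directly connected to (edge weights ignored)
- **strength**: sum of a character's edge weights, i.e. its shared-comic counts added up over all neighbors
- **eigenvector**: weighted eigenvector centrality, which is high when a character's neighbors are themselves well connected

Results are previewed (top 10 by strength) and saved to `outputs/metrics_v1.csv`.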
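
As a quick illustration of the first two metrics, the sketch below computes them by hand on a toy weighted graph stored as a dict of dicts. The node names and weights are hypothetical; the formulas (neighbors divided by `n - 1`, and the sum of incident weights) are the standard definitions.

```python
# Toy weighted hero-hero graph as a dict of dicts (hypothetical names and weights)
adj = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3},
    "C": {"A": 1},
    "D": {},  # isolated hero with no co-appearances
}

n = len(adj)

# Degree centrality: number of neighbors divided by the maximum possible (n - 1)
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

# Strength: sum of the weights on a node's edges
strength = {node: sum(nbrs.values()) for node, nbrs in adj.items()}

for node in adj:
    print(node, round(degree_centrality[node], 3), strength[node])
```

Note that the two metrics can rank characters differently: B has fewer neighbors than A but most of A's strength comes from the single heavy A–B edge.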

In [36]:
# Degree centrality (unweighted)
deg = nx.degree_centrality(G)

# Strength = sum of edge weights
strength = dict(G.degree(weight="weight"))

# Eigenvector centrality (weighted)
eig = nx.eigenvector_centrality(G, weight="weight", max_iter=300, tol=1e-6)

# Assemble tidy table
metrics = (
    pd.DataFrame({
        "character": list(G.nodes()),
        "degree_centrality": [deg[n] for n in G.nodes()],
        "strength": [strength[n] for n in G.nodes()],
        "eigenvector": [eig[n] for n in G.nodes()],
    })
    .sort_values("strength", ascending=False)
    .reset_index(drop=True)
)

# Preview top 10
display(metrics.head(10))

# Save to outputs/
Path("outputs").mkdir(parents=True, exist_ok=True)
out_csv = "outputs/metrics_v1.csv"
metrics.to_csv(out_csv, index=False)
print(f"Saved: {out_csv}  (rows={len(metrics):,})")

|   | character | degree_centrality | strength | eigenvector |
|---|-----------|-------------------|----------|-------------|
| 0 | CAPTAIN AMERICA | 0.298028 | 16057 | 0.273361 |
| 1 | SPIDER-MAN/PETER PARKER | 0.272403 | 13730 | 0.130145 |
| 2 | IRON MAN/TONY STARK | 0.243205 | 11997 | 0.225258 |
| 3 | THOR/DR. DONALD BLAK | 0.204224 | 11558 | 0.19953 |
| 4 | THING/BENJAMIN J. GR | 0.22488 | 10772 | 0.231308 |
| 5 | WOLVERINE/LOGAN | 0.21463 | 10480 | 0.138351 |
| 6 | HUMAN TORCH/JOHNNY S | 0.219289 | 10377 | 0.227829 |
| 7 | SCARLET WITCH/WANDA | 0.211524 | 10168 | 0.215719 |
| 8 | MR. FANTASTIC/REED R | 0.21991 | 9886 | 0.222814 |
| 9 | VISION | 0.193042 | 9786 | 0.210616 |


Saved: outputs/metrics_v1.csv  (rows=6,440)
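
The `eigenvector` column can be read as the outcome of a power iteration: each hero's score is repeatedly updated with the weighted sum of its neighbors' scores until the values settle. The sketch below is a simplified pure-Python illustration on a hypothetical three-hero graph. It is not the NetworkX implementation, and the function name and toy weights are assumptions made for the example.

```python
import math


def power_iteration_centrality(adj, max_iter=300, tol=1e-6):
    """Eigenvector centrality of a weighted, undirected dict-of-dicts graph."""
    nodes = list(adj)
    x = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        prev = x
        # Start from the previous scores (i.e. iterate with A + I), then add
        # each neighbor's previous score scaled by the edge weight
        x = dict(prev)
        for u in nodes:
            for v, w in adj[u].items():
                x[v] += prev[u] * w
        # Rescale to unit Euclidean length so the scores stay bounded
        norm = math.sqrt(sum(val * val for val in x.values())) or 1.0
        x = {v: val / norm for v, val in x.items()}
        if sum(abs(x[v] - prev[v]) for v in nodes) < len(nodes) * tol:
            return x
    raise RuntimeError("power iteration did not converge")


# Hypothetical toy graph: HUB co-appears twice with each of two sidekicks
toy = {
    "HUB": {"LEFT": 2, "RIGHT": 2},
    "LEFT": {"HUB": 2},
    "RIGHT": {"HUB": 2},
}
scores = power_iteration_centrality(toy)
print({name: round(score, 3) for name, score in scores.items()})
```

The hub ends up with the highest score because both of its neighbors feed into it, while each sidekick only inherits from the hub. This is also why a character with high strength but less central neighbors can still have a comparatively low eigenvector score.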


## Community detection

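Detect character communities in the hero–hero network using the Louvain method.
Communities represent clusters of characters who co-appear frequently across comics.
Louvain is a randomized heuristic, so the number of communities and their labels can differ between runs unless a seed is passed via the `random_state` argument of `best_partition`.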
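
The Louvain method searches for a partition with high modularity, a score that compares the edge weight inside communities with what would be expected if edges were placed at random. As a rough illustration, the sketch below computes weighted modularity by hand for a given partition. The helper name and the toy graph (two triangles joined by one bridge) are hypothetical and only meant to show what the score rewards.

```python
def weighted_modularity(edges, partition):
    """Modularity Q of an undirected graph given as (u, v, weight) triples."""
    total = sum(w for _, _, w in edges)  # m: total edge weight
    internal = {}  # edge weight fully inside each community
    degree = {}    # summed node strength per community
    for u, v, w in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + w
        degree[cv] = degree.get(cv, 0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + w
    # Q = sum over communities of (internal / m) - (degree / 2m)^2
    return sum(
        internal.get(c, 0) / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Two triangles joined by a single bridge edge (all weights 1)
toy_edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split = weighted_modularity(toy_edges, split)
q_merged = weighted_modularity(toy_edges, merged)
print(round(q_split, 4), round(q_merged, 4))
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the kind of partition Louvain is designed to find. For the real graph, python-louvain offers `community_louvain.modularity(partition, G)` to score the detected partition.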

In [37]:
# Run Louvain community detection
partition = community_louvain.best_partition(G, weight="weight")

# Attach community labels to DataFrame
metrics["community"] = metrics["character"].map(partition)

print(f"Detected {len(set(partition.values()))} communities")
display(metrics.head(10))

Detected 52 communities


|   | character | degree_centrality | strength | eigenvector | community |
|---|-----------|-------------------|----------|-------------|-----------|
| 0 | CAPTAIN AMERICA | 0.298028 | 16057 | 0.273361 | 5 |
| 1 | SPIDER-MAN/PETER PARKER | 0.272403 | 13730 | 0.130145 | 6 |
| 2 | IRON MAN/TONY STARK | 0.243205 | 11997 | 0.225258 | 5 |
| 3 | THOR/DR. DONALD BLAK | 0.204224 | 11558 | 0.19953 | 0 |
| 4 | THING/BENJAMIN J. GR | 0.22488 | 10772 | 0.231308 | 11 |
| 5 | WOLVERINE/LOGAN | 0.21463 | 10480 | 0.138351 | 2 |
| 6 | HUMAN TORCH/JOHNNY S | 0.219289 | 10377 | 0.227829 | 11 |
| 7 | SCARLET WITCH/WANDA | 0.211524 | 10168 | 0.215719 | 5 |
| 8 | MR. FANTASTIC/REED R | 0.21991 | 9886 | 0.222814 | 11 |
| 9 | VISION | 0.193042 | 9786 | 0.210616 | 5 |


In [38]:
community_sizes = Counter(partition.values())
print("Top 5 communities by size:")
for comm, size in community_sizes.most_common(5):
    print(f"Community {comm}: {size} heroes")

Top 5 communities by size:
Community 2: 1313 heroes
Community 5: 1170 heroes
Community 6: 932 heroes
Community 17: 611 heroes
Community 11: 525 heroes


Note: Louvain community IDs (e.g., 2, 5, 6) are arbitrary labels and carry no ranking.  
The list above is ordered by community size (number of heroes), not by community ID.
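
Because the raw IDs carry no meaning, it can help to relabel communities by size rank so that 0 always refers to the largest group. The sketch below shows one way to do this. `relabel_by_size` is a hypothetical helper, not part of the notebook, and the toy partition is made up.

```python
from collections import Counter


def relabel_by_size(partition):
    """Map community IDs to ranks: 0 = largest community, 1 = next, and so on."""
    sizes = Counter(partition.values())
    # Sort by size (descending), then by original ID so ties are deterministic
    ranked = sorted(sizes, key=lambda c: (-sizes[c], c))
    rank = {old: new for new, old in enumerate(ranked)}
    return {node: rank[c] for node, c in partition.items()}


# Toy partition with arbitrary IDs 7, 3 and 9 (hypothetical)
toy_partition = {"A": 7, "B": 7, "C": 7, "D": 3, "E": 3, "F": 9}
relabeled = relabel_by_size(toy_partition)
print(relabeled)
```

In the notebook, the result could be attached with something like `metrics["community_rank"] = metrics["character"].map(relabel_by_size(partition))`, where `community_rank` is an assumed column name.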