# Marvel Social Network  

This project explores the Marvel Universe as a social network.  
Characters are represented as nodes, and their co-appearances in comics form the edges.  

The analysis applies network science techniques to study connectivity,  
identify central characters, and detect community structures among heroes.  

In [18]:
import pandas as pd
import networkx as nx

## Load dataset

The dataset includes two files:  
- `edges.csv` -> links heroes to the comics they appear in  
- `nodes.csv` -> lists each entity as either a hero or a comic  

Below is a preview of the first few rows to confirm structure.  

In [19]:
# Load raw CSVs
edges = pd.read_csv("../data/edges.csv")
nodes = pd.read_csv("../data/nodes.csv")

# Preview data
display(edges.head())
display(nodes.head())

|   | hero | comic |
|---|------|-------|
| 0 | 24-HOUR MAN/EMMANUEL | AA2 35 |
| 1 | 3-D MAN/CHARLES CHAN | AVF 4 |
| 2 | 3-D MAN/CHARLES CHAN | AVF 5 |
| 3 | 3-D MAN/CHARLES CHAN | COC 1 |
| 4 | 3-D MAN/CHARLES CHAN | H2 251 |


|   | node | type |
|---|------|------|
| 0 | 2001 10 | comic |
| 1 | 2001 8 | comic |
| 2 | 2001 9 | comic |
| 3 | 24-HOUR MAN/EMMANUEL | hero |
| 4 | 3-D MAN/CHARLES CHAN | hero |
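
The preview matches the two-file layout described for the dataset. As a stricter alternative to checking it by eye, the sketch below turns that expectation into explicit checks. `check_schema` is a hypothetical helper (not part of the dataset or of any library), and the toy frames are made up for illustration.

```python
import pandas as pd


def check_schema(edges: pd.DataFrame, nodes: pd.DataFrame) -> None:
    """Raise ValueError if the frames do not match the expected layout."""
    missing_cols = ({"hero", "comic"} - set(edges.columns)) | (
        {"node", "type"} - set(nodes.columns)
    )
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")

    # Assumption: `type` only ever holds "hero" or "comic" (case-insensitive)
    bad_types = set(nodes["type"].astype(str).str.lower()) - {"hero", "comic"}
    if bad_types:
        raise ValueError(f"unexpected node types: {sorted(bad_types)}")


# Toy frames (hypothetical labels) with the same columns as the real files
schema_nodes = pd.DataFrame({"node": ["HERO A", "COMIC 1"], "type": ["hero", "comic"]})
schema_edges = pd.DataFrame({"hero": ["HERO A"], "comic": ["COMIC 1"]})

check_schema(schema_edges, schema_nodes)  # passes silently
```

On the notebook's own frames the call would be `check_schema(edges, nodes)`.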


## Build bipartite graph -> project to hero–hero network

The dataset links heroes to the comics they appear in, not to each other.  
To explore how characters are connected, I first build a bipartite graph (heroes <-> comics).  
This graph is then projected onto the heroes: two heroes are linked if they share at least one comic, and the edge weight is the number of comics they share.  
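
To make the projection concrete, here is a minimal sketch on a toy graph. The hero and comic labels are hypothetical; only the NetworkX calls mirror the ones used in this notebook.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy data (hypothetical labels): which hero appears in which comic
toy_pairs = [
    ("HERO A", "COMIC 1"),
    ("HERO B", "COMIC 1"),
    ("HERO A", "COMIC 2"),
    ("HERO B", "COMIC 2"),
    ("HERO C", "COMIC 2"),
]
toy_heroes = {hero for hero, _ in toy_pairs}
toy_comics = {comic for _, comic in toy_pairs}

# Bipartite graph: heroes on one side, comics on the other
toy_B = nx.Graph()
toy_B.add_nodes_from(toy_heroes, bipartite="hero")
toy_B.add_nodes_from(toy_comics, bipartite="comic")
toy_B.add_edges_from(toy_pairs)

# Project onto heroes; each weight counts the comics the two heroes share
toy_G = bipartite.weighted_projected_graph(toy_B, toy_heroes)

toy_weights = {
    frozenset((u, v)): data["weight"] for u, v, data in toy_G.edges(data=True)
}
for pair, weight in sorted(toy_weights.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

HERO A and HERO B share both comics, so their edge gets weight 2; HERO C shares only COMIC 2 with each of them, so those two edges get weight 1.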

In [20]:
# Expected schema:
#  - edges: hero, comic
#  - nodes: node, type in {hero, comic}

# Check columns for sanity
print("Edges columns:", list(edges.columns))
print("Nodes columns:", list(nodes.columns))

# Build bipartite graph (heroes <-> comics)
B = nx.Graph()

heroes = set(nodes.loc[nodes["type"].str.lower() == "hero", "node"])
comics = set(nodes.loc[nodes["type"].str.lower() == "comic", "node"])

B.add_nodes_from(heroes, bipartite="hero")
B.add_nodes_from(comics, bipartite="comic")
# add_edges_from also creates any endpoint that is missing from nodes.csv
B.add_edges_from(zip(edges["hero"], edges["comic"]))

print(f"Bipartite graph: {B.number_of_nodes():,} nodes, {B.number_of_edges():,} edges")
print(f"#heroes={len(heroes):,}, #comics={len(comics):,}")

# Projected hero–hero network (edges weighted by number of shared comics)
G = nx.algorithms.bipartite.weighted_projected_graph(B, heroes)

print(f"Hero graph: {G.number_of_nodes():,} nodes, {G.number_of_edges():,} edges")

# Sample edges with weights
list(G.edges(data=True))[:5]

Edges columns: ['hero', 'comic']
Nodes columns: ['node', 'type']
Bipartite graph: 19,091 nodes, 96,104 edges
#heroes=6,439, #comics=12,651
Hero graph: 6,440 nodes, 171,644 edges


[('CADUCEUS', 'GENII/JASON KIMBALL', {'weight': 8}),
 ('CADUCEUS', 'VALKIN', {'weight': 1}),
 ('CADUCEUS', 'HARGEN', {'weight': 1}),
 ('CADUCEUS', 'THOR/DR. DONALD BLAK', {'weight': 1}),
 ('CADUCEUS', 'JUNIPER', {'weight': 2})]
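
The printed counts do not quite add up: 6,439 heroes plus 12,651 comics is 19,090, yet the bipartite graph has 19,091 nodes, and the hero graph has 6,440 nodes, not 6,439. One reading consistent with both numbers is that a single label in the `hero` column of `edges.csv` has no matching row in `nodes.csv`. `add_edges_from` creates such endpoints silently, and the projection can then pick them up as neighbours of known heroes. This is an inference from the counts, not something verified here. The sketch below shows one way to look for such labels; `unmatched_labels` is a hypothetical helper and the toy frames are made up.

```python
import pandas as pd


def unmatched_labels(edges: pd.DataFrame, nodes: pd.DataFrame) -> dict:
    """Return edge endpoints that have no matching row in `nodes`."""
    types = nodes["type"].str.lower()
    known_heroes = set(nodes.loc[types == "hero", "node"])
    known_comics = set(nodes.loc[types == "comic", "node"])
    return {
        "hero": sorted(set(edges["hero"]) - known_heroes),
        "comic": sorted(set(edges["comic"]) - known_comics),
    }


# Toy frames (hypothetical): one hero label carries a stray trailing space
toy_nodes_df = pd.DataFrame(
    {"node": ["HERO A", "HERO B", "COMIC 1"], "type": ["hero", "hero", "comic"]}
)
toy_edges_df = pd.DataFrame(
    {"hero": ["HERO A", "HERO B", "HERO B "], "comic": ["COMIC 1"] * 3}
)

toy_missing = unmatched_labels(toy_edges_df, toy_nodes_df)
print(toy_missing)
```

On the real data the call would be `unmatched_labels(edges, nodes)`. If it reports anything, the labels could be reconciled before `B` is built so that the node counts agree.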