# 03. Network Analysis (Weeks 3-5)

## Analyzing the Architecture of Fate

This notebook performs the core network science analysis, mapping directly to the course curriculum.

### Course Concepts Applied
1.  **Week 5: Community Detection (The Bipolar World)**
    -   **Method:** Louvain Algorithm.
    -   **Application:** Quantifying the structural split between Intelligence (Int) and Faith.
2.  **Week 4: Centrality (The Tragedy of Utility)**
    -   **Method:** Betweenness Centrality.
    -   **Application:** Identifying "Tragic Hubs" (Merchants).
3.  **Week 3: Network Structure (The Illusion of Choice)**
    -   **Method:** Shortest Paths & Assortativity.
    -   **Application:** Measuring the "cost" of endings and the segregation of items.
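For Week 5, Louvain is the `community_louvain.best_partition` call in the cell below. It greedily moves nodes between communities to maximize modularity, `Q = sum_c [L_c/m - (d_c/(2m))^2]`. Here `m` is the number of edges, `L_c` is the number of edges inside community `c`, and `d_c` is the summed degree of its nodes.

The pure-Python sketch below only *scores* a given partition. It is not Louvain itself, and the `modularity` helper and the toy graph are hypothetical illustrations. A clean two-faction split scores well above 0, while lumping every node into one community scores exactly 0.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs; partition: dict mapping node -> community id.
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2].
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles ("Int" side, "Faith" side) joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
split = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
merged = dict.fromkeys(split, 0)

print(round(modularity(edges, split), 4))   # → 0.3571
print(round(modularity(edges, merged), 4))  # → 0.0
```

On the real graph, python-louvain exposes the same score as `community_louvain.modularity(partition, G)`. That gives one number for how sharply the detected factions are separated.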
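For Week 4, betweenness centrality scores a node by the fraction of shortest paths between all other pairs of nodes that pass through it. This is why a "merchant" that connects otherwise separate regions ranks highly. `nx.betweenness_centrality` computes this with Brandes' algorithm.

The sketch below is a minimal pure-Python version for unweighted, undirected graphs. Its scaling is intended to mirror networkx's default `normalized=True`, which divides the pair count by `(n-1)(n-2)/2`. The `betweenness` helper and the toy graph, in which a hypothetical hub `m` joins two triangles, are illustrations only.

```python
from collections import deque


def betweenness(adj, normalized=True):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj: dict mapping each node to a list of neighbours (edges listed both ways).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # 1. BFS from s: count shortest paths (sigma) and remember predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:               # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2. Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    n = len(adj)
    if normalized and n > 2:
        scale = 1 / ((n - 1) * (n - 2))   # = 0.5 / ((n-1)(n-2)/2)
    else:
        scale = 0.5
    return {v: c * scale for v, c in bc.items()}


# Hypothetical toy graph: a "merchant" m bridging two triangles.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "m"),
         ("m", "x"), ("x", "y"), ("y", "z"), ("x", "z")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

scores = betweenness(adj)
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))  # → m 0.6
```

Here `m` lies on all 9 shortest paths between the two triangles, giving 9 / 15 = 0.6. The nodes `c` and `x`, which attach the triangles to `m`, score 8 / 15. The peripheral nodes score 0.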
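For Week 3, item segregation can be measured with Newman's attribute assortativity coefficient, `r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2)`. Here `e_ij` is the fraction of edges joining a type-`i` node to a type-`j` node, with each undirected edge counted once in each direction, and `a_i = sum_j e_ij`. A value of `r = 1` means the types are perfectly segregated, 0 means no preference beyond random mixing, and negative values mean the types prefer each other.

The "cost" of reaching a node is sketched as an unweighted shortest-path length computed by breadth-first search. Both helpers and the toy graph are hypothetical pure-Python stand-ins. The corresponding networkx calls would be `nx.attribute_assortativity_coefficient` and `nx.shortest_path_length`.

```python
from collections import Counter, deque


def attribute_assortativity(edges, attr):
    """Newman's assortativity coefficient r for a categorical node attribute
    on an undirected graph. Undefined (division by zero) if all nodes share one type."""
    mix = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions -> symmetric mixing matrix.
        mix[(attr[u], attr[v])] += 1
        mix[(attr[v], attr[u])] += 1
    total = sum(mix.values())
    e_same = sum(c for (i, j), c in mix.items() if i == j) / total  # sum_i e_ii
    a = Counter()
    for (i, _), c in mix.items():
        a[i] += c / total                                             # a_i = sum_j e_ij
    expected = sum(x * x for x in a.values())                         # sum_i a_i^2
    return (e_same - expected) / (1 - expected)


def hop_distance(adj, source, target):
    """Unweighted shortest-path length via breadth-first search (None if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None


# Hypothetical toy graph: an "Int" triangle and a "Faith" triangle joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
kind = {"a": "Int", "b": "Int", "c": "Int", "x": "Faith", "y": "Faith", "z": "Faith"}
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

print(round(attribute_assortativity(edges, kind), 3))  # → 0.714
print(hop_distance(adj, "a", "z"))                      # → 3
```

Dropping the single cross edge (`edges[:6]`) makes the two types perfectly segregated and gives `r = 1.0`.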

import pandas as pd
import networkx as nx
import community.community_louvain as community_louvain # requires: pip install python-louvain
from pathlib import Path
import matplotlib.pyplot as plt

# ==========================================
# 1. CONFIGURATION & SETUP
# ==========================================
# Machine-specific absolute path: adjust PROCESSED_DIR to your local checkout
PROCESSED_DIR = Path("C:/Users/biagu/Documents/GitHub/social_graphs_project/data/processed")

NODES_PATH = PROCESSED_DIR / "nodes_enriched.csv"
EDGES_PATH = PROCESSED_DIR / "edges_enriched.csv"
OUTPUT_PATH = PROCESSED_DIR / "nodes_analyzed.csv"

print(f"Loading data from: {PROCESSED_DIR}")

# ==========================================
# 2. BUILD GRAPH
# ==========================================
nodes_df = pd.read_csv(NODES_PATH)
edges_df = pd.read_csv(EDGES_PATH)

print(f"Nodes loaded: {len(nodes_df)}")
print(f"Edges loaded: {len(edges_df)}")

# Initialize Graph
G = nx.Graph()

# Add Nodes
for _, row in nodes_df.iterrows():
    # We treat 'node_id' as the unique identifier
    G.add_node(row['node_id'], 
               name=row['name'], 
               type=row['node_type'], 
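               # Store missing descriptions as "" rather than the string "nan"
               description=str(row['description']) if pd.notna(row['description']) else "")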

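# Add Edges. nx.Graph is undirected and simple: duplicate or reversed
# source/target pairs collapse into one edge (the last row's attributes win),
# so the edge count can be lower than the number of CSV rows.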
for _, row in edges_df.iterrows():
    G.add_edge(row['source'], row['target'], 
               relation=row['relationship'], 
               type=row['edge_type'])

print(f"Graph constructed: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges.")

# ==========================================
# 3. COMMUNITY DETECTION (The Factions)
# ==========================================
print("\n--- Detecting Communities (Louvain) ---")
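# Compute partition. Louvain is randomized, so community IDs and sizes can
# change between runs; pass random_state=<int> to best_partition to fix them.
partition = community_louvain.best_partition(G)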

# Add 'community' column to the DataFrame
nodes_df['community'] = nodes_df['node_id'].map(partition)

# Analyze Community Sizes
comm_counts = nodes_df['community'].value_counts()
print(f"Total Communities Detected: {len(comm_counts)}")
print("\nTop 10 Largest Communities:")
print(comm_counts.head(10))

# Print sample members of top communities to verify they make sense
print("\n--- Community Membership Preview ---")
for comm_id in comm_counts.head(5).index:
    members = nodes_df[nodes_df['community'] == comm_id]['name'].tolist()
    # Handle cases where names might be missing/NaN
    clean_members = [str(m) for m in members if pd.notna(m)]
    print(f"\nCommunity {comm_id} (Size: {len(members)}):")
    print(f"  Sample Members: {', '.join(clean_members[:8])}...")

# ==========================================
# 4. CENTRALITY ANALYSIS (The "True" Lore Bridges)
# ==========================================
print("\n--- Calculating Centrality (Filtered) ---")

# 1. Define Generic Wiki Headers to Filter Out
# These distort centrality because everything links to "Items" or "Weapons"
generic_headers = [
    "Items", "Weapons", "Armors", "Incantations", "Sorceries", 
    "Talismans", "Ash of War", "Spirit Ashes", "Shields", 
    "Consumables", "Key Items", "Bosses", "NPCs", "Locations",
    "Creatures", "Enemies", "Materials"
]

# 2. Create a temporary view of the graph without these nodes
G_lore = G.copy()
nodes_to_remove = [n for n, data in G_lore.nodes(data=True) if data.get('name') in generic_headers]
G_lore.remove_nodes_from(nodes_to_remove)

print(f"Temporarily removed {len(nodes_to_remove)} generic header nodes for accurate lore analysis.")

# 3. Calculate Betweenness Centrality
betweenness = nx.betweenness_centrality(G_lore)

# 4. Map back to DataFrame (fill NaN with 0 for nodes we removed)
nodes_df['centrality'] = nodes_df['node_id'].map(betweenness).fillna(0)

# 5. Show Top Characters
print("\nTop 15 Most Central Characters/Items (Lore Accurate):")
print(nodes_df[['name', 'node_type', 'centrality']]
      .sort_values('centrality', ascending=False)
      .head(15))

# ==========================================
# 5. EXPORT RESULTS
# ==========================================
# Save the dataframe with 'community' and 'centrality' added
nodes_df.to_csv(OUTPUT_PATH, index=False)
print(f"\n✅ Analysis Complete. Saved results to: {OUTPUT_PATH}")

Loading data from: C:\Users\biagu\Documents\GitHub\social_graphs_project\data\processed
Nodes loaded: 2410
Edges loaded: 4805
Graph constructed: 2410 nodes, 4669 edges.

--- Detecting Communities (Louvain) ---
Total Communities Detected: 543

Top 10 Largest Communities:
community
9      310
6      208
41     205
100    151
126    106
117     97
81      80
32      76
197     76
199     74
Name: count, dtype: int64

--- Community Membership Preview ---

Community 9 (Size: 310):
  Sample Members: Bloody Finger, Grave Glovewort [1], Grave Glovewort [2], Ghost Glovewort [1], Fireproof Dried Liver, Fire Grease, Oil Pot, Remembrance Of The Blasphemous...

Community 6 (Size: 208):
  Sample Members: Memory Of Grace, Smithing Stone [2], Ancient Dragon Smithing Stone, Somber Ancient Dragon Smithing Stone, Rowa Raisin, Pauper's Rune, Pickled Turtle Neck, Gold-pickled Fowl Foot...

Community 41 (Size: 205):
  Sample Members: Armorer's Cookbook [2], Armorer's Cookbook [3], Armorer's Cookbook [4], Ar