# 02. Graph Construction (Week 2)

## Week 2: Graph Theory & NetworkX

This notebook processes the raw API data into a multimodal graph that captures the structural mechanics of the game.

### Course Concepts Applied
-   **Nodes & Edges:** Defining what constitutes a node and an edge.
-   **Attributes:** Attaching metadata to nodes for later analysis.
-   **Graph Construction:** Building the network structure from tabular data.

### 1. Data for the graph

For the graph, we utilize data from the Elden Ring API and scrapped data from the Elden Ring. 

From the API we get data about:
- items
- weapons
- NPCs 
- locations
- bosses
- armors
- talismans
- incantations

From the Wiki page we get data about:
- armors
- bosses
- npcs
- weapons

In [3]:
import json
import re
import pandas as pd
import networkx as nx
from pathlib import Path
from typing import List, Dict, Set, Tuple
from dataclasses import dataclass, asdict
from itertools import combinations
import os

# ==========================================
# 1. CONFIGURATION & SETUP
# ==========================================

# Absolute paths as confirmed working
project_root = Path("C:/Users/biagu/Documents/GitHub/social_graphs_project")

API_DIR = Path("C:/Users/biagu/Documents/GitHub/social_graphs_project/data/raw")
WIKI_DIR = Path("C:/Users/biagu/Documents/GitHub/social_graphs_project/data/scraped")
PROCESSED_DIR = Path("C:/Users/biagu/Documents/GitHub/social_graphs_project/data/processed")

# Create output directory
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project Root: {project_root}")
print(f"API Data: {API_DIR}")
print(f"Wiki Data: {WIKI_DIR}")

# Map Wiki filenames to Node Types
WIKI_FILES = {
    "armor.json": "armor",
    "bosses.json": "boss",
    "npcs.json": "npc",
    "weapons.json": "weapon"
}

# API Endpoints
API_ENDPOINTS = [
    "items", "weapons", "npcs", "locations", 
    "bosses", "armors", "talismans", "incantations"
]

@dataclass
class NodeRecord:
    node_id: str
    node_type: str
    name: str
    description: str
    source: str
    metadata: dict

@dataclass
class EdgeRecord:
    source: str
    target: str
    relationship: str
    edge_type: str
    metadata: dict

Project Root: C:\Users\biagu\Documents\GitHub\social_graphs_project
API Data: C:\Users\biagu\Documents\GitHub\social_graphs_project\data\raw
Wiki Data: C:\Users\biagu\Documents\GitHub\social_graphs_project\data\scraped


### 2. Helper Functions & Data Loading
This section defines functions to normalize names, clean text, and safely load JSON files. It also contains the primary logic for merging the structured API data with the unstructured Wiki data into a single NodeRecord format.

In [4]:
# ==========================================
# 2. HELPER FUNCTIONS 
# ==========================================

def normalize_name(name):
    if not name: return None
    # Lowercase, strip, remove special chars
    return name.lower().strip().replace("'", "").replace("-", " ")

def clean_text(value: str) -> str:
    if not value: return ""
    return re.sub(r"\s+", " ", value).strip()

def explode_locations(raw_value: str) -> List[str]:
    """Splits location strings into individual location names."""
    if not raw_value: return []
    if not isinstance(raw_value, str): return []
    
    seps = [",", "/", " and ", " & ", ";"]
    parts = [raw_value]
    for sep in seps:
        nxt = []
        for part in parts:
            nxt.extend(part.split(sep))
        parts = nxt
    return [p.strip() for p in parts if p.strip()]

def load_json_safe(path):
    if not path.exists():
        print(f"  ❌ File not found: {path}")
        return []
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception as e:
        print(f"  ⚠️ Error loading {path.name}: {e}")
        return []


### 3. Nodes and Edge Generation

It contains the primary logic for merging the structured API data with the unstructured Wiki data into a single NodeRecord format.

Here we construct the relationships (Edges) between nodes. We use three specific techniques:

Explicit Edges: Linking items to locations via "Found In" metadata and "Drops".

Shared Locations: Linking characters (Bosses/NPCs) if they inhabit the same location bucket.

Mentions (NLP): Searching descriptions for mentions of other nodes (Lore connections).

In [5]:
# ==========================================
# 3. DATA LOADING & MERGING
# ==========================================

def build_merged_nodes():
    nodes_map = {} 

    print("\n--- Loading API Data ---")
    for endpoint in API_ENDPOINTS:
        data = load_json_safe(API_DIR / f"{endpoint}.json")
        for item in data:
            name = item.get("name")
            if not name: continue
            norm_name = normalize_name(name)
            
            n_type = endpoint[:-1] if endpoint.endswith('s') else endpoint
            nodes_map[norm_name] = NodeRecord(
                node_id=item.get("id") or f"api_{norm_name.replace(' ', '_')}",
                node_type=n_type,
                name=name,
                description=clean_text(item.get("description") or item.get("effect")),
                source="api",
                metadata={k:v for k,v in item.items() if k not in ['id', 'name', 'description', 'effect']}
            )

    print("\n--- Loading Wiki Data ---")
    for filename, n_type in WIKI_FILES.items():
        data = load_json_safe(WIKI_DIR / filename)
        for item in data:
            name = item.get("name")
            if not name: continue
            norm_name = normalize_name(name)
            
            infobox = item.get("infobox", {})
            wiki_desc = clean_text(item.get("description"))
            
            if norm_name in nodes_map:
                existing = nodes_map[norm_name]
                if wiki_desc and wiki_desc not in existing.description:
                    existing.description = (existing.description + " " + wiki_desc).strip()
                existing.metadata.update(infobox)
                existing.metadata['wiki_url'] = item.get("url")
                existing.source = "merged"
            else:
                nodes_map[norm_name] = NodeRecord(
                    node_id=f"wiki_{norm_name.replace(' ', '_')}",
                    node_type=n_type,
                    name=name,
                    description=wiki_desc,
                    source="wiki",
                    metadata=infobox
                )
    
    return list(nodes_map.values())

# ==========================================
# 4. EDGE GENERATION 
# ==========================================

def build_edges(nodes: List[NodeRecord]):
    edges = []
    node_lookup = {normalize_name(n.name): n.node_id for n in nodes}
    location_ids = {n.node_id for n in nodes if n.node_type == 'location'}
    
    print("\n--- Generating Edges ---")
    
    # 1. EXPLICIT EDGES (Location & Drops)
    print("Processing Explicit Edges (Found In / Drops)...")
    for node in nodes:
        # A. Location Edges
        # Check both API 'location' and Wiki 'Location' fields
        loc_str = node.metadata.get("location") or node.metadata.get("Location")
        if loc_str:
            # Use explode_locations to handle lists like "Limgrave, Stormhill"
            possible_locs = explode_locations(str(loc_str))
            for loc_name in possible_locs:
                norm_loc = normalize_name(loc_name)
                # Try direct match
                target_id = node_lookup.get(norm_loc)
                
                if not target_id:
                     for known_loc_name, known_loc_id in node_lookup.items():
                        if known_loc_id in location_ids and known_loc_name in norm_loc:
                            target_id = known_loc_id
                            break
                
                if target_id and target_id in location_ids:
                    edges.append(EdgeRecord(node.node_id, target_id, "found_in", "location", {}))

        # B. Drops Edges
        drops = node.metadata.get("drops") or []
        if isinstance(drops, list):
            for drop_name in drops:
                drop_id = node_lookup.get(normalize_name(drop_name))
                if drop_id:
                    edges.append(EdgeRecord(node.node_id, drop_id, "drops", "drop", {}))

    # 2. SHARED LOCATION EDGES
    # Connects Bosses/NPCs that are in the same location bucket
    print("Processing Shared Location Edges...")
    loc_buckets: Dict[str, List[str]] = {}
    
    for node in nodes:
        # Only group Characters (Bosses, NPCs)
        if node.node_type not in ['boss', 'npc']:
            continue
            
        loc_str = node.metadata.get("location") or node.metadata.get("Location")
        if loc_str:
            for loc in explode_locations(str(loc_str)):
                norm = normalize_name(loc)
                if norm:
                    loc_buckets.setdefault(norm, []).append(node.node_id)

    for loc_name, node_ids in loc_buckets.items():
        # If multiple chars are in this location, connect them
        if len(node_ids) > 1:
            # Sort to ensure unique pairs
            for a, b in combinations(sorted(set(node_ids)), 2):
                edges.append(EdgeRecord(a, b, "share_location", "share_location", {"location": loc_name}))

    # 3. MENTIONS
    # Check if ANY node name appears in ANY node description
    print("Processing Mention Edges (NLP)...")
    
    # Filter short names to avoid noise (e.g., "Map", "Key")
    searchable_names = {name: nid for name, nid in node_lookup.items() if len(name) >= 5}
    
    for node in nodes:
        desc = (node.description or "").lower()
        if not desc: continue
        
        # Check against all searchable names
        for target_name, target_id in searchable_names.items():
            if target_id == node.node_id: continue # No self-loops
            
            if target_name in desc:
                edges.append(EdgeRecord(node.node_id, target_id, "mentions", "mention", {}))

    return edges



### 4. Execution & Export

Runs the pipeline and saves the nodes_enriched.csv and edges_enriched.csv files to the processed directory.

In [6]:
# ==========================================
# 5. EXECUTION
# ==========================================

nodes = build_merged_nodes()

if not nodes:
    print("❌ Critical Error: No nodes loaded. Verify your paths!")
else:
    print(f"Nodes Loaded: {len(nodes)}")
    
    edges = build_edges(nodes)
    print(f"Edges Created: {len(edges)}")

    node_df = pd.DataFrame([asdict(n) for n in nodes])
    edge_df = pd.DataFrame([asdict(e) for e in edges])

    # JSON serialize metadata
    if 'metadata' in node_df.columns:
        node_df['metadata'] = node_df['metadata'].apply(lambda x: json.dumps(x, ensure_ascii=False))
    if 'metadata' in edge_df.columns:
        edge_df['metadata'] = edge_df['metadata'].apply(lambda x: json.dumps(x, ensure_ascii=False))

    node_df.to_csv(PROCESSED_DIR / "nodes_enriched.csv", index=False)
    edge_df.to_csv(PROCESSED_DIR / "edges_enriched.csv", index=False)
    print(f"✅ Saved enriched graph to {PROCESSED_DIR}")


--- Loading API Data ---

--- Loading Wiki Data ---
Nodes Loaded: 2410

--- Generating Edges ---
Processing Explicit Edges (Found In / Drops)...
Processing Shared Location Edges...
Processing Mention Edges (NLP)...
Edges Created: 4805
✅ Saved enriched graph to C:\Users\biagu\Documents\GitHub\social_graphs_project\data\processed
