# Measure 4: Social Network Analysis ("The Hub")

## 1. Introduction & Objective
**Objective:** To visualize the social structure of *Anna Karenina* and quantify character interactions using Network Theory metrics.

**Theoretical Framework:**
Tolstoy's novel is structured around two parallel plots that rarely intersect:
1.  **The Society Plot (Anna):** A dense, interconnected web of St. Petersburg/Moscow society.
2.  **The Rural Plot (Levin):** An isolated, philosophical narrative largely removed from the main social centers.

**Methodology:**
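* **Nodes:** Characters, sized and colored by **Degree Centrality** (the fraction of the other characters a node is directly connected to).
* **Edges:** Co-occurrence in the same sentence, weighted by the number of sentences in which the pair appears together.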
* **Visualization:** A Force-Directed Graph to cluster social groups naturally.
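
As a toy illustration of the node-sizing metric, the sketch below builds a three-character graph with invented weights (not measured from the novel) and compares NetworkX's `degree_centrality`, which ignores edge weights, with weighted degree via `G.degree(weight="weight")`. The node names and weights are assumptions made purely for the example.

```python
import networkx as nx

# Toy graph with invented weights, for illustration only
G = nx.Graph()
G.add_edge("A", "B", weight=10)
G.add_edge("A", "C", weight=1)
G.add_edge("B", "C", weight=1)

# Degree centrality = (number of neighbours) / (n - 1); edge weights are ignored
centrality = nx.degree_centrality(G)

# Weighted degree ("strength") sums the weights of a node's incident edges
strength = dict(G.degree(weight="weight"))

print(centrality)
print(strength)
```

In a fully connected graph every node has degree centrality 1.0, so if all eight characters co-occur at least once, node sizes would come out identical. Weighted degree is one option that would still separate them.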

## 2. Setup & Configuration
This section imports the libraries needed for graph calculation (`networkx`), visualization (`matplotlib`), and text processing (`nltk`), and sets the data paths and target characters.

In [None]:
# Install dependencies (if not already installed)
%pip install networkx pandas matplotlib nltk numpy

In [None]:
import os
import itertools
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# Ensure NLTK data is downloaded
nltk.download('punkt')
nltk.download('punkt_tab')

# --- PATH CONFIGURATION ---
DATA_DIR = '../data'
RESULTS_DIR = '../results'

# Create the results directory if it does not already exist
os.makedirs(RESULTS_DIR, exist_ok=True)

# --- TARGET CONFIGURATION ---
# Filtering for the 8 primary characters to maintain graph readability
CONFIG = {
    "filename": "The Project Gutenberg eBook of Anna Karenina, by Leo Tolstoy.txt",
    "characters": ["Anna", "Vronsky", "Levin", "Kitty", "Karenin", "Stiva", "Dolly", "Betsy"]
}

## 3. Data Processing Logic

The following functions handle the text processing pipeline:
1.  **Sentence Segmentation:** Splitting the raw text into distinct sentences.
2.  **Interaction Scanning:** Looping through sentences to identify which target characters are named in each one.
3.  **Graph Construction:** If two target characters appear in the same sentence, an edge is created with weight 1, or the weight of the existing edge is incremented by 1.
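
The three steps above can be sketched on a handful of sentences without any external libraries. The sentences below are invented for the example, and the crude whitespace tokenisation stands in for `nltk`'s tokenizers, so this is only a minimal illustration of the counting logic.

```python
import itertools
from collections import Counter

# Hypothetical pre-split sentences; the notebook itself uses nltk's sent_tokenize
sentences = [
    "Anna looked at Vronsky.",
    "Levin thought of Kitty.",
    "Anna and Vronsky spoke to Kitty.",
]
characters = ["Anna", "Vronsky", "Levin", "Kitty"]

edge_weights = Counter()
for sent in sentences:
    # Crude tokenisation: split on whitespace, strip punctuation, lowercase
    tokens = {w.strip(".,;:!?\"'").lower() for w in sent.split()}
    found = [c for c in characters if c.lower() in tokens]
    # Each unordered pair of co-occurring characters gains one unit of weight
    for u, v in itertools.combinations(found, 2):
        edge_weights[(u, v)] += 1

print(edge_weights)
```

A sentence naming three characters contributes three pairs, so long sentences with many names add weight to several edges at once.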

In [None]:
def load_text(filename):
    """Loads text file from the data directory."""
    filepath = os.path.join(DATA_DIR, filename)
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        print(f"ERROR: Could not find {filepath}")
        return ""

def build_graph(text, characters):
    """Builds a weighted undirected graph based on sentence co-occurrence."""
    sentences = sent_tokenize(text)
    G = nx.Graph()
    G.add_nodes_from(characters)
    
    # Case-insensitive mapping
    char_map = {c.lower(): c for c in characters}
    
    print(f"Processing {len(sentences)} sentences...")
    
    for sent in sentences:
        tokens = set(word_tokenize(sent.lower()))
        found = [char_map[c] for c in char_map if c in tokens]
        
        # Create edges for all pairs found in the sentence
        if len(found) > 1:
            for pair in itertools.combinations(found, 2):
                u, v = pair
                if G.has_edge(u, v):
                    G[u][v]['weight'] += 1
                else:
                    G.add_edge(u, v, weight=1)
    return G
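
One limitation of matching on a single name per character is that other forms of address are missed. Depending on the translation, Oblonsky may be called "Stepan Arkadyevitch" far more often than "Stiva", and Karenin may appear as "Alexey Alexandrovitch". The sketch below shows one possible way to fold aliases into a canonical name before counting. The `ALIASES` table and the `find_characters` helper are hypothetical and not part of the notebook, and the alias spellings would need to be checked against the edition actually used.

```python
import re

# Hypothetical alias table: canonical name -> surface forms to search for.
# The spellings are assumptions and depend on the translation.
ALIASES = {
    "Stiva": ["Stiva", "Oblonsky", "Stepan Arkadyevitch"],
    "Karenin": ["Karenin", "Alexey Alexandrovitch"],
}

def find_characters(sentence, aliases=ALIASES):
    """Return the sorted canonical names whose aliases appear in the sentence."""
    found = set()
    for canonical, forms in aliases.items():
        for form in forms:
            # Word boundaries avoid matching inside longer words
            if re.search(rf"\b{re.escape(form)}\b", sentence, flags=re.IGNORECASE):
                found.add(canonical)
                break  # one matching alias is enough for this character
    return sorted(found)

print(find_characters("Stepan Arkadyevitch greeted Alexey Alexandrovitch."))
```

Multi-word aliases are matched against the raw sentence here because a token set, as used in `build_graph`, cannot represent them.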

## 4. Visualization Engine

This section defines the aesthetic parameters for the graph:
* **Algorithm:** `spring_layout` (Force-directed placement).
* **Styling:** Curved edges for clarity, heatmap coloring for centrality, and outlined text for readability.

In [None]:
def draw_beautiful_network(G):
    """Generates and saves the final high-resolution network graph."""
    plt.figure(figsize=(14, 10), facecolor='white')
    ax = plt.gca()
    
    # 1. Metrics: Calculate Degree Centrality for node sizing
    centrality = nx.degree_centrality(G)
    node_sizes = [v * 8000 + 500 for v in centrality.values()]
    
    # 2. Layout: Spring layout with k=1.5 to increase node spacing
    pos = nx.spring_layout(G, k=1.5, iterations=50, seed=42) 
    
    # 3. Draw Edges: Variable thickness based on interaction weight
    weights = [G[u][v]['weight'] for u, v in G.edges()]
    max_weight = max(weights) if weights else 1
    
    for (u, v, d) in G.edges(data=True):
        width = (d['weight'] / max_weight) * 4 + 0.5
        
        # Use arrowstyle='-' to draw curves without arrowheads, ensuring compatibility
        nx.draw_networkx_edges(G, pos, edgelist=[(u, v)], width=width, alpha=0.3, 
                               edge_color="#555555", 
                               connectionstyle="arc3,rad=0.1", 
                               arrows=True, arrowstyle="-", ax=ax)

    # 4. Draw Nodes: Colored by centrality (Plasma colormap)
    node_colors = list(centrality.values())
    nx.draw_networkx_nodes(G, pos, node_size=node_sizes, node_color=node_colors, 
                                   cmap=plt.cm.plasma, alpha=0.9, edgecolors='white', linewidths=2, ax=ax)
    
    # 5. Draw Labels: With white path effects for contrast
    labels = nx.draw_networkx_labels(G, pos, font_size=12, font_family="sans-serif", font_weight="bold", ax=ax)
    
    import matplotlib.patheffects as path_effects
    for _, label in labels.items():
        label.set_path_effects([path_effects.withStroke(linewidth=3, foreground='white')])

    # 6. Final Formatting & Save
    plt.title("Character Interaction Network: Anna Karenina", fontsize=18, fontweight='bold', pad=20)
    plt.axis('off')
    
    save_path = os.path.join(RESULTS_DIR, "anna_karenina_network_final.png")
    plt.savefig(save_path, dpi=300, bbox_inches='tight')
    print(f"Graph saved successfully to: {save_path}")
    plt.show()

## 5. Main Execution

In [None]:
def run_analysis():
    print("Loading text data...")
    text = load_text(CONFIG['filename'])
    
    if text:
        G = build_graph(text, CONFIG['characters'])
        
        if G.number_of_edges() > 0:
            print("Generating network visualization...")
            draw_beautiful_network(G)
        else:
            print("No interactions found among the specified characters.")
    else:
        print("No text loaded; analysis aborted.")

run_analysis()

## 6. Interpretation of Results

### Visual Analysis
The resulting graph highlights the structural polarity of the novel:

1.  **The Hub (Anna):** Anna appears as the largest node with the highest centrality score. She serves as the structural center of the graph, bridging her husband (Karenin), her lover (Vronsky), and the socialite circle (Betsy).
2.  **The Isolate (Levin):** Levin is visually marginalized, appearing on the periphery of the graph. Because `spring_layout` uses edge weights by default and pulls heavily weighted pairs together, he is pushed away from the center: he lacks strong direct connections to Karenin and Vronsky. His only strong tethers are family (Kitty, Dolly, Stiva).
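3.  **The Bridge (Stiva):** Stiva Oblonsky acts as the crucial connector between the two plots, mirroring his role in the narrative as the only character welcome in all social circles. The stronger claim, that removing his node would fracture the graph into two disconnected components, holds only if no other character (for example Kitty or Dolly) shares an edge with both clusters.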
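
The bridge claim can be tested rather than read off the drawing. A node whose removal disconnects a graph is an articulation point, which NetworkX exposes as `nx.articulation_points`. The sketch below uses a small invented graph, not the novel's actual edges; on the real data one would pass the `G` returned by `build_graph` instead of `toy`.

```python
import networkx as nx

# Invented toy graph: the Anna group and the Levin group meet only through Stiva
toy = nx.Graph()
toy.add_edges_from([
    ("Anna", "Vronsky"), ("Anna", "Karenin"), ("Vronsky", "Karenin"),
    ("Levin", "Kitty"),
    ("Stiva", "Anna"), ("Stiva", "Levin"), ("Stiva", "Kitty"),
])

# Nodes whose removal would split the graph into more components.
# Here that is Stiva, and also Anna, who is the only link to Vronsky and Karenin.
cut_nodes = set(nx.articulation_points(toy))
print(cut_nodes)

# A single edge that bypasses Stiva removes his articulation-point status
toy.add_edge("Kitty", "Anna")
cut_nodes_after = set(nx.articulation_points(toy))
print(cut_nodes_after)
```

The second half of the example shows how fragile the property is: one sentence in which Kitty and Anna co-occur is enough for Stiva to stop being a structural bridge, even if he remains the most heavily weighted connector.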

### Future Suggestions
* **Community Detection:** Applying the Louvain method to the weighted graph could identify the "Family" vs. "Society" clusters algorithmically rather than by eye.
* **Dynamic Analysis:** Building one graph per chapter (or per part) could show Levin's gradual, though slight, integration into the main plot over time.
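
As a starting point for the community-detection suggestion, recent NetworkX releases include a Louvain implementation, `louvain_communities`. The sketch below runs it on a toy graph whose weights are invented for the example; it is not a result from the novel, and on the real data the graph from `build_graph` would replace `toy`.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Invented weights for illustration only
toy = nx.Graph()
toy.add_weighted_edges_from([
    ("Anna", "Vronsky", 30), ("Anna", "Karenin", 25), ("Vronsky", "Karenin", 10),
    ("Levin", "Kitty", 30), ("Levin", "Dolly", 10), ("Kitty", "Dolly", 15),
    ("Anna", "Dolly", 2),  # a single weak tie between the two groups
])

# Returns a list of node sets; the seed makes the randomized search repeatable
communities = louvain_communities(toy, weight="weight", seed=42)
for group in communities:
    print(sorted(group))
```

Louvain is a heuristic, so on a small, dense graph such as an eight-character network the partition may be sensitive to the seed and to the `resolution` parameter; comparing a few runs is a reasonable sanity check.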