# 1.13b: Lattice Topology Analysis

**Goal:** Analyze the graph structure of any lattice discovered by 1.13a.

## Method

If 1.13a found lattice structure, build an undirected graph to characterize its topology:
- **Nodes**: Unique vectors participating in the lattice
  - Black hole centroids (degenerate vectors shared by multiple tokens with identical embeddings)
  - Singletons (vectors held by a single token that have at least one lattice neighbor)
- **Edges**: All lattice neighbor relationships
  - Orthogonal: differ by ±1 mantissa step in exactly one dimension
  - Diagonal: differ by ±1 mantissa step in each of two or more dimensions
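
The neighbor test can be made concrete on raw bit patterns. The following is a minimal sketch, not 1.13a's actual code. It assumes that a "±1 mantissa" step means the two bfloat16 bit patterns differ by exactly 1 (which also counts a step that carries into the exponent), and that the inputs are Python floats exactly representable in bfloat16. The helper names `bf16_bits` and `classify_pair` are hypothetical.

```python
import struct

def bf16_bits(x):
    """16-bit bfloat16 pattern of x (assumes x is exactly representable in bfloat16)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # bfloat16 keeps the top 16 bits of a float32

def classify_pair(u, v):
    """Return 'orthogonal', 'diagonal', or None for two equal-length vectors."""
    steps = [bf16_bits(a) - bf16_bits(b) for a, b in zip(u, v)]
    if any(abs(s) > 1 for s in steps):
        return None  # some dimension is more than one step apart
    n_changed = sum(1 for s in steps if s != 0)
    if n_changed == 0:
        return None  # identical vectors (same black hole), not neighbors
    return "orthogonal" if n_changed == 1 else "diagonal"

base = [1.0, 2.0, 0.5]
one_dim = [1.0078125, 2.0, 0.5]        # 1.0 + 2**-7: next bfloat16 after 1.0
two_dims = [1.0078125, 2.015625, 0.5]  # 2.0 + 2**-6: next bfloat16 after 2.0
too_far = [1.015625, 2.0, 0.5]         # 1.0 + 2**-6: two steps away from 1.0

print(classify_pair(base, one_dim))   # orthogonal
print(classify_pair(base, two_dims))  # diagonal
print(classify_pair(base, too_far))   # None
```

Comparing integer bit patterns rather than float differences keeps the test independent of magnitude, since the size of one step changes with the exponent.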

Analyze:
1. **Composition**: How many singletons vs black holes?
2. **Connected components**: One structure or multiple disconnected clusters?
3. **Graph properties**: Diameter, density, clustering coefficient
4. **Degree distribution**: Which nodes are hubs? What's the connectivity pattern?
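
To make the graph properties in this list concrete, here is a dependency-free sketch on a toy graph (a triangle with one tail node). It uses the standard definitions: density is 2E / (N(N-1)) for an undirected graph, and the average clustering coefficient is the mean over nodes of the fraction of neighbor pairs that are themselves connected, with nodes of degree below 2 counted as 0. The notebook itself gets these values from networkx; the function names below are illustrative only.

```python
from itertools import combinations

def build_adjacency(nodes, edges):
    """Undirected adjacency sets, including nodes without edges."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))

def average_clustering(adj):
    """Mean local clustering; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0

# Toy graph: triangle 0-1-2 with a tail node 3 attached to node 2
adj = build_adjacency(range(4), [(0, 1), (1, 2), (2, 0), (2, 3)])
degrees = {node: len(nbrs) for node, nbrs in adj.items()}
hub = max(degrees, key=degrees.get)

print(f"Density: {density(adj):.4f}")                # 0.6667
print(f"Clustering: {average_clustering(adj):.4f}")  # 0.5833
print(f"Hub: node {hub} with degree {degrees[hub]}") # node 2, degree 3
```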

## Use Case

This notebook provides a sanity check on lattice structures found by 1.13a:
- If no structure was found (0 edges), there is nothing to analyze and the notebook stops early
- If structure was found, characterize its topology to understand what it is
- Compare across different W matrices to identify patterns or anomalies

## Parameters

In [48]:
# Model to analyze (must match 1.13a)
# MODEL_NAME = "Qwen3-4B-Instruct-2507"
# MODEL_NAME = "Qwen2.5-3B-Instruct"
MODEL_NAME = "Lil_Gatsby"

# Input from 1.13a
STRUCTURE_FILE = f"../tensors/{MODEL_NAME}/1.13a_lattice_structure.safetensors"

## Imports

In [49]:
import torch
import numpy as np
from safetensors.torch import load_file
from pathlib import Path
from collections import Counter
import matplotlib.pyplot as plt
import networkx as nx

## Load Data

In [50]:
# Load structure results from 1.13a
data = load_file(STRUCTURE_FILE)

print(f"✓ Loaded structure from {Path(STRUCTURE_FILE).name}")
print()

# Extract data
black_hole_token_ids = data['black_hole_token_ids'].tolist()
orthogonal_edges = data['orthogonal_edges'].tolist()
diagonal_edges = data['diagonal_edges'].tolist()
representative_tokens = data['representative_tokens'].tolist()
inverse_indices = data['inverse_indices']

n_black_hole_tokens = data['n_black_hole_tokens'].item()
n_black_hole_centroids = data['n_black_hole_centroids'].item()

print(f"1.13a results:")
print(f"  Black holes: {n_black_hole_tokens:,} tokens → {n_black_hole_centroids} centroids")
print(f"  Lattice edges: {len(orthogonal_edges):,} orthogonal + {len(diagonal_edges):,} diagonal = {len(orthogonal_edges) + len(diagonal_edges):,} total")
print()

✓ Loaded structure from 1.13a_lattice_structure.safetensors

1.13a results:
  Black holes: 53 tokens → 1 centroids
  Lattice edges: 0 orthogonal + 0 diagonal = 0 total



## Check for Lattice Structure

In [51]:
# Quick check: do we have anything to analyze?
total_edges = len(orthogonal_edges) + len(diagonal_edges)

if total_edges == 0:
    print("="*80)
    print("NO LATTICE STRUCTURE FOUND")
    print("="*80)
    print()
    print("1.13a found no lattice neighbor relationships.")
    print("This W matrix does not contain detectable lattice-scale structure.")
    print()
    print("Possible interpretations:")
    print("  • Embeddings are truly continuous (no bfloat16 quantization artifacts)")
    print("  • Lattice structure exists but is too sparse to detect with current method")
    print("  • Tokens were initialized to random positions without shared structure")
    print()
    print("="*80)
    
    # Stop here - nothing to analyze
    raise SystemExit("No lattice structure to analyze. Stopping notebook execution.")
else:
    print(f"✓ Lattice structure detected: {total_edges:,} edges found")
    print(f"  Proceeding with topology analysis...")
    print()

NO LATTICE STRUCTURE FOUND

1.13a found no lattice neighbor relationships.
This W matrix does not contain detectable lattice-scale structure.

Possible interpretations:
  • Embeddings are truly continuous (no bfloat16 quantization artifacts)
  • Lattice structure exists but is too sparse to detect with current method
  • Tokens were initialized to random positions without shared structure



SystemExit: No lattice structure to analyze. Stopping notebook execution.

## Map Tokens to Unique Vectors

In [None]:
print("="*80)
print("MAPPING TOKENS TO UNIQUE VECTORS")
print("="*80)
print()

# Build mapping: token_id -> unique_vector_id
token_to_unique = {}
for unique_id, token_id in enumerate(representative_tokens):
    token_to_unique[token_id] = unique_id

print(f"Total unique vectors in W: {len(representative_tokens):,}")
print()

# Collect all unique vector IDs that participate in the lattice
lattice_unique_ids = set()

for token_i, token_j in orthogonal_edges + diagonal_edges:
    lattice_unique_ids.add(token_to_unique[token_i])
    lattice_unique_ids.add(token_to_unique[token_j])

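# Identify which are black holes vs singletons.
# token_to_unique only contains representative tokens, so the membership test
# below skips the non-representative tokens of each black hole.
black_hole_unique_ids = set()
for bh_token in black_hole_token_ids:
    if bh_token in token_to_unique:
        black_hole_unique_ids.add(token_to_unique[bh_token])

singleton_unique_ids = lattice_unique_ids - black_hole_unique_ids

n_singletons = len(singleton_unique_ids)
# Count only the black hole centroids that actually have lattice neighbors,
# so that singletons + black holes adds up to the lattice total
n_bh_centroids_in_lattice = len(black_hole_unique_ids & lattice_unique_ids)
n_total_lattice = len(lattice_unique_ids)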

pct_singletons = 100 * n_singletons / n_total_lattice if n_total_lattice > 0 else 0
pct_bh = 100 * n_bh_centroids_in_lattice / n_total_lattice if n_total_lattice > 0 else 0

print(f"Lattice composition:")
print(f"  Total: {n_total_lattice} unique vectors")
print(f"  Singletons: {n_singletons} ({pct_singletons:.1f}%)")
print(f"  Black hole centroids: {n_bh_centroids_in_lattice} ({pct_bh:.1f}%)")
print()

# Map edges from token space to unique vector space
unique_edges = set()
for token_i, token_j in orthogonal_edges + diagonal_edges:
    unique_i = token_to_unique[token_i]
    unique_j = token_to_unique[token_j]
    edge = tuple(sorted([unique_i, unique_j]))
    unique_edges.add(edge)

print(f"Unique edges (after deduplication): {len(unique_edges):,}")
print()
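
The token-to-unique-vector mapping and edge deduplication can be illustrated on toy data. This is a sketch under stated assumptions, not the notebook's code: it builds its own inverse index (token ID to unique ID, assigned in first-seen order) so that edges touching non-representative tokens still resolve. Whether the `inverse_indices` tensor saved by 1.13a has this same meaning is an assumption the source does not confirm. The names `unique_rows` and `to_unique_edges` are hypothetical.

```python
def unique_rows(rows):
    """Return (representatives, inverse) with unique IDs in first-seen order."""
    row_to_uid = {}
    representatives = []  # unique ID -> first token ID holding that vector
    inverse = []          # token ID -> unique ID
    for token_id, row in enumerate(rows):
        key = tuple(row)
        if key not in row_to_uid:
            row_to_uid[key] = len(representatives)
            representatives.append(token_id)
        inverse.append(row_to_uid[key])
    return representatives, inverse

def to_unique_edges(token_edges, inverse):
    """Map token-space edges to deduplicated, undirected unique-vector edges."""
    edges = set()
    for i, j in token_edges:
        a, b = inverse[i], inverse[j]
        if a != b:  # drop self-loops between tokens that share a vector
            edges.add((min(a, b), max(a, b)))
    return edges

# Toy embedding rows: tokens 0 and 2 share a vector, as do tokens 1 and 4
rows = [(0, 0), (1, 0), (0, 0), (1, 1), (1, 0)]
representatives, inverse = unique_rows(rows)

token_edges = [(0, 1), (2, 4), (1, 0), (4, 3), (0, 2)]
unique_edges = to_unique_edges(token_edges, inverse)

print(representatives)       # [0, 1, 3]
print(inverse)               # [0, 1, 0, 2, 1]
print(sorted(unique_edges))  # [(0, 1), (1, 2)]
```

Storing each edge as a `(min, max)` pair is what lets the set collapse both reversed and duplicated token-level edges into one undirected edge.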

## Build Lattice Graph

In [None]:
print("="*80)
print("BUILDING LATTICE GRAPH")
print("="*80)
print()

# Create graph over unique vector IDs
G = nx.Graph()
G.add_nodes_from(lattice_unique_ids)
G.add_edges_from(unique_edges)

print(f"✓ Graph constructed")
print(f"  Nodes: {G.number_of_nodes()} unique vectors")
print(f"  Edges: {G.number_of_edges()} lattice neighbor relationships")
print(f"  Density: {nx.density(G):.4f}")
print()

## Connected Components Analysis

In [None]:
print("="*80)
print("CONNECTED COMPONENTS ANALYSIS")
print("="*80)
print()

components = list(nx.connected_components(G))
n_components = len(components)
components_sorted = sorted(components, key=len, reverse=True)

print(f"Number of connected components: {n_components}")
print()

# Show component sizes
if n_components <= 20:
    print(f"Component sizes:")
    for i, component in enumerate(components_sorted):
        size = len(component)
        pct = 100 * size / G.number_of_nodes()
        print(f"  Component {i+1}: {size:4d} nodes ({pct:5.1f}%)")
        
        # Show members if small enough
        if size <= 10:
            members = sorted(component)
            member_tokens = [representative_tokens[uid] for uid in members]
            print(f"    Representative token IDs: {member_tokens}")
else:
    print(f"Component size distribution (showing top 20):")
    for i, component in enumerate(components_sorted[:20]):
        size = len(component)
        pct = 100 * size / G.number_of_nodes()
        print(f"  Component {i+1}: {size:4d} nodes ({pct:5.1f}%)")
    # This branch only runs when n_components > 20, so there is always a remainder
    print(f"  ... and {n_components - 20} more components")

print()
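
For readers unfamiliar with what a connected-components pass does, the following dependency-free sketch finds components with breadth-first search over adjacency sets and sorts them largest first, mirroring `components_sorted`. It is an illustration of the idea, not networkx's implementation; the adjacency dict is toy data.

```python
from collections import deque

def connected_components(adj):
    """Return components as sets of nodes, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Toy graph: a 3-node chain, a 2-node pair, and one isolated node
toy_adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
toy_components = connected_components(toy_adj)

for rank, comp in enumerate(toy_components, start=1):
    pct = 100 * len(comp) / len(toy_adj)
    print(f"Component {rank}: {len(comp)} nodes ({pct:.1f}%)")
```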

## Graph Properties

In [None]:
print("="*80)
print("GRAPH PROPERTIES")
print("="*80)
print()

largest_component = max(components, key=len)
G_largest = G.subgraph(largest_component).copy()

print(f"Largest component:")
print(f"  Size: {len(largest_component)} nodes ({100*len(largest_component)/G.number_of_nodes():.1f}%)")
print(f"  Edges: {G_largest.number_of_edges()}")
print(f"  Density: {nx.density(G_largest):.4f}")

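# Diameter and path length. G_largest is a single connected component, so this
# check is expected to pass; the else branch is only a defensive fallback.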
if nx.is_connected(G_largest):
    diameter = nx.diameter(G_largest)
    avg_path = nx.average_shortest_path_length(G_largest)
    print(f"  Diameter: {diameter} hops")
    print(f"  Average path length: {avg_path:.2f} hops")
else:
    diameter = None
    avg_path = None
    print(f"  Diameter: N/A (not fully connected)")

clustering = nx.average_clustering(G_largest)
print(f"  Average clustering coefficient: {clustering:.4f}")
print()

# Degree statistics
degrees = [G.degree(n) for n in G.nodes()]
print(f"Degree statistics (all nodes):")
print(f"  Min: {min(degrees)}")
print(f"  Max: {max(degrees)}")
print(f"  Mean: {np.mean(degrees):.2f}")
print(f"  Median: {np.median(degrees):.2f}")
print()

# Hub nodes
degree_dict = dict(G.degree())
hubs = sorted(degree_dict.items(), key=lambda x: -x[1])[:10]
print(f"Top 10 hub nodes (by degree):")
for unique_id, degree in hubs:
    token_id = representative_tokens[unique_id]
    node_type = "BH" if unique_id in black_hole_unique_ids else "singleton"
    print(f"  Unique ID {unique_id:5d} (token {token_id:6d}, {node_type:9s}): {degree:3d} neighbors")

print()

## Distance Distribution

In [None]:
if nx.is_connected(G_largest) and len(G_largest) <= 500:
    print("="*80)
    print("DISTANCE DISTRIBUTION")
    print("="*80)
    print()
    
    print("Computing all pairwise shortest paths...")
    all_paths = dict(nx.all_pairs_shortest_path_length(G_largest))
    
    distances = []
    for source in all_paths:
        for target, distance in all_paths[source].items():
            if source < target:
                distances.append(distance)
    
    distance_counts = Counter(distances)
    print(f"\nDistance distribution:")
    for dist in sorted(distance_counts.keys()):
        count = distance_counts[dist]
        pct = 100 * count / len(distances)
        print(f"  Distance {dist}: {count:6,} pairs ({pct:5.1f}%)")
    
    print(f"\nTotal pairs: {len(distances):,}")
    print()
else:
    distance_counts = None
    print("\n(Skipping distance distribution: component too large or not connected)\n")
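
The pairwise distance histogram, diameter, and average path length are all derived from the same set of shortest-path lengths. A minimal sketch on a 4-node path graph, using plain breadth-first search instead of networkx (function names are illustrative):

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def distance_histogram(adj):
    """Count unordered node pairs at each shortest-path distance."""
    counts = Counter()
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if source < target:  # count each unordered pair once
                counts[d] += 1
    return counts

# Toy graph: path 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
hist = distance_histogram(path)

n_pairs = sum(hist.values())
toy_diameter = max(hist)
toy_avg_path = sum(d * c for d, c in hist.items()) / n_pairs

print(dict(sorted(hist.items())))  # {1: 3, 2: 2, 3: 1}
print(toy_diameter)                # 3
print(f"{toy_avg_path:.2f}")       # 1.67
```

Running one search per node costs roughly O(N * (N + E)), which is why the notebook restricts this step to components of at most 500 nodes.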

## Visualizations

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 12), dpi=100)

# Plot 1: Component sizes
ax = axes[0, 0]
component_sizes = [len(c) for c in components_sorted]
if len(component_sizes) > 1:
    sizes_to_plot = component_sizes[:min(30, len(component_sizes))]
    ax.bar(range(1, len(sizes_to_plot)+1), sizes_to_plot, edgecolor='black', alpha=0.7, color='steelblue')
    ax.set_xlabel('Component rank', fontsize=11)
    ax.set_ylabel('Component size (nodes)', fontsize=11)
    ax.set_title(f'Connected Component Sizes\n({n_components} total)', fontsize=12, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
else:
    info_text = f'Single Connected Component\n\n{G.number_of_nodes()} nodes\n{G.number_of_edges()} edges'
    ax.text(0.5, 0.5, info_text, ha='center', va='center', fontsize=14,
            bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.3))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title('Component Structure', fontsize=12, fontweight='bold')

# Plot 2: Degree distribution
ax = axes[0, 1]
ax.hist(degrees, bins=range(0, max(degrees)+2), edgecolor='black', alpha=0.7, color='coral')
ax.set_xlabel('Degree (number of neighbors)', fontsize=11)
ax.set_ylabel('Number of nodes', fontsize=11)
ax.set_title('Degree Distribution', fontsize=12, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

stats_text = f"Mean: {np.mean(degrees):.1f}\nMedian: {np.median(degrees):.1f}\nMax: {max(degrees)}"
ax.text(0.97, 0.97, stats_text, transform=ax.transAxes,
        fontsize=9, verticalalignment='top', horizontalalignment='right',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Plot 3: Distance distribution
ax = axes[1, 0]
if distance_counts is not None:
    ax.bar(sorted(distance_counts.keys()), 
           [distance_counts[k] for k in sorted(distance_counts.keys())], 
           edgecolor='black', alpha=0.7, color='mediumseagreen')
    ax.set_xlabel('Shortest path distance (hops)', fontsize=11)
    ax.set_ylabel('Number of node pairs', fontsize=11)
    ax.set_title('Pairwise Distance Distribution', fontsize=12, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
else:
    ax.text(0.5, 0.5, 'Distance distribution\nskipped\n(too large or not connected)', 
            ha='center', va='center', fontsize=11)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title('Pairwise Distance Distribution', fontsize=12, fontweight='bold')

# Plot 4: Network visualization
ax = axes[1, 1]
if len(G_largest) <= 200:
    pos = nx.spring_layout(G_largest, k=0.5, iterations=50, seed=42)
    
    # Color nodes by type
    node_colors = ['red' if uid in black_hole_unique_ids else 'lightblue' 
                   for uid in G_largest.nodes()]
    
    nx.draw_networkx_nodes(G_largest, pos, node_size=100, node_color=node_colors, 
                           edgecolors='black', linewidths=0.5, ax=ax)
    nx.draw_networkx_edges(G_largest, pos, alpha=0.3, width=0.5, ax=ax)
    
    # Legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='red', edgecolor='black', label='Black holes'),
        Patch(facecolor='lightblue', edgecolor='black', label='Singletons')
    ]
    ax.legend(handles=legend_elements, loc='upper right')
    
    ax.set_title(f'Network Layout\n(Largest Component: {len(largest_component)} nodes)', 
                 fontsize=12, fontweight='bold')
    ax.axis('off')
else:
    ax.text(0.5, 0.5, f'Network visualization\nskipped\n({len(G_largest)} nodes too large)', 
            ha='center', va='center', fontsize=11)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title('Network Layout', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

print("✓ Visualizations complete")

## Summary

In [None]:
print("="*80)
print("SUMMARY")
print("="*80)
print()

print(f"Lattice structure detected:")
print(f"  Unique vectors: {G.number_of_nodes()}")
print(f"    • {n_singletons} singletons ({pct_singletons:.1f}%)")
print(f"    • {n_bh_centroids_in_lattice} black hole centroids ({pct_bh:.1f}%)")
print(f"  Lattice edges: {G.number_of_edges()}")
print(f"  Graph density: {nx.density(G):.4f}")
print()

print(f"Connectivity:")
print(f"  Connected components: {n_components}")
print(f"  Largest component: {len(largest_component)} nodes ({100*len(largest_component)/G.number_of_nodes():.1f}%)")
if diameter is not None:
    print(f"  Diameter: {diameter} hops")
    print(f"  Average path length: {avg_path:.2f} hops")
print()

# Interpretation
print("Interpretation:")
if n_components == 1:
    print(f"  ✓ FULLY CONNECTED: All {G.number_of_nodes()} nodes form a single connected structure")
    if diameter is not None:
        print(f"  • Max distance: {diameter} lattice steps")
        print(f"  • Avg distance: {avg_path:.2f} lattice steps")
    print(f"  • Every node reachable from every other via lattice-neighbor steps (orthogonal or diagonal)")
elif len(largest_component) > G.number_of_nodes() * 0.9:
    print(f"  ⚠️  MOSTLY CONNECTED: Giant component ({len(largest_component)} nodes) + {n_components-1} small clusters")
    print(f"  • Most structure is connected, with few outliers")
else:
    print(f"  ⚠️  FRAGMENTED: {n_components} disconnected components")
    print(f"  • Largest: {len(largest_component)} nodes ({100*len(largest_component)/G.number_of_nodes():.1f}%)")
    print(f"  • Structure is split into separate regions")

print()

if n_bh_centroids_in_lattice > 0 and n_singletons > 0:
    print("Composition note:")
    print(f"  • Lattice contains both black holes ({n_bh_centroids_in_lattice}) and singletons ({n_singletons})")
    print(f"  • Ratio: {n_singletons/n_bh_centroids_in_lattice:.1f} singletons per black hole centroid")
elif n_bh_centroids_in_lattice > 0:
    print("Composition note:")
    print(f"  • Lattice contains ONLY black hole centroids (no singletons with neighbors)")
elif n_singletons > 0:
    print("Composition note:")
    print(f"  • Lattice contains ONLY singletons (no black holes detected)")

print()
print("="*80)