# Standards Alignment Demo

This notebook demonstrates the standards alignment system using Gromov-Wasserstein optimal transport.

## Theory Background

The system builds on two results, cited here by venue and year:
- **JMLR 2024**: Entropic GW stability under regularization λ
- **JCGS 2023**: GW consistency guarantees under sampling

Key properties:
- Stability bound ε ≤ C √(λ log(max(n₁, n₂)))
- Consistency under document sampling
- Fused costs combining text + structure
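
To get a feel for how the stability bound scales, it can be evaluated directly. The sketch below is illustrative only: the function name `stability_bound`, the default `C = 1.0`, and the reading of n₁ and n₂ as the node counts of the two graphs are assumptions, not the package's implementation.

```python
import math


def stability_bound(lam: float, n1: int, n2: int, C: float = 1.0) -> float:
    """Evaluate C * sqrt(lam * log(max(n1, n2))).

    C is a placeholder constant (assumed 1.0 here); n1 and n2 are assumed
    to be the node counts of the two graphs being aligned.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if max(n1, n2) < 2:
        raise ValueError("need at least two nodes so that log(max(n1, n2)) > 0")
    return C * math.sqrt(lam * math.log(max(n1, n2)))


# The bound grows like sqrt(lam): quadrupling lam doubles the bound.
eps_small = stability_bound(0.05, n1=20, n2=7)
eps_large = stability_bound(0.20, n1=20, n2=7)
print(f"eps(0.05) = {eps_small:.4f}, eps(0.20) = {eps_large:.4f}")
```

Because the graph sizes enter only through a logarithm, the choice of λ dominates the bound for graphs of moderate size.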

In [None]:
import sys
sys.path.append('..')

import numpy as np
import matplotlib.pyplot as plt
from standards_alignment import load_standards, gw_align, sparse_align
from standards_alignment.graph_ops import DocumentGraph

# Load standards
standards = load_standards()
std_graph = standards['graph']
print(f"Loaded standards version {standards['version']} with checksum {standards['checksum'][:8]}...")

## Create Sample Document

In [None]:
# Create document graph
doc_graph = DocumentGraph()

# Add security section
doc_graph.add_section(
    "sec_security", 
    "Security Requirements",
    "This section defines security and access control requirements."
)

doc_graph.add_paragraph(
    "sec_security",
    "para_auth", 
    "The system must implement robust user authentication using secure credentials and password policies."
)

doc_graph.add_paragraph(
    "sec_security",
    "para_roles",
    "User roles and permissions must be clearly defined with appropriate privilege levels for different user types."
)

doc_graph.add_table(
    "sec_security",
    "table_roles",
    {
        "headers": True,
        "rows": [
            ["Role", "Permissions", "Access Level"],
            ["Admin", "Full System", "High"],
            ["User", "Read/Write", "Medium"],
            ["Guest", "Read Only", "Low"]
        ]
    }
)

# Add performance section
doc_graph.add_section(
    "sec_perf",
    "Performance Requirements", 
    "Performance and scalability requirements."
)

doc_graph.add_paragraph(
    "sec_perf",
    "para_response",
    "System response time must be under 100ms for 95% of requests."
)

print(f"Created document with {len(doc_graph.G.nodes())} nodes and {len(doc_graph.G.edges())} edges")
print("Document nodes:", list(doc_graph.G.nodes()))

## Gromov-Wasserstein Alignment

In [None]:
# Run GW alignment
gw_result = gw_align(std_graph, doc_graph, reg=0.1, max_iter=50)

print("\n=== GW Alignment Results ===")
print(f"Transport plan shape: {gw_result.transport_plan.shape}")
print(f"Cost decomposition: {gw_result.cost_decomposition}")
print(f"Stability bound ε: {gw_result.stability_bound:.4f}")
print(f"Regularization λ: {gw_result.regularization}")

print("\nTop matching pairs:")
# Build the node lists once instead of on every loop iteration
std_nodes = list(std_graph.G.nodes())
doc_nodes = list(doc_graph.G.nodes())
for i, (std_idx, doc_idx, weight) in enumerate(gw_result.matching_pairs[:5], start=1):
    print(f"{i}. {std_nodes[std_idx]} ↔ {doc_nodes[doc_idx]} (weight: {weight:.3f})")
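
One way matching pairs like those printed above could be read off a transport plan is to rank its entries by transported mass. This is a sketch under assumptions: `top_pairs` is a hypothetical helper, and the package may construct `matching_pairs` differently.

```python
import numpy as np


def top_pairs(plan: np.ndarray, k: int = 5) -> list[tuple[int, int, float]]:
    """Return the k largest entries of a transport plan as (row, col, weight)."""
    k = min(k, plan.size)
    # Indices of the k largest entries of the flattened plan, largest first
    flat_order = np.argsort(plan, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, plan.shape)
    return [(int(r), int(c), float(plan[r, c])) for r, c in zip(rows, cols)]


# Toy 3x2 plan whose entries sum to 1 (rows: standards nodes, cols: document nodes)
toy_plan = np.array([[0.30, 0.05],
                     [0.02, 0.35],
                     [0.20, 0.08]])
for rank, (i, j, w) in enumerate(top_pairs(toy_plan, k=3), start=1):
    print(f"{rank}. row {i} ↔ col {j} (weight: {w:.3f})")
```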

## Sparse Alignment Comparison

In [None]:
# Run sparse alignment for comparison
sparse_result = sparse_align(std_graph, doc_graph)

print("\n=== Sparse Alignment Results ===")
print(f"Cost decomposition: {sparse_result.cost_decomposition}")
print(f"Stability bound: {sparse_result.stability_bound} (no guarantees)")

print("\nTop sparse matching pairs:")
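# Build the node lists once, and drop zero-weight pairs before numbering
# so the printed ranks stay consecutive
std_nodes = list(std_graph.G.nodes())
doc_nodes = list(doc_graph.G.nodes())
nonzero_pairs = [pair for pair in sparse_result.matching_pairs[:5] if pair[2] > 0]
for i, (std_idx, doc_idx, weight) in enumerate(nonzero_pairs, start=1):
    print(f"{i}. {std_nodes[std_idx]} ↔ {doc_nodes[doc_idx]} (weight: {weight:.3f})")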

## Visualize Transport Plans

In [None]:
# Plot transport plans
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# GW transport plan
im1 = ax1.imshow(gw_result.transport_plan, cmap='Blues', aspect='auto')
ax1.set_title(f'Entropic GW Transport Plan\nλ={gw_result.regularization}, ε≤{gw_result.stability_bound:.3f}')
ax1.set_xlabel('Document Nodes')
ax1.set_ylabel('Standards Nodes')
plt.colorbar(im1, ax=ax1)

# Sparse transport plan
im2 = ax2.imshow(sparse_result.transport_plan, cmap='Reds', aspect='auto')
ax2.set_title('Sparse Bipartite Matching')
ax2.set_xlabel('Document Nodes')
ax2.set_ylabel('Standards Nodes')
plt.colorbar(im2, ax=ax2)

plt.tight_layout()
plt.show()

## Stability Analysis

In [None]:
# Test stability across different regularization values
reg_values = [0.01, 0.05, 0.1, 0.2, 0.5]
stability_bounds = []
gw_costs = []

for reg in reg_values:
    result = gw_align(std_graph, doc_graph, reg=reg, max_iter=30)
    stability_bounds.append(result.stability_bound)
    gw_costs.append(result.cost_decomposition['gw_distance'])

# Plot stability vs regularization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(reg_values, stability_bounds, 'bo-')
ax1.set_xlabel('Regularization λ')
ax1.set_ylabel('Stability Bound ε')
ax1.set_title('Stability vs Regularization')
ax1.grid(True)

ax2.plot(reg_values, gw_costs, 'ro-')
ax2.set_xlabel('Regularization λ')
ax2.set_ylabel('GW Distance')
ax2.set_title('GW Cost vs Regularization')
ax2.grid(True)

plt.tight_layout()
plt.show()

print("\n=== Stability Analysis ===")
print("λ\t\tε\t\tGW Cost")
for reg, eps, cost in zip(reg_values, stability_bounds, gw_costs):
    print(f"{reg:.2f}\t\t{eps:.4f}\t\t{cost:.4f}")

## Conditions for Stable Matchings

Based on the theoretical foundations:

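1. **Entropic Regularization**: λ > 0 smooths the problem and makes each inner transport subproblem strictly convex, which stabilizes the transport plan; the overall GW objective is non-convex in general, so the solver may still return a local optimum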
2. **Stability Bound**: ε ≤ C √(λ log(max(n₁, n₂))) where C depends on graph geometry
3. **Consistency**: Under document sampling, GW distances converge with rate O(n^(-1/2))
4. **Fused Costs**: Combining structural + semantic costs improves alignment quality
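
The fused cost in item 4 can be made concrete with the standard fused GW objective, which blends a feature (text) cost with a structure cost. The sketch below assumes that form; the function name `fused_gw_objective`, the trade-off weight `alpha`, and the toy matrices are illustrative and not taken from the package.

```python
import numpy as np


def fused_gw_objective(M, C1, C2, T, alpha=0.5):
    """(1 - alpha) * <M, T> + alpha * sum_ijkl (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l].

    M  : (n, m) feature cost between nodes of the two graphs
    C1 : (n, n) and C2 : (m, m) intra-graph structure matrices
    T  : (n, m) transport plan
    """
    feature_term = float(np.sum(M * T))
    # diff[i, j, k, l] = C1[i, k] - C2[j, l]
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    structure_term = float(np.einsum("ijkl,ij,kl->", diff ** 2, T, T))
    return (1.0 - alpha) * feature_term + alpha * structure_term


# Two identical 2-node graphs: structure alone cannot tell the nodes apart,
# but the feature cost M prefers the identity matching.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
M = np.array([[0.0, 1.0], [1.0, 0.0]])
identity_plan = np.array([[0.5, 0.0], [0.0, 0.5]])
swapped_plan = np.array([[0.0, 0.5], [0.5, 0.0]])
print(fused_gw_objective(M, C, C, identity_plan))
print(fused_gw_objective(M, C, C, swapped_plan))
```

In this toy case both plans have zero structural cost, so only the feature term separates them, which is the situation where fusing the two costs helps.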

**Practical Guidelines**:
- Use λ ∈ [0.05, 0.2] for a good stability/accuracy tradeoff
- Monitor the stability bound ε in logs
- Consider the sparse fallback (`sparse_align`) for very large graphs
- Validate on synthetic data with known ground truth
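
Validation against known ground truth can be as simple as checking how often the heaviest entry in each row of the plan hits the true match. The following is a minimal sketch, assuming the ground truth is a permutation; `matching_accuracy` and the synthetic plan are hypothetical and stand in for the output of an actual alignment run.

```python
import numpy as np


def matching_accuracy(plan, truth):
    """Fraction of rows whose heaviest column equals the ground-truth column."""
    predicted = np.argmax(plan, axis=1)
    return float(np.mean(predicted == np.asarray(truth)))


rng = np.random.default_rng(0)
n = 6
truth = rng.permutation(n)  # row i should map to column truth[i]

# Hypothetical plan: most mass on the true match plus a small uniform background
plan = np.full((n, n), 0.01)
plan[np.arange(n), truth] += 0.5
plan /= plan.sum()  # normalize so the plan carries total mass 1

print(f"accuracy = {matching_accuracy(plan, truth):.2f}")
```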