# Manifold Algebra (v0.7)

## The (reg, zeros) Invariant

All structures use the same universal identifier:

```
content → hash → (reg, zeros) → index
```

The same identifier ties together:
- **HLLSet**: Register positions
- **AM**: Row/column indices
- **W**: Row/column indices
- **Sheaf sections**: Cross-layer identifiers
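
To make the first two arrows of the mapping concrete, here is a minimal sketch of how content could be turned into a `(reg, zeros)` pair. It is not the library's implementation: the SHA-1 hash, the use of the low `p_bits` bits as the register, and counting trailing zeros in the remaining bits are all assumptions, and `content_to_reg_zeros` is a hypothetical name.

```python
import hashlib
from typing import Tuple

def content_to_reg_zeros(content: str, p_bits: int = 10, h_bits: int = 32) -> Tuple[int, int]:
    """Hypothetical sketch: hash content, then split the hash into (reg, zeros)."""
    digest = hashlib.sha1(content.encode("utf-8")).digest()  # assumed hash function
    h = int.from_bytes(digest[:8], "big") & ((1 << h_bits) - 1)  # keep h_bits bits
    reg = h & ((1 << p_bits) - 1)        # low p_bits bits select the register
    rest = h >> p_bits                   # remaining h_bits - p_bits bits
    rest_bits = h_bits - p_bits
    zeros = 0
    # Count trailing zeros of the remainder, capped at rest_bits when it is all zero
    while zeros < rest_bits and not (rest >> zeros) & 1:
        zeros += 1
    return reg, zeros

reg, zeros = content_to_reg_zeros("cat")
print(f"'cat' -> reg={reg}, zeros={zeros}")
```

Because the pair depends only on the content and the two bit widths, every structure that calls the same function agrees on the identifier without any coordination.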

## Unified Processing Model

Every interaction (ingestion OR query) follows the same pipeline:

```
INPUT → HLLSet → Sub-HRT → Extend with Context → Merge → New Current
```

Properties:
- **Sub-structure isolation**: Work happens on a separate instance
- **Idempotent merge**: Merging the same input again leaves the result unchanged
- **Eventual consistency**: Parallel changes converge
- **CRDT-like**: Commutative, associative
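
The convergence property can be illustrated with a small stand-alone sketch. It assumes edges are stored as a `{(row, col): weight}` dict and merged by pointwise maximum. That is an assumption made for illustration, and `merge_max_sketch` and `apply_all` are hypothetical helpers, not the library's `merge_hrt`.

```python
import random

def merge_max_sketch(a, b):
    """Pointwise-max merge of two {(row, col): weight} dicts."""
    out = dict(a)
    for key, w in b.items():
        out[key] = max(out.get(key, w), w)
    return out

def apply_all(updates):
    """Fold a sequence of updates into an initially empty state."""
    state = {}
    for update in updates:
        state = merge_max_sketch(state, update)
    return state

updates = [
    {(1, 2): 1.0, (2, 3): 1.0},
    {(2, 3): 2.0},
    {(3, 4): 1.0},
]

# Replica A sees the updates in order.
replica_a = apply_all(updates)

# Replica B sees them shuffled, and receives the first update twice.
shuffled = updates[:]
random.Random(0).shuffle(shuffled)
replica_b = apply_all(shuffled + [updates[0]])

print(replica_a == replica_b)
```

Order does not matter because the merge is commutative and associative, and the duplicate delivery is harmless because it is idempotent.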

## Algebraic Operations

| Category | Operations | Description |
|----------|------------|-------------|
| **Projection (π)** | `project_layer`, `project_rows` | Extract substructure |
| **Transform** | `transpose`, `normalize_rows`, `scale` | Transform structure |
| **Composition** | `merge_add`, `compose_chain` | Combine structures |
| **Path** | `reachable_from`, `path_closure` | Path operations |
| **Lift/Lower** | `lift_to_layer`, `lower_aggregate` | Move between layers |
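
As a rough illustration of what the transform and composition rows mean, the sketch below implements three of them on a single layer stored as a nested dict `{row: {col: weight}}`. These are plain-Python stand-ins with hypothetical names (`transpose_dict`, `normalize_rows_dict`, `compose_dict`), not the functions from `core.manifold_algebra`, whose internals are not shown in this notebook.

```python
from collections import defaultdict

def transpose_dict(m):
    """Swap rows and columns: out[c][r] = m[r][c]."""
    out = defaultdict(dict)
    for r, cols in m.items():
        for c, w in cols.items():
            out[c][r] = w
    return dict(out)

def normalize_rows_dict(m):
    """Scale each non-empty row so its weights sum to 1.0."""
    out = {}
    for r, cols in m.items():
        total = sum(cols.values())
        if total > 0:
            out[r] = {c: w / total for c, w in cols.items()}
    return out

def compose_dict(a, b):
    """Sparse matrix product: out[i][k] = sum over j of a[i][j] * b[j][k]."""
    out = defaultdict(dict)
    for i, cols in a.items():
        for j, w1 in cols.items():
            for k, w2 in b.get(j, {}).items():
                out[i][k] = out[i].get(k, 0.0) + w1 * w2
    return dict(out)

am = {0: {1: 2.0, 2: 2.0}, 1: {2: 1.0}}
print(transpose_dict(am))
print(normalize_rows_dict(am))
print(compose_dict(am, am))
```

Only stored entries are visited, so the cost depends on the number of edges rather than on the full matrix dimension.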

## 1. Imports and Setup

In [1]:
import time
import warnings
import os

# Select the first GPU unless CUDA_VISIBLE_DEVICES is already set,
# and silence CUDA capability warnings
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
warnings.filterwarnings("ignore", message=".*cuda capability.*")

# Core imports
from core import (
    # 3D Sparse HRT
    SparseHRT3D,
    Sparse3DConfig,
    SparseAM3D,
    SparseLattice3D,
    BasicHLLSet3D,
    Edge3D,
    HLLSet,
    get_device,
    __version__
)

# Manifold Algebra
from core.manifold_algebra import (
    # Universal ID
    UniversalID,
    content_to_index,
    
    # Sparse Matrices
    SparseMatrix,
    Sparse3DMatrix,
    
    # Projection
    project_layer,
    project_rows,
    project_cols,
    project_submatrix,
    
    # Transform
    transpose,
    transpose_3d,
    normalize_rows,
    normalize_3d,
    scale,
    
    # Filter
    filter_threshold,
    filter_predicate,
    
    # Composition
    merge_add,
    merge_max,
    compose_chain,
    merge_3d_add,
    
    # Path
    reachable_from,
    path_closure,
    
    # Lift/Lower
    lift_to_layer,
    lower_aggregate,
    
    # Cross-structure
    am_to_w,
    w_to_am,
    
    # LUT & Tokens
    START,
    END,
    LookupTable,
    tokenize,
    generate_ntokens,
    
    # Unified Processing
    ProcessingResult,
    input_to_hllset,
    build_sub_hrt,
    extend_with_context,
    merge_hrt,
    unified_process,
    build_w_from_am,
)

print(f"Fractal Manifold Core v{__version__}")
print(f"Device: {get_device()}")

Fractal Manifold Core v0.7.0
Device: cuda


## 2. Configuration

In [2]:
# System configuration
N_GRAM_SIZE = 3
P_BITS = 10
H_BITS = 32

config = Sparse3DConfig(
    p_bits=P_BITS,
    h_bits=H_BITS,
    max_n=N_GRAM_SIZE
)

# Create LUT
lut = LookupTable(config=config)
lut.add_ntoken(START)
lut.add_ntoken(END)

print(f"=== Configuration ===")
print(f"N-gram size: {N_GRAM_SIZE}")
print(f"AM dimension: {config.dimension:,}")
print(f"Shape: {config.shape}")
print()
print(f"LUT initialized:")
print(f"  START index: {lut.ntoken_to_index[START]}")
print(f"  END index: {lut.ntoken_to_index[END]}")

=== Configuration ===
N-gram size: 3
AM dimension: 32,770
Shape: (3, 32770, 32770)

LUT initialized:
  START index: 7891
  END index: 14237


## 3. Universal ID Demonstration

In [3]:
print("=== Universal ID: (reg, zeros) ===")
print()
print("The (reg, zeros) pair is the universal identifier used everywhere:")
print("  - HLLSet register positions")
print("  - AM row/column indices")
print("  - W row/column indices")
print("  - Sheaf section identifiers")
print()

# Demonstrate same content → same ID everywhere
test_contents = ["cat", "dog", "the cat", "cat sat"]

print("Content → UniversalID → Index:")
print("-" * 60)
for content in test_contents:
    uid = UniversalID.from_content(content, layer=0, p_bits=P_BITS, h_bits=H_BITS)
    idx = uid.to_index(config)
    print(f"  '{content}' → {uid} → index={idx}")

print()
print("Same content ALWAYS maps to same index!")
print("This is what enables idempotent merge.")

=== Universal ID: (reg, zeros) ===

The (reg, zeros) pair is the universal identifier used everywhere:
  - HLLSet register positions
  - AM row/column indices
  - W row/column indices
  - Sheaf section identifiers

Content → UniversalID → Index:
------------------------------------------------------------
  'cat' → UID(reg=653, zeros=0, L0) → index=15019
  'dog' → UID(reg=594, zeros=2, L0) → index=13664
  'the cat' → UID(reg=938, zeros=6, L0) → index=21580
  'cat sat' → UID(reg=1008, zeros=1, L0) → index=23185

Same content ALWAYS maps to same index!
This is what enables idempotent merge.
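
The four printed rows are consistent with a simple packing rule, `index = reg * (h_bits - p_bits + 1) + zeros`. This is inferred from the output above rather than from the source of `to_index`, so treat it as an assumption; it may not hold for other layers or configurations. `index_from` is a hypothetical helper.

```python
P_BITS, H_BITS = 10, 32
STRIDE = H_BITS - P_BITS + 1  # 23 possible zero counts per register (assumed)

def index_from(reg, zeros, stride=STRIDE):
    """Pack (reg, zeros) into one integer, as inferred from the printed output."""
    return reg * stride + zeros

# (reg, zeros) -> index, copied from the output above
printed = {
    (653, 0): 15019,
    (594, 2): 13664,
    (938, 6): 21580,
    (1008, 1): 23185,
}

for (reg, zeros), idx in printed.items():
    assert index_from(reg, zeros) == idx

# Largest index under this rule, for comparison with the AM dimension of 32,770
max_index = index_from((1 << P_BITS) - 1, STRIDE - 1)
print(max_index)
```

Under this rule every index stays below the AM dimension printed in the configuration step.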


## 4. Corpus and Initial HRT

In [4]:
# Sample corpus
CORPUS = [
    "The cat sat on the mat",
    "The dog ran in the park",
    "A bird flew over the house",
    "The fish swam in the pond",
    "She walked to the store",
    "Stars twinkled in the sky",
    "Rain fell on the ground",
    "The sun rose in the east",
]

print(f"Corpus: {len(CORPUS)} texts")
print()

# Build initial HRT using unified pipeline
print("Building initial HRT via unified pipeline...")
start_time = time.time()

# Start with empty HRT
empty_am = SparseAM3D.from_edges(config, [])
empty_lattice = SparseLattice3D.from_sparse_am(empty_am)
current_hrt = SparseHRT3D(am=empty_am, lattice=empty_lattice, config=config, lut=frozenset(), step=0)
current_W = {}

# Ingest each text (same pipeline as query!)
for text in CORPUS:
    result = unified_process(text, current_hrt, current_W, config, lut, N_GRAM_SIZE)
    current_hrt = result.merged_hrt
    # Rebuild W after each merge
    current_W = build_w_from_am(current_hrt.am, config)

build_time = time.time() - start_time

print(f"\nHRT built in {build_time*1000:.1f}ms")
print(f"  Total edges: {current_hrt.nnz}")
print(f"  LUT n-tokens: {len(lut.ntoken_to_index)}")
print(f"  W entries: {sum(sum(len(cols) for cols in rows.values()) for rows in current_W.values())}")

Corpus: 8 texts

Building initial HRT via unified pipeline...

HRT built in 322.0ms
  Total edges: 111
  LUT n-tokens: 94
  W entries: 111


## 5. Unified Processing Demo

In [5]:
print("═" * 60)
print("UNIFIED PROCESSING: Ingestion = Query")
print("═" * 60)
print()

# Process a query (exact same pipeline as ingestion)
query = "The cat ran in the park"
print(f"Query: '{query}'")
print()

result = unified_process(
    query,
    current_hrt,
    current_W,
    config,
    lut,
    N_GRAM_SIZE
)

print(f"Pipeline steps:")
print(f"  1. INPUT → HLLSet:  cardinality={result.input_hllset.cardinality():.0f}")
print(f"  2. → Sub-HRT:       {result.sub_hrt.nnz} edges")
print(f"  3. → Extend:        +{len(result.context_edges)} context edges")
print(f"  4. → Merge:         {result.merged_hrt.nnz} total edges")
print()
print(f"Delta: +{result.merged_hrt.nnz - current_hrt.nnz} new edges")

════════════════════════════════════════════════════════════
UNIFIED PROCESSING: Ingestion = Query
════════════════════════════════════════════════════════════

Query: 'The cat ran in the park'

Pipeline steps:
  1. INPUT → HLLSet:  cardinality=16
  2. → Sub-HRT:       16 edges
  3. → Extend:        +0 context edges
  4. → Merge:         116 total edges

Delta: +5 new edges
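
The first pipeline step turns text into a set of n-tokens. A minimal sketch of that step follows, assuming lowercasing, whitespace tokenization, and contiguous n-grams up to `max_n`. The library's `tokenize` and `generate_ntokens` may behave differently, and the sentinel strings and function names here are placeholders.

```python
# Placeholder sentinels; the real START/END values in the library are not shown here.
START_SKETCH, END_SKETCH = "<START>", "<END>"

def ngrams_sketch(text, max_n=3):
    """All contiguous 1..max_n-grams of the lowercased, whitespace-split text."""
    words = text.lower().split()
    return [
        tuple(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]

def ntoken_set_sketch(text, max_n=3):
    """Distinct n-grams plus the two sentinel tokens."""
    return set(ngrams_sketch(text, max_n)) | {(START_SKETCH,), (END_SKETCH,)}

query_tokens = ntoken_set_sketch("The cat ran in the park")
print(len(query_tokens))
```

Under these assumptions the query yields 14 distinct n-grams, or 16 with the two sentinels. That agrees with the cardinality printed above, though the agreement may be coincidental.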


## 6. Algebraic Operations

In [6]:
print("═" * 60)
print("ALGEBRAIC OPERATIONS")
print("═" * 60)
print()

# Convert to algebraic form
AM = Sparse3DMatrix.from_am(current_hrt.am, config)
W = Sparse3DMatrix.from_w(current_W, config)

print(f"Structures:")
print(f"  AM: {AM.nnz} entries across {len(AM.layers)} layers")
print(f"  W:  {W.nnz} entries across {len(W.layers)} layers")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("PROJECTION (π)")
print("─" * 60)
for n in range(config.max_n):
    layer = project_layer(AM, n)
    print(f"  π_{n}(AM) = Layer {n}: {layer.nnz} entries ({n+1}-grams)")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("TRANSPOSE (T) - Backpropagation")
print("─" * 60)
AM_T = transpose_3d(AM)
print(f"  T(AM) = {AM_T.nnz} entries (reversed direction)")
print(f"  Forward:  START → ... → END")
print(f"  Backward: END → ... → START")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("NORMALIZATION (N): AM → W")
print("─" * 60)
W_computed = am_to_w(AM)
print(f"  N(AM) = W: {W_computed.nnz} entries")
print(f"  Each row sums to 1.0 (transition probabilities)")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("PATH COMPOSITION (∘)")
print("─" * 60)
layer0 = project_layer(AM, 0)
two_hop = compose_chain(layer0, layer0)
print(f"  AM[0] ∘ AM[0] = {two_hop.nnz} entries (2-hop paths)")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("TRANSITIVE CLOSURE (M*)")
print("─" * 60)
closure = path_closure(layer0, max_hops=3)
print(f"  AM[0]* (3 hops) = {closure.nnz} entries")
print(f"  Original: {layer0.nnz} → Closure: {closure.nnz}")
print()

# ─────────────────────────────────────────────────────────────────────────
print("─" * 60)
print("REACHABILITY")
print("─" * 60)
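start_idx = lut.get_ntoken_index(START)
# Test against None explicitly (assuming a missing token yields None):
# index 0 is a valid index but is falsy
if start_idx is not None: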
    reach_1 = reachable_from(layer0, {start_idx}, hops=1)
    reach_2 = reachable_from(layer0, {start_idx}, hops=2)
    print(f"  From START:")
    print(f"    1-hop: {len(reach_1)} nodes")
    print(f"    2-hop: {len(reach_2)} nodes")

════════════════════════════════════════════════════════════
ALGEBRAIC OPERATIONS
════════════════════════════════════════════════════════════

Structures:
  AM: 111 entries across 3 layers
  W:  111 entries across 3 layers

────────────────────────────────────────────────────────────
PROJECTION (π)
────────────────────────────────────────────────────────────
  π_0(AM) = Layer 0: 50 entries (1-grams)
  π_1(AM) = Layer 1: 32 entries (2-grams)
  π_2(AM) = Layer 2: 29 entries (3-grams)

────────────────────────────────────────────────────────────
TRANSPOSE (T) - Backpropagation
────────────────────────────────────────────────────────────
  T(AM) = 111 entries (reversed direction)
  Forward:  START → ... → END
  Backward: END → ... → START

────────────────────────────────────────────────────────────
NORMALIZATION (N): AM → W
────────────────────────────────────────────────────────────
  N(AM) = W: 111 entries
  Each row sums to 1.0 (transition probabilities)

─────────────────────────────
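
For the path operations, a stand-alone sketch on a nested-dict layer may help. It assumes `reachable_from` means "nodes reachable in at most `hops` steps" and that the closure is the set of `(source, target)` pairs connected by a path of length 1 to `max_hops`. Both are assumptions about the semantics, and the function names are hypothetical.

```python
def reachable_within(adj, sources, hops):
    """Nodes reachable from `sources` in 1..hops steps over {row: {col: weight}}."""
    frontier = set(sources)
    seen = set()
    for _ in range(hops):
        # Advance one hop from every node in the current frontier
        frontier = {c for r in frontier for c in adj.get(r, {})}
        seen |= frontier
    return seen

def closure_pairs(adj, max_hops):
    """All (source, target) pairs joined by a path of length 1..max_hops."""
    return {
        (src, dst)
        for src in adj
        for dst in reachable_within(adj, {src}, max_hops)
    }

chain = {0: {1: 1.0}, 1: {2: 1.0}, 2: {3: 1.0}}
print(sorted(reachable_within(chain, {0}, hops=2)))
print(len(closure_pairs(chain, max_hops=1)), len(closure_pairs(chain, max_hops=3)))
```

On a simple chain the closure grows with `max_hops` until every downstream node is linked to every upstream one.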

## 7. CRDT Properties Verification

In [7]:
print("═" * 60)
print("CRDT PROPERTIES")
print("═" * 60)
print()

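# Test idempotence: processing the same input twice against the same state
# should give the same sub-HRT. Only edge counts are compared here, which is
# a necessary but not sufficient check for equality.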
print("1. IDEMPOTENCE: process(X) twice → same sub-HRT")
r1 = unified_process("hello world", current_hrt, current_W, config, lut)
r2 = unified_process("hello world", current_hrt, current_W, config, lut)
print(f"   Sub-HRT 1: {r1.sub_hrt.nnz} edges")
print(f"   Sub-HRT 2: {r2.sub_hrt.nnz} edges")
print(f"   Same: {r1.sub_hrt.nnz == r2.sub_hrt.nnz}")
print()

# Test commutativity
print("2. COMMUTATIVITY: merge(A, B) = merge(B, A)")
_, _, edges_a = input_to_hllset("cat sat", config, lut)
_, _, edges_b = input_to_hllset("dog ran", config, lut)
sub_a = build_sub_hrt(edges_a, config)
sub_b = build_sub_hrt(edges_b, config)
merged_ab = merge_hrt(sub_a, sub_b, config)
merged_ba = merge_hrt(sub_b, sub_a, config)
print(f"   merge(A, B): {merged_ab.nnz} edges")
print(f"   merge(B, A): {merged_ba.nnz} edges")
print(f"   Commutative: {merged_ab.nnz == merged_ba.nnz}")
print()

# Test associativity
print("3. ASSOCIATIVITY: merge(merge(A,B), C) = merge(A, merge(B,C))")
_, _, edges_c = input_to_hllset("bird flew", config, lut)
sub_c = build_sub_hrt(edges_c, config)
merged_abc_1 = merge_hrt(merge_hrt(sub_a, sub_b, config), sub_c, config)
merged_abc_2 = merge_hrt(sub_a, merge_hrt(sub_b, sub_c, config), config)
print(f"   (A+B)+C: {merged_abc_1.nnz} edges")
print(f"   A+(B+C): {merged_abc_2.nnz} edges")
print(f"   Associative: {merged_abc_1.nnz == merged_abc_2.nnz}")
print()

print("═" * 60)
print("All CRDT properties verified!")
print("This guarantees eventual consistency.")
print("═" * 60)

════════════════════════════════════════════════════════════
CRDT PROPERTIES
════════════════════════════════════════════════════════════

1. IDEMPOTENCE: process(X) twice → same sub-HRT
   Sub-HRT 1: 4 edges
   Sub-HRT 2: 4 edges
   Same: True

2. COMMUTATIVITY: merge(A, B) = merge(B, A)
   merge(A, B): 8 edges
   merge(B, A): 8 edges
   Commutative: True

3. ASSOCIATIVITY: merge(merge(A,B), C) = merge(A, merge(B,C))
   (A+B)+C: 12 edges
   A+(B+C): 12 edges
   Associative: True

════════════════════════════════════════════════════════════
All CRDT properties verified!
This guarantees eventual consistency.
════════════════════════════════════════════════════════════
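
The checks above compare edge counts. A stricter version compares the full edge contents, as in the sketch below. It uses two hypothetical merge rules on `{(row, col): weight}` dicts, pointwise sum and pointwise max; which rule `merge_hrt` actually applies is not shown in this notebook.

```python
def merge_add_sketch(a, b):
    """Pointwise-sum merge."""
    out = dict(a)
    for key, w in b.items():
        out[key] = out.get(key, 0.0) + w
    return out

def merge_max_sketch(a, b):
    """Pointwise-max merge."""
    out = dict(a)
    for key, w in b.items():
        out[key] = max(out.get(key, w), w)
    return out

A = {(0, 1): 1.0}
B = {(0, 1): 2.0, (1, 2): 1.0}
C = {(2, 3): 1.0}

for merge in (merge_add_sketch, merge_max_sketch):
    # Commutativity and associativity hold for both rules
    assert merge(A, B) == merge(B, A)
    assert merge(merge(A, B), C) == merge(A, merge(B, C))

# Idempotence holds for max but not for sum
assert merge_max_sketch(A, A) == A
assert merge_add_sketch(A, A) != A

# An edge-count comparison cannot see the difference
assert len(merge_add_sketch(A, A)) == len(A)
print("structural checks passed")
```

A sum-based merge is commutative and associative but doubles weights when the same input is merged twice, so equality of weights, not just of edge counts, is what distinguishes the two cases.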


## 8. Summary

### Manifold Algebra Key Concepts

**Universal Identifier**: `(reg, zeros)`
- Same content → same index everywhere
- Glues HLLSet, AM, W, Sheaf sections

**Unified Pipeline**: `INPUT → HLLSet → Sub-HRT → Extend → Merge`
- Same for ingestion AND query
- Sub-structure isolation
- Idempotent merge

**CRDT Properties**:
- ✓ Idempotent
- ✓ Commutative
- ✓ Associative
- → Together these give eventual consistency (checked above by edge count on small examples)

**Algebraic Operations**:
- Projection: `π_n`, `π_R`, `π_C`
- Transform: `T`, `N`, `S_α`
- Composition: `+`, `∘`
- Path: `reach`, `closure`
- Lift/Lower: `↑`, `↓`