# Code Analyzer - Eating Our Own Dog Food

An intelligent code analyzer built on Fractal Manifold that analyzes its own codebase.

## Architecture: Double Loop (Action + Reflection)

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                    CYBERNETIC DOUBLE LOOP                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ╔═══════════════════ LOOP 1: ACTION ════════════════════╗              │
│  ║                                                       ║              │
│  ║  SENSE              PROCESS              ACT          ║              │
│  ║  ─────              ───────              ───          ║              │
│  ║  Perceptron ──────► Pipeline ──────────► Actuator ────╫──► OUTPUT    │
│  ║  (input)            (HLLSet→HRT)         (response)   ║              │
│  ║                                                       ║              │
│  ╚═══════════════════════════════════════════════════════╝              │
│                                                    │                    │
│                                           (own output)                  │
│                                                    │                    │
│  ╔═══════════════════ LOOP 2: REFLECTION ════════════════╗              │
│  ║                                                       ║              │
│  ║  OBSERVE            ENCODE               COMMIT       ║              │
│  ║  ───────            ──────               ──────       ║              │
│  ║  Response ────────► HLLSet ────────────► Memory ◄─────╫──┘           │
│  ║  (self-observe)     (self-encode)        (manifold)   ║              │
│  ║                                                       ║              │
│  ╚═══════════════════════════════════════════════════════╝              │
│                                                                         │
│  This is CYBERNETIC AI:                                                 │
│    - System observes its own outputs                                    │
│    - Encodes them into memory                                           │
│    - Future behavior shaped by past behavior                            │
│    - Different from current AI (stateless, no self-reflection)          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

- **Loop 1 (Action)**: Perceptron → Pipeline → Actuator → Output
- **Loop 2 (Reflection)**: Output → HLLSet → Commit → Memory

The reflection loop is what makes the system **cybernetic**: it keeps a memory of its own actions, and that memory shapes its later behavior.
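
The two loops can be sketched without any of the `core` machinery. The following is a minimal illustration that uses plain Python sets in place of HLLSets and the HRT; `ToyManifold`, `sense`, `act`, and `step` are hypothetical names invented for this sketch and are not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyManifold:
    """Stand-in for the manifold: a token set plus a log of commit sources."""
    memory: Set[str] = field(default_factory=set)
    commits: List[str] = field(default_factory=list)

    def commit(self, tokens: Set[str], source: str) -> None:
        # Merge new tokens into memory and record where they came from.
        self.memory |= tokens
        self.commits.append(source)


def sense(text: str) -> Set[str]:
    """SENSE: turn raw text into a token set."""
    return set(text.lower().split())


def act(query_tokens: Set[str], manifold: ToyManifold) -> str:
    """ACT: respond with the query tokens that memory already contains."""
    known = sorted(query_tokens & manifold.memory)
    return "known: " + " ".join(known) if known else "known: nothing"


def step(text: str, manifold: ToyManifold) -> str:
    """One pass through both loops."""
    tokens = sense(text)
    response = act(tokens, manifold)             # Loop 1: action
    manifold.commit(tokens, "input")             # remember the input
    manifold.commit(sense(response), "output")   # Loop 2: reflection
    return response


manifold = ToyManifold()
first = step("sparse tensor merge", manifold)
second = step("merge nothing", manifold)
print(first)
print(second)
```

In the second call the token `nothing` counts as known only because the system ingested its own first response, which is the effect the reflection loop is meant to produce.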

## 1. Imports and Setup

In [1]:
import os
import json
import time
import warnings
from pathlib import Path
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Callable, Iterator
from abc import ABC, abstractmethod

# Pin to GPU 0 unless CUDA_VISIBLE_DEVICES is already set,
# and silence CUDA capability warnings
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
warnings.filterwarnings("ignore", message=".*cuda capability.*")

# Core imports
from core import (
    SparseHRT3D,
    Sparse3DConfig,
    SparseAM3D,
    SparseLattice3D,
    Edge3D,
    HLLSet,
    get_device,
    __version__
)

# Manifold Algebra
from core.manifold_algebra import (
    UniversalID,
    LookupTable,
    START, END,
    ProcessingResult,
    unified_process,
    build_w_from_am,
    input_to_hllset,
    build_sub_hrt,
    merge_hrt,
    Sparse3DMatrix,
    project_layer,
    reachable_from,
)

print(f"Fractal Manifold Code Analyzer")
print(f"Core v{__version__}")
print(f"Device: {get_device()}")

Fractal Manifold Code Analyzer
Core v0.7.0
Device: cuda


## 2. Configuration

In [2]:
# System configuration
N_GRAM_SIZE = 3
P_BITS = 10
H_BITS = 32

config = Sparse3DConfig(
    p_bits=P_BITS,
    h_bits=H_BITS,
    max_n=N_GRAM_SIZE
)

# Project root
PROJECT_ROOT = Path(".").resolve()

print(f"=== Configuration ===")
print(f"Project root: {PROJECT_ROOT}")
print(f"N-gram size: {N_GRAM_SIZE}")
print(f"AM dimension: {config.dimension:,}")

=== Configuration ===
Project root: /home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold
N-gram size: 3
AM dimension: 32,770
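
The printed dimension is consistent with one index per (register, bit position) pair plus two reserved entries. The arithmetic below is only an observation about the numbers; it assumes the two extra entries correspond to START and END and does not claim to be how `Sparse3DConfig.dimension` is actually computed.

```python
P_BITS = 10
H_BITS = 32

registers = 1 << P_BITS      # 2**10 = 1024 registers
reserved = 2                 # assumption: one slot each for START and END
candidate_dimension = registers * H_BITS + reserved

print(f"{candidate_dimension:,}")
```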


## 3. Commit Store

In [3]:
@dataclass
class Commit:
    """A committed HRT state."""
    id: str                    # Commit hash (content-addressed)
    hrt: SparseHRT3D          # The HRT state
    W: Dict                   # Transition matrix
    source: str               # Source file path
    file_type: str            # File type (py, md, etc.)
    timestamp: float          # When committed
    parent_id: Optional[str]  # Previous commit
    
    @property
    def summary(self) -> str:
        return f"Commit({self.id[:8]}): {self.source} [{self.hrt.nnz} edges]"


@dataclass
class CommitStore:
    """Store for committed HRT states."""
    commits: Dict[str, Commit] = field(default_factory=dict)
    head: Optional[str] = None  # Current HEAD commit
    
    def commit(self, hrt: SparseHRT3D, W: Dict, source: str, file_type: str) -> Commit:
        """Commit an HRT state."""
        # Content-addressed ID from AM hash
        commit_id = hrt.am.name[:16]
        
        c = Commit(
            id=commit_id,
            hrt=hrt,
            W=W,
            source=source,
            file_type=file_type,
            timestamp=time.time(),
            parent_id=self.head
        )
        
        self.commits[commit_id] = c
        self.head = commit_id
        return c
    
    def get_head(self) -> Optional[Commit]:
        """Get current HEAD commit."""
        return self.commits.get(self.head) if self.head else None
    
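    def history(self) -> List[Commit]:
        """Get commit history from HEAD, newest first."""
        result = []
        seen = set()
        current = self.head
        # IDs are content-addressed, so committing an unchanged state makes a
        # commit its own parent; `seen` keeps the walk from looping forever.
        while current and current not in seen:
            seen.add(current)
            c = self.commits.get(current)
            if c is None:
                break
            result.append(c)
            current = c.parent_id
        return result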


# Initialize commit store
store = CommitStore()
print(f"Commit store initialized")

Commit store initialized
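
A content-addressed store with parent pointers can be sketched in a few lines. The toy below hashes a string payload with `hashlib` instead of using `hrt.am.name`; `ToyCommit` and `ToyStore` are hypothetical names, and the sketch assumes the ID depends only on content.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCommit:
    id: str
    payload: str
    parent_id: Optional[str]


@dataclass
class ToyStore:
    commits: Dict[str, ToyCommit] = field(default_factory=dict)
    head: Optional[str] = None

    def commit(self, payload: str) -> ToyCommit:
        # Content-addressed: the ID depends only on the payload.
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
        c = ToyCommit(commit_id, payload, parent_id=self.head)
        self.commits[commit_id] = c
        self.head = commit_id
        return c

    def history(self) -> List[ToyCommit]:
        result, seen, current = [], set(), self.head
        # Stop if an ID repeats; identical content yields identical IDs.
        while current and current not in seen:
            seen.add(current)
            c = self.commits[current]
            result.append(c)
            current = c.parent_id
        return result


toy_store = ToyStore()
toy_store.commit("state-a")
toy_store.commit("state-b")
toy_store.commit("state-b")   # same content, same ID, so its parent is itself
print([c.payload for c in toy_store.history()])
```

Committing identical content twice overwrites the earlier entry and points the commit at itself, so the walk from HEAD no longer reaches `state-a`. If that matters, mixing the parent ID or a step counter into the hash is one possible remedy.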


## 4. Perceptron Base Class

In [4]:
class Perceptron(ABC):
    """
    Base class for file-type perceptrons.
    
    Each perceptron:
    1. Finds files of its type
    2. Extracts text content
    3. Processes via unified pipeline
    4. Commits after each file
    """
    
    def __init__(self, name: str, extensions: List[str], config: Sparse3DConfig):
        self.name = name
        self.extensions = extensions
        self.config = config
        self.lut = LookupTable(config=config)
        self.lut.add_ntoken(START)
        self.lut.add_ntoken(END)
        self.files_processed = 0
        self.total_tokens = 0
    
    def find_files(self, root: Path, exclude_dirs: Optional[set] = None) -> Iterator[Path]:
        """Find all files matching extensions."""
        # Note: virtual environments (e.g. '.venv') are not excluded by default
        exclude_dirs = exclude_dirs or {'__pycache__', '.git', 'build', '.ipynb_checkpoints', 'deprecated'}
        
        for path in root.rglob('*'):
            if path.is_file() and path.suffix in self.extensions:
                # Skip excluded directories
                if not any(ex in path.parts for ex in exclude_dirs):
                    yield path
    
    @abstractmethod
    def extract_text(self, path: Path) -> str:
        """Extract text content from file."""
        pass
    
    def process_file(
        self, 
        path: Path, 
        current_hrt: SparseHRT3D, 
        current_W: Dict,
        store: CommitStore
    ) -> tuple[SparseHRT3D, Dict, Optional[Commit]]:
        """Process a single file and commit."""
        # Extract text
        text = self.extract_text(path)
        if not text.strip():
            return current_hrt, current_W, None
        
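        # Unified processing (record the LUT size first so that only the
        # n-tokens added by this file are counted below)
        lut_size_before = len(self.lut.ntoken_to_index)
        result = unified_process(
            text,
            current_hrt,
            current_W,
            self.config,
            self.lut,
            N_GRAM_SIZE
        )
        
        # Update state
        new_hrt = result.merged_hrt
        new_W = build_w_from_am(new_hrt.am, self.config)
        
        # Commit
        relative_path = str(path.relative_to(PROJECT_ROOT))
        commit = store.commit(new_hrt, new_W, relative_path, self.name)
        
        self.files_processed += 1
        # Add the LUT growth for this file, not the size of the whole LUT
        self.total_tokens += len(self.lut.ntoken_to_index) - lut_size_before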
        
        return new_hrt, new_W, commit
    
    def process_all(
        self,
        root: Path,
        current_hrt: SparseHRT3D,
        current_W: Dict,
        store: CommitStore,
        verbose: bool = True,
        max_files: Optional[int] = None
    ) -> tuple[SparseHRT3D, Dict]:
        """
        Process files of this type.
        
        Args:
            max_files: Limit number of files to process (None = all)
        """
        files = list(self.find_files(root))
        total_found = len(files)  # count before limiting; avoids a second scan
        
        # Limit if specified
        if max_files is not None:
            files = files[:max_files]
        
        if verbose:
            print(f"\n[{self.name}] Processing {len(files)}/{total_found} files")
        
        for path in files:
            try:
                current_hrt, current_W, commit = self.process_file(
                    path, current_hrt, current_W, store
                )
                if verbose and commit:
                    print(f"  ✓ {commit.source} [{current_hrt.nnz} edges]")
            except Exception as e:
                if verbose:
                    print(f"  ✗ {path}: {e}")
        
        return current_hrt, current_W


print("Perceptron base class defined")
print("  max_files parameter added to process_all()")

Perceptron base class defined
  max_files parameter added to process_all()
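
The file discovery step can be tried in isolation. The sketch below is a standalone variant of `find_files` run against a temporary directory; it differs from the method above in that it matches excluded names against the path relative to `root`, and the extra `.venv` exclusion is an illustrative choice rather than part of the notebook.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Optional, Set

DEFAULT_EXCLUDES = {"__pycache__", ".git", "build", ".ipynb_checkpoints", "deprecated"}


def find_files(root: Path, extensions: List[str],
               exclude_dirs: Optional[Set[str]] = None) -> Iterator[Path]:
    """Yield files under root with a matching suffix, skipping excluded dirs."""
    excludes = DEFAULT_EXCLUDES if exclude_dirs is None else exclude_dirs
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            # Compare only the components below root against the exclude set.
            if not excludes.intersection(path.relative_to(root).parts):
                yield path


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for rel in ["main.py", "core/a.py", ".venv/lib/b.py", "build/c.py"]:
        target = demo_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("pass\n", encoding="utf-8")

    default_hits = [p.relative_to(demo_root).as_posix()
                    for p in find_files(demo_root, [".py"])]
    strict_hits = [p.relative_to(demo_root).as_posix()
                   for p in find_files(demo_root, [".py"], DEFAULT_EXCLUDES | {".venv"})]

print(default_hits)
print(strict_hits)
```

With the default set, the file under `.venv` is still discovered; passing an extended `exclude_dirs` removes it.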


## 5. File-Type Perceptrons

In [5]:
class PythonPerceptron(Perceptron):
    """Perceptron for .py files."""
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_py", [".py"], config)
    
    def extract_text(self, path: Path) -> str:
        """Extract Python source code."""
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            return f.read()


class NotebookPerceptron(Perceptron):
    """Perceptron for .ipynb files."""
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_nb", [".ipynb"], config)
    
    def extract_text(self, path: Path) -> str:
        """Extract code and markdown from notebook cells."""
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            nb = json.load(f)
        
        texts = []
        for cell in nb.get('cells', []):
            source = cell.get('source', [])
            if isinstance(source, list):
                texts.append(''.join(source))
            else:
                texts.append(source)
        
        return '\n\n'.join(texts)


class MarkdownPerceptron(Perceptron):
    """Perceptron for .md files."""
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_md", [".md"], config)
    
    def extract_text(self, path: Path) -> str:
        """Extract markdown text."""
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            return f.read()


class CPerceptron(Perceptron):
    """Perceptron for .c, .h, and .pyx (Cython) files."""
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_c", [".c", ".pyx", ".h"], config)
    
    def extract_text(self, path: Path) -> str:
        """Extract C/Cython source code."""
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            return f.read()


class PDFPerceptron(Perceptron):
    """Perceptron for .pdf files (placeholder)."""
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_pdf", [".pdf"], config)
    
    def extract_text(self, path: Path) -> str:
        """Extract text from PDF (requires pypdf or similar)."""
        try:
            from pypdf import PdfReader
            reader = PdfReader(path)
            texts = [page.extract_text() or '' for page in reader.pages]
            return '\n'.join(texts)
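        except ImportError:
            # pypdf is not installed: skip PDFs
            return ""
        except Exception:
            # Unreadable PDF: treat it as empty so ingestion continues.
            # An empty result means no commit and no log line for the file.
            return ""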


# Create perceptrons
perceptrons = {
    'py': PythonPerceptron(config),
    'nb': NotebookPerceptron(config),
    'md': MarkdownPerceptron(config),
    'c': CPerceptron(config),
    'pdf': PDFPerceptron(config),
}

print(f"Created {len(perceptrons)} perceptrons:")
for name, p in perceptrons.items():
    print(f"  {p.name}: {p.extensions}")

Created 5 perceptrons:
  p_py: ['.py']
  p_nb: ['.ipynb']
  p_md: ['.md']
  p_c: ['.c', '.pyx', '.h']
  p_pdf: ['.pdf']
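
The notebook extraction logic is easy to check against a small in-memory document. The sketch below repeats the cell-joining rule as a standalone function; `notebook_text` and the sample JSON are made up for illustration.

```python
import json
from typing import Any, Dict


def notebook_text(nb: Dict[str, Any]) -> str:
    """Join the source of every cell; `source` may be a list of lines or a string."""
    texts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", [])
        texts.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(texts)


sample = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Title\\n", "Intro text"]},
    {"cell_type": "code", "source": "print('hi')"}
  ]
}
""")

extracted = notebook_text(sample)
print(extracted)
```

Cell outputs are not included, so only what the author wrote (code and markdown) reaches the pipeline.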


In [6]:
class PromptPerceptron(Perceptron):
    """
    Perceptron for user prompts/queries.
    
    Treats user input exactly like file input:
    - Goes through unified pipeline
    - Gets committed
    - Contributes to manifold (learning from queries!)
    """
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_prompt", [], config)  # No file extensions
        self.prompt_history: List[str] = []
    
    def extract_text(self, path: Path) -> str:
        """Not used - prompts come directly as text."""
        return ""
    
    def process_prompt(
        self,
        prompt: str,
        current_hrt: SparseHRT3D,
        current_W: Dict,
        store: CommitStore
    ) -> tuple[SparseHRT3D, Dict, Optional[Commit], Optional[ProcessingResult]]:
        """
        Process a user prompt and commit.
        
        Returns updated HRT, W, commit, and processing result for query use.
        """
        if not prompt.strip():
            return current_hrt, current_W, None, None
        
        # Track history
        self.prompt_history.append(prompt)
        
        # Unified processing (same as files!)
        result = unified_process(
            prompt,
            current_hrt,
            current_W,
            self.config,
            self.lut,
            N_GRAM_SIZE
        )
        
        # Update state
        new_hrt = result.merged_hrt
        new_W = build_w_from_am(new_hrt.am, self.config)
        
        # Commit with prompt as source
        prompt_id = f"prompt_{len(self.prompt_history)}"
        commit = store.commit(new_hrt, new_W, prompt_id, self.name)
        
        self.files_processed += 1
        
        return new_hrt, new_W, commit, result


# Create prompt perceptron (LUT will be assigned in Initialize Empty HRT cell)
prompt_perceptron = PromptPerceptron(config)

print("PromptPerceptron class defined")
print("  - Processes user queries through unified pipeline")
print("  - Commits each query (manifold learns from interactions)")
print("  - Returns processing result for retrieval")

PromptPerceptron class defined
  - Processes user queries through unified pipeline
  - Commits each query (manifold learns from interactions)
  - Returns processing result for retrieval


## 6. Actuators (Output Actions)

In [7]:
from abc import ABC, abstractmethod
from datetime import datetime


class Actuator(ABC):
    """
    Base class for actuators - turn processed data into action.
    
    Completes the sense-process-act loop:
        Perceptron (sense) → Pipeline (process) → Actuator (act)
    
    Key insight: Actuator output can feed back into the manifold!
    """
    
    def __init__(self, name: str):
        self.name = name
        self.actions_taken = 0
    
    @abstractmethod
    def act(self, commit: Commit, result: ProcessingResult, **kwargs) -> str:
        """
        Perform action based on processed result.
        
        Returns action summary string.
        """
        pass


class LogActuator(Actuator):
    """
    Actuator for file perceptrons - logs ingestion to file.
    """
    
    def __init__(self, log_path: Optional[Path] = None):
        super().__init__("a_log")
        self.log_path = log_path or Path("ingestion.log")
        # Make sure the target directory exists before the first append
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        self.entries: List[str] = []
    
    def act(self, commit: Commit, result: ProcessingResult, **kwargs) -> str:
        """Log the ingestion event."""
        timestamp = datetime.fromtimestamp(commit.timestamp).isoformat()
        entry = f"[{timestamp}] {commit.file_type}: {commit.source} | edges={commit.hrt.nnz} | commit={commit.id[:8]}\n"
        
        self.entries.append(entry)
        self.actions_taken += 1
        
        # Append to the log file
        with open(self.log_path, 'a', encoding='utf-8') as f:
            f.write(entry)
        
        return f"Logged: {commit.source}"
    
    def flush(self) -> None:
        """Rewrite the log file so it holds only this session's entries."""
        with open(self.log_path, 'w', encoding='utf-8') as f:
            f.writelines(self.entries)


class ResponseActuator(Actuator):
    """
    Actuator for prompt perceptron - generates query response.
    
    FEEDBACK LOOP: The response itself is ingested back into the manifold!
    This creates co-adaptive learning:
        Query → Response → HLLSet → Commit → (shapes future responses)
    """
    
    def __init__(self):
        super().__init__("a_response")
        self.responses: List[Dict] = []
    
    def act(
        self, 
        commit: Commit, 
        result: ProcessingResult, 
        query_results: List[tuple] = None,
        hrt: SparseHRT3D = None,
        W: Dict = None,
        store: CommitStore = None,
        lut: LookupTable = None,
        config: Sparse3DConfig = None,
        ingest_response: bool = True,
        **kwargs
    ) -> tuple[str, SparseHRT3D, Dict]:
        """
        Generate response and optionally ingest it back.
        
        Returns:
            (response_text, updated_hrt, updated_W)
        """
        
        # Build response text
        lines = [
            f"Query: {commit.source}",
            f"Commit: {commit.id[:8]}",
            f"Results ({len(query_results or [])} found):",
        ]
        
        for i, (ntoken, score) in enumerate(query_results or [], 1):
            lines.append(f"  {i:2d}. [{score:5.1f}] {ntoken}")
        
        response_text = "\n".join(lines)
        
        # Track response
        response_record = {
            "timestamp": datetime.fromtimestamp(commit.timestamp).isoformat(),
            "prompt": commit.source,
            "commit_id": commit.id[:8],
            "response": response_text,
            "ingested": False,
        }
        
        new_hrt = hrt
        new_W = W
        
        # FEEDBACK LOOP: Ingest response back into manifold
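        # Test against None explicitly: an empty container may be falsy
        can_ingest = all(x is not None for x in (hrt, store, lut, config))
        if ingest_response and can_ingest: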
            response_result = unified_process(
                response_text,
                hrt,
                W,
                config,
                lut,
                N_GRAM_SIZE
            )
            
            new_hrt = response_result.merged_hrt
            new_W = build_w_from_am(new_hrt.am, config)
            
            # Commit response as its own entry
            response_id = f"response_{len(self.responses) + 1}"
            store.commit(new_hrt, new_W, response_id, "a_response")
            
            response_record["ingested"] = True
            response_record["response_commit"] = response_id
        
        self.responses.append(response_record)
        self.actions_taken += 1
        
        return response_text, new_hrt, new_W
    
    def history(self) -> List[Dict]:
        """Get response history."""
        return self.responses


class CompositeActuator(Actuator):
    """
    Actuator that chains multiple actuators.
    """
    
    def __init__(self, actuators: List[Actuator]):
        super().__init__("a_composite")
        self.actuators = actuators
    
    def act(self, commit: Commit, result: ProcessingResult, **kwargs) -> str:
        """Run all child actuators."""
        outputs = []
        for actuator in self.actuators:
            output = actuator.act(commit, result, **kwargs)
            if isinstance(output, tuple):
                output = output[0]  # Handle ResponseActuator tuple
            outputs.append(f"[{actuator.name}] {output}")
            self.actions_taken += 1
        return "\n".join(outputs)


# Create actuators
log_actuator = LogActuator(PROJECT_ROOT / "data" / "ingestion.log")
response_actuator = ResponseActuator()

print("Actuators created:")
print(f"  {log_actuator.name}: Logs file ingestion to {log_actuator.log_path}")
print(f"  {response_actuator.name}: Generates query responses AND ingests them back!")
print()
print("Feedback loop:")
print("  Query → HLLSet → Commit → Response → HLLSet → Commit")
print("  (manifold learns from both questions AND answers)")

Actuators created:
  a_log: Logs file ingestion to /home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold/data/ingestion.log
  a_response: Generates query responses AND ingests them back!

Feedback loop:
  Query → HLLSet → Commit → Response → HLLSet → Commit
  (manifold learns from both questions AND answers)
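
The chaining pattern used by `CompositeActuator` can be shown with two toy actuators. The classes below are hypothetical stand-ins that take a plain string event instead of a `Commit` and `ProcessingResult`.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyActuator(ABC):
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def act(self, event: str):
        """Return a summary string, or a (summary, state) tuple."""


class EchoActuator(ToyActuator):
    def __init__(self):
        super().__init__("a_echo")

    def act(self, event: str) -> str:
        return f"Logged: {event}"


class StatefulActuator(ToyActuator):
    """Returns (summary, new_state), like an actuator that updates the manifold."""

    def __init__(self):
        super().__init__("a_state")

    def act(self, event: str):
        return f"Responded: {event}", {"last_event": event}


def run_all(actuators: List[ToyActuator], event: str) -> str:
    lines = []
    for actuator in actuators:
        output = actuator.act(event)
        if isinstance(output, tuple):
            output = output[0]   # keep the summary, drop the returned state
        lines.append(f"[{actuator.name}] {output}")
    return "\n".join(lines)


summary = run_all([EchoActuator(), StatefulActuator()], "prompt_1")
print(summary)
```

Note the design trade-off: unwrapping the tuple keeps the return type a plain string, but any state a child returns is discarded, so a caller that needs the updated state has to invoke that actuator directly.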


## 6. Initialize Empty HRT

In [8]:
# Create shared LUT (all perceptrons contribute)
shared_lut = LookupTable(config=config)
shared_lut.add_ntoken(START)
shared_lut.add_ntoken(END)

# Update all perceptrons to use shared LUT
for p in perceptrons.values():
    p.lut = shared_lut

# Also set prompt perceptron's LUT
prompt_perceptron.lut = shared_lut

# Initialize empty HRT
empty_am = SparseAM3D.from_edges(config, [])
empty_lattice = SparseLattice3D.from_sparse_am(empty_am)
current_hrt = SparseHRT3D(
    am=empty_am, 
    lattice=empty_lattice, 
    config=config, 
    lut=frozenset(), 
    step=0
)
current_W = {}

print(f"Initial HRT: {current_hrt.nnz} edges")
print(f"Shared LUT: {len(shared_lut.ntoken_to_index)} entries")
print(f"Prompt perceptron ready")

Initial HRT: 0 edges
Shared LUT: 2 entries
Prompt perceptron ready


## 7. Run All Perceptrons

In [9]:
# ═══════════════════════════════════════════════════════════
# CONFIGURATION: Limit files per type for faster testing
# Set to None for full ingestion
# ═══════════════════════════════════════════════════════════
MAX_FILES_PER_TYPE = 3  # Change to None for all files

print("═" * 60)
print("INGESTING PROJECT FILES")
print(f"Max files per type: {MAX_FILES_PER_TYPE or 'ALL'}")
print("═" * 60)

start_time = time.time()

# Process in order: Python first (core), then C/Cython, docs, notebooks, PDFs
processing_order = ['py', 'c', 'md', 'nb', 'pdf']

for ptype in processing_order:
    p = perceptrons[ptype]
    current_hrt, current_W = p.process_all(
        PROJECT_ROOT,
        current_hrt,
        current_W,
        store,
        verbose=True,
        max_files=MAX_FILES_PER_TYPE
    )

total_time = time.time() - start_time

print()
print("═" * 60)
print("INGESTION COMPLETE")
print("═" * 60)
print(f"Total time: {total_time:.2f}s")
print(f"Total commits: {len(store.commits)}")
print(f"Total edges: {current_hrt.nnz:,}")
print(f"Total n-tokens: {len(shared_lut.ntoken_to_index):,}")

════════════════════════════════════════════════════════════
INGESTING PROJECT FILES
Max files per type: 3
════════════════════════════════════════════════════════════

[p_py] Processing 3/9670 files
  ✓ main.py [28 edges]
  ✓ setup.py [130 edges]
  ✓ core/__init__.py [2543 edges]

[p_c] Processing 3/9827 files
  ✓ core/hll_core.pyx [8582 edges]
  ✓ core/hll_core.c [152947 edges]
  ✓ .venv/lib/python3.13/site-packages/cuda/ccuda.pyx [153026 edges]

[p_md] Processing 3/23 files
  ✓ README.md [156376 edges]
  ✓ DOCS/DEEPSEEK_DISCUSSION.md [175744 edges]
  ✓ DOCS/VIBE_CODING_MANIFESTO.md [181363 edges]

[p_nb] Processing 3/7 files
  ✓ 0_hllset.ipynb [187601 edges]
  ✓ 1_hrt.ipynb [190917 edges]
  ✓ 2_sparse_hrt.ipynb [194280 edges]

[p_pdf] Processing 3/17 files

════════════════════════════════════════════════════════════
INGESTION COMPLETE
════════════════════════════════════════════════════════════
Total time: 1604.02s
Total commits: 12
Total edges: 194,280
Total n-tokens: 148,974


## 8. Manifold Statistics

In [None]:
print("═" * 60)
print("MANIFOLD STATISTICS")
print("═" * 60)
print()

# Layer statistics
print("Layer breakdown:")
AM = Sparse3DMatrix.from_am(current_hrt.am, config)
for n in range(config.max_n):
    layer = project_layer(AM, n)
    print(f"  Layer {n} ({n+1}-grams): {layer.nnz:,} edges")

print()

# Perceptron statistics
print("Per-perceptron breakdown:")
for ptype in processing_order:
    p = perceptrons[ptype]
    print(f"  {p.name}: {p.files_processed} files")

print()

# Commit history
print("Recent commits:")
for commit in store.history()[:5]:
    print(f"  {commit.summary}")

## 9. Query the Manifold

In [None]:
def query(text: str, top_k: int = 10) -> List[tuple]:
    """
    Query the manifold for related n-tokens.
    
    Returns top-k n-tokens by connectivity to query.
    """
    # Process query through unified pipeline
    result = unified_process(
        text,
        current_hrt,
        current_W,
        config,
        shared_lut,
        N_GRAM_SIZE
    )
    
    # Collect the node indices touched by the query's context edges
    query_indices = set()
    for edge in result.context_edges:
        query_indices.add(edge.row)
        query_indices.add(edge.col)
    
    # Build AM from current HRT
    AM = Sparse3DMatrix.from_am(current_hrt.am, config)
    
    # Find reachable from query (1-hop neighbors)
    layer0 = project_layer(AM, 0)
    reachable = reachable_from(layer0, query_indices, hops=1)
    
    # Convert layer to dict for scoring
    layer0_dict = layer0.to_dict()
    
    # Score each reachable node by the total weight of the edges stored
    # under it in layer 0 (overall connectivity, not query-specific weight)
    scores = {}
    for idx in reachable:
        if idx in layer0_dict:
            scores[idx] = sum(layer0_dict[idx].values())
    
    # Get top-k
    top = sorted(scores.items(), key=lambda x: -x[1])[:top_k]
    
    # Resolve to n-tokens
    results = []
    for idx, score in top:
        ntoken = shared_lut.index_to_ntokens.get(idx, f"<idx:{idx}>")
        results.append((ntoken, score))
    
    return results


print("Query function defined")
print()
print("Example queries:")
print('  query("HLLSet")')
print('  query("unified_process")')
print('  query("(reg, zeros)")')
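
The retrieval idea (expand one hop from the query nodes, score, keep the top k) can be tested on a tiny adjacency dict. This sketch assumes a `row -> {col: weight}` layout and scores each neighbor by the weight arriving from the query nodes, which is a different scoring rule from the one in `query()`; `one_hop`, `score_by_query`, and `top_k` are hypothetical helpers, and whether `reachable_from` also returns the source nodes is not something this sketch tries to reproduce.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Dict[int, Dict[int, float]]   # row -> {col: weight}


def one_hop(adj: Adjacency, sources: Set[int]) -> Set[int]:
    """Nodes reachable from any source in exactly one step."""
    return {col for s in sources for col in adj.get(s, {})}


def score_by_query(adj: Adjacency, sources: Set[int]) -> Dict[int, float]:
    """Score each neighbor by the total edge weight arriving from the sources."""
    scores: Dict[int, float] = {}
    for s in sources:
        for col, weight in adj.get(s, {}).items():
            scores[col] = scores.get(col, 0.0) + weight
    return scores


def top_k(scores: Dict[int, float], k: int) -> List[Tuple[int, float]]:
    # Sort by descending score, then by index so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


toy_adj: Adjacency = {
    1: {2: 3.0, 3: 1.0},
    2: {3: 4.0},
    4: {3: 5.0},   # node 4 is not part of the query, so this edge is ignored
}
query_nodes = {1, 2}

neighbors = one_hop(toy_adj, query_nodes)
ranked = top_k(score_by_query(toy_adj, query_nodes), k=2)
print(neighbors)
print(ranked)
```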

In [None]:
# Test queries
test_queries = [
    "HLLSet",
    "unified process",
    "merge",
    "sparse tensor",
]

print("═" * 60)
print("QUERY RESULTS")
print("═" * 60)

for q in test_queries:
    print(f"\nQuery: '{q}'")
    print("-" * 40)
    results = query(q, top_k=5)
    for ntoken, score in results:
        print(f"  {score:.1f}  {ntoken}")

In [None]:
# Export LUT to JSON for examination
lut_export = {
    "total_entries": len(shared_lut.ntoken_to_index),
    "ntoken_to_index": {
        str(k): v for k, v in shared_lut.ntoken_to_index.items()
    },
    "index_to_ntoken": {
        str(k): str(v) for k, v in shared_lut.index_to_ntokens.items()
    }
}

lut_path = PROJECT_ROOT / "data" / "lut_export.json"
lut_path.parent.mkdir(parents=True, exist_ok=True)

with open(lut_path, 'w', encoding='utf-8') as f:
    json.dump(lut_export, f, indent=2, ensure_ascii=False)

print(f"LUT exported to: {lut_path}")
print(f"Total entries: {len(shared_lut.ntoken_to_index)}")
print()
print("Sample entries (first 20):")
for i, (ntoken, idx) in enumerate(list(shared_lut.ntoken_to_index.items())[:20]):
    print(f"  {idx:5d} → {ntoken}")
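
JSON object keys must be strings, which is why the export stringifies each key with `str()`. If the n-tokens are tuples of strings (an assumption; the notebook does not show their exact type), the exported keys can be parsed back with `ast.literal_eval`, as in this small round trip over made-up data.

```python
import ast
import json

# Hypothetical n-token -> index mapping with tuple keys
toy_lut = {("hll", "set"): 7, ("merge",): 8}

# Export: tuples become their repr, e.g. "('hll', 'set')"
exported = json.dumps({str(k): v for k, v in toy_lut.items()})

# Import: literal_eval parses the tuple literal without executing code
restored = {ast.literal_eval(k): v for k, v in json.loads(exported).items()}

print(restored == toy_lut)
```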

## 10. Cascading Disambiguation (Test Implementation)

### Key Concepts

1. **Layer HLLSets**: L0 (1-gram), L1 (2-gram), L2 (3-gram)
2. **Hash-based n-grams**: Higher n-grams store hashes, not tokens
3. **START-HLLSet**: Tokens that directly follow the START symbol
4. **Cascading**: START → 1-gram → 2-gram → 3-gram → decompose

```text
┌─────────────────────────────────────────────────────────────────┐
│                  CASCADING DISAMBIGUATION                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Given HLLSet H:                                                │
│                                                                 │
│  Step 1: Slice by layer                                         │
│          H_0 = H ∩ L0_HLLSet  (1-grams)                         │
│          H_1 = H ∩ L1_HLLSet  (2-grams)                         │
│          H_2 = H ∩ L2_HLLSet  (3-grams)                         │
│                                                                 │
│  Step 2: Find start candidates                                  │
│          S = H_0 ∩ START_HLLSet                                 │
│                                                                 │
│  Step 3: Follow transitions (W)                                 │
│          For each s ∈ S:                                        │
│            2-grams = W[s] ∩ H_1                                 │
│            3-grams = W[2-gram] ∩ H_2                            │
│                                                                 │
│  Step 4: Decompose 3-grams to constituent 1-gram hashes         │
│                                                                 │
│  Step 5: Remove processed (reg,zeros), repeat until H empty     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
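
Steps 1 to 4 of the diagram can be sketched with ordinary Python sets before introducing any hashing. Everything below is toy data: short strings stand in for `(reg, zeros)` pairs, and `LAYER0`, `START_SET`, `TRANSITIONS`, `PARTS`, and `cascade_once` are hypothetical names. Step 5 (removing processed entries and repeating) is left out.

```python
from typing import Dict, List, Set, Tuple

# Toy identifiers stand in for (reg, zeros) pairs; all data here is made up.
LAYER0 = {"a", "b", "c", "x"}          # every known 1-gram
LAYER1 = {"ab", "bc"}                  # every known 2-gram
LAYER2 = {"abc"}                       # every known 3-gram
START_SET = {"a", "x"}                 # 1-grams seen directly after START
TRANSITIONS: Dict[str, Set[str]] = {"a": {"ab"}, "ab": {"abc"}}
PARTS: Dict[str, Tuple[str, str, str]] = {"abc": ("a", "b", "c")}


def cascade_once(h: Set[str]) -> List[Tuple[str, str, str]]:
    """Run steps 1-4 of the diagram for a single pass over h."""
    # Step 1: slice h by layer
    h0, h1, h2 = h & LAYER0, h & LAYER1, h & LAYER2
    # Step 2: start candidates
    starts = h0 & START_SET
    sequences = []
    for s in sorted(starts):
        # Step 3: follow transitions, keeping only n-grams present in h
        for two in sorted(TRANSITIONS.get(s, set()) & h1):
            for three in sorted(TRANSITIONS.get(two, set()) & h2):
                # Step 4: decompose the 3-gram into its 1-gram parts
                sequences.append(PARTS[three])
    return sequences


recovered = cascade_once({"a", "b", "c", "ab", "bc", "abc"})
dead_end = cascade_once({"x"})
print(recovered)
print(dead_end)
```

The second call shows a start candidate with no outgoing transition: `x` follows START but leads nowhere, so nothing is recovered.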

In [None]:
# ═══════════════════════════════════════════════════════════════
# SIMULATED DATA FOR TESTING CASCADING DISAMBIGUATION
# ═══════════════════════════════════════════════════════════════

from dataclasses import dataclass
from typing import Set, Tuple, FrozenSet
import hashlib

def sim_hash(token: str, p_bits: int = 10) -> Tuple[int, int]:
    """Simulate hash returning (reg, zeros) - UniversalID."""
    h = hashlib.sha256(token.encode()).digest()
    full_hash = int.from_bytes(h[:8], 'big')
    
    # reg = bucket (first p_bits)
    reg = full_hash >> (64 - p_bits)
    
    # zeros = leading zeros within the remaining (64 - p_bits) bits;
    # bit_length() of 0 is 0, so an all-zero remainder gives 64 - p_bits
    remainder = full_hash & ((1 << (64 - p_bits)) - 1)
    zeros = (64 - p_bits) - remainder.bit_length()
    
    return (reg, zeros)


@dataclass
class SimHLLSet:
    """
    Simulated HLLSet for testing.
    Stores actual (reg, zeros) pairs as frozenset.
    """
    entries: FrozenSet[Tuple[int, int]]
    
    @classmethod
    def empty(cls) -> 'SimHLLSet':
        return cls(frozenset())
    
    @classmethod
    def from_token(cls, token: str) -> 'SimHLLSet':
        return cls(frozenset([sim_hash(token)]))
    
    @classmethod
    def from_tokens(cls, tokens: list) -> 'SimHLLSet':
        return cls(frozenset(sim_hash(t) for t in tokens))
    
    def add(self, token: str) -> 'SimHLLSet':
        return SimHLLSet(self.entries | {sim_hash(token)})
    
    def union(self, other: 'SimHLLSet') -> 'SimHLLSet':
        return SimHLLSet(self.entries | other.entries)
    
    def intersect(self, other: 'SimHLLSet') -> 'SimHLLSet':
        return SimHLLSet(self.entries & other.entries)
    
    def difference(self, other: 'SimHLLSet') -> 'SimHLLSet':
        return SimHLLSet(self.entries - other.entries)
    
    def __contains__(self, item: Tuple[int, int]) -> bool:
        return item in self.entries
    
    def __len__(self) -> int:
        return len(self.entries)
    
    def __iter__(self):
        return iter(self.entries)
    
    def is_empty(self) -> bool:
        return len(self.entries) == 0


print("SimHLLSet class defined")
print()

# Test
h1 = SimHLLSet.from_token("hello")
h2 = SimHLLSet.from_token("world")
h3 = h1.union(h2)
print(f"h1 ('hello'): {h1.entries}")
print(f"h2 ('world'): {h2.entries}")
print(f"h3 (union):   {h3.entries}")
print(f"len(h3) = {len(h3)}")
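
The `(reg, zeros)` split is easier to follow on hand-built 64-bit values than on SHA-256 output. The sketch below uses a hypothetical helper, `split_hash`, that takes the integer directly; it follows the "(reg, zeros)" wording of the docstring above, and the real `HLLSet` may use a different convention (for example, storing the zero count plus one).

```python
from typing import Tuple


def split_hash(full_hash: int, p_bits: int = 10, width: int = 64) -> Tuple[int, int]:
    """Split a width-bit hash into (register index, leading-zero count)."""
    rest_bits = width - p_bits
    reg = full_hash >> rest_bits                  # top p_bits select the register
    rest = full_hash & ((1 << rest_bits) - 1)     # remaining low bits
    zeros = rest_bits - rest.bit_length()         # leading zeros within those bits
    return reg, zeros


# Register 3, remainder with its top bit set: no leading zeros.
example_a = split_hash((3 << 54) | (1 << 53))
# Register 3, remainder equal to 1: 53 leading zeros.
example_b = split_hash((3 << 54) | 1)
# Register 0, remainder 0: all 54 remaining bits are zero.
example_c = split_hash(0)
print(example_a, example_b, example_c)
```

The zero count always falls between 0 and `width - p_bits`, which is a quick sanity check for any implementation of this split.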

In [None]:
# ═══════════════════════════════════════════════════════════════
# N-GRAM BUILDER (Hash-based representation)
# ═══════════════════════════════════════════════════════════════

@dataclass
class NGramRegistry:
    """
    Registry for n-grams with hash-based representation.
    
    1-gram: hash(token) → token
    2-gram: hash(h1, h2) → (hash_1, hash_2)
    3-gram: hash(h1, h2, h3) → (hash_1, hash_2, hash_3)
    """
    # Forward: (reg,zeros) → content
    one_gram: Dict[Tuple[int,int], str] = None           # hash → token
    two_gram: Dict[Tuple[int,int], Tuple] = None         # hash → (hash_a, hash_b)
    three_gram: Dict[Tuple[int,int], Tuple] = None       # hash → (hash_a, hash_b, hash_c)
    
    # Reverse: content → (reg,zeros)
    token_to_hash: Dict[str, Tuple[int,int]] = None
    
    # Layer HLLSets
    L0: SimHLLSet = None  # All 1-gram hashes
    L1: SimHLLSet = None  # All 2-gram hashes
    L2: SimHLLSet = None  # All 3-gram hashes
    
    # START transitions
    START_HLLSet: SimHLLSet = None  # Tokens following START
    
    def __post_init__(self):
        self.one_gram = {}
        self.two_gram = {}
        self.three_gram = {}
        self.token_to_hash = {}
        self.L0 = SimHLLSet.empty()
        self.L1 = SimHLLSet.empty()
        self.L2 = SimHLLSet.empty()
        self.START_HLLSet = SimHLLSet.empty()
    
    def add_1gram(self, token: str) -> Tuple[int, int]:
        """Add 1-gram, return its hash."""
        h = sim_hash(f"1:{token}")
        self.one_gram[h] = token
        self.token_to_hash[token] = h
        self.L0 = SimHLLSet(self.L0.entries | {h})
        return h
    
    def add_2gram(self, h1: Tuple[int,int], h2: Tuple[int,int]) -> Tuple[int, int]:
        """Add 2-gram from two 1-gram hashes."""
        # Hash of the pair
        h = sim_hash(f"2:{h1}:{h2}")
        self.two_gram[h] = (h1, h2)
        self.L1 = SimHLLSet(self.L1.entries | {h})
        return h
    
    def add_3gram(self, h1: Tuple[int,int], h2: Tuple[int,int], h3: Tuple[int,int]) -> Tuple[int, int]:
        """Add 3-gram from three 1-gram hashes."""
        h = sim_hash(f"3:{h1}:{h2}:{h3}")
        self.three_gram[h] = (h1, h2, h3)
        self.L2 = SimHLLSet(self.L2.entries | {h})
        return h
    
    def mark_start(self, h: Tuple[int,int]):
        """Mark a 1-gram hash as following START."""
        self.START_HLLSet = SimHLLSet(self.START_HLLSet.entries | {h})
    
    def decompose_3gram(self, h: Tuple[int,int]) -> Set[Tuple[int,int]]:
        """Decompose 3-gram hash to constituent 1-gram hashes."""
        if h in self.three_gram:
            return set(self.three_gram[h])
        return set()
    
    def resolve_1gram(self, h: Tuple[int,int]) -> str:
        """Resolve 1-gram hash to token."""
        return self.one_gram.get(h, f"<unknown:{h}>")


# Initialize registry
registry = NGramRegistry()
print("NGramRegistry initialized")
print(f"  L0 (1-grams): {len(registry.L0)}")
print(f"  L1 (2-grams): {len(registry.L1)}")
print(f"  L2 (3-grams): {len(registry.L2)}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# BUILD SIMULATED DATA
# ═══════════════════════════════════════════════════════════════

def build_ngrams_from_sequence(tokens: List[str], registry: NGramRegistry, max_n: int = 3):
    """
    Build n-grams from token sequence using simplified scheme.
    
    New scheme (non-overlapping higher n-grams):
      (a, b, c, d, e, f) with n=3
      
      Chunk 1: START → a → (a,b) → (a,b,c)
      Chunk 2: d → (d,e) → (d,e,f)
      Final: END
    
    Returns: HLLSet containing all n-gram hashes for this sequence
    """
    result_entries = set()
    
    # Process in chunks of max_n
    i = 0
    is_start = True
    
    while i < len(tokens):
        chunk = tokens[i:i + max_n]
        
        # Add 1-grams for chunk
        chunk_hashes = []
        for token in chunk:
            h = registry.add_1gram(token)
            chunk_hashes.append(h)
            result_entries.add(h)
            
            # Mark first token of first chunk as START follower
            if is_start and len(chunk_hashes) == 1:
                registry.mark_start(h)
                is_start = False
        
        # Build 2-gram if chunk has >= 2 tokens
        if len(chunk_hashes) >= 2:
            h2 = registry.add_2gram(chunk_hashes[0], chunk_hashes[1])
            result_entries.add(h2)
        
        # Build 3-gram if chunk has 3 tokens
        if len(chunk_hashes) >= 3:
            h3 = registry.add_3gram(chunk_hashes[0], chunk_hashes[1], chunk_hashes[2])
            result_entries.add(h3)
        
        # Move to next chunk
        i += max_n
    
    return SimHLLSet(frozenset(result_entries))


# Test with simple sequence
test_tokens = ["import", "os", "import", "sys", "print", "hello"]
test_hllset = build_ngrams_from_sequence(test_tokens, registry)

print("Test sequence:", test_tokens)
print()
print(f"Built HLLSet with {len(test_hllset)} entries")
print()
print("Layer counts after build:")
print(f"  L0 (1-grams): {len(registry.L0)}")
print(f"  L1 (2-grams): {len(registry.L1)}")
print(f"  L2 (3-grams): {len(registry.L2)}")
print(f"  START_HLLSet: {len(registry.START_HLLSet)}")
print()
print("1-gram registry:")
for h, token in registry.one_gram.items():
    print(f"  {h} → '{token}'")

In [None]:
# ═══════════════════════════════════════════════════════════════
# SIMULATED W (Transition Matrix)
# ═══════════════════════════════════════════════════════════════

@dataclass
class SimW:
    """
    Simulated transition matrix W.
    
    W[from_hash] = SimHLLSet of reachable hashes
    
    Built from registry: tracks which n-grams follow which.
    """
    transitions: Dict[Tuple[int,int], SimHLLSet] = None
    
    def __post_init__(self):
        self.transitions = {}
    
    def add_transition(self, from_h: Tuple[int,int], to_h: Tuple[int,int]):
        """Add edge from_h → to_h."""
        if from_h not in self.transitions:
            self.transitions[from_h] = SimHLLSet.empty()
        self.transitions[from_h] = SimHLLSet(
            self.transitions[from_h].entries | {to_h}
        )
    
    def get(self, h: Tuple[int,int]) -> SimHLLSet:
        """Get all hashes reachable from h."""
        return self.transitions.get(h, SimHLLSet.empty())


def build_W_from_sequence(tokens: List[str], registry: NGramRegistry, W: SimW, max_n: int = 3):
    """
    Build W transitions from token sequence.
    
    Transitions:
      START_hash → first 1-gram
      1-gram → 2-gram (containing it)
      2-gram → 3-gram (containing it)
      3-gram → next chunk's 1-gram
    """
    START_h = sim_hash("__START__")
    END_h = sim_hash("__END__")
    
    i = 0
    prev_chunk_end = None
    
    while i < len(tokens):
        chunk = tokens[i:i + max_n]
        chunk_hashes = [registry.token_to_hash.get(t) for t in chunk]
        chunk_hashes = [h for h in chunk_hashes if h is not None]
        
        if not chunk_hashes:
            i += max_n
            continue
        
        # START → first 1-gram (for first chunk)
        if i == 0:
            W.add_transition(START_h, chunk_hashes[0])
        
        # Previous chunk's last → this chunk's first
        if prev_chunk_end is not None:
            W.add_transition(prev_chunk_end, chunk_hashes[0])
        
        # 1-gram → 2-gram
        if len(chunk_hashes) >= 2:
            h2 = sim_hash(f"2:{chunk_hashes[0]}:{chunk_hashes[1]}")
            W.add_transition(chunk_hashes[0], h2)
            W.add_transition(chunk_hashes[1], h2)
            
            # 2-gram → 3-gram
            if len(chunk_hashes) >= 3:
                h3 = sim_hash(f"3:{chunk_hashes[0]}:{chunk_hashes[1]}:{chunk_hashes[2]}")
                W.add_transition(h2, h3)
                W.add_transition(chunk_hashes[2], h3)
                prev_chunk_end = h3
            else:
                prev_chunk_end = h2
        else:
            prev_chunk_end = chunk_hashes[-1]
        
        i += max_n
    
    # Last → END
    if prev_chunk_end is not None:
        W.add_transition(prev_chunk_end, END_h)


# Build W from test sequence
sim_W = SimW()
build_W_from_sequence(test_tokens, registry, sim_W)

print("Transition matrix W built")
print(f"  Total source nodes: {len(sim_W.transitions)}")
print()
print("Sample transitions:")
for from_h, to_set in list(sim_W.transitions.items())[:5]:
    from_label = registry.one_gram.get(from_h, str(from_h)[:20])
    print(f"  {from_label} → {len(to_set)} targets")

In [None]:
# ═══════════════════════════════════════════════════════════════
# CASCADING DISAMBIGUATION ALGORITHM
# ═══════════════════════════════════════════════════════════════

@dataclass
class DisambiguationResult:
    """Result of disambiguating one (reg,zeros)."""
    reg_zeros: Tuple[int, int]
    layer: int                          # 0, 1, or 2
    ngram_hash: Tuple[int, int]         # The n-gram hash
    constituent_1grams: Set[Tuple[int, int]]  # Decomposed 1-gram hashes
    resolved_tokens: List[str]          # Actual tokens


def cascading_disambiguate(
    H: SimHLLSet,
    registry: NGramRegistry,
    W: SimW,
    verbose: bool = True
) -> List[DisambiguationResult]:
    """
    Cascading disambiguation algorithm.
    
    Given HLLSet H, extract all n-grams by layer and decompose
    to constituent 1-gram hashes.
    
    Returns list of DisambiguationResult for each processed (reg,zeros).
    """
    results = []
    
    # Step 1: Slice by layer
    H_0 = H.intersect(registry.L0)  # 1-grams in H
    H_1 = H.intersect(registry.L1)  # 2-grams in H
    H_2 = H.intersect(registry.L2)  # 3-grams in H
    
    if verbose:
        print("Step 1: Layer slicing")
        print(f"  H_0 (1-grams): {len(H_0)}")
        print(f"  H_1 (2-grams): {len(H_1)}")
        print(f"  H_2 (3-grams): {len(H_2)}")
        print()
    
    # Step 2: Find START candidates
    start_candidates = H_0.intersect(registry.START_HLLSet)
    
    if verbose:
        print("Step 2: START candidates")
        print(f"  Found {len(start_candidates)} start candidates")
        for h in start_candidates:
            token = registry.resolve_1gram(h)
            print(f"    {h} → '{token}'")
        print()
    
    # Step 3: Process each 3-gram (highest granularity first)
    if verbose:
        print("Step 3: Processing 3-grams")
    
    for h3 in H_2:
        if h3 in registry.three_gram:
            # Keep the stored tuple order: a set would lose token order and duplicates
            ordered = registry.three_gram[h3]
            constituents = set(ordered)
            tokens = [registry.resolve_1gram(h) for h in ordered]
            
            result = DisambiguationResult(
                reg_zeros=h3,
                layer=2,
                ngram_hash=h3,
                constituent_1grams=constituents,
                resolved_tokens=tokens
            )
            results.append(result)
            
            if verbose:
                print(f"  3-gram {h3[:2]}... → {tokens}")
    
    # Step 4: Process 2-grams not covered by 3-grams
    if verbose:
        print()
        print("Step 4: Processing 2-grams")
    
    covered_2grams = set()
    for h3 in H_2:
        if h3 in registry.three_gram:
            h1, h2, h3_inner = registry.three_gram[h3]
            # Find 2-gram that contains h1, h2
            for h2_candidate in H_1:
                if h2_candidate in registry.two_gram:
                    if registry.two_gram[h2_candidate] == (h1, h2):
                        covered_2grams.add(h2_candidate)
    
    for h2 in H_1:
        if h2 not in covered_2grams and h2 in registry.two_gram:
            h1, h2_inner = registry.two_gram[h2]
            constituents = {h1, h2_inner}
            tokens = [registry.resolve_1gram(h) for h in [h1, h2_inner]]
            
            result = DisambiguationResult(
                reg_zeros=h2,
                layer=1,
                ngram_hash=h2,
                constituent_1grams=constituents,
                resolved_tokens=tokens
            )
            results.append(result)
            
            if verbose:
                print(f"  2-gram {h2[:2]}... → {tokens}")
    
    # Step 5: Process standalone 1-grams
    if verbose:
        print()
        print("Step 5: Processing standalone 1-grams")
    
    covered_1grams = set()
    for r in results:
        covered_1grams.update(r.constituent_1grams)
    
    for h1 in H_0:
        if h1 not in covered_1grams and h1 in registry.one_gram:
            token = registry.resolve_1gram(h1)
            
            result = DisambiguationResult(
                reg_zeros=h1,
                layer=0,
                ngram_hash=h1,
                constituent_1grams={h1},
                resolved_tokens=[token]
            )
            results.append(result)
            
            if verbose:
                print(f"  1-gram {h1[:2]}... → '{token}'")
    
    return results


print("cascading_disambiguate() defined")

In [None]:
# ═══════════════════════════════════════════════════════════════
# RUN CASCADING DISAMBIGUATION TEST
# ═══════════════════════════════════════════════════════════════

print("═" * 60)
print("CASCADING DISAMBIGUATION TEST")
print("═" * 60)
print()
print(f"Input sequence: {test_tokens}")
print(f"Input HLLSet size: {len(test_hllset)}")
print()

# Run disambiguation
results = cascading_disambiguate(test_hllset, registry, sim_W, verbose=True)

print()
print("═" * 60)
print("DISAMBIGUATION RESULTS")
print("═" * 60)
print()

# Collect all recovered tokens
all_tokens = set()
for r in results:
    all_tokens.update(r.resolved_tokens)

print(f"Total results: {len(results)}")
print(f"Unique tokens recovered: {all_tokens}")
print()

# Verify: did we recover all original tokens?
original_tokens = set(test_tokens)
recovered_tokens = all_tokens

print("Verification:")
print(f"  Original tokens:  {original_tokens}")
print(f"  Recovered tokens: {recovered_tokens}")
print(f"  Match: {original_tokens == recovered_tokens}")

## 11. Test Production Implementation

Next, check that the production implementation in `manifold_algebra.py` imports cleanly and runs against real AM data.

In [None]:
# ═══════════════════════════════════════════════════════════════
# Test Production Implementation from manifold_algebra.py
# ═══════════════════════════════════════════════════════════════

# Import production classes - need to reload to pick up new additions
import importlib
import core.manifold_algebra as ma
importlib.reload(ma)

from core.manifold_algebra import (
    LayerHLLSets, 
    DisambiguationResult,
    update_layer_hllsets,
    cascading_disambiguate,
    resolve_disambiguation
)

print("✓ Successfully imported production implementation:")
print(f"  - LayerHLLSets: {LayerHLLSets}")
print(f"  - DisambiguationResult: {DisambiguationResult}")
print(f"  - update_layer_hllsets: {update_layer_hllsets}")
print(f"  - cascading_disambiguate: {cascading_disambiguate}")
print(f"  - resolve_disambiguation: {resolve_disambiguation}")

# Create empty layer HLLSets
prod_layers = LayerHLLSets.empty()
print(f"\n✓ Created empty LayerHLLSets:")
print(f"  - L0 cardinality: {prod_layers.L0.cardinality()}")
print(f"  - L1 cardinality: {prod_layers.L1.cardinality()}")
print(f"  - L2 cardinality: {prod_layers.L2.cardinality()}")
print(f"  - START cardinality: {prod_layers.START.cardinality()}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Test cascading_disambiguate with real AM data
# ═══════════════════════════════════════════════════════════════

from core.hllset import HLLSet
from core.sparse_hrt_3d import SparseAM3D

# Collect indices from each layer
l0_indices = set()
l1_indices = set()
l2_indices = set()

print("Collecting indices from AM layers...")
n_layers = AM.shape[0]
for layer_idx in range(n_layers):
    layer = AM.layers[layer_idx]
    layer_dict = layer.to_dict()
    
    for row, cols in layer_dict.items():
        for col in cols.keys():
            if layer_idx == 0:
                l0_indices.add(str(row))
                l0_indices.add(str(col))
            elif layer_idx == 1:
                l1_indices.add(str(row))
                l1_indices.add(str(col))
            elif layer_idx == 2:
                l2_indices.add(str(row))
                l2_indices.add(str(col))

# Build HLLSets from collected indices
print("Building HLLSets...")
L0 = HLLSet.from_batch(list(l0_indices), p_bits=10)
L1 = HLLSet.from_batch(list(l1_indices), p_bits=10)
L2 = HLLSet.from_batch(list(l2_indices), p_bits=10)
START_hll = HLLSet.from_batch([], p_bits=10)  # Empty for now

# Create LayerHLLSets
prod_layers = LayerHLLSets(
    L0=L0, L1=L1, L2=L2, START=START_hll, p_bits=10
)

print(f"\n✓ Built LayerHLLSets from AM:")
print(f"  - L0 (1-grams) cardinality: {prod_layers.L0.cardinality():.0f}")
print(f"  - L1 (2-grams) cardinality: {prod_layers.L1.cardinality():.0f}")
print(f"  - L2 (3-grams) cardinality: {prod_layers.L2.cardinality():.0f}")
print(f"  Total unique: L0: {len(l0_indices)}, L1: {len(l1_indices)}, L2: {len(l2_indices)}")

# Get sample indices from across all layers
print("\n--- Testing cascading_disambiguate ---")

# We need to use current_hrt.am which is SparseAM3D
print(f"Using current_hrt.am: {type(current_hrt.am)}")

sample_indices = set()
sample_indices.update([int(x) for x in list(l0_indices)[:3]])
sample_indices.update([int(x) for x in list(l1_indices)[:3]])  
sample_indices.update([int(x) for x in list(l2_indices)[:3]])

print(f"Sample indices for disambiguation: {list(sample_indices)[:9]}")

# Run the production cascading_disambiguate with its full signature
results = cascading_disambiguate(
    query_indices=sample_indices,
    am=current_hrt.am,
    layer_hllsets=prod_layers,
    W=current_W,          # W transition matrix
    lut=shared_lut        # LUT for START lookup
)

print(f"\n✓ Disambiguation results: {len(results)} items")
for r in results[:9]:
    print(f"  Index {r.index}: layer={r.layer}, constituents={r.constituent_indices}")

# Now resolve to tokens
print("\n--- Resolving to tokens ---")
resolved = resolve_disambiguation(results, shared_lut)
for idx, tokens in list(resolved.items())[:5]:
    print(f"  Index {idx}: {tokens}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Summary: Production Cascading Disambiguation Test Results
# ═══════════════════════════════════════════════════════════════

print("=" * 60)
print("PRODUCTION IMPLEMENTATION TEST - SUCCESS ✓")
print("=" * 60)

print(f"\nLayerHLLSets built from AM:")
print(f"  L0 (1-grams): {len(l0_indices)} indices, cardinality ≈{prod_layers.L0.cardinality():.0f}")
print(f"  L1 (2-grams): {len(l1_indices)} indices, cardinality ≈{prod_layers.L1.cardinality():.0f}")
print(f"  L2 (3-grams): {len(l2_indices)} indices, cardinality ≈{prod_layers.L2.cardinality():.0f}")

print(f"\nCascading disambiguation:")
print(f"  Input indices: {len(sample_indices)}")
print(f"  Output results: {len(results)}")

# Count by layer
by_layer = {0: 0, 1: 0, 2: 0}
for r in results:
    by_layer[r.layer] = by_layer.get(r.layer, 0) + 1
print(f"  Results by layer: L0={by_layer[0]}, L1={by_layer[1]}, L2={by_layer[2]}")

print(f"\nToken resolution:")
print(f"  Resolved {len(resolved)} indices to tokens")

print("\nSample resolved tokens:")
for idx, tokens in list(resolved.items())[:3]:
    print(f"  Index {idx}: {' | '.join(str(t) for t in tokens[:5])}{'...' if len(tokens) > 5 else ''}")

## 12. Intersected Context Extension

**Problem**: The union of all related HLLSets pulls in too many indices.

**Solution**: Restrict the extended context to `row_union ∩ col_union`:
- `row_union(query)` = all columns where query appears as row
- `col_union(query)` = all rows where query appears as column
- `intersected = row_union ∩ col_union` → much narrower context
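
A minimal sketch of the idea, using plain dictionaries in place of the adjacency matrix. The function names `row_union` and `col_union` mirror the bullets above; the `Edges` type, `intersected_context`, and the toy data are hypothetical and are not the `manifold_algebra` API.

```python
from typing import Dict, Set

# Hypothetical toy adjacency: edges[row] = set of columns reachable from row.
Edges = Dict[int, Set[int]]


def row_union(edges: Edges, query: Set[int]) -> Set[int]:
    """All columns reached from any query index used as a row."""
    out: Set[int] = set()
    for q in query:
        out |= edges.get(q, set())
    return out


def col_union(edges: Edges, query: Set[int]) -> Set[int]:
    """All rows that point at any query index used as a column."""
    return {row for row, cols in edges.items() if cols & query}


def intersected_context(edges: Edges, query: Set[int]) -> Set[int]:
    """Indices that are both successors and predecessors of the query."""
    return row_union(edges, query) & col_union(edges, query)


toy_edges = {1: {2, 3}, 2: {1, 4}, 3: {1}, 5: {1}}
toy_query = {1}

toy_rows = row_union(toy_edges, toy_query)             # successors of 1
toy_cols = col_union(toy_edges, toy_query)             # predecessors of 1
toy_both = intersected_context(toy_edges, toy_query)   # both directions
print(toy_rows, toy_cols, toy_both)
```

In this toy graph, index 5 points at the query but is never reached from it, so it appears in the union context and drops out of the intersected one.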

In [None]:
# ═══════════════════════════════════════════════════════════════
# Compare: Union Context vs Intersected Context
# ═══════════════════════════════════════════════════════════════

import importlib
import core.manifold_algebra as ma
importlib.reload(ma)

from core.manifold_algebra import (
    extend_with_context,
    extend_with_intersected_context,
    input_to_hllset,
    build_sub_hrt
)
from core.sparse_hrt_3d import BasicHLLSet3D

print("✓ Imported both context extension methods")

# Pick a sample query from the LUT (use ntoken_to_index)
sample_ntokens = list(shared_lut.ntoken_to_index.keys())[:5]
print(f"\nSample n-tokens from LUT: {sample_ntokens}")

# Create BasicHLLSet3D for sample indices
sample_basics = []
for ntoken in sample_ntokens:
    idx = shared_lut.ntoken_to_index.get(ntoken)
    if idx is not None:  # index 0 is a valid index
        # Reconstruct reg, zeros from index
        reg = idx // config.max_zeros
        zeros = (idx % config.max_zeros) + 1
        for n in range(config.max_n):
            sample_basics.append(BasicHLLSet3D(n=n, reg=reg, zeros=zeros))

print(f"Created {len(sample_basics)} BasicHLLSet3D entries")

# Build a minimal sub-HRT
from core.manifold_algebra import Edge3D
sub_hrt = build_sub_hrt([], config)  # Empty sub-HRT

# Compare the two methods
print("\n--- Comparing Context Extension Methods ---\n")

# Method 1: Union (original)
_, union_edges = extend_with_context(sub_hrt, current_W, sample_basics, config)
union_cols = {e.col for e in union_edges}

# Method 2: Intersection (new)
_, intersect_edges = extend_with_intersected_context(sub_hrt, current_W, sample_basics, config)
intersect_cols = {e.col for e in intersect_edges}

print(f"UNION context:")
print(f"  Edges: {len(union_edges)}")
print(f"  Unique columns: {len(union_cols)}")

print(f"\nINTERSECTED context:")
print(f"  Edges: {len(intersect_edges)}")
print(f"  Unique columns: {len(intersect_cols)}")

if len(union_cols) > 0:
    reduction = (1 - len(intersect_cols) / len(union_cols)) * 100
    print(f"\n→ Reduction: {reduction:.1f}% fewer columns")
    print(f"→ Intersection is {len(union_cols) / max(len(intersect_cols), 1):.1f}x more selective")

## 13. Full Cascading Disambiguation (5 Steps)

The algorithm now implements the complete workflow:

1. **Step 1**: Slice by layer (H_0, H_1, H_2)
2. **Step 2**: Find START candidates (H_0 ∩ START_followers)
3. **Step 3**: Follow W transitions (start → 2-gram → 3-gram)
4. **Step 4**: Decompose remaining n-grams to constituents
5. **Step 5**: Process standalone 1-grams, repeat until empty
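
The five steps can be sketched on toy data as follows. This is an illustrative simplification, not the `manifold_algebra` implementation: the string node names, the plain-dict `W`, and the helpers `follow_chain` and `cascade` are all hypothetical, and steps 4 and 5 are collapsed into a single leftover pass.

```python
from typing import Dict, List, Set


def follow_chain(start: str, W: Dict[str, Set[str]], remaining: Set[str]) -> List[str]:
    """Greedily walk W from `start`, visiting only nodes still in `remaining`."""
    chain = [start]
    remaining.discard(start)
    current = start
    while True:
        # Successors that belong to the query and have not been consumed yet.
        candidates = sorted(W.get(current, set()) & remaining)
        if not candidates:
            return chain
        current = candidates[0]  # deterministic tie-break, for the sketch only
        chain.append(current)
        remaining.discard(current)


def cascade(query: Set[str], layers: Dict[int, Set[str]],
            start_followers: Set[str], W: Dict[str, Set[str]]) -> List[List[str]]:
    remaining = set(query)
    h0 = remaining & layers[0]                    # Step 1: slice out the 1-gram layer
    chains: List[List[str]] = []
    for s in sorted(h0 & start_followers):        # Step 2: START candidates
        if s in remaining:
            chains.append(follow_chain(s, W, remaining))  # Step 3: follow W
    for node in sorted(remaining):                # Steps 4-5: whatever is left over
        chains.append([node])
    return chains


toy_layers = {0: {"a", "d", "z"}, 1: {"a|b"}, 2: {"a|b|c"}}
toy_W = {"a": {"a|b"}, "a|b": {"a|b|c"}, "a|b|c": {"d"}}
toy_chains = cascade({"a", "a|b", "a|b|c", "d", "z"}, toy_layers, {"a"}, toy_W)
print(toy_chains)
```

The walk consumes entries as it goes, so anything not reachable from a START candidate (here `"z"`) falls through to the standalone pass.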

In [None]:
# ═══════════════════════════════════════════════════════════════
# Test Full Cascading Disambiguation Algorithm
# ═══════════════════════════════════════════════════════════════

import importlib
import core.manifold_algebra as ma
importlib.reload(ma)

from core.manifold_algebra import (
    cascading_disambiguate,
    resolve_disambiguation,
    LayerHLLSets,
    START, END
)

print("✓ Reloaded manifold_algebra with full cascading algorithm")

# Get sample indices from the AM
sample_indices = set()
for layer_idx in range(AM.shape[0]):
    layer = AM.layers[layer_idx]
    layer_dict = layer.to_dict()
    count = 0
    for row, cols in layer_dict.items():
        for col in cols.keys():
            if count < 5:
                sample_indices.add(row)
                sample_indices.add(col)
                count += 1
            else:
                break
        if count >= 5:
            break

print(f"\nSample indices from AM: {len(sample_indices)} indices")
print(f"  Sample: {list(sample_indices)[:10]}...")

# Run full cascading disambiguation
print("\n--- Running Full Cascading Disambiguation ---\n")

results = cascading_disambiguate(
    query_indices=sample_indices,
    am=current_hrt.am,
    layer_hllsets=prod_layers,
    W=current_W,
    lut=shared_lut
)

print(f"Disambiguation results: {len(results)} items\n")

# Count by layer
by_layer = {0: 0, 1: 0, 2: 0}
for r in results:
    by_layer[r.layer] = by_layer.get(r.layer, 0) + 1

print(f"Results by layer:")
print(f"  L0 (1-grams): {by_layer[0]}")
print(f"  L1 (2-grams): {by_layer[1]}")
print(f"  L2 (3-grams): {by_layer[2]}")

# Show sample results
print("\nSample results:")
for r in results[:5]:
    print(f"  Index {r.index}: layer={r.layer}, constituents={r.constituent_indices}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Resolve disambiguation results to actual tokens
# ═══════════════════════════════════════════════════════════════

resolved = resolve_disambiguation(results, shared_lut)

print("=" * 60)
print("TOKEN RESOLUTION")
print("=" * 60)

print(f"\nResolved {len(resolved)} indices to tokens\n")

# Show sample resolved tokens
print("Sample resolved tokens:")
for idx, tokens in list(resolved.items())[:10]:
    # Find the result for this index
    layer_info = next((r.layer for r in results if r.index == idx), "?")
    token_preview = " | ".join(str(t) for t in tokens[:3])
    if len(tokens) > 3:
        token_preview += f"... (+{len(tokens)-3} more)"
    print(f"  [{idx}] L{layer_info}: {token_preview}")

# Statistics
total_tokens = sum(len(t) for t in resolved.values())
print(f"\nStatistics:")
print(f"  Total resolved indices: {len(resolved)}")
print(f"  Total tokens recovered: {total_tokens}")
print(f"  Avg tokens per index: {total_tokens / max(len(resolved), 1):.2f}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Demonstrate START-based sequence reconstruction
# ═══════════════════════════════════════════════════════════════

print("=" * 60)
print("START-BASED SEQUENCE RECONSTRUCTION")
print("=" * 60)

# Find START index
start_idx = shared_lut.get_ntoken_index(START)
print(f"\nSTART token index: {start_idx}")

# Get all START followers from W
if start_idx is not None and 0 in current_W and start_idx in current_W[0]:
    start_followers = current_W[0][start_idx]
    print(f"Tokens following START: {len(start_followers)}")
    
    # Show some followers with their tokens
    print("\nSample START followers:")
    for follower_idx, weight in list(start_followers.items())[:10]:
        # Look up the token
        ntokens = shared_lut.index_to_ntokens.get(follower_idx, set())
        if ntokens:
            # Get 1-gram tokens only
            tokens = [nt for layer, nt in ntokens if layer == 0]
            if tokens:
                print(f"  [{follower_idx}] w={weight:.2f}: {tokens[0]}")
else:
    print("No START followers found in W")

# Check how many START candidates were in our query
if start_idx is not None and 0 in current_W and start_idx in current_W[0]:
    start_followers_set = set(current_W[0][start_idx].keys())
    query_start_candidates = sample_indices & start_followers_set
    print(f"\nSTART candidates in query: {len(query_start_candidates)}")
    
    # Show the sequences that could be built
    print("\nPotential sequence starts in query:")
    for idx in list(query_start_candidates)[:5]:
        ntokens = shared_lut.index_to_ntokens.get(idx, set())
        tokens = [nt for layer, nt in ntokens if layer == 0]
        if tokens:
            print(f"  START → {tokens[0]}")

## 14. Proper Query Processing → Then Disambiguation

**The Issue**: We grabbed random indices from AM without ingesting them as a query.

**The Fix**: Ingest the query with `unified_process` first, which:
1. Creates an HLLSet with START/END markers
2. Builds n-gram edges (START → token1 → token2...)
3. Merges them into the AM

Only then do the query indices have START connections to follow.
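
To see why ingestion matters, here is a small sketch of the edge list a marked-up sequence produces. The marker spellings and the helper `sequence_edges` are assumptions made for illustration; `unified_process` itself may build its edges differently.

```python
from typing import List, Set, Tuple

START_MARK = "<START>"  # assumed marker spelling
END_MARK = "<END>"      # assumed marker spelling


def sequence_edges(tokens: List[str]) -> List[Tuple[str, str]]:
    """Consecutive (row, col) pairs for START -> t1 -> ... -> tn -> END."""
    padded = [START_MARK, *tokens, END_MARK]
    return list(zip(padded, padded[1:]))


toy_seq_edges = sequence_edges(["def", "hello_world"])

# START followers are the columns of every edge whose row is the START marker.
toy_start_followers: Set[str] = {col for row, col in toy_seq_edges if row == START_MARK}
print(toy_seq_edges)
print(toy_start_followers)
```

Indices picked at random from the AM never pass through this step, so nothing guarantees that any of them sits in the START row.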

In [None]:
# ═══════════════════════════════════════════════════════════════
# Proper Query Processing: Ingest THEN Disambiguate
# ═══════════════════════════════════════════════════════════════

import importlib
import core.manifold_algebra as ma
importlib.reload(ma)

from core.manifold_algebra import (
    unified_process,
    cascading_disambiguate,
    resolve_disambiguation,
    build_w_from_am,
    START, END
)

print("=" * 60)
print("PROPER QUERY FLOW: INGEST → EXTEND → DISAMBIGUATE")
print("=" * 60)

# Step 1: Process a query through unified_process
test_query = "def hello_world(): print('hello')"
print(f"\nQuery: '{test_query}'")

result = unified_process(
    input_data=test_query,
    current_hrt=current_hrt,
    current_W=current_W,
    config=config,
    lut=shared_lut,
    max_n=3
)

print(f"\nProcessing result:")
print(f"  Input HLLSet cardinality: {result.input_hllset.cardinality():.0f}")
print(f"  Input basics: {len(result.input_basics)}")
print(f"  Context edges: {len(result.context_edges)}")

# Step 2: Get the indices from the processed query
query_indices = set()
for basic in result.input_basics:
    idx = basic.to_index(config)
    query_indices.add(idx)

print(f"\nQuery indices: {len(query_indices)}")
print(f"  Sample: {list(query_indices)[:5]}...")

# Step 3: Check for START connections in the sub_hrt
sub_W = build_w_from_am(result.sub_hrt.am, config)
start_idx = shared_lut.get_ntoken_index(START)
print(f"\nSTART index: {start_idx}")

if start_idx is not None and 0 in sub_W and start_idx in sub_W[0]:
    start_followers_in_query = set(sub_W[0][start_idx].keys())
    print(f"START followers in query sub-HRT: {len(start_followers_in_query)}")
    
    # Show them
    for f_idx in list(start_followers_in_query)[:5]:
        ntokens = shared_lut.index_to_ntokens.get(f_idx, set())
        tokens = [nt for layer, nt in ntokens if layer == 0]
        if tokens:
            print(f"  START → {tokens[0]}")
else:
    print("No START followers in sub-HRT (START may not be in sub_W)")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Step 4: Run Cascading Disambiguation on MERGED result
# ═══════════════════════════════════════════════════════════════

# Use the MERGED HRT (which includes both current + query)
merged_W = build_w_from_am(result.merged_hrt.am, config)

print("=" * 60)
print("CASCADING DISAMBIGUATION ON MERGED HRT")
print("=" * 60)

# Check START followers in merged W
if start_idx is not None and 0 in merged_W and start_idx in merged_W[0]:
    start_followers_merged = set(merged_W[0][start_idx].keys())
    print(f"\nSTART followers in MERGED HRT: {len(start_followers_merged)}")
    
    # How many are in our query?
    query_start_candidates = query_indices & start_followers_merged
    print(f"Query indices that follow START: {len(query_start_candidates)}")
    
    for idx in query_start_candidates:
        ntokens = shared_lut.index_to_ntokens.get(idx, set())
        tokens = [nt for layer, nt in ntokens if layer == 0]
        if tokens:
            print(f"  START → {tokens[0]}")

# Run full disambiguation
print("\n--- Running Cascading Disambiguation ---")

results = cascading_disambiguate(
    query_indices=query_indices,
    am=result.merged_hrt.am,
    layer_hllsets=prod_layers,
    W=merged_W,
    lut=shared_lut
)

print(f"\nResults: {len(results)} items")

# Count by layer
by_layer = {0: 0, 1: 0, 2: 0}
for r in results:
    by_layer[r.layer] = by_layer.get(r.layer, 0) + 1
print(f"  L0 (1-grams): {by_layer[0]}")
print(f"  L1 (2-grams): {by_layer[1]}")
print(f"  L2 (3-grams): {by_layer[2]}")

# Resolve to tokens
resolved = resolve_disambiguation(results, shared_lut)

print("\n--- Resolved Tokens ---")
for idx, tokens in resolved.items():
    layer = next((r.layer for r in results if r.index == idx), "?")
    print(f"  L{layer} [{idx}]: {tokens}")

In [None]:
# ═══════════════════════════════════════════════════════════════
# Summary: Query → Tokens Recovery
# ═══════════════════════════════════════════════════════════════

print("=" * 60)
print("SUMMARY: QUERY RECONSTRUCTION")
print("=" * 60)

print(f"\nOriginal query: '{test_query}'")

# Collect all recovered tokens
all_recovered = []
for idx, tokens in resolved.items():
    for t in tokens:
        if isinstance(t, tuple):
            all_recovered.extend(t)
        elif isinstance(t, str) and not t.startswith('<'):
            all_recovered.append(t)

# Remove START/END markers
cleaned = [t for t in all_recovered if t not in ('<START>', '<END>')]
print(f"\nRecovered tokens: {cleaned}")

# Check match
original_tokens = test_query.split()
print(f"Original tokens: {original_tokens}")

# Calculate recovery rate
recovered_set = set(cleaned)
original_set = set(original_tokens)
overlap = recovered_set & original_set
print(f"\nRecovery rate: {len(overlap)}/{len(original_set)} = {100*len(overlap)/max(len(original_set),1):.0f}%")

### Test Summary

**What we built**:

1. `SimHLLSet` - Simulated HLLSet with exact (reg,zeros) tracking
2. `NGramRegistry` - Hash-based n-gram storage with layer HLLSets (L0, L1, L2)
3. `SimW` - Simulated transition matrix
4. `build_ngrams_from_sequence()` - Simplified chunked n-gram builder
5. `cascading_disambiguate()` - The disambiguation algorithm

**Key insight**: Layer HLLSets (L0, L1, L2) are just 3 additional HLLSets that can be:
- Stored as part of 3D AM metadata
- Built once during ingestion
- Used for O(1) layer classification via intersection
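
The layer classification described above can be sketched with plain Python sets standing in for the three layer HLLSets. This is an illustration only: the `(reg, zeros)` values and the `classify_layer` helper are hypothetical, and a real HLLSet would answer membership approximately via intersection rather than exactly.

```python
from typing import Dict, Optional, Set, Tuple

RegZeros = Tuple[int, int]  # (register index, leading-zero count)

# Stand-ins for the L0/L1/L2 layer HLLSets, built once during ingestion.
# The pairs below are made-up illustration values.
layers: Dict[int, Set[RegZeros]] = {
    0: {(3, 1), (7, 2)},   # 1-grams
    1: {(12, 4)},          # 2-grams
    2: {(40, 0)},          # 3-grams
}


def classify_layer(pair: RegZeros) -> Optional[int]:
    """Return the first layer whose set contains `pair`, or None if unseen."""
    for layer, members in layers.items():
        if pair in members:  # constant-time membership test per layer
            return layer
    return None


classified = {pair: classify_layer(pair) for pair in [(7, 2), (12, 4), (99, 9)]}
print(classified)
```

Because the number of layers is fixed at three, the loop performs a constant number of membership checks regardless of how many n-grams were ingested.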

**Integration path**:

```python
# In 3D AM or HRT:
class SparseAM3D:
    ...
    L0: HLLSet  # All layer 0 (reg,zeros)
    L1: HLLSet  # All layer 1 (reg,zeros)
    L2: HLLSet  # All layer 2 (reg,zeros)
    START_HLLSet: HLLSet  # START followers

# During build_sub_hrt:
def build_sub_hrt(tokens, ...):
    # After adding edges, update layer HLLSets
    am.L0.add(h1)       # 1-grams
    am.L1.add(h2)       # 2-grams
    am.L2.add(h3)       # 3-grams
    if is_start:
        am.START_HLLSet.add(h1)
```

**Next steps**:
1. Integrate hash-based n-gram representation into `manifold_algebra.py`
2. Add layer HLLSets to SparseAM3D
3. Update `build_sub_hrt` to populate layer HLLSets
4. Use `cascading_disambiguate` for query resolution

## 11. Interactive Query (with Learning)

In [None]:
def ask(prompt: str, top_k: int = 10, learn: bool = True) -> str:
    """
    Interactive query with feedback loop.
    
    Full sense-process-act-feedback cycle:
    1. Query → HLLSet → HRT → Commit
    2. Find related concepts
    3. Generate response
    4. Response → HLLSet → HRT → Commit (FEEDBACK!)
    
    The manifold learns from BOTH the question AND its own answer.
    
    Args:
        prompt: User query text
        top_k: Number of results to return
        learn: If True, ingest both query AND response (default: True)
    
    Returns:
        Formatted response string from actuator
    """
    global current_hrt, current_W
    
    # SENSE: Process query through prompt perceptron
    new_hrt, new_W, commit, result = prompt_perceptron.process_prompt(
        prompt,
        current_hrt,
        current_W,
        store
    )
    
    if not commit:
        return "No results (empty query)"
    
    if learn:
        # Update state after query ingestion
        current_hrt = new_hrt
        current_W = new_W
    
    # PROCESS: Find related concepts
    query_indices = set()
    if result:
        for edge in result.context_edges:
            query_indices.add(edge.row)
            query_indices.add(edge.col)
    
    # Find reachable (connected concepts)
    AM = Sparse3DMatrix.from_am(current_hrt.am, config)
    layer0 = project_layer(AM, 0)
    reachable = reachable_from(layer0, query_indices, hops=1)
    
    # Convert layer to dict for scoring
    layer0_dict = layer0.to_dict()
    
    # Score by connectivity
    scores = {}
    for idx in reachable:
        if idx in layer0_dict:
            scores[idx] = sum(layer0_dict[idx].values())
    
    # Get top-k, resolve to n-tokens
    top = sorted(scores.items(), key=lambda x: -x[1])[:top_k]
    query_results = []
    for idx, score in top:
        ntoken = shared_lut.index_to_ntokens.get(idx, f"<idx:{idx}>")
        query_results.append((ntoken, score))
    
    # ACT + FEEDBACK: Generate response AND ingest it back
    response_text, final_hrt, final_W = response_actuator.act(
        commit,
        result,
        query_results=query_results,
        hrt=current_hrt,
        W=current_W,
        store=store,
        lut=shared_lut,
        config=config,
        ingest_response=learn,  # Response also gets ingested!
    )
    
    if learn:
        # Update with response-ingested state
        current_hrt = final_hrt
        current_W = final_W
    
    return response_text


def show(response: str) -> None:
    """Print actuator response."""
    print(response)


print("Interactive query with FEEDBACK LOOP:")
print()
print("  response = ask('your query')")
print()
print("What happens:")
print("  1. Query → HLLSet → Commit")
print("  2. Find related concepts")
print("  3. Generate response")
print("  4. Response → HLLSet → Commit  ← FEEDBACK!")
print()
print("The manifold learns from BOTH questions AND its own answers.")

## 13. Summary

### Double Loop: Action + Reflection

```text
                    LOOP 1: ACTION
         ┌──────────────────────────────────────┐
         │                                      │
         │  Perceptron ──► Pipeline ──► Actuator ──► OUTPUT
         │  (sense)        (process)    (act)   │
         │                                      │
         └──────────────────────────────────────┘
                                            │
                                          (own output)
                                            │
                    LOOP 2: REFLECTION      ▼
         ┌───────────────────────────────────────┐
         │                                       │
         │  Response   ──►   HLLSet   ──►  Commit ──►  MEMORY
         │  (observe)    (encode)   (record)     │
         │                                       │
         └───────────────────────────────────────┘
```

### Cybernetic Self-Reflection

**Loop 1** is standard sense-process-act.

**Loop 2** is what makes this **cybernetic**:
- The system **observes** its own output
- **Encodes** it as HLLSet (same pipeline as inputs!)
- **Commits** to memory (manifold)

This is **self-reflection** in the cybernetic sense:
> "The system records its own behavior into memory,
>  so future behavior is shaped by past behavior."

### Why Two Commits Per Query

```python
ask("What is HLLSet?")
```

Creates:
1. `prompt_1` → Query committed (system heard you)
2. `response_1` → Response committed (system remembers what it said)

Both shape the manifold. Both influence future responses.
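
A minimal sketch of the two-commits-per-query pattern, assuming a toy in-memory store. `ToyCommit`, `ToyCommitStore` and `toy_ask` are hypothetical stand-ins, not the module's `Commit`, `CommitStore` or `ask`; only the `prompt_N` / `response_N` naming and the `p_prompt` / `a_response` labels are taken from this notebook's output.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCommit:
    commit_id: str    # short content hash of the committed text
    perceptron: str   # component that produced the commit
    source: str       # e.g. "prompt_1" or "response_1"
    timestamp: float


@dataclass
class ToyCommitStore:
    commits: List[ToyCommit] = field(default_factory=list)

    def commit(self, perceptron: str, source: str, payload: str) -> ToyCommit:
        """Append one commit whose id is derived from the payload text."""
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
        entry = ToyCommit(digest, perceptron, source, time.time())
        self.commits.append(entry)
        return entry


def toy_ask(store: ToyCommitStore, prompt: str, turn: int) -> str:
    """Commit the prompt, build a placeholder response, then commit that too."""
    store.commit("p_prompt", f"prompt_{turn}", prompt)
    response = f"echo: {prompt}"  # placeholder for the actuator output
    store.commit("a_response", f"response_{turn}", response)
    return response


store = ToyCommitStore()
toy_ask(store, "What is HLLSet?", turn=1)
print([c.source for c in store.commits])
```

Each call leaves exactly two entries in the store, one for what was heard and one for what was said.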

### Memory vs Log

| Traditional Log | Cybernetic Memory |
|-----------------|-------------------|
| External record | Internal state |
| For humans | For system |
| Read-only | Shapes behavior |
| Passive | Active |

The reflection loop isn't just logging - it's the system building a model of its own behavior.

### Emergent Properties

Over time, the double loop creates:

1. **Consistency**: Responses align with past responses
2. **Learning**: Repeated patterns reinforce
3. **Context**: System "remembers" conversation history
4. **Coherence**: Self-model becomes more refined

### The Insight

```text
Traditional AI:  Input → Process → Output
                 (no memory of own outputs)

Fractal Manifold: Input → Process → Output
                              ↓
                         Reflection
                              ↓
                           Memory
                              ↓
                    (shapes next process)
```

**Every output becomes input to future processing.**

This is the cybernetic principle: feedback creates adaptation.
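
To make the feedback principle concrete, here is a toy sketch in which memory is a bag of bigram edge counts. `ToyMemory` and `toy_step` are hypothetical and far simpler than the HLLSet/HRT pipeline; the point is only that re-ingesting the response changes the weights the next response is drawn from.

```python
from collections import Counter


class ToyMemory:
    """Bigram edge counts standing in for the manifold."""

    def __init__(self) -> None:
        self.edges: Counter = Counter()

    def ingest(self, text: str) -> None:
        """Add one count for every adjacent token pair in `text`."""
        tokens = text.split()
        for a, b in zip(tokens, tokens[1:]):
            self.edges[(a, b)] += 1

    def respond(self, prompt: str) -> str:
        """Follow the strongest edge leaving the prompt's last token."""
        tokens = prompt.split()
        if not tokens:
            return ""
        last = tokens[-1]
        candidates = {p: w for p, w in self.edges.items() if p[0] == last}
        if not candidates:
            return last
        best = max(candidates, key=candidates.get)
        return f"{best[0]} {best[1]}"


def toy_step(memory: ToyMemory, prompt: str, reflect: bool = True) -> str:
    """Sense, act, and optionally feed the response back into memory."""
    memory.ingest(prompt)
    response = memory.respond(prompt)
    if reflect:
        memory.ingest(response)  # reflection: own output becomes input
    return response


with_loop, without_loop = ToyMemory(), ToyMemory()
for m in (with_loop, without_loop):
    m.ingest("hll set union")

toy_step(with_loop, "what is hll", reflect=True)
toy_step(without_loop, "what is hll", reflect=False)
print(with_loop.edges[("hll", "set")], without_loop.edges[("hll", "set")])
```

With reflection enabled, the edge the system used in its own answer is reinforced; without it, the weight stays where ingestion left it.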

## 14. Hardware Path: FPGA Cybernetic Brain

### Current AI vs Cybernetic AI

| Property | Current AI (LLMs) | Cybernetic AI (Fractal Manifold) |
|----------|-------------------|----------------------------------|
| Memory | Stateless per request | Continuous self-reflection |
| Learning | Offline training only | Online, every interaction |
| Own outputs | Forgotten immediately | Encoded into manifold |
| Hardware | GPUs (matrix multiply) | FPGA (bit operations) |
| Latency | Milliseconds | Sub-microsecond (projected) |
| Power | 100-1000W | 1-10W |

### Why FPGA?

Every core operation is hardware-friendly:

```text
┌─────────────────────────────────────────────────────────────────┐
│                    FPGA IMPLEMENTATION                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  OPERATION              FPGA PRIMITIVE         CYCLES           │
│  ─────────              ──────────────         ──────           │
│                                                                 │
│  Hash(token)            Combinational logic    1 cycle          │
│  HLLSet.add()           Leading zero count     1 cycle          │
│  HLLSet.union()         Register OR            1 cycle          │
│  HLLSet.intersect()     Register AND           1 cycle          │
│  Sparse lookup          BRAM read              1-2 cycles       │
│  Edge creation          Parallel pipelines     1 cycle          │
│  Merge                  Tree reduction         log(N) cycles    │
│                                                                 │
│  TOTAL PIPELINE LATENCY: ~10-20 cycles @ 200MHz = 50-100ns      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
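
The primitives and the latency figure in the table can be checked with a small software model. This is a sketch under assumptions: registers are modeled as Python integers used as bitmaps, which is what makes union an OR and intersection an AND (classic HyperLogLog instead merges with a per-register maximum), and all helper names are hypothetical.

```python
def hll_union(a: int, b: int) -> int:
    """Union of two bitmap registers: bitwise OR."""
    return a | b


def hll_intersect(a: int, b: int) -> int:
    """Intersection of two bitmap registers: bitwise AND."""
    return a & b


def leading_zeros(value: int, width: int = 32) -> int:
    """Count leading zero bits of `value` in a fixed `width`-bit word."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()


def pipeline_latency_ns(cycles: int, clock_mhz: float) -> float:
    """Latency in nanoseconds: cycles times the clock period (1000 / MHz)."""
    return cycles * 1000.0 / clock_mhz


print(bin(hll_union(0b0101, 0b0011)), bin(hll_intersect(0b0101, 0b0011)))
print(leading_zeros(1), leading_zeros(0))
print(pipeline_latency_ns(10, 200), pipeline_latency_ns(20, 200))
```

At 200MHz one cycle lasts 5ns, so 10 to 20 cycles give the 50-100ns range quoted in the table.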

### Robotic Brain Architecture

```text
                     ┌─────────────────────────────────┐
                     │         FPGA CHIP               │
                     │                                 │
   SENSORS           │  ┌─────────────────────────┐    │           ACTUATORS
   ───────           │  │    PERCEPTRON ARRAY     │    │           ─────────
                     │  │                         │    │
   Camera ──────────►│  │  p_vision   p_audio     │    │◄───────── Servo 1
   Microphone ──────►│  │  p_touch    p_imu       │    │◄───────── Servo 2
   Touch ───────────►│  │  p_lidar    p_prompt    │    │◄───────── Motor
   IMU ─────────────►│  │                         │    │◄───────── Speaker
   LiDAR ───────────►│  └──────────┬──────────────┘    │
                     │             │                   │
                     │             ▼                   │
                     │  ┌─────────────────────────┐    │
                     │  │   UNIFIED PIPELINE      │    │
                     │  │                         │    │
                     │  │  Hash → HLLSet → Merge  │    │
                     │  │  (fully pipelined)      │    │
                     │  │                         │    │
                     │  └──────────┬──────────────┘    │
                     │             │                   │
                     │             ▼                   │
                     │  ┌─────────────────────────┐    │
                     │  │   ACTUATOR ARRAY        │    │
                     │  │                         │    │
                     │  │  a_motor   a_voice      │    │
                     │  │  a_servo   a_display    │    │
                     │  │                         │    │
                     │  └──────────┬──────────────┘    │
                     │             │                   │
                     │             │ REFLECTION LOOP   │
                     │             ▼                   │
                     │  ┌─────────────────────────┐    │
                     │  │   MANIFOLD MEMORY       │    │
                     │  │                         │    │
                     │  │  HBM / DDR / BRAM       │    │
                     │  │  (sparse HRT storage)   │    │
                     │  │                         │    │
                     │  └─────────────────────────┘    │
                     │                                 │
                     └─────────────────────────────────┘
```

### Key FPGA Advantages

1. **Real-time**: Full loop in an estimated ~100ns (vs ~100ms for software)
2. **Low power**: An estimated 5-10W total (vs 100-1000W for GPU)
3. **Deterministic**: Fixed latency, no GC pauses, no OS jitter
4. **Parallel**: All perceptrons process simultaneously
5. **Embedded**: Runs on battery, fits in robot

### Memory Hierarchy

```text
BRAM (on-chip)     →  Hot n-tokens, LUT cache         ~2MB
HBM (on-package)   →  Active HRT layers               ~4-16GB  
DDR (off-chip)     →  Full manifold, commit history   ~64GB+
```

### Comparison to Neural Networks

| Aspect | Neural Network | Fractal Manifold |
|--------|----------------|------------------|
| Core op | MAC (multiply-add) | Bit ops (AND/OR/XOR) |
| Precision | FP16/FP32/INT8 | 1-bit (HLL registers) |
| Memory | Dense weights | Sparse edges |
| Training | Backprop (offline) | Merge (online) |
| Inference | Forward pass | Graph traversal |
| Interpretable | No (black box) | Yes (edges = relationships) |

### The Vision

A robot with a Fractal Manifold FPGA brain:

- **Sees** (camera perceptron) → encodes scene as HLLSet
- **Hears** (audio perceptron) → encodes speech as HLLSet  
- **Responds** (motor actuator) → action selected from manifold
- **Reflects** (feedback loop) → own actions encoded into memory
- **Learns** continuously from every interaction
- **Never forgets** (CRDT merges are permanent)
- **Runs on battery** (an estimated 5-10W total power)

This is **Cybernetic AI** - not pattern matching, but genuine sense-process-act-reflect in hardware.
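
The "never forgets" property follows from the merge being grow-only. The sketch below models each replica's state as a frozen set of edges and merges by set union, which is the standard grow-only set (G-Set) CRDT; the edge values and the `merge` / `merge_all` helpers are hypothetical and not the module's actual merge.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[int, int]     # (row, col) index pair
State = FrozenSet[Edge]    # one replica's edge set


def merge(a: State, b: State) -> State:
    """Grow-only merge: the result contains every edge from both inputs."""
    return a | b


def merge_all(states: Iterable[State]) -> State:
    """Fold any number of replica states into one, in any order."""
    return reduce(merge, states, frozenset())


vision = frozenset({(1, 2), (2, 3)})
audio = frozenset({(2, 3), (4, 5)})
motor = frozenset({(5, 6)})

merged = merge_all([vision, audio, motor])
print(sorted(merged))
```

Union is commutative, associative and idempotent, so perceptrons can merge in any order or repeatedly and reach the same state, and every input remains a subset of the result.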

## 15. Using Consolidated Manifold Algebra Module

All processing is now consolidated in `manifold_algebra.py`:

- **CommitStore**: Track processing history with rollback
- **Perceptron/PromptPerceptron**: Sense phase (input → HLLSet)
- **ResponseActuator**: Act phase with feedback loop
- **QueryContext + ask()**: Interactive querying

This makes the notebook code much simpler!

In [10]:
# ═══════════════════════════════════════════════════════════════
# Using Consolidated Module - Clean Interface
# ═══════════════════════════════════════════════════════════════

import importlib
import core.manifold_algebra as ma
importlib.reload(ma)

from core.manifold_algebra import (
    # Core structures
    Sparse3DConfig,
    LookupTable,
    LayerHLLSets,
    START, END,
    
    # Processing pipeline
    unified_process,
    build_w_from_am,
    cascading_disambiguate,
    resolve_disambiguation,
    
    # Consolidated classes
    CommitStore,
    Commit,
    Perceptron,
    PromptPerceptron,
    ResponseActuator,
    QueryContext,
    ask,
    create_query_context,
)

# Sparse3DConfig is already imported above via core.manifold_algebra.

print("=" * 60)
print("MANIFOLD ALGEBRA - CONSOLIDATED MODULE")
print("=" * 60)

print("\nNew classes available:")
print("  • CommitStore     - Track processing history")
print("  • Commit          - Single timestamped commit")
print("  • Perceptron      - Base class for sensing")
print("  • PromptPerceptron - Query processing")
print("  • ResponseActuator - Response with feedback")
print("  • QueryContext    - Holds query state")
print("  • ask()           - Interactive query function")
print("  • create_query_context() - Initialize fresh context")

MANIFOLD ALGEBRA - CONSOLIDATED MODULE

New classes available:
  • CommitStore     - Track processing history
  • Commit          - Single timestamped commit
  • Perceptron      - Base class for sensing
  • PromptPerceptron - Query processing
  • ResponseActuator - Response with feedback
  • QueryContext    - Holds query state
  • ask()           - Interactive query function
  • create_query_context() - Initialize fresh context


In [11]:
# ═══════════════════════════════════════════════════════════════
# Create Fresh QueryContext - Clean State
# ═══════════════════════════════════════════════════════════════

from core.sparse_hrt_3d import Sparse3DConfig

# Create config if not exists (standalone usage)
try:
    _ = config
except NameError:
    config = Sparse3DConfig(p_bits=10, h_bits=32, max_n=3)
    print("Created new config")

# Create fresh context
ctx = create_query_context(config)

print("✓ Created fresh QueryContext")
print(f"  HRT edges: {ctx.hrt.nnz}")
print(f"  LUT entries: {len(ctx.lut.ntoken_to_index)}")
print(f"  Commits: {len(ctx.store)}")
print()

# Also create one with existing LUT if available
try:
    ctx_with_existing_lut = create_query_context(config, lut=shared_lut)
    print("✓ Created QueryContext with existing LUT")
    print(f"  LUT entries: {len(ctx_with_existing_lut.lut.ntoken_to_index)}")
except NameError:
    # No existing LUT - use fresh context
    ctx_with_existing_lut = ctx
    print("(Using fresh context - no existing LUT found)")

✓ Created fresh QueryContext
  HRT edges: 0
  LUT entries: 2
  Commits: 0

✓ Created QueryContext with existing LUT
  LUT entries: 148974


In [12]:
# ═══════════════════════════════════════════════════════════════
# Interactive Querying with ask() - Simplified Interface
# ═══════════════════════════════════════════════════════════════

# Use the context we have (either fresh or with existing LUT)
query_ctx = ctx_with_existing_lut

# Populate with existing state if available
try:
    query_ctx.hrt = current_hrt
    query_ctx.W = current_W
    query_ctx.layer_hllsets = prod_layers
    print("Using existing HRT/W state from notebook")
except NameError:
    print("Using fresh state (no prior processing)")

print("=" * 60)
print("INTERACTIVE QUERY via ask()")
print("=" * 60)

# Single-line query!
response, disamb_results = ask(
    "def hello_world(): print('hello')",
    query_ctx,
    top_k=5,
    learn=True  # Feedback loop enabled
)

print("\nResponse:")
print(response)
print()
print(f"Disambiguation results: {len(disamb_results)} items")
print(f"Commits after query: {len(query_ctx.store)}")

# Resolve tokens from disambiguation
if disamb_results:
    resolved = resolve_disambiguation(disamb_results, query_ctx.lut)
    print("\nResolved tokens (sample):")
    for idx, tokens in list(resolved.items())[:5]:
        print(f"  [{idx}]: {tokens}")

Using fresh state (no prior processing)
INTERACTIVE QUERY via ask()

Response:
Query: prompt_1
Commit: 277db1e6
Results (5 found):
   1. [ 97.0] ('recompiling",', 'name);', 'return')
   2. [ 87.0] ('argnames,', 'kwds2,', 'values,')
   3. [ 79.0] ('1;', 'if', '(likely((padding_length')
   4. [ 69.0] ('py_none', '#')
   5. [ 65.0] ('class', 'hllcore:')

Disambiguation results: 5 items
Commits after query: 2

Resolved tokens (sample):
  [2466]: ['{(2, (\'suboffset\', \'in\', \'suboffsets[:ndim]:\')), (2, (\'return\', \'__pyx_r;\', \'}\')), (1, (\'/*\', \'"../../../../.cache/uv/builds-v0/.tmpvvcwqg/lib/python3.13/site-packages/numpy/__init__.cython-30.pxd":790\')), (1, (\'xor\', \'(a\')), (1, (\'"bird"],\', \'["red",\')), (1, (\'whose\', \'morphisms\')), (2, (\'}\', \'/*\', \'"view.memoryview":896\')), (1, (\'__pyx_v_itemp\', \'=\')), (2, (\'values,\', \'kwd_pos_args,\', \'__pyx_kwds_len,\')), (1, (\'new\', \'is\')), (1, (\'for\', \'(i=code_cache->count;\')), (1, (\'py_xincref(module);\', 

In [13]:
# ═══════════════════════════════════════════════════════════════
# Multiple Queries - Manifold Learns from Each
# ═══════════════════════════════════════════════════════════════

test_queries = [
    "import numpy as np",
    "class FractalManifold:",
    "def process_file(path):",
    "HLLSet.union(other)",
    "cascading_disambiguate(query_indices)",
]

print("=" * 60)
print("SEQUENTIAL QUERIES - MANIFOLD LEARNS")
print("=" * 60)

for q in test_queries:
    edges_before = query_ctx.hrt.nnz
    
    response, results = ask(q, query_ctx, top_k=3, learn=True)
    
    edges_after = query_ctx.hrt.nnz
    
    print(f"\nQuery: '{q}'")
    print(f"  Edges: {edges_before} → {edges_after} (+{edges_after - edges_before})")
    print(f"  Results: {len(results)} disambiguation items")

print("\n" + "=" * 60)
print(f"COMMIT HISTORY ({len(query_ctx.store)} commits)")
print("=" * 60)
from datetime import datetime
for commit in query_ctx.store.log(limit=8):
    ts = datetime.fromtimestamp(commit.timestamp).strftime("%H:%M:%S")
    source_preview = commit.source[:35] + "..." if len(commit.source) > 35 else commit.source
    print(f"  [{ts}] {commit.perceptron}: {source_preview}")

SEQUENTIAL QUERIES - MANIFOLD LEARNS

Query: 'import numpy as np'
  Edges: 194389 → 194428 (+39)
  Results: 3 disambiguation items

Query: 'class FractalManifold:'
  Edges: 194428 → 194473 (+45)
  Results: 3 disambiguation items

Query: 'def process_file(path):'
  Edges: 194473 → 194490 (+17)
  Results: 3 disambiguation items

Query: 'HLLSet.union(other)'
  Edges: 194490 → 194504 (+14)
  Results: 3 disambiguation items

Query: 'cascading_disambiguate(query_indices)'
  Edges: 194504 → 194517 (+13)
  Results: 2 disambiguation items

COMMIT HISTORY (12 commits)
  [15:34:19] p_prompt: prompt_3
  [15:34:23] a_response: response_3
  [15:34:26] p_prompt: prompt_4
  [15:34:30] a_response: response_4
  [15:34:33] p_prompt: prompt_5
  [15:34:37] a_response: response_5
  [15:34:39] p_prompt: prompt_6
  [15:34:43] a_response: response_6


## Summary: Consolidated Manifold Algebra

All processing logic is now in `core/manifold_algebra.py`:

```python
# Before (notebook code):
new_hrt, new_W, commit, result = prompt_perceptron.process_prompt(...)
response, final_hrt, final_W = response_actuator.act(...)
# ... 50+ lines of wiring code

# After (using module):
from core.manifold_algebra import ask, create_query_context

ctx = create_query_context(config)
response, results = ask("your query", ctx)
```

### Key Classes and Functions Added to `manifold_algebra.py`:

| Name | Purpose |
|-------|---------|
| `CommitStore` | Track processing history with rollback |
| `Commit` | Single timestamped processing step |
| `Perceptron` | Base class for sensing (file/prompt → HLLSet) |
| `PromptPerceptron` | Process user queries |
| `Actuator` | Base class for actions |
| `ResponseActuator` | Generate response + feedback loop |
| `QueryContext` | Mutable state for queries |
| `ask()` | One-line interactive query |
| `create_query_context()` | Initialize fresh context |

### Benefits:

1. **Cleaner notebook code** - Just import and use
2. **Reusable across notebooks** - Same module works everywhere
3. **Testable** - Module can be unit tested
4. **Documented** - Docstrings in one place
5. **Evolvable** - Improve tokenization without changing notebooks