# Manifold Algebra + Manifold OS Integration

## Goals

1. **Sync mf_algebra and mf_os**: Create a unified architecture where:
   - `ManifoldOS` is the orchestration layer (git-like operations, persistence)
   - `mf_algebra` provides the algebraic operations on HRT structures

2. **DuckDB Backend**: Use DuckDB as the persistent store for:
   - HRT state (AM, Lattice)
   - LUT (Lookup Table)
   - Commits (versioning)

3. **Simple .txt Input**: Accept plain-text (`.txt`) documents as input

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          ManifoldOS (mf_os)                             │
│   - Git-like operations: commit, push, pull, rollback                   │
│   - Orchestrates processing pipeline                                    │
│   - Manages DuckDB persistence                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                       ManifoldAlgebra (mf_algebra)                      │
│   - SparseHRT3D (AM + Lattice + LUT)                                    │
│   - Unified Processing Pipeline                                         │
│   - Perceptrons & Actuators                                             │
│   - QueryContext & ask()                                                │
├─────────────────────────────────────────────────────────────────────────┤
│                        DuckDB Store Layer                               │
│   - Tables: commits, blobs, lut_entries, refs                           │
│   - Content-addressed storage                                           │
│   - Efficient querying                                                  │
└─────────────────────────────────────────────────────────────────────────┘
```
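The "content-addressed storage" entry in the store layer can be illustrated with a minimal sketch. It is not the `DuckDBStore` implementation: `BlobStoreSketch` is a hypothetical in-memory stand-in, the method names only mirror `store_blob`, `fetch_blob` and `has_blob` from the store, their signatures are assumptions, and SHA-1 is assumed only because the commit ids shown in the outputs are 40 hexadecimal characters long.

```python
import hashlib


class BlobStoreSketch:
    """Toy content-addressed store: a blob's key is the hash of its bytes.

    Hypothetical stand-in for a `blobs` table; not the real DuckDBStore.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def store_blob(self, data: bytes) -> str:
        # Assumption: SHA-1 over the raw bytes is used as the blob id.
        blob_id = hashlib.sha1(data).hexdigest()
        # Storing identical bytes again is a no-op, which gives deduplication.
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def fetch_blob(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

    def has_blob(self, blob_id: str) -> bool:
        return blob_id in self._blobs

    def __len__(self) -> int:
        return len(self._blobs)


blobs = BlobStoreSketch()
first = blobs.store_blob(b"serialized HRT state")
second = blobs.store_blob(b"serialized HRT state")  # same content, same id
print(first == second, len(blobs))
```

Because the id is derived from the content, two commits that reference an unchanged blob can share a single stored copy.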

## 1. Setup & Imports

In [1]:
import sys
from pathlib import Path
import importlib

# Ensure core is importable
PROJECT_ROOT = Path.cwd()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")

# Check DuckDB availability
try:
    import duckdb
    print(f"DuckDB version: {duckdb.__version__}")
except ImportError:
    print("Installing DuckDB...")
    %pip install duckdb
    import duckdb
    print(f"DuckDB version: {duckdb.__version__}")

# Reload core modules to pick up any changes
import core.mf_algebra
import core.mf_os
importlib.reload(core.mf_algebra)
importlib.reload(core.mf_os)
print("Core modules reloaded")

Project root: /home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold
DuckDB version: 1.4.4
Core modules reloaded


In [2]:
# Import core modules
from core.sparse_hrt_3d import (
    Sparse3DConfig,
    SparseHRT3D,
    SparseAM3D,
    SparseLattice3D,
    Edge3D,
)
from core.mf_algebra import (
    # LUT
    LookupTable,
    START, END,
    
    # Processing pipeline
    unified_process,
    build_w_from_am,
    tokenize,
    generate_ntokens,
    
    # Query interface
    QueryContext,
    create_query_context,
    ask,
    
    # Perceptrons & Actuators
    Perceptron,
    PromptPerceptron,
    ResponseActuator,
    
    # Commit store
    Commit,
    CommitStore,
)
from core.hllset import HLLSet, DEFAULT_HASH_CONFIG, REGISTER_DTYPE
from core.duckdb_store import DuckDBStore
from core.constants import P_BITS

print("Core modules imported successfully")
print(f"Default hash config: {DEFAULT_HASH_CONFIG}")
print(f"Register dtype: {REGISTER_DTYPE}")

Core modules imported successfully
Default hash config: HashConfig(hash_type=<HashType.MURMUR3: 'murmur3'>, p_bits=10, seed=42, h_bits=64)
Register dtype: <class 'numpy.uint32'>


## 2. Test DuckDB Store (from core.duckdb_store)

`DuckDBStore` is imported from `core/duckdb_store.py`. The cell below opens an in-memory database, prints its initial statistics, and lists the store's public attributes.

In [3]:
# Test DuckDB store (imported from core.duckdb_store)
store = DuckDBStore(":memory:")
print(f"DuckDB store initialized: {store.stats()}")

# Show available methods
print("\nDuckDBStore methods:")
for method in dir(store):
    if not method.startswith('_'):
        print(f"  - {method}")

DuckDB store initialized: {'commits': 0, 'blobs': 0, 'lut_entries': 0, 'refs': 0, 'blob_types': {}, 'db_path': ':memory:'}

DuckDBStore methods:
  - checkout
  - close
  - commit
  - conn
  - create_branch
  - db_path
  - delete_branch
  - delete_ref
  - export_commits
  - export_lut
  - fetch_blob
  - fetch_hrt
  - fetch_layer_hllsets
  - fetch_w
  - find_commits_by_source
  - find_common_ancestor
  - get_branch_commit
  - get_commit
  - get_commits_since
  - get_current_branch
  - get_head
  - get_ntoken_index
  - get_ntokens_at_index
  - get_ref
  - has_blob
  - has_commit
  - import_commits
  - import_lut
  - list_branches
  - list_refs
  - load_lut_to_memory
  - log
  - lut_count
  - merge_branch
  - merge_hrts
  - merge_layer_hllsets
  - merge_w_matrices
  - set_ref
  - stats
  - store_blob
  - store_hrt
  - store_layer_hllsets
  - store_lut_entry
  - store_w
  - switch_branch
  - sync_from
  - sync_lut_from_memory
  - sync_to
  - vacuum


## 3. Text File Perceptron

A perceptron for processing plain .txt files.

In [4]:
from pathlib import Path


class TextFilePerceptron(Perceptron):
    """
    Perceptron for plain text files (.txt).
    
    Simple and effective for normal documents.
    """
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_text", [".txt"], config)
    
    def extract_text(self, path: Path) -> str:
        """Read text file content."""
        try:
            return path.read_text(encoding='utf-8')
        except UnicodeDecodeError:
            # Fallback to latin-1 if UTF-8 fails
            return path.read_text(encoding='latin-1')


# Test
config = Sparse3DConfig(p_bits=P_BITS, h_bits=32, max_n=3)
text_perceptron = TextFilePerceptron(config)
print(f"TextFilePerceptron initialized: {text_perceptron.name}")
print(f"Extensions: {text_perceptron.extensions}")

TextFilePerceptron initialized: p_text
Extensions: ['.txt']


## 4. ManifoldOS with DuckDB Backend

`ManifoldOS` is the orchestration layer: it combines the `mf_algebra` processing pipeline with DuckDB persistence.
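To make the git-like part of the orchestration concrete, here is a toy sketch of a commit chain with parent tracking. It is not the `ManifoldOS` class: `OrchestratorSketch` and `CommitSketch` are hypothetical names, the store is an in-memory dict instead of DuckDB, the algebraic processing step is omitted entirely, and hashing a JSON payload with SHA-1 is an assumption made for illustration.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommitSketch:
    commit_id: str
    parent_id: Optional[str]
    source: str
    step_number: int
    timestamp: float


class OrchestratorSketch:
    """Toy commit chain: each ingest creates a commit pointing at the previous HEAD."""

    def __init__(self) -> None:
        self.commits: dict[str, CommitSketch] = {}
        self.head: Optional[str] = None
        self.step = 0

    def ingest(self, text: str, source: str) -> str:
        self.step += 1
        # Assumption: the commit id is a hash over parent, source, step and content.
        payload = json.dumps(
            {"parent": self.head, "source": source, "step": self.step, "text": text},
            sort_keys=True,
        )
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.commits[commit_id] = CommitSketch(
            commit_id, self.head, source, self.step, time.time()
        )
        self.head = commit_id
        return commit_id

    def log(self, limit: int = 10) -> list[CommitSketch]:
        # Walk parent pointers from HEAD, newest first.
        history: list[CommitSketch] = []
        current = self.head
        while current is not None and len(history) < limit:
            commit = self.commits[current]
            history.append(commit)
            current = commit.parent_id
        return history

    def rollback(self) -> Optional[str]:
        # Move HEAD to the parent commit; the old commit stays in the store.
        if self.head is not None:
            self.head = self.commits[self.head].parent_id
        return self.head


toy = OrchestratorSketch()
first_id = toy.ingest("The quick brown fox", "sample_1")
second_id = toy.ingest("Machine learning", "sample_2")
print([c.source for c in toy.log()])
```

Walking parent pointers from HEAD gives the newest-first ordering seen in the commit history output later in this notebook, and a rollback only moves HEAD without deleting anything.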

## 5. Test with Sample Text

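The cells below exercise the integrated system through `mos`, a `ManifoldOS` instance backed by an in-memory store (the status output reports `db_path` as `:memory:`).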

In [6]:
# Sample text for testing
sample_texts = [
    """
    The quick brown fox jumps over the lazy dog.
    This sentence contains every letter of the alphabet.
    It is often used for testing fonts and keyboards.
    """,
    """
    Machine learning is a subset of artificial intelligence.
    It enables computers to learn from data without explicit programming.
    Deep learning uses neural networks with many layers.
    """,
    """
    The fractal manifold represents knowledge as a geometric structure.
    HyperLogLog sets provide efficient cardinality estimation.
    The adjacency matrix captures relationships between concepts.
    """
]

# Ingest all samples
for i, text in enumerate(sample_texts):
    commit_id = mos.ingest(text, f"sample_{i+1}")
    print(f"Ingested sample {i+1}: commit {commit_id[:8]}")

print(f"\nStatus after ingestion: {mos.status()}")

Ingested sample 1: commit 9bf2a9e0
Ingested sample 2: commit 931d3cdf
Ingested sample 3: commit fc90c87d

Status after ingestion: {'step': 3, 'edges': 215, 'lut_entries': 206, 'layer_hllsets': {'L0': 132, 'L1': 131, 'L2': 134, 'START': 2}, 'head_commit': 'fc90c87d6052849684e8b56c5d6465e0a95952e6', 'current_branch': None, 'store_stats': {'commits': 3, 'blobs': 9, 'lut_entries': 206, 'refs': 1, 'blob_types': {'w_matrix': {'count': 3, 'bytes': 7957}, 'hrt': {'count': 3, 'bytes': 9156}, 'layer_hllsets': {'count': 3, 'bytes': 49494}}, 'db_path': ':memory:'}}


In [7]:
# Test queries
queries = [
    "What is machine learning?",
    "Tell me about the fox",
    "fractal geometry"
]

for q in queries:
    print(f"\n{'='*60}")
    print(f"Query: {q}")
    result = mos.query(q, top_k=5, learn=True)
    print(f"Results ({len(result['results'])} found):")
    for r in result['results']:
        print(f"  [{r['score']:5.1f}] {r['tokens']}")


Query: What is machine learning?
Results (5 found):
  [  6.7] ('the',)
  [  3.0] ('learning', 'is', 'a')
  [  2.0] ('layers.',)
  [  2.0] ('concepts.',)
  [  2.0] ('keyboards.',)

Query: Tell me about the fox
Results (5 found):
  [  2.9] ('machine',)
  [  2.0] ('layers.',)
  [  2.0] ('learning?',)
  [  2.0] ('concepts.',)
  [  2.0] ('keyboards.',)

Query: fractal geometry
Results (5 found):
  [  8.8] ('the',)
  [  3.1] ('machine',)
  [  3.0] ('fox',)
  [  2.0] ('layers.',)
  [  2.0] ('learning?',)


In [8]:
# View commit history
from datetime import datetime

print("Commit History:")
print("-" * 60)
for commit in mos.log(10):
    # Format the commit timestamp as a local wall-clock time
    ts = datetime.fromtimestamp(commit['timestamp']).strftime('%H:%M:%S')
    print(f"{commit['commit_id'][:8]} | {ts} | step {commit['step_number']:3d} | {commit['source'][:40]}")

Commit History:
------------------------------------------------------------
ac845a1e | 13:58:12 | step   6 | query:fractal geometry
414ee230 | 13:58:12 | step   5 | query:Tell me about the fox
a282c21b | 13:58:11 | step   4 | query:What is machine learning?
fc90c87d | 13:58:11 | step   3 | sample_3
931d3cdf | 13:58:11 | step   2 | sample_2
9bf2a9e0 | 13:58:11 | step   1 | sample_1


In [9]:
# Store stats
print("\nStore Statistics:")
stats = mos.store.stats()
for k, v in stats.items():
    print(f"  {k}: {v}")


Store Statistics:
  commits: 6
  blobs: 15
  lut_entries: 225
  refs: 1
  blob_types: {'hrt': {'count': 6, 'bytes': 23550}, 'w_matrix': {'count': 6, 'bytes': 20468}, 'layer_hllsets': {'count': 3, 'bytes': 49494}}
  db_path: :memory:


## 6. File-Based Persistence Test

Test with a file-based DuckDB database.

In [10]:
# Create a file-based store
db_file = PROJECT_ROOT / "data" / "manifold.duckdb"
db_file.parent.mkdir(parents=True, exist_ok=True)

# Remove if exists (fresh start)
if db_file.exists():
    db_file.unlink()

print(f"Creating persistent store at: {db_file}")

# Create ManifoldOS with file store
mos_persistent = ManifoldOS(str(db_file))

# Ingest sample data
mos_persistent.ingest(sample_texts[0], "sample_1")
mos_persistent.ingest(sample_texts[1], "sample_2")

print(f"Status: {mos_persistent.status()}")

# Close and reopen to test persistence
head_before = mos_persistent.status()['head_commit']
mos_persistent.close()
print(f"\nClosed store. HEAD was: {head_before[:8]}")

Creating persistent store at: /home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold/data/manifold.duckdb
Status: {'step': 2, 'edges': 152, 'lut_entries': 146, 'layer_hllsets': {'L0': 100, 'L1': 93, 'L2': 96, 'START': 2}, 'head_commit': 'd26e14bbac04fb9e4dfee0f3a70b7f2d1bd284a3', 'current_branch': None, 'store_stats': {'commits': 2, 'blobs': 6, 'lut_entries': 146, 'refs': 1, 'blob_types': {'layer_hllsets': {'count': 2, 'bytes': 32996}, 'w_matrix': {'count': 2, 'bytes': 4118}, 'hrt': {'count': 2, 'bytes': 4758}}, 'db_path': '/home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold/data/manifold.duckdb'}}

Closed store. HEAD was: d26e14bb


In [11]:
# Reopen and verify state
mos_reload = ManifoldOS(str(db_file))
print(f"Reopened store. Status: {mos_reload.status()}")

# Verify LUT was persisted
print(f"\nLUT entries loaded: {len(mos_reload.lut.ntoken_to_index)}")

# Query to verify data integrity
result = mos_reload.query("machine learning", top_k=3, learn=False)
print(f"\nQuery results: {result['results']}")

mos_reload.close()

Reopened store. Status: {'step': 2, 'edges': 152, 'lut_entries': 146, 'layer_hllsets': {'L0': 100, 'L1': 93, 'L2': 96, 'START': 2}, 'head_commit': None, 'current_branch': None, 'store_stats': {'commits': 2, 'blobs': 6, 'lut_entries': 146, 'refs': 1, 'blob_types': {'hrt': {'count': 2, 'bytes': 4758}, 'w_matrix': {'count': 2, 'bytes': 4118}, 'layer_hllsets': {'count': 2, 'bytes': 32996}}, 'db_path': '/home/alexmy/SGS/SGS_lib/fractal_manifold/fractal_manifold/data/manifold.duckdb'}}

LUT entries loaded: 146

Query results: [{'index': 22081, 'tokens': ('the',), 'score': 4.666666746139526}, {'index': 1104, 'tokens': ('layers.',), 'score': 2.0}, {'index': 23230, 'tokens': ('keyboards.',), 'score': 2.0}]


## 7. Summary

We've successfully synced `mf_algebra` and `mf_os` with the following architecture:

1. **DuckDBStore**: Persistent storage backend
   - Content-addressed blob storage (deduplicated)
   - Commit history with parent tracking
   - LUT persistence
   - Ref management (HEAD, branches)

2. **ManifoldOS**: Orchestration layer
   - Uses `unified_process` from `mf_algebra`
   - Git-like versioning (commit, checkout, rollback)
   - Query interface with co-adaptive learning
   - File ingestion via Perceptrons

3. **TextFilePerceptron**: Simple .txt file processing

### Next Steps

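- [x] Move DuckDBStore to `core/duckdb_store.py` (this notebook already imports it from there)
- [ ] Update `manifold_os_iica.py` to use new architecture (or replace)
- [ ] Add branch operations
- [ ] Add merge operations for parallel ingestion
- [ ] Restore HEAD when reopening a file-based store (`status()` reports `head_commit: None` after the reload in section 6)
- [ ] Performance optimization for large documents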