# Manifold Algebra + Manifold OS Integration

## Goals

1. **Sync mf_algebra and mf_os**: Create a unified architecture where:
   - `ManifoldOS` is the orchestration layer (Git-like operations, persistence)
   - `mf_algebra` provides the algebraic operations on HRT structures

2. **DuckDB Backend**: Use DuckDB as the persistent store for:
   - HRT state (AM, Lattice)
   - LUT (Lookup Table)
   - Commits (versioning)

3. **Simple .txt Input**: Accept plain-text (`.txt`) documents as input

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          ManifoldOS (mf_os)                             │
│   - Git-like operations: commit, checkout, rollback                    │
│   - Orchestrates processing pipeline                                    │
│   - Manages DuckDB persistence                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                       ManifoldAlgebra (mf_algebra)                      │
│   - SparseHRT3D (AM + Lattice + LUT)                                   │
│   - Unified Processing Pipeline                                        │
│   - Perceptrons & Actuators                                            │
│   - QueryContext & ask()                                               │
├─────────────────────────────────────────────────────────────────────────┤
│                        DuckDB Store Layer                               │
│   - Tables: commits, blobs, lut_entries, refs                          │
│   - Content-addressed storage                                          │
│   - Efficient querying                                                 │
└─────────────────────────────────────────────────────────────────────────┘
```

## 1. Setup & Imports

In [None]:
import sys
from pathlib import Path

# Ensure core is importable
PROJECT_ROOT = Path.cwd()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")

# Check DuckDB availability
try:
    import duckdb
    print(f"DuckDB version: {duckdb.__version__}")
except ImportError:
    print("Installing DuckDB...")
    # %pip installs into the environment of the running kernel
    %pip install duckdb
    import duckdb
    print(f"DuckDB version: {duckdb.__version__}")

In [None]:
# Import core modules
from core.sparse_hrt_3d import (
    Sparse3DConfig,
    SparseHRT3D,
    SparseAM3D,
    SparseLattice3D,
    Edge3D,
)
from core.mf_algebra import (
    # LUT
    LookupTable,
    START, END,
    
    # Processing pipeline
    unified_process,
    build_w_from_am,
    tokenize,
    generate_ntokens,
    
    # Query interface
    QueryContext,
    create_query_context,
    ask,
    
    # Perceptrons & Actuators
    Perceptron,
    PromptPerceptron,
    ResponseActuator,
    
    # Commit store
    Commit,
    CommitStore,
)
from core.hllset import HLLSet
from core.constants import P_BITS

print("Core modules imported successfully")

## 2. DuckDB Store Implementation

We'll create a DuckDB-backed store that replaces the in-memory `CommitStore` and provides persistent LUT storage.
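
Before the full store, here is a minimal sketch of the content-addressing idea behind the `blobs` table, using a plain dict instead of DuckDB. The helper `sha1_hex` and the class `DictBlobStore` are hypothetical names for illustration only, and we assume `compute_sha1` behaves like a hex-encoded SHA-1 digest.

```python
import hashlib
from typing import Dict, Optional


def sha1_hex(data: bytes) -> str:
    """Hypothetical stand-in for compute_sha1 (assumed to be a hex SHA-1 digest)."""
    return hashlib.sha1(data).hexdigest()


class DictBlobStore:
    """Toy content-addressed store: each key is the hash of its value."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store_blob(self, data: bytes) -> str:
        blob_id = sha1_hex(data)
        # Identical content hashes to the same ID, so it is kept only once
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def fetch_blob(self, blob_id: str) -> Optional[bytes]:
        return self._blobs.get(blob_id)

    def __len__(self) -> int:
        return len(self._blobs)


blobs = DictBlobStore()
first = blobs.store_blob(b"hrt snapshot")
second = blobs.store_blob(b"hrt snapshot")  # duplicate content
third = blobs.store_blob(b"w matrix")
print(first == second, len(blobs))  # True 2
```

Because the ID is derived from the bytes alone, storing the same snapshot twice costs nothing extra, which is why unchanged HRT or W state can be shared between commits.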

In [None]:
import duckdb
import json
import pickle
import time
from typing import Dict, List, Optional, Set, Tuple, Any
from dataclasses import dataclass
from core.hllset import compute_sha1


class DuckDBStore:
    """
    DuckDB-backed persistent store for ManifoldOS.
    
    Tables:
    - commits: Version history with HRT snapshots
    - lut_entries: Lookup table entries (token → index mapping)
    - blobs: Content-addressed binary storage
    - refs: Branch references (like Git refs)
    """
    
    def __init__(self, db_path: str = ":memory:"):
        """
        Initialize DuckDB store.
        
        Args:
            db_path: Path to database file, or ":memory:" for in-memory
        """
        self.db_path = db_path
        self.conn = duckdb.connect(db_path)
        self._init_schema()
    
    def _init_schema(self):
        """Create tables if they don't exist."""
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS commits (
                commit_id VARCHAR PRIMARY KEY,
                parent_id VARCHAR,
                timestamp DOUBLE,
                source VARCHAR,
                perceptron VARCHAR,
                step_number INTEGER,
                hrt_blob_id VARCHAR,
                w_blob_id VARCHAR,
                metadata JSON
            )
        """)
        
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS lut_entries (
                idx INTEGER,
                layer INTEGER,
                ntoken VARCHAR,
                ntoken_hash VARCHAR,
                PRIMARY KEY (idx, layer, ntoken)
            )
        """)
        
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS blobs (
                blob_id VARCHAR PRIMARY KEY,
                blob_type VARCHAR,
                data BLOB,
                created_at DOUBLE
            )
        """)
        
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS refs (
                ref_name VARCHAR PRIMARY KEY,
                commit_id VARCHAR,
                updated_at DOUBLE
            )
        """)
        
        # Create indexes for faster lookups
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_commits_parent ON commits(parent_id)")
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_lut_ntoken ON lut_entries(ntoken)")
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_lut_idx ON lut_entries(idx)")
    
    # -------------------------------------------------------------------------
    # Blob Operations (Content-Addressed Storage)
    # -------------------------------------------------------------------------
    
    def store_blob(self, data: bytes, blob_type: str) -> str:
        """Store binary data, return content hash (deduplicated)."""
        blob_id = compute_sha1(data)
        
        # Check if already exists
        exists = self.conn.execute(
            "SELECT 1 FROM blobs WHERE blob_id = ?", [blob_id]
        ).fetchone()
        
        if not exists:
            self.conn.execute(
                "INSERT INTO blobs (blob_id, blob_type, data, created_at) VALUES (?, ?, ?, ?)",
                [blob_id, blob_type, data, time.time()]
            )
        
        return blob_id
    
    def fetch_blob(self, blob_id: str) -> Optional[bytes]:
        """Fetch blob data by ID."""
        result = self.conn.execute(
            "SELECT data FROM blobs WHERE blob_id = ?", [blob_id]
        ).fetchone()
        return result[0] if result else None
    
    # -------------------------------------------------------------------------
    # HRT Serialization
    # -------------------------------------------------------------------------
    
    def store_hrt(self, hrt: SparseHRT3D) -> str:
        """Serialize and store HRT, return blob ID."""
        # Get all edges from the tensor - use edges() method which returns List[Edge3D]
        all_edges = hrt.am.tensor.edges()
        # Convert Edge3D to tuples for serialization
        edge_tuples = [(e.n, e.row, e.col, e.value) for e in all_edges]
        
        data = pickle.dumps({
            'am_edges': edge_tuples,
            'config': {
                'p_bits': hrt.config.p_bits,
                'h_bits': hrt.config.h_bits,
                'max_n': hrt.config.max_n,
                'dimension': hrt.config.dimension,
            },
            'step': hrt.step,
        })
        return self.store_blob(data, 'hrt')
    
    def fetch_hrt(self, blob_id: str, config: Sparse3DConfig) -> Optional[SparseHRT3D]:
        """Fetch and deserialize HRT."""
        data = self.fetch_blob(blob_id)
        if not data:
            return None
        
        obj = pickle.loads(data)
        # Edge tuples are (n, row, col, value)
        edges = [Edge3D(n=e[0], row=e[1], col=e[2], value=e[3]) for e in obj['am_edges']]
        am = SparseAM3D.from_edges(config, edges)
        lattice = SparseLattice3D.from_sparse_am(am)
        
        return SparseHRT3D(
            am=am,
            lattice=lattice,
            config=config,
            lut=frozenset(),
            step=obj['step']
        )
    
    def store_w(self, W: Dict[int, Dict[int, Dict[int, float]]]) -> str:
        """Serialize and store W matrix, return blob ID."""
        data = pickle.dumps(W)
        return self.store_blob(data, 'w_matrix')
    
    def fetch_w(self, blob_id: str) -> Optional[Dict[int, Dict[int, Dict[int, float]]]]:
        """Fetch and deserialize W matrix."""
        data = self.fetch_blob(blob_id)
        return pickle.loads(data) if data else None
    
    # -------------------------------------------------------------------------
    # Commit Operations
    # -------------------------------------------------------------------------
    
    def commit(
        self,
        hrt: SparseHRT3D,
        W: Dict[int, Dict[int, Dict[int, float]]],
        source: str,
        perceptron: str,
        parent_id: Optional[str] = None,
        metadata: Optional[Dict] = None
    ) -> str:
        """Create a new commit with HRT and W state."""
        hrt_blob_id = self.store_hrt(hrt)
        w_blob_id = self.store_w(W)
        
        # Derive the commit ID from the blob IDs, the source, and one timestamp
        # (the same timestamp is stored in the commit row below)
        timestamp = time.time()
        content = f"{hrt_blob_id}:{w_blob_id}:{timestamp}:{source}"
        commit_id = compute_sha1(content)
        
        self.conn.execute("""
            INSERT INTO commits 
            (commit_id, parent_id, timestamp, source, perceptron, step_number, hrt_blob_id, w_blob_id, metadata)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, [
            commit_id,
            parent_id,
            timestamp,
            source,
            perceptron,
            hrt.step,
            hrt_blob_id,
            w_blob_id,
            json.dumps(metadata or {})
        ])
        
        return commit_id
    
    def get_commit(self, commit_id: str) -> Optional[Dict]:
        """Get commit metadata by ID."""
        result = self.conn.execute(
            "SELECT * FROM commits WHERE commit_id = ?", [commit_id]
        ).fetchone()
        
        if not result:
            return None
        
        return {
            'commit_id': result[0],
            'parent_id': result[1],
            'timestamp': result[2],
            'source': result[3],
            'perceptron': result[4],
            'step_number': result[5],
            'hrt_blob_id': result[6],
            'w_blob_id': result[7],
            'metadata': json.loads(result[8]) if result[8] else {}
        }
    
    def checkout(self, commit_id: str, config: Sparse3DConfig) -> Tuple[Optional[SparseHRT3D], Optional[Dict]]:
        """Checkout HRT and W from a commit."""
        commit = self.get_commit(commit_id)
        if not commit:
            return None, None
        
        hrt = self.fetch_hrt(commit['hrt_blob_id'], config)
        W = self.fetch_w(commit['w_blob_id'])
        
        return hrt, W
    
    # -------------------------------------------------------------------------
    # Refs (Branch Management)
    # -------------------------------------------------------------------------
    
    def set_ref(self, ref_name: str, commit_id: str):
        """Set a ref to point to a commit."""
        self.conn.execute("""
            INSERT OR REPLACE INTO refs (ref_name, commit_id, updated_at)
            VALUES (?, ?, ?)
        """, [ref_name, commit_id, time.time()])
    
    def get_ref(self, ref_name: str) -> Optional[str]:
        """Get commit ID for a ref."""
        result = self.conn.execute(
            "SELECT commit_id FROM refs WHERE ref_name = ?", [ref_name]
        ).fetchone()
        return result[0] if result else None
    
    def get_head(self, config: Sparse3DConfig) -> Tuple[Optional[SparseHRT3D], Optional[Dict]]:
        """Get the current HEAD state."""
        head_commit = self.get_ref('HEAD')
        if not head_commit:
            return None, None
        return self.checkout(head_commit, config)
    
    # -------------------------------------------------------------------------
    # LUT Operations
    # -------------------------------------------------------------------------
    
    def store_lut_entry(self, idx: int, layer: int, ntoken: Tuple[str, ...]):
        """Store a LUT entry."""
        ntoken_str = " ".join(ntoken)
        ntoken_hash = compute_sha1(ntoken_str)
        
        self.conn.execute("""
            INSERT OR IGNORE INTO lut_entries (idx, layer, ntoken, ntoken_hash)
            VALUES (?, ?, ?, ?)
        """, [idx, layer, ntoken_str, ntoken_hash])
    
    def get_ntokens_at_index(self, idx: int) -> Set[Tuple[int, Tuple[str, ...]]]:
        """Get all (layer, ntoken) pairs at an index."""
        results = self.conn.execute(
            "SELECT layer, ntoken FROM lut_entries WHERE idx = ?", [idx]
        ).fetchall()
        
        return {(row[0], tuple(row[1].split())) for row in results}
    
    def get_ntoken_index(self, ntoken: Tuple[str, ...]) -> Optional[int]:
        """Get index for an ntoken."""
        ntoken_str = " ".join(ntoken)
        result = self.conn.execute(
            "SELECT idx FROM lut_entries WHERE ntoken = ? LIMIT 1", [ntoken_str]
        ).fetchone()
        return result[0] if result else None
    
    def sync_lut_from_memory(self, lut: LookupTable):
        """Sync in-memory LUT to DuckDB."""
        for idx, entries in lut.index_to_ntokens.items():
            for layer, ntoken in entries:
                self.store_lut_entry(idx, layer, ntoken)
    
    def load_lut_to_memory(self, config: Sparse3DConfig) -> LookupTable:
        """Load LUT from DuckDB to memory."""
        lut = LookupTable(config=config)
        
        results = self.conn.execute(
            "SELECT idx, layer, ntoken FROM lut_entries"
        ).fetchall()
        
        for idx, layer, ntoken_str in results:
            ntoken = tuple(ntoken_str.split())
            lut.index_to_ntokens[idx].add((layer, ntoken))
            lut.ntoken_to_index[ntoken] = idx
        
        return lut
    
    # -------------------------------------------------------------------------
    # History & Stats
    # -------------------------------------------------------------------------
    
    def log(self, limit: int = 10) -> List[Dict]:
        """Get recent commits."""
        results = self.conn.execute("""
            SELECT commit_id, parent_id, timestamp, source, perceptron, step_number
            FROM commits
            ORDER BY timestamp DESC
            LIMIT ?
        """, [limit]).fetchall()
        
        return [{
            'commit_id': r[0],
            'parent_id': r[1],
            'timestamp': r[2],
            'source': r[3],
            'perceptron': r[4],
            'step_number': r[5]
        } for r in results]
    
    def stats(self) -> Dict[str, int]:
        """Get store statistics."""
        commits = self.conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
        blobs = self.conn.execute("SELECT COUNT(*) FROM blobs").fetchone()[0]
        lut_entries = self.conn.execute("SELECT COUNT(*) FROM lut_entries").fetchone()[0]
        
        return {
            'commits': commits,
            'blobs': blobs,
            'lut_entries': lut_entries
        }
    
    def close(self):
        """Close the database connection."""
        self.conn.close()


# Test DuckDB store
store = DuckDBStore(":memory:")
print(f"DuckDB store initialized: {store.stats()}")

## 3. Text File Perceptron

A perceptron for processing plain .txt files.
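
The perceptron below reads files as UTF-8 and falls back to Latin-1. As a standalone sketch of that fallback (the function name `read_text_with_fallback` and the temporary files are made up for this example):

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path: Path) -> str:
    """Try UTF-8 first, then Latin-1, which can decode any byte sequence."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return path.read_text(encoding="latin-1")


with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.txt"
    latin1_file = Path(tmp) / "latin1.txt"
    utf8_file.write_bytes("café".encode("utf-8"))
    latin1_file.write_bytes("café".encode("latin-1"))  # b'caf\xe9' is not valid UTF-8

    utf8_text = read_text_with_fallback(utf8_file)
    latin1_text = read_text_with_fallback(latin1_file)

print(utf8_text == latin1_text == "café")  # True
```

Latin-1 never raises a decode error, so a file in some other encoding would be read without complaint but with the wrong characters. That trade-off is acceptable for a simple default, but worth keeping in mind for mixed corpora.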

In [None]:
from pathlib import Path
from typing import Iterator


class TextFilePerceptron(Perceptron):
    """
    Perceptron for plain text files (.txt).
    
    Reads the file as UTF-8 and falls back to Latin-1 if decoding fails.
    """
    
    def __init__(self, config: Sparse3DConfig):
        super().__init__("p_text", [".txt"], config)
    
    def extract_text(self, path: Path) -> str:
        """Read text file content."""
        try:
            return path.read_text(encoding='utf-8')
        except UnicodeDecodeError:
            # Fallback to latin-1 if UTF-8 fails
            return path.read_text(encoding='latin-1')


# Test
config = Sparse3DConfig(p_bits=P_BITS, h_bits=32, max_n=3)
text_perceptron = TextFilePerceptron(config)
print(f"TextFilePerceptron initialized: {text_perceptron.name}")
print(f"Extensions: {text_perceptron.extensions}")

## 4. ManifoldOS with DuckDB Backend

The orchestration layer that combines mf_algebra with DuckDB persistence.
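
Rollback in this layer amounts to following `parent_id` links backwards from the current commit. A minimal sketch of that walk, with a hypothetical in-memory commit graph standing in for the `commits` table:

```python
from typing import Dict, Optional

# Hypothetical commit graph: commit_id -> parent_id (None marks the root)
parents: Dict[str, Optional[str]] = {
    "c1": None,
    "c2": "c1",
    "c3": "c2",
}


def ancestor(head: Optional[str], steps: int) -> Optional[str]:
    """Return the commit `steps` parents behind `head`, or None if history is too short."""
    current = head
    for _ in range(steps):
        if current is None or parents.get(current) is None:
            return None
        current = parents[current]
    return current


print(ancestor("c3", 1))  # c2
print(ancestor("c3", 2))  # c1
print(ancestor("c3", 3))  # None
```

Returning `None` when the chain runs out mirrors the `False` result of `rollback` when there are not enough ancestors, so the caller's state is never moved to a partially resolved target.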

In [None]:
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Tuple, Any


@dataclass
class ManifoldOSConfig:
    """Configuration for ManifoldOS."""
    p_bits: int = P_BITS  # default to the project-wide constant imported above
    h_bits: int = 32
    max_n: int = 3
    auto_commit: bool = True
    auto_sync_lut: bool = True
    
    @property
    def sparse_config(self) -> Sparse3DConfig:
        return Sparse3DConfig(
            p_bits=self.p_bits,
            h_bits=self.h_bits,
            max_n=self.max_n
        )


class ManifoldOS:
    """
    ManifoldOS - Orchestration layer for Manifold operations.
    
    Features:
    - Git-like versioning (commit, checkout, rollback)
    - DuckDB persistence for HRT, W, and LUT
    - Unified processing pipeline from mf_algebra
    - Perceptron/Actuator architecture
    """
    
    def __init__(
        self,
        db_path: str = ":memory:",
        config: Optional[ManifoldOSConfig] = None
    ):
        self.config = config or ManifoldOSConfig()
        self.sparse_config = self.config.sparse_config
        
        # Initialize DuckDB store
        self.store = DuckDBStore(db_path)
        
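        # Track current commit; restore HEAD from the store so that history
        # continues from the persisted state when an existing database is reopened
        self._head_commit_id: Optional[str] = self.store.get_ref('HEAD')
        
        # Initialize in-memory state
        self._init_state()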
    
    def _init_state(self):
        """Initialize empty state or load from store."""
        # Try to load HEAD from store
        hrt, W = self.store.get_head(self.sparse_config)
        
        if hrt is not None and W is not None:
            self.hrt = hrt
            self.W = W
            self.lut = self.store.load_lut_to_memory(self.sparse_config)
            print(f"Loaded existing state from store (step {self.hrt.step})")
        else:
            # Create empty state
            empty_am = SparseAM3D.from_edges(self.sparse_config, [])
            empty_lattice = SparseLattice3D.from_sparse_am(empty_am)
            self.hrt = SparseHRT3D(
                am=empty_am,
                lattice=empty_lattice,
                config=self.sparse_config,
                lut=frozenset(),
                step=0
            )
            self.W: Dict[int, Dict[int, Dict[int, float]]] = {n: {} for n in range(self.config.max_n)}
            self.lut = LookupTable(config=self.sparse_config)
            self.lut.add_ntoken(START)
            self.lut.add_ntoken(END)
            print("Initialized empty state")
    
    # -------------------------------------------------------------------------
    # Ingestion
    # -------------------------------------------------------------------------
    
    def ingest(self, text: str, source: str = "input", perceptron: str = "manual") -> str:
        """
        Ingest text into the manifold.
        
        Args:
            text: Text content to ingest
            source: Source identifier (e.g., filename)
            perceptron: Perceptron name that processed this
        
        Returns:
            The new commit ID, or an empty string if the text is blank
            or auto_commit is False
        """
        if not text.strip():
            return ""
        
        # Process through unified pipeline
        result = unified_process(
            text,
            self.hrt,
            self.W,
            self.sparse_config,
            self.lut,
            self.config.max_n
        )
        
        # Update state
        self.hrt = result.merged_hrt
        self.W = build_w_from_am(self.hrt.am, self.sparse_config)
        
        # Auto-commit
        commit_id = ""
        if self.config.auto_commit:
            commit_id = self.commit(source, perceptron)
        
        return commit_id
    
    def ingest_file(self, path: Path, perceptron: Optional[Perceptron] = None) -> str:
        """
        Ingest a file using appropriate perceptron.
        """
        if perceptron is None:
            # Use TextFilePerceptron as default
            perceptron = TextFilePerceptron(self.sparse_config)
            perceptron.initialize(self.lut)
        
        text = perceptron.extract_text(path)
        return self.ingest(text, str(path), perceptron.name)
    
    # -------------------------------------------------------------------------
    # Git-like Operations
    # -------------------------------------------------------------------------
    
    def commit(self, source: str, perceptron: str, message: str = "") -> str:
        """
        Create a new commit with current state.
        """
        commit_id = self.store.commit(
            self.hrt,
            self.W,
            source,
            perceptron,
            parent_id=self._head_commit_id,
            metadata={'message': message} if message else None
        )
        
        self._head_commit_id = commit_id
        self.store.set_ref('HEAD', commit_id)
        
        # Sync LUT to store
        if self.config.auto_sync_lut:
            self.store.sync_lut_from_memory(self.lut)
        
        return commit_id
    
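    def checkout(self, commit_id: str) -> bool:
        """
        Checkout a specific commit and move HEAD to it.
        
        Only the HRT and W are restored; the in-memory LUT is left as-is.
        """
        hrt, W = self.store.checkout(commit_id, self.sparse_config)
        if hrt is None or W is None:
            return False
        
        self.hrt = hrt
        self.W = W
        self._head_commit_id = commit_id
        # Persist HEAD so a reopened store resumes from the checked-out commit
        self.store.set_ref('HEAD', commit_id)
        return True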
    
    def rollback(self, steps: int = 1) -> bool:
        """
        Rollback to a previous commit.
        """
        commit = self.store.get_commit(self._head_commit_id)
        
        for _ in range(steps):
            if commit is None or commit['parent_id'] is None:
                return False
            commit = self.store.get_commit(commit['parent_id'])
        
        if commit:
            return self.checkout(commit['commit_id'])
        return False
    
    def log(self, limit: int = 10) -> List[Dict]:
        """Get commit history."""
        return self.store.log(limit)
    
    # -------------------------------------------------------------------------
    # Query Interface
    # -------------------------------------------------------------------------
    
    def query(self, prompt: str, top_k: int = 10, learn: bool = True) -> Dict[str, Any]:
        """
        Query the manifold.
        
        Args:
            prompt: Query text
            top_k: Number of results to return
            learn: If True, the query is also ingested (co-adaptive learning)
        
        Returns:
            Query results with tokens and scores
        """
        # Only the helpers used below are imported
        from core.manifold_algebra import (
            reachable_from, project_layer, Sparse3DMatrix
        )
        
        # Process query (same as ingestion!)
        result = unified_process(
            prompt,
            self.hrt,
            self.W,
            self.sparse_config,
            self.lut,
            self.config.max_n
        )
        
        # Get query indices
        query_indices = set()
        for basic in result.input_basics:
            query_indices.add(basic.to_index(self.sparse_config))
        
        # Find reachable concepts
        AM = Sparse3DMatrix.from_am(self.hrt.am, self.sparse_config)
        layer0 = project_layer(AM, 0)
        reachable = reachable_from(layer0, query_indices, hops=1)
        
        # Score by connectivity
        layer0_dict = layer0.to_dict()
        scores = {}
        for idx in reachable:
            if idx in layer0_dict:
                scores[idx] = sum(layer0_dict[idx].values())
        
        # Get top-k results with token resolution
        top = sorted(scores.items(), key=lambda x: -x[1])[:top_k]
        results = []
        for idx, score in top:
            ntokens = self.lut.index_to_ntokens.get(idx, set())
            if ntokens:
                _, ntoken = next(iter(ntokens))
                results.append({'index': idx, 'tokens': ntoken, 'score': score})
            else:
                # Keep 'tokens' a tuple even when the index has no LUT entry
                results.append({'index': idx, 'tokens': (f'<idx:{idx}>',), 'score': score})
        
        # Learn from query if enabled
        if learn:
            self.hrt = result.merged_hrt
            self.W = build_w_from_am(self.hrt.am, self.sparse_config)
            if self.config.auto_commit:
                self.commit(f"query:{prompt[:50]}", "p_query")
        
        return {
            'prompt': prompt,
            'query_indices': list(query_indices),
            'results': results,
            'edges_added': len(result.context_edges)
        }
    
    # -------------------------------------------------------------------------
    # Status & Info
    # -------------------------------------------------------------------------
    
    def status(self) -> Dict[str, Any]:
        """Get current status."""
        return {
            'step': self.hrt.step,
            'edges': self.hrt.nnz,
            'lut_entries': len(self.lut.ntoken_to_index),
            'head_commit': self._head_commit_id,
            'store_stats': self.store.stats()
        }
    
    def close(self):
        """Close store connection."""
        self.store.close()
    
    def __enter__(self):
        return self
    
    def __exit__(self, *args):
        self.close()


# Test ManifoldOS
mos = ManifoldOS(":memory:")
print(f"ManifoldOS status: {mos.status()}")

## 5. Test with Sample Text

Let's test the integrated system with some sample text.
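
The queries in this section rank results by connectivity: collect the indices reachable from the query in one hop, then score each by the sum of its outgoing weights. A minimal sketch of that scoring on plain dicts follows. The adjacency data and the helper names `one_hop` and `top_k_by_out_weight` are hypothetical, and the one-hop definition (seeds plus their direct successors) is an assumption about what `reachable_from` returns.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Dict[int, Dict[int, float]]

# Hypothetical layer-0 adjacency: row index -> {column index: weight}
layer0: Adjacency = {
    1: {2: 1.0, 3: 2.0},
    2: {3: 1.0},
    3: {},
    4: {1: 5.0},
}


def one_hop(adj: Adjacency, seeds: Set[int]) -> Set[int]:
    """Seeds plus every column reachable from a seed in one step (assumed definition)."""
    reached = set(seeds)
    for seed in seeds:
        reached.update(adj.get(seed, {}))
    return reached


def top_k_by_out_weight(adj: Adjacency, seeds: Set[int], k: int) -> List[Tuple[int, float]]:
    """Score each reachable row by the sum of its outgoing weights, highest first."""
    scores = {
        idx: sum(adj[idx].values())
        for idx in one_hop(adj, seeds)
        if idx in adj
    }
    return sorted(scores.items(), key=lambda item: -item[1])[:k]


ranking = top_k_by_out_weight(layer0, {1}, k=2)
print(ranking)  # [(1, 3.0), (2, 1.0)]
```

Index 4 has the largest outgoing weight but is not reachable from the seed, so it never enters the ranking. The score measures how connected a reachable index is, not how strongly it is tied to the query.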

In [None]:
# Sample text for testing
sample_texts = [
    """
    The quick brown fox jumps over the lazy dog.
    This sentence contains every letter of the alphabet.
    It is often used for testing fonts and keyboards.
    """,
    """
    Machine learning is a subset of artificial intelligence.
    It enables computers to learn from data without explicit programming.
    Deep learning uses neural networks with many layers.
    """,
    """
    The fractal manifold represents knowledge as a geometric structure.
    HyperLogLog sets provide efficient cardinality estimation.
    The adjacency matrix captures relationships between concepts.
    """
]

# Ingest all samples
for i, text in enumerate(sample_texts):
    commit_id = mos.ingest(text, f"sample_{i+1}")
    print(f"Ingested sample {i+1}: commit {commit_id[:8]}")

print(f"\nStatus after ingestion: {mos.status()}")

In [None]:
# Test queries
queries = [
    "What is machine learning?",
    "Tell me about the fox",
    "fractal geometry"
]

for q in queries:
    print(f"\n{'='*60}")
    print(f"Query: {q}")
    result = mos.query(q, top_k=5, learn=True)
    print(f"Results ({len(result['results'])} found):")
    for r in result['results']:
        print(f"  [{r['score']:5.1f}] {r['tokens']}")

In [None]:
# View commit history
from datetime import datetime

print("Commit History:")
print("-" * 60)
for commit in mos.log(10):
    ts = datetime.fromtimestamp(commit['timestamp']).strftime('%H:%M:%S')
    print(f"{commit['commit_id'][:8]} | {ts} | step {commit['step_number']:3d} | {commit['source'][:40]}")

In [None]:
# Store stats
print("\nStore Statistics:")
stats = mos.store.stats()
for k, v in stats.items():
    print(f"  {k}: {v}")

## 6. File-Based Persistence Test

Test with a file-based DuckDB database.
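
One thing to keep in mind when the LUT is reloaded from disk: n-tokens are stored as space-joined strings and split again on load. The sketch below shows when that round trip is lossless and compares it with a JSON encoding. The function names are hypothetical, and whether the tokenizer can ever emit a token containing whitespace is not something this notebook establishes.

```python
import json
from typing import Tuple

NToken = Tuple[str, ...]


def encode_joined(ntoken: NToken) -> str:
    """Space-joined form, as used for the lut_entries rows."""
    return " ".join(ntoken)


def decode_joined(text: str) -> NToken:
    return tuple(text.split())


def encode_json(ntoken: NToken) -> str:
    """Alternative encoding that keeps token boundaries explicit."""
    return json.dumps(list(ntoken))


def decode_json(text: str) -> NToken:
    return tuple(json.loads(text))


plain = ("machine", "learning")
tricky = ("new york", "city")  # hypothetical token containing a space

print(decode_joined(encode_joined(plain)) == plain)    # True
print(decode_joined(encode_joined(tricky)) == tricky)  # False
print(decode_json(encode_json(tricky)) == tricky)      # True
```

If tokens never contain whitespace, the space-joined form is fine and stays human-readable in SQL queries. The JSON form is only worth the extra bytes if that guarantee cannot be made.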

In [None]:
# Create a file-based store
db_file = PROJECT_ROOT / "data" / "manifold.duckdb"
db_file.parent.mkdir(parents=True, exist_ok=True)

# Remove if exists (fresh start)
if db_file.exists():
    db_file.unlink()

print(f"Creating persistent store at: {db_file}")

# Create ManifoldOS with file store
mos_persistent = ManifoldOS(str(db_file))

# Ingest sample data
mos_persistent.ingest(sample_texts[0], "sample_1")
mos_persistent.ingest(sample_texts[1], "sample_2")

print(f"Status: {mos_persistent.status()}")

# Close and reopen to test persistence
head_before = mos_persistent._head_commit_id
mos_persistent.close()
print(f"\nClosed store. HEAD was: {head_before[:8]}")

In [None]:
# Reopen and verify state
mos_reload = ManifoldOS(str(db_file))
print(f"Reopened store. Status: {mos_reload.status()}")

# Verify LUT was persisted
print(f"\nLUT entries loaded: {len(mos_reload.lut.ntoken_to_index)}")

# Query to verify data integrity
result = mos_reload.query("machine learning", top_k=3, learn=False)
print(f"\nQuery results: {result['results']}")

mos_reload.close()

## 7. Summary

This notebook integrates `mf_algebra` and `mf_os` through three components:

1. **DuckDBStore**: Persistent storage backend
   - Content-addressed blob storage (deduplicated)
   - Commit history with parent tracking
   - LUT persistence
   - Ref management (`HEAD`; branch operations are still to be added)

2. **ManifoldOS**: Orchestration layer
   - Uses `unified_process` from `mf_algebra`
   - Git-like versioning (commit, checkout, rollback)
   - Query interface with co-adaptive learning
   - File ingestion via Perceptrons

3. **TextFilePerceptron**: Simple .txt file processing

### Next Steps

- [ ] Move DuckDBStore to `core/duckdb_store.py`
- [ ] Update `manifold_os_iica.py` to use new architecture (or replace)
- [ ] Add branch operations
- [ ] Add merge operations for parallel ingestion
- [ ] Performance optimization for large documents