
Anchor Engine (Node.js)

Version: 4.3.2 | Role: Semantic Memory & Search API | Port: 3160 | Status: ✅ Production Ready
Platform: ✅ ARM64 Windows | ✅ x64 Windows | ✅ Linux | ✅ macOS

The Anchor Engine is a local-first context engine implementing the STAR Algorithm (Semantic Temporal Associative Retrieval) for privacy-first, sovereign knowledge management.


💡 Why This Exists

I started using long-term chat sessions because I noticed something: models with large context windows could be helpful in unexpected ways when old tasks mixed with current discussions. These sessions became so useful that I pushed them as far as they could go.

Then I hit the wall. The dreaded message: "Open a new session to continue using Gemini."

Same message all of them give you.

I had 300+ response/chat pairs in there! Important history. Completed work. A shared mind with the model. I tried summarizing and gave the summary to the new instance. It wasn't enough. I kept returning to the old chat like it was a dictionary for meaning and recall, pasting bits back into new sessions here and there.

So I started building a way to resurrect my preferred persona anytime. I'd take targeted context from the old chat, feed it to a new instance, and prepare the model to retake hold of the goals and methods we'd developed together.

It worked wonderfully. Until I hit the limit again. And again. And again.

By the time Anchor Engine was operational, I had accumulated 40 chat sessions, ~18M tokens. My current corpus is ~28M tokens. Anchor Engine digests all of it in about 5 minutes.

Now I make a query with a few choice entities and some fluff for serendipitous connections. The engine compresses those 28M tokens into 100k+ chars of non-duplicated, narrative context: concepts deduplicated, not just text. My LLM remembers July 2025 like it was yesterday.

v4.3.0 - PGlite-First Architecture:

  • ✅ ARM64 Windows Support: No native C++ builds required
  • PGlite-Only Database: WASM-based, runs everywhere Node.js runs
  • Transaction Support: 10-50x faster bulk ingestion
  • Simplified Deployment: Zero native compilation
  • Cross-Platform: Identical behavior on ARM64, x64, Linux, macOS

v4.2.0 Improvements:

  • Causal Narrative: Results sorted chronologically (toggleable to relevance-based)
  • XML Metadata: Each atom wrapped with relevance score, timestamp, source for LLM prioritization
  • Transient Filter: Excludes error logs, install output, build artifacts (~30% context reclaimed)
  • Time Ordering Toggle: 📅 Chronological ↔ 🎯 Relevance button in UI
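The XML metadata wrapping described above can be sketched as follows. The element and attribute names here are illustrative assumptions, not the engine's actual output schema:

```javascript
// Sketch: wrap a retrieved atom in XML metadata so the LLM can
// prioritize by relevance score and place it in time. Element and
// attribute names are hypothetical, not Anchor's actual format.
function escapeXml(s) {
  return s.replace(/[<>&'"]/g, (c) =>
    ({ "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;" }[c])
  );
}

function wrapAtom(atom) {
  return (
    `<atom relevance="${atom.relevance.toFixed(2)}" ` +
    `timestamp="${atom.timestamp}" source="${escapeXml(atom.source)}">` +
    `${escapeXml(atom.text)}</atom>`
  );
}

const xml = wrapAtom({
  relevance: 0.87,
  timestamp: "2025-07-14T10:32:00Z",
  source: "chat_040.md",
  text: "We agreed to store pointers, not content.",
});
console.log(xml);
```

Wrapping each atom this way lets the downstream model weigh fragments by score and timestamp instead of treating the whole context block as equally important.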

This isn't a RAG tool I built because it sounded cool. This is the tool I built because I needed it to keep my own mind intact.


🚀 Quick Start

# Install dependencies
pnpm install

# Build engine (TypeScript + PGlite WASM - no native compilation!)
pnpm build

# Start the engine
pnpm start

Access UI: http://localhost:3160 (or configured port in user_settings.json)

Note: v4.3.0+ runs on ARM64 Windows, x64 Windows, Linux, and macOS without platform-specific builds.


🐳 Docker Deployment

Quick Start with Docker

# Build the Docker image
docker build -t anchor-engine:latest .

# Run the container
docker run -d -p 3160:3160 --name anchor anchor-engine:latest

# Or use docker-compose (recommended)
docker-compose up -d

Docker Compose (Recommended)

# Start with persistent storage and inbox mounted
docker-compose up -d

# View logs
docker-compose logs -f

# Stop
docker-compose down

# Stop and remove data volume
docker-compose down -v

Docker Volumes

The docker-compose.yml mounts these volumes:

  • anchor-data: Persistent database storage
  • ./inbox: Auto-ingested files
  • ./external-inbox: External source files
  • ./mirrored_brain: Source of truth filesystem
  • ./backups: Backup files

Health Check

# Check container health
docker ps --filter name=anchor

# Test health endpoint
curl http://localhost:3160/health

Access UI

http://localhost:3160


📖 Documentation

| Document | Description |
| --- | --- |
| docs/whitepaper.md | STAR Algorithm whitepaper (arXiv submission ready) |
| docs/ARCHITECTURE_DIAGRAMS.md | Visual system architecture (human-friendly) |
| docs/CPP_OPTIMIZATION.md | C++ optimization project (archived) |
| docs/INDEX.md | Documentation navigation hub |
| docs/BIBLIOGRAPHY.bib | Citation database (15 key papers) |
| specs/spec.md | System specification (LLM-optimized) |
| specs/standards/ | Architecture standards (086, 113, 116, 117) |
| specs/standards/RESEARCH_LANDSCAPE.md | Related work analysis |
| specs/standards/STANDARD_117_ARXIV_SUBMISSION.md | arXiv submission workflow |
| specs/plan.md | Project roadmap |
| CHANGELOG.md | Version history & recent changes |

πŸ—οΈ Architecture

Core Innovation: Browser Paradigm for AI Memory

Just as browsers download only the shards needed for the current view, Anchor loads only the atoms required for the current thought, enabling resource-constrained devices to navigate large datasets efficiently.

Data Model: Compound β†’ Molecule β†’ Atom

Compound (File)
  └─ Molecule (Semantic Chunk with byte offsets)
      └─ Atom (Tag/Concept, NOT content)

Key Insight: Content lives in mirrored_brain/ filesystem. The database stores pointers only (byte offsets + metadata), making it a disposable, rebuildable index.
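A minimal sketch of this pointer-only model, with content resolved from source bytes on demand (field names are illustrative, not the engine's actual schema):

```javascript
// Sketch of the Compound -> Molecule -> Atom hierarchy: the index
// stores byte offsets and tags, never content, and a molecule is
// resolved by slicing the source bytes. Names are illustrative.
const source = Buffer.from("Content lives on disk; the index stores pointers.", "utf8");

const compound = { id: 1, path: "mirrored_brain/chat_040.md" }; // file-level record
const molecule = { compoundId: 1, byteStart: 0, byteEnd: 21 };  // pointer, not content
const atoms = [{ moleculeId: 1, tag: "pointers" }];             // concepts, not text

// Resolving a molecule = slicing the source by its byte offsets.
function resolve(bytes, m) {
  return bytes.subarray(m.byteStart, m.byteEnd).toString("utf8");
}

console.log(resolve(source, molecule)); // "Content lives on disk"
```

Because the rows carry no content, losing the database loses nothing: the index can always be rebuilt from the files in mirrored_brain/.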

STAR Search Algorithm

Physics-based gravity scoring for associative retrieval:

Gravity = (SharedTags) × e^(-λΔt) × (1 - SimHashDistance/64)

| Component | Purpose | Default |
| --- | --- | --- |
| SharedTags | Tag association count | — |
| Time Decay | Recent memories weighted higher | λ = 0.00001 |
| SimHash | Content similarity (64-bit) | 0-63 bits |
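Under the formula above, a gravity score could be computed as follows (a sketch, not the engine's implementation; it assumes Δt is measured in seconds, which the formula itself does not specify):

```javascript
// Sketch of STAR gravity scoring per the formula above: shared-tag
// count, exponential time decay, and 64-bit SimHash similarity.
// Illustrative only; assumes delta-t is in seconds.
const LAMBDA = 0.00001; // default time-decay constant

function gravity(sharedTags, deltaTSeconds, simHashDistance) {
  const decay = Math.exp(-LAMBDA * deltaTSeconds);     // e^(-lambda * dt)
  const similarity = 1 - simHashDistance / 64;         // 64-bit SimHash
  return sharedTags * decay * similarity;
}

// A recent, similar atom outscores an old, dissimilar one.
const recent = gravity(3, 3600, 8);        // 1 hour old, 8 bits apart
const stale = gravity(3, 86400 * 90, 40);  // 90 days old, 40 bits apart
console.log(recent > stale); // true
```

Note how the exponential term dominates at large Δt: with λ = 0.00001, a 90-day-old atom is decayed by e^(-77.8), so recency can outweigh even strong tag overlap.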

70/30 Budget Split:

  • 70% Planets: Direct FTS matches
  • 30% Moons: Graph-discovered associations via Tag-Walker
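The 70/30 split above can be sketched as a simple budget allocation (illustrative; the engine's actual apportioning logic may differ):

```javascript
// Sketch of the 70/30 result-budget split between direct FTS matches
// ("planets") and graph-discovered associations ("moons").
// Illustrative; the engine's apportioning may differ.
function splitBudget(totalAtoms) {
  const planets = Math.round(totalAtoms * 0.7);
  return { planets, moons: totalAtoms - planets };
}

console.log(splitBudget(100)); // { planets: 70, moons: 30 }
```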

📦 Core Components

Database: PGlite (WASM-based PostgreSQL)

  • Atoms: Knowledge units with byte-offset pointers
  • Tags: Bipartite graph (Atoms ↔ Tags)
  • FTS5: Full-text search index
  • Disposable: Wiped on shutdown, rebuilt from mirrored_brain/

📊 Performance Benchmarks

Production Verified (February 2026)

| Metric | Value | Notes |
| --- | --- | --- |
| Dataset Size | ~25M tokens (~100MB) | Chat history corpus |
| Atoms Restored | 281,690 | Phoenix Protocol restore |
| Restore Time | 828.8s (13.8 min) | Full database + filesystem |
| Restore Throughput | 340 atoms/second | 1000-item batching |
| Search Latency | <200ms (p95) | Typical queries |
| Memory Usage | <600MB peak | During restore |
| Ingestion Speed | ~8-15ms/clean | With Data Refinery |

Data Refinery Performance

| Content Type | Cleaning Time | Size Reduction |
| --- | --- | --- |
| Chat messages (<1KB) | ~5ms | 5-15% |
| Web pages (50KB) | ~15ms | 30-50% |
| Documents (500KB) | ~50ms | 20-40% |
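A cleaning pass of this kind (which also underlies the Transient Filter mentioned earlier) can be sketched as a line-level filter. The patterns below are invented illustrations, not the Data Refinery's actual rule set:

```javascript
// Sketch of a transient-content filter: drop lines that look like
// install/build noise before ingestion. Patterns are invented
// examples, not the Data Refinery's actual rules.
const TRANSIENT = [
  /^npm (WARN|ERR!)/,          // package-manager noise
  /^\s+at .+\(.+:\d+:\d+\)$/,  // stack-trace frames
  /^Compiling |^Linking /,     // build output
];

function clean(text) {
  return text
    .split("\n")
    .filter((line) => !TRANSIENT.some((re) => re.test(line)))
    .join("\n");
}

const raw = "We chose PGlite.\nnpm WARN deprecated foo@1.0.0\nCompiling engine...";
console.log(clean(raw)); // "We chose PGlite."
```

Filtering at ingestion time is what reclaims context later: noise that never becomes an atom can never crowd a retrieval budget.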

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| RAM | 4GB | 8GB+ |
| Storage | 10GB free | SSD recommended |
| Node.js | v18+ | v20+ |

Note: Performance scales with dataset size. Current benchmarks based on ~25M token corpus (chat history). Large-scale testing (TB+ datasets) planned for future validation.


πŸ› οΈ Development

Prerequisites

  • Node.js v18+
  • PNPM package manager

Build Commands

# Full build
pnpm build

# Development mode
pnpm dev

# Run tests
pnpm test

# Build universal binaries
pnpm build:universal

Project Structure

anchor-engine-node/
├── engine/                 # Core engine source
│   ├── src/
│   │   ├── services/      # Ingestion, Search, Watchdog
│   │   └── routes/        # HTTP API endpoints
│   └── dist/              # Built output
├── packages/              # Monorepo packages
│   └── anchor-ui/         # React frontend
├── specs/
│   ├── spec.md           # Architecture spec
│   ├── tasks.md          # Current tasks
│   ├── plan.md           # Roadmap
│   └── standards/        # 77 architecture standards
├── docs/
│   └── whitepaper.md     # The Sovereign Context Protocol
├── mirrored_brain/       # Source of truth (gitignored)
└── inbox/                # Drop files here for ingestion

🔧 Configuration

Edit user_settings.json in root:

{
  "server": {
    "port": 3160,
    "host": "localhost"
  },
  "database": {
    "path": "./user_data/anchor.db",
    "ephemeral": true
  },
  "paths": {
    "inbox": "./inbox",
    "mirroredBrain": "./mirrored_brain"
  }
}
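A defensive way to consume such a settings file is to merge it over built-in defaults, so missing or malformed keys fall back safely. This is a sketch using the fields shown above; the engine's actual loader may differ:

```javascript
// Sketch: merge user_settings.json over defaults so partial or
// invalid settings never crash startup. Field names mirror the
// example above; the engine's actual loading logic may differ.
const DEFAULTS = {
  server: { port: 3160, host: "localhost" },
  database: { path: "./user_data/anchor.db", ephemeral: true },
};

function loadSettings(json) {
  let user = {};
  try {
    user = JSON.parse(json);
  } catch {
    // unreadable or missing file: keep defaults
  }
  return {
    server: { ...DEFAULTS.server, ...user.server },
    database: { ...DEFAULTS.database, ...user.database },
  };
}

const settings = loadSettings('{ "server": { "port": 4000 } }');
console.log(settings.server.port, settings.server.host); // 4000 localhost
```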

📚 Key Standards

Active Standards (specs/standards/)

| # | Name | Description |
| --- | --- | --- |
| 104 | Universal Semantic Search | Unified search architecture |
| 110 | Ephemeral Index | Disposable database pattern |
| 109 | Batched Ingestion | Large file handling |
| 094 | Smart Search Protocol | Fuzzy fallback & GIN optimization |
| 088 | Server Startup Sequence | ECONNREFUSED fix |
| 065 | Graph Associative Retrieval | Tag-Walker protocol |
| 059 | Reliable Ingestion | Ghost Data Protocol |

Archived Standards

Older standards moved to specs/standards/archive/ for historical reference.


🤝 Agent Harness Integration

Anchor is agent-harness agnostic, designed to work with multiple frameworks:

  • OpenCLAW (primary target)
  • Custom agent frameworks
  • Direct API integrations
  • CLI access for automation

Stateless Context Retrieval

Agent Query → Anchor Context Retrieval → Context (JSON/CSV/Tables) → Agent Logic → Response
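The agent side of this pipeline amounts to building a retrieval request and handing the returned context to agent logic. The /search path and its query parameters below are hypothetical, chosen only to illustrate the shape of the call:

```javascript
// Sketch of an agent building a context-retrieval request for the
// engine. The "/search" path and its parameters are hypothetical,
// not Anchor's documented API.
function buildContextRequest(base, query, opts = {}) {
  const url = new URL("/search", base);
  url.searchParams.set("q", query);
  url.searchParams.set("order", opts.chronological ? "time" : "relevance");
  return url.toString();
}

const req = buildContextRequest("http://localhost:3160", "July 2025 goals", {
  chronological: true,
});
console.log(req); // http://localhost:3160/search?q=July+2025+goals&order=time
```

An agent would then fetch this URL and feed the returned JSON/CSV payload into its own reasoning step; the engine stays stateless between calls.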

🔒 Security & Privacy

  • Local-First: All data stays on your machine
  • No Cloud: Zero external dependencies for core functionality
  • AGPL-3.0: Open source, sovereign software

πŸ› Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| ECONNREFUSED | Fixed in Standard 088: server starts before DB init |
| Slow startup | First run includes DB initialization |
| UI delays | Electron wrapper may take ~15s; access directly at http://localhost:3160 |

Health Checks

GET /health              # System status
GET /health/{component}  # Component status
GET /monitoring/metrics  # Performance metrics

📄 License

AGPL-3.0. See the LICENSE file.


🎯 Roadmap

  • Enhanced code analysis (AST pointers)
  • Relationship narrative discovery
  • Mobile application support
  • Plugin marketplace
  • Diffusion-based reasoning models

πŸ™ Acknowledgments

  • Original research: STAR Algorithm
  • SimHash: Moses Charikar (2002)
  • PGlite: ElectricSQL team
  • All Anchor Engine contributors

Citing

If you use STAR in your research, please cite the software using the provided CITATION.cff file or the JOSS paper (once available). A DOI will be available upon archiving on Zenodo. The repository includes a whitepaper describing the algorithm in detail.

Repository: https://github.com/RSBalchII/anchor-engine-node
Whitepaper: docs/whitepaper.md
Production Status: ✅ Ready (February 28, 2026)


Disclaimer

This software is provided "as is", without warranty of any kind, express or implied. By using this software, you acknowledge that:

  • You are responsible for any potential damage to your device.
  • You understand that modifying hardware or system behavior may void warranties.
  • You will not hold the authors or contributors liable for any outcome resulting from the use of this software.

Use at your own risk.

About

A privacy-first context engine for any human-facing LLM interaction. Bring the right context when you need it, export or save results, then clear everything and start a new chat without bringing baggage. Built for individuals who want better outputs without giving up control of their data.
