
Anchor Engine (Node.js)

Version: 4.3.2 | Role: Semantic Memory & Search API | Port: 3160 | Status: ✅ Production Ready
Platform: ✅ ARM64 Windows | ✅ x64 Windows | ✅ Linux | ✅ macOS

The Anchor Engine is a local-first context engine implementing the STAR Algorithm (Semantic Temporal Associative Retrieval) for privacy-first, sovereign knowledge management.


💡 Why This Exists

I started using long-term chat sessions because I noticed something: models with large context windows could be helpful in unexpected ways when old tasks mixed with current discussions. These sessions became so useful that I pushed them as far as they could go.

Then I hit the wall. The dreaded message: "Open a new session to continue using Gemini."

Same message all of them give you.

I had 300+ response/chat pairs in there! Important history. Completed work. A shared mind with the model. I tried summarizing and gave the summary to the new instance. It wasn't enough. I kept returning to the old chat like it was a dictionary for meaning and recall, pasting bits back into new sessions here and there.

So I started building a way to resurrect my preferred persona anytime. I'd take targeted context from the old chat, feed it to a new instance, and prepare the model to retake hold of the goals and methods we'd developed together.

It worked wonderfully. Until I hit the limit again. And again. And again.

By the time Anchor Engine was operational, I had accumulated 40 chat sessions, ~18M tokens. My current corpus is ~28M tokens. Anchor Engine digests all of it in about 5 minutes.

Now I make a query with a few choice entities and some fluff for serendipitous connections. The engine compresses those 28M tokens into 100k+ chars of non-duplicated, narrative context: concepts deduplicated, not just text. My LLM remembers July 2025 like it was yesterday.

v4.3.0 - PGlite-First Architecture:

  • ✅ ARM64 Windows Support: No native C++ builds required
  • PGlite-Only Database: WASM-based, runs everywhere Node.js runs
  • Transaction Support: 10-50x faster bulk ingestion
  • Simplified Deployment: Zero native compilation
  • Cross-Platform: Identical behavior on ARM64, x64, Linux, macOS

v4.2.0 Improvements:

  • Causal Narrative: Results sorted chronologically (toggleable to relevance-based)
  • XML Metadata: Each atom wrapped with relevance score, timestamp, source for LLM prioritization
  • Transient Filter: Excludes error logs, install output, build artifacts (~30% context reclaimed)
  • Time Ordering Toggle: 📅 Chronological ↔ 🎯 Relevance button in UI
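The XML metadata wrapping described above can be sketched as follows. The element and attribute names here are illustrative assumptions, not the engine's actual output schema:

```javascript
// Sketch: wrap a retrieved atom in XML metadata so the LLM can
// prioritize by relevance score and place it in time. Element and
// attribute names are hypothetical, not Anchor's actual format.
function escapeXml(s) {
  return s.replace(/[<>&'"]/g, (c) =>
    ({ "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;" }[c])
  );
}

function wrapAtom(atom) {
  return (
    `<atom relevance="${atom.relevance.toFixed(2)}" ` +
    `timestamp="${atom.timestamp}" source="${escapeXml(atom.source)}">` +
    `${escapeXml(atom.text)}</atom>`
  );
}

const xml = wrapAtom({
  relevance: 0.87,
  timestamp: "2025-07-14T10:32:00Z",
  source: "chat_040.md",
  text: "We agreed to store pointers, not content.",
});
console.log(xml);
```

Wrapping each atom this way lets the downstream model weigh fragments by score and timestamp instead of treating the whole context block as equally important.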

This isn't a RAG tool I built because it sounded cool. This is the tool I built because I needed it to keep my own mind intact.


🚀 Quick Start

# Install dependencies
pnpm install

# Build engine (TypeScript + PGlite WASM - no native compilation!)
pnpm build

# Start the engine
pnpm start

Access UI: http://localhost:3160 (or configured port in user_settings.json)

Note: v4.3.0+ runs on ARM64 Windows, x64 Windows, Linux, and macOS without platform-specific builds.


🐳 Docker Deployment

Quick Start with Docker

# Build the Docker image
docker build -t anchor-engine:latest .

# Run the container
docker run -d -p 3160:3160 --name anchor anchor-engine:latest

# Or use docker-compose (recommended)
docker-compose up -d

Docker Compose (Recommended)

# Start with persistent storage and inbox mounted
docker-compose up -d

# View logs
docker-compose logs -f

# Stop
docker-compose down

# Stop and remove data volume
docker-compose down -v

Docker Volumes

The docker-compose.yml mounts these volumes:

  • anchor-data: Persistent database storage
  • ./inbox: Auto-ingested files
  • ./external-inbox: External source files
  • ./mirrored_brain: Source of truth filesystem
  • ./backups: Backup files

Health Check

# Check container health
docker ps --filter name=anchor

# Test health endpoint
curl http://localhost:3160/health

Access UI

http://localhost:3160


📖 Documentation

| Document | Description |
| --- | --- |
| docs/whitepaper.md | STAR Algorithm whitepaper (arXiv submission ready) |
| docs/ARCHITECTURE_DIAGRAMS.md | Visual system architecture (human-friendly) |
| docs/CPP_OPTIMIZATION.md | C++ optimization project (archived) |
| docs/INDEX.md | Documentation navigation hub |
| docs/BIBLIOGRAPHY.bib | Citation database (15 key papers) |
| specs/spec.md | System specification (LLM-optimized) |
| specs/standards/ | Architecture standards (086, 113, 116, 117) |
| specs/standards/RESEARCH_LANDSCAPE.md | Related work analysis |
| specs/standards/STANDARD_117_ARXIV_SUBMISSION.md | arXiv submission workflow |
| specs/plan.md | Project roadmap |
| CHANGELOG.md | Version history & recent changes |

πŸ—οΈ Architecture

Core Innovation: Browser Paradigm for AI Memory

Just as browsers download only the shards needed for the current view, Anchor loads only the atoms required for the current thought, enabling resource-constrained devices to navigate large datasets efficiently.

Data Model: Compound β†’ Molecule β†’ Atom

Compound (File)
  └─ Molecule (Semantic Chunk with byte offsets)
      └─ Atom (Tag/Concept, NOT content)

Key Insight: Content lives in mirrored_brain/ filesystem. The database stores pointers only (byte offsets + metadata), making it a disposable, rebuildable index.
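A minimal sketch of this pointer-only model, with content resolved from source bytes on demand (field names are illustrative, not the engine's actual schema):

```javascript
// Sketch of the Compound -> Molecule -> Atom hierarchy: the index
// stores byte offsets and tags, never content, and a molecule is
// resolved by slicing the source bytes. Names are illustrative.
const source = Buffer.from("Content lives on disk; the index stores pointers.", "utf8");

const compound = { id: 1, path: "mirrored_brain/chat_040.md" }; // file-level record
const molecule = { compoundId: 1, byteStart: 0, byteEnd: 21 };  // pointer, not content
const atoms = [{ moleculeId: 1, tag: "pointers" }];             // concepts, not text

// Resolving a molecule = slicing the source by its byte offsets.
function resolve(bytes, m) {
  return bytes.subarray(m.byteStart, m.byteEnd).toString("utf8");
}

console.log(resolve(source, molecule)); // "Content lives on disk"
```

Because the rows carry no content, losing the database loses nothing: the index can always be rebuilt from the files in mirrored_brain/.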

STAR Search Algorithm

Physics-based gravity scoring for associative retrieval:

Gravity = (SharedTags) × e^(-λΔt) × (1 - SimHashDistance/64)

| Component | Purpose | Default |
| --- | --- | --- |
| SharedTags | Tag association count | — |
| Time Decay | Recent memories weighted higher | λ = 0.00001 |
| SimHash | Content similarity (64-bit) | 0-63 bits |
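Under the formula above, a gravity score could be computed as follows (a sketch, not the engine's implementation; it assumes Δt is measured in seconds, which the formula itself does not specify):

```javascript
// Sketch of STAR gravity scoring per the formula above: shared-tag
// count, exponential time decay, and 64-bit SimHash similarity.
// Illustrative only; assumes delta-t is in seconds.
const LAMBDA = 0.00001; // default time-decay constant

function gravity(sharedTags, deltaTSeconds, simHashDistance) {
  const decay = Math.exp(-LAMBDA * deltaTSeconds);     // e^(-lambda * dt)
  const similarity = 1 - simHashDistance / 64;         // 64-bit SimHash
  return sharedTags * decay * similarity;
}

// A recent, similar atom outscores an old, dissimilar one.
const recent = gravity(3, 3600, 8);        // 1 hour old, 8 bits apart
const stale = gravity(3, 86400 * 90, 40);  // 90 days old, 40 bits apart
console.log(recent > stale); // true
```

Note how the exponential term dominates at large Δt: with λ = 0.00001, a 90-day-old atom is decayed by e^(-77.8), so recency can outweigh even strong tag overlap.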

70/30 Budget Split:

  • 70% Planets: Direct FTS matches
  • 30% Moons: Graph-discovered associations via Tag-Walker
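The 70/30 split above can be sketched as a simple budget allocation (illustrative; the engine's actual apportioning logic may differ):

```javascript
// Sketch of the 70/30 result-budget split between direct FTS matches
// ("planets") and graph-discovered associations ("moons").
// Illustrative; the engine's apportioning may differ.
function splitBudget(totalAtoms) {
  const planets = Math.round(totalAtoms * 0.7);
  return { planets, moons: totalAtoms - planets };
}

console.log(splitBudget(100)); // { planets: 70, moons: 30 }
```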

📦 Core Components

Database: PGlite (WASM-based PostgreSQL)

  • Atoms: Knowledge units with byte-offset pointers
  • Tags: Bipartite graph (Atoms ↔ Tags)
  • FTS5: Full-text search index
  • Disposable: Wiped on shutdown, rebuilt from mirrored_brain/

📊 Performance Benchmarks

Production Verified (February 2026)

| Metric | Value | Notes |
| --- | --- | --- |
| Dataset Size | ~25M tokens (~100MB) | Chat history corpus |
| Atoms Restored | 281,690 | Phoenix Protocol restore |
| Restore Time | 828.8s (13.8 min) | Full database + filesystem |
| Restore Throughput | 340 atoms/second | 1000-item batching |
| Search Latency | <200ms (p95) | Typical queries |
| Memory Usage | <600MB peak | During restore |
| Ingestion Speed | ~8-15ms/clean | With Data Refinery |

Data Refinery Performance

| Content Type | Cleaning Time | Size Reduction |
| --- | --- | --- |
| Chat messages (<1KB) | ~5ms | 5-15% |
| Web pages (50KB) | ~15ms | 30-50% |
| Documents (500KB) | ~50ms | 20-40% |
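A cleaning pass of this kind (which also underlies the Transient Filter mentioned earlier) can be sketched as a line-level filter. The patterns below are invented illustrations, not the Data Refinery's actual rule set:

```javascript
// Sketch of a transient-content filter: drop lines that look like
// install/build noise before ingestion. Patterns are invented
// examples, not the Data Refinery's actual rules.
const TRANSIENT = [
  /^npm (WARN|ERR!)/,          // package-manager noise
  /^\s+at .+\(.+:\d+:\d+\)$/,  // stack-trace frames
  /^Compiling |^Linking /,     // build output
];

function clean(text) {
  return text
    .split("\n")
    .filter((line) => !TRANSIENT.some((re) => re.test(line)))
    .join("\n");
}

const raw = "We chose PGlite.\nnpm WARN deprecated foo@1.0.0\nCompiling engine...";
console.log(clean(raw)); // "We chose PGlite."
```

Filtering at ingestion time is what reclaims context later: noise that never becomes an atom can never crowd a retrieval budget.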

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| RAM | 4GB | 8GB+ |
| Storage | 10GB free | SSD recommended |
| Node.js | v18+ | v20+ |

Note: Performance scales with dataset size. Current benchmarks based on ~25M token corpus (chat history). Large-scale testing (TB+ datasets) planned for future validation.


πŸ› οΈ Development

Prerequisites

  • Node.js v18+
  • PNPM package manager

Build Commands

# Full build
pnpm build

# Development mode
pnpm dev

# Run tests
pnpm test

# Build universal binaries
pnpm build:universal

Project Structure

anchor-engine-node/
├── engine/                 # Core engine source
│   ├── src/
│   │   ├── services/      # Ingestion, Search, Watchdog
│   │   └── routes/        # HTTP API endpoints
│   └── dist/              # Built output
├── packages/              # Monorepo packages
│   └── anchor-ui/         # React frontend
├── specs/
│   ├── spec.md           # Architecture spec
│   ├── tasks.md          # Current tasks
│   ├── plan.md           # Roadmap
│   └── standards/        # 77 architecture standards
├── docs/
│   └── whitepaper.md     # The Sovereign Context Protocol
├── mirrored_brain/       # Source of truth (gitignored)
└── inbox/                # Drop files here for ingestion

🔧 Configuration

Edit user_settings.json in root:

{
  "server": {
    "port": 3160,
    "host": "localhost"
  },
  "database": {
    "path": "./user_data/anchor.db",
    "ephemeral": true
  },
  "paths": {
    "inbox": "./inbox",
    "mirroredBrain": "./mirrored_brain"
  }
}
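A defensive way to consume such a settings file is to merge it over built-in defaults, so missing or malformed keys fall back safely. This is a sketch using the fields shown above; the engine's actual loader may differ:

```javascript
// Sketch: merge user_settings.json over defaults so partial or
// invalid settings never crash startup. Field names mirror the
// example above; the engine's actual loading logic may differ.
const DEFAULTS = {
  server: { port: 3160, host: "localhost" },
  database: { path: "./user_data/anchor.db", ephemeral: true },
};

function loadSettings(json) {
  let user = {};
  try {
    user = JSON.parse(json);
  } catch {
    // unreadable or missing file: keep defaults
  }
  return {
    server: { ...DEFAULTS.server, ...user.server },
    database: { ...DEFAULTS.database, ...user.database },
  };
}

const settings = loadSettings('{ "server": { "port": 4000 } }');
console.log(settings.server.port, settings.server.host); // 4000 localhost
```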

📚 Key Standards

Active Standards (specs/standards/)

| # | Name | Description |
| --- | --- | --- |
| 104 | Universal Semantic Search | Unified search architecture |
| 110 | Ephemeral Index | Disposable database pattern |
| 109 | Batched Ingestion | Large file handling |
| 094 | Smart Search Protocol | Fuzzy fallback & GIN optimization |
| 088 | Server Startup Sequence | ECONNREFUSED fix |
| 065 | Graph Associative Retrieval | Tag-Walker protocol |
| 059 | Reliable Ingestion | Ghost Data Protocol |

Archived Standards

Older standards moved to specs/standards/archive/ for historical reference.


🤝 Agent Harness Integration

Anchor is agent-harness agnostic, designed to work with multiple frameworks:

  • OpenCLAW (primary target)
  • Custom agent frameworks
  • Direct API integrations
  • CLI access for automation

Stateless Context Retrieval

Agent Query → Anchor Context Retrieval → Context (JSON/CSV/Tables) → Agent Logic → Response
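The agent side of this pipeline amounts to building a retrieval request and handing the returned context to agent logic. The /search path and its query parameters below are hypothetical, chosen only to illustrate the shape of the call:

```javascript
// Sketch of an agent building a context-retrieval request for the
// engine. The "/search" path and its parameters are hypothetical,
// not Anchor's documented API.
function buildContextRequest(base, query, opts = {}) {
  const url = new URL("/search", base);
  url.searchParams.set("q", query);
  url.searchParams.set("order", opts.chronological ? "time" : "relevance");
  return url.toString();
}

const req = buildContextRequest("http://localhost:3160", "July 2025 goals", {
  chronological: true,
});
console.log(req); // http://localhost:3160/search?q=July+2025+goals&order=time
```

An agent would then fetch this URL and feed the returned JSON/CSV payload into its own reasoning step; the engine stays stateless between calls.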

🔒 Security & Privacy

  • Local-First: All data stays on your machine
  • No Cloud: Zero external dependencies for core functionality
  • AGPL-3.0: Open source, sovereign software

πŸ› Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| ECONNREFUSED | Fixed in Standard 088: server starts before DB init |
| Slow startup | First run includes DB initialization |
| UI delays | Electron wrapper may take ~15s; access directly at http://localhost:3160 |

Health Checks

GET /health              # System status
GET /health/{component}  # Component status
GET /monitoring/metrics  # Performance metrics

📄 License

AGPL-3.0. See the LICENSE file.


🎯 Roadmap

  • Enhanced code analysis (AST pointers)
  • Relationship narrative discovery
  • Mobile application support
  • Plugin marketplace
  • Diffusion-based reasoning models

πŸ™ Acknowledgments

  • Original research: STAR Algorithm
  • SimHash: Moses Charikar (2002)
  • PGlite: ElectricSQL team
  • All Anchor Engine contributors

Citing

If you use STAR in your research, please cite the software using the provided CITATION.cff file or the JOSS paper (once available). A DOI will be available upon archiving on Zenodo. The repository includes a whitepaper describing the algorithm in detail.

Repository: https://github.com/RSBalchII/anchor-engine-node
Whitepaper: docs/whitepaper.md
Production Status: ✅ Ready (February 28, 2026)


Disclaimer

This software is provided "as is", without warranty of any kind, express or implied. By using this software, you acknowledge that:

  • You are responsible for any potential damage to your device.
  • You understand that modifying hardware or system behavior may void warranties.
  • You will not hold the authors or contributors liable for any outcome resulting from the use of this software.

Use at your own risk.

About

A privacy-first context engine for any human-facing LLM interaction. Bring the right context when you need it, export or save results, then clear everything and start a new chat without bringing baggage. Built for individuals who want better outputs without giving up control of their data.
