Version: 4.3.2 | Role: Semantic Memory & Search API | Port: 3160 | Status: ✅ Production Ready

Platform: ✅ ARM64 Windows | ✅ x64 Windows | ✅ Linux | ✅ macOS
The Anchor Engine is a local-first context engine implementing the STAR Algorithm (Semantic Temporal Associative Retrieval) for privacy-first, sovereign knowledge management.
I started using long-term chat sessions because I noticed something: models with large context windows could be helpful in unexpected ways when old tasks mixed with current discussions. These sessions became so useful that I pushed them as far as they could go.
Then I hit the wall. The dreaded message: "Open a new session to continue using Gemini."
Same message all of them give you.
I had 300+ response/chat pairs in there! Important history. Completed work. A shared mind with the model. I tried summarizing and gave the summary to the new instance. It wasn't enough. I kept returning to the old chat like it was a dictionary for meaning and recall, pasting bits back into new sessions here and there.
So I started building a way to resurrect my preferred persona anytime. I'd take targeted context from the old chat, feed it to a new instance, and prepare the model to retake hold of the goals and methods we'd developed together.
It worked wonderfully. Until I hit the limit again. And again. And again.
By the time Anchor Engine was operational, I had accumulated 40 chat sessions, ~18M tokens. My current corpus is ~28M tokens. Anchor Engine digests all of it in about 5 minutes.
Now I make a query with a few choice entities and some fluff for serendipitous connections. The engine compresses those 28M tokens into 100k+ chars of non-duplicated, narrative context: concepts deduplicated, not just text. My LLM remembers July 2025 like it was yesterday.
v4.3.0 - PGlite-First Architecture:
- ✅ ARM64 Windows Support: No native C++ builds required
- PGlite-Only Database: WASM-based, runs everywhere Node.js runs
- Transaction Support: 10-50x faster bulk ingestion
- Simplified Deployment: Zero native compilation
- Cross-Platform: Identical behavior on ARM64, x64, Linux, macOS
v4.2.0 Improvements:
- Causal Narrative: Results sorted chronologically (toggleable to relevance-based)
- XML Metadata: Each atom wrapped with relevance score, timestamp, source for LLM prioritization
- Transient Filter: Excludes error logs, install output, build artifacts (~30% context reclaimed)
- Time Ordering Toggle: Chronological ↔ Relevance button in the UI
This isn't a RAG tool I built because it sounded cool. This is the tool I built because I needed it to keep my own mind intact.
```shell
# Install dependencies
pnpm install

# Build engine (TypeScript + PGlite WASM - no native compilation!)
pnpm build

# Start the engine
pnpm start
```

Access UI: http://localhost:3160 (or the configured port in user_settings.json)
Note: v4.3.0+ runs on ARM64 Windows, x64 Windows, Linux, and macOS without platform-specific builds.
```shell
# Build the Docker image
docker build -t anchor-engine:latest .

# Run the container
docker run -d -p 3160:3160 --name anchor anchor-engine:latest

# Or use docker-compose (recommended)
docker-compose up -d
```

```shell
# Start with persistent storage and inbox mounted
docker-compose up -d

# View logs
docker-compose logs -f

# Stop
docker-compose down

# Stop and remove data volume
docker-compose down -v
```

The docker-compose.yml mounts these volumes:

- anchor-data: Persistent database storage
- ./inbox: Auto-ingested files
- ./external-inbox: External source files
- ./mirrored_brain: Source of truth filesystem
- ./backups: Backup files
```shell
# Check container health
docker ps --filter name=anchor

# Test health endpoint
curl http://localhost:3160/health
```

| Document | Description |
|---|---|
| docs/whitepaper.md | STAR Algorithm whitepaper (arXiv submission ready) |
| docs/ARCHITECTURE_DIAGRAMS.md | Visual system architecture (human-friendly) |
| docs/CPP_OPTIMIZATION.md | C++ optimization project (Archived) |
| docs/INDEX.md | Documentation navigation hub |
| docs/BIBLIOGRAPHY.bib | Citation database (15 key papers) |
| specs/spec.md | System specification (LLM-optimized) |
| specs/standards/ | Architecture standards (086, 113, 116, 117) |
| specs/standards/RESEARCH_LANDSCAPE.md | Related work analysis |
| specs/standards/STANDARD_117_ARXIV_SUBMISSION.md | arXiv submission workflow |
| specs/plan.md | Project roadmap |
| CHANGELOG.md | Version history & recent changes |
Just as browsers download only the shards needed for the current view, Anchor loads only the atoms required for the current thought, enabling resource-constrained devices to navigate large datasets efficiently.
```
Compound (File)
└── Molecule (Semantic Chunk with byte offsets)
    └── Atom (Tag/Concept, NOT content)
```
Key Insight: Content lives in mirrored_brain/ filesystem. The database stores pointers only (byte offsets + metadata), making it a disposable, rebuildable index.
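To make the pointer-only design concrete, here is a minimal sketch of resolving an atom's content from byte offsets. The `AtomPointer` shape (`file`, `startByte`, `endByte`) is illustrative; the real schema may differ.

```typescript
// Sketch: resolving an atom's content from byte-offset pointers into the
// mirrored_brain/ filesystem. Field names here are assumptions for illustration.
import { openSync, readSync, closeSync, writeFileSync } from "fs";

interface AtomPointer {
  file: string;      // path under mirrored_brain/
  startByte: number; // inclusive byte offset
  endByte: number;   // exclusive byte offset
}

function resolveContent(ptr: AtomPointer): string {
  const fd = openSync(ptr.file, "r");
  const len = ptr.endByte - ptr.startByte;
  const buf = Buffer.alloc(len);
  readSync(fd, buf, 0, len, ptr.startByte); // read only the pointed-to slice
  closeSync(fd);
  return buf.toString("utf8");
}

// Demo with a throwaway file standing in for mirrored_brain/ content
writeFileSync("/tmp/demo.md", "Hello, Anchor Engine!");
console.log(resolveContent({ file: "/tmp/demo.md", startByte: 7, endByte: 13 }));
```

Because the database holds only pointers, the index can be dropped and rebuilt at any time without touching the source files.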
Physics-based gravity scoring for associative retrieval:
```
Gravity = SharedTags × e^(−λ·Δt) × (1 − SimHashDistance/64)
```
| Component | Purpose | Default |
|---|---|---|
| SharedTags | Tag association count | – |
| Time Decay | Recent memories weighted higher | λ = 0.00001 |
| SimHash | Content similarity (64-bit) | 0-63 bits |
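The gravity formula above can be sketched directly in code. This is a minimal, illustrative implementation, not the engine's actual source; the field names and the example values are assumptions, while λ = 0.00001 comes from the defaults table.

```typescript
// Sketch of the STAR gravity score: tag overlap × time decay × SimHash similarity.
const LAMBDA = 0.00001; // time-decay constant, per the defaults above

// Hamming distance between two 64-bit SimHash fingerprints
function hammingDistance64(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x) { count += Number(x & 1n); x >>= 1n; }
  return count;
}

function gravity(sharedTags: number, deltaSeconds: number, simA: bigint, simB: bigint): number {
  const timeDecay = Math.exp(-LAMBDA * deltaSeconds);               // e^(−λ·Δt)
  const similarity = 1 - hammingDistance64(simA, simB) / 64;        // 1 − dist/64
  return sharedTags * timeDecay * similarity;
}

// Two atoms sharing 3 tags, one day apart, with near-identical SimHashes
console.log(gravity(3, 86_400, 0b1011n, 0b1010n));
```

Note how the decay constant makes day-scale gaps matter (e^(−0.00001·86400) ≈ 0.42) while minute-scale gaps barely register.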
70/30 Budget Split:
- 70% Planets: Direct FTS matches
- 30% Moons: Graph-discovered associations via Tag-Walker
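One way to picture the 70/30 split is as two separate character caps filled greedily by score. This is a hypothetical sketch of the idea, not the engine's actual selection code; the `Candidate` shape is invented for illustration.

```typescript
// Hypothetical sketch: fill a context budget with 70% direct FTS matches
// ("planets") and 30% graph-discovered associations ("moons").
interface Candidate { text: string; kind: "planet" | "moon"; score: number }

function fillBudget(candidates: Candidate[], budgetChars: number): Candidate[] {
  const caps = { planet: Math.floor(budgetChars * 0.7), moon: Math.ceil(budgetChars * 0.3) };
  const used = { planet: 0, moon: 0 };
  const picked: Candidate[] = [];
  // Highest-scoring candidates first, within each kind's cap
  for (const c of [...candidates].sort((a, b) => b.score - a.score)) {
    if (used[c.kind] + c.text.length <= caps[c.kind]) {
      used[c.kind] += c.text.length;
      picked.push(c);
    }
  }
  return picked;
}
```

Reserving a fixed 30% for moons guarantees associative material survives even when direct matches alone could fill the whole budget.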
- Atoms: Knowledge units with byte-offset pointers
- Tags: Bipartite graph (Atoms ↔ Tags)
- FTS5: Full-text search index
- Disposable: Wiped on shutdown, rebuilt from mirrored_brain/
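The rebuild path implied by the disposable-index design can be sketched as a recursive walk of mirrored_brain/ that re-ingests every file on startup. The directory layout below is a stand-in for illustration, not the project's actual structure.

```typescript
// Sketch: enumerate every file under a source-of-truth directory so the
// ephemeral index can be rebuilt from scratch. Paths here are illustrative.
import { readdirSync, statSync, mkdirSync, writeFileSync } from "fs";
import { join } from "path";

function listFiles(root: string): string[] {
  const out: string[] = [];
  for (const name of readdirSync(root)) {
    const p = join(root, name);
    if (statSync(p).isDirectory()) out.push(...listFiles(p)); // recurse into subdirs
    else out.push(p);                                          // file to re-ingest
  }
  return out;
}

// Demo with a throwaway directory standing in for mirrored_brain/
mkdirSync("/tmp/mirrored_brain_demo/notes", { recursive: true });
writeFileSync("/tmp/mirrored_brain_demo/notes/a.md", "atom one");
writeFileSync("/tmp/mirrored_brain_demo/b.md", "atom two");
console.log(listFiles("/tmp/mirrored_brain_demo").length);
```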
| Metric | Value | Notes |
|---|---|---|
| Dataset Size | ~25M tokens (~100MB) | Chat history corpus |
| Atoms Restored | 281,690 | Phoenix Protocol restore |
| Restore Time | 828.8s (13.8 min) | Full database + filesystem |
| Restore Throughput | 340 atoms/second | 1000-item batching |
| Search Latency | <200ms (p95) | Typical queries |
| Memory Usage | <600MB peak | During restore |
| Ingestion Speed | ~8-15ms per cleaning pass | With Data Refinery |
| Content Type | Cleaning Time | Size Reduction |
|---|---|---|
| Chat messages (<1KB) | ~5ms | 5-15% |
| Web pages (50KB) | ~15ms | 30-50% |
| Documents (500KB) | ~50ms | 20-40% |
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4GB | 8GB+ |
| Storage | 10GB free | SSD recommended |
| Node.js | v18+ | v20+ |
Note: Performance scales with dataset size. Current benchmarks based on ~25M token corpus (chat history). Large-scale testing (TB+ datasets) planned for future validation.
- Node.js v18+
- PNPM package manager
```shell
# Full build
pnpm build

# Development mode
pnpm dev

# Run tests
pnpm test

# Build universal binaries
pnpm build:universal
```

```
anchor-engine-node/
├── engine/                 # Core engine source
│   ├── src/
│   │   ├── services/       # Ingestion, Search, Watchdog
│   │   └── routes/         # HTTP API endpoints
│   └── dist/               # Built output
├── packages/               # Monorepo packages
│   └── anchor-ui/          # React frontend
├── specs/
│   ├── spec.md             # Architecture spec
│   ├── tasks.md            # Current tasks
│   ├── plan.md             # Roadmap
│   └── standards/          # 77 architecture standards
├── docs/
│   └── whitepaper.md       # The Sovereign Context Protocol
├── mirrored_brain/         # Source of truth (gitignored)
└── inbox/                  # Drop files here for ingestion
```
Edit user_settings.json in root:
```json
{
  "server": {
    "port": 3160,
    "host": "localhost"
  },
  "database": {
    "path": "./user_data/anchor.db",
    "ephemeral": true
  },
  "paths": {
    "inbox": "./inbox",
    "mirroredBrain": "./mirrored_brain"
  }
}
```

| # | Name | Description |
|---|---|---|
| 104 | Universal Semantic Search | Unified search architecture |
| 110 | Ephemeral Index | Disposable database pattern |
| 109 | Batched Ingestion | Large file handling |
| 094 | Smart Search Protocol | Fuzzy fallback & GIN optimization |
| 088 | Server Startup Sequence | ECONNREFUSED fix |
| 065 | Graph Associative Retrieval | Tag-Walker protocol |
| 059 | Reliable Ingestion | Ghost Data Protocol |
Older standards moved to specs/standards/archive/ for historical reference.
Anchor is agent-harness agnostic, designed to work with multiple frameworks:
- OpenCLAW (primary target)
- Custom agent frameworks
- Direct API integrations
- CLI access for automation
Agent Query → Anchor Context Retrieval → Context (JSON/CSV/Tables) → Agent Logic → Response
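The pipeline above can be sketched as a small HTTP client: the agent builds a retrieval request, sends it to Anchor, and feeds the returned context to its own logic. Note that the `/search` endpoint and its `q` parameter are assumptions for illustration; only the `/health` and `/monitoring` endpoints are documented in this README.

```typescript
// Hypothetical agent-side integration sketch. The /search route and `q`
// parameter are assumptions; consult the engine's actual API routes.
function buildSearchUrl(base: string, query: string): string {
  return `${base}/search?q=${encodeURIComponent(query)}`;
}

async function getContext(query: string): Promise<string> {
  // fetch is global in Node.js v18+, the project's minimum version
  const res = await fetch(buildSearchUrl("http://localhost:3160", query));
  if (!res.ok) throw new Error(`Anchor returned ${res.status}`);
  return res.text(); // context block to prepend to the agent's prompt
}
```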
- Local-First: All data stays on your machine
- No Cloud: Zero external dependencies for core functionality
- AGPL-3.0: Open source, sovereign software
| Issue | Solution |
|---|---|
| ECONNREFUSED | Fixed in Standard 088: server starts before DB init |
| Slow startup | First run includes DB initialization |
| UI delays | Electron wrapper may take ~15s; access directly at http://localhost:3160 |
```
GET /health               # System status
GET /health/{component}   # Component status
GET /monitoring/metrics   # Performance metrics
```

AGPL-3.0 (see LICENSE file).
- Enhanced code analysis (AST pointers)
- Relationship narrative discovery
- Mobile application support
- Plugin marketplace
- Diffusion-based reasoning models
- Original research: STAR Algorithm
- SimHash: Moses Charikar (1997)
- PGlite: ElectricSQL team
- All Anchor Engine contributors
If you use STAR in your research, please cite the software using the provided CITATION.cff file or the JOSS paper (once available). A DOI will be available upon archiving on Zenodo. The repository includes a whitepaper describing the algorithm in detail.
Repository: https://github.com/RSBalchII/anchor-engine-node
Whitepaper: docs/whitepaper.md
Production Status: ✅ Ready (February 28, 2026)
Disclaimer
This software is provided "as is", without warranty of any kind, express or implied. By using this software, you acknowledge that:
You are responsible for any potential damage to your device.
You understand that modifying hardware or system behavior may void warranties.
You will not hold the authors or contributors liable for any outcome resulting from the use of this software.
Use at your own risk.