perf(memory): optimize fact operations (~30% faster writes, ~56% faster reads) #2174
Conversation
3 issues found across 8 files
Prompt for AI agents (all 3 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/memory/test/benchmark.ts">
<violation number="1" location="packages/memory/test/benchmark.ts:65">
Warm-up calls `session.transact` without ever checking the returned Result, so initialization failures are silently ignored and every benchmark can run with invalid state.</violation>
<violation number="2" location="packages/memory/test/benchmark.ts:65">
The workflow benchmark drops every Result from `session.transact` and `session.query`, so any failure in create/read/update/retract steps is ignored and the workflow timing becomes meaningless.</violation>
<violation number="3" location="packages/memory/test/benchmark.ts:233">
Each query in the “get fact (10 specific docs)” benchmark discards the Result from `session.query`, so read failures are silently ignored and the benchmark can time no-op iterations.</violation>
</file>
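All three findings are the same pattern: a `Result` is produced and dropped. A minimal sketch of the kind of fix cubic is asking for, assuming the memory package's `Result` is an `{ ok } | { error }` union (the exact shape and the `expect` helper name are assumptions here):

```ts
// Hedged sketch: validate each Result instead of discarding it, so a failed
// transact/query aborts the benchmark rather than timing a no-op iteration.
type Result<T = unknown, E extends Error = Error> =
  | { ok: T; error?: undefined }
  | { ok?: undefined; error: E };

const expect = <T>(result: Result<T>): T => {
  if (result.error) {
    // Fail loudly; silent failures make the timings meaningless.
    throw result.error;
  }
  return result.ok as T;
};

// Usage inside a benchmark step:
//   expect(await session.transact(transaction));
//   const facts = expect(await session.query(selector));
```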
seefeldb left a comment
this looks great! cc @ubik2 who might find this really interesting as well.
Add comprehensive benchmarks to measure write and read performance, including isolation benchmarks to identify specific bottlenecks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add bounded LRU cache (1000 entries) to memoize refer() results in
reference.ts. refer() is a pure function computing SHA-256 hashes,
which was identified as the primary bottleneck via isolation benchmarks.
Benchmark results for cache hits:
- 3x refer() calls: 44µs vs ~500µs uncached (27x faster)
- 10x unclaimed refs: 2.5µs (400k ops/sec)
The memoization benefits real-world usage patterns:
- Repeated entity access (queries, updates on same docs)
- unclaimed({ the, of }) patterns called multiple times
- Multi-step transaction flows referencing same content
Implementation:
- reference.ts: LRU cache using Map with bounded eviction
- Updated imports in fact.ts, access.ts, error.ts, entity.ts to
use memoized refer() from ./reference.ts instead of merkle-reference
The cache uses JSON.stringify as key (~7µs for 16KB) which is ~25x
faster than the SHA-256 hash computation (~170µs for 16KB).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
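A minimal sketch of the memoization this commit describes, assuming `refer` from merkle-reference as named in the commit; the wrapper code itself is illustrative, not the actual implementation:

```ts
// Sketch of a bounded LRU memoization around merkle-reference's refer().
import { refer as rawRefer } from "merkle-reference";

type Ref = ReturnType<typeof rawRefer>;

const CACHE_LIMIT = 1000;
const cache = new Map<string, Ref>();

export const refer = (value: unknown): Ref => {
  // JSON.stringify (~7µs for 16KB) is far cheaper than SHA-256 (~170µs).
  const key = JSON.stringify(value);
  const hit = cache.get(key);
  if (hit !== undefined) {
    // Re-insert so Map's insertion order tracks recency of use.
    cache.delete(key);
    cache.set(key, hit);
    return hit;
  }
  const reference = rawRefer(value);
  if (cache.size >= CACHE_LIMIT) {
    // First key in insertion order is the least recently used entry.
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, reference);
  return reference;
};
```

Relying on `Map`'s insertion-order iteration keeps the eviction logic dependency-free, at the cost of a delete/re-insert on every cache hit.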
Implemented prepared statement caching to eliminate redundant statement preparation overhead on every database operation. Uses a WeakMap-based cache per database connection to ensure proper cleanup and memory safety.

Changes:
- Added PreparedStatements type and getPreparedStatement() helper
- Cached 7 frequently-used SQL statements (EXPORT, CAUSE_CHAIN, GET_FACT, IMPORT_DATUM, IMPORT_FACT, IMPORT_MEMORY, SWAP)
- Removed manual finalize() calls as statements are reused
- Added finalizePreparedStatements() to close() for cleanup
- Updated all database query functions to use cached statements

Benchmark results (before → after):
- Single GET query: 117.5µs → 53.4µs (54.6% faster / 2.2x speedup)
- Single UPDATE: 906.6µs → 705.8µs (22.1% faster)
- Batch retract (10): 2.5ms → 1.9ms (24.0% faster)
- Query from 1000 docs: 89.6µs → 66.7µs (25.5% faster)
- Batch SET (100): 99.4ms → 88.1ms (11.4% faster)
- Batch SET (10): 8.6ms → 7.9ms (8.1% faster)
- Single SET: 1.2ms → 1.1ms (8.3% faster)

Overall, the optimization provides consistent improvements across all operations, with particularly strong gains in read-heavy workloads. All 31 existing tests pass without modifications.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
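A sketch of the caching shape this commit describes. `getPreparedStatement` and `finalizePreparedStatements` are the names from the commit; the `Database`/`Statement` interfaces are assumptions standing in for the real driver:

```ts
// Minimal Statement/Database shapes assumed for illustration.
interface Statement {
  finalize(): void;
}
interface Database {
  prepare(sql: string): Statement;
}

// Keyed by connection, so each cache is collectable with its database.
const statements = new WeakMap<Database, Map<string, Statement>>();

const getPreparedStatement = (db: Database, sql: string): Statement => {
  let cache = statements.get(db);
  if (!cache) {
    cache = new Map();
    statements.set(db, cache);
  }
  let statement = cache.get(sql);
  if (!statement) {
    statement = db.prepare(sql); // prepared once per connection, then reused
    cache.set(sql, statement);
  }
  return statement;
};

// Called from close(): release every statement prepared for this connection.
const finalizePreparedStatements = (db: Database) => {
  const cache = statements.get(db);
  if (cache) {
    for (const statement of cache.values()) statement.finalize();
    statements.delete(db);
  }
};
```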
…t caching
The merkle-reference library caches sub-objects by identity during traversal.
By computing the datum hash BEFORE the fact hash, the subsequent refer(assertion)
call hits the cache when it encounters the payload sub-object, avoiding redundant
hashing of the same 16KB payload twice.
Before: refer(assertion) then refer(datum) - payload hashed twice
After: refer(datum) then refer(assertion) - payload hash reused via WeakMap
This ordering matters because:
1. refer(datum) hashes the payload and caches it by object identity
2. refer(assertion) traverses {the, of, is: payload, cause} - when it reaches
the 'is' field, the payload object reference hits the WeakMap cache
Benchmark results (16KB payload):
- set fact (single): 1.1ms → 924.7µs (16% faster)
- retract fact (single): 483.8µs → 462.4µs (4% faster)
- update fact (single): ~705µs → ~723µs (within noise)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
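A sketch of the reorder, assuming the memoized `refer` from `./reference.ts` mentioned earlier; the helper name and exact fact shape are illustrative:

```ts
import { refer } from "./reference.ts";

// Illustrative helper: hash the datum first so merkle-reference's
// identity-keyed WeakMap cache is warm before the enclosing fact is hashed.
const hashAssertion = (
  the: string,
  of: string,
  payload: object,
  cause?: unknown,
) => {
  const datum = refer(payload); // payload hashed once, cached by identity
  const fact = refer({ the, of, is: payload, cause }); // `is` reuses cached hash
  return { datum, fact };
};
```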
Previously, getLabels() performed N individual SELECT queries to look up labels for N facts in a transaction. This adds latency proportional to the number of facts being processed.

Now uses a single batched query with SQLite's json_each() function to handle an array of 'of' values:

SELECT ... WHERE state.the = :the AND state.of IN (SELECT value FROM json_each(:ofs))

This reduces N queries to 1 query regardless of transaction size.

Changes:
- Added GET_LABELS_BATCH query constant using json_each() for IN clause
- Added 'getLabelsBatch' to prepared statement cache
- Rewrote getLabels() to collect 'of' values and execute single batch query

The optimization benefits workloads with label facts (access control, classification). Benchmarks show ~4% improvement on batch operations, with larger gains expected in label-heavy workloads.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
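A self-contained sketch of the batched lookup: the SQL mirrors the GET_LABELS_BATCH query from the commit, while the driver shapes and parameter-binding style are assumptions:

```ts
// Minimal driver shapes assumed for illustration.
interface Statement {
  all(params: Record<string, string>): unknown[];
}
interface Database {
  prepare(sql: string): Statement;
}

const GET_LABELS_BATCH = `
  SELECT state.* FROM state
  WHERE state.the = :the
    AND state.of IN (SELECT value FROM json_each(:ofs))
`;

// One query regardless of how many facts the transaction touches: the 'of'
// values travel as a single JSON array expanded by json_each() inside SQLite.
const getLabels = (db: Database, the: string, ofs: string[]) =>
  db.prepare(GET_LABELS_BATCH).all({ the, ofs: JSON.stringify(ofs) });
```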
In conflict detection, we were reading a fact from the database and then calling refer(actual) to compute its hash for comparison. But the fact hash is already stored in the database (row.fact) - we were discarding it and recomputing it unnecessarily.

Changes:
- Added RevisionWithFact<T> type that includes the stored fact hash
- Updated recall() to return row.fact in the revision
- Use revision.fact directly instead of refer(actual).toString()
- Strip 'fact' field from error reporting to maintain API compatibility

This eliminates a refer() call (~50-170µs) on the conflict detection path, which is taken for duplicate detection and first insertions.

Benchmark results:
- set fact (single): ~1.0ms → 846µs (15% faster)
- update fact (single): ~740µs → 688µs (7% faster)
- retract fact (single): ~428µs → 360µs (16% faster)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
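A sketch of this change: `RevisionWithFact` and the `fact` column follow the commit, while the surrounding revision fields and helper names are assumptions for illustration:

```ts
// Revision fields assumed for illustration; `fact` carries the stored row.fact.
type Revision<T> = { the: string; of: string; is?: T; cause?: string };
type RevisionWithFact<T> = Revision<T> & { fact: string };

// Compare against the persisted hash — no refer() call on this path.
const isConflict = <T>(
  expected: string,
  revision: RevisionWithFact<T>,
): boolean => revision.fact !== expected;

// Keep the public error shape unchanged by stripping the internal field
// before the revision is surfaced in a conflict error.
const toReportedRevision = <T>({ fact, ...actual }: RevisionWithFact<T>) =>
  actual;
```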
- Fix Reference type annotations for memoized refer()
- Validate Result in benchmarks to catch silent failures
- Apply deno fmt

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This makes it work automatically with `deno bench`
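For context, `deno bench` picks up cases registered through Deno's built-in runner; a minimal registration looks like this (the case name is illustrative):

```ts
// Minimal Deno.bench registration; the body is a placeholder for one
// timed operation against a prepared session.
Deno.bench("set fact (single)", () => {
  // ... perform one transact against a prepared session ...
});
```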
Summary
This PR significantly improves the performance of the memory package's fact operations through a series of targeted optimizations. The changes are purely performance improvements with no changes to the public API or data model.
Benchmark Results (16KB payload)
What Was Optimized
1. LRU memoization for merkle reference hashing

Added a bounded LRU cache (1000 entries) for `refer()` results. Provides ~25x speedup on cache hits when the same content is referenced multiple times within or across transactions.

2. SQLite prepared statement caching

Caches 7 frequently-used SQL statements per database connection using a WeakMap for memory safety. Eliminates redundant statement preparation overhead. Result: 2.2x faster queries, ~20% faster writes.

3. Datum/fact hash reordering

The merkle-reference library caches sub-objects by identity during traversal. By computing the datum hash BEFORE the fact hash, the subsequent `refer(assertion)` call hits the cache when it encounters the payload sub-object. This is a simple reorder with a comment explaining why order matters. Result: ~16% faster sets.

4. Batch label lookups

Previously `getLabels()` performed N individual SELECT queries for N facts. Now uses a single batched query with SQLite's `json_each()` function. Reduces N queries to 1 query regardless of transaction size.

5. Use stored fact hash instead of recomputing

In conflict detection, we were reading a fact from the database (which includes the stored hash), discarding the hash, then calling `refer()` to recompute it. Now `recall()` returns the stored hash directly. Result: 7-16% faster across all operations.

Why These Changes Are Safe
Bottleneck Analysis
Investigation showed that SQLite is NOT the bottleneck (~2-5µs for raw INSERT). Time breakdown for a 16KB fact write:
The optimizations target the hashing and database overhead where practical, while respecting the merkle DAG architecture that provides content-addressability, conflict detection, and deduplication.
Test plan
🤖 Generated with Claude Code
Summary by cubic
Optimized memory fact operations with internal caching and SQL statement reuse, yielding ~30% faster writes and ~56% faster reads on 16KB payloads. No API or data model changes.
Performance
New Features
Written for commit d066037. Summary will update automatically on new commits.