docs: zero-downtime rolling deploy design #129
Merged
…zations

Multi-phase dump correctness:
- DocOp::Merge variant: merges fields into existing docs instead of replacing
- All dump phases use Merge for object-level writes (fixes data loss bug)
- Tags post-pass: bitmap inversion writes one Merge per slot (4.5B→109M ops)
- 10 unit tests for Merge semantics (roundtrip, accumulate, delete+resurrect)

Pipeline performance (StreamingDocWriter fixes):
- BufWriter 256→8192 bytes on new shard creation (2x throughput improvement)
- Hardware CRC32 via crc32fast (replaces software byte-at-a-time table)
- Remove per-shard fsync in finalize (saves 20-80s per phase)
- Background enrichment drop (50s blocking → non-blocking)
- Mmap explicit drop after parse (zombie RSS 83GB→24GB)

DataSilo crate (crates/datasilo/):
- Generic mmap'd key-value store: 35M writes/sec, 23M reads/sec
- ParallelWriter with atomic bump + 1MB thread-local regions
- OpsLog with CRC32 append + replay on startup
- Compaction (replay ops → rewrite data file)
- 6 unit tests passing

Server endpoints:
- POST /time-buckets/rebuild: rebuild from sort field data + cache clear
- GET /dictionaries: reverse maps for LCS/MappedString fields
- GET /ui-config: serves YAML as JSON for config-driven UI

Config-driven UI (static/index.html):
- Dynamic filter/sort controls from engine metadata + YAML overrides
- Card rendering with image URL templates, badges, meta fields
- Detail modal with configurable fields, display types, formats
- URL state sync for bookmarkable/shareable filter states
- Civitai UI config (deploy/configs/civitai/ui-config.yaml)

Design docs:
- docs/design/docop-merge.md (GPT + Gemini reviewed)
- docs/design/datasilo-implementation-plan.md (full migration plan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
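The Merge-vs-replace distinction above is the crux of the data loss fix: a later dump phase writing an object must not drop fields an earlier phase wrote. A minimal sketch of those semantics, using a plain `HashMap<String, String>` as a stand-in for the real document encoding (the `DocOp` variants follow the commit, everything else is illustrative):

```rust
use std::collections::HashMap;

// Simplified model of the write ops: Put replaces the whole document,
// Merge folds fields into whatever is already stored, Delete removes it.
enum DocOp {
    Put(HashMap<String, String>),
    Merge(HashMap<String, String>),
    Delete,
}

// Replay ops in order. Merge accumulates fields instead of dropping the
// ones an earlier phase already wrote; a Merge after Delete resurrects
// the document with only the merged fields.
fn apply(ops: &[DocOp]) -> Option<HashMap<String, String>> {
    let mut doc: Option<HashMap<String, String>> = None;
    for op in ops {
        match op {
            DocOp::Put(fields) => doc = Some(fields.clone()),
            DocOp::Merge(fields) => {
                let d = doc.get_or_insert_with(HashMap::new);
                for (k, v) in fields {
                    d.insert(k.clone(), v.clone()); // last write wins per field
                }
            }
            DocOp::Delete => doc = None,
        }
    }
    doc
}
```

This mirrors the "accumulate" and "delete+resurrect" cases the commit's unit tests cover; the real implementation operates on the packed wire format rather than a map.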
Frozen query path (Task #33):
- BitmapSilo frozen accessors: get_frozen_filter(), get_frozen_sort_layer()
- mark_filters_backed() / mark_sorts_backed() — startup marks bitmaps as unloaded placeholders, reads from mmap at query time
- QueryExecutor: get_effective_bitmap() + and_effective_bitmap() helpers with frozen fallback for all filter ops (Eq, In, NotEq, NotIn, Or, Range)
- Sort traversal: bifurcate_frozen(), apply_cursor_filter_frozen(), reconstruct_value_frozen() — frozen layers from BitmapSilo mmap
- ConcurrentEngine holds BitmapSilo behind RwLock, passes it to the executor

Aggressive V2 retirement (~15K lines removed):
- Removed lazy loading: pending_filter_loads, pending_sort_loads, lazy_value_fields, ensure_fields_loaded(), LazyLoad enum
- Removed eviction: eviction_stamps, eviction_total, idle sweep
- Removed existence sets: existing_keys
- Deleted bitmap_memory_cache.rs, bitmap_fs.rs, bound_store.rs, doc_cache.rs, field_handler.rs, preset.rs, shard_store*.rs (4 files)
- Removed FilterField::load_field_complete(), load_values(), clear_bases_and_unload()
- Cleaned up 47 stale TODO comments (49→2)
- Deleted 8 dead test stubs, un-ignored 3 tests (0 ignored remaining)

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead SortField wrappers: bifurcate(), order_results(), apply_cursor_filter() — only frozen variants remain
- Remove dead FlushCommand fields: skip_lazy, cursors, dictionaries
- Remove dead docstore_root field from ConcurrentEngine
- Clean unused imports across datasilo, concurrent_engine, executor

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Arc<RoaringBitmap> on VersionedBitmap.base existed for ArcSwap CoW snapshot publishing. With V3 frozen mmap, published snapshots read bases from the BitmapSilo mmap, making the Arc unnecessary overhead.

- base: Arc<RoaringBitmap> → base: RoaringBitmap
- Removed from_arc() constructor
- Simplified merge(), or_into_base(), load_base() — direct mutation
- Updated all .base().as_ref() call sites to .base()
- diff: Arc<BitmapDiff> stays (still needed for swap_diff)

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary get_shard/get_shard_packed shims on DocSiloAdapter to unblock server compilation. These will be replaced with a proper get_document() API that reads from mmap and applies pending ops.

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zero references to ShardStore, DocStoreV3, BitmapFs, doc_cache, bound_store, or field_handler remain in the codebase.

- Removed DocCacheConfigEntry struct + doc_cache config field
- Removed 8 dead doc_cache metrics from metrics.rs
- Removed evict_doc_cache() + doc_cache_stats() stub methods
- Removed doc_cache metric scraping from server.rs
- Updated all comments from V2 system names to V3 (DataSilo/BitmapSilo)
- Updated test assertions from DocStoreV3 to DataSilo

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DocSink and Ingester<B> were V2 abstractions never used in production. Keep the BitmapSink trait, CoalescerSink, and AccumSink (actively used).

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DataSilo::delete(key) appends a Delete tombstone to the ops log
- get_with_ops() respects delete tombstones (returns None)
- Cold compaction: deleted keys excluded from the output data file
- Hot compaction: deleted keys have their index entry zeroed out
- OpsLog::for_each_ops() yields the full SiloOp (Put + Delete)
- Delete CRC validation in for_each()
- 4 new tests: cold delete, hot delete, get_with_ops delete, delete+reinsert

29 datasilo tests passing, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
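The tombstone behavior above boils down to a last-write-wins replay over the ops log. A sketch under simplified types (`SiloOp` here carries only a value or a tombstone; the real entries also carry CRCs and offsets):

```rust
use std::collections::HashMap;

// Simplified ops-log entry: Put carries the value, Delete is a tombstone.
enum SiloOp {
    Put(Vec<u8>),
    Delete,
}

// Replay the log in append order. A trailing Delete leaves the key absent,
// which is why cold compaction can simply skip it when rewriting the data
// file, and why get_with_ops() returns None for it.
fn replay(log: &[(u64, SiloOp)]) -> HashMap<u64, Vec<u8>> {
    let mut live = HashMap::new();
    for (key, op) in log {
        match op {
            SiloOp::Put(v) => {
                live.insert(*key, v.clone());
            }
            SiloOp::Delete => {
                live.remove(key);
            }
        }
    }
    live
}
```

A delete followed by a reinsert (the commit's delete+reinsert test case) naturally resolves to the reinserted value, since only the final op for a key matters.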
The merge thread now only does:
1. DataSilo compaction when dirty (apply pending doc ops)
2. RSS-aware memory pressure eviction

Removed: unused inner clone, time_buckets capture, cursors capture, suppress-unused hacks. Named the thread "bitdex-merge".

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added compact_threshold to SiloConfig (default 0.20 = 20%)
- Added dead_bytes counter on DataSilo (AtomicU64)
- Hot compaction tracks dead bytes from deletes (zeroed index entries) and relocating updates (overflows where the old slot becomes dead)
- Cold compaction resets dead_bytes to 0 (full rewrite)
- Added dead_bytes(), dead_ratio(), needs_compaction() accessors
- BitmapSilo uses compact_threshold=0.0 (bitmaps rewritten in full)

29 datasilo tests, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
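The accounting above is straightforward to sketch. Field and method names follow the commit, but the struct itself is illustrative, not the real DataSilo:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Minimal sketch of dead-space accounting with a compaction trigger.
struct DeadSpace {
    total_bytes: AtomicU64,
    dead_bytes: AtomicU64,
    compact_threshold: f64, // e.g. 0.20 = compact at 20% dead space
}

impl DeadSpace {
    // Called when a delete zeroes an index entry or an overflow
    // relocates a value, leaving the old slot dead.
    fn record_dead(&self, n: u64) {
        self.dead_bytes.fetch_add(n, Ordering::Relaxed);
    }

    fn dead_ratio(&self) -> f64 {
        let total = self.total_bytes.load(Ordering::Relaxed);
        if total == 0 {
            return 0.0;
        }
        self.dead_bytes.load(Ordering::Relaxed) as f64 / total as f64
    }

    // threshold 0.0 disables the ratio trigger (the BitmapSilo case,
    // where bitmaps are rewritten in full anyway).
    fn needs_compaction(&self) -> bool {
        self.compact_threshold > 0.0 && self.dead_ratio() > self.compact_threshold
    }
}
```

Cold compaction would then reset `dead_bytes` to zero after the full rewrite, matching the commit.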
Cold compaction now uses mmap for both data and index writes:
1. Compute entry layouts sequentially (offsets are cumulative)
2. Pre-allocate the data file at its exact size and mmap it
3. Write entries via pointer copy to the pre-computed offsets

Each entry targets a unique non-overlapping region, ready for parallel writes (rayon) when needed. Currently sequential, but the infrastructure is in place — just change .for_each to .par_iter().for_each() when rayon is added to datasilo.

29 datasilo tests, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deleted pg_sync/backfill.rs entirely (no external callers)
- Deleted pg_sync/csv_ops.rs entirely (no external callers)
- Removed apply_ops_batch_dump + process_wal_dump from ops_processor.rs
- Removed 7 dead parse_*_row functions from copy_queries.rs (kept parse_post_row, parse_model_version_row, parse_model_row)
- Removed associated dead types: CopyImageRow, CopyResourceRow, CopyMetricRow

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New src/cache_silo.rs:
- CacheEntryData: serializable subset of UnifiedEntry (bitmap, metadata, sorted_keys)
- Binary format v1: fixed header + variable bitmap + optional sorted_keys
- hash_unified_key(): folds the 64-bit hash to u32 for the DataSilo key
- save_entry/delete_entry: append to the ops log
- load_all: scan the ops log + data file for a last-write-wins restore
- compact: delegates to DataSilo compaction

Wiring in ConcurrentEngine:
- cache_silo field (Arc<RwLock<CacheSilo>>)
- Startup: open + load_all from bitmap_path/cache_silo/
- Flush thread: drain_dirty_for_silo() → save dirty entries after cache maintenance
- Merge thread: compact CacheSilo when dead space exceeds the threshold

UnifiedCache additions:
- drain_dirty_for_silo(): collects dirty entries as (key_hash, CacheEntryData)

8 new CacheSilo tests, 639 total tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CacheSilo restore fix:
- Added UnifiedKey serialization to the CacheEntryData binary format (v2)
- Added a key field to CacheEntryData (encode/decode round-trips the key)
- Wired the actual restore path: load_all → from_cache_entry_data → insert_restored_entry
- Added UnifiedEntry::from_cache_entry_data() constructor
- begin_restore/finish_restore for batch eviction

Dead enrichment code removal:
- Removed PostEnrichment, MvEnrichment, ModelEnrichment structs
- Removed load_posts_enrichment, load_mv_enrichment, load_model_enrichment
- Removed CopyPostRow, CopyModelVersionRow, CopyModelRow + parse functions
- Removed dead helper functions (is_null, parse_opt_*, parse_bool, parse_i64_fast)

639 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
compact_hot() previously truncated the single ops log after compaction, losing any ops written during the compaction window.

Fix: two ops log slots (ops_a.log, ops_b.log) with an atomic swap. Protocol:
1. Freeze the active slot, redirect writes to the other slot (atomic xor)
2. Compact data from the frozen slot
3. Truncate the frozen slot only after data+index are fully flushed

Legacy migration: an existing ops.log is renamed to ops_a.log on first open.

Tests: test_ab_swap_no_ops_lost, test_legacy_ops_log_migration.

31 datasilo tests, 639 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
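The heart of the protocol is step 1: a single atomic xor both redirects writers and tells the compactor which slot it now owns exclusively. A sketch (struct and helper names are illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Two-slot ops log: writers always append to the active slot;
// compaction flips the active index and drains the frozen one.
struct AbLog {
    active: AtomicUsize, // 0 = ops_a.log, 1 = ops_b.log
}

impl AbLog {
    // Atomically redirect writers to the other slot and return the
    // index of the now-frozen slot to compact from. fetch_xor returns
    // the previous value, so no write racing the flip can land in the
    // slot we are about to drain.
    fn freeze_active(&self) -> usize {
        self.active.fetch_xor(1, Ordering::SeqCst)
    }
}

fn slot_path(slot: usize) -> &'static str {
    if slot == 0 { "ops_a.log" } else { "ops_b.log" }
}
```

Because the flip and the read of the old value are one atomic operation, ops written during the compaction window land in the other slot and survive the truncation in step 3.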
Two bugs fixed in compact_hot_from():
1. Reader blocking: old code dropped self.data_mmap during compaction,
causing get() to return None. Fix: write to data.bin.tmp while old
mmap stays alive, then rename over data.bin.
2. Data/index interleaving: old code wrote data AND updated index in
same loop body. Crash mid-loop = corrupt state. Fix: three strict
phases — classify (read-only), write data (tmp file), update index
(only after data flushed).
Dead-space accounting also fixed: captures old_allocated during the
read-only classification pass before any mutations.
Tests: test_hot_compact_does_not_drop_read_mmap_early,
test_hot_compact_data_before_index_sequential_rounds
33 datasilo tests, 639 lib tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- BitmapSilo compact_threshold: 0.0 → 0.20 (20% dead space triggers)
- Added compact() and needs_compaction() to BitmapSilo
- Merge thread now round-robins across the doc, cache, and bitmap silos
- bitmap_silo_arc created early for sharing with the merge thread

639 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scarlet audit: the previous hot compaction copied the ENTIRE data file to a temp file on every cycle — a 25GB memcpy at 107M docs.

Fix: two-tier approach:
- In-place updates: seek+write to the existing data.bin at allocated offsets
- Overflows: append to the end of the existing data.bin (old slot = dead space)
- Full file rewrite only when dead_ratio > compact_threshold (separate pass)
- Never copy the entire file for routine compaction

No temp file, no rename. data_mmap remaps only when the file grows (overflows). The in-place path doesn't touch the mmap at all — readers stay unblocked throughout.

33 datasilo tests, 639 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
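The routing decision behind the two tiers is a size check per update: fits in the allocated slot → write in place; doesn't fit → append and mark the old allocation dead. A sketch of that decision (types and names are illustrative, not the DataSilo API):

```rust
// Two-tier write plan: an update that still fits in its allocated slot
// is written in place at its existing offset; one that doesn't is
// appended at the end of the file, and the whole old allocation
// becomes dead space counted toward the compaction threshold.
enum WritePlan {
    InPlace { offset: u64 },
    Overflow { dead_bytes: u64 },
}

fn plan_update(allocated_len: u64, old_offset: u64, new_len: u64) -> WritePlan {
    if new_len <= allocated_len {
        WritePlan::InPlace { offset: old_offset }
    } else {
        WritePlan::Overflow { dead_bytes: allocated_len }
    }
}
```

Only the overflow path grows the file (and hence forces a remap); the in-place path never touches the mmap, which is what keeps readers unblocked.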
BitmapSilo true silo (Phase 2 foundation):
- Ops encoding: OP_SET_BIT (0x01) and OP_CLEAR_BIT (0x02) for individual bit mutations, alongside the existing full frozen bitmap format
- Mutation methods: filter_set/clear, sort_set/clear, alive_set/clear — append 5-byte ops to the silo ops log
- Ops-on-read: get_filter_with_ops, get_sort_layer_with_ops, get_alive_with_ops — read the frozen base + scan ops for pending set/clear, apply inline
- DataSilo.scan_ops_for_key() — scan both A/B logs for all ops on a key

Dead stubs cleanup (Phase 5 partial):
- Deleted memory_pressure.rs + all references
- Deleted get_rss_bytes() + Windows/Linux FFI from concurrent_engine
- Deleted dead stubs: boundstore_*, preload_*, build_all_from_docstore, rebuild_fields_from_docstore, add_fields_from_docstore, etc.
- Merge thread: removed the RSS eviction loop (no heap data to evict)
- Removed rebuild_on_boot from server.rs

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
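A 5-byte bit-mutation op is most plausibly a 1-byte opcode plus a u32 slot id; the exact byte layout below (little-endian slot after the opcode) is an assumption for illustration — only the opcodes and the 5-byte size come from the commit:

```rust
// Opcodes from the commit; the byte layout is assumed for the sketch.
const OP_SET_BIT: u8 = 0x01;
const OP_CLEAR_BIT: u8 = 0x02;

// Encode one bit mutation as [opcode][slot as u32 LE] = 5 bytes,
// the unit appended to the silo ops log by filter_set/clear etc.
fn encode_bit_op(op: u8, slot: u32) -> [u8; 5] {
    let mut buf = [0u8; 5];
    buf[0] = op;
    buf[1..5].copy_from_slice(&slot.to_le_bytes());
    buf
}

// Decode for the ops-on-read path: the reader scans pending ops and
// applies each set/clear on top of the frozen bitmap base.
fn decode_bit_op(buf: &[u8; 5]) -> (u8, u32) {
    (buf[0], u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]))
}
```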
get_effective_bitmap now reads from BitmapSilo first (frozen base + pending silo ops), then merges in in-memory VersionedBitmap diffs for mutations not yet written to the silo. During the Phase 2→4 transition both sources may have data; the union combines them.

and_effective_bitmap simplified to delegate to get_effective_bitmap.

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added a send_mutation_ops() helper that dual-writes every MutationOp to both the BitmapSilo ops log (V3 path) and the coalescer channel (V2 path, removed in Phase 4). All 6 mutation entry points are wired.

Filter, sort, AND alive mutations all go to the silo ops log:
- FilterInsert/Remove → silo.filter_set/clear per slot
- SortSet/Clear → silo.sort_set/clear per slot
- AliveInsert/Remove → silo.alive_set/clear per slot

Combined with the executor ops-on-read from the previous commit, the silo now has complete mutation data AND reads apply it. The coalescer/ArcSwap path is now redundant (Phase 4 removes it).

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Query path now checks CacheSilo before UnifiedCache:
1. Hash UnifiedKey → key_hash
2. If not in UnifiedCache, try cache_silo.get_entry(key_hash)
3. On a silo hit: promote to UnifiedCache via from_cache_entry_data
4. Downstream logic (sorted_keys, radix, bucket diffs) works unchanged

New: CacheSilo.get_entry(key_hash) — single-key read via get_with_ops
New: silo_hits metric for tracking cross-restart cache effectiveness

Write path unchanged: the flush thread's drain_dirty_for_silo still handles persistence.

4 new get_entry tests. 640 total tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- BitmapSilo: RwLock on name_to_key for concurrent key auto-creation (new bitmap values auto-assign silo keys instead of silently skipping)
- send_mutation_ops(): skip the coalescer when bitmap_silo exists (mutations go ONLY to the silo ops log for engines with a silo)
- get_effective_bitmap(): simplified to silo-first, VB fallback for tests
- Removed a V2 lazy-load test (tested flush thread mechanics, N/A with a silo)

Phase 4 foundation: with silo-only mutations, the coalescer path is now dead for production engines. Tests without silos still use it.

640 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MutationOp and MutationSender now live in mutation.rs (their natural home) instead of write_coalescer.rs. Updated all imports across concurrent_engine, ingester, and ops_processor. write_coalescer.rs now imports from mutation.rs — preparation for deleting the coalescer.

640 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The WriteCoalescer batching system is replaced by direct silo ops log writes. The flush thread uses a local FlushBatch struct for the remaining staging updates. MutationOp + MutationSender were already moved to mutation.rs. FilterGroupKey moved to unified_cache.rs.

615 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UnifiedCache replaced entirely by CacheSilo:
- Query path reads the cache via CacheSilo.get_entry() only
- No in-memory HashMap, no radix sort index, no LRU tracking
- UnifiedKey moved to cache_silo.rs
- Flush thread live maintenance removed (~1,800 lines from concurrent_engine)
- Prefetch worker removed
- Cache stats/metrics simplified to CacheSilo-only

Total removed this commit: ~5,200 lines (unified_cache.rs + flush thread code).

561 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FlushCommand (ForcePublish, SyncUnloaded, ExitLoadingSaveUnload) and the cmd_tx/cmd_rx command channel are gone. The loading_mode AtomicBool and all enter/exit methods are removed.

- enter_loading_mode() / exit_loading_mode() → no-ops
- exit_loading_mode_and_save_unload() → just calls save_snapshot()
- save_and_unload() → calls publish_staging directly
- Flush thread simplified: no command handling, no loading mode checks
- 2 V2 tests deleted (loading mode timing tests)

559 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ArcSwap<InnerEngine> replaced with direct RwLock fields:
- slots: Arc<RwLock<SlotAllocator>>
- filters: Arc<RwLock<FilterIndex>>
- sorts: Arc<RwLock<SortIndex>>

Queries hold read locks. The flush thread holds write locks for mutation application only. No more staging clone, no snapshot publishing.

Bulk-load paths (clone_staging/publish_staging) still work via read-lock clone → offline build → write-lock swap.

559 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the following methods that are no longer part of the engine API:
- put_via_wal, patch_document_via_wal (WAL write path — superseded by the ops pipeline)
- put_inner (inlined into put())
- patch, patch_document (PATCH semantics — use PUT for all writes)
- sync_filter_values (filter_only sync — use PUT for all writes)
- put_many, put_bulk, put_bulk_loading, put_bulk_into (bulk loading)
- spawn_docstore_writer, write_docs_to_docstore (docstore helpers)
- apply_accum (BitmapAccum apply — superseded by apply_bitmap_maps)
- wal_writer field and set_wal_writer (WAL path removed)

Keeps: put(), delete(), clone_staging(), publish_staging(), apply_bitmap_maps() — these are still used by dump_processor, loader, and remove_fields.

Server PATCH and filter_sync handlers now return 501 Not Implemented. Removes 7 tests that covered the deleted methods. The benchmark "bulk" stage is replaced with a no-op placeholder.

cargo check --lib: 0 errors
cargo check --features server,pg-sync: 0 errors
cargo test --lib: 548 passed, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Converted concurrent_engine.rs to a directory module. Extracted 6 query methods to src/concurrent_engine/query.rs:
- query(), execute_query(), execute_query_impl()
- execute_query_traced(), execute_query_with_collector()
- resolve_filters(), post_validate()

concurrent_engine/mod.rs is now the engine struct + construction + mutations. concurrent_engine/query.rs is the query execution path.

548 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scarlet audit items 1-4:
- Alive bitmap: load_alive().to_owned() → get_alive_with_ops() (ops-on-read)
- put() removed from ConcurrentEngine (test helper added in tests.rs)
- InFlightTracker removed (field + all calls + post_validate)
- Loading mode call sites removed from server.rs + benchmark.rs
- Cache setter call sites removed from the server.rs config patch handler

536 lib tests passing. Server + pg-sync features compile clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FlushArgs struct + run_flush_thread() function extracted from build().

mod.rs: 1,578 → 1,244 lines (-334). flush.rs: 436 lines (flush loop + deferred alive + time buckets).

536 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Organized source into domain directories:
- src/engine/: executor, filter, sort, slot, versioned_bitmap
- src/silos/: bitmap_silo, cache_silo, doc_silo_adapter, doc_format
- src/query/: planner + query types (BitdexQuery, FilterClause, etc.)

engine.rs → engine_facade.rs (avoids a conflict with the engine/ dir). query.rs content folded into query/mod.rs. 21 files updated with new import paths.

536 tests passing, server+pg-sync features compile clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
load_alive() with .to_owned() removed. Test updated to use get_alive_with_ops(). The alive bitmap is not special — same ops-on-read as all other bitmaps.

536 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deleted remove_fields() from concurrent_engine (server endpoint returns 501)
- Extracted the janitor to src/janitor.rs (compaction round-robin across silos)
- Merge thread now delegates to janitor::run_janitor()
- Time bucket methods + config setters verified as thin delegations (no change needed)

mod.rs: 1,244 → 1,190 lines.

536 tests passing, server+pg-sync features compile clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete 4 dead files (2,211 lines): meta_index.rs, engine_facade.rs, concurrency.rs, radix_sort.rs + execute_from_radix from executor.rs
- Add DataSilo::write_batch_parallel() — rayon parallel mmap writes bypassing the ops log for bulk saves (used by BitmapSilo::save_all_parallel)
- Add rayon to the datasilo crate, parallelize cold compaction mmap writes
- Add ParallelBitmapWriter for lock-free bulk bitmap mutations
- Clean the flush thread: remove the dead cache invalidation no-op + merge_dirty
- Remove the deprecated enabled_metrics config field (keep disabled_metrics)
- Add QueryExecutor::new_full() replacing 5 conditional .with_*() chains
- Move concurrent_engine/ under engine/ as engine/concurrent_engine/
- Move cache.rs → silos/cache.rs, query_metrics.rs → query/metrics.rs
- Defer save_snapshot + compact from per-phase to the server handler

488 tests passing, net -6,681 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace to_owned() + per-op insert/remove with frozen.apply_ops(&sets, &clears) in BitmapSilo::get_bitmap_with_ops() — only copies containers touched by ops
- Aggressive cache silo compaction: compact whenever ops exist (not just on threshold)
- Add CacheSilo::has_ops() delegation

488 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move concurrent_engine/{mod,flush,flush_batch,query,tests}.rs up to
engine/ as flat siblings. Fields on ConcurrentEngine promoted to
pub(crate) for cross-module access. Delete the nested directory.
Layout: engine/{concurrent_engine,executor,filter,flush,flush_batch,
query,slot,sort,tests,versioned_bitmap}.rs
488 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename src/pg_sync/ to src/sync/ — not PG-specific anymore
- Move dump_processor.rs, dump_enrichment.rs, dump_expression.rs into sync/
- Move ingester.rs, loader.rs into sync/
- Delete the old standalone files and the pg_sync/ directory
- Update all import paths (crate::pg_sync → crate::sync, etc.)
- Fix crate::concurrent_engine → crate::engine::concurrent_engine

654 tests passing (with the pg-sync feature).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ParallelOpsWriter::write_put_reuse() — zero-alloc per call
- Add encode_merge_fields_into() — writes to a caller-provided buffer
- Wire thread-local scratch buffers in the dump parse loop
- Baseline: 579K rows/s → Fix 1: 597K rows/s (+3%)

A bigger win is expected at 107M scale (214M fewer allocations).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HashMap::with_capacity(8) for config_computed_sort_vals — avoids reallocation growth on first insert. Minimal impact at 14.6M scale (591K/s, within noise of the 597K/s baseline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Enable the parallel ops writer for ALL phases (was disabled for MV phases)
- Multi-value post-pass now uses par_iter + write_put instead of sequential append_ops_batch with Mutex contention
- Tags/tools/techniques at 107M scale will benefit most (4.73B rows through lock-free mmap writes instead of locked sequential append)
- No regression on the images phase: 599K/s (baseline 579K/s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
src/bin/pg_sync.rs: bitdex_v2::pg_sync::* → bitdex_v2::sync::*

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lescer, pg_sync

- Remove 'UnifiedCache' from the metrics description, cache_silo docs, engine comments
- Remove 'BoundStore' comment from the concurrent_engine struct, server purge handler
- Remove 'WriteCoalescer' reference from the flush_batch docs
- Update 'pg_sync' comment in loader.rs to 'sync pipeline'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ParallelOpsWriter::write_frame() returned false on mmap overflow, but callers silently ignored it, dropping doc ops. This could cause missing documents after a dump.

Fix: add an overflow_count AtomicU64 to ParallelOpsWriter, incremented on every dropped write. The dump processor checks it after parallel writes and logs a WARNING with the count of dropped ops.

Bug 1 (fill_indexed_fields reuse) deferred — borrow checker conflict between the row lifetime and the thread-local buffer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-row RoaringBitmap::insert() with Vec<u32> collection + sort_unstable() + from_sorted_iter() for sort layer bitmaps. from_sorted_iter uses push_unchecked (O(1) per value) vs insert's binary search across ~1,678 containers.

Benchmarked at a 5.86x speedup on 32 bit-layers × 7.3M values (9,592ms → 1,638ms).

Each rayon thread collects slot IDs into a Vec<u32> per bit-layer during the row loop, then builds bitmaps in one shot after all rows are processed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
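The collect-then-build pattern can be sketched with plain `Vec<u32>` standing in for the roaring bitmap (the real code feeds the sorted, deduplicated values to `RoaringBitmap::from_sorted_iter`; everything else here is illustrative):

```rust
use std::collections::HashMap;

// Per-thread accumulation for sort layer bitmaps: O(1) push per
// (bit_layer, slot) during the row loop, then one sort + one sorted
// build per layer in a post-pass — instead of an ordered bitmap
// insert on every row.
fn build_layers(rows: &[(u8, u32)]) -> HashMap<u8, Vec<u32>> {
    let mut acc: HashMap<u8, Vec<u32>> = HashMap::new();
    for &(layer, slot) in rows {
        acc.entry(layer).or_default().push(slot);
    }
    for slots in acc.values_mut() {
        slots.sort_unstable(); // single O(n log n) sort per layer
        slots.dedup(); // from_sorted_iter requires strictly increasing input
        // real code: RoaringBitmap::from_sorted_iter(slots.iter().copied())
    }
    acc
}
```

The win comes from replacing millions of per-row binary searches across containers with one sort per layer followed by a linear, append-only build.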
Applied OS page-management hints across every mmap creation site so the kernel can make better decisions about readahead, THP, and page reclaim:

- dump_processor.rs: SEQUENTIAL after map (bulk CSV read, left-to-right), DONTNEED (Linux only) immediately before drop to release pages promptly
- dump_enrichment.rs: SEQUENTIAL after map (same bulk read pattern)
- slot_arena.rs: RANDOM at creation (random slot lookups), DONTNEED (Linux only) in cleanup() before drop to reclaim arena pages after a phase
- datasilo/lib.rs: SEQUENTIAL on bulk-write data mmaps (build_cold, rebuild), RANDOM on load_index (random bucket lookups), RANDOM + conditional HUGEPAGE (>512 MB, Linux only) on load_data for large silos
- datasilo/ops_log.rs: SEQUENTIAL on both the open-existing and ensure_capacity grow paths (append-only log, purely sequential writes)
- datasilo/hash_index.rs: RANDOM on both create() and open() (hash table — scattered random access by definition)

All advise() calls are #[cfg(unix)] gated (the method does not exist on Windows). DontNeed and HugePage are additionally #[cfg(target_os = "linux")]. Uses let _ = mmap.advise(...) — errors are ignored (hints are advisory only; failure is never fatal).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmarks all viable approaches for merging N partial roaring bitmaps:
A) sequential pairwise |=
B) rayon fold+reduce |= (current dump pipeline)
C) MultiOps::union() refs — CoW streaming merge (roaring-rs built-in)
D) MultiOps::union() owned
E) largest-first sequential |=
F) k-way iterator merge → from_sorted_iter
G) parallel tree reduction

Results across 6 scenarios (8/32 threads × large/medium/sparse):
- MEDIUM-32 (most common tagId shape): C=3.6ms vs B=18.8ms — 5.2x faster
- LARGE-8 (dense nsfwLevel shape): C=1.2ms vs A=1.4ms — 1.2x faster
- SPARSE-32 (rare tag, many threads): C=0.5ms vs B=0.8ms — 1.7x faster

The winner is always C (MultiOps::union refs). It does a single streaming merge walk over all N bitmaps, borrowing containers from the largest bitmap first and deferring ensure_correct_store() until the final pass. Pairwise |= promotes containers and fixes cardinality on every intermediate step.

Rayon fold+reduce (B) is slower than single-threaded A in 5 of 6 scenarios because the merge is memory-bandwidth-bound, not CPU-bound. Parallel tree (G) and owned MultiOps (D) are consistently worse than C.

Recommendation: replace the dump pipeline's par_iter fold/reduce with bitmaps.iter().union() (MultiOps trait from roaring). Expected 4-5x speedup on the per-value merge for tagIds (31K distinct values, medium cardinality).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ahash 0.8 and swaps std::collections::HashMap for AHashMap in the three hot-path modules: dump_processor (51 uses), engine/filter (FilterField bitmap map), and engine/sort (SortIndex field map).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… 4 scenarios)
Investigates whether threads sharing a single accumulator can eliminate the
per-thread bitmap merge that costs 6.4s+ in the dump pipeline.
8 strategies benchmarked across 4 cardinality shapes at 14.6M rows:
A per-thread HashMap<u64,bitmap> + sequential OR reduce (current baseline)
A2 same parse + MultiOps::union() merge (previous benchmark winner)
B shared DashMap<u64,Mutex<bitmap>> — zero merge cost
C shared DashMap<u64,Mutex<Vec<u32>>> + sort/from_sorted_iter finalize
D per-thread Vec<(val,slot)> + global sort + group-by finalize
E per-thread HashMap<u64,Vec<u32>> + parallel sort/from_sorted_iter
F 256-shard batched Mutex<HashMap<Vec<u32>>> accumulator
G per-thread HashMap<u64,Vec<u32>> + sharded parallel finalize
Key finding: NO single approach wins across all cardinalities.
Low/mid-card (nsfwLevel, tagIds, <50K distinct values):
G/E win at 106ms and 415ms vs A at 59ms/2086ms. A2 (MultiOps merge)
is the simplest win: 5x speedup on mid-card with no parse change.
B is catastrophic on low-card: 3.4s from 14.6M threads on 5 Mutexes.
High-card (userId, postId, 2M distinct values):
B wins at 2.4s vs A at 8.2s (3.5x). D (flat Vec + global sort) is
nearly as fast at 2.5s with zero lock overhead — simpler and safer.
A2 is WORSE than A here: MultiOps overhead on 2M bitmaps with 7
entries each dominates. E/G are also worse than A.
Recommendation:
Low/mid-card fields: keep per-thread structure, switch merge to
MultiOps::union() — 1.8x–5x faster, zero structural change.
High-card fields (>50K distinct values): switch to approach D —
per-thread Vec<(u64,u32)>, concat+sort+group-by finalize.
3x speedup, ~175MB working buffer, no locks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
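The recommended approach D for high-cardinality fields can be sketched with `Vec<u32>` standing in for the per-value bitmap (the real finalize would feed each sorted run to `from_sorted_iter`; names are illustrative):

```rust
// Approach D finalize: each thread pushes (value, slot) pairs into a
// flat Vec during parsing; the threads' Vecs are concatenated, sorted
// once globally, then grouped into per-value runs — no locks, no
// per-thread HashMap merge.
fn group_by_value(mut pairs: Vec<(u64, u32)>) -> Vec<(u64, Vec<u32>)> {
    // Lexicographic sort orders by value first, then slot, so each
    // value's slots come out already sorted.
    pairs.sort_unstable();
    let mut out: Vec<(u64, Vec<u32>)> = Vec::new();
    for (value, slot) in pairs {
        match out.last_mut() {
            Some((v, slots)) if *v == value => slots.push(slot),
            _ => out.push((value, vec![slot])),
        }
    }
    out
}
```

Because the sort leaves each group's slots in ascending order, every run is directly usable as sorted bitmap input, which is what makes the flat-Vec approach competitive with the shared-DashMap variant at a fraction of the complexity.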
…election)

Proposes a same-node shared-PVC architecture for zero-downtime deploys. File-lock writer election, read-only serving mode, 503 on write endpoints for sidecar compatibility. ~200 lines of Rust when implemented. Depends on the V3 mmap architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
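The writer-election half of the design can be approximated in a few lines. This sketch uses atomic lock-file creation as the election primitive; the design doc's actual proposal is an advisory file lock (flock), which the OS releases automatically on crash, whereas a `create_new` file can go stale — the simplification here is for a self-contained std-only example:

```rust
use std::fs::OpenOptions;
use std::path::Path;

// First process to create the lock file wins the election and serves
// read-write; everyone else starts in read-only mode. A production
// implementation would hold an advisory flock for the process lifetime
// instead, so a crashed writer releases the lock automatically.
fn try_become_writer(lock_path: &Path) -> bool {
    OpenOptions::new()
        .write(true)
        .create_new(true) // atomic: fails if the file already exists
        .open(lock_path)
        .is_ok()
}
```

In the proposed deploy flow, the new pod would call this on the shared PVC at startup, lose the election while the old pod is alive, serve read-only (returning 503 on writes), and win on a retry once the old pod exits.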
Prototypes 5 approaches for parallelizing the compact_cold_from() scan against the current single-threaded baseline, across 5 scenarios.

Approaches tested:
- Baseline: sequential for_each_ops scan → HashMap<key, Vec<u8>> (current)
- 2: sequential header prescan → offset table → parallel chunk scan
- 2B: fully parallel header prescan → offset table → parallel scan
- 3: byte-range parallel scan with CRC self-sync (no prescan)
- 4: sequential scan → flat Vec<(key, offset)> + sort (no HashMap)
- 5: sequential scan, no value copy (lower-bound measurement)

Results (1M keys × 300B, 400MB log, 32 threads):
- Baseline: 584ms
- Approach 2: 704ms (0.83x — SLOWER)
- Approach 3: 586ms (1.00x — breakeven only)
- Thread scaling (approach 3): 0.61x–0.86x at 2–32 threads

Finding: ALL parallel approaches are slower on Windows (no MADV_SEQUENTIAL). The scan is memory-bandwidth bound; sequential access wins because the OS prefetcher predicts the sequential pattern. Multiple threads thrash the TLB and compete for the same memory bus bandwidth.

Critical discovery from approach 5: the no-copy lower bound is 335ms vs the 584ms baseline, meaning 43% of scan time is Vec<u8> allocation overhead (14.6M × 300B = 4.4GB of heap allocations per compact).

Real bottleneck: TWO full passes over 4.4GB — the scan copies values to the heap, and the write phase reads them back. Total: ~9GB of memory traffic for a 4.4GB log.

Correct fix: zero-copy compaction. Store HashMap<key, (mmap_offset, len)> instead of Vec<u8>. The write phase reads directly from the source mmap into the dest data file, eliminating the 4.4GB heap allocation pass entirely (~2x speedup).

Parallel scan remains viable on Linux with MADV_SEQUENTIAL — the production pod already has the hint applied (from the previous madvise PR). Re-benchmarking there should show real scaling for approach 3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
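The zero-copy fix identified above can be sketched with a byte slice standing in for the source mmap (the log entries here are pre-parsed `(key, offset, len)` triples; in the real scan those come from walking the ops log):

```rust
use std::collections::HashMap;

// Zero-copy scan: instead of copying each value into an owned Vec<u8>,
// record only where the latest copy of each key lives in the source
// mmap. Last write wins, same as the HashMap<key, Vec<u8>> version.
struct SiloOpRef {
    offset: usize,
    len: usize,
}

fn scan_latest(log: &[(u64, usize, usize)]) -> HashMap<u64, SiloOpRef> {
    let mut latest = HashMap::new();
    for &(key, offset, len) in log {
        latest.insert(key, SiloOpRef { offset, len });
    }
    latest
}

// Write phase: copy live values straight from the source mapping into
// the destination buffer — the 4.4GB of intermediate heap Vec<u8>
// allocations from the old scan never happen.
fn write_phase(src: &[u8], latest: &HashMap<u64, SiloOpRef>) -> Vec<u8> {
    let mut dest = Vec::new();
    for r in latest.values() {
        dest.extend_from_slice(&src[r.offset..r.offset + r.len]);
    }
    dest
}
```

This is the shape of the `SiloOpRef` change the commit on the performance-overhaul branch later lands ("SiloOpRef stores mmap offsets instead of Vec<u8> copies").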
Add --read-only flag (or BITDEX_READ_ONLY=1 env var) that starts the
server in read-only mode:
- Write endpoints (POST /ops, PUT /dumps) return 503 with clear message
- All admin routes (create/delete/upsert/config) blocked via middleware
- WAL reader thread skipped (no write pipeline)
- Health endpoint reports {"status":"ok","mode":"read-only"|"read-write"}
- Queries, stats, cursors, and all read endpoints work normally
This enables K8s rolling deploys where the new pod starts read-only,
serves queries immediately from shared mmap'd data, and the sidecar's
existing retry logic handles the 503s until the pod is promoted.
See docs/design/zero-downtime-deploy.md for the full architecture.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
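The gating logic above can be sketched as a pure predicate. This is a hypothetical illustration, not the actual middleware: the function name, the `/admin` prefix, and the exact route matching are assumptions; the real server blocks admin routes (create/delete/upsert/config) via middleware.

```rust
// Decide whether a request must be rejected with 503 in read-only mode.
fn is_write_blocked(read_only: bool, method: &str, path: &str) -> bool {
    if !read_only {
        return false; // read-write mode: everything passes
    }
    // Write pipeline endpoints return 503 in read-only mode.
    let write_endpoint = matches!((method, path), ("POST", "/ops") | ("PUT", "/dumps"));
    // Admin routes are blocked wholesale (prefix is illustrative).
    let admin_route = path.starts_with("/admin");
    write_endpoint || admin_route
}

// Health payload reports the serving mode, per the commit message.
fn health_body(read_only: bool) -> String {
    let mode = if read_only { "read-only" } else { "read-write" };
    format!("{{\"status\":\"ok\",\"mode\":\"{}\"}}", mode)
}
```

Because the check is a pure function of (mode, method, path), it can sit in one middleware layer while queries, stats, and cursors pass through untouched.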
…gies Compares 4 strategies for building filter bitmaps during the dump parse loop across 4 Civitai-realistic data shapes (low/med/high cardinality and 8-field mixed). Key finding: Approach A (current HashMap insert) is 5x slower than B/D on the realistic mixed scenario (71.6s vs 13.4s for 116M tuples). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Compares merge strategies for combining 32-thread filter bitmap outputs at 1M-row scale with 8 fields (2 low, 3 medium, 3 high cardinality). Results: Approach B (per-field parallel merge) wins the merge-only phase at 516ms vs 2591ms for current rayon fold+reduce (5x faster). Pipeline total including per-thread bitmap build: B = 747ms vs A = 2822ms (3.78x). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
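The shape of Approach B can be sketched with stdlib threads. Assumptions: the real code merges RoaringBitmaps with rayon's `par_iter`; here a "bitmap" is modeled as a sorted, deduplicated `Vec<u32>` and the parallelism uses `std::thread::scope`, so the sketch shows the two-phase structure (sequential per-field collect, then independent per-field merges) rather than the production code.

```rust
use std::thread;

// Phase 1: sequentially bucket every thread's partial (field, slot) output
// by field. Phase 2: merge each field independently, one thread per field,
// so a huge field (e.g. userId with 2M values) gets its own thread.
fn merge_per_field(partials: Vec<Vec<(usize, u32)>>, num_fields: usize) -> Vec<Vec<u32>> {
    let mut per_field: Vec<Vec<u32>> = vec![Vec::new(); num_fields];
    for part in partials {
        for (field, slot) in part {
            per_field[field].push(slot);
        }
    }
    thread::scope(|s| {
        let handles: Vec<_> = per_field
            .into_iter()
            .map(|mut slots| {
                s.spawn(move || {
                    // Stand-in for the per-field bitmap union.
                    slots.sort_unstable();
                    slots.dedup();
                    slots
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The key property measured in the benchmark is that fields never contend with each other: the expensive cross-thread reduce of the fold+reduce baseline is replaced by ~20 embarrassingly parallel per-field merges.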
Comprehensive dump pipeline performance overhaul. Parse+merge time reduced from 30.9s to 14.8s on the 14.6M images-small dataset.

## Per-row optimizations (20,230 → 11,968 ns/row, -41%)
- Mmap enrichment with dense Vec offset index: 6.1x faster build, 5.2x less memory (HashMap → mmap + Vec<u64> for >100MB CSVs)
- Sort bitmap from_sorted_iter: collect Vec<u32> per bit-layer, build bitmaps via sort + from_sorted_iter after the row loop
- Flat Vec filter bitmap batch insert (Approach B): push (field_idx, value, slot) tuples per row, sort + grouped from_sorted_iter in a post-pass. 66% faster than per-row HashMap insert.
- Compiled DocFieldPlan: pre-resolve all field indices, value types, and skip flags at phase setup. Single flat loop per row, zero HashMap/HashSet lookups.
- DumpFieldValue with zero-copy strings: borrow &str from mmap/enrichment instead of .to_string(). Shared wire format primitives in doc_format.rs (write_field_int/bool/str/multi_int).
- Duplicate config-computed sort elimination: compute GREATEST/LEAST once (early), reuse for bitmap writes (was 22% of parse time).
- Reusable indexed_fields Vec (lifetime fix: 'a mmap, not 'b row)
- Reusable enrichment buffer (enrich_row_indexed_into)
- O(1) enriched_get via AHashMap (was O(n) linear scan, 8 calls/row)
- ahash in dump_expression.rs + dump_enrichment.rs for hot-path maps

## Merge phase optimization (5.6s → 2.4s, -57%)
- Per-field parallel merge: sequential collect into per-field Vecs, then rayon par_iter over ~20 fields. Each field merges independently; userId (2M values) gets its own thread.

## Infrastructure
- Zero-copy cold compaction: SiloOpRef stores mmap offsets instead of Vec<u8> copies. 43% faster compaction scan.
- dump-timing feature flag: per-row nanosecond instrumentation with doc_encode sub-timings (field_collect, pack_encode, mmap_write). Zero overhead when the feature is off.
- streaming_merge config option on the dump request body (MultiOps::union path for 107M+ scale, default off).
- Mi merge concatenation fix (Merge ops concatenate multi-int arrays)

## Cleanup
- Deleted dead CacheStats/CacheEntryDetail stubs + zero-value metrics
- Renamed clear_unified_cache → clear_cache
- Deleted dead enrich_from_lookup method
- Panic guard on EnrichmentTable::get() for Mmap-backed tables
- MADV_RANDOM for the mmap enrichment lookup phase
- 200M key cap warning for the dense Vec enrichment index

685 tests pass. 11 files changed, +1207 -507.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
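The flat-Vec batch insert (Approach B) can be sketched as follows. Assumptions: the real post-pass feeds each grouped run into a RoaringBitmap via a `from_sorted_iter`-style constructor; here the bitmap side is elided and `build_grouped` is a hypothetical name — the sketch shows only the push/sort/group pattern that replaces per-row HashMap inserts.

```rust
// Per row, the hot loop just pushes (field_idx, value, slot) — no hashing.
// The post-pass sorts once, then walks the sorted tuples grouping runs into
// one ascending slot list per (field, value) key; each list can be handed
// to a sorted-iterator bitmap constructor in a single pass.
fn build_grouped(mut tuples: Vec<(u16, u64, u32)>) -> Vec<((u16, u64), Vec<u32>)> {
    tuples.sort_unstable(); // one O(n log n) sort replaces n map lookups
    let mut out: Vec<((u16, u64), Vec<u32>)> = Vec::new();
    for (field, value, slot) in tuples {
        match out.last_mut() {
            // Same (field, value) run continues: slots arrive in sorted order.
            Some((key, slots)) if *key == (field, value) => slots.push(slot),
            // New run starts a new group.
            _ => out.push(((field, value), vec![slot])),
        }
    }
    out
}
```

The win measured in the benchmarks comes from keeping the per-row cost to a bounds-checked `Vec::push` and paying for ordering once, in cache-friendly bulk, after the row loop.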
Replace .iter().clone() with .into_iter() when converting AHashMap to std::HashMap for apply_bitmap_maps. Eliminates deep-cloning millions of RoaringBitmaps during the filter/sort bitmap transfer to engine staging. Also uses into_iter for sort_maps_indexed conversion and removes unnecessary .clone() on alive bitmap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
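The difference can be illustrated with stdlib maps (the real code converts AHashMap to std::HashMap of RoaringBitmaps; `Vec<u32>` stands in for the bitmaps, and both function names are illustrative):

```rust
use std::collections::HashMap;

// Iterating by reference forces a deep clone of every key and value —
// expensive when the values are multi-megabyte bitmaps.
fn convert_cloning(src: &HashMap<String, Vec<u32>>) -> HashMap<String, Vec<u32>> {
    src.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
}

// Consuming the source map moves each entry: no per-bitmap copy at all.
fn convert_moving(src: HashMap<String, Vec<u32>>) -> HashMap<String, Vec<u32>> {
    src.into_iter().collect()
}
```

Since the staging transfer never needs the source map again, ownership can simply be handed over, which is what makes the clone-free version valid.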
Move apply_bitmap_maps from process_dump (outer) into process_dump_with_progress (inner), right after the merge phase. Merged bitmaps are consumed directly via into_iter — no intermediate PhaseResult storage, no AHashMap→HashMap conversion overhead. process_dump becomes a thin wrapper (save dictionaries + return). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write frozen bitmaps directly to BitmapSilo via write_dump_maps() instead of the V2 clone_staging → apply → publish → save_snapshot roundtrip. Eliminates ~15s overhead (5s apply + 10.5s save_snapshot) at 14.6M scale. Results: 1,048K → 1,428K rows/sec (+36%), total process_dump 19.9s → 11.2s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- File-lock (flock) writer election — no K8s API dependency, no external coordination
- Write endpoints (POST /ops, PUT /dumps) return 503 in read-only mode — sidecar retries naturally

Notes
- rolling-restart-cursors.md for same-node deployments

🤖 Generated with Claude Code