
try perf things #1

Merged
jaredLunde merged 17 commits into main from jared/try-stuff on Feb 4, 2026


@jaredLunde
Contributor

No description provided.

jaredLunde merged commit eb1fc25 into main on Feb 4, 2026
24 checks passed
jaredLunde pushed a commit that referenced this pull request on Feb 4, 2026
Update README.md with link to SlateDB
jaredLunde deleted the jared/try-stuff branch on March 4, 2026 06:17
jaredLunde added a commit that referenced this pull request May 16, 2026
Encodes the handoff state machine as a typed enum (Idle/Warming/Freezing/
Cutover) instead of a single AtomicBool. Preserves exact "any non-Idle"
semantics in the two read sites (flush_packs gate, checkpoint gate) via
HandoffPhase::is_active(), so behavior is unchanged.

Sets up future per-phase behavior (#1 atomic flush refactor, future PIOD
work) without forcing it now. Validated by handoff_sequential_50_crh
(50/50 clean, 574s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
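A minimal Rust sketch of the phase enum that commit describes, assuming the phase is kept in an `AtomicU8` so that each read site stays a single atomic load, like the old `AtomicBool`. The variant names and `is_active()` come from the commit message. `HandoffState`, `phase`, `set`, and the storage choice are hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Handoff phases named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum HandoffPhase {
    Idle = 0,
    Warming = 1,
    Freezing = 2,
    Cutover = 3,
}

impl HandoffPhase {
    /// Same meaning as the old boolean: true for any non-Idle phase.
    pub fn is_active(self) -> bool {
        self != HandoffPhase::Idle
    }

    fn from_u8(v: u8) -> Self {
        match v {
            0 => HandoffPhase::Idle,
            1 => HandoffPhase::Warming,
            2 => HandoffPhase::Freezing,
            3 => HandoffPhase::Cutover,
            _ => unreachable!("invalid HandoffPhase discriminant: {v}"),
        }
    }
}

/// Hypothetical holder. Assumption: an AtomicU8 replaces the AtomicBool
/// so reads remain lock-free.
pub struct HandoffState {
    phase: AtomicU8,
}

impl HandoffState {
    pub fn new() -> Self {
        Self { phase: AtomicU8::new(HandoffPhase::Idle as u8) }
    }

    /// Read the current phase (what a gate would call before `is_active()`).
    pub fn phase(&self) -> HandoffPhase {
        HandoffPhase::from_u8(self.phase.load(Ordering::Acquire))
    }

    /// Move the state machine to `next`.
    pub fn set(&self, next: HandoffPhase) {
        self.phase.store(next as u8, Ordering::Release);
    }
}
```

A gate that previously loaded the boolean would call `state.phase().is_active()` instead. This keeps the "any non-Idle" check exactly while leaving room to branch on individual phases later.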
jaredLunde added a commit that referenced this pull request May 16, 2026
…suite

The 217-test tests/integration suite (which my local --tests run missed)
caught four real issues from refactor #1 (c3e560b atomic flush_packs):

1. flush_to_s3 must always push the manifest, not just when packs were
   uploaded. Cold readers depend on the S3 manifest's presence to
   bootstrap; an all-zero-write export that cross-dedups to zero packs
   was leaving S3 empty (prop_zero_block_roundtrip).

2. flush_dirty_inner skipped flushes when a flushing file was on disk,
   even after every claimed block had been promoted back to DIRTY
   (via a guest write). The cache couldn't recover from that stale
   rotation on its own. It is now detected via `has_any_syncing()`: if
   `flushing_active` is set but no block is SYNCING, clean up the
   orphan and proceed (state_transition_table_completeness).

3,4. test_c1_…_causes_data_loss and test_manifest_failure_in_drain_…
   asserted the OLD non-atomic ordering's failure modes (blocks evicted
   to NP, data lost across manifest failure + crash). Atomic flush
   eliminates those windows: manifest failure returns Err BEFORE
   eviction; outer recovery re-dirties via the flushing file; crash
   recovery preserves data. Updated assertions to verify the new
   (stronger) invariants.
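The first two fixes can be illustrated with a toy in-memory model in Rust. Only `flush_to_s3`, `flush_dirty_inner`, `has_any_syncing()`, `flushing_active`, and the DIRTY/SYNCING states come from the commit message. The `Cache` struct, the counters standing in for S3 uploads, and the one-pack-per-block assumption are hypothetical simplifications, not the project's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical per-block states; the commit names DIRTY and SYNCING.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockState {
    Clean,
    Dirty,
    Syncing,
}

/// Toy cache model. `flushing_active` stands in for "a flushing file is on
/// disk"; the two counters stand in for manifest and pack uploads to S3.
pub struct Cache {
    pub blocks: HashMap<u64, BlockState>,
    pub flushing_active: bool,
    pub manifest_pushes: usize,
    pub packs_uploaded: usize,
}

impl Cache {
    pub fn new() -> Self {
        Self { blocks: HashMap::new(), flushing_active: false, manifest_pushes: 0, packs_uploaded: 0 }
    }

    pub fn has_any_syncing(&self) -> bool {
        self.blocks.values().any(|s| *s == BlockState::Syncing)
    }

    /// Fix 1: push the manifest even when `packs` is zero, so cold readers
    /// always find a manifest to bootstrap from.
    fn flush_to_s3(&mut self, packs: usize) {
        self.packs_uploaded += packs;
        self.manifest_pushes += 1;
    }

    /// Returns false only when a genuine in-flight flush causes a skip.
    pub fn flush_dirty_inner(&mut self) -> bool {
        if self.flushing_active {
            // Fix 2: a flushing marker with no SYNCING blocks is an orphan
            // (every claimed block went back to DIRTY). Clear it and proceed.
            if self.has_any_syncing() {
                return false;
            }
            self.flushing_active = false;
        }

        // Claim every DIRTY block for this flush.
        let mut claimed = Vec::new();
        for (id, state) in self.blocks.iter_mut() {
            if *state == BlockState::Dirty {
                *state = BlockState::Syncing;
                claimed.push(*id);
            }
        }
        self.flushing_active = true;

        // Assumption: one pack per claimed block (real code may dedup to zero).
        self.flush_to_s3(claimed.len());

        for id in claimed {
            self.blocks.insert(id, BlockState::Clean);
        }
        self.flushing_active = false;
        true
    }
}
```

With no dirty blocks, a flush still records one manifest push and zero packs (fix 1). A stale `flushing_active` with no SYNCING blocks is cleared and the flush proceeds, while a genuinely SYNCING block still causes a skip (fix 2).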

Also: handoff_durability test fixture now bumps RLIMIT_NOFILE to 65536
so foyer's SSD cache (16 GB, many segment fds) doesn't hit EMFILE on
CI runners with the default 1024 soft limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>