
persist: Make sure to obtain a lease before selecting a batch#35554

Merged
bkirwi merged 2 commits into MaterializeInc:main from
bkirwi:lease-fix
Mar 20, 2026

Conversation


@bkirwi bkirwi commented Mar 19, 2026

A "seqno lease" is the tool Persist uses internally to prevent garbage collection of a batch that a reader is still processing. It's important that we obtain the lease before we choose the batch to return, to avoid a race where the state changes between the batch being selected and the lease being taken. Unfortunately, callers did this in the wrong order - chose a batch and then obtained a lease for it.

This may have been exacerbated by the recent-ish #34590, which allows more aggressive seqno downgrades to avoid leaks.
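The race and the fix can be illustrated with a toy, self-contained model (hypothetical types, a sketch only, not the real Persist code): GC may drop any batch below the earliest outstanding seqno lease, so leasing after selecting leaves a window in which a concurrent GC can collect the chosen batch.

```rust
// Toy model of the race; `SeqNo`, `State`, and `gc` are stand-ins for
// the real Persist types described in this PR.

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct SeqNo(u64);

struct State {
    seqno: SeqNo,                        // current state version
    batches: Vec<(SeqNo, &'static str)>, // batches tagged with the seqno that references them
    leases: Vec<SeqNo>,                  // outstanding seqno leases
}

impl State {
    // GC may drop anything below the earliest outstanding lease (or the
    // current seqno, if there are no leases at all).
    fn gc(&mut self) {
        let floor = self.leases.iter().min().copied().unwrap_or(self.seqno);
        self.batches.retain(|(s, _)| *s >= floor);
    }

    fn lease_seqno(&mut self) -> SeqNo {
        self.leases.push(self.seqno);
        self.seqno
    }

    fn advance(&mut self) {
        self.seqno = SeqNo(self.seqno.0 + 1);
    }
}

fn main() {
    // Buggy order: select a batch, then the state advances and GC runs,
    // and only then is the lease taken.
    let mut state = State {
        seqno: SeqNo(3),
        batches: vec![(SeqNo(3), "b3")],
        leases: vec![],
    };
    let selected = state.batches[0]; // batch chosen *before* leasing
    state.advance(); // a concurrent writer advances the state...
    state.gc(); // ...and GC collects everything below the new seqno
    let _lease = state.lease_seqno(); // lease taken too late
    assert!(!state.batches.contains(&selected)); // the selected batch is gone

    // Fixed order: lease first, so GC is held back before we select.
    let mut state = State {
        seqno: SeqNo(3),
        batches: vec![(SeqNo(3), "b3")],
        leases: vec![],
    };
    let _lease = state.lease_seqno(); // lease pins seqno 3 first
    state.advance();
    state.gc(); // held back by the outstanding lease at seqno 3
    assert_eq!(state.batches, vec![(SeqNo(3), "b3")]);
}
```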

Motivation

Incident response - a race here could cause an unexpected read-time halt.

@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@bkirwi bkirwi force-pushed the lease-fix branch 2 times, most recently from 83ff452 to 61ab3dc Compare March 19, 2026 18:23

bosconi commented Mar 19, 2026

bugbot run

cursor bot commented Mar 19, 2026

PR Summary

Medium Risk
Changes the ordering and waiting logic for snapshot/listen to ensure a seqno lease is obtained before selecting batches, which affects read correctness and could introduce new panics or latency regressions if the wait/upper logic is wrong.

Overview
Refactors persist read paths to wait for the shard upper to advance, then obtain a seqno lease before selecting snapshot/listen batches, preventing races where chosen batches/parts could be GC’d before being leased.

This introduces a shared Machine::wait_for_upper_past primitive (used by Listen::next, ReadHandle::snapshot*, and WriteHandle::wait_for_upper_past), adds RetryParameters::persist_defaults, and updates snapshot stats/parts stats to use a new Machine::unleased_snapshot helper. Metrics are renamed/retargeted from listen/snapshot-specific watch counters to generic wait-for-upper counters.
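The behavior of a "wait for the upper to advance past a frontier" primitive can be sketched with a synchronous toy (the real `wait_for_upper_past` is async and driven by a state watch; the signature and names here are illustrative only):

```rust
// Toy, synchronous stand-in for waiting until the shard upper advances
// past a frontier. Each `next()` models observing a new version of the
// shard state; the real primitive awaits state-change notifications.
fn wait_for_upper_past(upper_updates: &mut impl Iterator<Item = u64>, frontier: u64) -> u64 {
    loop {
        let upper = upper_updates
            .next()
            .expect("the upper eventually advances");
        if upper > frontier {
            return upper; // the upper is now strictly past the frontier
        }
    }
}

fn main() {
    // Waiting past frontier 7 over successive uppers 3, 5, 8, 12 stops at 8.
    let mut updates = vec![3u64, 5, 8, 12].into_iter();
    assert_eq!(wait_for_upper_past(&mut updates, 7), 8);
}
```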

Written by Cursor Bugbot for commit 61ab3dc.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@bkirwi bkirwi marked this pull request as ready for review March 19, 2026 18:56
@bkirwi bkirwi requested a review from a team as a code owner March 19, 2026 18:56
Contributor

@mtabebe mtabebe left a comment


This change makes sense to me given our discussions. The key thing is the invariant that the seqno hold is taken before the actual batch read.

It also fixes Jan's test. I think we should consider merging Jan's repro as well with this change, so we have the test.

I don't know that I should be the approver; maybe we should wait for @teskje.


```rust
fn lease_batch_parts(
    &mut self,
    lease: Lease,
```
Contributor


Cool, this enforces that we actually have the lease through the contract of the API.
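A hedged sketch of that "contract of the API" point (hypothetical types and signatures, not the real Persist code): if the only way to mint a `Lease` is `lease_seqno`, then a function that takes a `Lease` by value cannot be called by a caller that has not already obtained one.

```rust
// Hypothetical stand-ins: a `Lease` that can only be minted by
// `lease_seqno`, so any function taking `Lease` by value is statically
// guaranteed its caller acquired the lease first.
struct Lease(u64);

fn lease_seqno(current_seqno: u64) -> Lease {
    Lease(current_seqno)
}

// Analogous to `lease_batch_parts`: the `Lease` parameter encodes the
// "lease before batch" invariant in the type system.
fn lease_batch_parts(lease: Lease, parts: Vec<&'static str>) -> (u64, Vec<&'static str>) {
    (lease.0, parts)
}

fn main() {
    let lease = lease_seqno(42); // must happen first; no other way to get a Lease
    let (held_at, parts) = lease_batch_parts(lease, vec!["part0"]);
    assert_eq!(held_at, 42);
    assert_eq!(parts, vec!["part0"]);
}
```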

```rust
match tokio::time::timeout(min_elapsed, next_batch).await {
    Ok(batch) => break batch,
    Err(_elapsed) => {
        self.handle.maybe_downgrade_since(&self.since).await;
```
Contributor


Is it intentional to drop the maybe_downgrade_since here, in this retry loop?

I think it does make sense because these are disjoint concepts. We should just wait for the lease, not do anything with the since. Just checking...

Contributor Author


Yeah, but I see why it's confusing! This loop was a workaround for an issue in an earlier version of the code, where we only relaxed any seqno holds at the same time as we downgraded the since, so we had to time out calls like this and insert calls that would be otherwise noops. As of a couple months ago, we downgrade the seqno in the background thread, so we do not need this sort of noop call. (You can see that in the latest version of this method, this only updates some metadata and doesn't trigger any actual work.)

```rust
    as_of,
    &mut watch,
    None,
    &self.applier.metrics.retries.snapshot,
```
Contributor


Is it meaningful to relabel this metric as unleased_snapshot?

Contributor Author


I don't have a case in mind where I'd want to break these down separately, but it's definitely possible if there's a use-case for it!

@bkirwi bkirwi added the release-blocker Critical issue that should block *any* release if not fixed label Mar 19, 2026
@bkirwi bkirwi requested review from DAlperin, pH14 and teskje March 19, 2026 20:58
Contributor

@teskje teskje left a comment


I'm not sure that I'm a more useful reviewer than Michael, but this makes sense to me, fwiw!

```rust
if !logged_at_info && start.elapsed() >= Duration::from_millis(1024) {
    logged_at_info = true;
    info!(
        "snapshot {} {} as of {:?} not yet available for {} upper {:?}",
```
Contributor


Looks like we lost these logs. Is that fine? I think they have been useful once or twice for me in the past, when debugging why things hang.

Contributor Author


Yeah, fair enough - let me see what I can do!

Contributor Author


I've restored this log, but parameterized to make it make sense in this slightly more generic context. (Though I've hacked it up to only log at info for snapshots, since that's the old behaviour and I think it might be a bit noisy otherwise.)

I also took a second pass in general to try and make sure the behaviour was as 1:1 with the old code as possible, except of course for the stuff we're trying to improve. :) Details in the last commit.

Comment on lines +305 to +310
```rust
let lease = self.handle.lease_seqno().await;
let batch = match self
    .handle
    .machine
    .applier
    .next_listen_batch(&self.frontier)
```
Member


Took me a minute to convince myself this can't race in a meaningful way. We acquire a lease for whatever version of state we are at. If the state advances between acquiring the lease and reading the next batch, then we get a batch at a new version of state. But due to GC's handling of seqno_since, this lease still serves as a safe lower bound. The only risk is holding GC back too far. I can imagine a version of next_listen_batch that atomically provides a lease at its current state version, but this is fine.

Contributor


I also got tripped up here going through hypothetical races with state advancing. Maybe worth adding a safety argument inline for why this works.

Contributor Author


Yeah, that's the trick - the seqno hold protects all future versions of state as well, so it's okay to non-atomically grab a lease and check the state as long as the lease happens first. (And it would be hard to do atomically in any case, since the state is process-global and lots of other handles might be updating it concurrently.)

I've enhanced the comment on lease_seqno with the reasoning here to make that more clear to future readers!
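The safety argument in this thread, that a lease taken at seqno S lower-bounds seqno_since and therefore protects every state version at or after S, can be checked in toy terms (a hypothetical model, not Persist's actual GC logic):

```rust
// Toy check of the invariant: the GC floor is the minimum outstanding
// lease, so a lease at seqno S protects batches referenced by any later
// state version S' >= S, even though the lease and the batch read are
// not atomic with each other.
fn gc_floor(leases: &[u64], current_seqno: u64) -> u64 {
    leases.iter().copied().min().unwrap_or(current_seqno)
}

fn main() {
    let lease_at = 10; // lease taken at seqno 10...
    let later_seqno = 13; // ...state advances before we read the next batch
    let floor = gc_floor(&[lease_at], later_seqno);
    // Any batch referenced at seqno >= floor survives, including batches
    // from the newer state version the reader actually observes.
    assert_eq!(floor, 10);
    assert!(floor <= later_seqno);
}
```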

Contributor

@pH14 pH14 left a comment


Okay, I've been staring at this for a while; the race diagnosis and the switch to lease-then-get-batches behavior make sense to me. I also like the simplification of wait_for_upper_past over the previous model.

I'm not sure if our tests are set up for this at all, but is there any way to write a regression test for this?

bkirwi added 2 commits March 20, 2026 11:41
This recovers some logging, and also restores some other minor behaviour
to its state before this PR. (Including that the retry policy for the
write handle used the listener's retry params.)

bkirwi commented Mar 20, 2026

I'm not sure if our tests are set up for this at all, but is there any way to write a regression test for this?

@teskje wrote a reproducer for this, though it involves adding some targeted sleeps and isn't something we can merge directly. I've confirmed that it passes on this PR, though. And I hope to follow up with a mergeable version of it when I have a moment...

@bkirwi bkirwi enabled auto-merge (squash) March 20, 2026 15:51

bkirwi commented Mar 20, 2026

Alright, thank you all for the review!

I'll get this merged so we can pull it in for next week's release.

@bkirwi bkirwi merged commit b33ffcb into MaterializeInc:main Mar 20, 2026
127 checks passed
DAlperin pushed a commit that referenced this pull request Mar 20, 2026
