
[Oneshot Sources] Fix bad assert in storage worker reconciliation#35556

Merged
patrickwwbutler merged 1 commit into MaterializeInc:main from
patrickwwbutler:patrick/oneshot-non-existent
Mar 20, 2026


@patrickwwbutler
Contributor

Motivation

fixes https://github.com/MaterializeInc/database-issues/issues/11255

Description

Fixes a mistake in a previously written assert: it was written as though it panics when its condition is true, but mz_ore::soft_assert_or_log! fires when its condition is false. This PR inverts the condition and improves the variable names and error message to clarify the intent.
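A minimal sketch of why the original condition was backwards, assuming (as the description says) that the assert fires when its condition is false. The helper functions below are hypothetical; only the variable names come from the diff in this PR.

```rust
/// Returns true when an assert with this condition would fire,
/// i.e. when the asserted condition evaluates to false.
fn fires(condition: bool) -> bool {
    !condition
}

/// The original condition: true exactly in the bad case
/// (dropping something that was never created), so the assert
/// stayed silent there and fired in every valid case instead.
fn old_condition(created: bool, dropped: bool) -> bool {
    !created && dropped
}

/// The corrected condition: false exactly in the bad case.
fn new_condition(to_create: bool, to_drop: bool) -> bool {
    !(!to_create && to_drop)
}

fn main() {
    // Walk all four input combinations and show which assert fires.
    for to_create in [false, true] {
        for to_drop in [false, true] {
            println!(
                "to_create={} to_drop={} old fires={} new fires={}",
                to_create,
                to_drop,
                fires(old_condition(to_create, to_drop)),
                fires(new_condition(to_create, to_drop)),
            );
        }
    }
}
```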

@patrickwwbutler patrickwwbutler requested review from a team and def- March 19, 2026 18:55
@patrickwwbutler patrickwwbutler requested a review from a team as a code owner March 19, 2026 18:55
@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@patrickwwbutler
Contributor Author

Started a nightly to check that the error doesn't recur: https://buildkite.com/materialize/nightly/builds/15770/steps/canvas

 mz_ore::soft_assert_or_log!(
-    !created && dropped,
-    "dropped non-existent oneshot source"
+    !(!to_create && to_drop),
@patrickwwbutler (Contributor Author) Mar 19, 2026


I was waffling about whether this should be:

!(!to_create && to_drop)

or

to_create || !to_drop

I figured the first option communicates the intent better: we want to ensure that reconciliation never tries to drop an ingestion it did not create, since it should already have created anything it wants to exist.

Ideally this would be assert_false(!to_create && to_drop), but alas we do not have mz_ore::soft_assert_false_or_log!
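The two spellings above are equivalent by De Morgan's law, so the choice is purely about readability. A quick exhaustive check over all four input combinations, as a small sketch (the function names are hypothetical):

```rust
/// First spelling: "not (dropping without creating)".
fn negated_form(to_create: bool, to_drop: bool) -> bool {
    !(!to_create && to_drop)
}

/// Second spelling: the same condition after applying De Morgan's law.
fn de_morgan_form(to_create: bool, to_drop: bool) -> bool {
    to_create || !to_drop
}

fn main() {
    // Two booleans give only four cases, so compare them all.
    for to_create in [false, true] {
        for to_drop in [false, true] {
            assert_eq!(
                negated_form(to_create, to_drop),
                de_morgan_form(to_create, to_drop)
            );
        }
    }
    println!("both forms agree on all inputs");
}
```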

@martykulma (Contributor) left a comment


Nice!

The one you chose looks good to me. Here's something to add to your waffling - you can also just add an intermediate variable for readability.

let drop_without_create = to_drop && !to_create;
mz_ore::soft_assert_or_log!(
    !drop_without_create,
    ...
);
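A self-contained sketch of the intermediate-variable style. Because mz_ore is not available here, soft_assert_or_log! is replaced by a local stand-in that only prints to stderr and returns whether the check passed; it does not reproduce the real macro's behavior. The check_reconciliation function and its message are hypothetical.

```rust
/// Stand-in for mz_ore::soft_assert_or_log! so this sketch compiles alone.
/// It prints a message when the condition is false and returns the result.
macro_rules! soft_assert_or_log {
    ($cond:expr, $($msg:tt)+) => {{
        let ok: bool = $cond;
        if !ok {
            eprintln!("soft assertion failed: {}", format_args!($($msg)+));
        }
        ok
    }};
}

/// Hypothetical per-ingestion check; `to_create` and `to_drop` mirror the
/// variable names from the PR.
fn check_reconciliation(id: u64, to_create: bool, to_drop: bool) -> bool {
    // Naming the bad case keeps the assert free of double negation.
    let drop_without_create = to_drop && !to_create;
    soft_assert_or_log!(
        !drop_without_create,
        "reconciliation tried to drop oneshot ingestion {} that it did not create",
        id
    )
}

fn main() {
    // Created and not dropped: the check passes silently.
    assert!(check_reconciliation(1, true, false));
    // Dropped without being created: the stand-in reports a failure.
    assert!(!check_reconciliation(2, false, true));
}
```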

I think for a quick fix this is good, but we should definitely take a broader accounting of the flow. I started looking at how we reconcile, and as far as I can tell, the storage controller wouldn't send a CancelOneshotIngestion.

If a replica disconnects (because of a network issue or a restart), the storage controller should call add_replica(), which first calls reduce on the command history. The reduced history doesn't store any CancelOneshotIngestion commands; instead, reduce removes the corresponding RunOneshotIngestion commands from the history.

It looks like oneshot ingestions are only kept in memory in envd, so if envd restarts, it also wouldn't send cancellations.
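A toy model of the history reduction described above, assuming each CancelOneshotIngestion removes the RunOneshotIngestion with the same ID and is then itself discarded. The Command enum, its u64 IDs, and this reduce function are hypothetical simplifications, not the storage controller's actual types.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified command type; the real storage protocol has
/// more variants and richer payloads.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Command {
    RunOneshotIngestion { id: u64 },
    CancelOneshotIngestion { id: u64 },
}

/// Compacts a history: a cancellation removes the matching run command,
/// and the cancellation itself is not kept.
fn reduce(history: &[Command]) -> Vec<Command> {
    // Collect every ingestion ID that has been cancelled.
    let cancelled: BTreeSet<u64> = history
        .iter()
        .filter_map(|c| match c {
            Command::CancelOneshotIngestion { id } => Some(*id),
            _ => None,
        })
        .collect();
    // Keep only runs that were never cancelled; drop all cancellations.
    history
        .iter()
        .filter(|c| match c {
            Command::RunOneshotIngestion { id } => !cancelled.contains(id),
            Command::CancelOneshotIngestion { .. } => false,
        })
        .cloned()
        .collect()
}

fn main() {
    let history = vec![
        Command::RunOneshotIngestion { id: 1 },
        Command::RunOneshotIngestion { id: 2 },
        Command::CancelOneshotIngestion { id: 1 },
    ];
    // A reconnecting replica would only be replayed the reduced history,
    // which contains no cancellations at all.
    let replayed = reduce(&history);
    println!("{replayed:?}");
}
```

In this model the replica never receives a cancellation during reconciliation; it only sees which ingestions should still exist.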

@patrickwwbutler patrickwwbutler merged commit 6904a6b into MaterializeInc:main Mar 20, 2026
332 of 334 checks passed
DAlperin pushed a commit that referenced this pull request Mar 20, 2026
…5556)

