Skip to content

bootstrap: fix as-of selection for compute reconciliation#26239

Merged
teskje merged 1 commit intoMaterializeInc:mainfrom
teskje:fix-compute-reconciliation
Mar 28, 2024
Merged

bootstrap: fix as-of selection for compute reconciliation#26239
teskje merged 1 commit intoMaterializeInc:mainfrom
teskje:fix-compute-reconciliation

Conversation

@teskje
Copy link
Copy Markdown
Contributor

@teskje teskje commented Mar 25, 2024

This PR fixes another issue with dataflow as-of bootstrapping that could disable compute reconciliation. The oversight here was that an envd restart can make index write frontiers regress (since they are reset to their newly selected read frontiers), which makes it likely that using these frontiers to determine an as-of for compute dataflows will produce an as-of that is earlier than the frontiers already installed dataflows have been allowed to compact to.

The solution to this issue is to select as-ofs based on the frontiers of (transitive) storage dependencies instead. Storage frontiers cannot regress, so they are guaranteed to provide an as-of that is far enough advanced for reconciliation to succeed.

The knowledge about transitive storage dependencies was not yet available during bootstrapping, so this PR adds it by converting the "index dependent MVs" collection step into a more generic "storage constraints" collection step that gathers all storage dependencies and storage dependants needed to constrain the initial timestamp for each dataflow.

Finally, this PR also merges the two bootstrap_*_as_of methods into a single one, to get rid of all the accumulated code and comment duplication.

Motivation

  • This PR fixes a recognized bug.

Fixes MaterializeInc/database-issues#7783

Tips for reviewer

I have not attempted to keep the changes minimal, so it's probably easier to review the two versions side-by-side, rather than as a line-by-line diff.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • N/A

@teskje teskje force-pushed the fix-compute-reconciliation branch 3 times, most recently from ade840a to bf0f6a9 Compare March 26, 2024 11:25
@teskje
Copy link
Copy Markdown
Contributor Author

teskje commented Mar 26, 2024

Nightlies look good. The failing output consistency checks are unrelated.

@teskje teskje marked this pull request as ready for review March 26, 2024 16:58
@teskje teskje requested a review from a team as a code owner March 26, 2024 16:58
@teskje teskje requested review from ParkMyCar and ggevay March 26, 2024 16:58
Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some small comments. Will finish the review tomorrow.

@teskje teskje force-pushed the fix-compute-reconciliation branch from bf0f6a9 to 949d7e4 Compare March 28, 2024 09:52
@teskje teskje requested a review from a team March 28, 2024 09:52
This commit fixes another issue with dataflow as-of bootstrapping that
could disable compute reconciliation. The oversight here was that an
envd restart can make index write frontiers regress (since they are
reset to their newly selected read frontiers), which makes it likely
that using these frontiers to determine an as-of for compute dataflows
will produce an as-of that is earlier than the frontiers of already
installed dataflows have been allowed to compact to.

The solution to this issue is to select as-ofs based on the frontiers of
(transitive) storage dependencies instead. Storage frontiers cannot
regress, so they are guaranteed to provide an as-of that is far enough
advanced for reconciliation to succeed.

The knowledge about transitive storage dependencies was not yet
available during bootstrapping, so this commit adds it by converting the
"index dependent MVs" collection step into a more generic "storage
constraints" collection step that gathers all storage dependencies and
storage dependants needed to constrain the initial timestamp for each
dataflow.

Finally, this commit merges the two `bootstrap_*_as_of` methods into a
single one, to get rid of all the accumulated code and comment
duplication.
@teskje teskje force-pushed the fix-compute-reconciliation branch from 949d7e4 to a30ddb1 Compare March 28, 2024 10:49
Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@teskje
Copy link
Copy Markdown
Contributor Author

teskje commented Mar 28, 2024

TFTR!

@teskje teskje merged commit f34e0bb into MaterializeInc:main Mar 28, 2024
@teskje teskje deleted the fix-compute-reconciliation branch March 28, 2024 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants