Skip to content

Conversation

@DAlperin
Copy link
Member

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@DAlperin DAlperin marked this pull request as ready for review May 28, 2025 19:47
@DAlperin DAlperin requested a review from a team as a code owner May 28, 2025 19:47
@DAlperin DAlperin requested a review from bkirwi May 28, 2025 19:47
@bkirwi
Copy link
Contributor

bkirwi commented May 28, 2025

I'll follow up with a full review when I can, but two quick things to check in on:

  • Code like p.diffs_sum(metrics).unwrap_or(0) suggests that missing sums are equivalent to 0, but in other places None indicates that we haven't calculated the sum. I believe this could cause false positives.
  • We can't add methods like pub fn diffs_sum(&self, metrics: &ColumnarMetrics) -> Option<i64> {, because diffs are not guaranteed to be i64. (Though they usually are!) This may need to be a type parameter.

@DAlperin
Copy link
Member Author

Ah yep. I made it generic, but most invocations now call it like

 p.diffs_sum::<Diff>(metrics)
                    .expect("new batch must have diffs sum")

where Diff is the mz_repr::Diff that we seem to be hardcoding basically everywhere we describe diff types. I'm not clear if we should be adding more generics to push further up the the caller or if this is fine.

@DAlperin DAlperin force-pushed the dov/assert-diffs-sum branch from df60a3c to 661fc16 Compare May 28, 2025 22:10
.iter()
.filter_map(|p| p.diffs_sum::<Diff>(&metrics.columnar))
.sum::<Diff>()
.encode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a forward-compat hazard, since older versions don't have this field in the schema and would discard it.

Normally I'd suggest doing what you did for other PRs, but in this case I think it may be fine without a new flag... the hollow run stuff is already behind the max_run_len rollout flag, which is off in staging / prod. Though it is on in CI, so we may have to do some tweaking there.

Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Overall shape seems good.

There remains a bunch of almost-duplicated code with subtle differences, which makes me a bit nervous and is kind of hard to review. I made one suggestion at a small scale (summing diffs of parts) but if there are any other ideas you think have a good cost-benefit ratio I think they're probably worthwhile!

There's a fair bit of red in the nightlies, but I haven't taken a look in detail - let me know if you want a second pair of eyes there.

Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of code movement so this is a little hard to verify... but I think this is good! (And nightlies look pretty clean.) Appreciate all the cleanup.

One nit that I think is worth dealing with before merging, but otherwise this seems good to go.

We've verified that the fixed version of this still errors on the bad change in CI, yeah?

@def-
Copy link
Contributor

def- commented Jun 4, 2025

We've verified that the fixed version of this still errors on the bad change in CI, yeah?

I did a run here: https://buildkite.com/materialize/nightly/builds/12213. Seeing lots of

platform-checks-materialized-1       | 2025-06-04T16:39:14.746851Z  thread 'persist:0008' panicked at src/persist-client/src/internal/trace.rs:1018:17: assertion `left == right` failed: merge res diffs sum (-1406225) did not match spine batch diffs sum (-919999)

so it seems good!

@DAlperin DAlperin merged commit 9521044 into MaterializeInc:main Jun 6, 2025
260 of 263 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants