-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monotonic select #18546
Monotonic select #18546
Conversation
f2ba0d7
to
3755c60
Compare
An evaluation of the first two commits in this PR can be found in this spreadsheet. In contrast to the evaluation of PR #14607, we included here a few additional queries with |
Important note for follow-up work: The plans produced here still sometimes set
Above, we see that |
99302cb
to
16933dd
Compare
I tried to schedule SQL-related nightly runs for this PR; however, all of the errors found are also happening in |
e0f5221
to
8895d80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggesting changes for an explain oversight!
b09d5c8
to
24d6932
Compare
@aalexandrov @teskje @jkosh44 I am tagging you as reviewers for this PR. The PR is an MVP for the feature of exploiting monotonic operator variants in single-time queries (i.e., one-shot
I can create follow-up issues for (1) and (3) above if you are in agreement. Even in its current version, we can already measure quite sizable query processing times gains for some queries (when the feature flag is turned on). @frankmcsherry Hope that it is OK with you to initiate review for this PR; otherwise, please let us know what other strategy to take! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surfaces changes LGTM
Just to be on the safe side, I scheduled a slow SQL Logic Test run on commit 8895d80, prior to introducing the feature flag. No errors were reported, so it looks like doing operator selection for one-shot |
I've been told this may be blocked on me! It shouldn't; if it's good to go, go for it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I was wondering about the risk involved with removing those "always consolidates". But with the ensure_consolidate
s we won't produce incorrect data, so that seems fine.
} | ||
} | ||
let plan = self.finalize_dataflow(dataflow, instance)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels very brittle that we have to rely on nobody modifying the finalized plan anymore for correctness. The "lock down the COMPUTE interface" part of #18496 can solve this, but I wonder if there is a way to short-term ensure that plan
is not modified accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is something to give some thought to as part of #18496, but I am afraid that trying to address this locally might exceed the scope of this PR. Still, we are introducing an unwritten API requirement here that until
needs to be assigned prior to calling Coordinator::finalize_dataflow
, which was not previously present. I wrote down this extra requirement now in the docstring of Coordinator::finalize_dataflow
.
24d6932
to
58e84c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, I have some minor questions / nits about naming and code organization.
} | ||
|
||
let debug_name = self.debug_name.to_string(); | ||
let (collection, errs) = collection.ensure_monotonic(move |data, diff| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that this check was added in #18415.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check was added by #18140 to both monotonic top-1 and to monotonic reduce; however, it was not added there to monotonic top-k. This is because monotonic top-k does not need to use explode_one
, and adding explode_one
to the other two cases was the intended change of #18140.
So the present PR complements that work and makes sure that we have an ensure_monotonic
check in all of our monotonic rendering paths.
.map_err(AdapterError::Internal) | ||
Plan::<mz_repr::Timestamp>::finalize_dataflow( | ||
dataflow, | ||
self.catalog() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess without tweaking the dataflow
construction logic this is not sufficient for triggering the refine_single_time_dataflow
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct. I did not change the setting of until
here for dataflow
, because otherwise, we'd show one-shot SELECT
plans on EXPLAIN PHYSICAL PLAN
when the feature flag is enabled. I propose that we leave figuring out the right output to produce for EXPLAIN
as part of #18089.
1b5a444
to
6230505
Compare
This commit fixes the invalid accumulation tests in the mzcompose cluster test workflows to use non-monotonic rendering whenever a min/max aggregate is employed. This is ensured by use of materialized views instead of one-off SELECTs in the tests, since the latter now exploit logical monotonicity.
This commit fixes the negative multiplicities tests in testdrive to use non-monotonic rendering whenever a min/max aggregate is employed. This is ensured by use of indexed views instead of one-off SELECTs in the tests, since the latter now exploit logical monotonicity.
This commit adds a monotonicity check with the EnsureMonotonic operator to the rendering of monotonic top-k. This step increases the consistency of the rendering code across monotonic top-k, monotonic top-1, and monotonic hierarchical reductions.
This commit introduces the feature flag enable_monotonic_oneshot_selects, turned off by default, and gates single-time operator selection to use monotonic plans for one-shot SELECTs behind this flag. As a consequence, after this commit, all tests in CI will by default execute against non-monotonic plans, as previously. The commit enables us to carefully evaluate when to make the functionality of monotonic plans for one-shot SELECTs available by default.
This commit includes code improvements to refactor plan refinements performed at the end of Plan::finalize_dataflow into separate functions, improving readability. In addition, the commit: (a) makes explicit in the documentation of the API of Coordinator::finalize_dataflow the new requirement to set the until field of the dataflow, and (b) adds a method to DataflowDescription to test if a dataflow refers to a single timestamp.
This commit refactors the method flag_snapshot added in connection with single-time dataflow refinement to a more general as_monotonic conversion method. The as_monotonic name reflects better the intent to upgrade plans to monotonic, but can also be used uniformly to set consolidation requirements. The commit also includes an as_monotonic method in CollationPlan, making the usage in single-time dataflow refinement more consistent.
This commit introduces the consolidate_named_if method in CollectionExt that allows us to conditionally consolidate a differential dataflow collection. The method is used in rendering in monotonic operators, which take a must_consolidate flag to indicate whether their input requires a consolidation step to turn logical into physical monotonicity. The method allows the uses in rendering to reduce code duplication and avoid mutability in collection assignments.
6230505
to
a80ee03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM! Thanks for the readability improvements!
This commit adds instrumentation to finalize_dataflow so that lowering from MIR to LIR as well as subsequent plan refinements can be observed through EXPLAIN OPTIMIZER TRACE. As part of the commit, the lowering step is abstracted into a separate function and the physical plan explain stage is renamed to finalize_dataflow, where mir_to_lir is but one of its components.
a80ee03
to
44c0888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you all for the reviews! |
Demonstration of one approach to introducing monotonicity for SELECT queries, by informing LIR plans of consolidation that when performed should result in collections with non-negative updates.
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-proto
label.