New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework column_knowledge
so it is recursion_safe
#18323
Rework column_knowledge
so it is recursion_safe
#18323
Conversation
81a3df6
to
e11a42f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With respect to hand-crafted queries, I have given a few to Alexander and he will paste them in an SLT file.
With respect to randomized testing:
No plan regressions with non WMR queries.
With WMR queries, the optimization kicks in as described -- all the changes are localized to the Map
s and there were no plan regressions as compared to the prior commit, no wrong results.
08acc37
to
d62885a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These WMR optimizations seem to be harder than I thought, see comments below...
// barriers for this optimization. | ||
|
||
// A map to hold knowledge from shadowed bindings | ||
let mut shadowed_knowledge = BTreeMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NormalizeLets
already assumes that there is no shadowing, erroring out if it encounters shadowing. I'm guessing you are handling shadowing here because the ::Let
block of ColumnKnowledge
is handling it. But I guess the decision that we won't have shadowing was made later than the ::Let
block of ColumnKnowledge
was written. I'm now not sure whether to
- Remove the shadowing handling from
ColumnKnowledge
(and any other code that we touch) completely. - Just don't handle it in any new code.
- Handle it where it's easy to handle.
I guess we can leave your code as it is, this is more a question for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a similar issue in the typechecker---I didn't know shadowing was disallowed, so I wrote code to mostly account for it (panicking when a let rec shadows another one).
Considering the typechecker's role, I'm tempted to check for _non-_shadowing (a/k/a "freshness") just to make sure nobody accidentally broke the invariant. Does that seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have talked about introducing the invariant and it is true that NormalizeLets enforces it. I guess I'm just paranoid here. I'm alignment with that I should probably change this code to panic if it sees a shadowing ID.
Wait a moment, we have an even worse problem: |
How about having just a basic implementation where we just don't propagate knowledge through Gets whose Id is in |
Btw. Edit: Making |
(Or, maybe a shortcut solution to avoid the big refactoring but still do a proper fixpoint loop would be to clone the tree at every step of the fixpoint loop, call Edit 2: Alex was also suggesting the cloning trick here. Now I was thinking about this again, and I think this would work! Even if |
Edit: Also wouldn't work for |
Terminological proposal: Which way is up in our lattices is just a terminological convention, i.e., everything we say about our lattices would work the same way if we flip "up" and "down" everywhere. This is true for each of our optimizations separately, i.e., we need to decide which way is up for each of our optimizations. It would be great to come up with a way to decide which way is up that has some consistency across optimizations. Many of our optimizations are about exploiting that some operator's input is constrained in some way or its output will be subjected to some constraints (and then the optimization is to do less work by not handling those inputs that won't occur or not emitting those outputs that would be eliminated later). I'd propose to use downwards motion for getting to know about tighter constraints on the inputs or outputs of our operators. (This is also consistent with slide 3 here.) For example:
I'm also thinking about what to do when a component of our lattice is a boolean (e.g., monotonicity or nullability). It sounds great at first glance to always set up the boolean in a way that
|
👍 I think this is consistent with most of the literature I checked so far.
I realized that as well on Friday, so I'm going to remove it. In addition, specifically for enum DatumKnowledge {
Any(/* nullable: */ bool),
Literal(/* value: */ Result<mz_repr::Row, EvalError>, /* typ: */ ScalarType),
Nothing
} where DatumKnowledge::Nothing < DatumKnowledge::Literal(_, _) < DatumKnowledge::Any(true)` and DatumKnowledge::Any(false) < DatumKnowledge::Any(true) As a diagram using the graph BT;
Nothing --> lit_dots("Literal(..., Int64)");
Nothing --> lit_1("Literal(Ok(1), Int64)");
Nothing --> lit_2("Literal(Ok(2), Int64)");
Nothing --> lit_err("Literal(Err(...), Int64)");
Nothing --> lit_null("Literal(Ok(null))");
lit_dots --> any_false("Any(false)");
lit_1 --> any_false("Any(false)");
lit_2 --> any_false("Any(false)");
lit_err --> any_false("Any(false)");
lit_null --> any_true("Any(true)");
any_false --> any_true;
|
I was thinking that as well. In the "advanced case" description of the associated issue I was proposing to always operate on a clone of the |
The new Ok, great, so then we have workable plans for both the basic and advanced versions. And with the cloning solution, even the advanced version wouldn't be a prohibitively big effort. |
Yes, I would say the strategy should be as follows:
|
Ok, sounds good! |
d62885a
to
b98fb71
Compare
3089c8b
to
0cc7884
Compare
// iterations. As a consequence, the "no shadowing" | ||
// assertion at the beginning of this block will fail at the | ||
// inner LetRec for the second outer LetRec iteration. | ||
for id in ids.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggevay / @mgree This was somewhat surprising to me, but at the end of the day a logical consequence of :
- running the fixpoint loop above,
- allowing nested
LetRec
blocks.
I believe that this "remove from context map" requirement will carry over to other transforms that have a loop in their LetRec { .. }
case.
Unfortunately, |
0cc7884
to
288948c
Compare
I probably messed up the lattice logic somewhere or I substituted the old with the new API in an inconsistent way. I will fix this tomorrow morning. Update: I did mess up the lattice logic somewhere. |
Ok. (It's also possible that the slightly smarter lattice now triggered some bug that was already present before.) |
Item No 1 . This query is incorrectly constant-folded to return no rows in this branch and correctly evaluated in full in the prior commit to return 1 row:
|
Item No 2. The simplified query from the SLT test is this:
The plan is missing the |
288948c
to
a83bcfb
Compare
I think all issues reported so far were caused by me not correctly implementing the lattice
cases. |
b62338d
to
d44a1e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
// counts that will be enfoced when actually evaluating the | ||
// loop (see #18362 and #16800). | ||
if curr_iteration >= max_iterations { | ||
if curr_iteration >= 3 * let_rec_arity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more thing: I think we need >
, because we need 1 iteration to detect that there is no change.
// the following conditions is met: | ||
// | ||
// 1. The knowledge bindings have stabilized at a fixpoint. | ||
// 1. No fixpoint was found after `max_iterations`. If this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loop { | ||
// Check for condition (1). | ||
// TODO: This should respect the soft and hard max iteration | ||
// counts that will be enfoced when actually evaluating the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enfoced
let max_iterations = 100; | ||
let mut curr_iteration = 0; | ||
loop { | ||
// Check for condition (1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2
break; | ||
} | ||
|
||
// Check for condition (2). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1
curr_iteration += 1; | ||
} | ||
|
||
// Descend into the value and the body with the inferred knowledge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Descend into the values with the inferred knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No wrong results with d44a1e2 and no panics.
The query plans on WMR queries are invariably better as compared to the parent commit.
d44a1e2
to
0802c4d
Compare
cba471f
to
b80a92c
Compare
Move `EXPLAIN` tests from `with_mutually_recursive.slt` to a file that lives in the `transform/` subfolder.
b80a92c
to
9f7480f
Compare
Fixes #18161 (the basic case).
Motivation
Tips for reviewer
v1
is ready for review.column_knowledge
so it isrecursion_safe
#18161. The comments on the*.slt
tests also describe some cases where we can be smarter, but this is left as future work.v2
is ready for review.fold_constants
so it isrecursion_safe
#18382 so I can merge the latter.v1
commit changed slightly.(v2)
) implements the "advanced version" of the Reworkcolumn_knowledge
so it isrecursion_safe
#18161 issue description. This should be the focus of the review, so here are some tips:DatumKnowledge
implementation and make sure that thejoin_assign
andmeet_assign
methods adhere to the bounded lattice laws. I plan to add some proptest-style tests as a skunkworks project to validate those.union ⇒ join_assign
andabsorb ⇒ meet_assign
.LeetRec
block implementation in the harvest method. The new implementation found a bug and improved coverage of the tests added inv1
.Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-proto
label.