
Rework column_knowledge so it is recursion_safe #18323

Merged

Conversation

@aalexandrov aalexandrov commented Mar 22, 2023

Fixes #18161 (the basic case).

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

v1 is ready for review.

v2 is ready for review.

  • I swapped the order of this PR and Rework fold_constants so it is recursion_safe #18382 so I can merge the latter.
  • Consequently, the tests in the original v1 commit changed slightly.
  • The second commit (suffixed with (v2)) implements the "advanced version" from the issue description of #18161 (Rework column_knowledge so it is recursion_safe). This should be the focus of the review, so here are some tips:
    • Start with the new DatumKnowledge implementation and make sure that the join_assign and meet_assign methods adhere to the bounded lattice laws. I plan to add some proptest-style tests as a skunkworks project to validate those.
    • Proceed with changes made to the rest of the code based on the new API. In general this is mostly renaming union ⇒ join_assign and absorb ⇒ meet_assign.
    • Finally, review the new LetRec block implementation in the harvest method. The new implementation found a bug and improved coverage of the tests added in v1.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • There are no user-facing behavior changes.

@aalexandrov aalexandrov added A-optimization Area: query optimization and transformation A-COMPUTE Topics related to the Compute layer labels Mar 22, 2023
@aalexandrov aalexandrov self-assigned this Mar 22, 2023
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 81a3df6 to e11a42f Compare March 23, 2023 13:00
@aalexandrov aalexandrov marked this pull request as ready for review March 23, 2023 13:00
@aalexandrov aalexandrov requested a review from a team as a code owner March 23, 2023 13:00
@aalexandrov aalexandrov requested review from philip-stoev and ggevay and removed request for a team March 23, 2023 13:03

@philip-stoev philip-stoev left a comment


With respect to hand-crafted queries, I have given a few to Alexander and he will paste them in an SLT file.

With respect to randomized testing:

No plan regressions with non WMR queries.

With WMR queries, the optimization kicks in as described -- all the changes are localized to the Maps and there were no plan regressions as compared to the prior commit, no wrong results.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from 08acc37 to d62885a Compare March 24, 2023 17:35

@ggevay ggevay left a comment


These WMR optimizations seem to be harder than I thought, see comments below...

```rust
// barriers for this optimization.

// A map to hold knowledge from shadowed bindings
let mut shadowed_knowledge = BTreeMap::new();
```

NormalizeLets already assumes that there is no shadowing, erroring out if it encounters shadowing. I'm guessing you are handling shadowing here because the ::Let block of ColumnKnowledge is handling it. But I guess the decision that we won't have shadowing was made later than the ::Let block of ColumnKnowledge was written. I'm now not sure whether to

  • Remove the shadowing handling from ColumnKnowledge (and any other code that we touch) completely.
  • Just don't handle it in any new code.
  • Handle it where it's easy to handle.

I guess we can leave your code as it is, this is more a question for the future.


I have a similar issue in the typechecker: I didn't know shadowing was disallowed, so I wrote code to mostly account for it (panicking when a let rec shadows another one).

Considering the typechecker's role, I'm tempted to check for _non-_shadowing (a/k/a "freshness") just to make sure nobody accidentally broke the invariant. Does that seem reasonable?


We have talked about introducing the invariant, and it is true that NormalizeLets enforces it. I guess I'm just paranoid here. In alignment with that, I should probably change this code to panic if it sees a shadowing ID.

ggevay commented Mar 25, 2023

Wait a moment, we have an even worse problem: harvest modifies stuff based on the current knowledge. If we call harvest during the fixpoint loop, it will butcher our expressions based on incomplete knowledge. I think the only way to fix this is to refactor the entire ColumnKnowledge to make it possible to just gather knowledge but not apply it, and then apply it in a separate step. So in the fixpoint loop we would just gather knowledge, and then after we reached a fixpoint we would apply it. But this is a pretty big refactoring...
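To make the gather/apply split concrete, here is a minimal sketch with entirely hypothetical toy types (`Expr`, `gather`, `apply` are made up for illustration; this is not the PR's code): the read-only gather phase can run safely inside a fixpoint loop, and the mutating apply phase runs exactly once afterwards.

```rust
use std::collections::BTreeMap;

// Toy IR: an expression is either a Get on a binding or a literal.
#[derive(Debug, PartialEq)]
enum Expr {
    Get(u64),
    Literal(i64),
}

// Phase 1 (read-only): harvest information without touching the tree,
// so it is safe to call repeatedly while iterating to a fixpoint.
fn gather(expr: &Expr, referenced: &mut Vec<u64>) {
    if let Expr::Get(id) = expr {
        referenced.push(*id);
    }
}

// Phase 2 (mutating): apply the final knowledge exactly once, e.g.
// inline a Get whose value is known to be a constant.
fn apply(expr: &mut Expr, knowledge: &BTreeMap<u64, i64>) {
    if let Expr::Get(id) = expr {
        let id = *id; // copy out so we can replace *expr below
        if let Some(v) = knowledge.get(&id) {
            *expr = Expr::Literal(*v);
        }
    }
}

fn main() {
    let mut expr = Expr::Get(1);
    let mut referenced = Vec::new();
    gather(&expr, &mut referenced);
    assert_eq!(referenced, vec![1]);

    let knowledge = BTreeMap::from([(1u64, 7i64)]);
    apply(&mut expr, &knowledge);
    assert_eq!(expr, Expr::Literal(7));
}
```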

ggevay commented Mar 25, 2023

How about having just a basic implementation where we just don't propagate knowledge through Gets whose Id is in recursive_ids? We would make the knowledge on these Ids Top, i.e., that we don't know anything. This would still propagate knowledge in many places in the plan, somewhat similarly to what my basic implementation of PredicatePushdown covers.
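A tiny sketch of that basic approach (toy knowledge type and a made-up function name; the real transform works on MirRelationExpr): Gets on recursive Ids yield Top, i.e., no knowledge, while everything else still benefits from the context.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy stand-in for per-column knowledge; Top means "we know nothing".
#[derive(Clone, Debug, PartialEq)]
enum Knowledge {
    Top,
    NonNull,
}

// Knowledge for a Get: never propagate through recursive Ids.
fn knowledge_for_get(
    id: u64,
    recursive_ids: &BTreeSet<u64>,
    context: &BTreeMap<u64, Knowledge>,
) -> Knowledge {
    if recursive_ids.contains(&id) {
        // Pessimistic: a recursive binding may produce anything.
        Knowledge::Top
    } else {
        context.get(&id).cloned().unwrap_or(Knowledge::Top)
    }
}

fn main() {
    let recursive: BTreeSet<u64> = BTreeSet::from([1]);
    let context = BTreeMap::from([(1, Knowledge::NonNull), (2, Knowledge::NonNull)]);
    // Id 1 is recursive, so its recorded knowledge is ignored.
    assert_eq!(knowledge_for_get(1, &recursive, &context), Knowledge::Top);
    assert_eq!(knowledge_for_get(2, &recursive, &context), Knowledge::NonNull);
}
```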

ggevay commented Mar 25, 2023

Btw. MonotonicFlag has the exact same problem.

Edit: Making recursive_ids start from bottom is the "optimistic" approach (because it's good for us if the input constraints to our operators are tight) and making them start from top is the "pessimistic" one.

ggevay commented Mar 25, 2023

(Or, maybe a shortcut solution to avoid the big refactoring but still do a proper fixpoint loop would be to clone the tree at every step of the fixpoint loop, call harvest on the clone, and then throw away the mutated tree and continue working with the original. But I'm not 100% sure this would be correct: It only works if harvest doesn't create knowledge based on the mutations it performs, because the mutations would be invalid. E.g., let's say it temporarily thinks that some predicate is impossible based on the value field, so it wipes away the entire collection, and then concludes from the empty collection that it's not nullable. This looks messy, so I'd say let's just go with the basic implementation for now. Edit: The above example is actually fine.)

Edit 2: Alex was also suggesting the cloning trick here. Now I was thinking about this again, and I think this would work! Even if optimize performs any changes, those changes can't go out of what the current knowledge would imply. In other words, if the changes introduced new knowledge beyond what the current knowledge already implies, then the changes would be invalid.
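A minimal model of the cloning trick (toy types; `harvest` here is a stand-in for the real destructive harvest): each iteration works on a throw-away clone, so only the knowledge survives, never the mid-loop mutations.

```rust
// Toy model: an "expression" is just a Vec<i64>, and "knowledge" is the
// minimum value we believe any element can take. `harvest` both infers
// knowledge and (destructively) rewrites the expression, mimicking the
// real harvest. The fixpoint loop therefore works on clones and only
// keeps the inferred knowledge.
fn harvest(expr: &mut Vec<i64>, knowledge: &mut i64) {
    for x in expr.iter_mut() {
        *knowledge = (*knowledge).min(*x);
        *x -= 1; // destructive rewrite we want to discard mid-loop
    }
}

fn fixpoint_knowledge(expr: &Vec<i64>) -> i64 {
    let mut knowledge = i64::MAX;
    loop {
        let mut clone = expr.clone(); // throw-away copy
        let prev = knowledge;
        harvest(&mut clone, &mut knowledge);
        if knowledge == prev {
            return knowledge; // stabilized; `expr` itself was never touched
        }
    }
}

fn main() {
    let expr = vec![3, 5];
    assert_eq!(fixpoint_knowledge(&expr), 3);
    assert_eq!(expr, vec![3, 5]); // original tree unchanged
}
```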

ggevay commented Mar 26, 2023

MonotonicFlag does absorb/meet in a fixpoint loop, starting from top 😮 But interestingly, this works correctly there, because (get this) MonotonicFlag is monotonic: if we flip an input to monotonic, the result either stays non-monotonic, stays monotonic, or flips from non-monotonic to monotonic, but can't switch from monotonic to non-monotonic. This means that monotonicity can't oscillate between false and true across iterations. But unfortunately, nullability is not monotonic. @aalexandrov, maybe you were also thinking about something like this when you wrote this comment?

Edit: Also wouldn't work for PredicatePushdown, because an arbitrary predicate can, of course, oscillate.

ggevay commented Mar 26, 2023

Terminological proposal:

Which way is up in our lattices is just a terminological convention, i.e., everything we say about our lattices would work the same way if we flip "up" and "down" everywhere. This is true for each of our optimizations separately, i.e., we need to decide which way is up for each of our optimizations. It would be great to come up with a way to decide which way is up that has some consistency across optimizations.

Many of our optimizations are about exploiting that some operator's input is constrained in some way or its output will be subjected to some constraints (and then the optimization is to do less work by not handling those inputs that won't occur or not emitting those outputs that would be eliminated later). I'd propose to use downwards motion for getting to know about tighter constraints on the inputs or outputs of our operators. (This is also consistent with slide 3 here.) For example:

  • For nullability, non-nullable is less than nullable.
  • For monotonicity, monotonic is less than non-monotonic.
  • For value, adding a constraint like x=7 (or x>7 in the future) is down.
  • The impossible constraint is the bottom.
  • When we don't know of any constraints, that's the top.
  • In PredicatePushdown (in whose name "down" is understood on the operator tree, i.e., a completely different "down" than what I'm talking about here...), I think a lattice element will be a set of arbitrary predicates expressed as MirScalarExprs. And then we are joining them (I'm saying "join" in the lattice-theoretical sense) when we are intersecting (!) sets of predicates to collect predicate info from the Gets of an Id. (Intersection takes away from the constraints, thereby weakening them, and therefore it's up.)
  • Our optimizations need to compute upper bounds, as explained here.
  • Edit: This terminology unfortunately breaks down a bit with unique key inference, where it would be great to make top be all the possible subsets of columns, but that should be bottom in the above terminology...

I'm also thinking about what to do when a component of our lattice is a boolean (e.g., monotonicity or nullability). It sounds great at first glance to always set up the boolean in a way that false < true, so that we can consistently map && and || to meet and join. But this would imply the following:

  • nullable is ok (but the current comment on nullable is wrong),
  • monotonic is not ok. This sounds a bit sad, to be honest, so maybe we don't need to follow this convention on booleans everywhere.
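As an illustration only (not code from this PR): with the convention false < true, a boolean component is a two-element lattice whose join is `||` and whose meet is `&&`, and the lattice laws hold trivially.

```rust
// Two-element lattice on bool with false < true: join = ||, meet = &&.
fn join(a: bool, b: bool) -> bool { a || b } // least upper bound
fn meet(a: bool, b: bool) -> bool { a && b } // greatest lower bound

fn main() {
    // Under this convention, joining "non-nullable" (false) with
    // "nullable" (true) yields "nullable", the weaker (upper) fact.
    assert_eq!(join(false, true), true);
    assert_eq!(meet(false, true), false);
    // Idempotence and commutativity, two of the lattice laws.
    for a in [false, true] {
        for b in [false, true] {
            assert_eq!(join(a, a), a);
            assert_eq!(join(a, b), join(b, a));
            assert_eq!(meet(a, b), meet(b, a));
        }
    }
}
```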

aalexandrov commented Mar 27, 2023

I'd propose to use downwards motion for getting to know about tighter constraints on the inputs or outputs of our operators.

👍 I think this is consistent with most of the literature I checked so far.

nullable is ok (but the current comment on nullable is wrong),

I realized that as well on Friday, so I'm going to remove it.

In addition, specifically for DatumKnowledge I propose

```rust
enum DatumKnowledge {
  Any(/* nullable: */ bool),
  Literal(/* value: */ Result<mz_repr::Row, EvalError>, /* typ: */ ScalarType),
  Nothing,
}
```

where

`DatumKnowledge::Nothing < DatumKnowledge::Literal(_, _) < DatumKnowledge::Any(true)`

and

`DatumKnowledge::Any(false) < DatumKnowledge::Any(true)`

As a diagram using the INT type:

```mermaid
graph BT;
    Nothing --> lit_dots("Literal(..., Int64)");
    Nothing --> lit_1("Literal(Ok(1), Int64)");
    Nothing --> lit_2("Literal(Ok(2), Int64)");
    Nothing --> lit_err("Literal(Err(...), Int64)");
    Nothing --> lit_null("Literal(Ok(null))");
    lit_dots --> any_false("Any(false)");
    lit_1 --> any_false("Any(false)");
    lit_2 --> any_false("Any(false)");
    lit_err --> any_false("Any(false)");
    lit_null --> any_true("Any(true)");
    any_false --> any_true;
```
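
A simplified, self-contained sketch of what the lattice join could look like for the proposed enum (literal values reduced to `Option<i64>`, where `None` models the null literal; this is illustrative, not the PR's actual implementation). Note in particular the case the diagram forces: joining a null literal with Any(false) must give Any(true).

```rust
#[derive(Clone, Debug, PartialEq)]
enum DatumKnowledge {
    Nothing,               // bottom
    Lit(Option<i64>),      // None models the null literal
    Any { nullable: bool },
}

use DatumKnowledge::*;

fn is_nullable(k: &DatumKnowledge) -> bool {
    match k {
        Nothing => false,
        Lit(v) => v.is_none(),
        Any { nullable } => *nullable,
    }
}

// Least upper bound per the diagram: Nothing is bottom, Any(true) is top.
fn join(a: &DatumKnowledge, b: &DatumKnowledge) -> DatumKnowledge {
    match (a, b) {
        (Nothing, x) | (x, Nothing) => x.clone(),
        (Lit(x), Lit(y)) if x == y => Lit(*x),
        // Otherwise the result is Any, nullable iff either side is.
        _ => Any { nullable: is_nullable(a) || is_nullable(b) },
    }
}

fn main() {
    // Distinct non-null literals join to Any(false).
    assert_eq!(join(&Lit(Some(1)), &Lit(Some(2))), Any { nullable: false });
    // The case that bit this PR: null literal join Any(false) = Any(true).
    assert_eq!(join(&Lit(None), &Any { nullable: false }), Any { nullable: true });
    assert_eq!(join(&Nothing, &Lit(Some(7))), Lit(Some(7)));
    assert_eq!(
        join(&Any { nullable: false }, &Any { nullable: true }),
        Any { nullable: true }
    );
}
```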

@aalexandrov

I think the only way to fix this is to refactor the entire ColumnKnowledge to make it possible to just gather knowledge but not apply it, and then apply it in a separate step.

I was thinking that as well. In the "advanced case" description of the associated issue I was proposing to always operate on a clone of the values in the fixpoint loop. This is a more pragmatic solution that comes at some allocation cost.

ggevay commented Mar 27, 2023

The new DatumKnowledge enum sounds really good! I really like the diagram, it makes things crystal clear! And it's good that the union of Some(1) and Some(2) will be Any(false).

Ok, great, so then we have workable plans for both the basic and advanced versions. And with the cloning solution, even the advanced version wouldn't be a prohibitively big effort.

@aalexandrov

Yes, I would say the strategy should be as follows:

  1. Implement the advanced version with cloning and a bounded amount of effort. Let's say M = max(n * 2, C), where n is the sum of all LetRec binding arities and C is some constant.
  2. If the above loop doesn't reach a fixpoint after M iterations, descend with Any(true) (the least precise knowledge) for all bindings.

ggevay commented Mar 27, 2023

Ok, sounds good!

```rust
// iterations. As a consequence, the "no shadowing"
// assertion at the beginning of this block will fail at the
// inner LetRec for the second outer LetRec iteration.
for id in ids.iter() {
```

@ggevay / @mgree This was somewhat surprising to me, but at the end of the day it is a logical consequence of:

  1. running the fixpoint loop above,
  2. allowing nested LetRec blocks.

I believe that this "remove from context map" requirement will carry over to other transforms that have a loop in their LetRec { .. } case.

ggevay commented Mar 28, 2023

Unfortunately, test/sqllogictest/joins.slt has some wrong results in some non-LetRec query.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 0cc7884 to 288948c Compare March 28, 2023 17:38
aalexandrov commented Mar 28, 2023

I probably messed up the lattice logic somewhere or I substituted the old with the new API in an inconsistent way. I will fix this tomorrow morning.

Update: I did mess up the lattice logic somewhere.

ggevay commented Mar 28, 2023

Ok. (It's also possible that the slightly smarter lattice now triggered some bug that was already present before.)

@philip-stoev

Item No 1. This query is incorrectly constant-folded to return no rows in this branch, while the prior commit correctly evaluates it in full and returns 1 row:

```sql
DROP TABLE IF EXISTS t1 CASCADE;
CREATE TABLE t1 (f1 INTEGER, f2 INTEGER NOT NULL);
INSERT INTO t1 VALUES (1,1);
SELECT * FROM t1 a2 WHERE NULLIF(f2, f1) IS NULL;
```

philip-stoev commented Mar 29, 2023

Item No 2. The simplified query from the SLT test is this:

```sql
SELECT (SELECT oid FROM mz_catalog.mz_indexes) IS NULL
```

The plan is missing the Map (error("more than one record produced in subquery")) guard, so it incorrectly returns multiple rows for a query that should only ever return 1 row, if any.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 288948c to a83bcfb Compare March 29, 2023 10:41
aalexandrov commented Mar 29, 2023

I think all issues reported so far were caused by me not correctly implementing the lattice join operator for the

  • `Lit { null, _ } join Any { false }` and
  • `Any { false } join Lit { null, _ }`

cases.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from b62338d to d44a1e2 Compare March 29, 2023 16:05

@ggevay ggevay left a comment


Great!

```rust
// counts that will be enfoced when actually evaluating the
// loop (see #18362 and #16800).
if curr_iteration >= max_iterations {
if curr_iteration >= 3 * let_rec_arity {
```

Sorry, one more thing: I think we need >, because we need 1 iteration to detect that there is no change.

```rust
// the following conditions is met:
//
// 1. The knowledge bindings have stabilized at a fixpoint.
// 1. No fixpoint was found after `max_iterations`. If this
```

```rust
loop {
// Check for condition (1).
// TODO: This should respect the soft and hard max iteration
// counts that will be enfoced when actually evaluating the
```

`enfoced`

```rust
let max_iterations = 100;
let mut curr_iteration = 0;
loop {
// Check for condition (1).
```

2

```rust
break;
}

// Check for condition (2).
```

1

```rust
curr_iteration += 1;
}

// Descend into the value and the body with the inferred knowledge.
```

Descend into the values with the inferred knowledge.


@philip-stoev philip-stoev left a comment


No wrong results with d44a1e2 and no panics.

The query plans on WMR queries are invariably better as compared to the parent commit.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from d44a1e2 to 0802c4d Compare March 30, 2023 11:15
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from cba471f to b80a92c Compare March 30, 2023 14:15
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from b80a92c to 9f7480f Compare March 30, 2023 14:57
@aalexandrov aalexandrov merged commit 24c5793 into MaterializeInc:main Mar 30, 2023
@aalexandrov aalexandrov deleted the 18161-letrec-column_knowledge branch March 30, 2023 16:55