
Rework column_knowledge so it is recursion_safe #18323

Merged

Conversation

@aalexandrov aalexandrov commented Mar 22, 2023

Fixes #18161 (the basic case).

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

v1 is ready for review.

v2 is ready for review.

  • I swapped the order of this PR and Rework fold_constants so it is recursion_safe #18382 so I can merge the latter.
  • Consequently, the tests in the original v1 commit changed slightly.
  • The second commit (suffixed with (v2)) implements the "advanced version" from the issue description of #18161 (Rework column_knowledge so it is recursion_safe). This should be the focus of the review, so here are some tips:
    • Start with the new DatumKnowledge implementation and make sure that the join_assign and meet_assign methods adhere to the bounded lattice laws. I plan to add some proptest-style tests as a skunkworks project to validate those.
    • Proceed with changes made to the rest of the code based on the new API. In general this is mostly renaming union ⇒ join_assign and absorb ⇒ meet_assign.
    • Finally, review the new LetRec block implementation in the harvest method. The new implementation found a bug and improved coverage of the tests added in v1.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • There are no user-facing behavior changes.

@aalexandrov aalexandrov added A-optimization Area: query optimization and transformation A-COMPUTE Topics related to the Compute layer labels Mar 22, 2023
@aalexandrov aalexandrov self-assigned this Mar 22, 2023
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 81a3df6 to e11a42f Compare March 23, 2023 13:00
@aalexandrov aalexandrov marked this pull request as ready for review March 23, 2023 13:00
@aalexandrov aalexandrov requested a review from a team as a code owner March 23, 2023 13:00
@aalexandrov aalexandrov requested review from philip-stoev and ggevay and removed request for a team March 23, 2023 13:03

@philip-stoev philip-stoev left a comment


With respect to hand-crafted queries, I have given a few to Alexander and he will paste them in an SLT file.

With respect to randomized testing:

No plan regressions with non WMR queries.

With WMR queries, the optimization kicks in as described -- all the changes are localized to the Maps and there were no plan regressions as compared to the prior commit, no wrong results.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from 08acc37 to d62885a Compare March 24, 2023 17:35

@ggevay ggevay left a comment


These WMR optimizations seem to be harder than I thought, see comments below...

```rust
// barriers for this optimization.

// A map to hold knowledge from shadowed bindings
let mut shadowed_knowledge = BTreeMap::new();
```

NormalizeLets already assumes that there is no shadowing, erroring out if it encounters shadowing. I'm guessing you are handling shadowing here because the ::Let block of ColumnKnowledge is handling it. But I guess the decision that we won't have shadowing was made later than the ::Let block of ColumnKnowledge was written. I'm now not sure whether to

  • Remove the shadowing handling from ColumnKnowledge (and any other code that we touch) completely.
  • Just don't handle it in any new code.
  • Handle it where it's easy to handle.

I guess we can leave your code as it is, this is more a question for the future.


I have a similar issue in the typechecker: I didn't know shadowing was disallowed, so I wrote code to mostly account for it (panicking when a let rec shadows another one).

Considering the typechecker's role, I'm tempted to check for _non-_shadowing (a/k/a "freshness") just to make sure nobody accidentally broke the invariant. Does that seem reasonable?


We have talked about introducing the invariant, and it is true that NormalizeLets enforces it. I guess I'm just paranoid here. In alignment with that, I should probably change this code to panic if it sees a shadowing ID.

ggevay commented Mar 25, 2023

Wait a moment, we have an even worse problem: harvest modifies stuff based on the current knowledge. If we call harvest during the fixpoint loop, it will butcher our expressions based on incomplete knowledge. I think the only way to fix this is to refactor the entire ColumnKnowledge to make it possible to just gather knowledge but not apply it, and then apply it in a separate step. So in the fixpoint loop we would just gather knowledge, and then after we reached a fixpoint we would apply it. But this is a pretty big refactoring...
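To make the gather/apply split concrete, here is a minimal sketch with entirely hypothetical toy types (`Expr`, `gather`, `apply` are made up for illustration; this is not the PR's code): the read-only gather phase can run safely inside a fixpoint loop, and the mutating apply phase runs exactly once afterwards.

```rust
use std::collections::BTreeMap;

// Toy IR: an expression is either a Get on a binding or a literal.
#[derive(Debug, PartialEq)]
enum Expr {
    Get(u64),
    Literal(i64),
}

// Phase 1 (read-only): harvest information without touching the tree,
// so it is safe to call repeatedly while iterating to a fixpoint.
fn gather(expr: &Expr, referenced: &mut Vec<u64>) {
    if let Expr::Get(id) = expr {
        referenced.push(*id);
    }
}

// Phase 2 (mutating): apply the final knowledge exactly once, e.g.
// inline a Get whose value is known to be a constant.
fn apply(expr: &mut Expr, knowledge: &BTreeMap<u64, i64>) {
    if let Expr::Get(id) = expr {
        let id = *id; // copy out so we can replace *expr below
        if let Some(v) = knowledge.get(&id) {
            *expr = Expr::Literal(*v);
        }
    }
}

fn main() {
    let mut expr = Expr::Get(1);
    let mut referenced = Vec::new();
    gather(&expr, &mut referenced);
    assert_eq!(referenced, vec![1]);

    let knowledge = BTreeMap::from([(1u64, 7i64)]);
    apply(&mut expr, &knowledge);
    assert_eq!(expr, Expr::Literal(7));
}
```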

ggevay commented Mar 25, 2023

How about having just a basic implementation where we just don't propagate knowledge through Gets whose Id is in recursive_ids? We would make the knowledge on these Ids Top, i.e., that we don't know anything. This would still propagate knowledge in many places in the plan, somewhat similarly to what my basic implementation of PredicatePushdown covers.
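A tiny sketch of that basic approach (toy knowledge type and a made-up function name; the real transform works on MirRelationExpr): Gets on recursive Ids yield Top, i.e., no knowledge, while everything else still benefits from the context.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy stand-in for per-column knowledge; Top means "we know nothing".
#[derive(Clone, Debug, PartialEq)]
enum Knowledge {
    Top,
    NonNull,
}

// Knowledge for a Get: never propagate through recursive Ids.
fn knowledge_for_get(
    id: u64,
    recursive_ids: &BTreeSet<u64>,
    context: &BTreeMap<u64, Knowledge>,
) -> Knowledge {
    if recursive_ids.contains(&id) {
        // Pessimistic: a recursive binding may produce anything.
        Knowledge::Top
    } else {
        context.get(&id).cloned().unwrap_or(Knowledge::Top)
    }
}

fn main() {
    let recursive: BTreeSet<u64> = BTreeSet::from([1]);
    let context = BTreeMap::from([(1, Knowledge::NonNull), (2, Knowledge::NonNull)]);
    // Id 1 is recursive, so its recorded knowledge is ignored.
    assert_eq!(knowledge_for_get(1, &recursive, &context), Knowledge::Top);
    assert_eq!(knowledge_for_get(2, &recursive, &context), Knowledge::NonNull);
}
```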

ggevay commented Mar 25, 2023

Btw. MonotonicFlag has the exact same problem.

Edit: Making recursive_ids start from bottom is the "optimistic" approach (because it's good for us if the input constraints to our operators are tight) and making them start from top is the "pessimistic" one.

ggevay commented Mar 25, 2023

(Or, maybe a shortcut solution to avoid the big refactoring but still do a proper fixpoint loop would be to clone the tree at every step of the fixpoint loop, call harvest on the clone, and then throw away the mutated tree and continue working with the original. But I'm not 100% sure this would be correct: It only works if harvest doesn't create knowledge based on the mutations it performs, because the mutations would be invalid. E.g., let's say it temporarily thinks that some predicate is impossible based on the value field, so it wipes away the entire collection, and then concludes from the empty collection that it's not nullable. This looks messy, so I'd say let's just go with the basic implementation for now. Edit: The above example is actually fine.)

Edit 2: Alex was also suggesting the cloning trick here. Now I was thinking about this again, and I think this would work! Even if optimize performs any changes, those changes can't go out of what the current knowledge would imply. In other words, if the changes introduced new knowledge beyond what the current knowledge already implies, then the changes would be invalid.
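A minimal model of the cloning trick (toy types; `harvest` here is a stand-in for the real destructive harvest): each iteration works on a throw-away clone, so only the knowledge survives, never the mid-loop mutations.

```rust
// Toy model: an "expression" is just a Vec<i64>, and "knowledge" is the
// minimum value we believe any element can take. `harvest` both infers
// knowledge and (destructively) rewrites the expression, mimicking the
// real harvest. The fixpoint loop therefore works on clones and only
// keeps the inferred knowledge.
fn harvest(expr: &mut Vec<i64>, knowledge: &mut i64) {
    for x in expr.iter_mut() {
        *knowledge = (*knowledge).min(*x);
        *x -= 1; // destructive rewrite we want to discard mid-loop
    }
}

fn fixpoint_knowledge(expr: &Vec<i64>) -> i64 {
    let mut knowledge = i64::MAX;
    loop {
        let mut clone = expr.clone(); // throw-away copy
        let prev = knowledge;
        harvest(&mut clone, &mut knowledge);
        if knowledge == prev {
            return knowledge; // stabilized; `expr` itself was never touched
        }
    }
}

fn main() {
    let expr = vec![3, 5];
    assert_eq!(fixpoint_knowledge(&expr), 3);
    assert_eq!(expr, vec![3, 5]); // original tree unchanged
}
```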

ggevay commented Mar 26, 2023

MonotonicFlag does absorb/meet in a fixpoint loop, starting from top 😮 But interestingly, this works correctly there, because (get this) MonotonicFlag is monotonic: if we flip an input to monotonic, the result either stays non-monotonic, stays monotonic, or flips from non-monotonic to monotonic, but can't switch from monotonic to non-monotonic. This means that monotonicity can't oscillate between false and true across iterations. But unfortunately, nullability is not monotonic. @aalexandrov, maybe you were also thinking about something like this when you wrote this comment?

Edit: Also wouldn't work for PredicatePushdown, because an arbitrary predicate can, of course, oscillate.

ggevay commented Mar 26, 2023

Terminological proposal:

Which way is up in our lattices is just a terminological convention, i.e., everything we say about our lattices would work the same way if we flip "up" and "down" everywhere. This is true for each of our optimizations separately, i.e., we need to decide which way is up for each of our optimizations. It would be great to come up with a way to decide which way is up that has some consistency across optimizations.

Many of our optimizations are about exploiting that some operator's input is constrained in some way or its output will be subjected to some constraints (and then the optimization is to do less work by not handling those inputs that won't occur or not emitting those outputs that would be eliminated later). I'd propose to use downwards motion for getting to know about tighter constraints on the inputs or outputs of our operators. (This is also consistent with slide 3 here.) For example:

  • For nullability, non-nullable is less than nullable.
  • For monotonicity, monotonic is less than non-monotonic.
  • For value, adding a constraint like x=7 (or x>7 in the future) is down.
  • The impossible constraint is the bottom.
  • When we don't know of any constraints, that's the top.
  • In PredicatePushdown (in whose name "down" is understood on the operator tree, i.e., a completely different "down" than what I'm talking about here...), I think a lattice element will be a set of arbitrary predicates expressed as MirScalarExprs. And then we are joining them (I'm saying "join" in the lattice-theoretical sense) when we are intersecting (!) sets of predicates to collect predicate info from the Gets of an Id. (Intersection takes away from the constraints, thereby weakening them, and therefore it's up.)
  • Our optimizations need to compute upper bounds, as explained here.
  • Edit: This terminology unfortunately breaks down a bit with unique key inference, where it would be great to make top be all the possible subsets of columns, but that should be bottom in the above terminology...

I'm also thinking about what to do when a component of our lattice is a boolean (e.g., monotonicity or nullability). It sounds great at first glance to always set up the boolean in a way that false < true, so that we can consistently map && and || to meet and join. But this would imply the following:

  • nullable is ok (but the current comment on nullable is wrong),
  • monotonic is not ok. This sounds a bit sad, to be honest, so maybe we don't need to follow this convention on booleans everywhere.
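As an illustration only (not code from this PR): with the convention false < true, a boolean component is a two-element lattice whose join is `||` and whose meet is `&&`, and the lattice laws hold trivially.

```rust
// Two-element lattice on bool with false < true: join = ||, meet = &&.
fn join(a: bool, b: bool) -> bool { a || b } // least upper bound
fn meet(a: bool, b: bool) -> bool { a && b } // greatest lower bound

fn main() {
    // Under this convention, joining "non-nullable" (false) with
    // "nullable" (true) yields "nullable", the weaker (upper) fact.
    assert_eq!(join(false, true), true);
    assert_eq!(meet(false, true), false);
    // Idempotence and commutativity, two of the lattice laws.
    for a in [false, true] {
        for b in [false, true] {
            assert_eq!(join(a, a), a);
            assert_eq!(join(a, b), join(b, a));
            assert_eq!(meet(a, b), meet(b, a));
        }
    }
}
```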

aalexandrov commented Mar 27, 2023

I'd propose to use downwards motion for getting to know about tighter constraints on the inputs or outputs of our operators.

👍 I think this is consistent with most of the literature I checked so far.

nullable is ok (but the current comment on nullable is wrong),

I realized that as well on Friday, so I'm going to remove it.

In addition, specifically for DatumKnowledge I propose

```rust
enum DatumKnowledge {
  Any(/* nullable: */ bool),
  Literal(/* value: */ Result<mz_repr::Row, EvalError>, /* typ: */ ScalarType),
  Nothing,
}
```

where

`DatumKnowledge::Nothing < DatumKnowledge::Literal(_, _) < DatumKnowledge::Any(true)`

and

`DatumKnowledge::Any(false) < DatumKnowledge::Any(true)`

As a diagram using the INT type:

```mermaid
graph BT;
    Nothing --> lit_dots("Literal(..., Int64)");
    Nothing --> lit_1("Literal(Ok(1), Int64)");
    Nothing --> lit_2("Literal(Ok(2), Int64)");
    Nothing --> lit_err("Literal(Err(...), Int64)");
    Nothing --> lit_null("Literal(Ok(null))");
    lit_dots --> any_false("Any(false)");
    lit_1 --> any_false("Any(false)");
    lit_2 --> any_false("Any(false)");
    lit_err --> any_false("Any(false)");
    lit_null --> any_true("Any(true)");
    any_false --> any_true;
```
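
A simplified, self-contained sketch of what the lattice join could look like for the proposed enum (literal values reduced to `Option<i64>`, where `None` models the null literal; this is illustrative, not the PR's actual implementation). Note in particular the case the diagram forces: joining a null literal with Any(false) must give Any(true).

```rust
#[derive(Clone, Debug, PartialEq)]
enum DatumKnowledge {
    Nothing,               // bottom
    Lit(Option<i64>),      // None models the null literal
    Any { nullable: bool },
}

use DatumKnowledge::*;

fn is_nullable(k: &DatumKnowledge) -> bool {
    match k {
        Nothing => false,
        Lit(v) => v.is_none(),
        Any { nullable } => *nullable,
    }
}

// Least upper bound per the diagram: Nothing is bottom, Any(true) is top.
fn join(a: &DatumKnowledge, b: &DatumKnowledge) -> DatumKnowledge {
    match (a, b) {
        (Nothing, x) | (x, Nothing) => x.clone(),
        (Lit(x), Lit(y)) if x == y => Lit(*x),
        // Otherwise the result is Any, nullable iff either side is.
        _ => Any { nullable: is_nullable(a) || is_nullable(b) },
    }
}

fn main() {
    // Distinct non-null literals join to Any(false).
    assert_eq!(join(&Lit(Some(1)), &Lit(Some(2))), Any { nullable: false });
    // The case that bit this PR: null literal join Any(false) = Any(true).
    assert_eq!(join(&Lit(None), &Any { nullable: false }), Any { nullable: true });
    assert_eq!(join(&Nothing, &Lit(Some(7))), Lit(Some(7)));
    assert_eq!(
        join(&Any { nullable: false }, &Any { nullable: true }),
        Any { nullable: true }
    );
}
```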

@aalexandrov

I think the only way to fix this is to refactor the entire ColumnKnowledge to make it possible to just gather knowledge but not apply it, and then apply it in a separate step.

I was thinking that as well. In the "advanced case" description of the associated issue I was proposing to always operate on a clone of the values in the fixpoint loop. This is a more pragmatic solution that comes at some allocation cost.

ggevay commented Mar 27, 2023

The new DatumKnowledge enum sounds really good! I really like the diagram, it makes things crystal clear! And it's good that the union of Some(1) and Some(2) will be Any(false).

Ok, great, so then we have workable plans for both the basic and advanced versions. And with the cloning solution, even the advanced version wouldn't be a prohibitively big effort.

@aalexandrov

Yes, I would say the strategy should be as follows:

  1. Implement the advanced version with cloning and a bounded amount of effort. Let's say M = max(n * 2, C), where n is the sum of all LetRec binding arities and C is some constant.
  2. If the above loop doesn't reach a fixpoint after M iterations, descend with Any(true) (the least precise knowledge) for all bindings.

ggevay commented Mar 27, 2023

Ok, sounds good!

```rust
// iterations. As a consequence, the "no shadowing"
// assertion at the beginning of this block will fail at the
// inner LetRec for the second outer LetRec iteration.
for id in ids.iter() {
```

@ggevay / @mgree This was somewhat surprising to me, but at the end of the day it is a logical consequence of:

  1. running the fixpoint loop above,
  2. allowing nested LetRec blocks.

I believe that this "remove from context map" requirement will carry over to other transforms that have a loop in their LetRec { .. } case.

ggevay commented Mar 28, 2023

Unfortunately, test/sqllogictest/joins.slt has some wrong results in some non-LetRec query.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 0cc7884 to 288948c Compare March 28, 2023 17:38
aalexandrov commented Mar 28, 2023

I probably messed up the lattice logic somewhere or I substituted the old with the new API in an inconsistent way. I will fix this tomorrow morning.

Update: I did mess up the lattice logic somewhere.

ggevay commented Mar 28, 2023

Ok. (It's also possible that the slightly smarter lattice now triggered some bug that was already present before.)

@philip-stoev

Item No 1. This query is incorrectly constant-folded to return no rows in this branch, while the prior commit correctly evaluates it in full and returns 1 row:

```sql
DROP TABLE IF EXISTS t1 CASCADE;
CREATE TABLE t1 (f1 INTEGER, f2 INTEGER NOT NULL);
INSERT INTO t1 VALUES (1,1);
SELECT * FROM t1 a2 WHERE NULLIF(f2, f1) IS NULL;
```

philip-stoev commented Mar 29, 2023

Item No 2. The simplified query from the SLT test is this:

```sql
SELECT (SELECT oid FROM mz_catalog.mz_indexes) IS NULL
```

The plan is missing the Map (error("more than one record produced in subquery")) guard, so it incorrectly returns multiple rows for a query that should only ever return 1 row, if any.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from 288948c to a83bcfb Compare March 29, 2023 10:41
aalexandrov commented Mar 29, 2023

I think all issues reported so far were caused by me not correctly implementing the lattice join operator for the

  • `Lit { null, _ } join Any { false }` and
  • `Any { false } join Lit { null, _ }`

cases.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from b62338d to d44a1e2 Compare March 29, 2023 16:05

@ggevay ggevay left a comment


Great!

```rust
// counts that will be enfoced when actually evaluating the
// loop (see #18362 and #16800).
if curr_iteration >= max_iterations {
if curr_iteration >= 3 * let_rec_arity {
```

Sorry, one more thing: I think we need >, because we need 1 iteration to detect that there is no change.

```rust
// the following conditions is met:
//
// 1. The knowledge bindings have stabilized at a fixpoint.
// 1. No fixpoint was found after `max_iterations`. If this
```

```rust
loop {
// Check for condition (1).
// TODO: This should respect the soft and hard max iteration
// counts that will be enfoced when actually evaluating the
```

`enfoced`

```rust
let max_iterations = 100;
let mut curr_iteration = 0;
loop {
// Check for condition (1).
```

2

```rust
break;
}

// Check for condition (2).
```

1

```rust
curr_iteration += 1;
}

// Descend into the value and the body with the inferred knowledge.
```

Descend into the values with the inferred knowledge.


@philip-stoev philip-stoev left a comment


No wrong results with d44a1e2 and no panics.

The query plans on WMR queries are invariably better as compared to the parent commit.

@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from d44a1e2 to 0802c4d Compare March 30, 2023 11:15
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch 2 times, most recently from cba471f to b80a92c Compare March 30, 2023 14:15
@aalexandrov aalexandrov force-pushed the 18161-letrec-column_knowledge branch from b80a92c to 9f7480f Compare March 30, 2023 14:57
@aalexandrov aalexandrov merged commit 24c5793 into MaterializeInc:main Mar 30, 2023
@aalexandrov aalexandrov deleted the 18161-letrec-column_knowledge branch March 30, 2023 16:55