Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve outer join lowering #16451

Closed

Conversation

frankmcsherry
Copy link
Contributor

@frankmcsherry frankmcsherry commented Dec 4, 2022

This PR improves the "optimistic" outer join lowering from applying only to exact column equality to equalities of expressions supported by the two inputs. This is most clearly seen on outer joins that appear to be column equality but are in fact type conversions. For example,

create table foo (a int8, u text);
create table bar (b int4, v text);
explain select * from foo left join bar on (a = b);

On main this plans as

materialize=> explain select * from foo left join bar on (a = b);
                        Optimized Plan                        
--------------------------------------------------------------
 Explained Query:                                            +
   Return                                                    +
     Union                                                   +
       Get l0                                                +
       Project (#0, #1, #4, #5)                              +
         Map (null, null)                                    +
           Join on=(#0 = #2 AND #1 = #3) type=differential   +
             ArrangeBy keys=[[#0, #1]]                       +
               Union                                         +
                 Negate                                      +
                   Distinct group_by=[#0, #1]                +
                     Project (#0, #1)                        +
                       Get l0                                +
                 Distinct group_by=[#0, #1]                  +
                   Get materialize.public.foo                +
             ArrangeBy keys=[[#0, #1]]                       +
               Get materialize.public.foo                    +
   With                                                      +
     cte l0 =                                                +
       Join on=(#0 = integer_to_bigint(#2)) type=differential+
         ArrangeBy keys=[[#0]]                               +
           Filter (#0) IS NOT NULL                           +
             Get materialize.public.foo                      +
         ArrangeBy keys=[[integer_to_bigint(#0)]]            +
           Filter (#0) IS NOT NULL                           +
             Get materialize.public.bar                      +

and in this PR plans as

materialize=> explain select * from foo left join bar on (a = b);
                        Optimized Plan                        
--------------------------------------------------------------
 Explained Query:                                            +
   Return                                                    +
     Union                                                   +
       Map (null, null)                                      +
         Union                                               +
           Negate                                            +
             Project (#0, #1)                                +
               Join on=(#0 = #2) type=differential           +
                 ArrangeBy keys=[[#0]]                       +
                   Get materialize.public.foo                +
                 ArrangeBy keys=[[#0]]                       +
                   Distinct group_by=[#0]                    +
                     Project (#0)                            +
                       Get l0                                +
           Get materialize.public.foo                        +
       Get l0                                                +
   With                                                      +
     cte l0 =                                                +
       Join on=(#0 = integer_to_bigint(#2)) type=differential+
         ArrangeBy keys=[[#0]]                               +
           Filter (#0) IS NOT NULL                           +
             Get materialize.public.foo                      +
         ArrangeBy keys=[[integer_to_bigint(#0)]]            +
           Filter (#0) IS NOT NULL                           +
             Get materialize.public.bar                      +

The two CTEs are identical, but in the main case there is a Distinct on multiple columns of foo, rather than just on the key column. The main plan generally will keep full copies of all columns of foo, rather than just the distinct keys.

Motivation

  • This PR adds a feature that has not yet been specified.

Outer join lowering was limited in scope to exact column equalities. In many cases there can be conversions of types, truncation of values, or other transformations that foil the existing lowering but are not incorrect to perform.

Tips for reviewer

This has not been exercised on exotic outer joins such as those that do not just equate simple transformations of columns. It still appears correct, under the premise that it finds outer joins that could have been column equality if each input had been subjected to map(keys). Tagging in @philip-stoev as a review to point several unpleasant forms of outer joins at this PR.

Checklist

@ggevay
Copy link
Contributor

ggevay commented Dec 5, 2022

(This is a bit related to #4171)

It would be good to add the query in the PR description as an slt test.

@philip-stoev
Copy link
Contributor

I second @ggevay . We should use every opportunity to add new queries to the SLT if none of the existing queries shows any plan change.

@frankmcsherry
Copy link
Contributor Author

The aggregation_nullability.slt plan does demonstrate an improvement. I'm happy to go with that (I'm presently frustrated by the number and amount of time our .slt tests take to run)! We don't appear to have a lowering.slt where this would most obviously go, but perhaps in the future we could consolidate tests into bundles.

@ggevay
Copy link
Contributor

ggevay commented Dec 5, 2022

You could add it to outer_join.slt. But now with the change in aggregation_nullability.slt, it's also fine if you don't add it.

I'm presently frustrated by the number and amount of time our .slt tests take to run

I think it's mostly the startup time that is bad, so if you add it to an existing file, then maybe it's fine.

@philip-stoev
Copy link
Contributor

Item No 1. This only slightly degenerate query has an addtional CrossJoin as compared to the prior commit:


CREATE TABLE t1 (f1 DOUBLE PRECISION, f2 DOUBLE PRECISION NOT NULL);;
CREATE MATERIALIZED VIEW pk1 AS SELECT DISTINCT ON (f1) f1 , f2 FROM t1 WHERE f1 IS NOT NULL AND f2 IS NOT NULL;

EXPLAIN SELECT  * FROM t1 AS a1 RIGHT JOIN pk1 AS a2 ON ( a1 . f2 = 2 ) WHERE 7 = a2 . f1;

 Explained Query:                         +
   Return                                 +
     Union                                +
       Project (#2, #3, #0, #1)           +
         Map (null, null)                 +
           Union                          +
             Negate                       +
               CrossJoin type=differential+ <<<<<===== THIS ONE HERE
                 ArrangeBy keys=[[]]      +
                   Get l1                 +
                 ArrangeBy keys=[[]]      +
                   Distinct               +
                     Project ()           +
                       Get l0             +
             Get l1                       +
       Filter (#2 = 7)                    +
         Get l0                           +
   With                                   +
     cte l1 =                             +
       Filter (#0 = 7)                    +
         Get materialize.public.pk1       +
     cte l0 =                             +
       CrossJoin type=differential        +
         ArrangeBy keys=[[]]              +
           Filter (#1 = 2)                +
             Get materialize.public.t1    +
         ArrangeBy keys=[[]]              +
           Get materialize.public.pk1     +
                                          +
 Source materialize.public.t1             +
   filter=((#1 = 2))                      +

@philip-stoev
Copy link
Contributor

Item No 2. This slightly more realistic query has one extra Join in this branch:

pstoev@Ubuntu-2004-focal-64-minimal:~/randgen$ echo " EXPLAIN  SELECT  FROM pk1 a1  LEFT  JOIN t1 a2  ON a2 .f1  + a2 .f2  = a1 .f2   ;" | ~/diff.sh 
EXPLAIN SELECT
  FROM pk1 a1
  LEFT  JOIN t1 a2
    ON a2 .f1 + a2 .f2 = a1 .f2 ;

--- 6875.out	2022-12-05 15:38:53.558948430 +0100
+++ 16875.out	2022-12-05 15:38:53.602949248 +0100
@@ -1,34 +1,24 @@
 Explained Query:
   Return
     Union
+      Project ()
+        Get l0
       Negate
         Project ()
-          Join on=(#0 = #1) type=differential <<<<<===== THIS JOIN HERE
+          Distinct group_by=[#0, #1]
             Get l0
-            ArrangeBy keys=[[#0]]
-              Distinct group_by=[#0]
-                Get l1
       Project ()
         Get materialize.public.pk1
-      Project ()
-        Get l1
   With
-    cte l1 =
-      Project (#0)
-        Join on=(#0 = (#1 + #2)) type=differential
-          Get l0
+    cte l0 =
+      Project (#0, #1)
+        Join on=(#1 = (#2 + #3)) type=differential
+          ArrangeBy keys=[[#1]]
+            Get materialize.public.pk1
           ArrangeBy keys=[[(#0 + #1)]]
             Filter (#0) IS NOT NULL
               Get materialize.public.t1
-    cte l0 =
-      ArrangeBy keys=[[#0]]
-        Project (#1)
-          Get materialize.public.pk1
-
-Source materialize.public.pk1
-  project=(#2, #1)
-  map=(dummy)
 
 Used Indexes:
   - materialize.public.t1i1

The database schema is the same as in Item No 1.

@philip-stoev
Copy link
Contributor

Item No 3. This query is detected as constant in the prior commit but is a join in this one:

DROP SCHEMA public CASCADE;

CREATE SCHEMA public;

CREATE TABLE t1 (f1 DOUBLE PRECISION, f2 DOUBLE PRECISION NOT NULL);;
CREATE MATERIALIZED VIEW pk1 AS SELECT DISTINCT ON (f1) f1 , f2 FROM t1 WHERE f1 IS NOT NULL AND f2 IS NOT NULL;


EXPLAIN SELECT *
  FROM t1
 RIGHT JOIN pk1
    ON (pk1.f1 = t1.f1 - 1)
 WHERE pk1.f2 = 66
   AND pk1.f2 = 200 ;

I have also seen the opposite case, where this branch does better constant detection than the prior commit :-( This is all quite disconcerting and arbitrary, made worse by the fact that the impossible condition is not complex at all, it does not involve multiple tables and does not require any propagation or transformations to be detectable as impossible. Also, this is not the first PR that this dance happens for no apparent reason -- maybe we should consider a more general solution that somehow guarantees the detection of simple impossible cases regardless of any join context they appear in.

@philip-stoev
Copy link
Contributor

Item No 4. Here is a plan over a TPC-* schema that has the extra join clearly shown:

# 2022-12-05T16:30:44 Simplified query:  EXPLAIN  SELECT * FROM lineitem  RIGHT  JOIN orders_pk  ON l_commitDATE  = o_orderdate  + ' 5 MONTH '  ;
# 2022-12-05T16:30:44 Result set diff:
# 2022-12-05T16:30:44 --- /tmp//randgen136543-1670254244-server0.dump	2022-12-05 16:30:44.057473320 +0100
# 2022-12-05T16:30:44 +++ /tmp//randgen136543-1670254244-server1.dump	2022-12-05 16:30:44.057473320 +0100
# 2022-12-05T16:30:44 @@ -1,21 +1,15 @@
# 2022-12-05T16:30:44  Explained Query:
# 2022-12-05T16:30:44    Return
# 2022-12-05T16:30:44      Union
# 2022-12-05T16:30:44 +      Get l0
# 2022-12-05T16:30:44        Project (<#>..=<#>,<#>..=<#>)
# 2022-12-05T16:30:44          Map (null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null)
# 2022-12-05T16:30:44            Union
# 2022-12-05T16:30:44              Negate
# 2022-12-05T16:30:44 +              Distinct group_by=[<#>,<#>,<#>,<#>,<#>,<#>,<#>,<#>,<#>]
# 2022-12-05T16:30:44                Project (<#>..=<#>)
# 2022-12-05T16:30:44 -                Join on=(<#> = <#>) type=differential
# 2022-12-05T16:30:44 -                  ArrangeBy keys=[[<#>]]
# 2022-12-05T16:30:44 -                    Map ((<#> + 5 months))
# 2022-12-05T16:30:44 -                      Get materialize.public.orders_pk
# 2022-12-05T16:30:44 -                  ArrangeBy keys=[[<#>]]
# 2022-12-05T16:30:44 -                    Distinct group_by=[date_to_timestamp(<#>)]
# 2022-12-05T16:30:44 -                      Project (<#>)
# 2022-12-05T16:30:44                          Get l0
# 2022-12-05T16:30:44              Get materialize.public.orders_pk
# 2022-12-05T16:30:44 -      Get l0
# 2022-12-05T16:30:44    With
# 2022-12-05T16:30:44      cte l0 =
# 2022-12-05T16:30:44        Join on=(date_to_timestamp(<#>) = (<#> + 5 months)) type=differential

schema:

CREATE TABLE lineitem (
    l_orderkey       integer ,
    l_partkey        integer ,
    l_suppkey        integer ,
    l_linenumber     integer ,
    l_quantity       decimal(15, 2) ,
    l_extendedprice  decimal(15, 2) ,
    l_discount       decimal(15, 2) ,
    l_tax            decimal(15, 2) ,
    l_returnflag     text ,
    l_linestatus     text ,
    l_shipdate       date ,
    l_commitdate     date ,
    l_receiptdate    date ,
    l_shipinstruct   text ,
    l_shipmode       text ,
    l_comment        text 
);

CREATE INDEX pk_lineitem_orderkey_linenumber ON lineitem (l_orderkey, l_linenumber);

CREATE INDEX fk_lineitem_orderkey ON lineitem (l_orderkey ASC);
CREATE INDEX fk_lineitem_partkey ON lineitem (l_partkey ASC);
CREATE INDEX fk_lineitem_suppkey ON lineitem (l_suppkey ASC);
CREATE INDEX fk_lineitem_partsuppkey ON lineitem (l_partkey ASC, l_suppkey ASC);

CREATE TABLE orders (
    o_orderkey       integer NOT NULL,
    o_custkey        integer NOT NULL,
    o_orderstatus    text ,
    o_totalprice     decimal(15, 2) ,
    o_orderdate      DATE ,
    o_orderpriority  text ,
    o_clerk          text ,
    o_shippriority   integer ,
    o_comment        text
);

CREATE INDEX pk_orders_orderkey ON orders (o_orderkey);

CREATE INDEX fk_orders_custkey ON orders (o_custkey ASC);

CREATE MATERIALIZED VIEW orders_pk AS
SELECT DISTINCT ON (o_orderkey) o_orderkey, o_custkey, o_orderstatus,  o_totalprice, o_orderdate , o_orderpriority ,  o_clerk , o_shippriority , o_comment FROM orders;

@frankmcsherry
Copy link
Contributor Author

I've only looked at the second one in detail, but it seems like it is 50-50 on whether it is a regression. The join that it introduced is a "narrow" join on just the keys of the join:

           Join on=(#0 = #1) type=differential <<<<<===== THIS JOIN HERE
             Get l0
             ArrangeBy keys=[[#0]]
               Distinct group_by=[#0]
                 Get l1

whereas in the original plan that join does not exist presumably due to CSE, but instead we have a "wide" Distinct that captures all of the columns of pk1:

         Project ()
           Distinct group_by=[#0, #1]
             Get l0

Let me think a bit about whether this is intrinsic, but my intuition is that the plan is in fact improved (the join is not as expensive as the arrangements that feed it, and the arrangements are larger in the main plan).

@frankmcsherry
Copy link
Contributor Author

Item No 4. Here is a plan over a TPC-* schema that has the extra join clearly shown:

If we look at the arrangements instead, we remove a

Distinct group_by=[<#>,<#>,<#>,<#>,<#>,<#>,<#>,<#>,<#>]

which will maintain all distinct occurrences of all of these columns. By comparison we add

# 2022-12-05T16:30:44 -                Join on=(<#> = <#>) type=differential
# 2022-12-05T16:30:44 -                  ArrangeBy keys=[[<#>]]
# 2022-12-05T16:30:44 -                    Map ((<#> + 5 months))
# 2022-12-05T16:30:44 -                      Get materialize.public.orders_pk
# 2022-12-05T16:30:44 -                  ArrangeBy keys=[[<#>]]
# 2022-12-05T16:30:44 -                    Distinct group_by=[date_to_timestamp(<#>)]
# 2022-12-05T16:30:44 -                      Project (<#>)

which include a "narrow" Distinct, and more worryingly an arrangement of materialize.public.orders_pk with one column added, that could have been lifted into the join equality constraint.

@philip-stoev
Copy link
Contributor

Yeah, measured by wallclock time, this PR is doing better on Item No 4 as compared to the prior commit. So I guess we are at the stage where I should stop reading plans to divinate which one is better than the other based on the diff. @frankmcsherry can you suggest some query against the introspection table that will , as faithfully as possible, capture the total "effort" expended to evaluate the entire query across the entire dataflow? I could then have the RQG use such a summarization query to flag performance regressions automatically.

Copy link
Contributor

@philip-stoev philip-stoev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No panics or wrong results. The plan differences that I examined all appear to fall into the types listed as Items.

@frankmcsherry
Copy link
Contributor Author

Item 3: I think we are just missing a fundamental transform that looks for incompatible filters. Looking at it, there is no reason would shouldn't be able to take the output and optimize it further; it doesn't seem like the alternate lowering has made this any harder:

materialize=> EXPLAIN SELECT *
  FROM t1
 RIGHT JOIN pk1
    ON (pk1.f1 = t1.f1 - 1)
 WHERE pk1.f2 = 66
   AND pk1.f2 = 200 ;
Optimized Plan
Explained Query:
  Filter (#3 = 200) AND (#3 = 66)
    Join on=(#2 = (#0 - 1)) type=differential
      ArrangeBy keys=[[(#0 - 1)]]
        Filter (#0) IS NOT NULL
          Get materialize.public.t1
      ArrangeBy keys=[[#0]]
        Get materialize.public.pk1

Source materialize.public.t1
  filter=((#0) IS NOT NULL)

(1 row)
materialize=> 

@frankmcsherry
Copy link
Contributor Author

frankmcsherry commented Dec 6, 2022

Item 2. I thought a fair bit more about this, and I am now much more confident that the new plan is better despite the superficial complexity. The easiest argument I see is that the new plan immediately projects away pk1.f1. There might be millions of distinct values there, especially as it is the primary key for pk1, and the new plan will ignore all of these values. The previous plan's memory requirements would scale with the variety of values in pk1.f1.

I can also explain why the old plans appears "simpler", and it is an interesting quirk! The LEFT JOIN is not on a primary key, but pk1 does have a primary key (the projectable-away pk1.f1). If you were to do the query not with pk1 but t1 in its place, the plan is more obviously improved by this PR. In that plan, we end up with

    Distinct group_by=[#0, #1]               +
      Get materialize.public.t1              +

which .. by virtue of pk1's "irrelevant" primary key we are able to remove the Distinct. This then seems to lead to a cascade of other optimizations that simplifies things dramatically (but not canceling a reliance on pk1.f1 elsewhere in the query, making it still potentially unboundedly more expensive).

@frankmcsherry
Copy link
Contributor Author

I think wrt items 1-4, I'm ok on all the plans if we figure out what is going on here in Item 4.

# 2022-12-05T16:30:44 -                Join on=(<#> = <#>) type=differential
# 2022-12-05T16:30:44 -                  ArrangeBy keys=[[<#>]]
# 2022-12-05T16:30:44 -                    Map ((<#> + 5 months))
# 2022-12-05T16:30:44 -                      Get materialize.public.orders_pk

This looks like either a missed opportunity or a regression, depending on what we expected. I think we see similar things in a recent PR of @ggevay and maybe the answer is "we do not intentionally try to prevent this, or do but occasionally miss, and putting that on the todo list is good enough".

@frankmcsherry
Copy link
Contributor Author

frankmcsherry commented Dec 6, 2022

Actually, @philip-stoev I take it back and Item 4. isn't weird right at that moment, which may require that arrangement, but instead why is that arrangement not shared with the join in l0. The answer seems to be disagreement on the null predicate. However, if you make the column NOT NULL you still get a sketch plan that has both

                  ArrangeBy keys=[[#9]]
                    Map ((#4 + 5 months))
                      Get materialize.public.orders_pk

and

        ArrangeBy keys=[[(#4 + 5 months)]]
          Get materialize.public.orders_pk

in it.

Sorta weird that we have both, but also sort of makes sense how we would get confused in that you can't really use the latter in place of the former (wrong number of columns) even though all the information is there.

What is maybe interesting is that these two arrangements will be identical once we perform "thinning", which is something LIR does to reduce memory requirements. So, .. no reason MIR should be able to realize they would be similar. But I suspect LIR may not realize that there are patterns of arrangement re-use that only it can detect.

@philip-stoev
Copy link
Contributor

Fair enough, thank you for your patience!

Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just two nits regarding comments. (Caveat: I am just the random reviewer.)

@@ -1727,17 +1727,48 @@ impl AggregateExpr {
}
}

// TODO: move this to the `mz_expr` crate.
/// If the on clause of an outer join is an equijoin, figure out the join keys.
/// Column version of `derive_equijoin_cols`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Column version of `derive_equijoin_cols`.
/// Column version of `derive_equijoin_exprs`.

}
}
}
// Offset columns in right expressions by the columns introduce by the left input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Offset columns in right expressions by the columns introduce by the left input.
// Offset columns in right expressions by the columns introduced by the left input.

@ggevay
Copy link
Contributor

ggevay commented Dec 8, 2022

Should I review this now? Or should I wait until it is moved out of draft state?

@CLAassistant
Copy link

CLAassistant commented Dec 8, 2022

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@frankmcsherry
Copy link
Contributor Author

@aalexandrov my guess is that this is now outdated, but I wanted to check with you about the current state of outer join lowering.

@aalexandrov
Copy link
Contributor

@frankmcsherry Yes, this was fixed in main with #22560. The current plan coincides with the plan from this PR:

materialize=> explain select * from foo left join bar on (a = b);
                        Optimized Plan                        
--------------------------------------------------------------
 Explained Query:                                            +
   Return                                                    +
     Union                                                   +
       Map (null, null)                                      +
         Union                                               +
           Negate                                            +
             Project (#0, #1)                                +
               Join on=(#0 = #2) type=differential           +
                 ArrangeBy keys=[[#0]]                       +
                   ReadStorage materialize.public.foo        +
                 ArrangeBy keys=[[#0]]                       +
                   Distinct project=[#0]                     +
                     Project (#0)                            +
                       Get l0                                +
           ReadStorage materialize.public.foo                +
       Get l0                                                +
   With                                                      +
     cte l0 =                                                +
       Join on=(#0 = integer_to_bigint(#2)) type=differential+
         ArrangeBy keys=[[#0]]                               +
           Filter (#0) IS NOT NULL                           +
             ReadStorage materialize.public.foo              +
         ArrangeBy keys=[[integer_to_bigint(#0)]]            +
           Filter (#0) IS NOT NULL                           +
             ReadStorage materialize.public.bar              +
                                                             +
 Source materialize.public.bar                               +
   filter=((#0) IS NOT NULL)                                 +

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants