sql: implement nested SRFs #26234

knz · 2018-05-30T14:45:53Z

for example:

kena=# explain(verbose) select 123, generate_series(1, generate_series(2,3)), generate_series(4,generate_series(5,generate_series(6,generate_Series(7,8)))) from foo;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..11356550895255.20 rows=2260000000000000 width=12)
   Output: 123, (generate_series(1, (generate_series(2, 3)))), generate_series(4, (generate_series(5, (generate_series(6, (generate_series(7, 8)))))))
   ->  ProjectSet  (cost=0.00..11350895255.20 rows=2260000000000 width=8)
         Output: generate_series(5, (generate_series(6, (generate_series(7, 8))))), (generate_series(1, (generate_series(2, 3))))
         ->  ProjectSet  (cost=0.00..11345255.20 rows=2260000000 width=8)
               Output: generate_series(1, (generate_series(2, 3))), generate_series(6, (generate_series(7, 8)))
               ->  ProjectSet  (cost=0.00..11355.20 rows=2260000 width=8)
                     Output: generate_series(2, 3), generate_series(7, 8)
                     ->  Seq Scan on kena.foo  (cost=0.00..32.60 rows=2260 width=0)
                           Output: foo, bar

Jira issue: CRDB-5683

The text was updated successfully, but these errors were encountered:

knz · 2018-05-30T15:52:14Z

So here are some investigation steps:

kena=# explain(verbose) select 123, generate_series(1,2);
                   QUERY PLAN
-------------------------------------------------
 ProjectSet  (cost=0.00..5.02 rows=1000 width=8)
   Output: 123, generate_series(1, 2)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)

Learned:

ProjectSet is able to render simple scalar expressions.
if there is no FROM clause, then there is a project set operator but no further relational expression underneath. In other words the implicit relational first operand of a project set generated with no FROM clause is the unary table (with no column).

kena=# explain(verbose) select 123+generate_series(1,2);
                      QUERY PLAN
-------------------------------------------------------
 Result  (cost=0.00..20.02 rows=1000 width=4)
   Output: (123 + (generate_series(1, 2)))
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4)
         Output: generate_series(1, 2)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)

Learned: If the result of a SRF is used in a larger scalar context, the operation is performed by inserting an additional regular projection on top of the "project set" operator.

kena=# explain(verbose) select generate_series(1,2)+10, 123 from kv;
                              QUERY PLAN
-----------------------------------------------------------------------
 Result  (cost=0.00..45249.55 rows=2260000 width=8)
   Output: ((generate_series(1, 2)) + 10), 123
   ->  ProjectSet  (cost=0.00..11349.55 rows=2260000 width=4)
         Output: generate_series(1, 2)
         ->  Seq Scan on kena.kv  (cost=0.00..32.60 rows=2260 width=0)
               Output: k, v

Learned: as an exception to rule 1 above, if rule 2 has caused the introduction of a render stage, then that render stage is used to render non-SRF expressions.

kena=# explain(verbose) select generate_series(1, random()::int);
                    QUERY PLAN
---------------------------------------------------
 ProjectSet  (cost=0.00..5.02 rows=1000 width=4)
   Output: generate_series(1, (random())::integer)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)

Learned: if the input of a SRF is a scalar expression, the "project set" oeprator is able to evaluate it itself (i.e. it does not need a projection "underneath")

kena=# explain(verbose) select 123, generate_series(1,2), generate_series(3,4);
                         QUERY PLAN
-------------------------------------------------------------
 ProjectSet  (cost=0.00..5.02 rows=1000 width=12)
   Output: 123, generate_series(1, 2), generate_series(3, 4)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)

Learned: if there are multiple SRFs at the same level of projection, a single "project set" operator is used (but see below).

kena=# explain(Verbose) select generate_series(1,2) from kv;
                           QUERY PLAN
-----------------------------------------------------------------
 ProjectSet  (cost=0.00..11349.55 rows=2260000 width=4)
   Output: generate_series(1, 2)
   ->  Seq Scan on kena.kv  (cost=0.00..32.60 rows=2260 width=0)
         Output: k, v

Learned: a SRF in render position with a single input table is a project set with that table as relational operand.

kena=# explain(Verbose) select generate_series(1,2) from kv, foo, rows from(unnest(array[1,2,3]));
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..2564018102.50 rows=510760000000 width=4)
   Output: generate_series(1, 2)
   ->  Nested Loop  (cost=0.00..6387402.50 rows=510760000 width=0)
         ->  Nested Loop  (cost=0.00..2864.25 rows=226000 width=0)
               ->  Function Scan on pg_catalog.unnest  (cost=0.00..1.00 rows=100 width=0)
                     Output: unnest.unnest
                     Function Call: unnest('{1,2,3}'::integer[])
               ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
                     ->  Seq Scan on kena.kv  (cost=0.00..32.60 rows=2260 width=0)
         ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
               ->  Seq Scan on kena.foo  (cost=0.00..32.60 rows=2260 width=0)

Learned: a SRF in render position with a complex FROM clause uses a single "project set" clause with the entire relational expression of the original FROM clause as operand.

kena=# select *, generate_series(1,2) from kv;
  k  | v  | generate_series
-----+----+-----------------

Learned: the * expands against the original FROM clause, not the FROM clause expanded with "project set".

Drawing the line so far, suppose we introduce a new relational operator project_set(E, f, g, h, ...) which zips (f, g, h, ...) for every row in E.

Then here are the "simple case" transformations on input queries:

select ... srf(x) ... with no FROM

-> proceed as per select ... srf(x) ... from unary_table()
select *, srfa(x), srfb(y), z from E and there are N columns in E

transformed to: select E.*, @(N+1) as "srfa", @(N+2) as "srfb", z from project_set(E, srfa(x), srfb(y))
select *, f(srfa(x)), g(srfb(y)), z from E and there are N columns in E

transformed to: select E.*, f(@(N+1)), g(@(N+2)), z from project_set(E, srfa(x), srfb(y))

All this is not considering SRFs as argument to other SRFs.

knz · 2018-05-30T16:55:58Z

Nested SRFs.

For the analysis I'll be writing

g(N, x) to shortengenerate_series(N, x).
project_set(E, f(x)) as t to rename the pseudo-table returned by project_set() to t.
t:N to refer to the Nth column of the pseudo-table t currently available in the context of an expression

Let's analyze this recursion.

On the left:

select foo, g(a, a), g(Z,Z) from E
->

        project_set(E, 
                           foo, g(A,A), g(Z, Z)
        ) as t1

select foo, g(A, g(B, B)), g(Z, Z) from E
->

     project_set(
         project_set(E, 
                            g(B,B), g(Z, Z), foo
          ) as t1,
         foo, g(A, t1:1), t1:2
     ) as t2

select foo, g(A, g(B, g(C,C)), g(Z, Z) from E
->

     project_set(
       project_set(
         project_set(E, 
                            g(C,C), g(Z, Z), foo
         ) as t1,
         g(B, t1:1), foo, t1:2
       ) as t2,
       foo, g(A, t2:1), t2:3
     ) as t3

select foo, g(A, g(B, g(C,g(D,D))), g(Z, Z) from E
->

     project_set(
       project_set(
         project_set(
             project_set(E, 
                            g(D,D), g(Z, Z), foo
             ) as t1,
             g(C, t1:1), foo, t1:2
         ) as t2,
         g(B, t2:1), foo, t2:3
       ) as t3,
       foo, g(A, t3:1), t3:3
   ) as t4

A few observations already:

the non-SRF renders are pulled on the right in the innermost (lowest) project_set
they are pulled on the left in the outermost project_set, to respect the original query ordering
in the middle something else is happening.

We have something like a recursion.

Let's make an attempt to define it. Let's define XFORM(E, f) to be the result of transforming scalar expression f in the context of a FROM clause E. Then we have:

XFORM(E, srf(x)) = project_set(E, srf(x)) if x does not contain a SRF
XFORM(E, srf(F)) = project_set(XFORM(E, F) as t, srf(t:1)) otherwise

Let's try this:

XFORM(E, g(A,A))
-> project_set(E, g(A,A))
XFORM(E, g(A, g(B,B)))
-> project_set(XFORM(E, g(B,B)) as t, g(A, t:1))
-> project_set(project_set(E, g(B,B)) as t, g(A, t:1))
XFORM(E, g(A, g(B, g(C, C))))
-> project_set(XFORM(E, g(B,g(C,C))) as t, g(A, t:1))
-> project_set(project_set(XFORM(E, g(C,C)) as u, g(B, u:1)) as t, g(A, t:1))
-> project_set(project_set(project_set(E, g(C,C)) as u, g(B, u:1)) as t, g(A, t:1))

This is equivalent to the observed structure above, looking good!

knz · 2018-05-30T17:19:13Z

Let's try to extend our attempt to define XFORM above to support simple, non-SRF columns.

Let's denote XFORM(E, a, b) the result of transforming the expressions a, b where a may contain a SRF, but b cannot.

We have already:

XFORM(E, srf(x), b) = project_set(E, srf(x), b) if x does not contain a SRF
XFORM(E, srf(F), b) = project_set(XFORM(E, F, b) as t, srf(t:1), t1:2) if F contains a SRF

And we can add:

XFORM(E, a, b) = project(E, a, b) if a doesn't contain SRFs
XFORM(E, f(a), b) = project(XFORM(E, a, b) as t, t:1, t:2) if f is not a SRF and a may contain SRFs

knz · 2018-05-30T19:29:28Z

Now that I knew what the recursion would look like, I knew how to recognize it in pg's own source code, and here it is:

 * split_pathtarget_at_srfs
 *              Split given PathTarget into multiple levels to position SRFs safely
 *
 * The executor can only handle set-returning functions that appear at the
 * top level of the targetlist of a ProjectSet plan node.  If we have any SRFs
 * that are not at top level, we need to split up the evaluation into multiple
 * plan levels in which each level satisfies this constraint.  This function
 * creates appropriate PathTarget(s) for each level.
 *
 * As an example, consider the tlist expression
 *              x + srf1(srf2(y + z))
 * This expression should appear as-is in the top PathTarget, but below that
 * we must have a PathTarget containing
 *              x, srf1(srf2(y + z))
 * and below that, another PathTarget containing
 *              x, srf2(y + z)
 * and below that, another PathTarget containing
 *              x, y, z
 * When these tlists are processed by setrefs.c, subexpressions that match
 * output expressions of the next lower tlist will be replaced by Vars,
 * so that what the executor gets are tlists looking like
 *              Var1 + Var2
 *              Var1, srf1(Var2)
 *              Var1, srf2(Var2 + Var3)
 *              x, y, z
 * which satisfy the desired property.
 *
 * Another example is
 *              srf1(x), srf2(srf3(y))
 * That must appear as-is in the top PathTarget, but below that we need
 *              srf1(x), srf3(y)
 * That is, each SRF must be computed at a level corresponding to the nesting
 * depth of SRFs within its arguments.
 *

src/backend/optimizer/util/tlist.c (target list)

knz · 2018-05-30T19:58:55Z

The algorithm to perform this substitution is split into two phases:

a first phase splits each target expression into two data structures. This phase is applied to all the target expressions, including the main SELECT targets, but also the other sub-clauses including ORDER BY, GROUP BY, etc (but not HAVING)
after the various sub-clauses have been analyzed and planned, the collected data structure pairs are processed to modify the resulting logical plan.

jordanlewis · 2019-04-30T15:43:33Z

@RaduBerinde moving to planning

jordanlewis · 2023-01-04T16:06:36Z

Still a thing.

knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation May 30, 2018

knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL labels May 30, 2018

This was referenced May 30, 2018

sql: cleanups + new "project set" relop + ROWS FROM + local exec logic #26223

Merged

opt: Enable additional logic tests for opt configs #26191

Merged

sql: address discrepancy with pg and SRFs #20511

Closed

[WIP] "project set" #26235

Closed

knz moved this from Triage to Current milestone in (DEPRECATED) SQL Front-end, Lang & Semantics May 30, 2018

knz changed the title ~~sql: define the semantics of pg's "Project set" properly~~ sql: implement nested SRFs Jun 6, 2018

knz moved this from Current milestone to Feature requests / pie-in-the-skie in (DEPRECATED) SQL Front-end, Lang & Semantics Jun 6, 2018

knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Mar 13, 2019

jordanlewis added this to Triage in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 23, 2019

jordanlewis added this to To do in BACKLOG, NO NEW ISSUES: SQL Optimizer via automation Apr 30, 2019

jordanlewis removed this from Feature requests / pie-in-the-skie in (DEPRECATED) SQL Front-end, Lang & Semantics Apr 30, 2019

jordanlewis removed this from Triage in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 30, 2019

RaduBerinde moved this from Triage to Lower Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Jul 9, 2019

RaduBerinde moved this from Lower Priority Backlog to Functional issues in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

rytaft added this to Triage in SQL Queries via automation Mar 28, 2023

rytaft moved this from Triage to New Backlog in SQL Queries Mar 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: implement nested SRFs #26234

sql: implement nested SRFs #26234

knz commented May 30, 2018 •

edited by cockroach-jira-scripts

knz commented May 30, 2018 •

edited

knz commented May 30, 2018

knz commented May 30, 2018 •

edited

knz commented May 30, 2018

knz commented May 30, 2018 •

edited

jordanlewis commented Apr 30, 2019

jordanlewis commented Jan 4, 2023

sql: implement nested SRFs #26234

sql: implement nested SRFs #26234

Comments

knz commented May 30, 2018 • edited by cockroach-jira-scripts

knz commented May 30, 2018 • edited

knz commented May 30, 2018

knz commented May 30, 2018 • edited

knz commented May 30, 2018

knz commented May 30, 2018 • edited

jordanlewis commented Apr 30, 2019

jordanlewis commented Jan 4, 2023

knz commented May 30, 2018 •

edited by cockroach-jira-scripts

knz commented May 30, 2018 •

edited

knz commented May 30, 2018 •

edited

knz commented May 30, 2018 •

edited