sql: parallel backfill of materialized views, CTAS even when ordered #66003

dt · 2021-06-02T18:17:49Z

CREATE TABLE AS and materialized view backfills both run a distsql flow on the original query but with a grouping phase appended to the end of the distributed physical plan, adding processors that write rows to the destination table. Ideally this means that we can have many processors writing in parallel for throughput. However if the original query has an ordering, it's original physical plan would have had to end in a single sortNode, so we'd only append a single writer after that, limiting throughput.

Instead, in some cases if some conditions are met, we may be able to remove the ordering of the source plan, to maintain its width and allow multiple writers.

First if the source plan must not limited -- if there's a limit taking only the first k rows, we obviously need to sort first.

Second, if there are impure expressions as part of the view/ctas query -- like nextval, unique_rowid, etc -- then the order they are evaluated for one row vs another may be observable by their return values so we need to evaluate rows in the requested order, if any.

Another question is if choosing production order should be a proxy for choosing storage order. In theory, storage order -- if they don't specify a key -- is up to us, but if they ask for it to be produced in order, presumable they want it stored in that order? Maybe in these cases we could automatically choose the key to be the ordering columns as a prefix, then the unique-r (unique_rowid), then nix their order clause from the original query? CTAS can already take optional index / key specifications, so maybe we're fine to just nix the order by whenever (1) and (2) are met and say they need to specify the key if they want to control storage order?

Epic: CRDB-13625
Jira issue: CRDB-7830

dt added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL A-schema-changes labels Jun 2, 2021

ajwerner added this to Triage in SQL Foundations via automation Jun 17, 2021

ajwerner moved this from Triage to Backlog in SQL Foundations Jun 17, 2021

postamar added the T-sql-schema-deprecated Use T-sql-foundations instead label Mar 11, 2022

postamar moved this from Backlog to Cold storage in SQL Foundations Nov 10, 2022

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: parallel backfill of materialized views, CTAS even when ordered #66003

sql: parallel backfill of materialized views, CTAS even when ordered #66003

dt commented Jun 2, 2021 •

edited by ajwerner

sql: parallel backfill of materialized views, CTAS even when ordered #66003

sql: parallel backfill of materialized views, CTAS even when ordered #66003

Comments

dt commented Jun 2, 2021 • edited by ajwerner

dt commented Jun 2, 2021 •

edited by ajwerner