sql: parallel backfill of materialized views, CTAS even when ordered #66003
Labels
A-schema-changes
A-sql-pgcompat
Semantic compatibility with PostgreSQL
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
CREATE TABLE AS and materialized view backfills both run a distsql flow on the original query but with a grouping phase appended to the end of the distributed physical plan, adding processors that write rows to the destination table. Ideally this means that we can have many processors writing in parallel for throughput. However if the original query has an ordering, it's original physical plan would have had to end in a single sortNode, so we'd only append a single writer after that, limiting throughput.
Instead, in some cases if some conditions are met, we may be able to remove the ordering of the source plan, to maintain its width and allow multiple writers.
First if the source plan must not limited -- if there's a limit taking only the first k rows, we obviously need to sort first.
Second, if there are impure expressions as part of the view/ctas query -- like nextval, unique_rowid, etc -- then the order they are evaluated for one row vs another may be observable by their return values so we need to evaluate rows in the requested order, if any.
Another question is if choosing production order should be a proxy for choosing storage order. In theory, storage order -- if they don't specify a key -- is up to us, but if they ask for it to be produced in order, presumable they want it stored in that order? Maybe in these cases we could automatically choose the key to be the ordering columns as a prefix, then the unique-r (unique_rowid), then nix their order clause from the original query? CTAS can already take optional index / key specifications, so maybe we're fine to just nix the order by whenever (1) and (2) are met and say they need to specify the key if they want to control storage order?
Epic: CRDB-13625
Jira issue: CRDB-7830
The text was updated successfully, but these errors were encountered: