Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: parallel backfill of materialized views, CTAS even when ordered #66003

Open
dt opened this issue Jun 2, 2021 · 0 comments
Open

sql: parallel backfill of materialized views, CTAS even when ordered #66003

dt opened this issue Jun 2, 2021 · 0 comments
Labels
A-schema-changes A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@dt
Copy link
Member

dt commented Jun 2, 2021

CREATE TABLE AS and materialized view backfills both run a distsql flow on the original query but with a grouping phase appended to the end of the distributed physical plan, adding processors that write rows to the destination table. Ideally this means that we can have many processors writing in parallel for throughput. However if the original query has an ordering, it's original physical plan would have had to end in a single sortNode, so we'd only append a single writer after that, limiting throughput.

Instead, in some cases if some conditions are met, we may be able to remove the ordering of the source plan, to maintain its width and allow multiple writers.

First if the source plan must not limited -- if there's a limit taking only the first k rows, we obviously need to sort first.

Second, if there are impure expressions as part of the view/ctas query -- like nextval, unique_rowid, etc -- then the order they are evaluated for one row vs another may be observable by their return values so we need to evaluate rows in the requested order, if any.

Another question is if choosing production order should be a proxy for choosing storage order. In theory, storage order -- if they don't specify a key -- is up to us, but if they ask for it to be produced in order, presumable they want it stored in that order? Maybe in these cases we could automatically choose the key to be the ordering columns as a prefix, then the unique-r (unique_rowid), then nix their order clause from the original query? CTAS can already take optional index / key specifications, so maybe we're fine to just nix the order by whenever (1) and (2) are met and say they need to specify the key if they want to control storage order?

Epic: CRDB-13625
Jira issue: CRDB-7830

@dt dt added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL A-schema-changes labels Jun 2, 2021
@ajwerner ajwerner added this to Triage in SQL Foundations via automation Jun 17, 2021
@ajwerner ajwerner moved this from Triage to Backlog in SQL Foundations Jun 17, 2021
@postamar postamar added the T-sql-schema-deprecated Use T-sql-foundations instead label Mar 11, 2022
@postamar postamar moved this from Backlog to Cold storage in SQL Foundations Nov 10, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
SQL Foundations
  
Cold storage
Development

No branches or pull requests

2 participants