colexec: add a way to propagate metadata in a streaming fashion #55758

yuzefovich · 2020-10-20T17:18:41Z

While working on #55713, I realized that the current way of propagating the metadata in the vectorized engine is incompatible with some things we would like to do. As a reminder, we currently buffer up all metadata in MetadataSources and return it when draining the whole flow, but there are cases when we would like to propagate the metadata once it is created:

we emit the metadata about the number of rows read from rowexec.TableReaders which is then used in combination with the estimated number of rows to be scanned (which is obtained during planning) in order provide the progress estimate for a query (shown on SHOW QUERIES in phase column)
the table statistics collection emit progress metadata as well, and although we don't have the vectorized table statistics collection yet, it'd be nice to support the progress update when we do.

The current thinking is that we probably want to implement a pull-based model that is able to return the buffered meta since the previous call (something like DrainBufferedMeta which - unlike DrainMeta - doesn't have an assumption of being called when the flow is shutdown) and then have the root materializer periodically poll its metadata sources.

Jira issue: CRDB-3634

The text was updated successfully, but these errors were encountered:

64697: colflow: introduce flow coordinator and simplify materializer r=yuzefovich a=yuzefovich **execinfra: refactor ProcessorBase slightly** Previously, the output of a processor was embedded in `ProcOutputHelper`. This commit moves it out into the `ProcessorBase` because the follow-up commit will take advantage of such placement. Release note: None **execinfra: rename a field in ProcOutputHelper** This commit renames `ProcessorBase.Out` to `ProcessorBase.OutputHelper`. It also removes the large part of the comment on `ProcessorBase` since it has become quite stale, and it is better to take a look at the existing users of the struct as a guide (the actual code should be up-to-date!). Release note: None **execinfra: separate out ProcOutputHelper from ProcessorBase** In some cases, `ProcOutputHelper` that lives in the `ProcessorBase` is not used by the caller, yet it is always allocated. This commit extracts `ProcessorBaseNoHelper` that doesn't contain the helper (as well as some other fields) and uses that in the materializer. This should reduce the size of the materializers slightly. This commit was prompted by the fact that the follow-up commit will introduce another processor (the vectorized flow coordinator) that also doesn't utilize the `ProcOutputHelper`. Release note: None **colflow: introduce flow coordinator and simplify materializer** This commit extracts the logic of shutting down the vectorized flow out of the materializer which simplifies the latter. This allows us to optimize the case when the root of the whole plan is a wrapped row-execution processor. Previously, in such a scenario we would plan a columnarizer followed by a materializer because the latter was needed in order to shut the flow down. This commit removes this redundant pair of operators. Informs: #50857. Informs: #55758. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

github-actions · 2023-11-23T11:05:31Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

yuzefovich added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 20, 2020

yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Oct 20, 2020

yuzefovich changed the title ~~colexec: add a way to propagete metadata in a streaming fashion~~ colexec: add a way to propagate metadata in a streaming fashion Oct 20, 2020

asubiotto moved this from Triage to 21.1 Release Issues in BACKLOG, NO NEW ISSUES: SQL Execution Oct 27, 2020

yuzefovich mentioned this issue Dec 1, 2020

sql: make all processor cores vectorize-able #57268

Closed

5 tasks

asubiotto moved this from 21.1 Release Issues to 21.1 Stability Issues in BACKLOG, NO NEW ISSUES: SQL Execution Feb 1, 2021

asubiotto moved this from 21.1 Stability Issues to [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Mar 2, 2021

yuzefovich self-assigned this Apr 28, 2021

yuzefovich removed this from [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Apr 28, 2021

yuzefovich added this to Triage in SQL Queries via automation Apr 28, 2021

yuzefovich moved this from Triage to 21.2 Milestone A in SQL Queries Apr 28, 2021

yuzefovich mentioned this issue May 5, 2021

colflow: introduce flow coordinator and simplify materializer #64697

Merged

yuzefovich moved this from 21.2 Milestone A to 21.2 May Milestone in SQL Queries May 7, 2021

yuzefovich linked a pull request May 22, 2021 that will close this issue

colexec: add streaming metadata propagation #65586

Open

yuzefovich moved this from 21.2 May Milestone to 21.2 June Milestone in SQL Queries Jun 2, 2021

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

mgartner moved this from 21.2 June Milestone to 21.2 July Milestone in SQL Queries Jul 15, 2021

yuzefovich moved this from 21.2 July Milestone to Backlog in SQL Queries Aug 2, 2021

yuzefovich removed their assignment May 31, 2022

mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries May 25, 2023

yuzefovich mentioned this issue Jun 26, 2023

sql: re-execute distributed query as local for some errors #105451

Merged

github-actions bot added the no-issue-activity label Nov 23, 2023

yuzefovich removed the no-issue-activity label Nov 26, 2023

yuzefovich mentioned this issue May 1, 2024

sql: table stats job does not indicate progress #89766

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

colexec: add a way to propagate metadata in a streaming fashion #55758

colexec: add a way to propagate metadata in a streaming fashion #55758

yuzefovich commented Oct 20, 2020 •

edited by cockroach-jira-scripts

github-actions bot commented Nov 23, 2023

colexec: add a way to propagate metadata in a streaming fashion #55758

colexec: add a way to propagate metadata in a streaming fashion #55758

Comments

yuzefovich commented Oct 20, 2020 • edited by cockroach-jira-scripts

github-actions bot commented Nov 23, 2023

yuzefovich commented Oct 20, 2020 •

edited by cockroach-jira-scripts