Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: add a way to propagate metadata in a streaming fashion #55758

Open
yuzefovich opened this issue Oct 20, 2020 · 1 comment · May be fixed by #65586
Open

colexec: add a way to propagate metadata in a streaming fashion #55758

yuzefovich opened this issue Oct 20, 2020 · 1 comment · May be fixed by #65586
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Oct 20, 2020

While working on #55713, I realized that the current way of propagating the metadata in the vectorized engine is incompatible with some things we would like to do. As a reminder, we currently buffer up all metadata in MetadataSources and return it when draining the whole flow, but there are cases when we would like to propagate the metadata once it is created:

  1. we emit the metadata about the number of rows read from rowexec.TableReaders which is then used in combination with the estimated number of rows to be scanned (which is obtained during planning) in order provide the progress estimate for a query (shown on SHOW QUERIES in phase column)
  2. the table statistics collection emit progress metadata as well, and although we don't have the vectorized table statistics collection yet, it'd be nice to support the progress update when we do.

The current thinking is that we probably want to implement a pull-based model that is able to return the buffered meta since the previous call (something like DrainBufferedMeta which - unlike DrainMeta - doesn't have an assumption of being called when the flow is shutdown) and then have the root materializer periodically poll its metadata sources.

Jira issue: CRDB-3634

@yuzefovich yuzefovich added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 20, 2020
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Oct 20, 2020
@yuzefovich yuzefovich changed the title colexec: add a way to propagete metadata in a streaming fashion colexec: add a way to propagate metadata in a streaming fashion Oct 20, 2020
@asubiotto asubiotto moved this from Triage to 21.1 Release Issues in BACKLOG, NO NEW ISSUES: SQL Execution Oct 27, 2020
@asubiotto asubiotto moved this from 21.1 Release Issues to 21.1 Stability Issues in BACKLOG, NO NEW ISSUES: SQL Execution Feb 1, 2021
@asubiotto asubiotto moved this from 21.1 Stability Issues to [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Mar 2, 2021
@yuzefovich yuzefovich self-assigned this Apr 28, 2021
@yuzefovich yuzefovich removed this from [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Apr 28, 2021
@yuzefovich yuzefovich added this to Triage in SQL Queries via automation Apr 28, 2021
@yuzefovich yuzefovich moved this from Triage to 21.2 Milestone A in SQL Queries Apr 28, 2021
@yuzefovich yuzefovich moved this from 21.2 Milestone A to 21.2 May Milestone in SQL Queries May 7, 2021
craig bot pushed a commit that referenced this issue May 15, 2021
64697: colflow: introduce flow coordinator and simplify materializer r=yuzefovich a=yuzefovich

**execinfra: refactor ProcessorBase slightly**

Previously, the output of a processor was embedded in
`ProcOutputHelper`. This commit moves it out into the `ProcessorBase`
because the follow-up commit will take advantage of such placement.

Release note: None

**execinfra: rename a field in ProcOutputHelper**

This commit renames `ProcessorBase.Out` to `ProcessorBase.OutputHelper`.
It also removes the large part of the comment on `ProcessorBase` since
it has become quite stale, and it is better to take a look at the
existing users of the struct as a guide (the actual code should be
up-to-date!).

Release note: None

**execinfra: separate out ProcOutputHelper from ProcessorBase**

In some cases, `ProcOutputHelper` that lives in the `ProcessorBase` is
not used by the caller, yet it is always allocated. This commit extracts
`ProcessorBaseNoHelper` that doesn't contain the helper (as well as some
other fields) and uses that in the materializer. This should reduce the
size of the materializers slightly.

This commit was prompted by the fact that the follow-up commit will
introduce another processor (the vectorized flow coordinator) that also
doesn't utilize the `ProcOutputHelper`.

Release note: None

**colflow: introduce flow coordinator and simplify materializer**

This commit extracts the logic of shutting down the vectorized flow out
of the materializer which simplifies the latter. This allows us to
optimize the case when the root of the whole plan is a wrapped
row-execution processor. Previously, in such a scenario we would plan
a columnarizer followed by a materializer because the latter was needed
in order to shut the flow down. This commit removes this redundant pair
of operators.

Informs: #50857.
Informs: #55758.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@yuzefovich yuzefovich linked a pull request May 22, 2021 that will close this issue
@yuzefovich yuzefovich moved this from 21.2 May Milestone to 21.2 June Milestone in SQL Queries Jun 2, 2021
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@mgartner mgartner moved this from 21.2 June Milestone to 21.2 July Milestone in SQL Queries Jul 15, 2021
@yuzefovich yuzefovich moved this from 21.2 July Milestone to Backlog in SQL Queries Aug 2, 2021
@yuzefovich yuzefovich removed their assignment May 31, 2022
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries May 25, 2023
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
SQL Queries
New Backlog
Development

Successfully merging a pull request may close this issue.

2 participants