
colexec: clean up buffered state when operators spilled to disk #60022

Open
yuzefovich opened this issue Feb 9, 2021 · 1 comment
Assignees
yuzefovich

Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. · E-quick-win Likely to be a quick win for someone experienced. · T-sql-queries SQL Queries Team

Comments

yuzefovich (Member) commented Feb 9, 2021

Currently, when a buffering in-memory operator exports its buffered tuples in ExportBuffered, we still keep references to all of that buffered state (thus preventing it from being GCed), and we don't clear the corresponding memory accounts. In all cases except the unordered distinct, we can actually lose those references, and we should update the memory accounting accordingly.

Jira issue: CRDB-3198
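
For illustration, here is a minimal, self-contained Go sketch of the desired behavior, with hypothetical types and names (not the actual colexec API): once all buffered tuples have been exported, the operator drops its references so the GC can reclaim the buffered state and shrinks the corresponding memory account to zero.

```go
package sketch

// memAccount is a stand-in for a memory account tracking the bytes used by
// the buffered state (hypothetical; not CockroachDB's actual accounting type).
type memAccount struct{ used int64 }

func (a *memAccount) Shrink(delta int64) { a.used -= delta }
func (a *memAccount) Used() int64        { return a.used }

// bufferingOp is a stand-in for a buffering in-memory operator that has
// spilled to disk and is now exporting its buffered tuples.
type bufferingOp struct {
	bufferedTuples [][]int64 // placeholder for the buffered batches
	exportIdx      int
	memAcc         memAccount
}

// ExportBuffered returns the next buffered batch; once everything has been
// exported, it releases the buffered state and clears the memory account.
// The unordered distinct is the exception: its hash table is still used
// after spilling, so it cannot lose these references.
func (op *bufferingOp) ExportBuffered() []int64 {
	if op.exportIdx < len(op.bufferedTuples) {
		batch := op.bufferedTuples[op.exportIdx]
		op.exportIdx++
		return batch
	}
	// All buffered tuples have been exported: lose the references (allowing
	// them to be GCed) and reconcile the memory account.
	op.bufferedTuples = nil
	op.memAcc.Shrink(op.memAcc.Used())
	return nil
}
```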

@yuzefovich yuzefovich added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 9, 2021
@yuzefovich yuzefovich self-assigned this Feb 9, 2021
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Feb 9, 2021
@yuzefovich yuzefovich moved this from Triage to [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Feb 9, 2021
@yuzefovich yuzefovich added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Feb 9, 2021
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@yuzefovich yuzefovich removed this from [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution May 31, 2022
@yuzefovich yuzefovich added this to Triage in SQL Queries via automation May 31, 2022
@yuzefovich yuzefovich removed their assignment May 31, 2022
@yuzefovich yuzefovich moved this from Triage to Backlog in SQL Queries May 31, 2022
craig bot pushed a commit that referenced this issue Jul 13, 2022
84229: colexechash: improve memory accounting in the hash table r=yuzefovich a=yuzefovich

This commit improves the memory accounting in the hash table to be more
precise in the case when the `distsql_workmem` limit is exhausted.
Previously, we would allocate large slices first only to perform the
memory accounting after the fact, possibly running out of budget which
would result in an error being thrown. We'd end up in a situation where
the hash table is still referencing larger newly-allocated slices while
only the previous memory usage is accounted for. This commit makes it so
that we account for the needed capacity upfront, then perform the
allocation, and then reconcile the accounting if necessary. This way
we're much more likely to encounter the budget error before making the
large allocations.
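
As a rough, hedged illustration of that pattern (hypothetical names, not the actual colexechash code): grow the memory account by the estimated size of the new slice before allocating it, so the budget error surfaces before the large allocation, and then reconcile the account with what was actually allocated.

```go
package sketch

import "errors"

var errBudgetExceeded = errors.New("memory budget exceeded")

// account is a stand-in for a memory account with a fixed budget.
type account struct{ used, budget int64 }

func (a *account) grow(delta int64) error {
	if a.used+delta > a.budget {
		return errBudgetExceeded
	}
	a.used += delta
	return nil
}

// growBuckets grows a hash-table bucket slice: account for the needed
// capacity upfront, perform the allocation, then reconcile the account with
// the capacity Go actually handed back.
func growBuckets(acc *account, old []uint64, newCap int) ([]uint64, error) {
	const sizeOfUint64 = 8
	estimated := int64(newCap-cap(old)) * sizeOfUint64
	// Account first: if this exceeds the budget, the error is returned before
	// the large allocation is ever made, and the caller can spill to disk.
	if err := acc.grow(estimated); err != nil {
		return nil, err
	}
	buckets := make([]uint64, newCap)
	copy(buckets, old)
	// Reconcile: Go may round the capacity up, so adjust the account to the
	// size that was actually allocated.
	if actual := int64(cap(buckets)-cap(old)) * sizeOfUint64; actual != estimated {
		acc.used += actual - estimated
	}
	return buckets, nil
}
```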

Additionally, this commit accounts for some internal slices in the hash
table used only in the hash joiner case.

Also, now both the hash aggregator and the hash joiner eagerly release
references to these internal slices of the hash table when the spilling
to disk occurs (we cannot do the same for the unordered distinct because
there the hash table is actually used after the spilling too).
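
A minimal sketch of that eager release (hypothetical fields; the real hash table struct has more state): when the disk-backed fallback takes over, the in-memory hash table's internal slices are nil'ed out so they can be GCed and the associated accounting is cleared.

```go
package sketch

// hashTable is a stand-in for the vectorized hash table with a hypothetical
// subset of its internal per-tuple slices.
type hashTable struct {
	first   []uint64 // first tuple ID in each bucket
	next    []uint64 // intra-bucket linked list
	same    []uint64 // extra per-tuple slice
	accUsed int64    // stand-in for the hash table's memory account
}

// release is what the hash aggregator and hash joiner would do when spilling
// to disk: drop the references so the slices can be GCed and clear the
// corresponding memory accounting. The unordered distinct cannot do this
// because it keeps using the hash table after spilling.
func (ht *hashTable) release() {
	ht.first, ht.next, ht.same = nil, nil, nil
	ht.accUsed = 0
}
```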

This required a minor change to the way the unordered distinct spills to
disk. Previously, the memory error could only occur in two spots (and
one of those would leave the hash table in an inconsistent state and we
were "smart" in how we repaired that). However, now the memory error
could occur in more spots (and we could have several different
inconsistent states), so this commit accepts a slight performance
regression and simply rebuilds the hash table from scratch, once, when
the unordered distinct spills to disk.

Addresses: #60022.
Addresses: #64906.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@yuzefovich yuzefovich self-assigned this Nov 26, 2023
@yuzefovich yuzefovich added the E-quick-win Likely to be a quick win for someone experienced. label Nov 26, 2023
Projects
SQL Queries: Backlog (DO NOT ADD NEW ISSUES), Status: Backlog

Development
No branches or pull requests

2 participants