[WIP] prototype for caching geometries #54476

Closed


otan
Contributor

@otan otan commented Sep 16, 2020

No description provided.

@cockroach-teamcity
Member

This change is Reviewable

@otan otan changed the title prototype for caching geometries WIP: prototype for caching geometries Sep 16, 2020
craig bot pushed a commit that referenced this pull request Jan 14, 2021
57817: colexec: add the disk-spilling to the hash aggregator r=yuzefovich a=yuzefovich

**colexec: mechanical changes for the external hash aggregator**

This commit performs several mechanical changes prompted by the
follow-up work on the external hash aggregator:
- extract the arguments to `newSpillingQueue` into a struct
- add `context.Context` as the first argument to the `ExportBuffered`
method
- extract aggregator tests into global variables.

It also fixes a couple of cosmetic issues with the memory account
names for the external operators.

Release note: None

**colexec: add the disk-spilling to the hash aggregator**

This commit introduces the external hash aggregator that uses the
hash-based partitioner with the in-memory hash aggregator as the "main"
strategy and the external sort + the ordered aggregator as the
"fallback". This approach was benchmarked against simply using the
external sort + the ordered aggregator, and on larger datasets the
chosen approach is noticeably faster.

In order for the in-memory hash aggregator to be able to actually
fall back, we need to keep track of all of the input tuples, since it is
very hard to spill the intermediate results of computation. This
required using a spilling queue and enqueuing copies of all
input batches into it. The benchmarks show that the performance overhead
of this is relatively small while the spilling queue doesn't have to
spill to disk (on the order of a 15-20% hit in micro-benchmarks). However,
when the spilling queue needs to use the disk, the hit can be 2-3x in
the case where the hash aggregator itself doesn't have to spill.
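
As a rough illustration of the "main strategy plus fallback" idea, here is a toy sketch in Go. It is not the colexec code: the `row` type, `hashSum`, `orderedSum`, `aggregate`, and the `maxGroups` limit (standing in for a memory budget) are all hypothetical, and the hash-based partitioner and the disk are left out. The retained `rows` slice plays the role of the input copies kept in the spilling queue, which is what makes replaying through the fallback possible.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical input tuple: a grouping key and a value to sum.
type row struct {
	key string
	val int
}

// hashSum aggregates with an in-memory map. It reports ok=false as soon as
// a new group would exceed maxGroups, which stands in for a memory limit.
func hashSum(rows []row, maxGroups int) (sums map[string]int, ok bool) {
	sums = make(map[string]int)
	for _, r := range rows {
		if _, seen := sums[r.key]; !seen && len(sums) == maxGroups {
			// Partial results are discarded, not spilled.
			return nil, false
		}
		sums[r.key] += r.val
	}
	return sums, true
}

// orderedSum sorts a copy of the input by key and aggregates in one pass,
// holding the running total of only the current group.
func orderedSum(rows []row) map[string]int {
	sorted := append([]row(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })
	sums := make(map[string]int)
	for i := 0; i < len(sorted); {
		key, total := sorted[i].key, 0
		for ; i < len(sorted) && sorted[i].key == key; i++ {
			total += sorted[i].val
		}
		sums[key] = total
	}
	return sums
}

// aggregate tries the hash strategy first and, if it hits the limit,
// replays the retained input through the sort-based fallback.
func aggregate(rows []row, maxGroups int) (map[string]int, string) {
	if sums, ok := hashSum(rows, maxGroups); ok {
		return sums, "hash"
	}
	return orderedSum(rows), "ordered"
}

func main() {
	rows := []row{{"a", 1}, {"b", 2}, {"a", 3}, {"c", 4}}
	sums, strategy := aggregate(rows, 2)
	fmt.Println(strategy, sums)
}
```

The sketch also shows why the input has to be retained: once `hashSum` gives up, its partial sums are thrown away, so the fallback can only proceed from the original tuples.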

One notable change is that because the ordered aggregator doesn't
support filtering aggregation, we cannot support it in the external hash
aggregator, and as a result the hash aggregation is currently not
planned if filtering aggregation is requested.

Another notable change is that the datum is now unwrapped when
converting it to JSON using `AsJSON`. For some reason, the row engine
was panicking on the `TestAggregatorAgainstProcessor` test with the
`json_agg` function, yet I couldn't reproduce it outside of the unit
test. Still, I believe this addition doesn't make things worse.

Fixes: #42485.

Release note (sql change): Hash aggregation can now spill to disk if
it exhausts its memory limit when executed via the vectorized engine.

58288: builtins: Implement ST_GeneratePoints function r=otan a=mknycha

This function generates pseudo-random points until the requested number are found within the input area.

Release note (sql change): Implement geo builtin ST_GeneratePoints

Resolves: #48942

Dependent on: #54476

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Marcin Knychała <knychala.marcin@gmail.com>
@otan otan changed the title WIP: prototype for caching geometries [WIP] prototype for caching geometries Feb 4, 2021
@tbg tbg added the X-noremind label May 6, 2021
@otan otan closed this May 3, 2022
Labels
X-noremind Bots won't notify about PRs with X-noremind
3 participants