Out-of-Core Hash Aggregate #7931

Merged: 48 commits into duckdb:master on Jul 4, 2023
Conversation

@lnkuiper (Contributor) commented on Jun 13, 2023:

This PR implements out-of-core functionality for our aggregate hash table.

This is done by abandoning hash tables when they reach a specific memory limit in PartitionableHashTable::ListAddChunk, as well as repartitioning hash tables in RadixPartitionedHashTable::ScheduleTasks when partitions would otherwise not fit within the memory limit.
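
In rough C++ pseudocode, the two decisions look like this. All names below are illustrative, not DuckDB's actual classes, and it assumes a per-table size estimate and roughly halving partition sizes per extra radix bit:

#include <cstddef>
#include <memory>
#include <vector>

struct HashTable {
    std::size_t SizeInBytes() const {
        return size_bytes;
    }
    std::size_t size_bytes = 0;
};

struct OutOfCoreAggregateSketch {
    std::vector<std::unique_ptr<HashTable>> abandoned; // full tables, spillable to disk
    std::unique_ptr<HashTable> active;
    std::size_t memory_limit = 4ULL << 30; // e.g., a 4GB limit
    std::size_t radix_bits = 4;            // 2^4 = 16 partitions to start with

    // Sink path: once the active table grows past the limit, abandon it
    // (its partitions can be written out) and start a fresh, empty table.
    void AddChunk(/* DataChunk &groups, DataChunk &payload */) {
        if (active && active->SizeInBytes() >= memory_limit) {
            abandoned.push_back(std::move(active));
        }
        if (!active) {
            active = std::make_unique<HashTable>();
        }
        // ... insert the chunk's groups into 'active' ...
    }

    // Scheduling path: if a single partition would still not fit in memory,
    // increase the radix partition count before scheduling the final tasks.
    void MaybeRepartition(std::size_t largest_partition_size) {
        while (largest_partition_size > memory_limit) {
            radix_bits++;                // doubles the partition count
            largest_partition_size /= 2; // each partition roughly halves
        }
        // ... redistribute the abandoned tables over 2^radix_bits partitions ...
    }
};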

This is a basic implementation, and I have many ideas to improve our hash aggregate, which I will implement in the coming weeks. For now, though, I wanted to send the out-of-core functionality on its own so that the PR stays manageable.

This implementation also forces all aggregates that use an Allocator to use ArenaAllocator instead. This means that aggregate states are never explicitly destroyed but are destroyed when the ArenaAllocator goes out of scope. We already used the Allocator wrapper for the ArenaAllocator, so the aggregate states were usually not destroyed explicitly. Of course, we want to destroy/serialize the aggregate states for the out-of-core hash aggregate. I will address this at some point in the future, but I will focus on improving the hash aggregate (both in-memory and out-of-core) first.
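
The arena pattern itself is simple; here is a minimal bump-pointer sketch (illustrative only, not DuckDB's ArenaAllocator API) of why states need no individual destruction:

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class ArenaSketch {
public:
    explicit ArenaSketch(std::size_t block_size = 1 << 20) : block_size(block_size) {
    }

    // Bump-pointer allocation; there is deliberately no per-state Free.
    uint8_t *Allocate(std::size_t size) {
        assert(size <= block_size); // oversized states would need dedicated blocks
        if (blocks.empty() || offset + size > block_size) {
            blocks.push_back(std::make_unique<uint8_t[]>(block_size));
            offset = 0;
        }
        uint8_t *result = blocks.back().get() + offset;
        offset += size;
        return result;
    }
    // ~ArenaSketch() frees all blocks, and with them all aggregate states, at once.

private:
    std::size_t block_size;
    std::size_t offset = 0;
    std::vector<std::unique_ptr<uint8_t[]>> blocks;
};

Note that a state owning out-of-band memory (strings, for example) must also allocate that memory from the arena, or it would leak under this scheme; that is why the PR forces aggregates onto the arena-backed Allocator.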

As always, I have a little benchmark! This is the query:

SELECT count(*) FROM (SELECT DISTINCT * FROM 'lineitem30.parquet');

Here, 'lineitem30.parquet' contains the first 30M rows of TPC-H lineitem at SF10. These are the results:

memory limit    runtime
10.0GB          8.52s
9.0GB           9.35s
8.0GB           7.89s
7.0GB           7.81s
6.0GB           6.84s
5.0GB           7.39s
4.0GB           9.04s
3.0GB           8.76s
2.0GB           8.08s
1.0GB           10.27s
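
For reference, each row can be reproduced by capping the memory before running the query, using DuckDB's standard memory_limit setting:

SET memory_limit = '4GB';
SELECT count(*) FROM (SELECT DISTINCT * FROM 'lineitem30.parquet');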

Interestingly, the execution time tends to decrease as the memory limit decreases. Lower limits cause the out-of-core strategy to create more fine-grained partitions, and the resulting improvement in cache locality explains the better runtimes.

@Mytherin (Collaborator) left a review comment:

Thanks for the PR! Looks great - and great performance results! Some comments:

make_uniq<PartitionableHashTable>(context.client, BufferAllocator::Get(context.client),
                                  *gstate.partition_info, group_types, op.payload_types, op.bindings);
if (context.client.config.force_external) {

@Mytherin (Collaborator) commented on this snippet:
Not directly related but can we perhaps rename this to verify_external in the codebase as well to prevent confusion?

@lnkuiper (Contributor, Author) replied on Jun 16, 2023:

We already have a verify_external, which triggers the ExternalStatementVerifier, running the query once with and once without force_external. I can rename this to debug_force_external.

Edit: Actually, on second thought, both of these settings are already in line with other config options. I think this should stay as is. We have a verify_... and a force_... for testing external functionality, which is similar to other debug config options.
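
For context, the verifier discussed here can be toggled from SQL. Assuming the pragma names are remembered correctly (treat them as unverified), a verification session would look like:

PRAGMA verify_external;           -- each query runs once normally and once forced external
SELECT 42;                        -- any query under test
PRAGMA disable_verify_external;   -- switch the verifier back off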

@lnkuiper (Contributor, Author) commented:

I don't think the failing test has anything to do with my changes. It fails at vector size 2, where strings somehow become invalid:

statement ok
CREATE VIEW strlists AS SELECT * FROM (VALUES
	(1, ['a']),
	(2, [NULL]),
	(3, []),
	(4, ['Branta Canadensis', 'c']),
	(5, ['i','j','k']),
	(NULL::INTEGER, ['Somateria mollissima'])
	) lv(pk, p);

query II
SELECT * FROM strlists
----
NULL	[Somateria mollissima]
1	[a]
2	[NULL]
3	[]
4	[Branta Canadensis, c]
5	[i, j, k]

Actual result:

1	[a]
2	[NULL]
3	[]
4	[Branta Canadensis, c]
5	[i, j, Z]
NULL	[ZZZZZZZZZZZZZZZZZZZZ]

I will investigate, but maybe I can add a require vector_size?
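
For reference, such a guard in the test file could look like this (a sketch, assuming the directive skips the test for smaller vector sizes):

# skip this test entirely when running with a tiny vector size
require vector_size 64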

@Mytherin (Collaborator) commented:

I have not seen this issue before in other PRs - perhaps you could investigate?

@github-actions bot marked this pull request as draft on June 26, 2023 08:21
@Mytherin marked this pull request as ready for review on June 26, 2023 09:41
@lnkuiper marked this pull request as draft on June 26, 2023 13:09
@lnkuiper marked this pull request as ready for review on June 26, 2023 13:09
@lnkuiper marked this pull request as draft on June 27, 2023 08:04
@lnkuiper marked this pull request as ready for review on June 27, 2023 08:04
@github-actions bot marked this pull request as draft on June 27, 2023 08:04
@lnkuiper marked this pull request as ready for review on June 27, 2023 08:24
@lnkuiper (Contributor, Author) commented:

I think this is ready to go!

@Mytherin changed the base branch from feature to master on July 4, 2023 13:16
@Mytherin merged commit eea0a67 into duckdb:master on Jul 4, 2023
@Mytherin (Collaborator) commented on Jul 4, 2023:

Thanks! LGTM
