Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't count unused arena bytes as read_bytes in AggregateFunction types #48745

Merged
merged 8 commits into from Apr 24, 2023

Conversation

Algunenano
Copy link
Member

@Algunenano Algunenano commented Apr 13, 2023

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • In AggregateFunction types, don't count unused arena bytes as read_bytes.

Documentation entry for user-facing changes

Several changes:

  • Reduce a bit the dependencies over the Arena.h header (to make it a bit faster to iterate, but not much).
  • Add a new member in Arena to differentiate between allocated bytes (pre-allocated in the Arena or wasted in holes) and reserved_bytes (bytes requested via allocations to the arena).
  • Change ColumnAggregateFunction to use reserved_bytes() when reporting the column ::byteSize(), leaving allocatedBytes() the same. This allows more precise reporting of read_bytes when working with these column types.

Closes #48606

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-improvement Pull request with some product improvements label Apr 13, 2023
@Algunenano Algunenano changed the title Arenita Don't count unused arena bytes as read_bytes in AggregateFunction types Apr 13, 2023
@Algunenano
Copy link
Member Author

All green. I didn't know this was possible

@Algunenano
Copy link
Member Author

It'd be great if somebody could review this

  • Failures: Integration tests (tsan) [3/6] hdfs failed to start

@alexey-milovidov alexey-milovidov self-assigned this Apr 21, 2023
@alexey-milovidov
Copy link
Member

"Reserved" bytes sounds misleading.

It is expected that it means something similar to std::vector::reserve and std::vector::capacity, but the meaning is the opposite.

What do you think about renaming it to "used" bytes?

@Algunenano
Copy link
Member Author

What do you think about renaming it to "used" bytes?

Makes sense to me. I've changed it in the PR

@alexey-milovidov alexey-milovidov merged commit dee0b78 into ClickHouse:master Apr 24, 2023
252 of 255 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extremely large read bytes reported when reading columns of AggregateFunction type
3 participants