Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory size estimation for ParquetMetadata #1729

Closed
alamb opened this issue May 23, 2022 · 7 comments · Fixed by #5965
Closed

Add memory size estimation for ParquetMetadata #1729

alamb opened this issue May 23, 2022 · 7 comments · Fixed by #5965
Assignees
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented May 23, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In https://github.com/influxdata/influxdb_iox, caching is very important to performance. Part of what is cached in memory is a ParquetMetadata structure. In order to effectively cache this data (and free it when under memory pressure) we need to know an accurate estimate of the heap it is using.

Describe the solution you'd like
I would like a function such as the following that accurately estimates the memory usage of parquet metadata:

impl ParquetMetadata {
    ...

    /// In-memory size in bytes, including `self`.
    pub fn size(&self) -> usize {
        // recursively 
    }

   ...
}

Describe alternatives you've considered
I believe @domodwyer is considering caching only the parts that we need (rather than the ParquetMetadata object itself) in which case we would likely not need this feature in IOx. I think it would still be generally helpful though

Additional context
See https://github.com/influxdata/influxdb_iox/pull/4661 for example of usecase

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label May 23, 2022
@alamb
Copy link
Contributor Author

alamb commented Jun 11, 2024

I believe @crepererum may be working on this

@crepererum
Copy link
Contributor

I think I might not. After a quick look at the data structures, the memory estimation is quite a mess, especially for the parquet schema data.

@alamb alamb self-assigned this Jun 24, 2024
@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2024

I am going to try and find time to do this sometime this week

@Xuanwo
Copy link
Member

Xuanwo commented Jun 24, 2024

In https://github.com/influxdata/influxdb_iox, caching is very important to performance. Part of what is cached in memory is a ParquetMetadata structure.

Seems a typo here, https://github.com/influxdata/influxdb_iox is empty

@Xuanwo
Copy link
Member

Xuanwo commented Jun 24, 2024

Oh, I see. This issue created in 2022, and influxdb_iox has been renamed to influxdb. What a history!

@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2024

Oh, I see. This issue created in 2022, and influxdb_iox has been renamed to influxdb. What a history!

Yeah, the actual history is more tortured -- if you care it is here https://www.influxdata.com/blog/the-plan-for-influxdb-3-0-open-source/

TLDR is that we made the influxdb_iox repo private

@alamb
Copy link
Contributor Author

alamb commented Jun 27, 2024

Here is a proposed PR with a fix: #5965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants