
[1.0.x] Core: Increase inferred column metrics limit to 100. #5933

Merged: 1 commit merged into apache:1.0.x on Oct 7, 2022

Conversation

@nastra (Contributor) commented Oct 7, 2022

No description provided.

@nastra nastra added this to the Iceberg 1.0.0 Release milestone Oct 7, 2022
@nastra nastra requested a review from rdblue October 7, 2022 07:55
@rdblue rdblue merged commit e2bb9ad into apache:1.0.x Oct 7, 2022
gaborkaszab pushed a commit to gaborkaszab/iceberg that referenced this pull request Oct 24, 2022
@haydenflinner

If I have a table with more than 100 columns, what are the downsides of being above this parameter's value? I don't see it documented here: https://iceberg.apache.org/docs/latest/configuration/

I only ask because I have a table that is basically a collection of events. Upstream, each event carries some metadata in a dict. Using a column per key in that metadata dict felt like it would compress better than storing a {"key1": 123} map in every row, since the key names are relatively static and the values would benefit from columnar compression. The majority of such columns are empty for any particular partition, which I assume is near-zero storage/runtime overhead. For example, file 1's rows will have the metadata dict {"abc": 1234} repeated through virtually the whole GB of data, while file 2 may instead have {"def": "foo"} in most rows.
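A minimal sketch of how this limit can be tuned, assuming the change corresponds to the table property write.metadata.metrics.max-inferred-column-defaults (whose default this PR raises to 100) and using the standard Iceberg Java Table.updateProperties() API. The loadTable() helper and the column names are hypothetical:

    // Sketch: adjusting Iceberg's inferred column metrics limit.
    // Assumes a Table handle is already loaded from a catalog.
    import org.apache.iceberg.Table;

    public class MetricsConfigSketch {
        public static void main(String[] args) {
            Table table = loadTable(); // hypothetical helper

            // Raise the cap on how many leading columns get default
            // min/max/null-count metrics in manifest files (assumed
            // property name; this PR bumps its default to 100).
            table.updateProperties()
                .set("write.metadata.metrics.max-inferred-column-defaults", "200")
                .commit();

            // Or keep the cap and override metrics collection per column;
            // modes such as "full", "counts", "truncate(16)", and "none"
            // exist. These column names are made up for illustration.
            table.updateProperties()
                .set("write.metadata.metrics.column.event_time", "full")
                .set("write.metadata.metrics.column.raw_payload", "none")
                .commit();
        }

        private static Table loadTable() {
            throw new UnsupportedOperationException("load from your catalog");
        }
    }

If the property behaves the way its name suggests, columns past the inferred limit simply get no default stats, so filters on them cannot benefit from file-level pruning; per-column overrides are one way to keep stats on the columns you actually filter by in a wide table.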

@nastra nastra deleted the 1.0.x-increase-column-metrics branch June 1, 2023 13:11