Account for child Bucket size in OrderPreservingInterner #4646

Merged
merged 6 commits into from
Aug 8, 2023

@alamb alamb commented Aug 4, 2023

Which issue does this PR close?

Closes #4645

Rationale for this change

Without this change, our system consumes significantly more memory (2-3x) than the configured limit.

What changes are included in this PR?

Account for the missing allocation: the memory used by child Buckets stored in a Bucket's Slots.

Are there any user-facing changes?

If you use the reported size to enforce memory limits, memory usage will no longer exceed those limits.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 4, 2023
@@ -343,8 +343,19 @@ impl Bucket {
    fn size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.slots.capacity() * std::mem::size_of::<Slot>()
            // and account for the size of any embedded buckets in the slots
            + self.slot_child_bucket_size()
Contributor Author

The problem is that a Bucket may contain other Buckets through the child field of its Slots:

child: Option<Box<Bucket>>,

The memory of those child Buckets was not accounted for.
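To make the gap concrete, here is a minimal sketch using simplified stand-ins for the interner's types. Only `slots`, `child`, `size` and `slot_child_bucket_size` come from the diff; the `value` field, `shallow_size` and `build_example` are hypothetical names added for illustration, and the real `Slot` and `Bucket` have a different layout.

```rust
/// Simplified stand-in for the interner's slot (layout is an assumption).
#[allow(dead_code)]
struct Slot {
    value: u64, // hypothetical placeholder payload
    child: Option<Box<Bucket>>,
}

/// Simplified stand-in for the interner's bucket.
struct Bucket {
    slots: Vec<Slot>,
}

impl Bucket {
    /// Size without following child pointers (the accounting before the fix).
    fn shallow_size(&self) -> usize {
        std::mem::size_of::<Self>() + self.slots.capacity() * std::mem::size_of::<Slot>()
    }

    /// Size including every Bucket reachable through `Slot::child`.
    fn size(&self) -> usize {
        self.shallow_size() + self.slot_child_bucket_size()
    }

    /// Heap memory owned by child Buckets; the Slots themselves are
    /// already counted through `slots.capacity()`.
    fn slot_child_bucket_size(&self) -> usize {
        self.slots
            .iter()
            .map(|slot| slot.child.as_ref().map(|x| x.size()).unwrap_or_default())
            .sum()
    }
}

/// Builds a root Bucket whose single Slot owns one child Bucket.
fn build_example() -> Bucket {
    let leaf = Bucket {
        slots: vec![Slot { value: 1, child: None }],
    };
    Bucket {
        slots: vec![Slot { value: 0, child: Some(Box::new(leaf)) }],
    }
}

fn main() {
    let root = build_example();
    // The shallow figure misses the boxed child entirely.
    assert!(root.size() > root.shallow_size());
    println!("shallow: {} bytes, recursive: {} bytes", root.shallow_size(), root.size());
}
```

The deeper the tree of Buckets, the larger the share of memory that the shallow figure misses, which is consistent with the under-reporting described in this PR.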

@alamb alamb changed the title Account for child buckets in OrderPreservingInterner Account for child Bucket size in OrderPreservingInterner Aug 4, 2023
    }

    fn slot_child_bucket_size(&self) -> usize {
        self.slots
            .iter()
            .map(|slot| slot.child.as_ref().map(|x| x.size()).unwrap_or_default())
            .sum()
    }
}

#[cfg(test)]
Contributor Author

I would love some suggestions on how to test this -- what I tried is described on #4645

Contributor Author

@alamb alamb Aug 7, 2023

I made a test in 111b43c

It is basically a second implementation of size, which one could argue is unnecessary, but I feel better having a double check.
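A double check of this kind could look roughly like the sketch below. This is not the test from 111b43c: the types are simplified stand-ins, and `expected_size`, `nested` and the `value` field are hypothetical names. The idea is to compute the size a second way (an explicit stack instead of recursion) and assert that both agree.

```rust
/// Simplified stand-in for the interner's slot (layout is an assumption).
#[allow(dead_code)]
struct Slot {
    value: u64, // hypothetical placeholder payload
    child: Option<Box<Bucket>>,
}

/// Simplified stand-in for the interner's bucket.
struct Bucket {
    slots: Vec<Slot>,
}

impl Bucket {
    /// Recursive size, following the shape of the code in this PR.
    fn size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.slots.capacity() * std::mem::size_of::<Slot>()
            + self
                .slots
                .iter()
                .map(|slot| slot.child.as_ref().map(|x| x.size()).unwrap_or_default())
                .sum::<usize>()
    }
}

/// Second, independent implementation: walk every reachable Bucket
/// with an explicit stack and add up its own footprint.
fn expected_size(root: &Bucket) -> usize {
    let mut total = 0;
    let mut stack = vec![root];
    while let Some(bucket) = stack.pop() {
        total += std::mem::size_of::<Bucket>()
            + bucket.slots.capacity() * std::mem::size_of::<Slot>();
        for slot in &bucket.slots {
            if let Some(child) = slot.child.as_deref() {
                stack.push(child);
            }
        }
    }
    total
}

/// Builds a tree of Buckets; every even-numbered Slot gets a child
/// until `depth` reaches zero.
fn nested(depth: usize, width: usize) -> Bucket {
    let slots = (0..width)
        .map(|i| Slot {
            value: i as u64,
            child: if depth > 0 && i % 2 == 0 {
                Some(Box::new(nested(depth - 1, width)))
            } else {
                None
            },
        })
        .collect();
    Bucket { slots }
}

fn main() {
    let root = nested(3, 4);
    assert_eq!(root.size(), expected_size(&root));
    println!("both implementations report {} bytes", root.size());
}
```

The check only guards against the two implementations drifting apart; it does not prove that either one matches what the allocator actually hands out.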

@alamb
Contributor Author

alamb commented Aug 7, 2023

I am still working on a test for this

@alamb alamb merged commit 0ded0ce into apache:master Aug 8, 2023
14 checks passed
Successfully merging this pull request may close these issues.

RowInterner::size() much too low for high cardinality dictionary columns