Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutablebuffer::shrink_to_fit #318

Merged
merged 2 commits into from
May 19, 2021
Merged

Conversation

ritchie46
Copy link
Contributor

Which issue does this PR close?

This PR add shrink_to_fit to the public API of MutableBuffer. This closes #297.

In line of this, I'd like the option to shrink existing array data, where would such logic be a best fit. I was thinking membership of dyn Array trait.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Two comments:

  1. Could we add a test? ^_^
  2. I am not sure we need inline this; inlining is useful when growing because it is used over an iterator with push; I am not sure it is needed for shrinking

@ritchie46
Copy link
Contributor Author

Could we add a test? ^_^

Definitely! It was already in the docstring, but I will add an explicit one.

I am not sure we need inline this; inlining is useful when growing because it is used over an iterator with push; I am not sure it is needed for shrinking

I agree. 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @ritchie46

@alamb alamb added arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog labels May 18, 2021
Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍

@codecov-commenter
Copy link

Codecov Report

Merging #318 (830d3a8) into master (c863a2c) will increase coverage by 0.00%.
The diff coverage is 92.85%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #318   +/-   ##
=======================================
  Coverage   82.52%   82.52%           
=======================================
  Files         162      162           
  Lines       44007    44021   +14     
=======================================
+ Hits        36316    36329   +13     
- Misses       7691     7692    +1     
Impacted Files Coverage Δ
arrow/src/buffer/mutable.rs 84.34% <92.85%> (+0.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c863a2c...830d3a8. Read the comment docs.

@alamb alamb merged commit 7f37a7f into apache:master May 19, 2021
alamb pushed a commit to alamb/arrow-rs that referenced this pull request May 24, 2021
* Mutablebuffer::shrink_to_fit

* add shrink_to_fit explicit test
alamb pushed a commit that referenced this pull request May 24, 2021
* Mutablebuffer::shrink_to_fit

* add shrink_to_fit explicit test
alamb added a commit that referenced this pull request May 25, 2021
* Mutablebuffer::shrink_to_fit

* add shrink_to_fit explicit test

Co-authored-by: Ritchie Vink <ritchie46@gmail.com>
@Dandandan
Copy link
Contributor

I am wondering if we should do shrink_to_fit when calling freeze, as this would save memory for anything created using the MutableBuffer.

Any thoughts?

@ritchie46
Copy link
Contributor Author

I am wondering if we should do shrink_to_fit when calling freeze, as this would save memory for anything created using the MutableBuffer.

I discussed this idea with @jorgecarleitao. But it turns out that if we call shrink_to_fit we allocate a whole new slab of memory of the needed size and copy all data before releasing the old slab.
IMO that decision should be left to the caller, as they can decide if this worth it.

@Dandandan
Copy link
Contributor

Dandandan commented Jun 11, 2021

I am wondering if we should do shrink_to_fit when calling freeze, as this would save memory for anything created using the MutableBuffer.

I discussed this idea with @jorgecarleitao. But it turns out that if we call shrink_to_fit we allocate a whole new slab of memory of the needed size and copy all data before releasing the old slab.
IMO that decision should be left to the caller, as they can decide if this worth it.

I was looking for some documentation about whether realloc will do a allocation / copy the data when the size of the realloc is smaller than the original allocation. Rust can not guarentee that, as it depends on the allocator in use, but most allocators might not allocate + copy?

Did we have any evidence or tests for what, in general, allocators are doing here? @jorgecarleitao

@ritchie46
Copy link
Contributor Author

Yes, you are rigtht.

A quick google documentation survey:

malloc is not really clear about how it reallocs. It seems that it may shrink in place or it may reallocate.

mi-malloc says it will shrink in place:
pointer to the re-allocated memory of newsize bytes, or NULL if out of memory. If NULL is returned, the pointer p is not freed. Otherwise the original pointer is either freed or returned as the reallocated result (in case it fits in-place with the new size). If the pointer p is NULL, it behaves as mi_malloc(newsize). If newsize is larger than the original size allocated for p, the bytes after size are uninitialized

jemalloc says it will shrink in place:
-The realloc() function changes the size of the previously allocated memory referenced by ptr to size bytes. The contents of the memory are unchanged up to the lesser of the new and old sizes. If the new size is larger, the contents of the newly allocated portion of the memory are undefined_

@ritchie46
Copy link
Contributor Author

Something to consider. I think that it may lead to increased fragmentation. If we allocate n large chunks and then shrink their size, it may lead with smaller regions of memory that may be filled with long living allocations.

@Dandandan
Copy link
Contributor

Something to consider. I think that it may lead to increased fragmentation. If we allocate n large chunks and then shrink their size, it may lead with smaller regions of memory that may be filled with long living allocations.

Would that be a problem over not shrinking at all? If you don't shrink, the (smaller amount of) remaining memory might be more fragmented than the chunk that can be removed with shrinking.
On average, the remaining size of the allocation can be quite big when shrinking (~1/2 the remaining size).

@alamb alamb removed the enhancement Any new improvement worthy of a entry in the changelog label Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add shrink_to / shrink_to_fit to MutableBuffer
5 participants