GH-35270: [C++] Use Buffer instead of raw buffer in hash join internals #35347
@@ -226,12 +228,12 @@ class ARROW_EXPORT SwissTable {
   // ---------------------------------------------------
   // * Empty bucket has value 0x80. Non-empty bucket has highest bit set to 0.
   //
-  uint8_t* blocks_;
+  std::shared_ptr<Buffer> blocks_;
 
   // Array of hashes of values inserted into slots.
   // Undefined if the corresponding slot is empty.
   // There is 64B padding at the end.
Should this comment still mention padding? Doesn't Buffer already ensure padding?
Buffer ensures that allocations are aligned (to at least 64 bytes, I think), but I don't know if it actually guarantees padding. I think this buffer needs to extend 64 bytes past the end regardless of its size, which is a slightly different requirement.
Got it, padding and alignment are different; I missed that previously. The patch LGTM.
+1, nice robustness improvement
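To illustrate the alignment-versus-padding distinction raised in this thread, here is a minimal sketch in plain C++17. It does not use Arrow's `MemoryPool` or `Buffer`; `kAlignment`, `kPadding`, `RoundUpToAlignment`, `AllocatePadded`, and `LoadWord` are hypothetical names invented for the example, and `std::aligned_alloc` stands in for whatever allocator the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical constants for illustration only; not taken from Arrow.
constexpr std::size_t kAlignment = 64;  // start address is a multiple of this
constexpr std::size_t kPadding = 64;    // extra readable bytes past the logical end

// Rounds n up to the next multiple of kAlignment.
constexpr std::size_t RoundUpToAlignment(std::size_t n) {
  return (n + kAlignment - 1) / kAlignment * kAlignment;
}

// Memory from std::aligned_alloc must be released with std::free.
struct FreeDeleter {
  void operator()(std::uint8_t* p) const { std::free(p); }
};
using AlignedBytes = std::unique_ptr<std::uint8_t[], FreeDeleter>;

// Allocates a zeroed, 64-byte-aligned region holding `logical_size` bytes
// plus kPadding bytes past the logical end. Returns nullptr on failure.
AlignedBytes AllocatePadded(std::size_t logical_size, std::size_t* allocated_size) {
  // Alignment alone gives no slack when logical_size is already a multiple
  // of 64 (RoundUpToAlignment(64) == 64), so padding is added explicitly.
  const std::size_t total = RoundUpToAlignment(logical_size + kPadding);
  auto* raw = static_cast<std::uint8_t*>(std::aligned_alloc(kAlignment, total));
  if (raw == nullptr) return nullptr;
  std::memset(raw, 0, total);
  *allocated_size = total;
  return AlignedBytes(raw);
}

// Reads 8 bytes starting at `offset`. Near the logical end this runs past
// the last valid element, which is only safe because of the padding.
std::uint64_t LoadWord(const std::uint8_t* data, std::size_t offset) {
  std::uint64_t word;
  std::memcpy(&word, data + offset, sizeof(word));
  return word;
}
```

With a logical size of 64 bytes, an allocator that only guarantees 64-byte alignment may hand back exactly 64 bytes, so an 8-byte load at offset 63 would overrun the allocation. Requesting `logical_size + kPadding` up front is what makes such over-reads safe in this sketch.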
Benchmark runs are scheduled for baseline = b372242 and contender = 18c9760. 18c9760 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
…nternals (apache#35347)

### Rationale for this change

The current code has two storage buffers in the key map which are allocated with MemoryPool::Allocate which does not use smart pointers. This could have led to a potential memory leak in an OOM scenario where the first allocate fails and it also led to some convoluted code keeping track of the previously allocated size in order to properly call Free.

Furthermore, it seems that this key map could have been getting potentially copied in the swiss join code. While that was probably not happening (since the copy happened before the key map was initialized) it is still an easy recipe for an accidental double-free later on as we maintain the class.

### What changes are included in this PR?

Those raw buffers are changed to std::shared_ptr<Buffer> to avoid these issues.

### Are these changes tested?

Somewhat, the existing unit tests should ensure we didn't cause a regression. I didn't introduce a regression test to introduce this potential bug because it would be very difficult to do so.

### Are there any user-facing changes?

No

* Closes: apache#35270

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
The current code has two storage buffers in the key map that are allocated with MemoryPool::Allocate, which does not use smart pointers. This could have led to a memory leak in an OOM scenario where the first allocate fails, and it also led to some convoluted code that kept track of the previously allocated size in order to call Free properly.
Furthermore, it seems that this key map could potentially have been copied in the swiss join code. While that was probably not happening (since the copy happened before the key map was initialized), it is still an easy recipe for an accidental double-free later on as we maintain the class.
What changes are included in this PR?
Those raw buffers are changed to std::shared_ptr<Buffer> to avoid these issues.
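As a rough illustration of why shared ownership avoids both problems, here is a small self-contained C++17 sketch. It is not the Arrow code: `ToyBuffer`, `AllocateToy`, `ToyTable`, and the `hashes` member are hypothetical stand-ins, and a `fail` flag simulates an out-of-memory allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning buffer; not Arrow's Buffer class.
// live_count tracks how many buffers currently exist so leaks are visible.
struct ToyBuffer {
  explicit ToyBuffer(std::size_t n) : bytes(n, 0) { ++live_count; }
  ~ToyBuffer() { --live_count; }
  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;

  std::vector<std::uint8_t> bytes;
  static inline int live_count = 0;
};

// Hypothetical allocator that can be told to fail, simulating OOM.
std::shared_ptr<ToyBuffer> AllocateToy(std::size_t n, bool fail) {
  if (fail) return nullptr;
  return std::make_shared<ToyBuffer>(n);
}

// Toy table holding two storage buffers through shared ownership.
struct ToyTable {
  std::shared_ptr<ToyBuffer> blocks;
  std::shared_ptr<ToyBuffer> hashes;

  // Returns false if either allocation fails. No manual cleanup or size
  // bookkeeping is needed on the failure paths.
  bool Init(std::size_t num_blocks, bool fail_second) {
    auto new_blocks = AllocateToy(num_blocks * 8, /*fail=*/false);
    if (!new_blocks) return false;
    auto new_hashes = AllocateToy(num_blocks * 4, fail_second);
    if (!new_hashes) return false;  // new_blocks is released automatically
    blocks = std::move(new_blocks);
    hashes = std::move(new_hashes);
    return true;
  }
};
```

Copying a `ToyTable` only bumps the reference counts, so both copies point at the same buffers and each buffer is destroyed exactly once when the last owner goes away. With raw pointers, the same copy would leave two objects that each believe they must free the memory.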
Are these changes tested?
Somewhat. The existing unit tests should ensure we didn't cause a regression. I didn't add a regression test that reproduces this potential bug because it would be very difficult to do so.
Are there any user-facing changes?
No