GH-38372: [C++] replace memo table with a swiss table like implementation #40915
This PR is a work in progress for now; I will add some benchmarks and tests this week.
You can mark it as a "draft" first.
cpp/src/arrow/util/hashing.h
Outdated
@@ -19,6 +19,8 @@

#pragma once

#include <_types/_uint16_t.h>
Isn't <cstdint> enough?
OK, I will remove it.
cpp/src/arrow/util/bit_util.h
Outdated
/// sub-expressions the result is the high bits set where the bytes in v were zero, since
/// the high bits set due to a value greater than 0x80 in the first sub-expression are
/// masked off by the second.
uint64_t HasZeroByte(uint64_t value) {
Since it's a "has", shouldn't it return bool?
Oh, I get the point. I don't know whether putting this in bit_util is a good idea, since it's more like swisstable's logic.
Yes, it would be a little confusing. Moving this function to the swiss table class and renaming it to GetZeroByteMask would be better.
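For readers following this thread, here is a minimal sketch of the zero-byte trick that the doc comment describes. It assumes the function uses the classic "haszero" expression, which the comment's wording matches; the PR's actual body is not shown in this excerpt. The name GetZeroByteMask follows the rename proposed above, while kLowBits, kHighBits, and MatchByte are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>

// Hypothetical constants: 0x01 and 0x80 repeated in every byte of a 64-bit word.
constexpr uint64_t kLowBits = 0x0101010101010101ULL;
constexpr uint64_t kHighBits = 0x8080808080808080ULL;

// Returns a mask with the high bit set in (at least) every byte of `value`
// that is zero. (value - kLowBits) sets the high bit of a byte that is zero
// or greater than 0x80; ANDing with ~value clears the bytes whose high bit
// was already set.
// Caveat: borrows propagate upward, so a 0x01 byte directly above a zero byte
// can also be flagged. The lowest set bit always marks a real zero byte, and
// the mask is non-zero if and only if some byte is zero.
constexpr uint64_t GetZeroByteMask(uint64_t value) {
  return (value - kLowBits) & ~value & kHighBits;
}

// Boolean form, matching the "has" naming raised in the review.
constexpr bool HasZeroByte(uint64_t value) {
  return GetZeroByteMask(value) != 0;
}

// Assumed usage in a swiss-table-like probe: XOR the 8 control bytes in
// `group` with a broadcast of `byte`, so matching bytes become zero, then
// reuse the zero-byte mask. Candidates still need a full key comparison.
constexpr uint64_t MatchByte(uint64_t group, uint8_t byte) {
  return GetZeroByteMask(group ^ (kLowBits * byte));
}
```

Returning the mask rather than a bool lets the caller iterate over candidate slots in one word, which is presumably why the rename to GetZeroByteMask was suggested.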
cpp/src/arrow/util/hashing.h
Outdated
++reinsert_count;
auto p = DoLookup<CompareKind::NoCompare>(old_entry->h,
                                          [](const Payload*) { return false; });
assert(!p.second);
DCHECK?
ok
d045800 to 70c64f0
Rationale for this change
Replace the memo table with a swiss table to improve the performance of set lookup functions, vector hash functions, the count_distinct aggregate function, dictionary unification, and similar operations.
Related discussion: #38372
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No