GH-38372: [C++] replace memo table with a swiss table like implementation #40915

Draft
wants to merge 8 commits into base: main


SGZW
Contributor

@SGZW SGZW commented Mar 31, 2024

Rationale for this change

Replace the memo table with a Swiss table to improve the performance of set lookup functions, vector hash functions, the count_distinct aggregate function, dictionary unification, and similar code paths.

Related discussion: #38372

What changes are included in this PR?

  1. Add some bit utilities.
  2. Add SwissHashTable, which implements the interface required of MemoTable and is easy to integrate with the rest of the code.

Are these changes tested?

Yes

Are there any user-facing changes?

No

⚠️ GitHub issue #38372 has been automatically assigned in GitHub to PR creator.

@SGZW
Contributor Author

SGZW commented Mar 31, 2024

This PR is a work in progress. I will add some benchmarks and tests this week.

@mapleFU
Member

mapleFU commented Mar 31, 2024

You can mark it as "draft" first

@@ -19,6 +19,8 @@

#pragma once

#include <_types/_uint16_t.h>
Member

Isn't <cstdint> enough?

Contributor Author

OK, I will remove it.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 31, 2024
/// sub-expressions the result is the high bits set where the bytes in v were zero, since
/// the high bits set due to a value greater than 0x80 in the first sub-expression are
/// masked off by the second.
uint64_t HasZeroByte(uint64_t value) {
Member

Since it's a "Has" function, should it return bool?

Member

Oh, I get the point. I don't know whether putting this in bit_util is a good idea, since it's more like Swiss table logic.

Contributor Author

Yes, it is a little confusing. Moving this function into the Swiss table class and renaming it to GetZeroByteMask would be better.
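
For readers following this thread, here is a minimal sketch of the zero-byte trick that the doc comment above describes. It is not the code from this PR: the name GetZeroByteMask is taken from the rename proposed here, while the body, the constants, and the MatchByteMask helper are assumptions based on the well-known SWAR idiom (subtract 0x01 from every byte, mask with the complement of the input, keep only the high bits).

```cpp
#include <cstdint>

// Assumed constants for the SWAR idiom, not taken from the PR.
constexpr uint64_t kLowBits = 0x0101010101010101ULL;   // 0x01 in every byte
constexpr uint64_t kHighBits = 0x8080808080808080ULL;  // 0x80 in every byte

// Returns a mask with the high bit set in byte lanes that were zero in
// `value`. The result is nonzero exactly when `value` contains a zero byte.
// Only the lowest set bit is guaranteed to mark a real zero byte: a borrow
// can also flag a 0x01 byte that sits directly above a zero byte.
inline uint64_t GetZeroByteMask(uint64_t value) {
  return (value - kLowBits) & ~value & kHighBits;
}

// Boolean form, matching the "Has" naming discussed in the review.
inline bool HasZeroByte(uint64_t value) { return GetZeroByteMask(value) != 0; }

// Hypothetical helper: flags the byte lanes of `group` equal to `byte` by
// XOR-ing with a broadcast of `byte`, which turns matching lanes into zero.
inline uint64_t MatchByteMask(uint64_t group, uint8_t byte) {
  return GetZeroByteMask(group ^ (kLowBits * static_cast<uint64_t>(byte)));
}
```

Returning the mask rather than a bool lets a caller locate the matching lane, which is presumably why the function returns uint64_t. Because of the borrow caveat noted in the comments, any lane flagged above the lowest one should be treated as a candidate and confirmed with a real comparison.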

@SGZW SGZW marked this pull request as draft April 1, 2024 02:57
++reinsert_count;
auto p = DoLookup<CompareKind::NoCompare>(old_entry->h,
[](const Payload*) { return false; });
assert(!p.second);
Member

DCHECK?

Contributor Author

ok

@SGZW SGZW force-pushed the add_swiss_table branch 2 times, most recently from d045800 to 70c64f0 Compare April 6, 2024 16:56
2 participants