[improvement] Use phmap::flat_hash_set in AggregateFunctionUniq #11257

Merged
yiguolei merged 1 commit into apache:master from mrhhsg:distinct_phmap on Jul 29, 2022
@mrhhsg
Member

@mrhhsg mrhhsg commented Jul 27, 2022

Proposed changes

Issue Number: close #xxx

Problem Summary:

Tested on ClickBench with the following SQL:

SELECT COUNT(DISTINCT SearchPhrase), count(distinct userid) FROM hits;

Execution time reduced to 7.60 sec from 9.73 sec.

Checklist(Required)

  1. Type of your changes:
    • Improvement
    • Fix
    • Feature-WIP
    • Feature
    • Doc
    • Refactor
    • Others:
  2. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  3. Has unit tests been added:
    • Yes
    • No
    • No Need
  4. Has document been added or modified:
    • Yes
    • No
    • No Need
  5. Does it need to update dependencies:
    • Yes
    • No
  6. Are there any changes that cannot be rolled back:
    • Yes
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

kpfly previously approved these changes Jul 27, 2022
@kpfly kpfly left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@mrhhsg mrhhsg force-pushed the distinct_phmap branch 2 times, most recently from 7d77293 to 56d437f Compare July 28, 2022 15:57
set.rehash(set.size() + rhs_set.size());

for (auto elem : rhs_set) {
set.insert(elem);
Contributor

phmap has a merge method; is it the same as inserting the elements one by one?

Member Author

Yes, phmap's merge inserts elements one by one, and it requires that the source (rhs) not be const.
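
To illustrate the trade-off discussed in this thread, here is a minimal sketch of the two merge styles. It is not the code from this PR: `std::unordered_set` stands in for `phmap::flat_hash_set` as an assumption, and the names `UniqSet`, `merge_by_insert` and `merge_by_splice` are hypothetical. The sketch also uses `reserve` where the diff calls `rehash`.

```cpp
#include <cstdint>
#include <unordered_set>

// Stand-in for the set type (assumption); the PR uses phmap::flat_hash_set.
using UniqSet = std::unordered_set<std::uint64_t>;

// Merge by inserting one element at a time. The source can stay const,
// which matters when the aggregate state being merged is read-only.
void merge_by_insert(UniqSet& set, const UniqSet& rhs_set) {
    // Reserve room for the worst case (no overlap) to limit rehashing.
    set.reserve(set.size() + rhs_set.size());
    for (const auto& elem : rhs_set) {
        set.insert(elem);
    }
}

// Merge with the container's own merge(). For std::unordered_set (C++17)
// the source must be non-const: elements missing from `set` are moved out
// of `rhs_set`, and duplicates are left behind in it.
void merge_by_splice(UniqSet& set, UniqSet& rhs_set) {
    set.merge(rhs_set);
}
```

With the standard container, both functions leave `set` holding the union; they differ in whether the source is modified, which is the constraint the author points out for a const rhs.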

auto& set = this->data(place).set;
write_var_uint(set.size(), buf);
for (const auto& elem : set) {
write_pod_binary(elem, buf);
Contributor

How about adding a TODO here? Once phmap is included in BE's code, we can serialize the phmap set by copying it directly.
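
For context, the element-by-element format in the hunk above (a size prefix followed by each element's raw bytes) can be sketched roughly as follows. This is an assumption-laden illustration, not Doris code: `Buffer`, `put_var_uint`, `get_var_uint`, `serialize_set` and `deserialize_set` are hypothetical helpers, the varint encoding is a generic LEB128-style one, and `std::unordered_set` again stands in for the phmap set.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <unordered_set>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Hypothetical helper: append an unsigned integer as a LEB128-style varint.
void put_var_uint(std::uint64_t value, Buffer& buf) {
    while (value >= 0x80) {
        buf.push_back(static_cast<unsigned char>(value | 0x80));
        value >>= 7;
    }
    buf.push_back(static_cast<unsigned char>(value));
}

// Hypothetical helper: read a varint written by put_var_uint.
std::uint64_t get_var_uint(const Buffer& buf, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    while (true) {
        const unsigned char byte = buf.at(pos++);  // at() throws past the end
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}

// Write the element count, then each element's raw bytes.
void serialize_set(const std::unordered_set<std::uint64_t>& set, Buffer& buf) {
    put_var_uint(set.size(), buf);
    for (const auto& elem : set) {
        const auto* bytes = reinterpret_cast<const unsigned char*>(&elem);
        buf.insert(buf.end(), bytes, bytes + sizeof(elem));
    }
}

// Rebuild the set by re-inserting every element (assumes a trusted buffer).
std::unordered_set<std::uint64_t> deserialize_set(const Buffer& buf) {
    std::size_t pos = 0;
    const std::uint64_t count = get_var_uint(buf, pos);
    std::unordered_set<std::uint64_t> set;
    set.reserve(count);
    for (std::uint64_t i = 0; i < count; ++i) {
        if (pos + sizeof(std::uint64_t) > buf.size()) {
            throw std::out_of_range("truncated set payload");
        }
        std::uint64_t elem = 0;
        std::memcpy(&elem, buf.data() + pos, sizeof(elem));
        pos += sizeof(elem);
        set.insert(elem);
    }
    return set;
}
```

The cost this sketch makes visible is that reading the state back re-hashes and re-inserts every element. Copying the table's storage in bulk, as the reviewer suggests, would presumably avoid that work.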

Contributor

@wangbo wangbo left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 29, 2022
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Labels

approved Indicates a PR has been approved by one committer. area/vectorization reviewed

5 participants