Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Refactor build_hash_map and build_set to remove duplicated codes #13968

Merged
merged 8 commits into from
Dec 6, 2022

Conversation

LiShuMing
Copy link
Contributor

@LiShuMing LiShuMing commented Nov 23, 2022

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues of this PR fixes :

Fixes #13880

  • Use template to refactor build_hash_map and build_set to remove duplicated codes;
  • Add build_hash_map_allocate_and_with_selection method to be used in StreamMV later.
// When meets not found group keys, mark the first pos into `_streaming_selection` and insert into the hashmap
// so the following group keys(same as the first not found group keys) are not marked as non-founded.
// This can be used for stream mv so no need to find multi times for the same non-found group keys.

Problem Summary(Required) :

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto backported to target branch
    • 2.5
    • 2.4
    • 2.3
    • 2.2

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@stdpain
Copy link
Contributor

stdpain commented Nov 24, 2022

            if constexpr (allocate_and_compute_state && compute_not_founds) {
                auto iter = this->hash_map.lazy_emplace_with_hash(key, hash_values[i], [&](const auto& ctor) {
                    (*not_founds)[i] = 1;
                    AggDataPtr pv = allocate_func(key);
                    ctor(key, pv);
                });
                (*agg_states)[i] = iter->second;
            } else if constexpr (compute_not_founds) {
                DCHECK(not_founds);
                if (auto iter = this->hash_map.find(key); iter != this->hash_map.end()) {
                    (*agg_states)[i] = iter->second;
                } else {
                    (*not_founds)[i] = 1;
                }

compute_not_founds and allocate_and_compute_state && compute_not_founds should be simpty to

if constexpr (allocate_and_compute_state) {
    auto iter = this->hash_map.lazy_emplace_with_hash(key, hash_values[i], [&](const auto& ctor) {
       if constexpr (xxx) {
           (*not_founds)[i] = true;
       }
        AggDataPtr pv = allocate_func(key);
        ctor(key, pv);
     });

    (*agg_states)[i] = iter->second;
}

template <typename Func>
void compute_agg_prefetch(ColumnType* column, Buffer<AggDataPtr>* agg_states, Func&& allocate_func) {
template <typename Func, bool allocate_and_compute_state, bool compute_not_founds>
void compute_agg_prefetch(ColumnType* column, Buffer<AggDataPtr>* agg_states, Func&& allocate_func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is single-purpose in the past, why now need to be multiple-purpose?
if we add another two single-purpose functions to support 1.allocate_and_compute_state && compute_not_founds and 2. compute_not_founds, is it be better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is single-purpose in the past, why now need to be multiple-purpose?

  • there are many duplicated codes for computing allocate_and_compute_state and compute_not_founds seperately;
  • add this template parameter doesn't add any affects with old implements;

if we add another two single-purpose functions to support 1.allocate_and_compute_state && compute_not_founds and 2. compute_not_founds, is it be better?

  • as above, we can use one function to express the purpose. use one function is better?

template <typename Func>
void compute_agg_noprefetch(ColumnType* column, Buffer<AggDataPtr>* agg_states, Func&& allocate_func) {
template <typename Func, bool allocate_and_compute_state, bool compute_not_founds>
void compute_agg_noprefetch(ColumnType* column, Buffer<AggDataPtr>* agg_states, Func&& allocate_func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, explicitly, three different purposes split into single function would make the code more clear and succinct.

@LiShuMing
Copy link
Contributor Author

            if constexpr (allocate_and_compute_state && compute_not_founds) {
                auto iter = this->hash_map.lazy_emplace_with_hash(key, hash_values[i], [&](const auto& ctor) {
                    (*not_founds)[i] = 1;
                    AggDataPtr pv = allocate_func(key);
                    ctor(key, pv);
                });
                (*agg_states)[i] = iter->second;
            } else if constexpr (compute_not_founds) {
                DCHECK(not_founds);
                if (auto iter = this->hash_map.find(key); iter != this->hash_map.end()) {
                    (*agg_states)[i] = iter->second;
                } else {
                    (*not_founds)[i] = 1;
                }

compute_not_founds and allocate_and_compute_state && compute_not_founds should be simpty to

if constexpr (allocate_and_compute_state) {
    auto iter = this->hash_map.lazy_emplace_with_hash(key, hash_values[i], [&](const auto& ctor) {
       if constexpr (xxx) {
           (*not_founds)[i] = true;
       }
        AggDataPtr pv = allocate_func(key);
        ctor(key, pv);
     });

    (*agg_states)[i] = iter->second;
}

Good~ fixed.

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

stdpain
stdpain previously approved these changes Nov 28, 2022
satanson
satanson previously approved these changes Nov 30, 2022
@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Nov 30, 2022
@wanpengfei-git
Copy link
Collaborator

run starrocks_admit_test

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@@ -91,6 +91,28 @@ static_assert(sizeof(AggDataPtr) == sizeof(size_t));
this->hash_map.prefetch_hash(hash_values[__prefetch_index++]); \
}

#define AGG_HASH_MAP_COMMON_METHODS() \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unnecessary to add this macros.
we can add this to AggHashMapWithKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good advise~

@github-actions github-actions bot removed the be-build label Dec 1, 2022
@LiShuMing
Copy link
Contributor Author

run starrocks_be_unittest

@github-actions
Copy link

github-actions bot commented Dec 1, 2022

clang-tidy review says "All clean, LGTM! 👍"

satanson
satanson previously approved these changes Dec 1, 2022
stdpain
stdpain previously approved these changes Dec 1, 2022
@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Dec 1, 2022
@wanpengfei-git
Copy link
Collaborator

run starrocks_admit_test

@github-actions
Copy link

github-actions bot commented Dec 2, 2022

clang-tidy review says "All clean, LGTM! 👍"

@LiShuMing
Copy link
Contributor Author

run starrocks_admit_test

@github-actions
Copy link

github-actions bot commented Dec 4, 2022

clang-tidy review says "All clean, LGTM! 👍"

@LiShuMing
Copy link
Contributor Author

run starrocks_admit_test

@github-actions github-actions bot removed the be-build label Dec 5, 2022
@LiShuMing
Copy link
Contributor Author

run starrocks_admit_test

@github-actions
Copy link

github-actions bot commented Dec 5, 2022

clang-tidy review says "All clean, LGTM! 👍"

@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Dec 5, 2022
@wanpengfei-git
Copy link
Collaborator

run starrocks_admit_test

@satanson satanson merged commit 32b4142 into StarRocks:main Dec 6, 2022
@github-actions github-actions bot removed Approved Ready to merge be-build labels Dec 6, 2022
@github-actions
Copy link

github-actions bot commented Dec 6, 2022

clang-tidy review says "All clean, LGTM! 👍"

@sonarcloud
Copy link

sonarcloud bot commented Dec 6, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
10.7% 10.7% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Incremental MV Operators
5 participants