[Enhancement] optimize harmonic mean evaluation in hll::estimate_cardinality #16351

satanson · 2023-01-09T01:22:44Z

What type of PR is this：

Which issues of this PR fixes ：

Fixes #

Problem Summary(Required) ：

Checklist:

I have added test cases for my bug fix or my new feature
This pr will affect users' behaviors
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

Harmonic mean evaluation in estimate_cardinality is quite slow. The original code is shown below, followed by three alternative implementations that were tried to speed up this snippet.

Choice 1, the original method:

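```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>

// Returns {1 / sum(2^-data[i]), number of registers equal to 0}.
std::pair<float, int> calc_harmonic_mean1(int8_t* data, size_t n) {
    float harmonic_mean = 0;
    int num_zeros = 0;

    for (size_t i = 0; i < n; ++i) {
        harmonic_mean += powf(2.0f, -data[i]);

        if (data[i] == 0) {
            ++num_zeros;
        }
    }
    harmonic_mean = 1.0f / harmonic_mean;
    return std::make_pair(harmonic_mean, num_zeros);
}
```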
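For context, the pair returned above is what a HyperLogLog estimator consumes. Despite its name, `harmonic_mean` holds 1 / Σ 2^-M[j]; the harmonic mean of the 2^M[j] values would be m times that. The sketch below shows how such a pair could feed a cardinality estimate, assuming the textbook formulas from Flajolet et al.: the bias constant α_m = 0.7213 / (1 + 1.079/m) for m ≥ 128, and linear counting when the raw estimate is at most 2.5m and some registers are still zero. `estimate_from_harmonic_mean` is a hypothetical name, and StarRocks' `estimate_cardinality` may apply different constants or corrections.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical helper (not StarRocks code): textbook HyperLogLog estimate from
// hm.first = 1 / sum(2^-M[j]) and hm.second = number of registers equal to 0.
// Assumes m >= 128 so that the closed-form alpha below applies.
double estimate_from_harmonic_mean(std::pair<float, int> hm, size_t m) {
    const double md = static_cast<double>(m);
    const double alpha = 0.7213 / (1.0 + 1.079 / md);  // bias-correction constant
    // Raw estimate alpha * m^2 / sum(2^-M[j]); hm.first is already the reciprocal of the sum.
    double estimate = alpha * md * md * hm.first;
    // Small-range correction: fall back to linear counting while zero registers remain.
    if (estimate <= 2.5 * md && hm.second > 0) {
        estimate = md * std::log(md / hm.second);
    }
    return estimate;
}
```

For example, with 2^14 registers that are all still zero the pair is {1.0f / 16384, 16384}, and the sketch returns 0 through the linear-counting branch.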

Choice 2, a SIMD (AVX2) method:
Note: exp256_ps comes from https://github.com/reyoung/avx_mathfun

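```cpp
#include <cmath>
#include <immintrin.h>

// exp256_ps (from the avx_mathfun repository linked above) must be declared before this function.
std::pair<float, int> calc_harmonic_mean2(int8_t* data, size_t n) {
    float harmonic_mean = 0;
    int num_zeros = 0;
    // Declared outside the #if so the scalar loop below also compiles without AVX2.
    auto* p = data;
    const auto end = data + n;
#if defined(__AVX2__)
    constexpr auto BLOCK_SIZE = sizeof(__m256i);  // 32 registers per iteration
    const auto end0 = data + (n & ~(BLOCK_SIZE - 1));
    const auto ln2 = _mm256_set1_ps(0.69314718055995f);
    const auto zerof32 = _mm256_setzero_ps();
    const auto zeroi8 = _mm256_setzero_si256();
    auto sumf32 = _mm256_setzero_ps();
    for (; p < end0; p += BLOCK_SIZE) {
        // Unaligned load: std::vector<int8_t> storage is not guaranteed to be 32-byte aligned.
        auto d = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
        // Count zero registers: compare bytes with 0, gather one bit per byte, popcount.
        num_zeros += _mm_popcnt_u32(static_cast<unsigned>(_mm256_movemask_epi8(_mm256_cmpeq_epi8(d, zeroi8))));

        auto pp = p;
        for (int i = 0; i < 4; ++i) {
            // Widen 8 registers to float (lane order is irrelevant for a sum),
            // then accumulate 2^-x computed as exp(-x * ln2).
            auto x = _mm256_set_ps(pp[0], pp[1], pp[2], pp[3], pp[4], pp[5], pp[6], pp[7]);
            sumf32 = _mm256_add_ps(exp256_ps(_mm256_mul_ps(_mm256_sub_ps(zerof32, x), ln2)), sumf32);
            pp += 8;
        }
    }
    // Horizontal sum of the 8 float lanes.
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, sumf32);
    for (float v : lanes) {
        harmonic_mean += v;
    }
#endif
    // Scalar tail (or the whole input when AVX2 is unavailable): 2^-x, as in the vector path.
    for (; p < end; ++p) {
        harmonic_mean += powf(2.0f, -p[0]);
        if (p[0] == 0) {
            ++num_zeros;
        }
    }

    harmonic_mean = 1.0f / harmonic_mean;
    return std::make_pair(harmonic_mean, num_zeros);
}
```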
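The vectorized path relies on the identity 2^-x = e^(-x·ln 2), so it is an approximation rather than an exact computation. A scalar sketch of the same per-lane arithmetic can measure the drift against the exact value from `std::ldexp`. Here `std::exp` stands in for `exp256_ps` (an assumption: the two need not round identically), and `max_rel_error_exp_form` is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: scalar model of one lane of calc_harmonic_mean2, with
// std::exp standing in for exp256_ps. Returns the largest relative error of
// exp(-x * ln2) versus the exact 2^-x over x = 0..max_x.
double max_rel_error_exp_form(int max_x) {
    const float ln2 = 0.69314718055995f;  // same constant as the AVX2 code
    double worst = 0.0;
    for (int x = 0; x <= max_x; ++x) {
        const float via_exp = std::exp((0.0f - static_cast<float>(x)) * ln2);
        const float exact = std::ldexp(1.0f, -x);  // 2^-x, exactly representable in float
        const double rel = std::fabs(static_cast<double>(via_exp) - exact) / exact;
        if (rel > worst) {
            worst = rel;
        }
    }
    return worst;
}
```

In this scalar form the error stays well below 10^-5 over x in 0..64, which is small next to HyperLogLog's own statistical error, but it means choice 2 is not bit-identical to the other choices.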

Choice 3: use 1.0f / (1L << x) instead of powf(2, -x), because x only ranges over 0..40:

```cpp
// Every term is a power of two, so the shift-and-divide is exact.
// 1L is 64 bits on LP64 targets, which covers x in 0..40.
std::pair<float, int> calc_harmonic_mean3(int8_t* data, size_t n) {
    float harmonic_mean = 0;
    int num_zeros = 0;

    for (size_t i = 0; i < n; ++i) {
        harmonic_mean += 1.0f / static_cast<float>(1L << data[i]);
        if (data[i] == 0) {
            ++num_zeros;
        }
    }
    harmonic_mean = 1.0f / harmonic_mean;
    return std::make_pair(harmonic_mean, num_zeros);
}
```
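Choice 3 gives up no accuracy: every term is a power of two, so both the integer-to-float conversion and the division are exact. The sketch below checks this against `std::ldexp(1.0f, -x)`, which computes 2^-x exactly. `shift_form_is_exact` is a hypothetical name, and it uses `1ULL` rather than `1L` so the shift stays well defined up to x = 63 even where `long` is 32 bits.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: true if 1.0f / (1ULL << x) equals 2^-x exactly for
// every x in [0, max_x]. max_x must be <= 63 so the shift is well defined.
bool shift_form_is_exact(int max_x) {
    for (int x = 0; x <= max_x; ++x) {
        const float via_shift = 1.0f / static_cast<float>(1ULL << x);
        if (via_shift != std::ldexp(1.0f, -x)) {
            return false;
        }
    }
    return true;
}
```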

Choice 4: similar to choice 3, but look up pre-computed values of 2^-x (that is, 1.0f / (1L << x)) in a table covering x in 0..64:

```cpp
#include <array>

// tables[x] == 2^-x for x in 0..64, built by repeated halving (exact for powers of two).
// Spelling the entries as 1.0f / (1L << x) breaks at the top end: 1L << 63 is out of
// range for a signed 64-bit long (at best it yields a negative value), and 1L << 64
// shifts by the full width of the type, which is undefined behavior.
static const std::array<float, 65> tables = [] {
    std::array<float, 65> t{};
    float v = 1.0f;
    for (auto& e : t) {
        e = v;
        v *= 0.5f;
    }
    return t;
}();

// Assumes every register value lies in 0..64 (a negative int8_t would index out of bounds).
std::pair<float, int> calc_harmonic_mean4(int8_t* data, size_t n) {
    float harmonic_mean = 0;
    int num_zeros = 0;

    for (size_t i = 0; i < n; ++i) {
        harmonic_mean += tables[data[i]];
        if (data[i] == 0) {
            ++num_zeros;
        }
    }
    harmonic_mean = 1.0f / harmonic_mean;
    return std::make_pair(harmonic_mean, num_zeros);
}
```
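Because the table entries are exact powers of two and the summation order is unchanged, a table lookup in the style of choice 4 should produce bit-identical results to an exact scalar reference. A self-contained sketch of that check follows, assuming register values in 0..64. The namespace `hm_check`, its functions, and the use of `std::ldexp` as an exact stand-in for `powf(2.0f, -x)` are illustrative assumptions.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

namespace hm_check {  // hypothetical, for illustration only

// Exact reference in the style of choice 1: 2^-r via std::ldexp.
std::pair<float, int> reference(const std::vector<int8_t>& regs) {
    float sum = 0;
    int zeros = 0;
    for (int8_t r : regs) {
        sum += std::ldexp(1.0f, -r);
        zeros += (r == 0) ? 1 : 0;
    }
    return {1.0f / sum, zeros};
}

// Table lookup in the style of choice 4.
std::pair<float, int> lookup(const std::vector<int8_t>& regs) {
    static const std::array<float, 65> table = [] {
        std::array<float, 65> t{};
        float v = 1.0f;
        for (auto& e : t) {
            e = v;
            v *= 0.5f;
        }
        return t;
    }();
    float sum = 0;
    int zeros = 0;
    for (int8_t r : regs) {
        sum += table[static_cast<size_t>(r)];
        zeros += (r == 0) ? 1 : 0;
    }
    return {1.0f / sum, zeros};
}

// Fills n registers with random values in 0..64 and compares both kernels bit for bit.
bool agree(size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 64);
    std::vector<int8_t> regs(n);
    for (auto& r : regs) {
        r = static_cast<int8_t>(dist(gen));
    }
    const auto a = reference(regs);
    const auto b = lookup(regs);
    return a.first == b.first && a.second == b.second;
}

}  // namespace hm_check
```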

Micro-benchmarks were run on these functions, and they show that choice 4 is the best: it outperforms the original implementation by 7.74X.

Note: in BM_calc_harmonic_mean{n}_{m}, n is the choice number, and m means the function is applied to a std::vector<int8_t> of length 2^m.
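The benchmark source is not included in the PR. For readers who want to reproduce the comparison, a minimal `std::chrono` sketch that follows the naming convention above (choice n applied to a vector of length 2^m) might look like this. `make_registers` and `ns_per_call` are hypothetical names, the 0..40 register range is taken from the description of choice 3, and a dedicated benchmarking framework will give more trustworthy numbers than a hand-rolled loop.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical input generator: 2^m registers with values in 0..40.
std::vector<int8_t> make_registers(int m, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 40);
    std::vector<int8_t> regs(size_t{1} << m);
    for (auto& r : regs) {
        r = static_cast<int8_t>(dist(gen));
    }
    return regs;
}

// Average wall-clock nanoseconds per call of fn(data, n), where fn has the
// calc_harmonic_mean{n} signature: std::pair<float, int>(int8_t*, size_t).
template <typename Fn>
double ns_per_call(Fn fn, std::vector<int8_t>& regs, int iters) {
    volatile float sink = 0;  // consumes results so the calls are not optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        const std::pair<float, int> r = fn(regs.data(), regs.size());
        sink = sink + r.first;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}
```

For instance, with `auto regs = make_registers(14);`, one would compare `ns_per_call(calc_harmonic_mean1, regs, 1000)` against `ns_per_call(calc_harmonic_mean4, regs, 1000)`.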

silverbullet233 · 2023-01-09T02:23:32Z

LGTM

be/src/types/hll.cpp

github-actions · 2023-01-09T02:58:51Z

clang-tidy review says "All clean, LGTM! 👍"

sonarcloud · 2023-01-09T03:00:29Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

wanpengfei-git · 2023-01-09T08:57:43Z

run starrocks_admit_test

mofeiatwork · 2023-01-09T18:16:50Z

run starrocks_admit_test

satanson force-pushed the optimize_harmoic_mean_evaluation branch from 2fd5a59 to c8dde24 January 9, 2023 01:27

[Enhancement] optimize harmonic mean evaluation in hll::estimate_cardinality (commit ad13ebc)

satanson force-pushed the optimize_harmoic_mean_evaluation branch from c8dde24 to ad13ebc January 9, 2023 02:22

silverbullet233 approved these changes Jan 9, 2023

ZiheLiu reviewed Jan 9, 2023

be/src/types/hll.cpp (resolved)

wanpengfei-git added the be-build label Jan 9, 2023

kangkaisen approved these changes Jan 9, 2023

wanpengfei-git added the Approved Ready to merge label Jan 9, 2023

satanson enabled auto-merge (squash) January 9, 2023 15:58

mofeiatwork approved these changes Jan 9, 2023

imay approved these changes Jan 9, 2023

satanson merged commit 16f3429 into StarRocks:main Jan 9, 2023

github-actions bot removed Approved Ready to merge be-build labels Jan 9, 2023
