[Enhancement] Improve the randomness of the random function #35199

Merged
merged 3 commits into from
Nov 21, 2023

Contributor

@zenoyang zenoyang commented Nov 16, 2023

Why I'm doing:
StarRocks (SR) originally implemented its random function on top of rand_r. The resulting randomness was poor and often fell short of expectations.

before:
image

after:
image

What I'm doing:

Fixes #34026
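
The description leaves the approach itself unstated. Judging from the diff quoted later in this thread, the change swaps rand_r for the standard <random> engine std::mt19937_64 combined with a std::uniform_real_distribution<double> over [0.0, 1.0). A minimal sketch of that pattern, assuming a hypothetical helper fill_uniform and a plain std::vector standing in for the StarRocks column builder (neither name comes from the PR):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not the StarRocks implementation: returns num_rows
// doubles drawn uniformly from [0.0, 1.0) using a 64-bit Mersenne Twister.
std::vector<double> fill_uniform(std::size_t num_rows, std::mt19937_64& generator) {
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    std::vector<double> result;
    result.reserve(num_rows);
    for (std::size_t i = 0; i < num_rows; ++i) {
        result.push_back(distribution(generator));
    }
    return result;
}

// Arithmetic mean, used only to sanity-check the generated values.
double mean_of(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}
```

A caller would construct the engine once, for example `std::mt19937_64 gen{std::random_device{}()};`, and reuse it across calls; for a large sample, mean_of should land close to 0.5.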

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@CLAassistant

CLAassistant commented Nov 16, 2023

CLA assistant check
All committers have signed the CLA.

context->set_function_state(FunctionContext::THREAD_LOCAL, state);
for (int i = 0; i < num_rows; ++i) {
result.append(distribution(*generator));
}

return result.build(false);
}

The most risky bug in this code is:
The code uses reinterpret_cast<int64_t> on the pointer to the std::mt19937_64 instance, which can cause undefined behavior when the value is cast back and used.

You can modify the code like this:

-    void* state = context->get_function_state(FunctionContext::THREAD_LOCAL);
+    std::mt19937_64* generator = reinterpret_cast<std::mt19937_64*>(context->get_function_state(FunctionContext::THREAD_LOCAL));

...

-    int64_t res = generate_randoms(&result, num_rows, reinterpret_cast<int64_t>(state));
-    state = reinterpret_cast<void*>(res); // NOLINT
-    context->set_function_state(FunctionContext::THREAD_LOCAL, state);
// The above lines are actually removed in the given code review as they are outdated logic not suitable for use with std::mt19937_64.

The rest of the changes provided here replace the deleted generate_randoms function with standard <random> library functions. These changes also introduce proper handling of memory allocation and deallocation for the std::mt19937_64 random number generator.

@zenoyang zenoyang marked this pull request as draft November 16, 2023 12:43
@zenoyang zenoyang force-pushed the 231115_opt_rand_randomness branch 3 times, most recently from 56d5589 to 754dcff Compare November 16, 2023 13:28
@imay
Contributor

imay commented Nov 16, 2023

@zenoyang
The "after" run is slower than the current one; I think you pasted the wrong picture.

@stdpain stdpain self-assigned this Nov 20, 2023
return Status::OK();
}

StatusOr<ColumnPtr> MathFunctions::rand(FunctionContext* context, const Columns& columns) {
int32_t num_rows = ColumnHelper::get_const_value<TYPE_INT>(columns[columns.size() - 1]);
void* state = context->get_function_state(FunctionContext::THREAD_LOCAL);
std::mt19937_64* generator =
Contributor


Declare it as a static thread_local here.

Contributor Author


Thanks, I've updated it based on your suggestion, and the test performance now meets expectations. Please review it again. @stdpain
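
The reviewer's suggestion amounts to giving each thread its own engine instead of passing generator state through the function context. A rough sketch of that idea, assuming hypothetical names next_uniform and draw_in_parallel (neither appears in the PR) and keeping the distribution thread-local as well, which is a choice of this sketch rather than something the PR confirms:

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Hypothetical free function, not the StarRocks code: each thread lazily
// constructs its own engine on first call, so no locking is needed.
double next_uniform() {
    static thread_local std::mt19937_64 generator{std::random_device{}()};
    static thread_local std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(generator);
}

// Runs num_threads workers; each draws per_thread values into its own slot,
// so the workers never write to the same vector.
std::vector<std::vector<double>> draw_in_parallel(int num_threads, int per_thread) {
    assert(num_threads > 0 && per_thread >= 0);
    std::vector<std::vector<double>> out(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&out, t, per_thread] {
            out[t].reserve(per_thread);
            for (int i = 0; i < per_thread; ++i) {
                out[t].push_back(next_uniform());
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return out;
}
```

Because the engine is thread-local, its state persists across calls on the same thread without any explicit setup or teardown, and threads never contend for it.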

@zenoyang zenoyang marked this pull request as ready for review November 20, 2023 10:22
@zenoyang
Contributor Author

@zenoyang The "after" run is slower than the current one; I think you pasted the wrong picture.

No, that picture shows an intermediate state of development. The performance was not very good at first.

stdpain
stdpain previously approved these changes Nov 21, 2023
@stdpain
Contributor

stdpain commented Nov 21, 2023

The CI clang-format check failed on:

--- /home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp
+++ /home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp (after clang format)
/home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp had clang-format style issues
@@ -34,7 +34,7 @@
 static const double MAX_EXP_PARAMETER = std::log(std::numeric_limits<double>::max());
 
 static std::uniform_real_distribution<double> distribution(0.0, 1.0);
-static thread_local std::mt19937_64 generator { std::random_device{}() };
+static thread_local std::mt19937_64 generator{std::random_device{}()};
 
 // ==== basic check rules =========
 DEFINE_UNARY_FN_WITH_IMPL(NegativeCheck, value) {

@stdpain
Contributor

stdpain commented Nov 21, 2023

The CLA check also needs to pass.

@stdpain
Contributor

stdpain commented Nov 21, 2023

The sign-off check also needs to pass. Try running:

git commit -s -m "[Enhancement] Improv...."

Signed-off-by: zenoyang <cookie.yz@qq.com>
Signed-off-by: zenoyang <cookie.yz@qq.com>
@zenoyang
Contributor Author

zenoyang commented Nov 21, 2023

The sign-off check also needs to pass. Try running:

git commit -s -m "[Enhancement] Improv...."

done

Signed-off-by: zenoyang <cookie.yz@qq.com>
@github-actions github-actions bot removed the 3.2 label Nov 21, 2023
@wanpengfei-git
Collaborator

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Nov 21, 2023
@wanpengfei-git
Collaborator

@Mergifyio backport branch-3.0

@github-actions github-actions bot removed the 3.0 label Nov 21, 2023
@wanpengfei-git
Collaborator

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Nov 21, 2023
Contributor

mergify bot commented Nov 21, 2023

backport branch-3.2

✅ Backports have been created

Contributor

mergify bot commented Nov 21, 2023

backport branch-3.1

✅ Backports have been created

Contributor

mergify bot commented Nov 21, 2023

backport branch-3.0

✅ Backports have been created

Contributor

mergify bot commented Nov 21, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Nov 21, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)
mergify bot pushed a commit that referenced this pull request Nov 21, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)

# Conflicts:
#	be/src/exprs/math_functions.cpp
mergify bot pushed a commit that referenced this pull request Nov 21, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)

# Conflicts:
#	be/src/exprs/math_functions.cpp
mergify bot pushed a commit that referenced this pull request Nov 21, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)

# Conflicts:
#	be/src/exprs/vectorized/math_functions.cpp
wanpengfei-git pushed a commit that referenced this pull request Nov 24, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)
chaoyli pushed a commit to chaoyli/starrocks that referenced this pull request May 28, 2024
…s#35199)

Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)
chaoyli pushed a commit to chaoyli/starrocks that referenced this pull request May 28, 2024
…s#35199)

Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit 893a701)
Development

Successfully merging this pull request may close these issues.

Some issues with StarRocks being compatible with Trino
7 participants