-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Improve the randomness of the random function #35199
Conversation
context->set_function_state(FunctionContext::THREAD_LOCAL, state); | ||
for (int i = 0; i < num_rows; ++i) { | ||
result.append(distribution(*generator)); | ||
} | ||
|
||
return result.build(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Using a reinterpret_cast<int64_t>
for the pointer to std::mt19937_64
instance, which can cause undefined behavior when it tries to cast back and use it.
You can modify the code like this:
- void* state = context->get_function_state(FunctionContext::THREAD_LOCAL);
+ std::mt19937_64* generator = reinterpret_cast<std::mt19937_64*>(context->get_function_state(FunctionContext::THREAD_LOCAL));
...
- int64_t res = generate_randoms(&result, num_rows, reinterpret_cast<int64_t>(state));
- state = reinterpret_cast<void*>(res); // NOLINT
- context->set_function_state(FunctionContext::THREAD_LOCAL, state);
// The above lines are actually removed in the given code review as they are outdated logic not suitable for use with std::mt19937_64.
The rest of the changes provided here replace the deleted generate_randoms
function with standard <random>
library functions. These changes also introduce proper handling of memory allocation and deallocation for the std::mt19937_64
random number generator.
56d5589
to
754dcff
Compare
@zenoyang |
be/src/exprs/math_functions.cpp
Outdated
return Status::OK(); | ||
} | ||
|
||
StatusOr<ColumnPtr> MathFunctions::rand(FunctionContext* context, const Columns& columns) { | ||
int32_t num_rows = ColumnHelper::get_const_value<TYPE_INT>(columns[columns.size() - 1]); | ||
void* state = context->get_function_state(FunctionContext::THREAD_LOCAL); | ||
std::mt19937_64* generator = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declare static thread local here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I modified it based on your suggestions, and the test performance is in line with expectations. Please help to review it again. @stdpain
No, that is the result of the intermediate state of development. The performance was not very good at the beginning. |
CI clang-format failed in --- /home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp
+++ /home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp (after clang format)
/home/runner/_work/starrocks/starrocks/be/src/exprs/math_functions.cpp had clang-format style issues
@@ -34,7 +34,7 @@
static const double MAX_EXP_PARAMETER = std::log(std::numeric_limits<double>::max());
static std::uniform_real_distribution<double> distribution(0.0, 1.0);
-static thread_local std::mt1[9](https://github.com/StarRocks/starrocks/actions/runs/6928566833/job/18844627873?pr=35199#step:6:10)937_64 generator { std::random_device{}() };
+static thread_local std::mt[19](https://github.com/StarRocks/starrocks/actions/runs/6928566833/job/18844627873?pr=35199#step:6:20)937_64 generator{std::random_device{}()};
// ==== basic check rules =========
DEFINE_UNARY_FN_WITH_IMPL(NegativeCheck, value) { |
cla also need check |
sign off also need check. try to execute
|
Signed-off-by: zenoyang <cookie.yz@qq.com>
Signed-off-by: zenoyang <cookie.yz@qq.com>
7f387f6
to
e41b43c
Compare
done |
Signed-off-by: zenoyang <cookie.yz@qq.com>
@Mergifyio backport branch-3.1 |
@Mergifyio backport branch-3.0 |
@Mergifyio backport branch-2.5 |
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
Signed-off-by: zenoyang <cookie.yz@qq.com> (cherry picked from commit 893a701)
Signed-off-by: zenoyang <cookie.yz@qq.com> (cherry picked from commit 893a701) # Conflicts: # be/src/exprs/math_functions.cpp
Signed-off-by: zenoyang <cookie.yz@qq.com> (cherry picked from commit 893a701) # Conflicts: # be/src/exprs/math_functions.cpp
Signed-off-by: zenoyang <cookie.yz@qq.com> (cherry picked from commit 893a701) # Conflicts: # be/src/exprs/vectorized/math_functions.cpp
Signed-off-by: zenoyang <cookie.yz@qq.com> (cherry picked from commit 893a701)
Why I'm doing:
The original
random
function of SR was implemented through therand_r
function. The randomness was very poor and usually did not meet expectations.before:
after:
What I'm doing:
Fixes #34026
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: