[BUG] 330cdh failed test_hash_reduction_sum_full_decimal on CI #9779
Comments
This also failed in multiple cases across different CDH nightly runs, all with mismatched CPU and GPU results:
FAILED ../../src/main/python/string_test.py::test_initcap[DATAGEN_SEED=1700413755, INJECT_OOM, INCOMPAT]
FAILED ../../src/main/python/schema_evolution_test.py::test_column_add_after_partition[orc][DATAGEN_SEED=1700413303, INJECT_OOM, IGNORE_ORDER({'local': True})]
@pxLi please file a separate issue for each failure. The two you listed here are not related to the SUM test failure that this issue is about.
Datagen seed was 1700246532:
While trying to reproduce, I ran into a slightly different issue: instead of the CPU overflowing and the GPU not, the opposite occurred. Here's what I ran to reproduce:
Running with |
The test fails because the CPU and the GPU check for overflow differently, and the CPU itself is not consistent in how it checks (for example, overflow checking with codegen enabled differs from when it is disabled). In the single-task case, the GPU overflows only in the tests that use a batch size of 250. That essentially emulates running with more tasks, since each batch acts like a separate partial aggregation and the overflow check is done per batch (just as the CPU does it per partition). In the second case, with two tasks, the CPU overflows because it checks for overflow after the partial aggregation, whereas on the GPU the values fit in the 128-bit intermediate, so we let them pass, which allows the final aggregation to produce a value that can be stored in Decimal(38).
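To make the mismatch concrete, here is a minimal plain-Python sketch of how the point at which overflow is checked can change the outcome. It is not the plugin's or Spark's actual implementation: `aggregate`, `partial_limit`, and the sample values are hypothetical names and data chosen for illustration, and the chunks stand in for partitions or batches.

```python
# Illustrative model only: integers stand in for unscaled decimal values.
DECIMAL38_MAX = 10**38 - 1   # largest unscaled magnitude that fits Decimal(38, s)
INT128_MAX = 2**127 - 1      # largest signed 128-bit value (about 1.7e38)


def partial_sums(values, chunk_size):
    """Sum each chunk separately, like one partial aggregation per partition or batch."""
    return [sum(values[i:i + chunk_size]) for i in range(0, len(values), chunk_size)]


def aggregate(values, chunk_size, partial_limit):
    """Return the total, or None if any overflow check fails.

    partial_limit is the bound applied to each partial sum (hypothetical knob);
    the final result is always checked against the Decimal(38) bound.
    """
    partials = partial_sums(values, chunk_size)
    if any(abs(p) > partial_limit for p in partials):
        return None  # overflow reported after a partial aggregation
    total = sum(partials)
    return total if abs(total) <= DECIMAL38_MAX else None


big = 6 * 10**37
values = [big, big, -big, -big]  # first chunk sums to 1.2e38, the total is 0

# Checking each partial against the Decimal(38) bound reports overflow.
strict = aggregate(values, chunk_size=2, partial_limit=DECIMAL38_MAX)

# Letting partials use the full 128-bit range allows the final sum to succeed.
wide = aggregate(values, chunk_size=2, partial_limit=INT128_MAX)

# A single chunk never materializes the oversized partial, so it also succeeds.
single = aggregate(values, chunk_size=4, partial_limit=DECIMAL38_MAX)

print(strict, wide, single)
```

The same input therefore yields either an overflow or a valid result depending only on how the data is chunked and which bound is applied to the intermediate sums, which is why results can differ with the number of tasks or the batch size.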
The remaining scope is to fix the datagen seed properly so that this failure is not encountered going forward.
Describe the bug
Steps/Code to reproduce bug
TBD
Expected behavior
The test should pass.
Environment details (please complete the following information)
Additional context