ARROW-12334: [Rust] [Ballista] Aggregate queries producing incorrect results #10083
Codecov Report
@@ Coverage Diff @@
## master #10083 +/- ##
=========================================
Coverage ? 78.90%
=========================================
Files ? 286
Lines ? 64721
Branches ? 0
=========================================
Hits ? 51070
Misses ? 13651
Partials ? 0
Didn't mean to close this, sorry.
@edrevo That assumption is correct. Thanks for tracking this down!
The test failures are related to Rust nightly and the flatbuffers version. I merged this into master locally and confirmed that the tests pass, so I am going to go ahead and merge.
…results
The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.
@andygrove, could you confirm my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.
Closes apache#10083 from edrevo/fix-shuffle-reads
Authored-by: Ximo Guanter <ximo.guanter@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.
@andygrove, could you confirm my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.
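To make the intended behavior concrete, here is a minimal sketch of deriving a job status from task statuses while keeping only the last stage's partition locations. It is not the Ballista scheduler code: `PartitionLocation`, `TaskState`, `JobState`, and `job_state` are hypothetical names invented for illustration, and the sketch assumes that the last stage is the one with the highest stage id and that a query has a single output stage.

```rust
// Hypothetical, simplified stand-ins for the scheduler's bookkeeping types.
// None of these names are taken from the Ballista code base.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionLocation {
    pub stage_id: usize,
    pub partition_id: usize,
    pub executor: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TaskState {
    Running,
    Failed(String),
    Completed(PartitionLocation),
}

#[derive(Debug, Clone, PartialEq)]
pub enum JobState {
    Running,
    Failed(String),
    Completed(Vec<PartitionLocation>),
}

/// Derives the job state from the states of all of its tasks.
/// Assumption: the stage with the highest id is the only output stage.
pub fn job_state(tasks: &[TaskState]) -> JobState {
    // Any failed task fails the whole job.
    let failure = tasks.iter().find_map(|t| match t {
        TaskState::Failed(e) => Some(e.clone()),
        _ => None,
    });
    if let Some(err) = failure {
        return JobState::Failed(err);
    }

    // The job is still running until every task has completed.
    let mut locations = Vec::with_capacity(tasks.len());
    for t in tasks {
        match t {
            TaskState::Completed(loc) => locations.push(loc.clone()),
            _ => return JobState::Running,
        }
    }

    // A job with no tasks is reported as completed with no output here;
    // that is a choice made for this sketch only.
    let last_stage = match locations.iter().map(|l| l.stage_id).max() {
        Some(id) => id,
        None => return JobState::Completed(Vec::new()),
    };

    // Keep only the last stage's partitions. Returning the locations of
    // earlier stages as well is the behavior described as incorrect above.
    locations.retain(|l| l.stage_id == last_stage);
    JobState::Completed(locations)
}
```

With two completed shuffle partitions in stage 1 and one completed partition in stage 2, this sketch reports only the stage 2 location as the job output. If a plan could have more than one output stage, filtering on a single stage id would drop results, which is the concern raised in the question to @andygrove.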