Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12334: [Rust] [Ballista] Aggregate queries producing incorrect results #10083

Closed
wants to merge 1 commit into from

Conversation

edrevo
Copy link
Contributor

@edrevo edrevo commented Apr 17, 2021

The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.

@andygrove, could you confirm that my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.

@github-actions
Copy link

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@fdd6ab1). Click here to learn what that means.
The diff coverage is 0.00%.

❗ Current head 9c6206c differs from pull request most recent head 547c905. Consider uploading reports for the commit 547c905 to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##             master   #10083   +/-   ##
=========================================
  Coverage          ?   78.90%           
=========================================
  Files             ?      286           
  Lines             ?    64721           
  Branches          ?        0           
=========================================
  Hits              ?    51070           
  Misses            ?    13651           
  Partials          ?        0           
Impacted Files Coverage Δ
rust/ballista/rust/scheduler/src/state/mod.rs 0.00% <0.00%> (ø)
rust/datafusion/src/execution/context.rs 92.60% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdd6ab1...547c905. Read the comment docs.

@edrevo edrevo closed this Apr 17, 2021
@edrevo edrevo deleted the fix-shuffle-reads branch April 17, 2021 19:42
@edrevo edrevo restored the fix-shuffle-reads branch April 17, 2021 20:51
@edrevo
Copy link
Contributor Author

edrevo commented Apr 17, 2021

Didn't mean to close this, sorry.

@edrevo edrevo reopened this Apr 17, 2021
@andygrove
Copy link
Member

@edrevo That assumption is correct. Thanks for tracking this down!

@andygrove
Copy link
Member

Test failures are related to rust nightly and flatbuffers version. I merged this into master locally and confirmed that tests pass so am going to go ahead and merge.

@andygrove andygrove closed this in bb53986 Apr 17, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…results

The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.

@andygrove, could you confirm that my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.

Closes apache#10083 from edrevo/fix-shuffle-reads

Authored-by: Ximo Guanter <ximo.guanter@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 10, 2021
…results

The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.

@andygrove, could you confirm that my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.

Closes apache#10083 from edrevo/fix-shuffle-reads

Authored-by: Ximo Guanter <ximo.guanter@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
…results

The function that calculated job status from the task status was aggregating all of the partition locations. This is incorrect. Only the partitions of the last stage should be collected.

@andygrove, could you confirm that my assumption that the output of any query is contained in the last stage? If there is a possibility of having multiple output stages, then this fix is incorrect.

Closes apache#10083 from edrevo/fix-shuffle-reads

Authored-by: Ximo Guanter <ximo.guanter@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants