Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI][C++] arrow-compute ExecPlanExecution.StressSourceGroupedSumStop timeout #15243

Closed
mapleFU opened this issue Jan 7, 2023 · 10 comments · Fixed by #33700
Closed

[CI][C++] arrow-compute ExecPlanExecution.StressSourceGroupedSumStop timeout #15243

mapleFU opened this issue Jan 7, 2023 · 10 comments · Fixed by #33700

Comments

@mapleFU
Copy link
Member

mapleFU commented Jan 7, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Test is: https://github.com/apache/arrow/actions/runs/3862492798/jobs/6584098719

[ RUN      ] ExecPlanExecution.StressSourceGroupedSumStop
D:/a/arrow/arrow/cpp/src/arrow/compute/exec/plan_test.cc:849: Failure
Value of: _fut.Wait(::arrow::kDefaultAssertFinishesWaitSeconds)
  Actual: false
Expected: true
Google Test trace:
D:/a/arrow/arrow/cpp/src/arrow/compute/exec/plan_test.cc:824: single threaded
D:/a/arrow/arrow/cpp/src/arrow/compute/exec/plan_test.cc:821: slowed
D:/a/arrow/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:64: Plan was destroyed before finishing


99% tests passed, 1 tests failed out of 69

which cause this patch failed #15241

Component(s)

C++, Continuous Integration

@rok
Copy link
Member

rok commented Jan 8, 2023

Looks like the test takes too long to run?

@mapleFU
Copy link
Member Author

mapleFU commented Jan 8, 2023

Looks like the test takes too long to run?

Maybe just running to much time / a deadlock or other bug...I don't know :(

@westonpace
Copy link
Member

kDefaultAssertFinishesWaitSeconds is generally long enough that even CI machines will finish by then. So I think deadlock seems likely :(

@westonpace
Copy link
Member

I've reproduced this. It is a deadlock. It appears that, in some cancel situations, GroupByNode is not properly marking it's finished future as complete. I can probably spot fix this if we need to but it is fixed more categorically by #15253 (I'm attempting to reproduce on that branch and so far have not been able to so that is a good sign). If it appears that #15253 is not going to merge before the release then I will fix this individually (feel free to ping me if it seems like I am forgetting).

@mapleFU
Copy link
Member Author

mapleFU commented Jan 11, 2023

Please go ahead, I just blocked by CI

By the way, I meet another timeout, but I don't understand why: https://github.com/apache/arrow/actions/runs/3886295191/jobs/6631229773

@westonpace
Copy link
Member

I don't think #15253 will merge in time. I am going to try and put in a more targeted fix today.

@mapleFU
Copy link
Member Author

mapleFU commented Jan 16, 2023

Hi westonpace, did you solve this issue?

@westonpace
Copy link
Member

Hi westonpace, did you solve this issue?

Thanks for the reminder, I had forgotten.

@westonpace westonpace modified the milestones: 12.0.0, 11.0.0 Jan 16, 2023
@westonpace
Copy link
Member

@raulcd this should be included in the release. I've marked it with the 11.0.0 milestone. Is there anything more I need to do?

@raulcd
Copy link
Member

raulcd commented Jan 16, 2023

no, that's fine, Thanks! The archery release cherry-pick script will pick it up. I am adding the Priority: blocker label, just as a reminder for me to double check once I run the cherry-pick command but adding the 11.0.0 milestone should be enough.

@raulcd raulcd added the Priority: Blocker Marks a blocker for the release label Jan 16, 2023
raulcd pushed a commit that referenced this issue Jan 18, 2023
* Closes: #15243

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants