[CI] Some TPCH benchmarks started failing #35383
Comments
@icexelloss @rtpsw Could you take a look at this?
BTW, it's difficult to find error messages in the log. Is this error related to this failure? Can we improve the log output?
Sorry that I missed the notification for this. Thanks to @westonpace, there is a fix in the linked PR above.
westonpace added a commit that referenced this issue on May 2, 2023:
…d segmentation fault (#35384)

### Rationale for this change

The recent change (#34912) calculates the max concurrency using `plan->query_context()->executor()->GetCapacity()`, which is later used to initialize the kernel states. However, this differs from what we used to use. The previous method, `plan->query_context()->max_concurrency()`, is slightly different: if the aggregate node IS run in parallel, then we initialize one state for each CPU thread, one for each I/O thread, and one for the calling user thread. This is unfortunately a bit complicated, because `max_concurrency` is not a good indicator for determining whether the plan is running in parallel. So we need to query both properties and use each in its respective spot.

### What changes are included in this PR?

Now `max_concurrency` is used to figure out how many thread-local states need to be initialized, and `GetCapacity` is used to figure out whether there are multiple CPU threads.

### Are these changes tested?

The bug was caught by the benchmarks, which is a bit concerning. Most of the CI runners have a very small number of CPU threads and don't experience much concurrency, so I think we just didn't see this pattern. Or possibly, this pattern is only experienced in the legacy way that pyarrow launches exec plans.

### Are there any user-facing changes?

No.

* Closes: #35383

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
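The distinction the commit describes can be sketched as follows. This is a hypothetical illustration, not the actual Arrow Acero API: the names `KernelState`, `MakeThreadLocalStates`, and `RunsInParallel` are invented for this sketch, which only models the sizing rule (one state per CPU thread, per I/O thread, plus the calling user thread) and the separate parallelism check on the CPU pool capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for a kernel's per-thread aggregation state.
struct KernelState {
  long partial_sum = 0;
};

// max_concurrency counts CPU threads + I/O threads + the calling user
// thread. Sizing the state vector this way ensures that an I/O thread or
// the user thread indexing into it stays in bounds -- undersizing it (e.g.
// using only the CPU pool capacity) is the kind of bug that produced the
// segmentation fault.
std::vector<KernelState> MakeThreadLocalStates(std::size_t cpu_threads,
                                               std::size_t io_threads) {
  std::size_t max_concurrency = cpu_threads + io_threads + 1;
  return std::vector<KernelState>(max_concurrency);
}

// Whether the plan runs in parallel is decided from the CPU pool capacity
// alone. max_concurrency would be a poor indicator here: it is > 1 even
// for a serial plan, because it always counts the I/O and user threads.
bool RunsInParallel(std::size_t cpu_capacity) { return cpu_capacity > 1; }
```

For example, with 8 CPU threads and 2 I/O threads the sketch allocates 11 states, while a single-CPU-thread plan is still treated as serial even though its max concurrency is above one.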
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue on May 11, 2023:
…o avoid segmentation fault (apache#35384)
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue on May 15, 2023:
…o avoid segmentation fault (apache#35384)
rtpsw pushed a commit to rtpsw/arrow that referenced this issue on May 16, 2023:
…o avoid segmentation fault (apache#35384)
Describe the bug, including details regarding any error messages, version, and platform.
When #34912 was merged, some TPCH benchmarks started failing. Here is a build with failed benchmarks: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2835
#35364 fixed some of these failing TPCH benchmarks, but these benchmarks are still failing:
Component(s): Benchmarking