Segfaults in rabit tests #10321
Comments
I will debug it next week. |
I can't reproduce the segfault on my Mac mini (M1). I wonder if the issue is only present on Intel Mac? |
Might be related to #10312? |
I don't think this is related to NCCL. I can reproduce that one with repeated runs; it's a deadlock inside CTK called by NCCL. |
That's what we are currently using on the master branch, right? |
Yes |
Looking at the x86 instances on AWS, it seems macOS 13 is not available? https://aws.amazon.com/ec2/instance-types/mac/ |
Same error with |
Just ran the gtest with ThreadSanitizer enabled on macOS 12:
|
Never mind, the error from ThreadSanitizer appears to be a false positive. |
Fixed in #10320 |
https://github.com/dmlc/xgboost/actions/runs/9211613634/job/25341388037?pr=10320
https://github.com/dmlc/xgboost/actions/runs/9211613620/job/25341830117?pr=10320
Rabit tests segfault on the macos-13 platform. The failure occurs consistently when I restart the tests.