gasnetex: crash and assertion failure #1118
Comments
Start by getting a backtrace |
The backtrace is included at the top of the paste -- it was collected with |
We'll probably need to wait for @streichler to get back from vacation. It would still be good to get a backtrace with symbols. Try using |
I got a different error with
Another backtrace for this error:
|
Here's a backtrace for the initial assertion failure:
|
I suspect that they are variations of the same error. The CRC checksum differing is indicative of packet data corruption, which could manifest in different ways. Since the original error also manifested in an active message handler, it is likely that packet was corrupted as well. |
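To illustrate the reasoning above, here is a minimal sketch of how a per-packet checksum exposes corruption. It is not Realm's or GASNet-EX's actual checksum code: `checksum`, `flip_bit`, `verify`, and the payload are hypothetical names made up for this example, and zlib's CRC-32 is assumed only because it is readily available.

```python
import zlib


def checksum(payload: bytes) -> int:
    """CRC-32 of the payload (zlib's polynomial; an assumption for this sketch)."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    """Return a copy of the payload with one bit inverted, simulating corruption in flight."""
    corrupted = bytearray(payload)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def verify(payload: bytes, expected_crc: int) -> bool:
    """Receiver-side check: recompute the CRC and compare it to the one sent with the packet."""
    return checksum(payload) == expected_crc


# Hypothetical payload standing in for an active message body.
sent = b"hypothetical active message payload"
sent_crc = checksum(sent)

# Simulate a single-bit corruption somewhere between sender and receiver.
received = flip_bit(sent, 13)

print(verify(sent, sent_crc))      # intact packet passes the check
print(verify(received, sent_crc))  # corrupted packet fails the check
```

Without such a check, the same corrupted bytes would be passed to whatever code consumes the packet, which fits the idea that one underlying corruption can surface as different crashes or assertion failures.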
Another stack that doesn't appear when building with debug (or at least I haven't been able to reproduce it):
|
@rohany can you try running with |
Running with
|
I've also seen this error:
I'm trying to get a backtrace, will update if I can get one. |
I ran into this error again on a different application, here's the backtrace:
|
@rohany can you try cherry-picking this commit and see if the behavior with batching enabled changes (for better or for worse)? |
These bugs look like they have been fixed! Closing this for now. |
Well it needs 58cc02f97e2e6d4e4d6f67c03454712304600b15 to land first, but basically fixed. |
I recently switched to using GASNet-Ex from GASNet, and I see this error frequently when running on 8 nodes of lassen:
I'm on commit e7d513f51a39df73e9a7ca31f9a631a82777c022. I can't reproduce the error on Sapling, so let me know what I can do to give you information to debug it.