ARM segmentation fault: JIT/Regression/CLR-x86-JIT/V1-M11-Beta1/b31878/b31878/b31878.sh #6334
Comments
/cc @RussKeldorph these are the tests reported by Samsung as failing? |
@jashook @RussKeldorph After PR dotnet/coreclr#6021 was merged, there are many regressions in the CoreCLR ARM (softfp) Release build. Here is the result from bdfce9e:
Here is the result from bdfce9e without PR dotnet/coreclr#6021 (PR dotnet/coreclr#6021 reverted manually):
[bdfce9ed7fb][TC 387d9fc0a Release] [include-aab8856ce03].txt Interestingly, b31878 passes on my side. |
On the PR that triggered this failure, this test appeared to pass on the Release build but segfaulted on the Debug build here, which could be why it passed on a Release build for @parjong. |
@RussKeldorph @swgillespie is there a way I can get the dump off that box for you? |
In theory, the dump should have been uploaded to the dumpling service, although looking at the logs I don't see anything indicating that the upload occurred. |
Can I go ahead and reset the CI? |
Yeah, I'd say it's fine. |
@swgillespie Where can we find out more about this "dumpling" service? I wouldn't be surprised if it doesn't "just work" for the ARM emulator jobs. /cc @jashook |
@RussKeldorph I don't know the specifics, but as I understand it dumpling is an HTTP endpoint that can receive crash dumps (dotnet/coreclr#6083) and view them (http://aka.ms/dumpling). @adityamandaleeka or @bryanAR might know more about how it works vis-a-vis ARM and other platforms. |
@hqueue @leemgs @myungjoo @parjong @wateret @sjsinju This issue is causing spurious failures in our PR and rolling tests. Can you investigate? Note that we believe @CarolEidt fixed the ARM regression introduced by dotnet/coreclr#6021 before resubmitting her change. You may want to try adding the new --limitedDumpGeneration option to the ARM CI runtest.sh so you can use dumpling as @swgillespie suggests above. |
@RussKeldorph The issue seems to be related to the emulator-based testing environment. I tried to reproduce it on a Raspberry Pi 3 and other ARM soft-fp devices, but could not. @sjsinju @wateret Could you describe the current ARM CI setup in detail? |
I checked the current status of the ARM CI debug builds. It was hard to find a failure in the b31878 test case, but I found failures at the links below. http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_debug_ubuntu_prtest/988/ These are PRs that only changed the documentation in 'linux-instructions.md', yet each of them produced different test failures. The release build of the same PR on the CI cloud was successful. This seems to be the same problem as this issue. I think the segmentation fault is not specific to b31878 but hits test cases at random, and only in the debug CI, as @swgillespie said. So I think we have to investigate the debug build. |
@sjsinju Sorry for not being clear. Yes, we believe this failure is not specific to b31878. It seems to happen nondeterministically in many, if not all, tests currently running in ARM CI. I believe this is another example (in the release build): http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_release_ubuntu/232/console. Due to the difficulty of reproducing, it may be better to enable capturing the dump in the CI rather than attempting to reproduce locally. If the --limitedDumpGeneration switch I mentioned above works in the emulator, that might be the easiest thing to try. |
To determine the cause of the random test failures (#6298), the runtest option '--limitedDumpGeneration' is added.
To determine the cause of the random test failures (#6298), we checked whether the segmentation faults come from the mounted rootfs and from multi-threaded processing. So I changed the root-fs to an archived root-fs and run the tests with the --sequential option.
…7946) * ARM-CI : Use archived root-fs and run tests with --sequential option To determine the cause of the random test failures (#6298), we checked whether the segmentation faults come from the mounted rootfs and from multi-threaded processing. So I changed the root-fs to an archived root-fs and run the tests with the --sequential option. * change to original clang version
To determine the cause of the random test failures (#6298), we checked whether the segmentation faults come from the mounted rootfs and from multi-threaded processing. So I changed the root-fs to the archived root-fs and run the tests with the --sequential option. PS. The location of the root-fs folder was changed from '/opt', as written in the reverted commit (dotnet#7991), to '/mnt' to resolve an out-of-space issue.
* ARM-CI : Fix segmentation faults on running tests To determine the cause of the random test failures (#6298), we checked whether the segmentation faults come from the mounted rootfs and from multi-threaded processing. So I changed the root-fs to the archived root-fs and run the tests with the --sequential option. PS. The location of the root-fs folder was changed from '/opt', as written in the reverted commit (#7991), to '/mnt' to resolve an out-of-space issue.
PR dotnet/coreclr#8019, which runs the tests from an archived root-fs with the --sequential option, was merged. But it does not seem to be enough to resolve this issue. Although the frequency has decreased, the segmentation fault still occurs (local tests were all successful). Additional investigation is needed. |
Per dotnet/coreclr#11069, I unfortunately recommend adding retry logic to runtest.sh. It should be enabled for ARM32 only, and I would prefer that a test only be retried if its output matches a very specific pattern, e.g. |
I think retry logic can be implemented in two ways.
(1) may be preferred over (2) in general (right?) What do you think? |
I'm less concerned about how it's implemented and more concerned about the requirements. I strongly prefer retrying at the lowest level possible (e.g. individual test cases) rather than retrying all the tests if only one fails spuriously. I assumed that meant going with your option (1), but I suppose you could achieve the same goal in other ways. I think if a single invocation of I'm not sure we need a new script option to enable retry on failure. Retries are a huge hack that we should remove eventually, and I'm not a fan of adding unnecessary complexity hacks. I would just enable retry only when the |
I definitely agree with this idea.
I had a similar idea in mind. Minor problems with this approach are that (1) it may take more time to set up the test environment if we invoke If this doesn't matter, then I also think this 2nd approach (by exploiting I will prepare retry logic with the 2nd approach. |
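For readers following the retry discussion, here is a minimal sketch of the requirement stated above: retry a single test case, and only when it fails with output matching a specific pattern. It is written in Python for illustration and is not the change that was made to runtest.sh. The function name `run_with_retry`, the `max_attempts` limit, and the choice of pattern are all assumptions. The pattern string is the QEMU message quoted elsewhere in this thread, and it is not necessarily the pattern the comment above had in mind.

```python
import subprocess
import sys

# Assumption: this QEMU message is quoted elsewhere in this thread; the exact
# pattern proposed for runtest.sh is not shown, so treat it as a placeholder.
SPURIOUS_PATTERN = "qemu: uncaught target signal 11 (Segmentation fault)"


def run_with_retry(cmd, pattern=SPURIOUS_PATTERN, max_attempts=3):
    """Run one test command (hypothetical helper).

    The command is retried only if it fails AND its combined output contains
    `pattern`. Any other failure is reported immediately, so real test
    failures are never hidden by the retry.
    Returns (exit_code, attempts, output) of the last run.
    """
    attempts = 0
    while True:
        attempts += 1
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the pattern is seen either way
            text=True,
        )
        if proc.returncode == 0:
            return proc.returncode, attempts, proc.stdout
        spurious = pattern in proc.stdout
        if not spurious or attempts >= max_attempts:
            return proc.returncode, attempts, proc.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python retry_sketch.py <test command> [args...]
    code, attempts, output = run_with_retry(sys.argv[1:])
    sys.stdout.write(output)
    print(f"exit code {code} after {attempts} attempt(s)")
    sys.exit(code)
```

Retrying per test case keeps the cost of a spurious crash to one rerun instead of a rerun of the whole suite, and the pattern check means an ordinary assertion failure is still reported on the first attempt.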
related issue dotnet/coreclr#6573 |
@RussKeldorph Is there a reason this issue needs to be kept open? You added a reference to the issue recently. But it's not clear that the "real" issue here is the failure it was opened for. |
@BruceForstall I'm pretty confident the problem is a bug in the version of QEMU we rely on. I don't know if a newer QEMU would fix it or if that's even an option. The issue is less pressing since we don't have default-triggered CI jobs using QEMU anymore, but I believe the bug is still relevant until we either fix QEMU or stop using it altogether. @jashook mentioned a possible way to test |
- dotnet crashes with: qemu: uncaught target signal 11 (Segmentation fault) - core dumped - Seems to be known issue: https://github.com/dotnet/coreclr/issues/6298
* Dockerfiles for arm32v7 * Add support for arm32. Dockerfile using qemu crashes. - dotnet crashes with: qemu: uncaught target signal 11 (Segmentation fault) - core dumped - Seems to be known issue: https://github.com/dotnet/coreclr/issues/6298
Closing stale. Will revisit if/when this environment is running regularly again. |
See http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/arm_emulator_cross_debug_ubuntu_prtest/511/console: