Unix tests' core-cracking script looks in the wrong place for core dumps #55702

Open
danmoseley opened this issue Jul 15, 2021 · 6 comments

Comments

@danmoseley
Member

E.g., from a crash in a unit test run on Linux:

----- end Wed Jul 14 21:46:14 UTC 2021 ----- exit code 134 ----------------------------------------------------------
exit code 134 means SIGABRT Abort. Managed or native assert, or runtime check such as heap corruption, caused call to abort(). Core dumped.
ulimit -c value: unlimited
...
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
... found no dump in /root/helix/work/workitem

The script we use to run unit tests tries to find any core dump, crack it with gdb, dump all threads' stacks, and then make a copy in case it's useful. This is broken because it looks for the dump in the current directory, but dumps are now written to /home/helixbot/dotnetbuild/dumps/. The script needs to read core_pattern so that it knows to look there.
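For illustration, here is a rough sketch of how the lookup could derive the dump directory from core_pattern instead of assuming the current directory. This is not the existing runner script: `core_dump_dir` and `crack_newest_dump` are hypothetical names, and the sketch assumes the directory part of the pattern contains no `%` specifiers and that the dump's file name starts with `core` (as in the `core.%u.%p` pattern from the log above).

```shell
#!/usr/bin/env bash

# Print the directory where core files land for a given core_pattern value.
# $1: contents of /proc/sys/kernel/core_pattern
# $2: working directory of the crashing process
# Returns 1 when the pattern pipes the dump to a program (leading '|'),
# because then there is no file on disk for us to find.
core_dump_dir() {
  local pattern=$1 cwd=$2 dir
  case $pattern in
    '|'*) return 1 ;;
    /*)   dir=${pattern%/*}; printf '%s\n' "${dir:-/}" ;;   # absolute path
    */*)  printf '%s\n' "$cwd/${pattern%/*}" ;;             # relative, with a directory
    *)    printf '%s\n' "$cwd" ;;                           # bare file name: cwd
  esac
}

# Hypothetical wrapper: find the newest core file and print all threads' stacks.
# $1: path to the executable that crashed
crack_newest_dump() {
  local exe=$1 pattern dir dump
  pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null) || pattern=core
  dir=$(core_dump_dir "$pattern" "$PWD") || return 1
  # Assumption: the dump's file name starts with "core".
  dump=$(ls -t "$dir"/core* 2>/dev/null | head -n 1)
  [ -n "$dump" ] || { echo "... found no dump in $dir"; return 1; }
  gdb --batch -ex "thread apply all bt" "$exe" "$dump"
}
```

Keeping `core_dump_dir` free of any `/proc` reads makes it easy to check against sample patterns; only `crack_newest_dump` touches the machine's actual settings.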

Note that Helix gathers the dump just fine because it uses core_pattern (in fact, Helix is what sets it). It would be nice to fix this so that we also get some free diagnostic info in the log.

cc fyi @MattGal

@dotnet-issue-labeler dotnet-issue-labeler bot added area-Diagnostics-coreclr untriaged New issue has not been triaged by the area owner labels Jul 15, 2021
@ghost

ghost commented Jul 15, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.


@danmoseley danmoseley added area-Infrastructure-libraries and removed area-Diagnostics-coreclr untriaged New issue has not been triaged by the area owner labels Jul 15, 2021
@ghost ghost added this to Untriaged in Infrastructure Backlog Jul 15, 2021
@danmoseley danmoseley added this to the Future milestone Jul 15, 2021
@ghost ghost moved this from Untriaged to Future in Infrastructure Backlog Jul 15, 2021
@ViktorHofer
Member

I think most of the RunnerTemplate logic for Unix is outdated and can probably be removed, at least the parts here and the other ones at the bottom of the script.

@danmoseley
Member Author

Really, none of this needs to be repo-specific, right? Arcade should have any dump-cracking logic.

@ViktorHofer
Member

Exactly. I don't think any of that logic is important for local testing either.

@danmoseley
Member Author

Would you perhaps be able to open an issue in Arcade pointing to where the logic should go?

@ViktorHofer
Member

The way dotnet/runtime, and more specifically libraries, executes tests is still very much repo-specific. I don't think it makes sense to move this into Arcade. IMO we can just remove parts of that code and no one will notice.
