Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.NET 5 application on Linux fails to create CoreCLR, 0x80004005 #46462

Closed
Mgamerz opened this issue Dec 30, 2020 · 14 comments
Closed

.NET 5 application on Linux fails to create CoreCLR, 0x80004005 #46462

Mgamerz opened this issue Dec 30, 2020 · 14 comments
Labels
area-PAL-coreclr needs-author-action An issue or pull request that requires more info or actions from the author.
Milestone

Comments

@Mgamerz
Copy link

Mgamerz commented Dec 30, 2020

Description

I'm running .NET 5 on a Linux host that I don't own (it's a shared environment). I know I'm not really supposed to do this... but here we are.

I downloaded the zip for .NET 5 for Linux, extracted it, and exported the DOTNET_ROOT and PATH variables for it. I can run ./dotnet in the folder and the dotnet program runs, printing out information. It is installed at /home/username/net5.

Trying to run a published .dll yields the following:

[servername]$ dotnet /home/username/nexusindexer/app/NexusIndexer.dll
Failed to create CoreCLR, HRESULT: 0x80004005

The same problem occurs when doing single-file mode.

To reproduce

I'm not sure how one could reproduce this without logging into the server instance, unfortunately. I have attached information I've seen others post in related closed issues below.

Configuration

Here's the output of export:

[servername]$ export
declare -x COMPlus_EnableDiagnostics="0"
declare -x DOTNET_ROOT="/home/username/net5"
declare -x PATH="/home/username/.gems/bin:/usr/lib/ruby/gems/1.8/bin/:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/username/net5"
...

Regression?

I tested .NET Core 3.1.302 and .NET 5.0.1. Both have same exact output. So I don't think it's a regression.

Other information

I ran an strace. It's really long, but I could not really interpret anything that failed in it.

Here's the last 70 lines. I've attached the whole thing as a file.

mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x37aa955b000
mprotect(0x37aa955b000, 4096, PROT_NONE) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x37aa8231fd0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x37aa7ae8220, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x37aa91398a0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
fcntl(0, F_DUPFD_CLOEXEC, 0)            = 5
fcntl(1, F_DUPFD_CLOEXEC, 0)            = 6
fcntl(2, F_DUPFD_CLOEXEC, 0)            = 7
openat(AT_FDCWD, "/sys/devices/system/node/possible", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=80875, ...}) = 0
mmap(NULL, 80875, PROT_READ, MAP_PRIVATE, 8, 0) = 0x37aa9539000
close(8)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libnuma.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnuma.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libnuma.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libnuma.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
munmap(0x37aa9539000, 80875)            = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=80875, ...}) = 0
mmap(NULL, 80875, PROT_READ, MAP_PRIVATE, 8, 0) = 0x37aa9539000
close(8)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnuma.so.1", O_RDONLY|O_CLOEXEC) = 8
read(8, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000.\0\0\0\0\0\0"..., 832) = 832
fstat(8, {st_mode=S_IFREG|0644, st_size=43896, ...}) = 0
mmap(NULL, 2140096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0x37a26a35000
mprotect(0x37a26a3f000, 2093056, PROT_NONE) = 0
mmap(0x37a26c3e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0x9000) = 0x37a26c3e000
close(8)                                = 0
mprotect(0x37a26c3e000, 4096, PROT_READ) = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(8, "Name:\tdotnet\nUmask:\t0002\nState:\t"..., 1024) = 1024
read(8, "000000,00000000,00000000,0000000"..., 1024) = 308
close(8)                                = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 8
fstat(8, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(8, /* 10 entries */, 32768)    = 312
openat(AT_FDCWD, "/sys/devices/system/node/node0/meminfo", O_RDONLY) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/sys/devices/system/node/node1/meminfo", O_RDONLY) = -1 EACCES (Permission denied)
getdents(8, /* 0 entries */, 32768)     = 0
close(8)                                = 0
sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7]) = 8
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 8
fstat(8, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(8, /* 24 entries */, 32768)    = 688
getdents(8, /* 0 entries */, 32768)     = 0
close(8)                                = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(8, "Name:\tdotnet\nUmask:\t0002\nState:\t"..., 1024) = 1024
read(8, "000000,00000000,00000000,0000000"..., 1024) = 308
read(8, "", 1024)                       = 0
close(8)                                = 0
munmap(0x37aa9539000, 80875)            = 0
get_mempolicy(NULL, NULL, 0, NULL, 0)   = 0
sched_getaffinity(15662, 128, [0, 1, 2, 3, 4, 5, 6, 7]) = 8
membarrier(MEMBARRIER_CMD_QUERY, 0)     = 0x19 (MEMBARRIER_CMD_GLOBAL|MEMBARRIER_CMD_PRIVATE_EXPEDITED|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) = 0
mprotect(0x37aa79a3000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
futex(0x37aa87fb1a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "Failed to create CoreCLR, HRESUL"..., 45Failed to create CoreCLR, HRESULT: 0x80004005) = 45
write(2, "\n", 1
)                       = 1
munmap(0x37aa7f90000, 2499424)          = 0
exit_group(-2147450743)                 = ?
+++ exited with 137 +++

Full log:
strace.txt

The mprotect call at the end fails, and the grsec kernel page seems to talk about setting memory to read or execute. Could that be killing .NET startup? Another same topic from ages ago is listed here (dotnet/sdk#7204). On my server, it does not appear that it's using PaX (the paxctl commands did not resolve to anything - not sure though...)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 30, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@jkotas
Copy link
Member

jkotas commented Dec 30, 2020

cc @janvorli

@janvorli
Copy link
Member

@Mgamerz is the error code printed in the Failed to create CoreCLR, HRESULT: 0x80004005 the same for .NET 5.0?

@janvorli
Copy link
Member

I think you are right that the grsec is causing trouble here. From the strace.txt, I can see that the area that we are trying to change to Read+Write+Execute is inside libcoreclr.so and it was mapped as Read+Execute (from the strace.txt):

openat(AT_FDCWD, "/home/micper65/net5/shared/Microsoft.NETCore.App/5.0.1/libcoreclr.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=7109080, ...}) = 0
mmap(NULL, 7320552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x38f728a9000
mmap(0x38f72f29000, 245760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x67f000) = 0x38f72f29000
mmap(0x38f72f65000, 259048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x38f72f65000
close(3)
.
.
.
mprotect(0x38f72c13000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)

The mprotect address is in the range of the first mmap above.

I believe it is this called by this code:

if (!ClrVirtualProtect((void *)JIT_PatchedCodeStart, (BYTE*)JIT_PatchedCodeLast - (BYTE*)JIT_PatchedCodeStart,
PAGE_EXECUTE_READWRITE, &oldProt))

This was tested with grsec in the past and it wasn't causing problems. I wonder if the pax tools could be missing on our build machines. That could result in the reported problem. We intentionally don't fail the build when the paxclt is missing to make the build possible on distros where paxtools are not present.

@Mgamerz would you be able to try to run paxctl -c -m /your/path/to/dotnet (the dotnet being the executable) to see if it fixes the problem?

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 30, 2020
@mangod9 mangod9 added this to the 6.0.0 milestone Dec 30, 2020
@Mgamerz
Copy link
Author

Mgamerz commented Dec 31, 2020

It does not appear that paxctl is installed on the system (or I cannot access the command - it just returns command not found).

[servername]$ paxctl
-bash: paxctl: command not found
[servername]$ which paxctl
[severname]$

I would ask the hosting provider, but they might ask a few questions they won't like the answer to. I doubt I'm supposed to be running .NET on their servers (they have resource limits, but I don't think this app would hit that resource limit as it's very quick to run). I doubt they would want me running anything beyond web pages without using their VPS tier which is significantly more $$$.

The same HRESULT error message occurs on .NET Core 3.1.302 and 5.0.1. I just tested and it's also on 5.0.0. Let me know if there is more commands or investigative work I can do for you.

@Mgamerz
Copy link
Author

Mgamerz commented Dec 31, 2020

Update: I downloaded the paxctl .deb file for Ubuntu 18.04, extracted it, then uploaded the binary onto my server. I ran that command, and now the program appears to work! It seems my webhost does not have paxctl on the system (or is purposely not visible to me). I will need to test a bit more to get it all worked out and setup (right now it just throws the missing parameter messages) but it seems like that fixes it.

Hopefully the administrators don't ask why paxctl is now on the system (:

@janvorli
Copy link
Member

Thank you for checking that! That seems to means that we are missing the paxctl on our build machines.
You can run paxctl on the binary on some other machine and then copy the file to the target one. So you don't need to install it on the server in the hosting.

@Mgamerz
Copy link
Author

Mgamerz commented Dec 31, 2020

Ah so paxctl patches the files? I haven't worked on hardened linux systems before, so even after reading, had little idea what it did.

@janvorli
Copy link
Member

janvorli commented Jan 4, 2021

Yes, it sets some bits in the executable. See http://man.he.net/man1/paxctl

@alexei-matveev
Copy link

alexei-matveev commented Apr 23, 2021

I have a similar Issue, but not because of permissions.
I think inodes are running out after one week of automated PS Skript invokation
a few times a minute. Every script run leaves a UUID-named directory in /tmp.
When it is over "ENOSPC" comes:

mknod("/tmp/clr-debug-pipe-7214-1824840957-out", S_IFIFO|0700) = -1 ENOSPC (No space left on device)

After that even Shell does not start:

        $ pwsh
        Failed to create CoreCLR, HRESULT: 0x80004005

On CentOS 7 pwsh is a symlink ins /opt

        $ rpm -qf /opt/microsoft/powershell/7/pwsh
        powershell-7.1.3-1.rhel.7.x86_64

This is towards the bottom of an strace, look for ENOSPC:

        unlink("/tmp/clr-debug-pipe-7214-1824840957-in") = -1 ENOENT (No such file or directory)
        mknod("/tmp/clr-debug-pipe-7214-1824840957-in", S_IFIFO|0700) = 0
        unlink("/tmp/clr-debug-pipe-7214-1824840957-out") = -1 ENOENT (No such file or directory)
        mknod("/tmp/clr-debug-pipe-7214-1824840957-out", S_IFIFO|0700) = -1 ENOSPC (No space left on device)

At that time there were 14k Entries in /tmp:

        $ ls -1 /tmp/ | wc -l
        14506

Mostly directories names as a5f5772f-2cd0-4e89-b861-89bea736bf54
with timestamps within last week:

        $ ls -lrtd ????????-????-????-????-???????????? | less

After deleting them pwsh runs again.

@alexei-matveev
Copy link

Unfortunately the same behaviour with LTS Version of PowerShell.
This ist the garbage that remains:

        $ find 88a84cf3-05b5-40c3-a29e-52d611e919e7
        88a84cf3-05b5-40c3-a29e-52d611e919e7
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.cache
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.cache/powershell
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.cache/powershell/StartupProfileData-NonInteractive
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.cache/powershell/telemetry.uuid
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.local
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.local/share
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.local/share/powershell
        88a84cf3-05b5-40c3-a29e-52d611e919e7/.local/share/powershell/Modules

It was powershell-lts-7.0.6-1.rhel.7.x86_64 this time.
You cannot fool it by setting TMPDIR to a non-existant path:

        $ TMPDIR=/XXX pwsh
        Failed to create CoreCLR, HRESULT: 0x80004005

@janvorli
Copy link
Member

It seems this is likely a powershell issue rather than a .NET one. .NET does not create files like this. Can you please report this issue in the https://github.com/PowerShell/PowerShell repo?

@ghost
Copy link

ghost commented Oct 9, 2021

This issue has been automatically marked no recent activity because it has been marked as needs more info but has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no recent activity.

Please refer to our contribution guidelines for tips on what information might be required.

@ghost
Copy link

ghost commented Nov 5, 2021

This issue will now be closed since it had been marked no recent activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@ghost ghost closed this as completed Nov 5, 2021
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 5, 2021
@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 19, 2022
@ghost ghost removed the no-recent-activity label Jan 19, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-PAL-coreclr needs-author-action An issue or pull request that requires more info or actions from the author.
Projects
None yet
Development

No branches or pull requests

7 participants