/dump fails with "WriteDumpAsync failed - HRESULT: 0x00000000" #1783

Closed
plaisted opened this issue Apr 20, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@plaisted

Description

When attempting to take a full dump by navigating to the /dump endpoint (with defaults), the following error message is returned:

{"status":400,"detail":"WriteDumpAsync failed - HRESULT: 0x00000000"}

The only mention I see of this error is in #1216, but I've verified the folder exists and is writable:

# on sidecar
- name: DOTNETMONITOR_Storage__DumpTempFolder
  value: /dumps
# on both sidecar and app
- mountPath: /dumps
  name: dumpsvol
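For completeness, the /dumps mount in both containers has to be backed by a shared volume in the pod spec; a minimal sketch, assuming an emptyDir volume (the volume name is taken from the fragment above, emptyDir is an assumption):

volumes:
- name: dumpsvol
  emptyDir: {}   # shared scratch space; both containers mount it at /dumps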

Configuration

Pod with process running RHEL UBI8 (registry.access.redhat.com/ubi8/dotnet-60-runtime:6.0-5)

dotnet --info:
  Version: 6.0.2
  Commit:  839cdfb0ec

Sidecar (Microsoft-provided dotnet/monitor container)
/info:

{"version":"6.1.1+17a566bc3228b2ea2a9bfa6d75423637c340943a","runtimeVersion":"6.0.4","diagnosticPortMode":"Connect","diagnosticPortName":null}

Configured with defaults, other than the --no-auth arg and the DOTNETMONITOR_Urls and DOTNETMONITOR_Storage__DumpTempFolder env vars being set.
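Roughly, the sidecar container spec implied by that configuration would look like this (the image tag and listen URL are assumptions for illustration, not the exact deployment):

- name: monitor
  image: mcr.microsoft.com/dotnet/monitor:6.1.1
  # demo-only: disables authentication on the HTTP API
  args: [ "--no-auth" ]
  env:
  - name: DOTNETMONITOR_Urls
    value: http://+:52323
  - name: DOTNETMONITOR_Storage__DumpTempFolder
    value: /dumps
  volumeMounts:
  - mountPath: /dumps
    name: dumpsvol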

Other information

  • /trace & /gcdump work without issue.
  • I've tried adding the SYS_PTRACE capability to both containers and enabling shareProcessNamespace as well (see the sketch below), since I saw those mentioned occasionally, but the same error occurred. I think those were needed for older dotnet-dump versions anyway.
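For reference, those two settings were applied roughly like this in the pod spec (a sketch of what was tried, not a recommended fix; the container name is illustrative):

spec:
  shareProcessNamespace: true        # app and sidecar see each other's PIDs
  containers:
  - name: monitor
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE                 # allow ptrace-based memory access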
plaisted added the bug label on Apr 20, 2022
@plaisted
Author

Relevant sidecar log:

{"Timestamp":"2022-04-20T18:11:05.7739180Z","EventId":1,"LogLevel":"Error","Category":"Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController","Message":"Request failed.","Exception":"Microsoft.Diagnostics.NETCore.Client.ServerErrorException: WriteDumpAsync failed - HRESULT: 0x00000000    at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.ValidateResponseMessage(IpcMessage responseMessage, String operationName, ValidateResponseOptions options)    at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.WriteDumpAsync(DumpType dumpType, String dumpPath, Boolean logDumpGeneration, CancellationToken token)    at Microsoft.Diagnostics.Monitoring.WebApi.DumpService.DumpAsync(IEndpointInfo endpointInfo, DumpType mode, CancellationToken token) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/DumpService.cs:line 77    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.\u003C\u003Ec__DisplayClass15_0.\u003C\u003CCaptureDump\u003Eb__0\u003Ed.MoveNext() in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagController.cs:line 216 --- End of stack trace from previous location ---    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.\u003C\u003Ec__DisplayClass31_0.\u003C\u003CInvokeForProcess\u003Eb__0\u003Ed.MoveNext() in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagController.cs:line 680 --- End of stack trace from previous location ---    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.\u003C\u003Ec__DisplayClass33_0\u00601.\u003C\u003CInvokeForProcess\u003Eb__0\u003Ed.MoveNext() in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagController.cs:line 712 --- End of stack trace from previous location ---    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagControllerExtensions.InvokeService[T](ControllerBase controller, Func\u00601 serviceCall, ILogger logger) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagControllerExtensions.cs:line 57","State":{"Message":"Request failed.","{OriginalFormat}":"Request failed."},"Scopes":[{"Message":"SpanId:e67598eb6a7d0998, TraceId:67bda82851289b252c2915565df8ca04, ParentId:0000000000000000","SpanId":"e67598eb6a7d0998","TraceId":"67bda82851289b252c2915565df8ca04","ParentId":"0000000000000000"},{"Message":"ConnectionId:0HMH2V000RF45","ConnectionId":"0HMH2V000RF45"},{"Message":"RequestPath:/dump RequestId:0HMH2V000RF45:00000002","RequestId":"0HMH2V000RF45:00000002","RequestPath":"/dump"},{"Message":"Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.CaptureDump (Microsoft.Diagnostics.Monitoring.WebApi)","ActionId":"4228df93-4446-4861-9999-4af625ba6659","ActionName":"Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.CaptureDump (Microsoft.Diagnostics.Monitoring.WebApi)"},{"Message":"ArtifactType:dump","ArtifactType":"dump"}]}

@jander-msft
Member

@dotnet/dotnet-diag, could someone help investigate this issue? The diagnostic command response says it failed, but the HRESULT is S_OK (0x00000000). The relevant line where this exception occurs in the client library: https://github.com/dotnet/diagnostics/blob/028e7abb5c46a085f3abf0a2080850b4f3e2b11a/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs#L574

@tommcdon
Member

Thank you @jander-msft
Hi @plaisted, do you know if dotnet-dump running in the RHEL pod also fails with a similar error? dotnet-dump uses the same diagnostics IPC channel that dotnet-monitor uses for triggering dumps, so my suspicion is that it would reproduce the same error. If so, would you mind trying to run createdump with -d and -v in the pod to see if there are any useful error messages? Note that better diagnostic messages were added in .NET 7 and can be enabled with DOTNET_CreateDumpVerboseDiagnostics - please see dotnet/runtime@14e6a41.
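For example, from a shell inside the app container this would look roughly like the following (the PID and the createdump path are illustrative; createdump ships with the runtime under the shared-framework directory, and dotnet-dump would need to be installed separately, e.g. as a .NET global tool):

# try the same dump request over the diagnostics IPC channel dotnet-monitor uses
dotnet-dump ps
dotnet-dump collect -p 8 --type Full

# run createdump directly with diagnostics (-d) and verbose logging (-v)
/usr/lib64/dotnet/shared/Microsoft.NETCore.App/6.0.2/createdump -d -v 8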

@plaisted
Author

Thank you @tommcdon. createdump gave a much more useful error; it appears to be a memory permission issue:

createdump output

bash-4.4$ ls /tmp
clr-debug-pipe-8-369380096-in   dotnet-diagnostic-8-369380096-socket
clr-debug-pipe-8-369380096-out  system-commandline-sentinel-files
bash-4.4$ ./createdump -d -v 8
open(/proc/8/mem) FAILED 13 (Permission denied)
bash-4.4$ whoami
default
bash-4.4$ ls /proc/8/mem -l
-rw------- 1 default root 0 Apr 21 14:22 /proc/8/mem

I verified the pod had the SYS_PTRACE capability:

  securityContext:
    capabilities:
      add:
      - SYS_PTRACE

I also tried adding

seccompProfile:
  type: Unconfined

explicitly, but there was no change. I see similar issues at dotnet/runtime#13687, but it appears they were resolved by adding the security context I already tried.

I'm not very familiar with Linux memory permissions or Kubernetes; are there other requirements? I didn't see anything in the examples / documentation. Considering this works locally under WSL2/podman, I may just need to open a ticket with our Kubernetes vendor (VMware).

@plaisted
Author

I've worked through the initial issue (the hosts had /proc/sys/kernel/yama/ptrace_scope set to 1), but am now getting:

[createdump] 00007fffc0359000 - 00007fffc037a000 (000021) 0000000000000000 rw--p- 26 [stack]
[createdump] 00007fffc03d9000 - 00007fffc03dc000 (000003) 0000000000000000 r---p- 24 [vvar]
[createdump] 00007fffc03dc000 - 00007fffc03de000 (000002) 0000000000000000 r-x-p- 25 [vdso]
[createdump] ffffffffff600000 - ffffffffff601000 (000001) 0000000000000000 r-x-p- 25 [vsyscall]
[createdump] EnumerateElfInfo: phdr 0x55f363314040 phnum 12
[createdump] ReadProcessMemory FAILED, addr: 000055f363314040, size: 56, ERRNO 38: Function not implemented
[createdump] ERROR: ReadMemory(0x55f363314040, 38) phdr FAILED

The hosts are running Photon OS; are there any details on what Linux functionality is missing / not implemented?
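For reference, the Yama ptrace_scope setting mentioned above is a node-level knob, so it has to be checked and relaxed on the host rather than inside the pod; roughly:

# on the node (requires root); 1 = restricted, 0 = classic ptrace permissions
cat /proc/sys/kernel/yama/ptrace_scope
sysctl -w kernel.yama.ptrace_scope=0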

@noahfalk
Member

The failure is likely coming from this code, so either a call to pread64 or a call to process_vm_readv. .NET doesn't include Photon OS as a supported distribution as far as I am aware, but we can often accept a PR if you want to do the work to make some additional accommodations.

@hoyosjs
Member

hoyosjs commented Apr 26, 2022

This looks like dotnet/runtime#67544

@plaisted
Author

I'll close this; it appears related to the linked fix (or another Photon OS specific issue) rather than to the dotnet-monitor tooling itself.
