Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error using dotnet-gcdump on Android device #4348

Closed
jonathanpeppers opened this issue Oct 20, 2023 · 10 comments · Fixed by #4354
Closed

Error using dotnet-gcdump on Android device #4348

jonathanpeppers opened this issue Oct 20, 2023 · 10 comments · Fixed by #4354
Labels
bug Something isn't working

Comments

@jonathanpeppers
Copy link
Member

Description

This is working for some apps, but I'm trying to grab a GC dump of a test application where I've reproduced a memory leak in MAUI:

dotnet/maui@main...jonathanpeppers:maui:CarouselViewLeaks

When I run dotnet-gcdump I get:

> dotnet-gcdump collect -p 23492 -v
Writing gcdump to 'D:\src\20231020_103523_23492.gcdump'...
  0.0s: Creating type table flushing task
  0.0s: Flushing the type table
  0.1s: Done flushing the type table
  0.1s: Requesting a .NET Heap Dump
  0.1s: gcdump EventPipe Session started
  0.2s: Starting to process events
  0.2s: .NET Dump Started...
Process wildcard selects process id 27299
Found a Gen2 Induced non-background GC Start at 2.042 msec GC Count 2
  0.7s: Making GC Heap Progress...
  0.7s: .NET GC Complete.
Found a GC Stop at 457.149 for GC 2, ignoring events from now on.
  0.7s: Shutting down gcdump EventPipe session
  0.8s: gcdump EventPipe session shut down
  0.8s: [Error] Exception during gcdump: System.AggregateException: One or more errors occurred. (Arithmetic operation resulted in an overflow.)
 ---> System.OverflowException: Arithmetic operation resulted in an overflow.
   at DotNetHeapDumpGraphReader.<>c__DisplayClass8_0.<SetupCallbacks>b__0(ModuleLoadUnloadTraceData data) in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/DotNetHeapDumpGraphReader.cs:line 115
   at Microsoft.Diagnostics.Tracing.Parsers.Clr.ModuleLoadUnloadTraceData.Dispatch()
   at Microsoft.Diagnostics.Tracing.TraceEventDispatcher.DoDispatch(TraceEvent anEvent)
   at Microsoft.Diagnostics.Tracing.TraceEventDispatcher.Dispatch(TraceEvent anEvent)
   at Microsoft.Diagnostics.Tracing.EventPipeEventSource.DispatchEventRecord(EVENT_RECORD* eventRecord)
   at Microsoft.Diagnostics.Tracing.EventPipeEventSource.EventCache_OnEvent(EventPipeEventHeader& header)
   at Microsoft.Diagnostics.Tracing.EventPipe.EventCache.SortAndDispatch(Int64 stopTimestamp)
   at Microsoft.Diagnostics.Tracing.EventPipe.EventCache.ProcessSequencePointBlock(Byte[] sequencePointBytes)
   at Microsoft.Diagnostics.Tracing.EventPipeSequencePointBlock.ReadBlockContents(PinnedStreamReader reader)
   at Microsoft.Diagnostics.Tracing.EventPipeBlock.FromStream(Deserializer deserializer)
   at FastSerialization.Deserializer.ReadObjectDefinition(Tags tag, StreamLabel objectLabel)
   at FastSerialization.Deserializer.ReadObject()
   at Microsoft.Diagnostics.Tracing.EventPipeEventSource.Process()
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.<>c__DisplayClass3_1.<DumpFromEventPipe>b__4() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/EventPipeDotNetHeapDumper.cs:line 231
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.DumpFromEventPipe(CancellationToken ct, Int32 processId, String diagnosticPort, MemoryGraph memoryGraph, TextWriter log, Int32 timeout, DotNetHeapInfo dotNetInfo) in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/EventPipeDotNetHeapDumper.cs:line 274
[  0.9s: Done Dumping .NET heap success=True]
[ERROR] System.ApplicationException: RootIndex not set.
   at Graphs.Graph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/Graph.cs:line 241
   at Graphs.MemoryGraph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/MemoryGraph.cs:line 100
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.TryCollectMemoryGraph(CancellationToken ct, Int32 processId, String diagnosticPort, Int32 timeout, Boolean verbose, MemoryGraph& memoryGraph) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 130
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.<>c__DisplayClass1_0.<Collect>b__0() in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 83
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, String output, Int32 timeout, Boolean verbose, String name, String diagnosticPort) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 92

Using a local, debug build of dotnet-gcdump, I get a different message:

> dotnet run -- collect -p 23492 -v
Writing gcdump to 'D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\20231020_103656_23492.gcdump'...
  0.0s: Creating type table flushing task
  0.0s: Flushing the type table
Process terminated. Assertion failed.
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.TryCreateStopMessage(IpcMessage& stopMessage) in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 110
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Stop() in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 50
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeSessionController.EndSession() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 369
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.<>c__DisplayClass3_2.<DumpFromEventPipe>b__7() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 145
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

I believe I have dotnet-dsrouter setup correctly, as dotnet-trace can connect successfully:

> dotnet-trace collect -p 23492
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name                           Keywords            Level               Enabled By
Microsoft-DotNETCore-SampleProfiler     0x0000F00000000000  Informational(4)    --profile
Microsoft-Windows-DotNETRuntime         0x00000014C14FCCBD  Informational(4)    --profile

Process        : D:\src\dotnet\diagnostics\artifacts\bin\dotnet-dsrouter\Debug\net6.0\dotnet-dsrouter.exe
Output File    : D:\src\dotnet-dsrouter.exe_20231020_103806.nettrace

[00:00:00:04]   Recording trace 73.728   (KB)
Press <Enter> or <Ctrl+C> to exit...

Configuration

> dotnet --version
8.0.100-rc.2.23502.2

Regression?

No, this is newly implemented in .NET 8.

/cc @lateralusX

@jonathanpeppers jonathanpeppers added the bug Something isn't working label Oct 20, 2023
@jonathanpeppers
Copy link
Member Author

This appears to be working fine for an Android emulator:

> dotnet-gcdump collect -p 28660
Writing gcdump to 'D:\src\20231020_104922_28660.gcdump'...
        Finished writing 2778992 bytes.

So must be related to device?

@lateralusX
Copy link
Member

lateralusX commented Oct 23, 2023

@jonathanpeppers Yes, we seen similar issue where the diagnostics tooling made incorrect assumption of sizes on some fields returned back from runtime in IPC messages, in this case its process id handled as an int, and it is typed as a pid_t that should map int, but maybe Android running on device uses the upper bit in pid_t that cause side effects. The pid field in the trace event is parsed as a 4 byte int, but it is parsed using unsafe code that doesn't use overflow checks. Most of the diagnostic tooling from dotnet/diagnostics on the other hand build with overflow checks as default, so I assume it will return an int with high bit set and then when it is used in code with overflow check this is detected and you get an overflow exception.

The assert you get is something that I fixed end of last week and its caused by incorrect size of session id, #4345, but it didn't have any runtime consequences, just firring an assert in debug builds.

Android running on device tends to put allocated memory on higher addresses and is probably related to this, https://source.android.com/docs/security/test/tagged-pointers. It is default since level 30 on arm64 hardware, but it can be disabled if it cause issues.

Will look a little deeper to see how we best can mitigate this issue.

@lateralusX
Copy link
Member

lateralusX commented Oct 23, 2023

@jonathanpeppers Would it be possible for you to do a gc dump through dotnet-trace targeting the device that you get the error and attach the trace to this issue? Then I will be able to load that exact trace into dotnet-gcdump and debug it "live".

You collect the same trace as dotnet-gcdump does by running:

dotnet-trace collect -p <pid of dsrouter> --providers Microsoft-Windows-DotNETRuntime:0x1980001:5

You will see that dotnet-trace writes content into the file and then it will wait for you to stop the session by pressing enter.

You can validate that the generated .nettrace file give you the same error by running:

dotnet-gcdump convert <path to generated .nettrace file>

That should give you the same error and if it does, please share the .nettrace file and I can debug it to detect more details about the exact format of the process id.

@jonathanpeppers
Copy link
Member Author

Yes, it appears to be the same error:

> dotnet-gcdump convert D:\src\dotnet-dsrouter.exe_20231023_084435.nettrace
Writing gcdump to 'D:\src\dotnet-dsrouter.exe_20231023_084435.gcdump'...
Unhandled exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.ApplicationException: RootIndex not set.
   at Graphs.Graph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/Graph.cs:line 241

dotnet-dsrouter.exe_20231023_084435.nettrace.zip

@lateralusX
Copy link
Member

Thanks, looks like the first exception trace line numbers was a little of, so its not the process id, but the module id, its typed as uint64 in the event but dotnet trace library handles it as a long that in turn is casted to a ulong and since the diagnostic tooling is build using overflow checking by default, it will fail to do the explicit cast. Unfortunately we can't change Microsoft.Diagnostics.Tracing.Parsers.Clr.ModuleLoadUnloadTraceData.ModuleID since its a public property, so we will need to temporary disable the checked cast where this property is used.

@lateralusX
Copy link
Member

Fixing the cast to be an unchecked cast made dotnet-gcdump capable to parse the nettrace file into a gcdump file:

image

@lateralusX
Copy link
Member

@jonathanpeppers Possible to take #4354 for a test drive?

@jonathanpeppers
Copy link
Member Author

I tried using dotnet-gcdump directly w/ these changes, and it is close but runs into a second issue:

# Just to compare, the released, 7.0.447801 version
> dotnet-gcdump collect -p 6432
Writing gcdump to 'D:\src\20231023_104524_6432.gcdump'...
[ERROR] System.ApplicationException: RootIndex not set.
   at Graphs.Graph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/Graph.cs:line 241
   at Graphs.MemoryGraph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/MemoryGraph.cs:line 100
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.TryCollectMemoryGraph(CancellationToken ct, Int32 processId, String diagnosticPort, Int32 timeout, Boolean verbose, MemoryGraph& memoryGraph) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 130
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.<>c__DisplayClass1_0.<Collect>b__0() in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 83
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, String output, Int32 timeout, Boolean verbose, String name, String diagnosticPort) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 92
# Local build
D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump [lateralusX/fix-moduleid-cast]> dotnet run -- collect -p 6432
Writing gcdump to 'D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\20231023_104605_6432.gcdump'...
Process terminated. Assertion failed.
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.TryCreateStopMessage(IpcMessage& stopMessage) in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 110
   at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Stop() in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 50
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeSessionController.EndSession() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 369
   at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.<>c__DisplayClass3_2.<DumpFromEventPipe>b__7() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 145
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

The convert step works, though:

D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump [lateralusX/fix-moduleid-cast]> dotnet run -- convert D:\src\dotnet-dsrouter.exe_20231023_084435.nettrace
Writing gcdump to 'D:\src\dotnet-dsrouter.exe_20231023_084435.gcdump'...

So, this gets partway there, but maybe there is one further problem?

@lateralusX
Copy link
Member

lateralusX commented Oct 23, 2023

@jonathanpeppers. that should be a debug build only issue and fixed by #4345. So either build and run in release or apply that patch to fix the assert in debug build.

@jonathanpeppers
Copy link
Member Author

Ok, yes adding -c Release works:

D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump [lateralusX/fix-moduleid-cast]> dotnet run -c Release -- collect -p 6432
Writing gcdump to 'D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\20231023_105821_6432.gcdump'...
        Finished writing 492293 bytes.

hoyosjs pushed a commit that referenced this issue Oct 23, 2023
… really a ulong member. (#4354)

The ModuleLoadUnload events ModuleID is typed as a uint64 in the
EventPipe manifest and emitted as a uint64 in the event payload.
However, the parsing logic in ModuleLoadUnloadTraceDataevent in trace
event:


https://github.com/microsoft/perfview/blob/a6c87911fe1aef8f59c9ce54aa4e16a1be6db91e/src/TraceEvent/Parsers/ClrEtwAll.cs.base#L4963

handles it as a long. Android devices running arm64 can use pointer
tagging meaning that high bits can be set in 64-bit addresses and since
module id is a memory address, it will be returned as a negative long
from ModuleLoadUnload event and since all diagnostic tooling is build
with overflow checking enabled by default, casting it to a ulong will
trigger and overflow exception when high bit is set.

Fix is to do an unchecked cast in this case since the value should be
threated as a ulong in first place.

Fixes #4348.
@ghost ghost locked as resolved and limited conversation to collaborators Nov 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
2 participants