Test failure: tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh #35884

v-haren · 2020-05-06T06:49:30Z

failed in job: runtime-coreclr gcstress0x3-gcstress0xc 20200503.1

Error message

Process terminated. A callback was made on a garbage collected delegate of type 'System.Private.CoreLib!Interop+Advapi32+EtwEnableCallback::Invoke'.
/home/helixbot/work/A9DD08E1/w/A2D60904/e/tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh: line 312: 9131 Aborted $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"

Return code: 1
Raw output file: /home/helixbot/work/A9DD08E1/w/A2D60904/e/tracing/eventpipe/Reports/tracing.eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.output.txt
Raw output:
BEGIN EXECUTION
/home/helixbot/work/A9DD08E1/p/corerun eventsourceerror.dll ''
 0.0s: ==TEST STARTING==
 2.6s: Validating clean environment...
 10.3s: Environment was clean.
 11.0s: Started sending sentinel events...
 11.5s: Connecting to EventPipe...
 22.0s: Connected to EventPipe with sessionID '0x7f59bc000fb0'
 22.0s: Creating EventPipeEventSource...
 24.8s: EventPipeEventSource created
 26.0s: Dynamic.All callback registered
 26.0s: Running optional trace validator
 26.1s: Finished running optional trace validator
 26.1s: Starting stream processing...
 31.7s: Saw sentinel event
 31.7s: Starting event generating action...
 31.7s: Stopped sending sentinel events
 36.5s: Stopping event generating action
 36.9s: Sending StopTracing command...
 38.4s: Saw new provider 'IllegalTypesEventSource'
 39.2s: Saw new provider 'Microsoft-DotNETCore-EventPipe'
 39.5s: Saw new provider 'Microsoft-Windows-DotNETRuntimeRundown'
Writing minidump to file /home/helixbot/dotnetbuild/dumps/coredump.9131.dmp
Written 105562112 bytes (25772 pages) to core file
Dump successfully written
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/home/helixbot/work/A9DD08E1/p
> /home/helixbot/work/A9DD08E1/w/A2D60904/e/tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh
Expected: True
Actual: False


Stack trace
   at tracing_eventpipe._eventsourceerror_eventsourceerror_eventsourceerror_._eventsourceerror_eventsourceerror_eventsourceerror_sh() in /__w/1/s/artifacts/tests/coreclr/Linux.x64.Checked/TestWrappers/tracing.eventpipe/tracing.eventpipe.XUnitWrapper.cs:line 237

The text was updated successfully, but these errors were encountered:

ghost · 2020-05-06T06:49:32Z

Tagging subscribers to this area: @Maoni0
Notify danmosemsft if you want to be subscribed.

ghost · 2020-05-06T06:52:37Z

Tagging subscribers to this area: @tommcdon
Notify danmosemsft if you want to be subscribed.

tommcdon · 2020-05-06T16:26:52Z

@josalem

josalem · 2020-05-06T16:45:51Z

I'll take a look!

josalem · 2020-05-06T16:58:39Z

CC @davmason

It looks like this test has been failing under gcstress since it went in with #35334. It fails across all configurations except Linux arm32 gcstress=0x3 (at least on the linked run). I'm not sure why it's happening, but my hunch is that is has to do with the lifetime of the delegate in the failure path for the EventSource constructor now that EventSource tries to send messages across EventPipe for that path.

davmason · 2020-05-06T18:49:15Z

I'll take a look

davmason · 2020-05-07T07:43:41Z

So, this turned in to way more of an involved investigation than I thought. This turned out to not be a test issue but exposed a fundamental problem with the way EventSources interact with EventPipe.

An EventSource will call dispose on its underlying providers when it is finalized or Dispose is called, and in turn the EventPipeProvider will call EventUnregister which calls EventPipeInternal::DeleteProvider.

Up until this point everything is ok, but inside of EventPipe::DeleteProvider there is a check for Enabled(), which checks if there are any active sessions, and if so will put the providers in a deferred delete state.

runtime/src/coreclr/src/vm/eventpipe.cpp

Lines 616 to 627 in c2b1018

    
           if (pProvider != NULL) 
        
           { 
        
               if (Enabled()) 
        
               { 
        
                   // Save the provider until the end of the tracing session. 
        
                   pProvider->SetDeleteDeferred(); 
        
               } 
        
               else 
        
               { 
        
                   // Delete the provider now. 
        
                   s_config.DeleteProvider(pProvider); 
        
               }

Which means if any other provider is active when the EventSource tries to delete itself from EventPipe, it will get placed in the deferred list. For example, in this test Microsoft-Windows-DotNETRuntime is also active. This is fundamentally flawed for interacting with managed providers, because as soon as they are disposed/finalized the EventProvider's m_etwCallback will be collected and later when EventPipe goes to delete the deferred providers we will attempt to reverse pinvoke to a collected delegate.

I suspect this has gone undetected for so long because we don't have a ton of testing around EventSources and how they interact with EventPipe, and likely the tests that do test the scenario just happen to be lucky and the delegate doesn't get collected in the timeframe. Even this test passes fine under normal circumstances and it takes GCStress to fail.

@noahfalk @josalem @sywhang - I don't fully understand the need to keep providers around until tracing stops. I see it was added in this PR: dotnet/coreclr#11651, but I don't see anything that even attempts to root the EventSource/EventPipeProvider managed objects. So we either need to root those objects, or if the native provider is all we need then we should not attempt to do the reverse pinvoke during deferred deletion.

Tracked by dotnet#35884

BruceForstall · 2020-06-28T20:21:01Z

This is still failing in the GCStress runs. I submitted PR #38516 to disable it for GCStress while this bug is open.

Tracked by #35884

Tracked by dotnet/runtime#35884

v-haren added GCStress area-GC-coreclr labels May 6, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 6, 2020

Maoni0 added the area-Diagnostics-coreclr label May 6, 2020

Maoni0 removed the area-GC-coreclr label May 6, 2020

tommcdon removed the untriaged New issue has not been triaged by the area owner label May 6, 2020

tommcdon added this to Needs Triage in .NET Core Diagnostics via automation May 6, 2020

tommcdon added this to the 5.0 milestone May 6, 2020

davmason mentioned this issue May 7, 2020

EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028

Closed

AndyAyersMS mentioned this issue Jun 5, 2020

Test failure: tracing\\eventpipe\\eventsourceerror\\eventsourceerror\\eventsourceerror.cmd #37238

Closed

BruceForstall added a commit to BruceForstall/runtime that referenced this issue Jun 28, 2020

Disable eventsourceerror for GCStress due to test failure

acf035a

Tracked by dotnet#35884

BruceForstall mentioned this issue Jun 28, 2020

Disable eventsourceerror for GCStress due to test failure #38516

Merged

BruceForstall added a commit that referenced this issue Jun 28, 2020

Disable eventsourceerror for GCStress due to test failure (#38516)

4ef07c2

Tracked by #35884

tommcdon added the tracking This issue is tracking the completion of other related issues. label Jul 8, 2020

tommcdon modified the milestones: 5.0.0, 6.0.0 Jul 8, 2020

kevinwkt pushed a commit to kevinwkt/runtimelab that referenced this issue Jul 15, 2020

Disable eventsourceerror for GCStress due to test failure (#38516)

20c83c6

Tracked by dotnet/runtime#35884

davmason mentioned this issue Jul 31, 2020

Disable GCStress for profiler tests #40171

Merged

davmason closed this as completed in #40171 Jul 31, 2020

.NET Core Diagnostics automation moved this from Needs Triage to Done Jul 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test failure: tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh #35884

Test failure: tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh #35884

v-haren commented May 6, 2020

ghost commented May 6, 2020

ghost commented May 6, 2020

tommcdon commented May 6, 2020

josalem commented May 6, 2020

josalem commented May 6, 2020

davmason commented May 6, 2020

davmason commented May 7, 2020

BruceForstall commented Jun 28, 2020

Test failure: tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh #35884

Test failure: tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh #35884

Comments

v-haren commented May 6, 2020

ghost commented May 6, 2020

ghost commented May 6, 2020

tommcdon commented May 6, 2020

josalem commented May 6, 2020

josalem commented May 6, 2020

davmason commented May 6, 2020

davmason commented May 7, 2020

BruceForstall commented Jun 28, 2020