Fixing MetricEventSource tests #55385

noahfalk · 2021-07-09T06:26:15Z

I've seen two recent non-deterministic failures:

1. After starting the EventListener there is a delay of one collection
   interval (currently 0.3 seconds) where we expect calls to counter.Add
   and Histogram.Record() to update stats. On a fast machine this code
   would run in well under a millisecond but in some test runs it still wasn't
   happening fast enough.
2. We were seeing error events from a previous test get observed in a
   later test because session id was being ignored for error events.

Fixes:

1. Added an initial collection interval where no counter APIs will be
   invoked and the test will delay up to 5 seconds waiting for this.
   Hopefully this delay makes it more likely that the Add/Record APIs
   are ready to execute promptly once the 2nd interval begins. If that
   still isn't sufficient we can either increase the collection intervals or
   disable the tests.
2. Tightened up the session id matching so only error events with
   empty id or the expected id are allowed.
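
The session id rule in the second fix can be sketched as a small predicate. This is only an illustration of the rule as described above, not the code from this PR: `SessionFilter` and `ShouldObserve` are hypothetical names, and treating a null id the same as an empty one is an assumption.

```csharp
using System;

// Hypothetical helper illustrating the matching rule; not taken from the PR.
public static class SessionFilter
{
    // Returns true when an error event should be observed by the listener
    // that owns expectedSessionId.
    public static bool ShouldObserve(string eventSessionId, string expectedSessionId)
    {
        // Assumption: an empty (or null) id means the event is not tied to
        // any particular session, so every listener accepts it.
        if (string.IsNullOrEmpty(eventSessionId))
        {
            return true;
        }

        // Otherwise accept only events that carry this listener's own id, so
        // errors left over from an earlier test's session are ignored.
        return string.Equals(eventSessionId, expectedSessionId, StringComparison.Ordinal);
    }
}
```

With a check like this, an error event emitted by a previous test's session would be dropped instead of being counted against the current test.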

Fixes #55315 and #55313 (I hope)


noahfalk · 2021-07-09T06:27:15Z

@tarekgh
cc @dotnet/dotnet-diag

stephentoub · 2021-07-09T14:11:36Z

> Added an initial collection interval where no counter APIs will be
> invoked and the test will delay up to 5 seconds waiting for this.
> Hopefully this delay makes it more likely that the Add/Record APIs
> are ready to execute promptly once the 2nd interval begins. If that
> still isn't sufficient we can either increase the collection intervals or
> disable the tests.

Am I understanding correctly that we expect these events to occur and if they don't the test is meant to fail? If that's the case, I recommend using a much larger timeout than 5 seconds. Elsewhere we generally use at least 30 if not 60 seconds in such cases: such values are much more tolerant of load in CI, you only end up paying the full time if the test is going to fail anyway, and if the event occurs later in that timeframe, it's better to have spent the extra time to make the test succeed rather than having flaky tests.

(I also recommend putting it into a SuccessTimeout constant such that the value is centralized to one updateable location.)
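
A rough sketch of what a centralized timeout plus a bounded wait might look like follows. Only the `SuccessTimeout` name comes from the comment above; `TestTimeouts`, `IntervalWaiter`, and its methods are hypothetical, the 60 second value is an assumption drawn from the suggested 30 to 60 second range, and none of this is the PR's actual implementation.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TestTimeouts
{
    // Assumed value. Keeping it in one place means a single edit if CI
    // needs more headroom.
    public static readonly TimeSpan SuccessTimeout = TimeSpan.FromSeconds(60);
}

// Hypothetical helper that lets a test block until the listener has seen
// a given number of collection intervals.
public sealed class IntervalWaiter
{
    private readonly object _lock = new object();
    private int _completedIntervals;

    // Called by the listener each time a collection interval ends.
    public void SignalIntervalCompleted()
    {
        lock (_lock)
        {
            _completedIntervals++;
            Monitor.PulseAll(_lock);
        }
    }

    // Returns as soon as at least `count` intervals have completed.
    // Throws TimeoutException if that does not happen within `timeout`.
    public void WaitForIntervals(int count, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        lock (_lock)
        {
            while (_completedIntervals < count)
            {
                TimeSpan remaining = timeout - elapsed.Elapsed;
                if (remaining <= TimeSpan.Zero)
                {
                    throw new TimeoutException(
                        $"Saw {_completedIntervals} of {count} intervals within {timeout}.");
                }

                // Releases the lock while waiting; the loop re-checks the
                // count after every wake-up or timeout.
                Monitor.Wait(_lock, remaining);
            }
        }
    }
}
```

A test would call something like `waiter.WaitForIntervals(1, TestTimeouts.SuccessTimeout)` before invoking the counter APIs. Because the wait returns as soon as the interval arrives, the full timeout is only spent when the test is going to fail anyway.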

noahfalk · 2021-07-09T23:43:43Z

> Am I understanding correctly that we expect these events to occur and if they don't the test is meant to fail?

Yep, that is correct. I can change 5 sec -> 60 sec.

tarekgh · 2021-07-10T01:06:01Z

src/libraries/System.Diagnostics.DiagnosticSource/tests/MetricEventSourceTests.cs


        public MetricEventSourceTests(ITestOutputHelper output)
        {
            _output = output;
        }

        [ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsNotBrowser))]
+        [OuterLoop("Slow and has lots of console spew")]



Thanks for adding this :-)

tarekgh · 2021-07-10T01:07:07Z

I see you have added the OuterLoop attribute to some tests. Do you want to have CI run the outerloop tests?

tarekgh · 2021-07-10T01:11:55Z

@noahfalk do you think this may help with #55313 too?

tarekgh

LGTM. I hope you enable the outerloop tests before you merge to ensure the touched tests run. Thanks.

noahfalk · 2021-07-10T08:57:07Z

> I see you have added the OuterLoop attribute to some tests. Do you want to have CI run the outerloop tests?

I don't care if every PR runs them, as long as they run periodically. Were you saying that as-is the automation will never run them (in which case I want to change it), or that they won't run on every PR (which I am fine with)?

noahfalk · 2021-07-10T09:00:46Z

> @noahfalk do you think this may help with #55313 too?

Yes, I am hopeful it will resolve both of the issues. I edited the issue description at the top to include it. Thanks!

stephentoub · 2021-07-10T13:44:09Z

Outerloop doesn't run on any PR by default; you have to ask the bot to run the legs via a comment on a PR. Outerloop does run on various rolling builds daily.

stephentoub · 2021-07-10T13:44:45Z

/azp list

azure-pipelines · 2021-07-10T13:44:50Z

CI/CD Pipelines for this repository:
coreclr-gc-longrunning
coreclr-gc-simulator
runtime-coreclr outerloop
runtime-coreclr jitstress
runtime-coreclr jitstressregs
runtime-coreclr jitstress2-jitstressregs
runtime-coreclr gcstress0x3-gcstress0xc
runtime-coreclr gcstress-extra
runtime-coreclr r2r-extra
runtime-coreclr jitstress-isas-x86
runtime-coreclr jitstress-isas-arm
runtime-coreclr jitstressregs-x86
runtime-coreclr libraries-jitstressregs
runtime-coreclr libraries-jitstress2-jitstressregs
runtime-coreclr r2r
runtime-coreclr runincontext
runtime-coreclr crossgen2
runtime-libraries-coreclr outerloop
runtime-libraries-coreclr outerloop-windows
runtime-libraries-coreclr outerloop-linux
runtime-libraries-coreclr outerloop-osx
runtime
runtime-libraries enterprise-linux
runtime-libraries stress-http
runtime-libraries stress-ssl
runtime-dev-innerloop
runtime-coreclr crossgen2 outerloop
coreclr-release-outerloop-nightly
sync-runtime-to-mono
runtime-coreclr crossgen2-composite
runtime-jit-experimental
runtime-coreclr libraries-jitstress
dotnet-linker-tests
runtime-coreclr ilasm
runtime-coreclr crossgen2-composite gcstress
runtime-libraries-mono outerloop
runtime-staging
runtime-coreclr pgo
runtime-coreclr libraries-pgo
coreclr-gc-regions

stephentoub · 2021-07-10T13:45:42Z

/azp run runtime-libraries-coreclr outerloop

azure-pipelines · 2021-07-10T13:45:55Z

Azure Pipelines successfully started running 1 pipeline(s).


noahfalk · 2021-07-10T22:48:18Z

> Outerloop doesn't run on any PR by default; you have to ask the bot to run the legs via a comment on a PR. Outerloop does run on various rolling builds daily.

Thanks, that sounds like a fine state of affairs for the tests I marked as outerloop.

[EDIT]
I think I misunderstood you originally, Tarek, and finally realized it. You were proposing that I force them to run on this PR, and I thought you were saying I needed to change something in the code or they would never run. I feel silly, but I'm glad Stephen got it : )

dotnet-issue-labeler bot added the area-System.Diagnostics.Metric label Jul 9, 2021

noahfalk requested a review from tarekgh July 9, 2021 06:27

tarekgh reviewed Jul 10, 2021


tarekgh approved these changes Jul 10, 2021


stephentoub reviewed Jul 10, 2021


stephentoub approved these changes Jul 10, 2021


Increased timeouts and made tests outerloop

deaeea4

noahfalk force-pushed the fix_metric_tests branch from 4feb86c to deaeea4 Compare July 10, 2021 23:59

noahfalk merged commit 21c2516 into dotnet:main Jul 11, 2021

noahfalk mentioned this pull request Jul 12, 2021

Test failure:System.Diagnostics.Metrics.Tests.MetricEventSourceTests.EventSourceFiltersInstruments #55313

Closed

ManickaP mentioned this pull request Jul 20, 2021

[QUIC] Remove AppContext switch from S.N.Quic #56027

Merged

karelz mentioned this pull request Jul 28, 2021

[WinHttpHandler] Long running test: ResponseHeadersRead_SynchronizationContextNotUsedByHandler on Win7/Win8 #54034

Closed

ghost locked as resolved and limited conversation to collaborators Aug 10, 2021
