Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing MetricEventSource tests #55385

Merged
merged 2 commits into from
Jul 11, 2021
Merged

Conversation

noahfalk
Copy link
Member

@noahfalk noahfalk commented Jul 9, 2021

I've seen two recent non-deterministic failures:

  1. After starting the EventListener there is a delay of one collection
    interval (currently 0.3 seconds) where we expect calls to counter.Add
    and Histogram.Record() to update stats. On a fast machine this code
    would run in well under a millisecond but in some test runs it still wasn't
    happening fast enough.
  2. We were seeing error events from a previous test get observed in a
    later test because session id was being ignored for error events.

Fixes:

  1. Added an initial collection interval where no counter APIs will be
    invoked and the test will delay up to 5 seconds waiting for this.
    Hopefully this delay makes it more likely that the Add/Record APIs
    are ready to execute promptly once the 2nd interval begins. If that
    still isn't sufficient we can either increase the collection intervals or
    disable the tests.
  2. Tightened up the session id matching so only error events with
    empty id or the expected id are allowed.

Fixes #55315 and #55313 (I hope)

I've seen two recent non-deterministic failures:
1. After starting the EventListener there is a delay of one collection
interval (currently 0.3 seconds) where we expect calls to counter.Add
and Histogram.Record() to update stats. On a fast machine this code
would run in under a millisecond but in some test runs it still wasn't
happening fast enough.
2. We were seeing error events from a previous test get observed in a
later test because session id was being ignored for error events.

Fixes:
1. Added an initial collection interval where no counter APIs will be
invoked and the test will delay up to 5 seconds waiting for this.
Hopefully this delay makes it more likely that the Add/Record APIs
are ready to execute promptly once the 2nd interval begins. If that
still isn't sufficient we can either increase the collection intervals or
disable the tests.
2. Tightened up the session id matching so only error events with
empty id or the expected id are allowed.
@noahfalk
Copy link
Member Author

noahfalk commented Jul 9, 2021

@tarekgh
cc @dotnet/dotnet-diag

@noahfalk noahfalk requested a review from tarekgh July 9, 2021 06:27
@stephentoub
Copy link
Member

stephentoub commented Jul 9, 2021

Added an initial collection interval where no counter APIs will be
invoked and the test will delay up to 5 seconds waiting for this.
Hopefully this delay makes it more likely that the Add/Record APIs
are ready to execute promptly once the 2nd interval begins. If that
still isn't sufficient we can either increase the collection intervals or
disable the tests.

Am I understanding correctly that we expect these events to occur and if they don't the test is meant to fail? If that's the case, I recommend using a much larger timeout than 5 seconds. Elsewhere we generally use at least 30 if not 60 seconds in such cases: such values are much more tolerant of load in CI, you only end up paying the full time if the test is going to fail anyway, and if the event occurs later in that timeframe, it's better to have spent the extra time to make the test succeed rather than having flaky tests.

(I also recommend putting it into a SuccessTimeout constant such that the value is centralized to one updateable location.)

@noahfalk
Copy link
Member Author

noahfalk commented Jul 9, 2021

Am I understanding correctly that we expect these events to occur and if they don't the test is meant to fail?

Yep, that is correct. I can change 5 sec -> 60 sec.


public MetricEventSourceTests(ITestOutputHelper output)
{
_output = output;
}

[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsNotBrowser))]
[OuterLoop("Slow and has lots of console spew")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OuterLoop

Thanks for adding this :-)

@tarekgh
Copy link
Member

tarekgh commented Jul 10, 2021

I am seeing you have added outer loop attribute to some test, do you want enable CI run the outer loops tests?

@tarekgh
Copy link
Member

tarekgh commented Jul 10, 2021

@noahfalk do you think this may help with #55313 too?

Copy link
Member

@tarekgh tarekgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I hope you enable the outerloop tests before you merge to ensure running the touched tests. Thanks

@noahfalk
Copy link
Member Author

I am seeing you have added outer loop attribute to some test, do you want enable CI run the outer loops tests?

I don't care if every PR runs them, as long as they run periodically. Were you saying that as-is the automation will never run them (in which case I want to change it), or that they won't run on every PR (which I am fine with)?

@noahfalk
Copy link
Member Author

@noahfalk do you think this may help with #55313 too?

Yes I am hopeful it will resolve both of the issues. I edited the issue description at the top to include it. Thanks!

@stephentoub
Copy link
Member

Outerloop doesn't run on any PR by default; you have to ask the bot to run the legs via a comment on a PR. Outerloop does run on various rolling builds daily.

@stephentoub
Copy link
Member

/azp list

@stephentoub
Copy link
Member

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@noahfalk
Copy link
Member Author

noahfalk commented Jul 10, 2021

Outerloop doesn't run on any PR by default; you have to ask the bot to run the legs via a comment on a PR. Outerloop does run on various rolling builds daily.

Thanks, that sounds like a fine state of affairs for the tests I marked as outerloop.

[EDIT]
I think I misunderstood you originally Tarek and finally realized it. You were proposing that I force them to run on this PR and I thought you were saying I needed to change something in the code or they would never run. I feel silly but I'm glad Stephen got it : )

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants