Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry support #33230

Merged
merged 11 commits into from
Jun 8, 2021
Merged

Add retry support #33230

merged 11 commits into from
Jun 8, 2021

Conversation

HaoK
Copy link
Member

@HaoK HaoK commented Jun 2, 2021

See if retries help with flaky template tests

Part of #30882

@HaoK
Copy link
Member Author

HaoK commented Jun 3, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@Pilchie Pilchie added area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-templates labels Jun 3, 2021
@HaoK
Copy link
Member Author

HaoK commented Jun 3, 2021

Failures seen so far: (5 runs)

BlazorWasmStandalonePwaTemplate_Works failed on Mac OS with 3 retries, 2 of them were CERT changed errors, one was 30 second timeout.

@HaoK
Copy link
Member Author

HaoK commented Jun 3, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@HaoK HaoK marked this pull request as ready for review June 3, 2021 22:23
@HaoK HaoK requested a review from Pilchie as a code owner June 3, 2021 22:23
@HaoK HaoK requested review from a team June 3, 2021 22:23
@HaoK
Copy link
Member Author

HaoK commented Jun 3, 2021

Summary of this PR:

  • Adds [Retry] attribute which defaults to 3 retries, if any of the attempts succeed, the test is considered good
  • Marks all the blazor template tests with retry to see if this helps the pass rate in quarantine

@HaoK
Copy link
Member Author

HaoK commented Jun 3, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Member

@dougbu dougbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me if this is what the team wants. My main concern is 3 retries may paper over problems getting twice as bad e.g. going from 1 failure in 3 tries to 2 failures in 3 tries without anyone noticing.

I'm also thinking the whole GetOr part of ProjectFactoryFixture.GetOrCreateProject(...) is a Bad Idea:tm:

if (_retryFailsUntil3 != 2) throw new Exception("NOOOOOOOO");
}

private static int _canOverrideRetries = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move to the top of the class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I departed from the usual pattern here because these are doing a bad thing in that each test has its own static int, but I felt that was cleaner than adding some kind of internal context that we can look at since this is just test code. So the grouping is really a static field per test, it would be really bad if the tests mucked with another one of these fields.

@@ -102,6 +102,21 @@ public class Project : IDisposable
try
{
Output.WriteLine("Acquired DotNetNewLock");

if (Directory.Exists(TemplateOutputDir))
Copy link
Member

@dougbu dougbu Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't TemplateOutputDir already very unique❔ If Path.GetRandomFileName() isn't unique enough use Guid.NewGuid() in ProjectFactoryFixture.GetOrCreateProject(...). If instead this is new needed because the retries reuse existing projects, we may want to completely remove that aspect of the fixture i.e. switch to ProjectFactoryFixture.CreateProject(...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can file a separate issue to track changing that if you want, but I don't really want to do additional major surgery (prefer making this PR just about adding/using retries)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's additional complications about switching from reusing projects since our longer term hope was to actually switch to one test creating/building the template project, and a second (ordered) test would be in charge of running/verifying the tests, we were hoping that would also reduce the flakiness, or at least help us isolate which part of the test is flaky. Unfortunately we didn't get to that point yet, but that was the longer term goal

src/ProjectTemplates/Shared/TemplatePackageInstaller.cs Outdated Show resolved Hide resolved
{
var attempts = 0;
var timeTaken = 0.0M;
for (attempts = 0; attempts < retryAttribute.MaxRetries; attempts++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remind me: Will this loop retry an entire [Theory] even if only one data set hits a problem❔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a per test case/fact (at least that's the intent), I'm pretty certain of that, since there was a different extensibility point I was looking at in a different iteration of trying to implement this.

@dougbu
Copy link
Member

dougbu commented Jun 4, 2021

@markwilkie @garath @ChadNedzlek @missymessa does this duplicate near-term work on Dev WF❔

@HaoK
Copy link
Member Author

HaoK commented Jun 4, 2021

When the helix retry stuff is ready, we should definitely switch to just using the schema/rules as it is much more powerful, but this is basically a way for us to do a simple retry at an xunit level (this is akin to local reruns only)

@ChadNedzlek
Copy link
Member

This is 100% duplicating work that dev WF should have a solution for in the next few weeks. I'd really rather we not invent a second mechanism for doing the same thing.

Copy link
Member

@ChadNedzlek ChadNedzlek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you don't want to use the retry mechanism arcade is going to be providing in a couple weeks? It's very unfortunate to have all this duplicated work already.

@HaoK
Copy link
Member Author

HaoK commented Jun 4, 2021

@ChadNedzlek this won't prevent us from using the arcade work at all, but this simple retry will work for our test jobs that aren't on helix, it wasn't clear to me if the test-configuration.json stuff is going to be helix only? If so, we might still benefit from retry for some of our tests that aren't on helix (components for example).

But most importantly, right now 100% of our blazor template tests are in quarantine, and its been that way for a few previews already, given that this is a pretty cheap fix, I think its worth temporarily using this to get some of our template tests back online (and switch to using the test-configuration.json stuff once its available)

@ChadNedzlek
Copy link
Member

It seems like an offline discussion about your needs around retry stuff would be fruitful.

@markwilkie
Copy link
Member

It seems like an offline discussion about your needs around retry stuff would be fruitful.

Yea, outside of this PR it'd be really great to understand the delta (if any) between this retry stuff and what's being implemented as part of dev wf. Sounds like rests run outside of Helix might be one, but it'd be awesome to understand that better. Is this something you think would be useful to do @HaoK ?

@HaoK
Copy link
Member Author

HaoK commented Jun 5, 2021

@markwilkie @ChadNedzlek I don't think there's actually any gaps in the current retry plan, this PR is more of a point and time thing to try and get some blazor templates coverage back up as soon as possible, the dev wf plan currently proposed seems like a superset (on helix) and looks great. Our goal is to eventually move all our tests onto helix, and the xunit extensibility retry in this PR seems like it would map exactly onto using local reruns, so we could just turn this off once the dev wf retry stuff is available to try out.

@markwilkie
Copy link
Member

@markwilkie @ChadNedzlek I don't think there's actually any gaps in the current retry plan, this PR is more of a point and time thing to try and get some blazor templates coverage back up as soon as possible, the dev wf plan currently proposed seems like a superset (on helix) and looks great. Our goal is to eventually move all our tests onto helix, and the xunit extensibility retry in this PR seems like it would map exactly onto using local reruns, so we could just turn this off once the dev wf retry stuff is available to try out.

Thanks @HaoK ! This makes sense. Let's be sure and continue to work together as we too would love to see more tests run in a consistent fashion across .NET. Cheers

src/ProjectTemplates/Shared/Project.cs Outdated Show resolved Hide resolved
src/Testing/src/RetryAttribute.cs Outdated Show resolved Hide resolved
/// Runs a test multiple times when it fails
/// This can be used on an assembly, class, or method name. Requires using the AspNetCore test framework.
/// </summary>
[EditorBrowsable(EditorBrowsableState.Never)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because this ships in a user visible package? Making it Never makes it really hard to type this in regular code and gives some people PTSD (@rynowak).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied

[EditorBrowsable(EditorBrowsableState.Never)]
as a starting point so it wasn't specifically intentional by me

HaoK and others added 2 commits June 7, 2021 16:26
Co-authored-by: Pranav K <prkrishn@hotmail.com>
@HaoK HaoK enabled auto-merge (squash) June 7, 2021 23:29
@HaoK HaoK merged commit 49a1014 into main Jun 8, 2021
@HaoK HaoK deleted the haok/retry2 branch June 8, 2021 07:36
@ghost ghost added this to the 6.0-preview6 milestone Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-templates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants