Skip to content

Conversation

@dougbu
Copy link
Contributor

@dougbu dougbu commented Feb 25, 2022

  • Consider producing fewer log files per Helix work item (10k+ in some instances) #39038
  • add per-test LoggedTestCleanerFixtureAttribute to help track test failures
  • add per-assembly LoggedTestCleaner to perform actual log directory Cleanup
  • extend AssemblyFixtureAttribute and AspNetTestAssemblyRunner to support communication between the above
  • use LoggedTestCleanerFixtureAttribute in test projects that produce large numbers of usually-useless log files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while driving by I found these <Content/> items were fading into the <Compile/> items

@Pilchie Pilchie added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label Feb 25, 2022
@dougbu dougbu marked this pull request as draft February 25, 2022 22:27
@dougbu
Copy link
Contributor Author

dougbu commented Feb 25, 2022

Working to address a couple of concerns:

  • There are typos in two of the projects I changed.
  • Cleanup doesn't happen if any test in an assembly fails. Better if LoggedTestBase and any similar test or test base classes notified the cleaner when their specific tests fail. Will probably make AssemblyTestLog the cleaner (that'll also get its Dispose() method working).
  • On the other hand, this cleanup doesn't happen for many of test classes that rely on LoggedTestBase let alone AssemblyTestLog or (even more broadly) otherwise write to TestOutputDirectoryAttribute-specified folders. Better to change Microsoft.AspNetCore.Testing.props than individual test projects although I may ignore the "otherwise write to TestOutputDirectoryAttribute-specified folders" part (by having tests indicate their interest in cleanup and assuming the two purposes don't overlap in a single test project).
  • The Instance assumption is too fragile.

Will comment when I mark this PR ready for review again.

@halter73
Copy link
Member

Cleanup doesn't happen if any test in an assembly fails.

I prefer this. Sometimes it's nice to look at similar passing tests to see if there are patterns in the logs. I hope most test assemblies pass at a high enough rate that we shouldn't bother optimizing the artifact size when they fail.

It'd really be nice if I could also look at logs from passing tests in other assemblies when there's a failure. Can we only clean logs if the entire test matrix succeeds? When I'm debugging rare ci-only test failures, I want to keep all the clues available to me.

@dougbu
Copy link
Contributor Author

dougbu commented Feb 26, 2022

Can we only clean logs if the entire test matrix succeeds?

@halter73 the direction I was heading before your comment was to keep all logs created by AssemblyTestLog on behalf of LoggedTestBase unless a test class based on LoggedTestBase hit a failure. The main failures I was going to ignore were those that don't involve LoggedTestBase i.e. tests that don't create log files. I wasn't proposing cleaning up log files on a per-test-class basis for example. So, if a single test in ChunkedRequestTests fails, all logs for ChunkedRequestTests, ChunkedResponseTests, et cetera will be kept. Is that sufficient❔

I could leave things as-is, meaning (switching from InMemory.FunctionalTests to Kestrel.Core.Tests) a failure in AddressBinderTests or AsciiDecodingTests would cause logs to be saved for ConnectionDispatcherTests and other logged tests in that assembly. That seems a bit silly to me but can be persuaded if you prefer it.

Note I can't keep logs for everything if tests in some other assembly or for a different TFM in the same test project fail. Our Helix work items are per-test assembly aka per-test project / TFM pair.

Comment on lines +47 to +55
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific to AssemblyTestLog currently but is a pattern other classes could follow. Adding a default constructor to AssemblyTestLog wouldn't be a good idea.

@dougbu dougbu marked this pull request as ready for review February 28, 2022 04:29
@dougbu
Copy link
Contributor Author

dougbu commented Feb 28, 2022

Ready for review

@dougbu
Copy link
Contributor Author

dougbu commented Feb 28, 2022

This does a great deal of cleanup compared to the before picture, especially when looking at non-Windows work items (none of which create more than 5 jobs). I may look into the IIS logging but that doesn't appear urgent.

Before

Build #20220228.5 of main.

WorkItemFriendlyName FileCount
Microsoft.AspNetCore.Server.Kestrel.Core.Tests--net7.0 3551
InMemory.FunctionalTests--net7.0 1289
Microsoft.AspNetCore.SignalR.Client.FunctionalTests--net7.0 810
Microsoft.AspNetCore.Http.Extensions.Tests--net7.0 395
Interop.FunctionalTests--net7.0 391
Microsoft.AspNetCore.SignalR.Tests--net7.0 261
IISExpress.FunctionalTests--net7.0 194
Microsoft.AspNetCore.SignalR.Client.Tests--net7.0 167
Microsoft.AspNetCore.Http.Connections.Tests--net7.0 150
IIS.LongTests--net7.0 114

After

Build #20220227.10 of this PR.

WorkItemFriendlyName FileCount
IISExpress.FunctionalTests--net7.0 190
IIS.LongTests--net7.0 113
IIS.FunctionalTests--net7.0 104
IIS.NewHandler.FunctionalTests--net7.0 84
IIS.NewShim.FunctionalTests--net7.0 82
IIS.ShadowCopy.Tests--net7.0 13
Microsoft.AspNetCore.Mvc.FunctionalTests--net7.0 10
Microsoft.Extensions.Features.Tests--net7.0 5
InteropTests--net7.0 5
Microsoft.Extensions.Diagnostics.HealthChecks.Tests--net462 5

For posterity

Sample query
Files
| where JobName == "6d309ca5-777b-4aeb-9861-5cc4e4ba1213"
| summarize FileCount = count() by WorkItemFriendlyName
| top 10 by FileCount desc

@dougbu
Copy link
Contributor Author

dougbu commented Feb 28, 2022

/fyi @MattGal since you reported #39038

@dougbu
Copy link
Contributor Author

dougbu commented Feb 28, 2022

Hmm, have some room to improvement on the space our files consume. FileSize below adds all file sizes together for each work item.

WorkItemFriendlyName TotalSize
IIS.ShadowCopy.Tests--net7.0 440 MB
Microsoft.AspNetCore.SignalR.Client.FunctionalTests--net7.0 74 MB
IISExpress.FunctionalTests--net7.0 73 MB
Microsoft.AspNetCore.Server.Kestrel.Core.Tests--net7.0 51 MB
Sockets.FunctionalTests--net7.0 40 MB
InMemory.FunctionalTests--net7.0 30 MB
Microsoft.AspNetCore.Mvc.Core.Test--net7.0 25 MB
ProjectTemplates.Tests--net7.0 20 MB
Microsoft.AspNetCore.Routing.Tests--net7.0 19 MB
Microsoft.AspNetCore.Mvc.ViewFeatures.Test--net7.0 15 MB

@dougbu
Copy link
Contributor Author

dougbu commented Feb 28, 2022

Oops, that was for the build of 'main'. The after story (again, on Windows) is

WorkItemFriendlyName TotalSize
IIS.ShadowCopy.Tests--net7.0 147 MB
IISExpress.FunctionalTests--net7.0 69 MB
Microsoft.AspNetCore.SignalR.Client.FunctionalTests--net7.0 51 MB
Microsoft.AspNetCore.Server.Kestrel.Core.Tests--net7.0 47 MB
Sockets.FunctionalTests--net7.0 28 MB
Microsoft.AspNetCore.Mvc.Core.Test--net7.0 25 MB
InMemory.FunctionalTests--net7.0 23 MB
Microsoft.AspNetCore.Routing.Tests--net7.0 19 MB
Microsoft.AspNetCore.Mvc.ViewFeatures.Test--net7.0 15 MB
IIS.LongTests--net7.0 13 MB

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, no changes needed to existing tests, and looks like it works as expected from my quick glance at the quarantine and non-quarantine runs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I get what this is doing, the check isn't really needed and sort of makes it confusing when reading.
I would just always set the state, change the function name to something like "MarkHasTestFailure" or "RecordHasTestFailure", and if you really like the check maybe throw an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you pointed me here @BrennanConroy. My intention was to ensure we didn't lose track of reported test failures because another class requested cleanup later. This line means problems reported before the first cleanup request are ignored ☹️

Just updating _cleanupState is more correct but insufficient because other AssemblyTestLog users in the same assembly might not inherit from LoggedTestBase. I need to ensure all users report their failures if any users exist in the assembly because the cleanup has to be all or nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on two options here:

  1. Revert changes to LoggedTestBase and have AspNetTestAssemblyRunner somehow notify AssemblyTestLog of test failures. This would entirely eliminate the concept of registering for cleanup and would mean any failure in a test assembly prevents cleanup.
  2. Update all AssemblyTestLog users to make sure they notify that class of test failures. Basically this option would make not registering fail loudly and longly. If I go this route, I'll use @halter73's two-bool suggestion.

@halter73
Copy link
Member

halter73 commented Mar 1, 2022

I could leave things as-is, meaning (switching from InMemory.FunctionalTests to Kestrel.Core.Tests) a failure in AddressBinderTests or AsciiDecodingTests would cause logs to be saved for ConnectionDispatcherTests and other logged tests in that assembly. That seems a bit silly to me but can be persuaded if you prefer it.

Is this where we landed? As long as all the logs from the same helix job as the failing test are kept it's fine. I mostly like keeping the logs from other tests to find evidence of thread pool starvation, ephemeral port exhaustion or other similar environmental issues that can cause a bunch of tests in the same job to fail.

@dougbu
Copy link
Contributor Author

dougbu commented Mar 1, 2022

Is this where we landed? As long as all the logs from the same helix job as the failing test are kept it's fine.

That was my proposal. I'm asking if it's enough for you.

I should have mentioned: Because Helix work items are per-assembly i.e. test a single TFM from a single test project, you'll get all of the logs for the work item. For something like ServerComparison.FunctionalTests, a single test failure won't impact logging for other tests sharing the same build agent but you'll get all of the logs for that assembly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you pointed me here @BrennanConroy. My intention was to ensure we didn't lose track of reported test failures because another class requested cleanup later. This line means problems reported before the first cleanup request are ignored ☹️

Just updating _cleanupState is more correct but insufficient because other AssemblyTestLog users in the same assembly might not inherit from LoggedTestBase. I need to ensure all users report their failures if any users exist in the assembly because the cleanup has to be all or nothing.

Comment on lines 228 to 230
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about multiple test classes executing their tests in parallel, should this be bullet-proof e.g. using Interlocked.CompareExchange(...) (and a full-sized int enum)❔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could always lock in ReportTestFailure. Although per-instance lock might be better than the static _lock. I think the best options is to avoid locking or interlocking entirely by turning _cleanupState into two bools: bool _wasTestFailure and bool _cleanupRequested.

Based on my reading, neither of these bools could ever transition from true to false, so there's no reason for any comparisons before setting either of these to true. Dispose can then delete the directory only if _cleanupRequested && !_wasTestFailure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll work @halter73. I merged them into an enum primarily to make updates atomic but that didn't work out very well 😉

@dougbu dougbu requested a review from a team as a code owner March 15, 2022 19:29
Copy link
Contributor Author

@dougbu dougbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting re-reviews because this iteration is fairly different from what was in this PR before. A lot of that's in testing of Microsoft.AspNetCore.Testing but the src/ change to have AspNetTestAssemblyRunner handle failure reporting to AssemblyTestLog is significant too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important because individual tests don't own AssemblyTestLog instances, AspNetTestAssemblyRunner does

@dougbu dougbu requested review from BrennanConroy, halter73 and wtgodbe and removed request for pranavkm March 15, 2022 19:39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, never mind

@dougbu
Copy link
Contributor Author

dougbu commented Mar 16, 2022

@halter73 @BrennanConroy who knows logging fairly well❔ I added a bunch o' logging to figure out why AssemblyTestLogTests.ForAssemblyWritesToAssemblyBaseDirectory() keeps failing. Haven't found a smoking gun -- the AssemblyTestLog in that test has the expected base directory, an enabled global logger, a SerilogLoggerProvider in the service provider, et cetera. Nonetheless the global.log file is not created even if I sleep the test Thread for 5 seconds.

Problem is specific to Linux and OSX and reproduces under WSL2.

Any suggestions❔

@dougbu
Copy link
Contributor Author

dougbu commented Mar 17, 2022

Below is from the console log of recent failure on Helix OSX agent, showing both the settings for Microsoft.AspNetCore.Testing, the ForAssembly_ReturnsSameInstanceForSameAssembly()'s Disposable instance, then the same for the inner dynamic assembly and fake test. Everything lines up but still the /var/folders/hv/54dwf7d169j4qnkvdmlkg0lw0000gq/T/TestLogging_8dcd6868d67c47558b55586a88470821/Testf6be736258cf46b2b2902121c279f9c8/net7.0 directory contains only the MyTestClass/MyTestClass.MyTestMethod.log file and no global.log. global.log files are created as expected in the other tests on both Linux and OSX.

What's worth checking next❔

 _assembly: 'Microsoft.AspNetCore.Testing.Tests, Version=42.42.42.42, Culture=neutral, PublicKeyToken=adb9793829ddae60'
 _baseDirectory: 'artifacts/log/Microsoft.AspNetCore.Testing.Tests/net7.0'
 _globalLogger: 'Microsoft.Extensions.Logging.Logger'
 _globalLogger.IsEnabled: 'True'
 _globalLoggerFactory: 'Microsoft.Extensions.Logging.LoggerFactory'
 _serviceProvider: 'Microsoft.Extensions.DependencyInjection.ServiceProvider'
 loggerProvider: Serilog.Extensions.Logging.SerilogLoggerProvider
 inner logger: 'Microsoft.Extensions.Logging.Logger'
 inner logger.IsEnabled: 'True'
 inner factory: 'Microsoft.Extensions.Logging.LoggerFactory'
 inner serviceProvider: 'Microsoft.Extensions.DependencyInjection.ServiceProvider'
 inner loggerProvider: Microsoft.Extensions.Logging.Testing.XunitLoggerProvider
 inner loggerProvider: Serilog.Extensions.Logging.SerilogLoggerProvider
 | [0.010s] Test Information: Created test log in /var/folders/hv/54dwf7d169j4qnkvdmlkg0lw0000gq/T/TestLogging_8dcd6868d67c47558b55586a88470821
 | [0.002s] TestLifetime Information: Starting test MyTestClass.MyTestMethod at 2022-03-17T01:13:02
 Starting test MyTestClass.MyTestMethod at 2022-03-17T01:13:02
 _assembly: 'Testf6be736258cf46b2b2902121c279f9c8, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null'
 _baseDirectory: '/var/folders/hv/54dwf7d169j4qnkvdmlkg0lw0000gq/T/TestLogging_8dcd6868d67c47558b55586a88470821/Testf6be736258cf46b2b2902121c279f9c8/net7.0'
 _globalLogger: 'Microsoft.Extensions.Logging.Logger'
 _globalLogger.IsEnabled: 'True'
 _globalLoggerFactory: 'Microsoft.Extensions.Logging.LoggerFactory'
 _serviceProvider: 'Microsoft.Extensions.DependencyInjection.ServiceProvider'
 loggerProvider: Serilog.Extensions.Logging.SerilogLoggerProvider
 inner logger: 'Microsoft.Extensions.Logging.Logger'
 inner logger.IsEnabled: 'True'
 inner factory: 'Microsoft.Extensions.Logging.LoggerFactory'
 inner serviceProvider: 'Microsoft.Extensions.DependencyInjection.ServiceProvider'
 inner loggerProvider: Microsoft.Extensions.Logging.Testing.XunitLoggerProvider
 inner loggerProvider: Serilog.Extensions.Logging.SerilogLoggerProvider

@dougbu
Copy link
Contributor Author

dougbu commented Mar 17, 2022

So, I finally checked if AssemblyTestLog.ForAssembly(...) produced global.log files and the answer is unfortunately No. That is, the ForAssembly_ReturnsSameInstanceForSameAssembly() failure in this PR is a demonstration of a real problem in Microsoft.AspNetCore.Testing on Linux and OSX. Raises the priority of figuring out what's going on. Please help and / or provide suggestions❕

dougbu added 4 commits March 17, 2022 22:07
- #39038
- update `AssemblyTestLog` to perform actual log directory cleanup
  - add `ReportTestFailure()` method for tests to report failures, disabling cleanup
  - add `IAcceptFailureReports` for `AspNetTestAssemblyRunner` to report failures to `AssemblyTestLog`
- extend `AspNetTestAssemblyRunner` to optionally create fixture instances using `static ForAssembly(Assembly)`
- add `AssemblyTestLogFixtureAttribute` to register `AssemblyTestLog` as an assembly fixture
  - use `AssemblyTestLogFixtureAttribute` in all test projects
- disable log cleanup in three existing tests

nits:
- use `is [not] null` and `new()` more
- also cover a few existing methods

nit: move `AssemblyTestLogTests` to same namespace as `AssemblyTestLog`
- was deleting the just-created global.log file
@dougbu
Copy link
Contributor Author

dougbu commented Mar 18, 2022

🎉 Found the problem 🎉

Please review the last commit if you're interested in the details. I'd also appreciate any comments before I squish 'n merge this sometime tomorrow…

var baseDirectory = TestFileOutputContext.GetOutputDirectory(assembly);
var stackTrace = Environment.StackTrace;
if (!stackTrace.Contains(
"Microsoft.AspNetCore.Testing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever false? Wouldn't it include the current method which is in Microsoft.AspNetCore.Testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought it might be but you're probably right. I'll play with this in a follow-up when I get a bit of time.

@dougbu dougbu merged commit 6958517 into dotnet:main Mar 19, 2022
@dougbu dougbu deleted the dougbu/fewer.logs branch March 19, 2022 02:12
@ghost ghost added this to the 7.0-preview3 milestone Mar 19, 2022
@dougbu
Copy link
Contributor Author

dougbu commented Mar 21, 2022

/backport to release/6.0

@github-actions
Copy link
Contributor

Started backporting to release/6.0: https://github.com/dotnet/aspnetcore/actions/runs/2017735060

@github-actions
Copy link
Contributor

@dougbu backporting to release/6.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Cleanup logs after successful test runs - #39038 - update `AssemblyTestLog` to perform actual log directory cleanup - add `ReportTestFailure()` method for tests to report failures, disabling cleanup - add `IAcceptFailureReports` for `AspNetTestAssemblyRunner` to report failures to `AssemblyTestLog` - extend `AspNetTestAssemblyRunner` to optionally create fixture instances using `static ForAssembly(Assembly)` - add `AssemblyTestLogFixtureAttribute` to register `AssemblyTestLog` as an assembly fixture - use `AssemblyTestLogFixtureAttribute` in all test projects - disable log cleanup in three existing tests
Using index info to reconstruct a base tree...
M	src/Mvc/test/Mvc.FunctionalTests/ErrorPageTests.cs
M	src/Testing/src/AssemblyTestLog.cs
M	src/Testing/src/xunit/AspNetTestAssemblyRunner.cs
M	src/Testing/test/AssemblyTestLogTests.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Testing/test/AssemblyTestLogTests.cs
CONFLICT (content): Merge conflict in src/Testing/test/AssemblyTestLogTests.cs
Auto-merging src/Testing/src/xunit/AspNetTestAssemblyRunner.cs
CONFLICT (content): Merge conflict in src/Testing/src/xunit/AspNetTestAssemblyRunner.cs
Auto-merging src/Testing/src/AssemblyTestLog.cs
CONFLICT (content): Merge conflict in src/Testing/src/AssemblyTestLog.cs
Auto-merging src/Mvc/test/Mvc.FunctionalTests/ErrorPageTests.cs
CONFLICT (content): Merge conflict in src/Mvc/test/Mvc.FunctionalTests/ErrorPageTests.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Cleanup logs after successful test runs - #39038 - update `AssemblyTestLog` to perform actual log directory cleanup - add `ReportTestFailure()` method for tests to report failures, disabling cleanup - add `IAcceptFailureReports` for `AspNetTestAssemblyRunner` to report failures to `AssemblyTestLog` - extend `AspNetTestAssemblyRunner` to optionally create fixture instances using `static ForAssembly(Assembly)` - add `AssemblyTestLogFixtureAttribute` to register `AssemblyTestLog` as an assembly fixture - use `AssemblyTestLogFixtureAttribute` in all test projects - disable log cleanup in three existing tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider producing fewer log files per Helix work item (10k+ in some instances)

5 participants