Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: parallel test output interleaving #27474

Closed
BruceForstall opened this issue Sep 25, 2018 · 18 comments
Closed

Question: parallel test output interleaving #27474

BruceForstall opened this issue Sep 25, 2018 · 18 comments
Assignees
Labels
area-Infrastructure-libraries question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@BruceForstall
Copy link
Member

Is there a way to run tests in parallel, but still get good, non-interleaved, output?

For example, in:

https://ci.dot.net/job/dotnet_coreclr/job/master/view/x64/job/jitstress/job/x64_checked_windows_nt_corefx_baseline/641/consoleText

we run the tests using:

build-tests.cmd -Release -os:Windows_NT -buildArch:x64  -- /p:WithoutCategories=IgnoreForCI /p:PreExecutionTestScript=D:\j\workspace\x64_checked_w---a7bd363e\SetStressModes.bat

Then, there is apparently a failure in System.ComponentModel.Composition.Tests. However, it is extremely difficult to see what it is from the logs, because the output between the time the test starts and the time the test ends is interleaved with other test output.

@BruceForstall
Copy link
Member Author

@weshaggard Suggestions?

@weshaggard
Copy link
Member

I don't know of any good way. My best suggestion is to look at the log for the failing test project once it you see it has a failure.

@weshaggard
Copy link
Member

We might also be able to figure out a way to dump the log of the failing test project at the end so it is continuous.

@BruceForstall
Copy link
Member Author

That would be great! That plus a simple list of exactly the failing test projects.

fyi, I recently copied the run-test.sh script (in the root of corefx) to coreclr, and fixed it to work better for running in parallel, and not interleave output when doing so, by creating an output file per test project where redirected output is put. This is basically a bash script hack for running on arm32/arm64 since we can't run msbuild, but it works out very well, IMO. dotnet/coreclr#20107

@BruceForstall
Copy link
Member Author

This is still a serious problem, and makes it extremely difficult to read corefx test run log files.

E.g., https://ci.dot.net/job/dotnet_coreclr/job/master/view/x64/job/jitstress/job/x64_checked_windows_nt_corefx_baseline/702/consoleFull

@karelz @ericstj @danmosemsft Is there anything that could be done here? Even adding a per-line "thread id" to the output would help. Ideally, the output would be one test at a time (and wait until the entire output is complete before generating any). Is it msbuild that invokes the individual RunTest.cmd files? Where?

One option would be to change the RunTest.cmd files themselves so all output is saved and then output all at once, as follows:

set _tmpfile=%TEMP%\RunTest_%RANDOM%.txt
echo ...foo... >>%_tmpfile%
echo ... bar... >>%_tmpfile%
...
call %RUNTIME_PATH%\dotnet.exe xunit.console.dll System.Buffers.Tests.dll ... >>%_tmpfile%
set RESULT=%ERRORLEVEL%
...
type %_tmpfile%
del /f %_tmpfile%
EXIT /B %RESULT%

Or, maybe better, move RunTest.cmd to RunTestHelp.cmd, and create a wrapper RunTest.cmd that does the output buffering, as shown above.

Where is the code that generates the RunTest.cmd files? Looks like the buildtools repo, src\Microsoft.DotNet.Build.Tasks\PackageFiles\RunnerTemplate.Windows.txt as well as src\Microsoft.DotNet.Build.Tasks\GenerateTestExecutionScripts.cs.

Comments?

cc @RussKeldorph

@danmoseley
Copy link
Member

@ViktorHofer is thinking about this right now actually

@BruceForstall
Copy link
Member Author

@ViktorHofer If you have a separate issue tracking this, please link it.

@ViktorHofer
Copy link
Member

I'm not changing this with the move of the testing infrastructure to arcade but we can discuss what changes can be made to improve the output.

@BruceForstall
Copy link
Member Author

Does "the move of the testing infrastructure to arcade" improve the situation? We still have this problem right now, when all tests are run locally (or on one machine in the CI), and I thought the corefx transition to arcade had already happened.

@ViktorHofer
Copy link
Member

The transition is nearly done: dotnet/corefx#33423. I haven't made any improvements to this specific issue. Let's follow-up what can be done.

Note: I'm considering moving to vstest.console as the xunit runner is out of support. Not sure if that helps anyhow.

@danmoseley
Copy link
Member

If the testresults.xml is available, does that solve this?

@BruceForstall
Copy link
Member Author

It looks like we have a bug in our Jenkins configuration where is only archives the testsresults.xml file if the tests all succeed. Maybe if fixed that to always upload the file(s), we could use them for more info. They are uploaded for xunit "interpretation", but Jenkins test result display is (IMO) terrible.

So, I don't know if it "solves" this; IMO, I prefer a nice, de-interlaced output at the end of the test run log file. But maybe it makes it usable.

@ericstj
Copy link
Member

ericstj commented Apr 10, 2019

@BruceForstall will this still be an issue under the new test process in Azure DevOps?

@BruceForstall
Copy link
Member Author

@echesakovMSFT Can you answer @ericstj 's question? My guess is the problem is lessened.

IMO, the problem still exists when running tests locally, if the output is still interleaved.

@echesakov
Copy link
Contributor

@BruceForstall Can you remind me when we see the interleaved output in Jenkins? Does it come from summary messages that get printed for different test assemblies? If so, then moving to Azure DevOps should help - since we will be running each assembly on a different test machine. However, if we decide to group some of those assemblies to the same Helix payload then it's still going to be an issue.

@BruceForstall
Copy link
Member Author

It happens in the console output when we use the normal corefx test runner (e.g., https://ci.dot.net/job/dotnet_coreclr/job/master/view/x64/job/jitstress/job/x64_checked_windows_nt_corefx_baseline/826/consoleText). When we run using our own test running scripts (arm32/arm64), we've implemented de-interleaving in those scripts to avoid the problem.

@ViktorHofer
Copy link
Member

I think we can close this now that coreclr is using a different infrastructure to run corefx tests.

@BruceForstall
Copy link
Member Author

@ViktorHofer I guess that's ok from the CI perspective, but it still seems weird/problematic from the perspective of running all tests locally on a single machine. Maybe people just don't do that.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Infrastructure-libraries question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

8 participants