[Merge-on-Red] - Implement Test Process Watcher #78742

ivdiazsa · 2022-11-23T01:19:22Z

This PR adds a new harness to run tests by means of corerun. This is the watchdog work item defined in issue #77735.

Its main purpose is to monitor for potential freezes and hangs when running tests. If a specified time frame elapses and the test hasn't finished, the watcher automatically kills the process and reports accordingly.

With this mechanism in place, we will no longer have incomplete test runs that froze somewhere without any further information. This makes test failures of this kind much easier to investigate and, consequently, to fix.
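
For readers who want a concrete picture of the timeout-and-kill behavior described above, here is a minimal POSIX-only sketch. It is not the code from this PR: `run_with_timeout`, `WatchResult`, and the 100 ms default polling interval are hypothetical names and values chosen for illustration, and the Windows path is omitted entirely.

```cpp
#include <chrono>
#include <string>
#include <thread>
#include <vector>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type, not taken from the PR.
struct WatchResult
{
    bool timed_out; // true if the watchdog had to kill the child
    int exit_code;  // child's exit code, or -1 if it did not exit normally
};

// Hypothetical helper: runs `command` (command[0] is the executable path) and
// kills the child process if it has not finished within `timeout`.
WatchResult run_with_timeout(
    const std::vector<std::string>& command,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds check_interval = std::chrono::milliseconds(100))
{
    if (command.empty())
        return { false, -1 };

    // execv() expects a null-terminated array of char pointers.
    std::vector<char*> argv;
    for (const std::string& arg : command)
        argv.push_back(const_cast<char*>(arg.c_str()));
    argv.push_back(nullptr);

    pid_t child = fork();
    if (child < 0)
        return { false, -1 }; // fork() failed

    if (child == 0)
    {
        execv(argv[0], argv.data());
        _exit(127); // only reached if execv() failed
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;

    for (;;)
    {
        // With WNOHANG, waitpid() returns 0 while the child is still running.
        pid_t done = waitpid(child, &status, WNOHANG);
        if (done == child)
            return { false, WIFEXITED(status) ? WEXITSTATUS(status) : -1 };
        if (done < 0)
            return { false, -1 };

        if (std::chrono::steady_clock::now() >= deadline)
        {
            kill(child, SIGKILL);
            waitpid(child, &status, 0); // reap the killed child
            return { true, -1 };
        }

        std::this_thread::sleep_for(check_interval);
    }
}
```

A caller could then report a hang distinctly from an ordinary failure, for example by checking `timed_out` before looking at `exit_code`.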

Remaining Tasks Until Completion:

Refactor accordingly to give it a more solid multi-platform shape.
Modify the test script generators to use the watcher, instead of directly calling corerun.
Add the watcher to the repo's build scripts, so its addition is transparent.

dotnet-issue-labeler · 2022-11-23T01:19:32Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ivdiazsa · 2022-11-23T01:21:44Z

Hi @jkoritzinsky! Here's the initial draft of the test watcher. It works on Linux and Windows, and it's ready for sharing, so you can take a look and we can adjust/refactor/edit/etc as necessary.

ghost · 2022-11-23T01:22:20Z

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	ivdiazsa
Assignees:	ivdiazsa
Labels:	`area-Infrastructure-coreclr`
Milestone:	-

ivdiazsa · 2022-12-01T21:52:31Z

FYI. This is the draft PR of the watcher @agocke @tommcdon

jkoritzinsky · 2022-12-06T05:40:32Z

I think we can just include the watchdog CMakeLists.txt from the CoreCLR and Mono CMake builds. I don’t think we need to introduce a separate subset and native project build for it.

agocke · 2022-12-06T08:02:21Z

src/native/watchdog/watchdog.cpp

+#else
+    const int check_interval = 1000;
+    int check_count = 0;
+    char **args = new char *[exe_argc];


Consider a vector here. You forgot to delete

ivdiazsa (Dec 6, 2022): I forgot the deletion, thanks for pointing it out Andy. The reason we're using a char*[] is because that's the type that execv() requires.

agocke (Dec 9, 2022): unique_ptr<char*[]> then?

ivdiazsa (Dec 12, 2022): That might work as well. I'll try it.
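
As an illustration of the vector suggestion in this thread, the following is a minimal sketch of building an `execv()`-compatible argument array without a manual `new`/`delete`. `make_argv` is a hypothetical helper, not something from the PR, and the sketch assumes the arguments are already available as `std::string` values.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds a null-terminated argv array whose entries point
// into the strings owned by `args`. The returned pointers stay valid only while
// `args` is alive and unmodified.
std::vector<char*> make_argv(std::vector<std::string>& args)
{
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);

    for (std::string& arg : args)
        argv.push_back(&arg[0]); // pointer into the string's own buffer

    argv.push_back(nullptr); // execv() requires a trailing null pointer
    return argv;
}
```

The result can be passed as `execv(argv[0], argv.data())`, since `data()` yields the `char**` that `execv()` accepts, and the vector releases its storage automatically when it goes out of scope.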

agocke · 2022-12-06T08:04:20Z

fwiw I'm not completely sold on C++ if we have to parse and fixup the XML file. I'd rather use... anything else for string manipulation tbh.

ivdiazsa · 2022-12-06T19:57:13Z

> fwiw I'm not completely sold on C++ if we have to parse and fixup the XML file. I'd rather use... anything else for string manipulation tbh.

Yeah I agree, C++ is too much for this. We should do it in plain C like it should be :)

Jokes aside, thanks for pointing this out @agocke. We seem to be on different pages on this, so let's take this chance to sync up. The YAML log is generated in C#, right here: https://github.com/ivdiazsa/runtime/blob/2f66c46c1948af53c447e1efe0d1cc32867244f3/src/tests/Common/XUnitWrapperLibrary/TestSummary.cs#L95

I was under the impression that the XML corrector, if we decide to go with that approach, would go somewhere around there as well.

jkoritzinsky

Two small comments, but other than that, LGTM!

jkoritzinsky · 2023-03-09T21:12:53Z

src/native/watchdog/watchdog.cpp

+
+#else // !TARGET_WINDOWS
+
+    // TODO: Describe what the 'ms_factor' is, and why it's being used here.


Address this TODO?

ivdiazsa (Mar 9, 2023): Oops, forgot that. Thanks for the catch.

ivdiazsa (Mar 9, 2023): I actually removed it altogether. I needed it because originally, I was dealing with some C stuff that combined microseconds and milliseconds. After switching to C++'s std::chrono::milliseconds, testing with and without it yielded virtually the same wait time, so it ended up being redundant.
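
The thread does not show what `ms_factor` actually was. Assuming it was a hand-written conversion factor between microseconds and milliseconds, the sketch below shows how `std::chrono` can carry the units in the types so that no such factor is needed. Both helper names are hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: converts a millisecond interval to microseconds through
// the type system, instead of multiplying by a hand-written factor.
std::chrono::microseconds to_microseconds(std::chrono::milliseconds interval)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(interval);
}

// Hypothetical helper: number of whole check intervals that fit in a timeout.
// `interval` must be non-zero.
long long checks_before_timeout(std::chrono::seconds timeout,
                                std::chrono::milliseconds interval)
{
    // Dividing two durations yields a plain count, with units reconciled for us.
    return std::chrono::duration_cast<std::chrono::milliseconds>(timeout) / interval;
}
```

For sleeping, `std::this_thread::sleep_for(std::chrono::milliseconds(1000))` accepts the duration directly, which avoids mixing it with microsecond-based C APIs.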

lewing · 2023-03-10T01:02:06Z

src/tests/Common/helixpublishwitharcade.proj

-    <HelixCorrelationPayload Include="$(XUnitLogCheckerDirectory)" />
+
+    <!-- Browser-Wasm follows a very different workflow, which is currently out of scope of the Log Checker. -->
+    <HelixCorrelationPayload Include="$(XUnitLogCheckerDirectory)" Condition="'$(TargetsBrowser)' != 'true'" />


Does this change the current behavior for wasm logs then?

ivdiazsa (Mar 10, 2023): Not at all. Wasm is out of scope of this project for the time being, hence we are excluding it. So wasm tests will remain unaffected.

BruceForstall · 2023-03-13T20:38:30Z

It seems undesirable to hard-code yet another timeout value (300 seconds?) into a new location (and there isn't a big, obvious comment noting that's what the number is). Especially since the YAML files (?) already specify per-test timeouts. E.g., it wouldn't surprise me if some merged test cases (e.g., Hardware Intrinsics) run under GCStress=3 on Linux/arm could be very slow.
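
One way to act on this concern, sketched here purely as an assumption, is to let the caller supply the timeout and fall back to a default only when none is given. `parse_timeout` is a hypothetical helper, and nothing in the thread confirms how (or whether) the watcher reads per-test timeouts.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical helper: parses a timeout in seconds from text (for example a
// command-line argument or an environment variable) and returns `fallback`
// when the text is missing, malformed, or not a positive number.
std::chrono::seconds parse_timeout(const char* text, std::chrono::seconds fallback)
{
    if (text == nullptr || *text == '\0')
        return fallback;

    char* end = nullptr;
    long value = std::strtol(text, &end, 10);

    // Reject trailing characters and non-positive values.
    if (*end != '\0' || value <= 0)
        return fallback;

    return std::chrono::seconds(value);
}
```

A caller might write `parse_timeout(std::getenv("WATCHDOG_TIMEOUT_SECONDS"), std::chrono::seconds(300))`, where the environment variable name is invented for this example and 300 simply echoes the value questioned in the comment above.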

ivdiazsa added 9 commits November 10, 2022 15:31

Started the draft for the watch tower.

be9867e

Parsing command-line arguments done.

e79ae66

Removed auto-generated file.

b17eef2

Added an example I made on how to run processes from C/C++ :)

d40ad9e

Perfect arg parsing done!

6cc9e9e

Seems to be working?

fcd9f1a

Watchdog is functional now!

1c7f46c

Fixed a nuisance with the Windows compiler messing with Linux code, but now gotta convert a whole char ** to a char *.

ec90d1e

Got the Windows draft working, at long last.

b157904

ghost assigned ivdiazsa Nov 23, 2022

ivdiazsa added the area-Infrastructure-coreclr label Nov 23, 2022

ivdiazsa assigned jkoritzinsky Nov 23, 2022

ivdiazsa added this to the 8.0.0 milestone Nov 23, 2022

ivdiazsa linked an issue Nov 23, 2022 that may be closed by this pull request

[Merge-on-Red] - Accurately log catastrophic test failures and freezes in source-gened test infrastructure #77735

Closed

ivdiazsa added this to In Progress in Infrastructure Backlog Nov 23, 2022

build-analysis bot mentioned this pull request Nov 23, 2022

CI build failure: Build MacCatalyst x64 Release AllSubsets_Mono - XcodeBuildApp task failed unexpectedly - Could not find System.Runtime.Tests.app #78778

Closed

Added initial linking of the watcher to the repo's build scripts. Not yet functional, but gotta save my progress :)

36a75f0

agocke reviewed Dec 6, 2022

build-analysis bot mentioned this pull request Dec 6, 2022

iOS & tvOS legs are failing to AOT System.Net.Http.Json #79279

Closed

ivdiazsa added 4 commits December 6, 2022 15:30

Moved the watcher to the CoreCLR project. Not yet functional though...

c4f2009

Got it to build on Windows!

e8b13a5

Builds and Works on Linux!

b476ad2

Fixed a slowdown with the watcher.

f74fb2e

ivdiazsa added 4 commits February 22, 2023 15:45

Fixed build issue in the watcher's Windows version.

064b9b0

Apparently works on Windows now too?

2cb9270

Fixed build issue with Windows x86.

450eba9

Fixed issue where we were not building the watcher in certain subsets that need it.

eba2bcd

runfoapp bot mentioned this pull request Feb 27, 2023

Infra improvements for Helix #68176

Closed

Fixed yet another Windows issue.

d68d16c

jkoritzinsky reviewed Feb 27, 2023

src/tests/Common/helixpublishwitharcade.proj (outdated)

This was referenced Feb 27, 2023

Test failure: System.Security.Cryptography.X509Certificates.Tests.CertificateCreation.CertificateRequestChainTests/CreateChain_Hybrid #25979

Closed

System.Security.Cryptography.X509Certificates.Tests.ChainTests.BuildInvalidSignatureTwice failure #65448

Closed

Used ExeSuffix instead of recomputing it and excluded Browser-Wasm

998965c

ivdiazsa marked this pull request as ready for review February 28, 2023 20:30

Changed the Helix Correlation Payloads to bundle the whole watchdog directory, rather than just the executable, and reallowed tests to be run without the watcher.

7cb4e89

This was referenced Mar 2, 2023

System.Security.Cryptography.X509Certificates.Tests.ChainTests.BuildInvalidSignatureTwice failure #82837

Open

Some System.Buffers.Binary.Tests.ReverseEndiannessUnitTests tests broken on main #82868

Closed

Changed the scripts to use the test watcher in the Core_Root, instead of the object artifacts.

2ba916a

build-analysis bot mentioned this pull request Mar 3, 2023

Alpine System.Net.Security.Tests failing because of "Cannot load library libgssapi_krb5.so.2" #82945

Closed

Fixed undeclared variable in Windows test scripts.

a712352

jkoritzinsky approved these changes Mar 9, 2023

Addressed PR feedback comments.

0cc8875

jkoritzinsky approved these changes Mar 10, 2023

lewing requested a review from radical March 10, 2023 01:00

lewing reviewed Mar 10, 2023

ivdiazsa merged commit 728fd85 into dotnet:main Mar 10, 2023

Infrastructure Backlog automation moved this from In Progress to Closed Mar 10, 2023

jakobbotsch mentioned this pull request Mar 11, 2023

[jitstress] HardwareIntrinsics_ro fails with "process cannot access the file" error #83298

Closed

hoyosjs added a commit that referenced this pull request Mar 14, 2023

Revert "[Merge-on-Red] - Implement Test Process Watcher (#78742)"

a0b6048

This reverts commit 728fd85.

hoyosjs mentioned this pull request Mar 14, 2023

Revert "[Merge-on-Red] - Implement Test Process Watcher" #83412

Closed

ghost locked as resolved and limited conversation to collaborators Apr 13, 2023

ivdiazsa deleted the the-native-eye branch April 24, 2023 22:10
