Skip to content

runtests: drop logic calling the handle tool (Windows) #16484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

vszakats
Copy link
Member

@vszakats vszakats commented Feb 25, 2025

In the cases observed throughout the last year, handle64 run once per
test run, but with no action (match or task kill). It did not help with
flakiness and seems redundant.

runtests launched it (if present) in Cygwin/MSYS jobs too, where it
probably shouldn't have, because we have seen no flakiness there. In CI
the tool was present and launched in MSYS2 jobs, but not in Cygwin.

After this patch the "clearlocks" warning remain in the log. They are
consistently appearing once in every MSVC CI log, early in the tests:

  test 3207 SKIPPED: curl lacks OpenSSL support
[...START-OF-TESTS...]
  test 0003...[HTTP POST with auth and contents but with content-length set to 0]
  --pd---e--- OK (3   out of 1596, remaining: 17:50, took 1.423s, duration: 00:02)
  test 0007...[HTTP with cookie parser and header recording]
  --pd--oe--- OK (7   out of 1596, remaining: 07:51, took 1.485s, duration: 00:02)
  test 0006...[HTTP with simple cookie send]
  --pd---e--- OK (6   out of 1596, remaining: 09:11, took 1.488s, duration: 00:02)
  test 0005...[HTTP over proxy]
  --pd---e--- OK (5   out of 1596, remaining: 11:03, took 1.491s, duration: 00:02)
CUSTOMBUILD : error : 169: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
  test 0001...[HTTP GET]
  --pd---e--- OK (1   out of 1596, remaining: 55:34, took 1.466s, duration: 00:02)
  test 0004...[Replaced internal and added custom HTTP headers]

Ref: https://github.com/curl/curl/actions/runs/13546192228/job/37858323380?pr=16484#step:14:167

Ref: e53523f #14859
Ref: 311c31e #6179
Follow-up to 3a8920e #16600

@vszakats vszakats marked this pull request as draft February 25, 2025 17:18
@github-actions github-actions bot added tests CI Continuous Integration labels Feb 25, 2025
@vszakats
Copy link
Member Author

vszakats commented Feb 25, 2025

@mback2k, @jay: Your thoughts on this? I'd be interested in ways to trigger handle64 finding an actual kill. It's also an option to keep the logic in runtests.pl and just drop handle64 installs from CI, if it does help outside CI.

@jay
Copy link
Member

jay commented Feb 25, 2025

I'm going to defer to Marc on this one. My general thinking is it should not happen however weird things in CI happen a lot. I really could go either way. #6058 for background. I'd leave this open a bit to give him some time to respond.

@mback2k
Copy link
Member

mback2k commented Feb 26, 2025

@mback2k, @jay: Your thoughts on this? I'd be interested in ways to trigger handle64 finding an actual kill.

I think this happened with stuck tests/testservers in the past, especially with those related to timeouts.

It's also an option to keep the logic in runtests.pl and just drop handle64 installs from CI, if it does help outside CI.

Fine with me. You could also drop the '-rm' flag from being active on CI first.

@vszakats
Copy link
Member Author

vszakats commented Feb 26, 2025

@mback2k, @jay: Your thoughts on this? I'd be interested in ways to trigger handle64 finding an actual kill.

I think this happened with stuck tests/testservers in the past, especially with those related to timeouts.

It's also an option to keep the logic in runtests.pl and just drop handle64 installs from CI, if it does help outside CI.

Fine with me. You could also drop the '-rm' flag from being active on CI first.

@mback2k Just to clarify, are you also okay to drop the clearlocks() function
from servers.pm (making the call to handle*.exe), or is it still needed
in some envs?

Meanwhile tested without -rm in CI, and it builds OK.

@testclutch

This comment was marked as outdated.

@vszakats vszakats changed the title [TEST] runtests: stop calling handle64.exe runtests: stop calling handle64.exe Mar 3, 2025
@vszakats vszakats changed the title runtests: stop calling handle64.exe runtests: drop logic calling the handle tool Mar 3, 2025
@vszakats vszakats marked this pull request as ready for review March 3, 2025 03:06
@vszakats vszakats added the Windows Windows-specific label Mar 3, 2025
@vszakats vszakats changed the title runtests: drop logic calling the handle tool runtests: drop logic calling the handle tool (Windows) Mar 6, 2025
@mback2k
Copy link
Member

mback2k commented Mar 6, 2025

@mback2k Just to clarify, are you also okay to drop the clearlocks() function

from servers.pm (making the call to handle*.exe), or is it still needed

in some envs?

Meanwhile tested without -rm in CI, and it builds OK.

Maybe keep the code around for a little bit longer until the CI runs have proven to be stable without it for some weeks?

@vszakats
Copy link
Member Author

vszakats commented Mar 6, 2025

@mback2k Just to clarify, are you also okay to drop the clearlocks() function
from servers.pm (making the call to handle*.exe), or is it still needed
in some envs?
Meanwhile tested without -rm in CI, and it builds OK.

Maybe keep the code around for a little bit longer until the CI runs have proven to be stable without it for some weeks?

It works for me, I'll merge the CI patch separately now. And keep this
one around for a while.

I think the main issue is that even though handle runs exactly once for
every MSYS2 and MSVC CI job, it never finds anything to kill. I've added
verbose logs to follow this, and tested it extensively a while ago. Couldn't
figure out why that happens. The code, filters all seem correct.

vszakats added a commit that referenced this pull request Mar 6, 2025
To test its effect on stability/flakiness of Windows jobs.

Ref: #16484 (comment)
Cherry-picked from #16484
Closes #16600
vszakats added 11 commits March 13, 2025 17:17
This reverts commit e5e5c09.

Some of the logic don't seem to be Windows-specific or handle64-related,
as most of the CI test fail to complete in time (hang?) when removing it:
https://github.com/curl/curl/actions/runs/13527054998/job/37800227765?pr=16484 (macOS)
https://github.com/curl/curl/actions/runs/13527054994/job/37800178091?pr=16484 (BSDs)
https://github.com/curl/curl/actions/runs/13527054982/job/37800212082?pr=16484 (Windows)
```
CUSTOMBUILD : error : 169: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
```
https://github.com/curl/curl/actions/runs/13546192228/job/37858323380?pr=16484#step:14:177

Seems to happen consistently in every vcpkg job, once, around the beginning.

```
CUSTOMBUILD : error : 323: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
CUSTOMBUILD : error : 1176: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
CUSTOMBUILD : error : 752: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
```
@vszakats vszakats added tidy-up and removed CI Continuous Integration tidy-up labels Mar 13, 2025
@vszakats
Copy link
Member Author

vszakats commented Mar 19, 2025

I could observe no effect of disabling handle almost 2 weeks ago.

Windows CI looks just like before, with mostly:

  1. greyed out test run step
  2. 2304 errors
  3. pipe errors
  4. the occasional flaky test, mostly just a single test

1., 2., 3. are sometimes combined and/or the result of hangs.

This is about the same as we observed always. handle doesn't seem to play a role, and when run, it was never seen to kill a job. The condition keeps triggering once for every test run, at the beginning, sometimes right in the beginning (maybe always, hard to say from the logs).

Planning to merge this today.

@vszakats vszakats closed this in 8409cc2 Mar 19, 2025
@vszakats vszakats deleted the handle64 branch March 19, 2025 17:51
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
To test its effect on stability/flakiness of Windows jobs.

Ref: curl#16484 (comment)
Cherry-picked from curl#16484
Closes curl#16600
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
In the cases observed throughout the last year, `handle64` run once per
test run, but with no action (match or task kill). It did not help with
flakiness and seems redundant.

runtests launched it (if present) in Cygwin/MSYS jobs too, where it
probably shouldn't have, because we have seen no flakiness there. In CI
the tool was present and launched in MSYS2 jobs, but not in Cygwin.

After this patch the "clearlocks" warning remain in the log. They are
consistently appearing once in every MSVC CI log, early in the tests:
```
  test 3207 SKIPPED: curl lacks OpenSSL support
[...START-OF-TESTS...]
  test 0003...[HTTP POST with auth and contents but with content-length set to 0]
  --pd---e--- OK (3   out of 1596, remaining: 17:50, took 1.423s, duration: 00:02)
  test 0007...[HTTP with cookie parser and header recording]
  --pd--oe--- OK (7   out of 1596, remaining: 07:51, took 1.485s, duration: 00:02)
  test 0006...[HTTP with simple cookie send]
  --pd---e--- OK (6   out of 1596, remaining: 09:11, took 1.488s, duration: 00:02)
  test 0005...[HTTP over proxy]
  --pd---e--- OK (5   out of 1596, remaining: 11:03, took 1.491s, duration: 00:02)
CUSTOMBUILD : error : 169: cleardir(log/8/lock) failed [D:\a\curl\curl\bld\tests\test-ci.vcxproj]
  test 0001...[HTTP GET]
  --pd---e--- OK (1   out of 1596, remaining: 55:34, took 1.466s, duration: 00:02)
  test 0004...[Replaced internal and added custom HTTP headers]
```
Ref: https://github.com/curl/curl/actions/runs/13546192228/job/37858323380?pr=16484#step:14:167

Ref: e53523f curl#14859
Ref: 311c31e curl#6179
Follow-up to 3a8920e curl#16600
Closes curl#16484
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Windows Windows-specific
Development

Successfully merging this pull request may close these issues.

4 participants