Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test3026: try to reduce runtime within legacy mingw build environments #9412

Closed
wants to merge 3 commits into from

Conversation

mback2k
Copy link
Member

@mback2k mback2k commented Sep 1, 2022

Right now the test seems to take between 15 and 20 minutes
on our Windows CI environments. Which is quite extreme for
just 100 threads, but anyway 42 looks like a better number.

@mback2k mback2k self-assigned this Sep 1, 2022
@mback2k mback2k requested a review from bagder Sep 1, 2022
@mback2k mback2k added the tests label Sep 1, 2022
@jay
Copy link
Member

jay commented Sep 2, 2022

Right now the test seems to take between 15 and 20 minutes
on our Windows CI environments.

The test should not even take 1 minute, why does it take 15?

@bagder
Copy link
Member

bagder commented Sep 2, 2022

If a 100 threads take 20 minutes, won't 42 threads then still take a rather excessive 8.4 minutes?

Maybe the test should rather be changed to have a max time for which it can execute?

@mback2k
Copy link
Member Author

mback2k commented Sep 2, 2022

@jay not sure, on my machine it takes less than a minute. I think the CI VMs have trouble with great parallelism due to limited number of cores.

@bagder I wasn't sure if the issue scaled in a linear fashion. I thought let's try 42 and see what happens. As stated above, I think the main problem is the thread pressure on our CI hosts. Locally it works fast and fine as expected.

@mback2k
Copy link
Member Author

mback2k commented Sep 3, 2022

As I guessed it does not seem to be a linear issue.

Before / without this PR: https://ci.appveyor.com/project/curlorg/curl/builds/44661247/job/fq5hqnkj4bjr90p5#L4987

test 3026...[curl_global_init thread-safety]
-------e--- OK (1510 out of 1510, remaining: 00:00, took 683.379s, duration: 31:14)

After / with this PR: https://ci.appveyor.com/project/curlorg/curl/builds/44653968/job/hk9etjq6tuwo0ca7#L4988

test 3026...[curl_global_init thread-safety]
-------e--- OK (1510 out of 1510, remaining: 00:00, took 87.706s, duration: 19:43

Overall this seems to cut quite some time from the AppVeyor CI runs which run tests.

Before vs. After cmake builds (top 4 MSVC, bottom 4 MSYS/MinGW):
image image

Before vs. After autotools builds (3 MSYS/MinGW and 1 CygWin):
image image

Seems like this is primarily affecting the 4 cmake-based MSYS/MinGW in the middle.

@mback2k
Copy link
Member Author

mback2k commented Sep 3, 2022

To sum up, only builds using the following compilers on AppVeyor are affected by the slowness after all:

  • C:\mingw-w64\x86_64-8.1.0-posix-seh-rt_v6-rev0\mingw64\bin
  • C:\mingw-w64\x86_64-7.2.0-posix-seh-rt_v5-rev1\mingw64\bin
  • C:\mingw-w64\i686-6.3.0-posix-dwarf-rt_v5-rev1\mingw32\bin
  • C:\MinGW\bin

So I guess it is their threading model which is causing the trouble. @MarcelRaad any suggestions?

Edit: confirmed also by Azure CI running classic MinGW and similar variants:

@bagder
Copy link
Member

bagder commented Sep 29, 2022

This is still set a draft but wants a review. Isn't that contradictory?

@mback2k mback2k removed the request for review from bagder Sep 29, 2022
@mback2k mback2k force-pushed the speedup-test3026 branch 2 times, most recently from 74c6442 to c583bb1 Compare Oct 4, 2022
@mback2k
Copy link
Member Author

mback2k commented Oct 8, 2022

I will need to reproduce this locally and figure out a better way to shorten the runtime for legacy MinGW builds.

@mback2k
Copy link
Member Author

mback2k commented Oct 11, 2022

Update: I was able to reproduce this locally with classic/legacy MinGW. Next up is identifying the root cause / finding workaround.

@mback2k
Copy link
Member Author

mback2k commented Oct 11, 2022

Root cause: the slow part is loading of iphlpapi.dll (or any other library) as part of Curl_win32_init during curl_global_init.

@bagder
Copy link
Member

bagder commented Nov 14, 2022

Should we just close this?

@jay
Copy link
Member

jay commented Nov 14, 2022

Root cause: the slow part is loading of iphlpapi.dll (or any other library) as part of Curl_win32_init during curl_global_init.

I pushed a change to your branch that calls loadlibrary on secur32 and iphlpapi in the main thread. This way each time curl_global_init/cleanup is called from the worker threads and it calls loadlibrary/freelibrary on those two libraries it should just increase/decrease their refcount. This may or may not work depending on if initialization of the library is taking a long time, and not the actual loading.

@mback2k
Copy link
Member Author

mback2k commented Nov 14, 2022

@jay thanks, it is definitely the loading part according to my tests. But even reversing the binaries and running them outside of msys-shells (v1 or v2) did not help me identify the difference between the various fast (mingw-w64 from msys2) and slow (any mingw on msys1) build variants.

@mback2k
Copy link
Member Author

mback2k commented Nov 14, 2022

@jay to resolve the merge conflict which is blocking Azure Pipelines and AppVeyor you may delete my remaining commit.

@mback2k mback2k changed the title test3026: reduce number of threads to reduce total runtime test3026: try to reduce test runtime within legacy mingw build environments Nov 14, 2022
@mback2k mback2k changed the title test3026: try to reduce test runtime within legacy mingw build environments test3026: try to reduce runtime within legacy mingw build environments Nov 14, 2022
- Load Windows system libraries secur32 and iphlpapi beforehand, so
  that libcurl's repeated global init/cleanup only increases/decreases
  the library's refcount rather than causing it to load/unload.

Assisted-by: Marc Hoersken

Closes #xxxx
@jay
Copy link
Member

jay commented Nov 17, 2022

job windows msys v1_mingw32 now shows:

2022-11-16T07:40:24.7492812Z test 3026...[curl_global_init thread-safety]
2022-11-16T07:40:25.0123541Z -------e--- OK (1526 out of 1530, remaining: 00:02, took 0.268s, duration: 17:07)

Does that mean this is fixed or should I be checking a different job?

the code is already in a WIN32 section
@jay jay removed the help wanted label Nov 17, 2022
@mback2k
Copy link
Member Author

mback2k commented Nov 17, 2022

Thanks a lot @jay. Yeah, all of the msys v1 jobs experienced the slowness issue, so if one is fixed, all of them should be.

Will you take it completely from here?

@jay
Copy link
Member

jay commented Nov 17, 2022

Will you take it completely from here?

Yup np. I'm going to wait on the CI for this last fixup to complete then I will land it.

@mback2k
Copy link
Member Author

mback2k commented Nov 17, 2022

Sounds good, thanks again.

@mback2k
Copy link
Member Author

mback2k commented Nov 17, 2022

CI looks good to me. I think this is ready for merge.

@mback2k mback2k marked this pull request as ready for review Nov 17, 2022
@jay jay closed this in 856b133 Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants