Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[icu] Enable parallel builds #6695

Merged
merged 30 commits into from
Jun 12, 2019
Merged

[icu] Enable parallel builds #6695

merged 30 commits into from
Jun 12, 2019

Conversation

cbezault
Copy link
Contributor

@cbezault cbezault commented May 30, 2019

Add a VCPKG_CONCURRENCY variable for use in portfiles.
Use VCPKG_CONCURRENCY in icu as a parameter to make.

@cbezault cbezault added the info:internal This PR or Issue was filed by the vcpkg team. label May 30, 2019

int System::get_num_logical_cores()
{
int num_cores = std::thread::hardware_concurrency();
Copy link
Contributor

@Neumann-A Neumann-A May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Thats capped at 64 on windows or even lower (1 process group). To get the real number of cores you have to iterate over all groups and ask for the number of cores. The problem with that is it might not be the number asigned to the job task in an HPC env. For that you have to read in the typical ENV variables which contains the assigned cores.

Copy link
Contributor Author

@cbezault cbezault May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, you learn something new every day. I'm more of a Linux guy so I had never heard of processor groups. I also had not considered the HPC use case where you queue jobs that are assigned a limited amount of resources. I'd guess that checking the cpu affinities before the number of cores on the machine should be sufficient. Though on Linux I would hope people are using cgroups. I wonder if the get_hardware_concurrency function returns the number of logical cores in your cgroup.

Edit:
How does batching software on windows limit the number of cpus a process and its children have access to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with std::thread::hardware_concurrency() is that it is only a hint. So it probably does not always return what you want.

How does batching software on windows limit the number of cpus a process and its children have access to?

One process will always only spawn threads within its own group unless the process itself moves newly-created threads to a different group by explicitly assigning affinities. This makes the process a multi group process (Fun Fact: Looking at the affinities of such a multi group process in the task manager crashes the task manger). There is no external way of doing it and there is no way of telling in which group windows decides to spawn a new process (which is problematic if your HPC assigned resources span two groups). Furthermore groups might even be asymmetrical in core count if requested.
That is why in my simulation program I read in the env variable CCP_COREIDS (HPC_PACK) and assign each thread to one COREID by hand because threads on windows are just broken ;)

I wonder if the get_hardware_concurrency function returns the number of logical cores in your cgroup.

Cannot answer that, never tried something like that because our nodes run windows. Maybe @cenit knows?

Copy link
Contributor Author

@cbezault cbezault May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at documentation for process groups it appears windows makes guarantees about the topology of logical cores in a process groups (that they're as physically/logically close as possible and/or respecting some Numa group settings) and that windows will minimize the number of process groups. What happens if the number of cores you are supposed to be allocated is less than the number in a process group?

Either way I was hoping that I wouldn't have to implement a per platform solution but it's looking like I might have to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Neumann-A @ras0219-msft What do you guys think of adding two environment variables/command line options that override the automated cpu detection. VCPKG_MAX_LOGICAL_PROCESSOR_COUNT which sets an upper bound on the number of logical processors used and VCPKG_LOGICAL_PROCESSOR_COUNT which overrides the number of processors automatically detected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbezault: Why two? Is one not enough? If VCPKG_LOGICAL_PROCESSOR_COUNT is not set you could use the automated detection else use whatever the triplet says. (probably requires usage of get_triplet_environment.cmake)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the single variable is probably fine, I was just thinking of the corner case where you still want some automated detection for whatever reason but want to cap it. At that point though if you know what a reasonable cap is you probably also know what you want the parallelism to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a VCPKG_MAX_CONCURRENCY setting makes sense. If not set, we will use best-effort heuristics (for now, probably just hardware_concurrency(), but in the future we could definitely do platform-specifics). If set, we just use that setting directly.

This enables both "fixing" our heuristics upwards when they're wrong (due to complex machine architectures) as well as downwards when there are other constraints we aren't considering (RAM limits, system interactivity, other concurrent processes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a good heuristic to determine if I want this to be a command line setting or an environment variable?

@cbezault cbezault changed the title [fmilib]][icu] Enable parallel builds [fmilib][icu] Enable parallel builds May 31, 2019
@cbezault cbezault self-assigned this Jun 4, 2019
@cbezault cbezault changed the title [fmilib][icu] Enable parallel builds [icu] Enable parallel builds Jun 5, 2019
scripts/cmake/vcpkg_build_cmake.cmake Outdated Show resolved Hide resolved
scripts/cmake/vcpkg_execute_build_process.cmake Outdated Show resolved Hide resolved
toolsrc/src/vcpkg/build.cpp Show resolved Hide resolved
cbezault referenced this pull request Jun 10, 2019
…in) from ports since they are broken and usually break non-win32 ports
@cbezault cbezault requested a review from Rastaban June 10, 2019 18:39
scripts/cmake/vcpkg_execute_build_process.cmake Outdated Show resolved Hide resolved
ports/icu/portfile.cmake Outdated Show resolved Hide resolved
ports/icu/portfile.cmake Outdated Show resolved Hide resolved
endif()
endif()
elseif(out_contents MATCHES "mt : general error c101008d: " OR out_contents MATCHES "mt.exe : general error c101008d: ")
# Antivirus workaround - occasionally files are locked and cause mt.exe to fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably print a message here to inform the user that deactivating antivirus for the vcpkg buildtree folder could speed up the build if he encounters this error

@cbezault cbezault merged commit b7d6160 into microsoft:master Jun 12, 2019
strega-nil pushed a commit to strega-nil/vcpkg that referenced this pull request May 5, 2021
* Add VCPKG_NUM_LOGICAL_CORES

* break out logic that retries running a command several times into its own function

* Parallelize icu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
info:internal This PR or Issue was filed by the vcpkg team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants