Setup of DIRACOS does not restrict cpu cores #189

maxfischer2781 · 2023-05-31T15:16:23Z

When the Pilot creates its DIRACOS environment, it directly calls DIRACOS-Linux-machine.sh, which eventually invokes mamba. Mamba assumes that it can use all cores of the machine (see mamba-org/mamba#2463), which isn't realistic for Pilot environments. This leads to excessive process creation, which can negatively affect the pilot, the user, or even the entire compute resource.

As far as I can tell, the templates from which DIRACOS is generated do not provide a feasible way to limit this internally. The Pilot thus seems like the best place, seeing how it is aware of resource restrictions.
A solution would be to set MAMBA_EXTRACT_THREADS when installing DIRACOS, either to a conservative 1 or pp.maxNumberOfProcessors.
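
As a rough illustration of that suggestion, here is a minimal sketch of capping the variable before the installer runs. The helper names `build_installer_env` and `install_diracos` are hypothetical and not taken from the Pilot code; only `MAMBA_EXTRACT_THREADS` and the idea of using either 1 or `pp.maxNumberOfProcessors` come from the issue itself. Installer arguments are left out on purpose.

```python
import os
import subprocess


def build_installer_env(max_processors=1, base_env=None):
    """Return a copy of the environment with Mamba's extraction threads capped.

    Hypothetical helper: max_processors would be something like
    pp.maxNumberOfProcessors in the Pilot, or 1 as a conservative default.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Fall back to a single thread for missing, invalid or non-positive values.
    try:
        threads = max(1, int(max_processors))
    except (TypeError, ValueError):
        threads = 1
    env["MAMBA_EXTRACT_THREADS"] = str(threads)
    return env


def install_diracos(installer_path, max_processors=1):
    """Run the DIRACOS installer with the capped environment (hypothetical helper)."""
    # Installer flags are omitted; this sketch only shows the environment handling.
    return subprocess.run(
        ["bash", installer_path],
        env=build_installer_env(max_processors),
        check=True,
    )
```

Passing the value through the child process environment, rather than changing `os.environ` globally, keeps the limit scoped to the installer call.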

For a sense of scale, we caught this on a WLCG Tier 1 WN with 256 cores that got allocated mostly to one VO. Each of the single-core pilots tried to use 256 child processes; each pilot quickly ground to a halt due to resource and fork bomb protection, which caused each new pilot to also immediately get stuck on nproc limits and similar safeguards.

maxfischer2781 mentioned this issue May 31, 2023

Limit Mamba concurrency during installation #190

Merged

fstagni closed this as completed in #190 Jun 12, 2023
