Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ConditionalTasks make the HLT jobs take minutes to start #38725

Closed
fwyzard opened this issue Jul 13, 2022 · 10 comments · Fixed by #38730
Closed

Using ConditionalTasks make the HLT jobs take minutes to start #38725

fwyzard opened this issue Jul 13, 2022 · 10 comments · Fixed by #38730

Comments

@fwyzard
Copy link
Contributor

fwyzard commented Jul 13, 2022

While testing the latest HLT menu /online/collisions/2022/2e34/v1.2/HLT/V5 with CMSSW_12_4_2 and the 124X_dataRun3_HLT_v4 global tag, I noticed that it takes a few minutes longer than usual to start.
Adding the Tracer service shows:

13-Jul-2022 16:16:40.50 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias1' id = 5619
13-Jul-2022 16:16:40.50 CEST  ++++ starting: constructing module with label 'hltOutputPhysicsZeroBias2' id = 5620
13-Jul-2022 16:16:40.50 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias2' id = 5620
13-Jul-2022 16:16:40.50 CEST  ++++ starting: constructing module with label 'hltOutputPhysicsZeroBias3' id = 5621
13-Jul-2022 16:16:40.50 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias3' id = 5621
# the process seems to hang here for a few minutes
13-Jul-2022 16:19:56.15 CEST  ++ preallocate: 1 concurrent runs, 2 concurrent luminosity sections, 32 streams
13-Jul-2022 16:19:56.20 CEST  ++++ starting: destructing module with label 'hltPixelTracksTrackingRegions' id = 876
13-Jul-2022 16:19:56.20 CEST  ++++ finished: destructing module with label 'hltPixelTracksTrackingRegions' id = 876
13-Jul-2022 16:19:56.20 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersCache' id = 859
13-Jul-2022 16:19:56.20 CEST  ++++ finished: destructing module with label 'hltSiPixelClustersCache' id = 859
13-Jul-2022 16:19:56.20 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersLegacy' id = 857

Following a suggestion by @missirol , I replace all instances of ConditionalTask with a Task.
This seems to solve the long startup time:

13-Jul-2022 16:13:14.34 CEST  ++++ finished: constructing module with label 'hltSiPixelRecHitsSoA' id = 5619
13-Jul-2022 16:13:14.34 CEST  ++++ starting: constructing module with label 'hltSiPixelRecHitsSoAFromGPU' id = 5620
13-Jul-2022 16:13:14.34 CEST  ++++ finished: constructing module with label 'hltSiPixelRecHitsSoAFromGPU' id = 5620
13-Jul-2022 16:13:14.34 CEST  ++++ starting: constructing module with label 'hltTrimmedPixelVertices' id = 5621
13-Jul-2022 16:13:14.34 CEST  ++++ finished: constructing module with label 'hltTrimmedPixelVertices' id = 5621
# this step now takes just a few seconds
13-Jul-2022 16:13:20.14 CEST  ++ preallocate: 1 concurrent runs, 2 concurrent luminosity sections, 32 streams
13-Jul-2022 16:13:20.18 CEST  ++++ starting: destructing module with label 'hltPixelTracksTrackingRegions' id = 5599
13-Jul-2022 16:13:20.18 CEST  ++++ finished: destructing module with label 'hltPixelTracksTrackingRegions' id = 5599
13-Jul-2022 16:13:20.18 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersCache' id = 5606
13-Jul-2022 16:13:20.18 CEST  ++++ finished: destructing module with label 'hltSiPixelClustersCache' id = 5606
13-Jul-2022 16:13:20.18 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersLegacy' id = 5609

The problem can be reproduced using the configuration files at

and the input file at

with

# set up a local area for CMSSW_12_4_2
cmsrel CMSSW_12_4_2
cd CMSSW_12_4_2
mkdir run
cd run
cmsenv

# download the configurations and input file
wget https://fwyzard.web.cern.ch/issue38725/hlt_ConditionalTask.py https://fwyzard.web.cern.ch/issue38725/hlt_Task.py https://fwyzard.web.cern.ch/issue38725/run355444_ls0003_index000000.raw

# run using ConditionalTasks
rm -rf run355444; mkdir run355444; touch run355444/fu.lock
cmsRun hlt_ConditionalTask.py

# for comparison, run using Tasks
rm -rf run355444; mkdir run355444; touch run355444/fu.lock
cmsRun hlt_ConditionalTask.py
@cmsbuild
Copy link
Contributor

A new Issue was created by @fwyzard Andrea Bocci.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Copy link
Contributor

@fwyzard could you dump your configuration somewhere accessible? That would help in profiling.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jul 13, 2022

@Dr15Jones sure, I've just updated the description of the issue with the links to the configuration and input files

@makortel
Copy link
Contributor

makortel commented Jul 13, 2022

Profiling showed that the slowdown was caused by calling some semi-heavy functions repeatedly, and with the complexity of the HLT menu then blowed up. #38730 provides a fix that reduces the the startup time close close to the job that used Tasks instead of ConditionalTask. I'll make a backport to 12_4_X as well.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jul 13, 2022

I can confirm that after cherry-picking #38730, the HLT menu using the ConditionalTasks is almost as fast as the one using the Tasks was:

13-Jul-2022 23:18:16.35 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias1' id = 5619
13-Jul-2022 23:18:16.35 CEST  ++++ starting: constructing module with label 'hltOutputPhysicsZeroBias2' id = 5620
13-Jul-2022 23:18:16.35 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias2' id = 5620
13-Jul-2022 23:18:16.35 CEST  ++++ starting: constructing module with label 'hltOutputPhysicsZeroBias3' id = 5621
13-Jul-2022 23:18:16.36 CEST  ++++ finished: constructing module with label 'hltOutputPhysicsZeroBias3' id = 5621
# this step takes again just a few seconds !
13-Jul-2022 23:18:23.65 CEST  ++ preallocate: 1 concurrent runs, 2 concurrent luminosity sections, 32 streams
13-Jul-2022 23:18:23.69 CEST  ++++ starting: destructing module with label 'hltPixelTracksTrackingRegions' id = 857
13-Jul-2022 23:18:23.69 CEST  ++++ finished: destructing module with label 'hltPixelTracksTrackingRegions' id = 857
13-Jul-2022 23:18:23.69 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersCache' id = 852
13-Jul-2022 23:18:23.69 CEST  ++++ finished: destructing module with label 'hltSiPixelClustersCache' id = 852
13-Jul-2022 23:18:23.69 CEST  ++++ starting: destructing module with label 'hltSiPixelClustersLegacy' id = 851

@makortel
Copy link
Contributor

Thanks @fwyzard for the addtional test!

@makortel
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants