Improve resource requirements for utilitarian jobs #8331
Comments
Might not be a very good idea... I've just found a merge job for the TaskChain_Relval_Multicore template that had a performance failure.
Weird. Merge jobs should all use fast copying of baskets, which is quick and should use little memory. It might be worthwhile to get a log of that job and figure out what went wrong...
Are you volunteering to look at it? :)
#8451 may be a duplicate.
You mean the other way around. :)
From #8451: make sure you also update what goes into HTCondor when you rework this.
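For context, the resource requests eventually surface as HTCondor submit commands. A minimal sketch, assuming the HTCondor Python bindings are available (the executable name is a hypothetical placeholder, not anything from WMCore):

```python
import htcondor  # HTCondor Python bindings

# Illustrative submit description mirroring the defaults discussed in
# this issue; "merge_job.sh" is a hypothetical placeholder executable.
sub = htcondor.Submit({
    "executable": "merge_job.sh",
    "request_cpus": "1",
    "request_memory": "1000MB",  # reworked values need to land here too
})
print(sub)
```

Whatever new per-job-type values come out of this rework need to be propagated into these `request_*` commands.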
So... how straightforward is it to increase the threshold to some higher value, say, 4 GB?
You don't want to do this for all such jobs. Requesting 4 GB for standard merge, cleanup, LogCollect, etc. jobs means fewer resources can run them (they wait longer to start and fewer can run at once), and it leaves fewer resources available for other jobs. If special types of utility jobs (e.g. NANOAOD merges, which aren't really standard merges) need more memory, we should request more memory just for those job types. Cleanup and LogCollect requests could probably be reduced, though.
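To make that concrete, a minimal sketch of requesting more memory only for the special job types (job-type names and values are purely illustrative, not the actual WMCore configuration):

```python
# Hypothetical per-job-type memory requests, in MB. Values are
# illustrative only; real numbers should come from measured data.
DEFAULT_REQUEST_MB = 1000

MEMORY_REQUEST_MB = {
    "Merge": 1000,         # standard fast basket copy: little memory
    "NanoAODMerge": 4000,  # not a standard merge; genuinely needs more
    "Cleanup": 500,        # candidate for reduction
    "LogCollect": 500,     # candidate for reduction
}

def memory_request_mb(job_type: str) -> int:
    """Raise the request only for job types that actually need it."""
    return MEMORY_REQUEST_MB.get(job_type, DEFAULT_REQUEST_MB)
```

This keeps standard utility jobs cheap to schedule while still accommodating the outliers.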
Thanks! That's what we want.
For the record, several of the
Hi,
Note that the NanoAOD merge issues are really a ROOT bug -- and they affect how effectively these files can be read by users. See:
For the other merge jobs: are we really seeing memory limits, or are we simply snapshotting
Brian
I suggest we first update the watchdog to use PSS instead of RSS. Then we collect data for a couple of months and set reasonable resource requirements for those utilitarian jobs, in order to minimize resource wastage.
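For reference, a minimal sketch of what a PSS-based check could look like on Linux (the /proc interfaces are standard Linux; the threshold and function names are illustrative):

```python
import re

def _sum_kb(path: str, field: str) -> int:
    """Sum all '<field>: N kB' lines in a /proc file. smaps has one
    entry per mapping; smaps_rollup and status have a single one."""
    total = 0
    with open(path) as f:
        for line in f:
            m = re.match(rf"{field}:\s+(\d+)\s+kB", line)
            if m:
                total += int(m.group(1))
    return total

def pss_kb(pid: int) -> int:
    # smaps_rollup (Linux >= 4.14) is cheaper to read; fall back to
    # summing the per-mapping Pss entries on older kernels.
    try:
        return _sum_kb(f"/proc/{pid}/smaps_rollup", "Pss")
    except FileNotFoundError:
        return _sum_kb(f"/proc/{pid}/smaps", "Pss")

def rss_kb(pid: int) -> int:
    return _sum_kb(f"/proc/{pid}/status", "VmRSS")

# Illustrative watchdog threshold, matching the ~2.3 GB mentioned below.
MAX_PSS_KB = int(2.3 * 1024 * 1024)

def over_limit(pid: int) -> bool:
    return pss_kb(pid) > MAX_PSS_KB
```

Unlike RSS, PSS charges shared pages proportionally across processes, so it is a fairer measure for multi-process jobs.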
For cleanup, logcollect, and merge jobs: by default they use 1 core, request 1 GB of RAM, and have a MaxRSS watchdog set to ~2.3 GB. We should check ES data and possibly lower these requirements to make better use of the resources.
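As a starting point for that check, a hedged sketch of pulling peak-memory percentiles per job type out of ES (the endpoint, index, and field names here are invented for illustration; the real monitoring schema will differ):

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint and schema: the index "wmarchive-jobs" and the
# fields "job_type" / "peak_rss_kb" are hypothetical names.
es = Elasticsearch(["http://localhost:9200"])

resp = es.search(index="wmarchive-jobs", body={
    "size": 0,
    "aggs": {
        "by_type": {
            "terms": {"field": "job_type"},
            "aggs": {
                "peak_rss_p95": {
                    "percentiles": {"field": "peak_rss_kb", "percents": [95]}
                }
            },
        }
    },
})

# A 95th-percentile peak per job type gives a defensible new request.
for bucket in resp["aggregations"]["by_type"]["buckets"]:
    print(bucket["key"], bucket["peak_rss_p95"]["values"]["95.0"])
```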