
Improve resource requirements for utilitarian jobs #8331

Open
amaltaro opened this issue Nov 15, 2017 · 12 comments

@amaltaro
Contributor

For cleanup, logcollect and merge jobs.
By default, they use 1 core, request 1GB of RAM and have a MaxRSS watchdog set to ~2.3GB.

We should check the ES data and maybe lower these requirements for better usage of resources.
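
To get an idea of the actual footprint, something like the ES aggregation sketched below could answer this. The index name, field names and endpoint are assumptions for illustration only, not the real monitoring schema:

    # Hedged sketch: aggregate peak RSS per job type from a job-monitoring
    # Elasticsearch index. "wmarchive-jobs", "jobtype" and "PeakValueRss"
    # are made-up names standing in for whatever the real schema uses.
    import json
    import requests

    ES_URL = "http://localhost:9200/wmarchive-jobs/_search"  # assumed endpoint

    query = {
        "size": 0,
        "query": {"terms": {"jobtype": ["Merge", "Cleanup", "LogCollect"]}},
        "aggs": {
            "by_jobtype": {
                "terms": {"field": "jobtype"},
                "aggs": {
                    "peak_rss": {
                        "percentiles": {"field": "PeakValueRss",
                                        "percents": [50, 95, 99]}
                    }
                },
            }
        },
    }

    resp = requests.post(ES_URL, data=json.dumps(query),
                         headers={"Content-Type": "application/json"})
    for bucket in resp.json()["aggregations"]["by_jobtype"]["buckets"]:
        print("%s: %s" % (bucket["key"], bucket["peak_rss"]["values"]))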

@amaltaro amaltaro self-assigned this Nov 15, 2017
@amaltaro amaltaro added this to the WMAgent1801 milestone Nov 15, 2017
@amaltaro
Contributor Author

Might not be a very good idea... I've just found a merge job for the TaskChain_Relval_Multicore template that had a performance failure:

    PerformanceError
        PerformanceKill (Exit Code: 50660)

            Error in CMSSW step cmsRun1
            Number of Cores: None
            Job has exceeded maxRSS: 2355.2
            Job has RSS: 2425

@hufnagel
Member

Weird. Merge jobs should all use fast-copy of baskets, which is fast and should use little memory. Might be worthwhile to get a log of that job and figure out what went wrong...
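
For reference, the merge itself should amount to little more than the PyROOT sketch below (file names are placeholders); if I understand the fast cloning correctly, baskets get copied as-is rather than being unpacked event by event, which is why memory should stay low:

    # Rough sketch of a plain ROOT merge via TFileMerger; file names are
    # placeholders. When fast cloning applies, compressed baskets are copied
    # directly into the output file.
    import ROOT

    merger = ROOT.TFileMerger(False)          # no local staging of inputs
    merger.OutputFile("Merged.root")
    for name in ["input_1.root", "input_2.root", "input_3.root"]:
        merger.AddFile(name)
    ok = merger.Merge()
    print("merge succeeded" if ok else "merge failed")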

@amaltaro
Contributor Author

Are you volunteering yourself to look at it? :)

@ticoann ticoann modified the milestones: WMAgent1801, WMAgent1802 Feb 12, 2018
@vlimant
Contributor

vlimant commented Feb 13, 2018

#8451 may be a duplicate

@amaltaro
Contributor Author

You mean the other way around :)

@vlimant
Contributor

vlimant commented Feb 13, 2018

From #8451: make sure you also update what goes into HTCondor when you rework this.

@thongonary

So... how straightforward is it to increase the threshold to some higher value, say, 4GB?

@hufnagel
Member

hufnagel commented Mar 2, 2018

You don't want to do this for all such jobs. Requesting 4GB for standard merge, cleanup, logcollect etc. jobs means you have fewer resources that can run them (you wait longer to run them and can run fewer of them), and you leave fewer resources available for other jobs.

If special types of utility jobs (i.e. NANOAOD merges that aren't really standard merges) need more memory, we should request more memory just for these special types of jobs.

Cleanup and LogCollect could probably be reduced though.
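
Just to put rough numbers on the tradeoff, a back-of-the-envelope sketch (the 64 GB / 32-core node is hypothetical, and it assumes memory is the binding constraint):

    # Rough illustration of the scheduling cost of over-requesting memory:
    # how many 1-core utility jobs fit on a hypothetical 64 GB / 32-core node.
    NODE_MEMORY_GB = 64
    NODE_CORES = 32

    for request_gb in (1, 2, 4):
        slots = min(NODE_CORES, NODE_MEMORY_GB // request_gb)
        print("request %d GB -> at most %d such jobs per node" % (request_gb, slots))

    # request 1 GB -> at most 32 such jobs per node
    # request 2 GB -> at most 32 such jobs per node
    # request 4 GB -> at most 16 such jobs per node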

@thongonary

If special types of utility jobs (i.e. NANOAOD merges that aren't really standard merges) need more memory, we should request more memory just for these special types of jobs.

Thanks! That's what we want.

@amaltaro
Contributor Author

amaltaro commented Mar 3, 2018

For the record, several of the Task getter/setter methods don't touch "utilitarian" jobs. Right now we cannot change resource requirements for such jobs, and if we want to support updates to those tasks too, that's going to be tricky and likely ugly for the assigner/unified side (the only way I see memory updates working without causing issues in other tasks would be specifying every single task and its Memory requirement).
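
To illustrate what I mean, the only shape I can see working is something like the sketch below, where every single task carries its own Memory value at assignment time. Task names and the dict-valued "Memory" convention are made up for illustration; this is not the current assignment schema:

    # Hypothetical per-task memory specification at assignment time.
    # Task names and the "Memory"-as-a-dict convention are illustrative only.
    assign_args = {
        "Memory": {
            "Task1": 4000,                    # main processing task
            "Task1MergeNANOAODSIM": 3000,     # special (heavier) merge
            "Task1CleanupUnmerged": 500,      # utilitarian tasks, lowered
            "Task1LogCollect": 500,
        }
    }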

@bbockelm
Contributor

bbockelm commented Mar 4, 2018

Hi,

Note that the NanoAOD merge issues are really a ROOT bug -- and affect how effectively these files can be read by users. See:

cms-sw/cmssw#22445

For the other merge jobs - are we really seeing memory limits, or are we simply snapshotting cmsRun when it forks? The watchdog should be using PSS, not RSS, in the end.
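
To make the RSS vs PSS distinction concrete, here is a small sketch that reads both from /proc/<pid>/smaps (plain Linux accounting, nothing WMCore-specific). Summing PSS over a forked cmsRun tree splits shared pages among the processes instead of counting them once per fork:

    # Compare RSS and PSS for one process by parsing /proc/<pid>/smaps.
    # PSS divides each shared page by the number of processes mapping it,
    # so it avoids double-counting memory shared across a forked process tree.
    import os

    def smaps_totals(pid):
        """Return (rss_kb, pss_kb) summed over all mappings of `pid`."""
        rss = pss = 0
        with open("/proc/%d/smaps" % pid) as fh:
            for line in fh:
                if line.startswith("Rss:"):
                    rss += int(line.split()[1])
                elif line.startswith("Pss:"):
                    pss += int(line.split()[1])
        return rss, pss

    rss_kb, pss_kb = smaps_totals(os.getpid())
    print("RSS=%d kB  PSS=%d kB" % (rss_kb, pss_kb))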

Brian

@ticoann ticoann modified the milestones: WMAgent1804, WMAgent1805 Apr 10, 2018
@ticoann ticoann modified the milestones: WMAgent1805, WMAgent1809 Aug 28, 2018
@ticoann ticoann modified the milestones: WMAgent1809, WMAgent1904 Dec 26, 2018
@amaltaro
Contributor Author

I suggest we first update the watchdog to use PSS instead of RSS. Then we collect data for a couple of months and give those utilitarian jobs reasonable resource requirements in order to minimize resource wastage.
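
Roughly, the watchdog side of that change would look like the sketch below; the threshold, the poll interval and the signal used to stop the payload are placeholders, not the actual PerformanceMonitor code or defaults:

    # Hypothetical PSS-based watchdog loop. MAX_PSS_MB, POLL_SECONDS and the
    # signal choice are placeholders for illustration only.
    import os
    import signal
    import time

    MAX_PSS_MB = 2355.2      # placeholder, mirroring today's maxRSS threshold
    POLL_SECONDS = 300

    def pss_mb(pid):
        """Sum the Pss: lines of /proc/<pid>/smaps, in MB."""
        kb = 0
        with open("/proc/%d/smaps" % pid) as fh:
            for line in fh:
                if line.startswith("Pss:"):
                    kb += int(line.split()[1])
        return kb / 1024.0

    def watch(pid):
        while True:
            try:
                used = pss_mb(pid)
            except (IOError, OSError):
                return                        # payload already finished
            if used > MAX_PSS_MB:
                os.kill(pid, signal.SIGUSR2)  # placeholder "stop now" signal
                return
            time.sleep(POLL_SECONDS)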
