Update the spec parameters (MaxMemory, etc) for running workflow. #8622

Open
ticoann opened this issue May 18, 2018 · 9 comments

Comments

@ticoann
Contributor

ticoann commented May 18, 2018

As discussed with Alan,

  1. Add a timestamp in the ReqMgr CouchDB record when the spec file is updated.
  2. Add a new WMBS table with the workflow id and the update timestamp.
  3. Have the job updater compare the timestamp in WMBS (the table above) with the one in ReqMgr2 CouchDB;
    if the ReqMgr2 CouchDB record is newer, update the specs on disk (JobCache, sandbox) (see the sketch after this list).
  4. Update the WMBS table.
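
A minimal sketch of that timestamp comparison, with plain dicts standing in for the ReqMgr2 CouchDB record and the new WMBS table (none of the names below are existing WMCore code):

```python
# Self-contained sketch of the updater logic above; the dicts stand in for
# the ReqMgr2 CouchDB record (step 1) and the new WMBS table (step 2).
# All names and values are made up for illustration.

reqmgr_spec_timestamp = {"pdmvserv_task_EXAMPLE": 1526630000}  # ReqMgr2 CouchDB (step 1)
wmbs_spec_timestamp = {"pdmvserv_task_EXAMPLE": 1526540000}    # new WMBS table (step 2)


def refresh_local_spec(workflow):
    """Placeholder for re-fetching and unpacking the spec on disk (JobCache, sandbox)."""
    print("Refreshing on-disk spec for %s" % workflow)


def sync_workflow_spec(workflow):
    """Steps 3 and 4: refresh the local spec if ReqMgr2 has a newer one."""
    couch_ts = reqmgr_spec_timestamp.get(workflow)
    wmbs_ts = wmbs_spec_timestamp.get(workflow)
    if couch_ts is not None and (wmbs_ts is None or couch_ts > wmbs_ts):
        refresh_local_spec(workflow)              # step 3
        wmbs_spec_timestamp[workflow] = couch_ts  # step 4


sync_workflow_spec("pdmvserv_task_EXAMPLE")
```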
@amaltaro
Contributor

See #8646 for further details

@amaltaro amaltaro modified the milestones: WMAgent1806, WMAgent1807 Jun 14, 2018
@ticoann ticoann modified the milestones: WMAgent1807, WMAgent1809 Aug 28, 2018
@amaltaro amaltaro modified the milestones: WMAgent1809, WMAgent1810 Oct 1, 2018
@amaltaro amaltaro modified the milestones: WMAgent1810, WMAgent1902 Jan 7, 2019
@amaltaro
Contributor

Given that we initially thought about these changes more on the resource-requirements side (which can then be extended to site lists, etc.), it would be interesting to know how Unified does the workflow/job tweaking in order to better use grid resources.

@vlimant can you give us a brief explanation of how it's done in Unified (live resource-requirements updates)? Which services/APIs are used, and are all workflows under this monitoring, or only those explicitly configured for it?
Trying to evaluate how much we'd gain by implementing this in WMCore...

@amaltaro
Contributor

amaltaro commented Feb 5, 2019

@vlimant I'm planning to work on this ticket in the coming weeks, unless you think the Unified mechanism is good enough and we don't need it. So your input and answers to the questions I asked above would be highly appreciated.

@vlimant
Contributor

vlimant commented Feb 5, 2019

@amaltaro there are several other candidates for Unified integration already (#8914, #8921, #8920, #8324, ...); I believe those are the ones we put together as the first step of the integration.

The mechanism for ClassAd tweaking in Unified is all in https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/equalizor.py and depends on gwmsmon (although everything can be retrieved from ES directly). It will likely require further documentation of what exactly is done.
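
For reference, the kind of per-job ClassAd tweak such a script can apply through the HTCondor Python bindings is sketched below; the constraint, attribute and value are illustrative, not taken from equalizor.py:

```python
# Illustrative only: per-job ClassAd editing via the HTCondor Python bindings.
# The workflow name, attribute and new value are examples, not real settings.
import htcondor

schedd = htcondor.Schedd()  # local schedd; use Collector().locate() for a remote one

# Find idle jobs (JobStatus == 1) of one workflow.
idle_jobs = schedd.query(
    'JobStatus == 1 && WMAgent_RequestName == "pdmvserv_task_EXAMPLE"',
    ["ClusterId", "ProcId", "RequestMemory"],
)

# Edit them one at a time, which is how per-job tweaking works today.
for job in idle_jobs:
    job_id = "%d.%d" % (job["ClusterId"], job["ProcId"])
    schedd.edit([job_id], "RequestMemory", "4000")
```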

@amaltaro
Contributor

amaltaro commented Feb 5, 2019

Ok, none of the issues you pointed out are straightforward, but eventually we have to get them started...
If you can get this equalizor properly documented, it will certainly be helpful in the near future.

@amaltaro amaltaro modified the milestones: WMAgent1902, WMAgent1905 Feb 5, 2019
@sharad1126

sharad1126 commented Feb 5, 2020

@amaltaro, according to a short discussion with James this morning, these computations (memory tuning) are a little expensive and increase the load on the schedd. He mentioned that @todor-ivanov tried implementing something like this on the CRAB3 schedds and it made them slower. So the best place to implement this might be directly at the condor level (probably in a schedd attached to the negotiator), which could be a feature request to the condor developers. Maybe we can ask about this in the next condor developers meeting. @dpiparo FYI

@bbockelm
Contributor

bbockelm commented Feb 5, 2020

@sharad1126 - I’m not sure that comment makes much sense. Without knowing the exact thing Alan is planning, it could be almost no load - or very expensive.

In fact, if done right, this could be much more efficient than the current system because one could affect all idle jobs in a single transaction instead of doing it one-by-one (like Unified does today).
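
To illustrate the difference, with the Python bindings the bulk update would be a single constraint-based edit rather than a per-job loop (workflow name, attribute and value are again just examples):

```python
# Sketch of a single constraint-based edit: the schedd updates every matching
# idle job itself, instead of the client looping over them one by one.
import htcondor

schedd = htcondor.Schedd()
schedd.edit(
    'JobStatus == 1 && WMAgent_RequestName == "pdmvserv_task_EXAMPLE"',
    "RequestMemory",
    "4000",
)
```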

@sharad1126

@bbockelm I discussed this with @amaltaro and then with James Letts, and James told me exactly what I mentioned in the comment above. Of course it is a good idea to get this done, as it would help us make the system more efficient.

@amaltaro
Contributor

amaltaro commented Feb 7, 2020

What I have in mind is actually an update of the workflow spec file, such that jobs still waiting in the global workqueue (or waiting for the agent job splitting) could use the up-to-date parameters, thus stopping the usage of the JobRouter.

In the next phase of this tuning, we could also update jobs pending in the local condor queue (basically the same process as done for RequestPriority/JobPrio).

I believe those two approaches are not tightly coupled and can be delivered in different stages.
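
For concreteness, here is a very rough sketch of what the first phase could touch on the ReqMgr2/CouchDB side (the host, database and field names are assumptions, and the spec file itself is a WMWorkload attachment that would need its own update step):

```python
# Hypothetical sketch of phase one: bump a resource parameter and record the
# update time in the request document in CouchDB. Host, database and field
# names are assumptions, not actual ReqMgr2 schema.
import time
import requests

COUCH_URL = "http://localhost:5984/reqmgr_workload_cache"  # assumed database
workflow = "pdmvserv_task_EXAMPLE"

doc = requests.get("%s/%s" % (COUCH_URL, workflow)).json()  # includes current _rev
doc["Memory"] = 4000                                        # new memory requirement
doc["SpecUpdatedTimestamp"] = int(time.time())              # hypothetical field from the plan above

resp = requests.put("%s/%s" % (COUCH_URL, workflow), json=doc)
resp.raise_for_status()
```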
