
Revisit GPUMemoryMB, CUDARuntime and CUDACapabilities at workflow and job matchmaking #11595

Open
amaltaro opened this issue May 17, 2023 · 16 comments · May be fixed by #11689

Comments

@amaltaro
Contributor

amaltaro commented May 17, 2023

Impact of the new feature
WMAgent (but perhaps ReqMgr2 for multi-step workflows, aka StepChain)

Is your feature request related to a problem? Please describe.
This is related to adding GPU support to StepChain workflows, tracked in this ticket:
#10401

After discussing scenarios and use cases for multi-step GPU workflows (in this thread, and in a couple of messages that follow it), it was pointed out that the GPU job description and matchmaking should evolve along with the latest GPU developments.

For the record, here is an example of how it can be currently configured:

{"GPUMemoryMB": 123, "CUDARuntime": "11.2", "CUDACapabilities": ["11.2", "11.4"]}

Describe the solution you'd like
This ticket requires two types of solutions: one at the workflow/job description level, and a second at the job matchmaking level.

Starting with the resource provisioning and job matchmaking, here are the required changes:

  1. CUDARuntime: the pilot needs to support all of the CUDA runtimes requested by the job (i.e. the job CUDARuntime is a subset of the pilot CUDARuntime).
  2. CUDACapabilities: the pilot needs to support the same or newer CUDA capabilities as requested by the job (i.e. min(job CUDACapabilities) <= max(pilot CUDACapabilities)).
  3. GPUMemoryMB: does not change; the job GPUMemoryMB needs to be smaller than or equal to the pilot GPUMemoryMB.
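The three matchmaking rules above can be sketched as follows (illustrative Python with hypothetical helper names; the real matching happens via GlideinWMS/HTCondor ClassAds, not Python):

```python
def runtime_ok(job_runtimes, pilot_runtimes):
    # Rule 1: every CUDARuntime requested by the job must be supported
    # by the pilot (the job set is a subset of the pilot set).
    return set(job_runtimes) <= set(pilot_runtimes)

def capability_ok(job_capabilities, pilot_capabilities):
    # Rule 2: the pilot must offer the same or a newer capability than
    # the smallest one requested by the job. Compare versions as tuples
    # of integers so that e.g. "8.13" sorts above "8.2".
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return min(map(as_tuple, job_capabilities)) <= max(map(as_tuple, pilot_capabilities))

def memory_ok(job_gpu_memory_mb, pilot_gpu_memory_mb):
    # Rule 3: the job GPUMemoryMB must fit in the pilot's GPU memory.
    return job_gpu_memory_mb <= pilot_gpu_memory_mb
```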

For multi-step job workflows (StepChain and maybe PromptReco(?)), here is how a job needs to be described by the agent:
4) CUDARuntime: currently a Python string with the version. It looks like it needs to become a comma-separated string. TODO: are we able to do a comma-separated or list comparison in GlideinWMS and ensure that each element in one list is present in the other?
5) CUDACapabilities: currently a list of Python strings. TODO: given that the pilot/node only needs to have an equal or newer version, should we revisit it and make it a plain string with the version? Are we able to do such a comparison in GlideinWMS?
6) GPUMemoryMB: has to be the maximum over all the steps.
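A hypothetical sketch (not actual WMAgent code) of how the agent could combine per-step GPU requirements into one job-level description, following items 4-6 above; for item 5 it assumes the plain-string form discussed later in this thread:

```python
def aggregate_gpu_requirements(steps):
    """`steps` is a list of dicts shaped like the example above, e.g.
    {"GPUMemoryMB": 123, "CUDARuntime": "11.2", "CUDACapabilities": ["7.5"]}."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    # Item 4: every runtime requested by any step, as a comma-separated string.
    runtimes = sorted({step["CUDARuntime"] for step in steps}, key=as_tuple)
    # Item 5: max(min(step1), min(step2), ...) over the capability versions.
    capability = max(
        (min(step["CUDACapabilities"], key=as_tuple) for step in steps),
        key=as_tuple,
    )
    # Item 6: the maximum GPU memory over all the steps.
    memory = max(step["GPUMemoryMB"] for step in steps)
    return {
        "GPUMemoryMB": memory,
        "CUDARuntime": ",".join(runtimes),
        "CUDACapabilities": capability,
    }
```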

Describe alternatives you've considered
Further discussion with Core and GlideinWMS still required to finalize these requirements.

Additional context
Documentation: gpu-parameter-specification

and initial development made in this PR #10388

@mmascher
Member

Hi Alan, for the "resource provisioning" step, aka the first matchmaking, we are only asking WMAgent to set requiresGPUs. Providing all the attributes is not feasible without significant effort, and it brings little benefit IMHO.

For the job (second) matchmaking we can use any of the attributes here: https://monit-grafana.cern.ch/d/2qoPfS0Mz/cms-submission-infrastructure-gpus-monitor?orgId=11

My preference would be to add a Requirement in WMAgent. Of course we can support you in building that, IIRC there should be a way to test an expression locally. Let me do some tests.

@amaltaro
Contributor Author

Thank you for the prompt feedback, Marco!
For the resource provisioning, I think we can keep it as is and gather further experience with this setup.

For the job matchmaking, my concerns in terms of job classad are:
i) can we have set-based tests (apparently needed for CUDARuntime)?
ii) how do we implement "greater than or equal to" with version-like values? Their expected regex is linked under "Additional context"

@mmascher
Member

mmascher commented May 17, 2023

Thank you for the prompt feedback, Marco! For the resource provisioning, I think we can keep it as is and gather further experience with this setup.

For the job matchmaking, my concerns in terms of job classad are:
i) can we have set-based tests (apparently needed for CUDARuntime)?

Maybe we can use stringListMember? It works like stringListMember("A", "A,B,C") and checks whether "A" is in the comma-separated list "A,B,C". The arguments can be classad expressions.
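For illustration only, the stringListMember semantics described above can be mimicked in Python (a sketch, not the actual ClassAd implementation):

```python
def string_list_member(item, str_list, delim=","):
    # Pure-Python illustration of HTCondor's stringListMember semantics:
    # true if `item` equals one of the delimiter-separated members.
    return item in [member.strip() for member in str_list.split(delim)]
```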

ii) how do we implement "greater than or equal to" with version-like values? Their expected regex is linked under "Additional context"

Good question. Condor supports the split() function:

[mmascher@vocms0802 cmsgwms-frontend-configurations]$ condor_status -limit 1 -af GLIDEIN_SINGULARITY_BINDPATH
/storage,/lfs_roots,/cms,/etc/cvmfs/SITECONF,/ceph,/cvmfs/grid.cern.ch/etc/grid-security:/etc/grid-security
[mmascher@vocms0802 cmsgwms-frontend-configurations]$ condor_status -limit 1 -af 'split(GLIDEIN_SINGULARITY_BINDPATH)[2]'
/cms

And also int(). I fear we'll need to write an ad-hoc expression.

@aperezca

Hi Alan, when you describe this:

For multi-step job workflows (StepChain and maybe PromptReco(?)), here is how a job needs to be described by the agent:
4) CUDARuntime: currently a Python string with the version. It looks like it needs to become a comma-separated string. TODO: are we able to do a comma-separated or list comparison in GlideinWMS and ensure that each element in one list is present in the other?
5) CUDACapabilities: currently a list of Python strings. TODO: given that the pilot/node only needs to have an equal or newer version, should we revisit it and make it a plain string with the version? Are we able to do such a comparison in GlideinWMS?
6) GPUMemoryMB: has to be the maximum over all the steps.

are you implying that more than one of the steps will be ready to use the GPUs? Is this the case we are trying to cover here?

@amaltaro
Contributor Author

That's correct, Antonio. It's common to have workflows using different CMSSW releases, and those can be compiled with different CUDA support; hence we need to ensure that the job description and the worker node can satisfy all of the different cmsRun requirements/capabilities.

@belforte
Member

Please let the CRAB people (esp. @novicecpp) know in case this ends up in some non-backward-compatible format specification in the current classAds (e.g. from string to list).

@amaltaro
Contributor Author

From further discussion in mattermost:

  • Looking into CUDACapabilities: IF glideinWMS is able to do the matchmaking based on "greater than or equal to", then my understanding is that it does not really need to be a list of capabilities. We simply need to use the smallest capability (version) and do the matchmaking based on that.
    • pseudo-code for a multi-step job would be: max(min(from step1), min(from step2), ...)
  • Andrea B. confirms that supporting multiple CUDARuntime within the same worker node is realistic. So, multi step jobs can indeed request multiple CUDARuntime during matchmaking.

@mmascher @mambelli Hi Marco and Marco, I wonder if you can suggest how we can support a more complex job matchmaking in glideinWMS? A summary from what has been discussed above is:

  1. CUDACapabilities: we need a ">=" comparison of the requested vs. provided resource.
  2. CUDARuntime: for multi-step case, we need to ensure that every CUDARuntime version requested by the job is also supported by the resource/pilot.

Or please let us know if you would have any other suggestions that were not yet mentioned here.

@amaltaro
Contributor Author

@mmascher Marco, now that the requirements are clearer (see the comment above), I wonder if you have any recommendations on how to define these in terms of job classads?

@amaltaro
Contributor Author

amaltaro commented Aug 3, 2023

Updating this issue with discussions that happened mostly in mattermost (SI and GPU Developments).

Regarding the two glideinWMS questions raised above, here is further information on how that can be accomplished:

  1. CUDACapabilities: we need a ">=" comparison of the requested vs. provided resource.

We can split each part of the version number and compare each of them (e.g. below for a version with major.medium.minor values):

int(split(v3,".")[0])<=int(split(v4,".")[0]) && int(split(v3,".")[1])<=int(split(v4,".")[1]) && int(split(v3,".")[2])<=int(split(v4,".")[2])' # for v3<=v4

and an example to test such expression can be seen in [1].
In addition, I was also asking about the format of the CUDA capabilities, which so far has been two decimal digits separated by a dot (\d\.\d). Andrea B. confirms that this is what we foresee as well (if it ever changes, the classad expression would have to change accordingly).

For the second question:

  1. CUDARuntime: for multi-step case, we need to ensure that every CUDARuntime version requested by the job is also supported by the resource/pilot.

Marco suggests adopting the stringListSubsetMatch classad function; an example can be seen in [2]. The only problem with this approach is that - as confirmed with Jaime from HTCondor - this feature was added in HTCondor 10.0.6, while the condor version on the CERN and FNAL schedds is 10.0.1. It's not clear to me which parts of the SI layer would have to be upgraded to support it (negotiator plus schedds?).
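For illustration, the stringListSubsetMatch behavior (available from HTCondor 10.0.6 per the above) can be emulated in Python as a sketch:

```python
def string_list_subset_match(subset_list, superset_list, delim=","):
    # Pure-Python illustration of stringListSubsetMatch semantics:
    # true if every member of `subset_list` appears in `superset_list`.
    to_set = lambda s: {member.strip() for member in s.split(delim)}
    return to_set(subset_list) <= to_set(superset_list)
```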

[1] from Marco M.

[mmascher@lxplus7103 ~]$ cat classad_file 
v1 = "1.1.1";
v2 = "2.1.1";
v3 = "1.1.2";
v4 = "1.1.12";
v5 = "1.12.1";
v6 = "1.1.0";
v7 = "1";
v8 = "2";
v9 = "1.32.7";
v10 = "2.0";
[mmascher@lxplus7103 ~]$ classad_eval -file classad_file 'int(split(v3,".")[0])<=int(split(v4,".")[0]) && int(split(v3,".")[1])<=int(split(v4,".")[1]) && int(split(v3,".")[2])<=int(split(v4,".")[2])' # for v3<=v4
[ v1 = "1.1.1"; v2 = "2.1.1"; v3 = "1.1.2"; v4 = "1.1.12"; v5 = "1.12.1"; v6 = "1.1.0"; v7 = "1"; v8 = "2"; v9 = "1.32.7"; v10 = "2.0" ]
true

[2]

$ classad_eval -file alan 'stringListSubsetMatch(list3,list4)'
[ list1 = "1.2.3,2.3.4"; list2 = "1.2.3,2.3.4,3.4.5"; list3 = { "1.2.3","2.3.4" }; list4 = { "1.2.3","2.3.4","3.4.5" } ]
error

With this information, I think the development of this feature can be resumed and in parallel we discuss possible upgrades of the SI infrastructure.

@fwyzard

fwyzard commented Aug 3, 2023

We can split each part of the version number and compare each of them (e.g. below for a version with major.medium.minor values):

The capabilities are always of the format a.b.
Can you just compute a * 10 + b and compare based on that?

@amaltaro
Contributor Author

amaltaro commented Aug 3, 2023

Can you just compute a * 10 + b and compare based on that?

If the CUDA capability is always in the format \d.\d, then yes, that would work.
But I feel we should be prepared for it to be in the format \d+\.\d+, in which case it could fail with an example like 9.2 vs 8.13.
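A quick illustrative check of why the a * 10 + b encoding breaks down if the minor field ever grows beyond one digit:

```python
def encode(version):
    # Encode an "a.b" capability as a * 10 + b, as proposed above.
    a, b = (int(x) for x in version.split("."))
    return a * 10 + b

# Works for single-digit fields: 8.6 < 9.0 (encoded 86 < 90).
ok = encode("8.6") < encode("9.0")
# Breaks for a two-digit minor: 8.13 encodes to 93, which wrongly
# compares as newer than 9.2 (encoded 92).
broken = encode("8.13") > encode("9.2")
```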

@fwyzard

fwyzard commented Aug 3, 2023

All NVIDIA documentation about CUDA describes it as x.y.

It might reasonably become xx.y in the future, but I doubt the second field will ever get a second digit, because many macros are already defined as x * 10 + y.

If it ever does change in a non-backward-compatible way, we can always update the comparison code, no?

@klannon

klannon commented Aug 21, 2023

@amaltaro Should this really be marked as "Waiting" if a PR was linked to this issue this past week? Seems more "in progress" to me.

@amaltaro
Contributor Author

@klannon development of this feature should be complete, apart from further testing. We now have a dependency on the Submission Infrastructure team to upgrade HTCondor. I have not yet contacted them because, from Mattermost, I understand that most of the SI team is on vacation.

@amaltaro
Contributor Author

Now that people are back from vacation season, here is a ticket to address the required developments at the glideinWMS layer:
https://its.cern.ch/jira/browse/CMSSI-79

@amaltaro
Contributor Author

Just a note: We decided not to pull this ticket into 2023 Q4, as there are dependencies on the SI infrastructure and it's not clear when those will be implemented.

Projects
Status: Waiting

6 participants