Provide Campaign names via job classads #10914

Closed
3 tasks done
amaltaro opened this issue Dec 13, 2021 · 17 comments

Comments

@amaltaro
Contributor

amaltaro commented Dec 13, 2021

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
As recently discussed over mattermost, this script:
https://github.com/dmwm/cms-htcondor-es/blob/vm-legacy/src/htcondor_es/convert_to_json.py

has some logic embedded in order to figure out the campaign name out of the request name.

Instead of parsing the request name, we should actually define new classads and provide the right information in a job-basis.

Describe the solution you'd like
Complete the following tasks:
For Campaign names, support it with RERECO, TaskChain and StepChain workflows:

Describe alternatives you've considered
None

Additional context

@mrceyhun
Contributor

Thanks a lot @amaltaro

@mrceyhun
Contributor

mrceyhun commented Oct 5, 2022

Hi @amaltaro @haozturk

In the CMS Monitoring classad conversion code base, Campaign is extracted from WMAgent_RequestName. As I understood from Hasan, this is not 100% correct: a request has subtasks, and these subtasks may have different campaign names. Therefore, if we can assume that each WMAgent_SubTaskName has a single campaign name, we can use it to extract campaign names.

For example, for this request name: cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460, see in Kibana. A subtask of this request is /cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460/TRK-Run3Winter22wmLHEGS-00004_0/TRK-Run3Winter22DRPremix-00001_1CleanupUnmergedALCARECOStreamTkAlMuonIsolated. Intuitively, we can extract the campaign name of this subtask as Run3Winter22DRPremix, but since its request has the campaign name Run3Winter22wmLHEGS, this subtask's campaign is also set to Run3Winter22wmLHEGS, although it should be Premix.

Another example is: pdmvserv_task_HIG-RunIISummer20UL18wmLHEGEN-03789__v1_T_220420_190817_7562, see in Kibana. A subtask of this request is /pdmvserv_task_HIG-RunIISummer20UL18wmLHEGEN-03789__v1_T_220420_190817_7562/HIG-RunIISummer20UL18wmLHEGEN-03789_0/HIG-RunIISummer20UL18SIM-03452_0/HIG-RunIISummer20UL18DIGIPremix-03433_0/HIG-RunIISummer20UL18HLT-03452_0/HIG-RunIISummer20UL18RECO-03452_0/HIG-RunIISummer20UL18RECO-03452_0MergeAODSIMoutput/HIG-RunIISummer20UL18MiniAODv2-03452_0/HIG-RunIISummer20UL18MiniAODv2-03452_0MergeMINIAODSIMoutput/HIG-RunIISummer20UL18MiniAODv2-03452_0MINIAODSIMoutputMergeLogCollect. Intuitively, we can extract the campaign name of this subtask as RunIISummer20UL18MiniAODv2, but since its request has the campaign name RunIISummer20UL18wmLHEGEN, this subtask's campaign is also set to RunIISummer20UL18wmLHEGEN, although it should be MiniAOD.

The first one is a StepChain example and the second one is a TaskChain example. In the WMCore repo, I found that classads are set in WMCore/BossAir/Plugins/SimpleCondorPlugin.py. If that is the correct place to make the necessary changes, I can add a new classad called WMAgent_Campaign, which would be extracted from WMAgent_SubTaskName. If there is a way to get the campaign name directly in WMCore, that would be better, because regex/parsing for this kind of important field should be avoided as much as possible, IMHO.
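To make the mismatch in the two examples concrete, here is a minimal sketch of both parsing approaches. The regular expression and the function names are assumptions made for illustration (a PrepID-like token of the form `<PWG>-<Campaign>-<5 digits>`); this is not the logic used in convert_to_json.py.

```python
import re

# Assumed PrepID layout: <PWG>-<Campaign>-<5 digits>, e.g. TRK-Run3Winter22DRPremix-00001.
# Illustration only; not the pattern used by the monitoring script.
PREPID_RE = re.compile(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)-(\d{5})")


def campaign_from_request(request_name):
    """Guess the campaign from the first PrepID-like token in the request name."""
    match = PREPID_RE.search(request_name)
    return match.group(2) if match else None


def campaign_from_subtask(subtask_name):
    """Guess the campaign from the last path element of the subtask name."""
    last_element = subtask_name.rstrip("/").split("/")[-1]
    match = PREPID_RE.search(last_element)
    return match.group(2) if match else None


request = "cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460"
subtask = (
    "/" + request
    + "/TRK-Run3Winter22wmLHEGS-00004_0"
    + "/TRK-Run3Winter22DRPremix-00001_1CleanupUnmergedALCARECOStreamTkAlMuonIsolated"
)

# The two guesses disagree for the same job.
print(campaign_from_request(request))
print(campaign_from_subtask(subtask))
```

Both functions still depend on naming conventions, so a subtask whose last path element does not follow the assumed layout yields no campaign at all.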

CMS_CampaignType, however, is more difficult to implement. We need to decide on the type names with the help of the PdmV folks. We can talk about this later and focus on the campaign name for now.

@amaltaro
Contributor Author

amaltaro commented Oct 5, 2022

@mrceyhun Hi Ceyhun, thanks for looking into this.
The right way to get this implemented is to actually read the PrepID from the agent database (for a given request + task name) and add this extra condor job classad in SimpleCondorPlugin. The agent should have the correct information; however, we need to check how "accessible" it is from either the JobCreator or the JobSubmitter component.

Using regex on the SubTaskName will not be 100% reliable, so I'd avoid it.
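As a rough illustration of the idea (not the actual SimpleCondorPlugin code), the sketch below builds the classad attributes for one job from values the agent already holds, instead of parsing names. The `job` dictionary keys and the function name are hypothetical; `WMAgent_Campaign` is the attribute name proposed earlier in this thread.

```python
def build_job_ad(job):
    """Return a dict of classad attributes for one job.

    `job` is a plain dict; the keys used here (request_name, task_name,
    campaign) are hypothetical and chosen only for this sketch.
    """
    ad = {
        "WMAgent_RequestName": job["request_name"],
        "WMAgent_SubTaskName": job["task_name"],
    }
    # Only publish the campaign when the agent actually knows it,
    # so consumers can tell "not provided" apart from an empty value.
    campaign = job.get("campaign")
    if campaign:
        ad["WMAgent_Campaign"] = campaign
    return ad


example_job = {
    "request_name": "cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460",
    "task_name": "/cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460/TRK-Run3Winter22wmLHEGS-00004_0",
    "campaign": "Run3Winter22wmLHEGS",
}
print(build_job_ad(example_job))
```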

@vkuznet
Contributor

vkuznet commented Oct 5, 2022

@amaltaro, I want to stop you right here: the comment that "the right way to get this implemented is to actually read the PrepID from the agent database" is not applicable to monitoring tasks. At the monitoring level nobody should read from any database, i.e. the monitoring task is to use the data provided to MONIT. If PrepID is not properly provided to MONIT, it is not an issue with monitoring, and the right way is to fix the place which should provide this info to MONIT. If I read the issue correctly, the PrepID should be provided in job classads and then monitoring can read it. Is this the case?

@mrceyhun
Contributor

mrceyhun commented Oct 5, 2022

@vkuznet I am planning to contribute to WMCore, and WMCore will provide this new classad; it will not be implemented on the cms-htcondor-es side.

@amaltaro
Contributor Author

amaltaro commented Jul 6, 2023

For the record, we had two other issues that were duplicates of this one. They are:
#11560
and
#11643

@vkuznet
Contributor

vkuznet commented Jul 12, 2023

Whoever works on this issue will need the following information:

  1. Identify the place in the WMCore codebase where classAds are set; @amaltaro, could you please provide the necessary pointers?
  2. Relate the code above to the code which handles workflow creation and assigns the workflow/request name; again, we need some pointers in WMCore.

I think once we know the two items above, it will be a trivial task to modify the codebase to add the new attributes.

@khurtado
Contributor

khurtado commented Sep 5, 2023

@amaltaro Do I understand correctly that :

  • campaignName = prepID ?

And for campaignType, do we still use the request name for this, or do we use the prepID? I assume this is the one that would probably read a dictionary with all the matching options from a config.py

@khurtado khurtado assigned khurtado and unassigned khurtado Sep 5, 2023
@amaltaro
Contributor Author

amaltaro commented Sep 5, 2023

No, campaign name and prep ID are two different things.
For the campaign name, it should be a direct map of the Campaign attribute in the workflow, e.g.:
https://cmsweb.cern.ch/reqmgr2/fetch?rid=cmsunified_task_B2G-RunIISummer20UL18wmLHEGEN-02318__v1_T_220606_220438_5006

For StepChain, I think we will want to have a comma-separated list of campaigns(?)

For the campaign type, I honestly don't feel comfortable adding code that depends on random string parsing to guess what type it is, as it has been proven over the past few years that this does not work. Unless we come up with a more solid plan, I would be in favor of leaving that for some point in the future, in its own GitHub issue.

@khurtado
Contributor

khurtado commented Sep 6, 2023

@amaltaro But even for StepChain, I see the campaign set only once, as opposed to the prepID, for example.
E.g.:
https://cmsweb.cern.ch/reqmgr2/config?name=cmsunified_task_SUS-RunIISummer20UL16SUSwmLHEGSAPV-00108__v1_T_230905_143020_2699

Am I missing something?

What seems to change is the acquisitionEra, e.g. in the example that @mrceyhun gave:
https://cmsweb.cern.ch/reqmgr2/config?name=cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460

I do see:

properties.acquisitionEra = {
'TRK-Run3Winter22DRPremix-00001_1': 'Run3Winter22DRPremix',
'TRK-Run3Winter22DRPremix-00001_0': 'Run3Winter22DRPremix',
'TRK-Run3Winter22wmLHEGS-00004_0': 'Run3Winter22wmLHEGS'
}

@amaltaro
Contributor Author

amaltaro commented Sep 7, 2023

Oh, that changes the whole development! It looks like Campaign is a property of the WMWorkload object, not of the WMTask nor the WMStep, bummer!

This comment also gives a hint of how we currently deal with campaign in the workflows:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1016

Before we complicate this development, I think we will have to reach out to CompOps to learn how they would like to see campaign information in MonIT for StepChain workflows. Is it:
a) picking one of the campaigns of the workflow (the first step?);
b) or should we indeed make a comma-separated list?

I would say b), but still, how are people going to use it in MonIT? I fail to see a straightforward way to match campaigns and create plots for that case. @khurtado
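The two options can be sketched as follows, together with one way a consumer might match a single campaign against a comma-separated value. The function names are hypothetical, and the per-step values are borrowed from the acquisitionEra mapping quoted earlier in the thread as a stand-in for per-step campaigns, which is an assumption.

```python
def first_campaign(step_campaigns):
    """Option a): report only the campaign of the first step."""
    return step_campaigns[0] if step_campaigns else None


def campaign_list(step_campaigns):
    """Option b): comma-separated list, duplicates removed, step order kept."""
    return ",".join(dict.fromkeys(step_campaigns))


def matches_campaign(classad_value, wanted):
    """Consumer-side check against a comma-separated classad value."""
    return wanted in classad_value.split(",")


# Per-step values in step order (stand-in data, see the note above).
steps = ["Run3Winter22wmLHEGS", "Run3Winter22DRPremix", "Run3Winter22DRPremix"]

print(first_campaign(steps))
print(campaign_list(steps))
print(matches_campaign(campaign_list(steps), "Run3Winter22DRPremix"))
```

With option b), a job is counted under every campaign in its list, so per-campaign totals in plots would no longer add up to the number of jobs.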

@khurtado
Contributor

khurtado commented Sep 7, 2023

@amaltaro I have converted this issue into a meta-issue. This meta-issue now has 4 issues: 3 related to campaign names and 1 for the type, which we can leave for later. I have added the first issue (RERECO) as part of Q3.

@amaltaro
Contributor Author

@khurtado Kenyi, I was going to suggest to close this meta-issue, but I see that the campaign type is still to be implemented. Given that we don't have any clear algorithm for the campaign type, should we further refactor this ticket to be only about the campaign name; and create a new meta-issue for the "campaign type"? I would go with a meta-issue because apparently we will have to discuss and agree on an algorithm for the campaign type definition (1 issue); while the implementation of the campaign type will be at least another ticket (perhaps 2, one for stepchain and one for other request types). What do you think?

One last comment on this campaign name meta-issue. Do we have a ticket for updating the condor ES script to start consuming the new classad whenever it is available (as there will be a transition phase where jobs may or may not provide this classad)? If not, then I fear we need to create it and still add it to Q3.
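For the transition phase, a consumer could prefer the new classad and fall back to the old name parsing only when it is missing. The sketch below is an assumption-laden illustration: the attribute name `WMAgent_Campaign`, the regular expression, and the `UNKNOWN` placeholder are all hypothetical and not taken from the condor ES script.

```python
import re

# Assumed PrepID layout <PWG>-<Campaign>-<5 digits>; illustration only.
PREPID_RE = re.compile(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)-(\d{5})")


def resolve_campaign(ad, classad_name="WMAgent_Campaign"):
    """Prefer the campaign classad; otherwise guess from the request name."""
    value = ad.get(classad_name)
    if value:
        return value
    # Legacy path for jobs submitted before the classad existed.
    match = PREPID_RE.search(ad.get("WMAgent_RequestName", ""))
    return match.group(2) if match else "UNKNOWN"


old_job = {"WMAgent_RequestName": "cmsunified_task_TRK-Run3Winter22wmLHEGS-00004__v1_T_220916_122602_6460"}
new_job = dict(old_job, WMAgent_Campaign="Run3Winter22DRPremix")

print(resolve_campaign(old_job))
print(resolve_campaign(new_job))
```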

@khurtado
Contributor

@amaltaro Yes, I think that would be the best.
Here is the associated PR from monit:
dmwm/cms-htcondor-es#214

@khurtado khurtado changed the title Provide Campaign name and type via job classads Provide Campaign names via job classads Sep 27, 2023
@khurtado
Contributor

khurtado commented Sep 27, 2023

@amaltaro I have made #11707 a meta-issue and created 3 issues to go with it.

I also refactored this meta-issue to remove the campaign type from it.
I think as soon as dmwm/cms-htcondor-es#214 is closed, we can close this issue as well.

@khurtado
Contributor

khurtado commented Sep 28, 2023

@amaltaro Given that the changes on WMCore are already merged and that dmwm/cms-htcondor-es#214 is only pending 1 more approval (out of 3), I am going to go ahead and close this ticket, as I do not expect any more work on it other than waiting for that PR on monit to be merged. Feel free to re-open it if you think there is anything still missing.

@amaltaro
Contributor Author

Sounds good to me. Thanks Kenyi!
