-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runtime json #11812
Add runtime json #11812
Conversation
Jenkins results:
|
cb4a8ac
to
7ce1340
Compare
@amaltaro @todor-ivanov I think this is ready for review
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khurtado thank you for providing those json dumps, Kenyi. Based on that, I have the following suggestions:
- what do you think about renaming
output_files_transient
totransient_output
? - perhaps I would rename
output_files_lfnBase
to something likeoutput_files_unmerged_base
(oroutput_files_unmerged_lfn_base
). Similarly foroutput_files_mergedLFNBase
- when a step has no input files, should we set
input_files=[]
instead ofinput_files=[null]
?
In addition, perhaps it would be useful to have another attribute for the job type as well (Production/Processing/Merge/etc)? If we have that, then we could consider logging the final json structure for Production/Processing jobs. The other job types can potentially have a very large list of input files and I'd rather not dump those in the logs.
Could you please try to create a unit test for createWMRuntimeJson
as well?
Note that I have not yet looked into the source code changes, but I will come back to this later or whenever you provide further updates to this PR.
Jenkins results:
|
I made the following changes:
What is missing is the unit test, which I can come back to later when I'm back. In the meantime, I think this is ready for review again. The log will look like this:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the prompt action, Kenyi.
I left a few more comments along the code, and I would suggest the following as well:
- given that we are loading the job information and doing other things, I think it would be beneficial to actually time the time spent creating this runtime dump. There are a few ways to do so, but I can easily point you to one that has been adopted in other places of WMCore, with
CodeTimer
, e.g.: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSTransferor/RequestInfo.py#L108 - please have a look at the jenkins/pylint summary as well, there might be some easy things that you can spot and correct.
if chained: | ||
inputModule = getattr(step.data.input, 'inputOutputModule', None) | ||
inputStepName = getattr(step.data.input, 'inputStepName', None) | ||
inputFileNames.append("../{}/{}.root".format(inputStepName, inputModule)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to set it to an absolute path instead of relative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro Maybe, but we treat chain processing with relative paths in the PSet tweaking, which is why I opted to keep it that way. Considering that, do you prefer absolute path, or keep it the same way as with the code below?
WMCore/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py
Lines 341 to 342 in 63292d6
inputFile = ("file:../%s/%s.root" % (self.step.data.input.inputStepName, | |
self.step.data.input.inputOutputModule)) |
@amaltaro I addressed all requests but the absolute path for chained processing. I believe it should be treated the same way it is done with the tweakings. CodeTimer will print something like this for the json stuff now:
|
Jenkins results:
|
Jenkins results:
|
Thank you Kenyi, let's get this in. |
"job_type": "Production", # string with job type information: Production/Processing, Merge, LogCollect, etc. | ||
"cmsRun_params": [{ | ||
"step": "cmsRun1", # string with the relevant step | ||
"keek_output": boolean saying whether we keep output or not, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a typo, should the key be keep_output
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Matti, it should be keep_output
. I will soon pick a couple of examples from workflows that ran this week and share those here with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alan.
@makortel Matti, in case this is relevant to you. I picked a couple of workflows and looked on what was reported in the wmagentJob.log log file. This is the runtime json that was created for a StepChain job which belongs to workflow amaltaro_SC_EL8_Agent227_Val_240110_213036_6665:
while this TaskChain job for https://cmsweb-testbed.cern.ch/reqmgr2/fetch?rid=amaltaro_TC_Nano_Agent227_Val_240110_213025_3077 produced:
Those are just for illustration, but I hope they give you a better sense of what we are currently providing from the workflow/job runtime environment. |
Thanks @amaltaro for the examples. Here is some initial feedback from me and @Dr15Jones (based on just staring the examples, and reminding ourselves what we had asked for in #10819
|
@khurtado Kenyi, as we have discussed in the weekly meeting. Can you please document these fields and share it here? If you prefer, create a merge request against cms-wmcore and we can follow it from there. |
Reference: WMCore/src/python/WMCore/WMSpec/Steps/Templates/CMSSW.py Lines 316 to 324 in 52ae759
Reference: WMCore/src/python/WMCore/WMSpec/StdSpecs/StepChain.py Lines 12 to 14 in 52ae759
However, there is a bug on transient output that maybe @amaltaro can comment more on: #9013
@amaltaro: Would it be fair to say if keep_output is True, the merged LFN base is used?
@amaltaro Do you agree? If so, I can create a ticket to follow up on this
|
The
Totally! If
Yes, AFAIR We do not provide a list of secondary files though, as some of those datasets (e.g. Neutrino) can have > 100k files. Unless there is a strong motivation for this, I don't think it should be part of the runtime dump - there are other ways to get a hold of it in the worker node.
I had to scan my AFS area, and I did find an example from 2017 (AFAIK it has not changed). You can also access it here: https://amaltaro.web.cern.ch/amaltaro/forFrancesco/myPSet.py
or HTH! |
I am confused. Isn't WMCore/src/python/PSetTweaks/WMTweak.py Lines 433 to 434 in 52ae759
|
Thanks for the clarifications. Here are some quick follow-up questions
Could you elaborate how the bug manifests? Like is
Ok, so these are about how the files are handled by the WM after the chain has completed. I guess for the purposes for modifying
That would be great. We need to be able to produce the JSON file for CMSSW tests, and without a clear definition of the fields that would be challenging.
Let's try to not confuse two-file solution and pileup files. @khurtado's link
looked relevant to what I was after (
On a quick thought I don't think we would have a use case that would require the knowledge of the pileup files, and the question was more out of curiosity. But we need to think a bit more.
Right, it can be related to #11744. I brought it up because from the earlier discussion in #10819 (comment) onwards it wasn't clear if WM preference for this information would to have it as part of the JSON or some other way. |
@amaltaro Could you confirm if we would need to add this block (inputFiles['parents') to support the secondaryFileNames field? From what I see, that seems to be the case but you previous comment on it confused me, so I want to double check. If so, we would need to create a couple of issues, one to provide this extra field and one for documentation. WMCore/src/python/PSetTweaks/WMTweak.py Lines 433 to 434 in 52ae759
|
Kenyi, IF this secondary file feature refers to workflows processing parent datasets (which in the workflow description is represented by On what concerns the GH tickets, I see these things as an atomic operation and IMO documentation - whenever required - should be provided with the actual development. You already have most of the information here anyways, so migrating to gitlab should be simple (classical words ;)) |
Do I understand correctly that the 2 required steps are the following?
|
Fixes #11743
Status
Ready to be tested
Description
Creates json with runtime information for future customization usage
Is it backward compatible (if not, which system it affects?)
YES