Add runtime json #11812

khurtado · 2023-12-06T23:38:02Z

Fixes #11743

Status

Ready to be tested

Description

Creates json with runtime information for future customization usage

Is it backward compatible (if not, which system it affects?)

YES

cmsdmwmbot · 2023-12-06T23:45:08Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 1 tests added
- 3 changes in unstable tests
Python3 Pylint check: failed
- 14 warnings and errors that must be fixed
- 1 warnings
- 30 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 9 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14698/artifact/artifacts/PullRequestReport.html

khurtado · 2023-12-07T19:35:12Z

@amaltaro @todor-ivanov I think this is ready for review
This PR produces the json structure below for taskChain and stepChain. I opted to keep both "keep_output" and "transient" because of #9013.

taskChain

{
    "workflow_type": "TaskChain",
    "number_of_cmsRuns": 1,
    "worker_arch": "X86_64",
    "worker_os": "rhel8",
    "cmsRun_params": [
        {
            "step": "cmsRun1",
            "keep_output": true,
            "input_files": [
                "/store/data/Run2023C/JetMET1/RAW/v1/000/367/131/00000/f7943baa-66b2-402d-a9f2-4f6fc52cde13.root"
            ],
            "output_files": [
                "FEVTDEBUGHLToutput.root"
            ],
            "output_files_datatiers": [
                "FEVTDEBUGHLT"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"
            ],
            "output_files_mergedLFNBase": [
                "/store/relval/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"
            ]
        }
    ]
}

StepChain

{
    "workflow_type": "StepChain",
    "number_of_cmsRuns": 5,
    "worker_arch": "X86_64",
    "worker_os": "rhel8",
    "cmsRun_params": [
        {
            "step": "cmsRun1",
            "keep_output": false,
            "input_files": [
                null
            ],
            "output_files": [
                "RAWSIMoutput.root"
            ],
            "output_files_datatiers": [
                "GEN-SIM"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/Run3Summer23BPixGS/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/GEN-SIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ],
            "output_files_mergedLFNBase": [
                "/store/mc/Run3Summer23BPixGS/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/GEN-SIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ]
        },
        {
            "step": "cmsRun2",
            "keep_output": true,
            "input_files": [
                "../cmsRun1/RAWSIMoutput.root"
            ],
            "output_files": [
                "PREMIXRAWoutput.root"
            ],
            "output_files_datatiers": [
                "GEN-SIM-RAW"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/Run3Summer23BPixDRPremix/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/GEN-SIM-RAW/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ],
            "output_files_mergedLFNBase": [
                "/store/mc/Run3Summer23BPixDRPremix/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/GEN-SIM-RAW/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ]
        },
        {
            "step": "cmsRun3",
            "keep_output": true,
            "input_files": [
                "../cmsRun2/PREMIXRAWoutput.root"
            ],
            "output_files": [
                "AODSIMoutput.root"
            ],
            "output_files_datatiers": [
                "AODSIM"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/Run3Summer23BPixDRPremix/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/AODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ],
            "output_files_mergedLFNBase": [
                "/store/mc/Run3Summer23BPixDRPremix/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/AODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ]
        },
        {
            "step": "cmsRun4",
            "keep_output": true,
            "input_files": [
                "../cmsRun3/AODSIMoutput.root"
            ],
            "output_files": [
                "MINIAODSIMoutput.root"
            ],
            "output_files_datatiers": [
                "MINIAODSIM"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/Run3Summer23BPixMiniAODv4/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/MINIAODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ],
            "output_files_mergedLFNBase": [
                "/store/mc/Run3Summer23BPixMiniAODv4/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/MINIAODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ]
        },
        {
            "step": "cmsRun5",
            "keep_output": true,
            "input_files": [
                "../cmsRun4/MINIAODSIMoutput.root"
            ],
            "output_files": [
                "NANOEDMAODSIMoutput.root"
            ],
            "output_files_datatiers": [
                "NANOAODSIM"
            ],
            "output_files_transient": [
                true
            ],
            "output_files_lfnBase": [
                "/store/unmerged/Run3Summer23BPixNanoAODv12/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/NANOAODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ],
            "output_files_mergedLFNBase": [
                "/store/mc/Run3Summer23BPixNanoAODv12/HTo2LongLivedTo4b_MH-125_MFF-25_CTau-15000mm_TuneCP5_13p6TeV-pythia8/NANOAODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v2"
            ]
        }
    ]
}

cmsdmwmbot · 2023-12-07T19:39:03Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 12 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14702/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-12-07T19:41:35Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 12 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14703/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-12-07T20:34:33Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
Python3 Pylint check: failed
- 12 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14704/artifact/artifacts/PullRequestReport.html

todor-ivanov

It looks good to me.

khurtado · 2023-12-08T19:17:40Z

@todor-ivanov Thank you!
@amaltaro Just for the record, I injected all the DMWM/Integration test jobs and so far so good

https://cmsweb-testbed.cern.ch/reqmgr2/fetch?rid=amaltaro_TaskChain_ProdMinBias_khurtado_taskchain1_231208_153841_5641

amaltaro

@khurtado thank you for providing those json dumps, Kenyi. Based on that, I have the following suggestions:

what do you think about renaming output_files_transient to transient_output?
perhaps I would rename output_files_lfnBase to something like output_files_unmerged_base (or output_files_unmerged_lfn_base). Similarly for output_files_mergedLFNBase
when a step has no input files, should we set input_files=[] instead of input_files=[null]?
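
To illustrate the last suggestion, here is a minimal sketch of why an empty list is easier to consume than a list holding a single null. The two step dictionaries and the `has_input` helper are hypothetical and only mirror the shape of the dumps above.

```python
# Hypothetical step entries, one per candidate convention.
step_with_null = {"step": "cmsRun1", "input_files": [None]}
step_with_empty = {"step": "cmsRun1", "input_files": []}


def has_input(step):
    """Return True when the step lists at least one entry in input_files."""
    return bool(step["input_files"])


# [None] is a non-empty list, so a plain truthiness check reports input files
# that do not exist; [] behaves as expected.
print(has_input(step_with_null))
print(has_input(step_with_empty))
```

With `[null]`, every consumer would have to filter out the placeholder before counting or iterating over the input files.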

In addition, perhaps it would be useful to have another attribute for the job type as well (Production/Processing/Merge/etc)? If we have that, then we could consider logging the final json structure for Production/Processing jobs. The other job types can potentially have a very large list of input files and I'd rather not dump those in the logs.

Could you please try to create a unit test for createWMRuntimeJson as well?

Note that I have not yet looked into the source code changes, but I will come back to this later or whenever you provide further updates to this PR.

cmsdmwmbot · 2023-12-11T14:56:02Z

Jenkins results:

Python3 Unit tests: succeeded
- 2 changes in unstable tests
Python3 Pylint check: failed
- 14 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14706/artifact/artifacts/PullRequestReport.html

khurtado · 2023-12-11T15:41:56Z

@amaltaro

I made the following changes:

I renamed the attributes with your suggestions
I agree with the input_files as well, so it is now doing [] rather than [null]
I added job_type which can be Production, Processing, Merge, LogCollect, etc
I also changed Startup such that if job_type is Production/Processing, we dump the json in the log
I have also squashed the commits

What is missing is the unit test, which I can come back to later when I'm back. In the meantime, I think this is ready for review again.

The log will look like this:

<snip>

INFO:root:runtime json = {
    "workflow_type": "TaskChain",
    "number_of_cmsRuns": 1,
    "worker_arch": "X86_64",
    "worker_os": "rhel8",
    "job_type": "Processing",
    "cmsRun_params": [
        {
            "step": "cmsRun1",
            "keep_output": true,
            "input_files": [
                "/store/data/Run2023C/JetMET1/RAW/v1/000/367/131/00000/f7943baa-66b2-402d-a9f2-4f6fc52cde13.root"
            ],
            "output_files": [
                "FEVTDEBUGHLToutput.root"
            ],
            "output_files_datatiers": [
                "FEVTDEBUGHLT"
            ],
            "transient_output": [
                true
            ],
            "output_files_unmerged_base": [
                "/store/unmerged/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"
            ],
            "output_files_merged_base": [
                "/store/relval/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"
            ]
        }
    ]
}
INFO:root:Building task at directory: /tmpscratch/users/khurtado/work/wmcore/11743/taskchain/job
INFO:root:CMSSW.coreBuild invoked
INFO:root:create(/tmpscratch/users/khurtado/work/wmcore/11743/taskchain/job/WMTaskSpace/cmsRun1)
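
The logging rule described above (dump the json only for Production/Processing jobs) could be sketched roughly as follows. This is not the code in Startup.py; `log_runtime_json` and `LOGGABLE_JOB_TYPES` are hypothetical names used only for illustration.

```python
import json
import logging

# Job types whose input file lists are expected to stay small,
# following the reasoning in this thread.
LOGGABLE_JOB_TYPES = ("Production", "Processing")


def log_runtime_json(runtime_info, logger=None):
    """Log the runtime dict as indented JSON for selected job types.

    Returns True when the dump was logged, False when it was skipped.
    """
    if logger is None:
        logger = logging.getLogger()
    if runtime_info.get("job_type") not in LOGGABLE_JOB_TYPES:
        return False
    logger.info("runtime json = %s", json.dumps(runtime_info, indent=4))
    return True


logging.basicConfig(level=logging.INFO)
log_runtime_json({"workflow_type": "TaskChain", "job_type": "Processing"})
log_runtime_json({"workflow_type": "TaskChain", "job_type": "Merge"})  # skipped
```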

cmsdmwmbot · 2023-12-11T15:47:40Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 changes in unstable tests
Python3 Pylint check: failed
- 14 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14710/artifact/artifacts/PullRequestReport.html

amaltaro

Thank you for the prompt action, Kenyi.
I left a few more comments along the code, and I would suggest the following as well:

given that we are loading the job information and doing other things, I think it would be beneficial to actually measure the time spent creating this runtime dump. There are a few ways to do so, but I can easily point you to one that has been adopted in other places of WMCore, with CodeTimer, e.g.: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSTransferor/RequestInfo.py#L108
please have a look at the jenkins/pylint summary as well, there might be some easy things that you can spot and correct.

amaltaro · 2023-12-11T16:56:21Z

src/python/WMCore/WMRuntime/Bootstrap.py

+            if chained:
+                inputModule = getattr(step.data.input, 'inputOutputModule', None)
+                inputStepName = getattr(step.data.input, 'inputStepName', None)
+                inputFileNames.append("../{}/{}.root".format(inputStepName, inputModule))


Is it possible to set it to an absolute path instead of relative?

@amaltaro Maybe, but we treat chain processing with relative paths in the PSet tweaking, which is why I opted to keep it that way. Considering that, do you prefer absolute path, or keep it the same way as with the code below?

WMCore/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py

Lines 341 to 342 in 63292d6

    inputFile = ("file:../%s/%s.root" % (self.step.data.input.inputStepName,
                                         self.step.data.input.inputOutputModule))
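
For readers who need an absolute path, a consumer of the json could resolve the relative entry against the directory of the step that reads it. The sketch below is only an illustration: both helper names and the `/job/WMTaskSpace/cmsRun2` directory are assumptions, and `posixpath` is used so the result does not depend on the platform running the example.

```python
import posixpath


def chained_input_path(input_step_name, input_output_module):
    """Build the relative input path of a chained step, as in the diff above."""
    return "../{}/{}.root".format(input_step_name, input_output_module)


def resolve_from_step_dir(step_dir, relative_path):
    """Resolve the relative path against the directory of the consuming step."""
    return posixpath.normpath(posixpath.join(step_dir, relative_path))


rel = chained_input_path("cmsRun1", "RAWSIMoutput")
print(rel)
# Hypothetical working directory of the step that consumes the file.
print(resolve_from_step_dir("/job/WMTaskSpace/cmsRun2", rel))
```

Keeping the relative form in the json stays consistent with the PSet tweak, while the resolution remains a one-liner for anyone who knows the step directory.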

src/python/WMCore/WMRuntime/Bootstrap.py

src/python/WMCore/WMRuntime/Startup.py

khurtado · 2023-12-11T18:25:37Z

@amaltaro I addressed all requests but the absolute path for chained processing. I believe it should be treated the same way it is done with the tweakings.
I left the changes in a different commit so they are easier to spot.

CodeTimer will print something like this for the json stuff now:

INFO:root:Creating WM runtime information json took 0.218 seconds to complete
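
For context, a timing wrapper that yields a log line of this shape can be sketched with a context manager. The `code_timer` function below is a stand-in written for this illustration, not WMCore's CodeTimer, and the `time.sleep` call is a placeholder for the real json creation.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def code_timer(label, logger=None):
    """Log how long the wrapped block took (illustrative stand-in only)."""
    if logger is None:
        logger = logging.getLogger()
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so the timing is always logged.
        elapsed = time.perf_counter() - start
        logger.info("%s took %.3f seconds to complete", label, elapsed)


logging.basicConfig(level=logging.INFO)
with code_timer("Creating WM runtime information json"):
    time.sleep(0.01)  # placeholder for the real work
```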

cmsdmwmbot · 2023-12-11T18:30:08Z

Jenkins results:

Python3 Unit tests: succeeded
- 2 changes in unstable tests
Python3 Pylint check: failed
- 12 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14715/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2023-12-11T18:32:49Z

Jenkins results:

Python3 Unit tests: succeeded
- 2 changes in unstable tests
Python3 Pylint check: failed
- 12 warnings and errors that must be fixed
- 1 warnings
- 33 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 7 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14716/artifact/artifacts/PullRequestReport.html

amaltaro · 2023-12-12T09:38:23Z

Thank you Kenyi, let's get this in.

makortel · 2024-01-18T14:45:53Z

src/python/WMCore/WMRuntime/Bootstrap.py

+     "job_type": "Production", # string with job type information: Production/Processing, Merge, LogCollect, etc.
+     "cmsRun_params": [{
+         "step": "cmsRun1",  # string with the relevant step
+         "keek_output": boolean saying whether we keep output or not,


Looks like a typo, should the key be keep_output?

Yes Matti, it should be keep_output. I will soon pick a couple of examples from workflows that ran this week and share those here with you.

Thanks Alan.

amaltaro · 2024-01-18T16:10:48Z

@makortel Matti, in case this is relevant to you: I picked a couple of workflows and looked at what was reported in the wmagentJob.log file.

This is the runtime json that was created for a StepChain job which belongs to workflow amaltaro_SC_EL8_Agent227_Val_240110_213036_6665:

{
    "cmsRun_params": [
        {
            "input_files": [],
            "keep_output": true,
            "output_files": [
                "FEVTDEBUGoutput.root"
            ],
            "output_files_datatiers": [
                "GEN-SIM"
            ],
            "output_files_merged_base": [
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM/GenSimFull_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "output_files_unmerged_base": [
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM/GenSimFull_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "step": "cmsRun1",
            "transient_output": [
                true
            ]
        },
        {
            "input_files": [
                "../cmsRun1/FEVTDEBUGoutput.root"
            ],
            "keep_output": true,
            "output_files": [
                "FEVTDEBUGHLToutput.root"
            ],
            "output_files_datatiers": [
                "GEN-SIM-DIGI-RAW"
            ],
            "output_files_merged_base": [
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM-DIGI-RAW/Digi_2021_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "output_files_unmerged_base": [
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM-DIGI-RAW/Digi_2021_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "step": "cmsRun2",
            "transient_output": [
                true
            ]
        },
        {
            "input_files": [
                "../cmsRun2/FEVTDEBUGHLToutput.root"
            ],
            "keep_output": true,
            "output_files": [
                "NANOEDMAODSIMoutput.root",
                "MINIAODSIMoutput.root",
                "DQMoutput.root",
                "RECOSIMoutput.root"
            ],
            "output_files_datatiers": [
                "NANOAODSIM",
                "MINIAODSIM",
                "DQMIO",
                "GEN-SIM-RECO"
            ],
            "output_files_merged_base": [
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/NANOAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/MINIAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/DQMIO/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM-RECO/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "output_files_unmerged_base": [
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/NANOAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/MINIAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/DQMIO/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
                "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/GEN-SIM-RECO/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11"
            ],
            "step": "cmsRun3",
            "transient_output": [
                true,
                true,
                true,
                true
            ]
        }
    ],
    "job_type": "Production",
    "number_of_cmsRuns": 3,
    "worker_arch": "X86_64",
    "worker_os": "rhel8",
    "workflow_type": "StepChain"
}

while this TaskChain job for https://cmsweb-testbed.cern.ch/reqmgr2/fetch?rid=amaltaro_TC_Nano_Agent227_Val_240110_213025_3077 produced:

{
    "cmsRun_params": [
        {
            "input_files": [
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/FC70BF0E-01D1-D04C-8C9F-C0DEFA040142.root",
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/CC9DE505-6F64-034F-9F1D-3CEB014DCC2F.root",
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/C0F2C2AB-D82F-D344-917C-CFD67051CEBF.root",
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/F5B1A081-7A16-F744-BE87-5166500D959E.root",
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/F2A73B75-CFBA-8E41-AFAC-396401CB4067.root",
                "/store/mc/RunIISummer20UL17MiniAODv2/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/106X_mc2017_realistic_v9-v2/2560000/4BCFFC5F-8139-B74E-AFD3-6BD520111CE1.root"
            ],
            "keep_output": true,
            "output_files": [
                "NANOEDMAODSIMoutput.root"
            ],
            "output_files_datatiers": [
                "NANOAODSIM"
            ],
            "output_files_merged_base": [
                "/store/backfill/1/RunIISummer20UL17NanoAODv9/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/NANOAODSIM/Task1_NanoAODv9_TC_Nano_Agent227_Val_Alanv1-v11"
            ],
            "output_files_unmerged_base": [
                "/store/unmerged/RunIISummer20UL17NanoAODv9/WplusToJJZToLNuJJ_mjj100_pTj10_QCD_LO_TuneCP5_13TeV-madgraph-pythia8/NANOAODSIM/Task1_NanoAODv9_TC_Nano_Agent227_Val_Alanv1-v11"
            ],
            "step": "cmsRun1",
            "transient_output": [
                true
            ]
        }
    ],
    "job_type": "Processing",
    "number_of_cmsRuns": 1,
    "worker_arch": "X86_64",
    "worker_os": "rhel7",
    "workflow_type": "TaskChain"
}

Those are just for illustration, but I hope they give you a better sense of what we are currently providing from the workflow/job runtime environment.
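
As a reading aid, the per-step lists in these dumps appear to be parallel, with one entry per output file. Assuming they are index-aligned (the thread describes them as "ordered"), a consumer could pair them up as in this sketch. `iter_outputs` is a hypothetical helper, and the sample is a trimmed copy of the cmsRun3 step above.

```python
def iter_outputs(runtime_info):
    """Yield one dict per (step, output file), pairing the parallel lists by index."""
    for step in runtime_info["cmsRun_params"]:
        rows = zip(step["output_files"],
                   step["output_files_datatiers"],
                   step["transient_output"],
                   step["output_files_unmerged_base"],
                   step["output_files_merged_base"])
        for name, tier, transient, unmerged, merged in rows:
            yield {"step": step["step"], "file": name, "datatier": tier,
                   "transient": transient,
                   "unmerged_base": unmerged, "merged_base": merged}


# Trimmed sample: two of the four outputs of cmsRun3 in the StepChain dump.
sample = {
    "cmsRun_params": [{
        "step": "cmsRun3",
        "output_files": ["NANOEDMAODSIMoutput.root", "MINIAODSIMoutput.root"],
        "output_files_datatiers": ["NANOAODSIM", "MINIAODSIM"],
        "transient_output": [True, True],
        "output_files_unmerged_base": [
            "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/NANOAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
            "/store/unmerged/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/MINIAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
        ],
        "output_files_merged_base": [
            "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/NANOAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
            "/store/backfill/1/CMSSW_12_4_0_pre2/RelValWToLNu_14TeV/MINIAODSIM/RecoNano_2021_SC_EL8_Agent227_Val_Alanv1-v11",
        ],
    }]
}

for row in iter_outputs(sample):
    print(row["step"], row["file"], row["datatier"])
```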

makortel · 2024-01-19T21:14:14Z

Thanks @amaltaro for the examples. Here is some initial feedback from me and @Dr15Jones (based on just staring at the examples, and reminding ourselves what we had asked for in #10819):

What is the meaning of keep_output and transient_output?
What is the meaning of output_files_merged_base and output_files_unmerged_base?
Seems we would need some documentation on all the fields
How would a workflow configured to use the two-file solution (i.e. CMSSW process.source.secondaryFileNames) look like?
How would a pileup file look like?
How can we tell if an intermediate file is not used outside of the StepChain?
We'd want the current step value (cmsRun1 etc) to be passed in some way to the process.customizeWorkflow() function. Could be e.g.
- new top-level untracked.PSet (we'd need to agree the name)
- parameter to the customizeWorkflow()

amaltaro · 2024-02-13T18:24:54Z

@khurtado Kenyi, as we discussed in the weekly meeting, can you please document these fields and share the result here? If you prefer, create a merge request against cms-wmcore and we can follow it from there.

khurtado · 2024-02-22T19:00:47Z

@makortel

What is the meaning of keep_output and transient_output?

"keep_output": marks whether or not we should keep the output from this CMSSW step.

Reference:

WMCore/src/python/WMCore/WMSpec/Steps/Templates/CMSSW.py

Lines 316 to 324 in 52ae759

    
    def keepOutput(self, keepOutput):
        """
        _keepOutput_

        Mark whether or not we should keep the output from this step.  We don't
        want to keep the output from certain chained steps.
        """
        self.data.output.keep = keepOutput
        return

“transient_output”: ordered boolean saying whether output files are announced in the workflow or not. E.g.: StepChain intermediate steps outputs are considered transient and hence do not need to be staged out and registered in DBS/Rucio.

Reference:

WMCore/src/python/WMCore/WMSpec/StdSpecs/StepChain.py

Lines 12 to 14 in 52ae759

    
    It also assumes all the intermediate steps output are transient and do not need
    to be staged out and registered in DBS/PhEDEx. Only the last step output will be
    made available.

However, there is a bug on transient output that maybe @amaltaro can comment more on: #9013

What is the meaning of output_files_merged_base and output_files_unmerged_base?
"output_files_unmerged_base": [ordered string with unmerged lfnBase info],
Used for example, to build the process logicalFileName:

WMCore/src/python/PSetTweaks/WMTweak.py

Line 531 in 52ae759

result.addParameter("process.%s.logicalFileName" % modName, lfn)

"output_files_merged_base": [ordered string with merged LFNBase info]

@amaltaro: Would it be fair to say if keep_output is True, the merged LFN base is used?

Seems we would need some documentation on all the fields
As far as I know, there is at present no documentation for these fields other than the comments in the code. We might need to create an issue to create some documentation.
How would a workflow configured to use the two-file solution (i.e. CMSSW process.source.secondaryFileNames) look like?
- Mhhh, I don’t think we are covering this case, actually
- Looking into the WM Tweak, inputFile["lfn"] is used for the primaryFiles and inputFile["parents"] for the secondaryFileNames. We would need to include this field in the json.
  
  WMCore/src/python/PSetTweaks/WMTweak.py
  
  Lines 433 to 434 in 52ae759
  
  for secondaryFile in inputFile["parents"]:
      secondaryFiles.append(secondaryFile["lfn"])

@amaltaro Do you agree? If so, I can create a ticket to follow up on this

How would a pileup file look like?
- @amaltaro do you have an example of this?
How can we tell if an intermediate file is not used outside of the StepChain?
- Technically, with the transient value.
We'd want the current step value (cmsRun1 etc) to be passed in some way to the process.customizeWorkflow() function. Could be e.g.
- new top-level untracked.PSet (we'd need to agree the name)
- parameter to the customizeWorkflow()
  I think this would be related to:
  Provide generic CMSSW customization script to cmssw-wm-tools #11744

amaltaro · 2024-02-22T19:45:39Z

However, there is a bug on transient output that maybe @amaltaro can comment more on: #9013

The transient_output basically specifies whether a given output module + datatier is supposed to be registered in DBS/Rucio or not. In other words, objects marked as transient are usually produced/consumed/destroyed inside a given job and they are not meant to even be transferred to the storage unmerged area (except for differences between TaskChain and StepChain).

@amaltaro: Would it be fair to say if keep_output is True, the merged LFN base is used?

Totally! If False, then those output files are never merged (and never show up in DBS/Rucio).

How would a workflow configured to use the two-file solution (i.e. CMSSW process.source.secondaryFileNames) look like?

Yes, AFAIR secondaryFileNames is used for pileup files. For the record, here is the script used to tweak the job PSet: https://github.com/cms-sw/cmssw-wm-tools/blob/30b626d030b7b11f83fe4e0385ce303e83d3fcdf/bin/cmssw_handle_pileup.py

We do not provide a list of secondary files though, as some of those datasets (e.g. Neutrino) can have > 100k files. Unless there is a strong motivation for this, I don't think it should be part of the runtime dump - there are other ways to get a hold of it in the worker node.

@amaltaro do you have an example of this?

I had to scan my AFS area, and I did find an example from 2017 (AFAIK it has not changed). You can also access it here: https://amaltaro.web.cern.ch/amaltaro/forFrancesco/myPSet.py

How can we tell if an intermediate file is not used outside of the StepChain?
Technically, with the transient value.

or keep_output=false.

HTH!
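
Based on the answer above, a consumer could flag steps whose output is not kept by checking keep_output. This is a hedged sketch only: `steps_not_kept` is a hypothetical helper, it relies on keep_output alone because transient_output is true in every example in this thread and #9013 is still open, and the sample is trimmed from the first StepChain dump.

```python
def steps_not_kept(runtime_info):
    """Return the names of steps whose keep_output flag is false."""
    return [step["step"]
            for step in runtime_info["cmsRun_params"]
            if not step["keep_output"]]


# Trimmed from the first StepChain dump in this thread.
sample = {
    "cmsRun_params": [
        {"step": "cmsRun1", "keep_output": False},
        {"step": "cmsRun2", "keep_output": True},
        {"step": "cmsRun3", "keep_output": True},
    ]
}

print(steps_not_kept(sample))
```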

khurtado · 2024-02-22T21:37:30Z

How would a workflow configured to use the two-file solution (i.e. CMSSW process.source.secondaryFileNames) look like?

Yes, AFAIR secondaryFileNames is used for pileup files. For the record, here is the script used to tweak the job PSet: https://github.com/cms-sw/cmssw-wm-tools/blob/30b626d030b7b11f83fe4e0385ce303e83d3fcdf/bin/cmssw_handle_pileup.py

We do not provide a list of secondary files though, as some of those datasets (e.g. Neutrino) can have > 100k files. Unless there is a strong motivation for this, I don't think it should be part of the runtime dump - there are other ways to get a hold of it in the worker node.

I am confused. Isn't inputFile["parents"] the list of secondaryFiles?

WMCore/src/python/PSetTweaks/WMTweak.py

Lines 433 to 434 in 52ae759

    
    for secondaryFile in inputFile["parents"]:
        secondaryFiles.append(secondaryFile["lfn"])

makortel · 2024-02-22T21:40:12Z

Thanks for the clarifications. Here are some quick follow-up questions

However, there is a bug on transient output that maybe @amaltaro can comment more on: #9013

The transient_output basically specifies whether a given output module + datatier is supposed to be registered in DBS/Rucio or not. In other words, objects marked as transient are usually produced/consumed/destroyed inside a given job and they are not meant to even be transferred to the storage unmerged area (except for differences between TaskChain and StepChain).

Could you elaborate on how the bug manifests? Is it true when it should be false, or false when it should be true? The examples in #11812 (comment) show it as true in all cases.

What is the meaning of output_files_merged_base and output_files_unmerged_base?
"output_files_unmerged_base": [ordered string with unmerged lfnBase info],
Used for example, to build the process logicalFileName:

Ok, so these are about how the files are handled by the WM after the chain has completed. I guess for the purposes of modifying cmsRun behavior we could ignore these.

Seems we would need some documentation on all the fields
As far as I know, there is at present no documentation for these fields other than the comments in the code. We might need to create an issue to create some documentation.

That would be great. We need to be able to produce the JSON file for CMSSW tests, and without a clear definition of the fields that would be challenging.
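
Until such documentation exists, the field names seen in the examples of this thread can at least be checked mechanically. The sketch below is an assumption-laden illustration, not a schema agreed by WMCore: the key sets are copied from the dumps above, the rule that number_of_cmsRuns equals the length of cmsRun_params is inferred from those examples, and `check_runtime_json` is a hypothetical helper.

```python
TOP_LEVEL_KEYS = {"workflow_type", "number_of_cmsRuns", "worker_arch",
                  "worker_os", "job_type", "cmsRun_params"}
STEP_KEYS = {"step", "keep_output", "input_files", "output_files",
             "output_files_datatiers", "transient_output",
             "output_files_unmerged_base", "output_files_merged_base"}
# Lists assumed to hold one entry per output file, in the same order.
PER_OUTPUT_KEYS = ("output_files_datatiers", "transient_output",
                   "output_files_unmerged_base", "output_files_merged_base")


def check_runtime_json(info):
    """Return a list of problems; an empty list means the shape matches the examples."""
    problems = []
    missing = TOP_LEVEL_KEYS - set(info)
    if missing:
        problems.append("missing top-level keys: %s" % sorted(missing))
    steps = info.get("cmsRun_params", [])
    if info.get("number_of_cmsRuns") != len(steps):
        problems.append("number_of_cmsRuns does not match cmsRun_params")
    for step in steps:
        name = step.get("step", "<unnamed>")
        missing = STEP_KEYS - set(step)
        if missing:
            problems.append("%s: missing keys %s" % (name, sorted(missing)))
            continue
        expected = len(step["output_files"])
        for key in PER_OUTPUT_KEYS:
            if len(step[key]) != expected:
                problems.append("%s: %s has %d entries, expected %d"
                                % (name, key, len(step[key]), expected))
    return problems


# Minimal document modelled on the TaskChain log dump earlier in the thread.
doc = {
    "workflow_type": "TaskChain", "number_of_cmsRuns": 1,
    "worker_arch": "X86_64", "worker_os": "rhel8", "job_type": "Processing",
    "cmsRun_params": [{
        "step": "cmsRun1", "keep_output": True, "input_files": [],
        "output_files": ["FEVTDEBUGHLToutput.root"],
        "output_files_datatiers": ["FEVTDEBUGHLT"],
        "transient_output": [True],
        "output_files_unmerged_base": ["/store/unmerged/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"],
        "output_files_merged_base": ["/store/relval/CMSSW_13_3_0_pre4/JetMET1/FEVTDEBUGHLT/133X_dataRun3_HLT_frozen_v1_CNAFARM_RelVal_2023C-v1"],
    }],
}
print(check_runtime_json(doc))
```

A check of this kind could serve as a starting point for producing test JSON files on the CMSSW side, once the field definitions are confirmed.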

How would a workflow configured to use the two-file solution (i.e. CMSSW process.source.secondaryFileNames) look like?

Yes, AFAIR secondaryFileNames is used for pileup files. For the record, here is the script used to tweak the job PSet: https://github.com/cms-sw/cmssw-wm-tools/blob/30b626d030b7b11f83fe4e0385ce303e83d3fcdf/bin/cmssw_handle_pileup.py

Let's try to not confuse two-file solution and pileup files. @khurtado's link

WMCore/src/python/PSetTweaks/WMTweak.py

Lines 433 to 434 in 52ae759

    for secondaryFile in inputFile["parents"]:
        secondaryFiles.append(secondaryFile["lfn"])

looked relevant to what I was after (process.source.secondaryFileNames).

We do not provide a list of secondary files though, as some of those datasets (e.g. Neutrino) can have > 100k files. Unless there is a strong motivation for this, I don't think it should be part of the runtime dump - there are other ways to get a hold of it in the worker node.

On a quick thought I don't think we would have a use case that would require the knowledge of the pileup files, and the question was more out of curiosity. But we need to think a bit more.

We'd want the current step value (cmsRun1 etc) to be passed in some way to the process.customizeWorkflow() function. Could be e.g.

new top-level untracked.PSet (we'd need to agree the name)

parameter to the customizeWorkflow()
I think this would be related to:
Provide generic CMSSW customization script to cmssw-wm-tools #11744

Right, it can be related to #11744. I brought it up because from the earlier discussion in #10819 (comment) onwards it wasn't clear whether the WM preference would be to have this information as part of the JSON or to pass it some other way.

khurtado · 2024-02-23T20:58:29Z

@amaltaro Could you confirm whether we would need to add this block (inputFile["parents"]) to support the secondaryFileNames field? From what I see, that seems to be the case, but your previous comment on it confused me, so I want to double-check. If so, we would need to create a couple of issues: one to provide this extra field and one for documentation.

WMCore/src/python/PSetTweaks/WMTweak.py

Lines 433 to 434 in 52ae759

    
for secondaryFile in inputFile["parents"]:
    secondaryFiles.append(secondaryFile["lfn"])

amaltaro · 2024-02-23T21:12:04Z

Kenyi, IF this secondary file feature refers to workflows processing parent datasets (which in the workflow description is represented by IncludeParents=True), then it looks like we need this information as well.

Regarding the GH tickets, I see these things as an atomic operation, and IMO documentation, whenever required, should be provided with the actual development. You already have most of the information here anyway, so migrating it to GitLab should be simple (classical words ;))

khurtado · 2024-06-24T14:24:30Z

Do I understand correctly that the 2 required steps are the following?

- Create a new issue to include inputFile["parents"], in order to support secondaryFileNames
- Document these changes in wmcore-docs

khurtado requested a review from amaltaro December 7, 2023 19:20

khurtado force-pushed the 11743 branch 2 times, most recently from cb4a8ac to 7ce1340 Compare December 7, 2023 19:31

khurtado requested a review from todor-ivanov December 7, 2023 19:44

todor-ivanov approved these changes Dec 8, 2023


amaltaro reviewed Dec 11, 2023


Add runtime information json.

b1693ad

khurtado force-pushed the 11743 branch from 9a0c9d8 to b1693ad Compare December 11, 2023 15:39

amaltaro requested changes Dec 11, 2023


Add timing information and make PR requested changes

5b28542

khurtado force-pushed the 11743 branch from 55946ae to 5b28542 Compare December 11, 2023 18:22

khurtado requested a review from amaltaro December 11, 2023 18:26

amaltaro merged commit fd288bb into dmwm:master Dec 12, 2023
3 of 4 checks passed

novicecpp mentioned this pull request Dec 18, 2023

Latest WMCore broke jobwrapper dmwm/CRABServer#8110

Closed

makortel reviewed Jan 18, 2024


makortel mentioned this pull request Jan 18, 2024

Add tests for workflow customization cms-sw/framework-team#794

Open

novicecpp mentioned this pull request Feb 26, 2024

No module named 'WMCore.BossAir' error when running CRAB jobs with WMCore==2.3.1rc3 #11912

Closed


	inputFile = ("file:../%s/%s.root" % (self.step.data.input.inputStepName,
	self.step.data.input.inputOutputModule))
