Wrong setting of RunNumber in harvesting output for MC MultiRun #9690

srimanob · 2020-05-11T21:26:30Z

Impact of the bug
Output from MC MultiRun harvesting can't be uploaded to the GUI because the RunNumber is set to 999999 on the production side, while it is forced to be 1 in the cmsDriver command injected with the workflow.

Describe the bug
MC Run-Dependent relvals have been submitted, e.g.
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?campaign=CMSSW_11_1_0_pre7__RD_1HS-1588962304

All steps look OK, except that no histograms are found in the GUI. When we check the GUI log, we see:

weblog-20200509.log:[09/May/2020:03:02:00] INFO: saved file /data/srv/state/dqmgui/relval/uploads/0001/DQM_V0001_R000999999__RelValZMM_13UP18_RD__CMSSW_11_1_0_pre7-PUpmx25ns_110X_upgrade2018_realistic_v9_RD_1-v1-315257-324245__DQMIO.root size 119895428 checksum md5:6f6371b7190d61fc495fec25f736e7ed

The filename seems to be wrong: the RunNumber should be 1. We force it to be 1 in the cmsDriver command of the harvesting step in the workflow:
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/53979640a5175ff795ce39b4522dc8fe/configFile

step4 --conditions auto:phase1_2018_realistic --era Run2_2018 --customise_commands "process.dqmSaver.forceRunNumber = 1" -s HARVESTING:@standardValidationNoHLT+@standardDQMFakeHLT+@miniAODValidation+@miniAODDQM+@L1TEgamma --harvesting AtJobEnd --filetype DQM --geometry DB:Extended --mc --io HARVESTUP18_PU25_L1TEgDQM_RD.io --python HARVESTUP18_PU25_L1TEgDQM_RD.py --relval 200000,100 -n 100 --no_exec --filein file:step3_inDQM.root --fileout file:step4.root

Looking around WMCore, I found
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L546-L547

        if multiRun and isCMSSWSupported(self.getCmsswVersion(), "CMSSW_8_0_0"):
            self.process.dqmSaver.forceRunNumber = cms.untracked.int32(999999)
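The suspected interaction can be sketched in plain Python. This assumes, without confirming it from the WMCore source, that the cmsDriver-generated harvesting configuration (including the `--customise_commands` line) is loaded first and that the runtime multiRun tweak is applied afterwards to the same process object. `SimpleNamespace` stands in for the real `cms.Process` and `dqmSaver` objects, and both helper functions are hypothetical.

```python
from types import SimpleNamespace


def load_harvesting_config():
    """Hypothetical stand-in for loading the cmsDriver-generated harvesting PSet."""
    process = SimpleNamespace(dqmSaver=SimpleNamespace())
    # Effect of --customise_commands "process.dqmSaver.forceRunNumber = 1"
    process.dqmSaver.forceRunNumber = 1
    return process


def apply_runtime_tweaks(process, multi_run):
    """Hypothetical stand-in for the multiRun branch quoted from SetupCMSSWPset.py."""
    if multi_run:
        # Applied after the customisation, so this assignment wins.
        process.dqmSaver.forceRunNumber = 999999
    return process


process = apply_runtime_tweaks(load_harvesting_config(), multi_run=True)
print(process.dqmSaver.forceRunNumber)  # 999999
```

If that ordering holds, whatever the workflow sets in its customisation is simply overwritten for every multiRun harvesting job.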

Does this overwrite the setting from the harvesting cmsDriver command we submit with the workflow?

How to reproduce it
I don't know how to reproduce it: the problem seems to happen only on the production side, and I can't reproduce it when I run locally. Running locally, I get the proper filename, i.e.
DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
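Both file names in this report (the production one in the GUI log and the local one above) are consistent with a simple pattern: a zero-padded run number followed by the dataset path with each `/` replaced by `__`. The sketch below reproduces that pattern; it is inferred only from these two names, not taken from the DQMFileSaver source, and `dqm_file_name` is a hypothetical helper.

```python
def dqm_file_name(run, dataset, version=1):
    """Rebuild a DQM upload file name from a run number and a dataset path.

    Assumed pattern, inferred from the file names in this issue:
    DQM_V<4-digit version>_R<9-digit run><dataset with "/" -> "__">.root
    """
    return "DQM_V%04d_R%09d%s.root" % (version, run, dataset.replace("/", "__"))


print(dqm_file_name(1, "/Global/CMSSW_X_Y_Z/RECO"))
# DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
```

With run 999999 and the RelValZMM_13UP18_RD dataset, the same pattern yields exactly the `R000999999` name from the GUI log, which is why only the run field differs between the local and production results.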

Expected behavior
We expect RunNumber = 1 for MC MultiRun.


amaltaro · 2020-05-12T06:56:16Z

@srimanob Hi Phat,
I had a look at the history and here's a reference to the PR that applied this change:
#7525

It also points to the original issue, where we described the feature requirements with Marco Rovere and Federico.

Can you please have a look at those tickets and decide whether a feature change is required? Perhaps you also want to touch base with the DQM experts who proposed it 3 or 4 years ago.
Thanks

srimanob · 2020-05-12T07:02:52Z

Hi Alan @amaltaro
Thanks. I already sent a message to the DQM conveners last night; I hope they will confirm here soon. The quick idea is that we normally use this for data. For MC, to support Run-Dependent MC, this 999999 rule needs to be reviewed: either the GUI accepts it, or the WMA is updated.

jfernan2 · 2020-05-15T15:12:34Z

Dear all,
within DQM we have agreed that it is better to stick to the current convention and keep runNumber=1 for multi-run harvesting (MRH) of MC: in this case MRH only concerns confDB settings, but it is still MC, so this will avoid confusion in the future:
MC -> RunNumber =1
Data-> RunNumber > 1
Run ranges already appear in the DQM dataset name for MC (and data) MRH, so this convention would be compliant with the present DQM situation, unless this option causes much trouble for WMA.
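As an illustration of this convention, a hypothetical check could read the run number back from an upload file name and compare it with the expected MC/data rule. The regular expression assumes the `DQM_V<4 digits>_R<9 digits>__` prefix seen in the file names of this issue; the helper names are made up for the example.

```python
import re

# Assumed file-name prefix, based on the names seen in this issue.
_RUN_RE = re.compile(r"^DQM_V\d{4}_R(\d{9})__")


def run_from_file_name(name):
    """Extract the run number encoded in a DQM upload file name."""
    match = _RUN_RE.match(name)
    if match is None:
        raise ValueError("not a DQM upload file name: %s" % name)
    return int(match.group(1))


def follows_convention(name, is_mc):
    """DQM convention: MC -> RunNumber = 1, data -> RunNumber > 1."""
    run = run_from_file_name(name)
    return run == 1 if is_mc else run > 1


print(follows_convention("DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root", is_mc=True))  # True
```

Applied to the file from the GUI log (`R000999999`, MC), the check returns `False`, which is exactly the mismatch reported here.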
Thank you in advance

amaltaro · 2020-06-15T13:14:49Z

@srimanob @jfernan2 thanks for following this up. I understand there is nothing to be changed in the agent then. Please reopen this issue if I have misinterpreted anything.

srimanob · 2020-06-15T13:18:54Z

Hi @amaltaro
Sorry, I think we need an update on
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L546-L547

Force run = 1 for MC, and 999999 for data, in MRH.

amaltaro · 2020-06-15T15:39:11Z

Sorry for missing it, Phat.
So,

MC by run: run number is 1
MC multi run: run number is 1
data by run: run number is whatever run is being harvested
data multi run: run number is 999999 (no longer a run range - with lower and upper limits)
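The four cases above reduce to a small decision function. This is a sketch, not WMCore code: `harvesting_run_number` is a hypothetical name, and it takes `is_mc` as an input because how the agent should determine that flag is precisely the open question in this issue.

```python
MULTI_RUN_DATA_RUN = 999999  # value WMCore currently forces for multiRun harvesting


def harvesting_run_number(is_mc, multi_run, harvested_run=None):
    """Run number for a harvested DQM file, following the four cases listed above."""
    if is_mc:
        return 1  # MC, both by run and multi run
    if multi_run:
        return MULTI_RUN_DATA_RUN  # data multi run
    if harvested_run is None:
        raise ValueError("data by-run harvesting needs the harvested run number")
    return harvested_run  # data by run: the run being harvested
```

For example, `harvesting_run_number(is_mc=True, multi_run=True)` gives 1, while `harvesting_run_number(is_mc=False, multi_run=True)` gives 999999.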

Can you please confirm it?

srimanob · 2020-06-15T15:46:36Z

Hi @amaltaro
Yes, I confirm that. What we would like to change is something like:

        if multiRun and isCMSSWSupported(self.getCmsswVersion(), "CMSSW_8_0_0"):
            if isMC:  # hypothetical flag telling MC from data
                self.process.dqmSaver.forceRunNumber = cms.untracked.int32(1)
            else:  # data
                self.process.dqmSaver.forceRunNumber = cms.untracked.int32(999999)

(In MRH, MC will have run numbers just as data does, but we would like to ignore them for MC.)

Thanks in advance.

amaltaro · 2020-06-18T07:27:13Z

@srimanob I expected it to be a simple fix, but it turns out we need to systematically distinguish data from run-dep MC files in the agent. I will get back to this issue next week.

christopheralanwest · 2021-04-06T15:06:31Z

AlCa is currently using a naming convention in which global tags for run-dependent MC have the string "_RD" immediately prior to the version number. From our perspective, files from run-dependent MC workflows can be identified by the string "_RD_v" in the name, such as /store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root. Would such a scheme be feasible?
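A minimal sketch of the proposed string match, assuming the AlCa convention described above (global tags for run-dependent MC end in `_RD_v<N>`); `looks_like_run_dependent_mc` is a hypothetical helper, not an existing WMCore function.

```python
import re

# Assumed convention: run-dependent MC global tags carry "_RD_v<N>".
_RD_TAG_RE = re.compile(r"_RD_v\d+")


def looks_like_run_dependent_mc(lfn):
    """Return True if the file or dataset name contains the _RD_v<N> marker."""
    return _RD_TAG_RE.search(lfn) is not None


print(looks_like_run_dependent_mc(
    "/store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/"
    "113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/"
    "42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root"
))  # True
```

The check depends entirely on the naming convention in force: the 2020 dataset from the original report (`PUpmx25ns_110X_upgrade2018_realistic_v9_RD_1-v1-315257-324245`) does not contain `_RD_v` and would not match.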

christopheralanwest · 2021-05-06T16:04:24Z

> AlCa is currently using a naming convention in which global tags for run-dependent MC have the string "_RD" immediately prior to the version number. From our perspective, files from run-dependent MC workflows can be identified by the string "_RD_v" in the name, such as /store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root. Would such a scheme be feasible?

I would like to understand if the obstacles to resolving this issue are:

a matter of resolving the issue in an elegant and/or more general/maintainable way
a matter of determining a solution by any means. For example, is my hackish suggestion to identify run-dependent MC files by a particular string assigned by convention in the dataset name not feasible?
purely a consequence of the lower priority of this issue
something else?

At this point, given the small number of run-dependent MC relval samples, it is feasible to run the harvesting and upload to the DQM GUI manually. So, this issue currently is not particularly urgent. At the same time, I don't want to waste the time of those who would perform these manual actions if a fix on the WMCore side is straightforward.

amaltaro · 2021-05-10T05:01:46Z

Hi @christopheralanwest , sorry for the belated response.

Yes, we still have to find a generic and robust way to distinguish between data and MC. I'm afraid matching the global tag against _RD_ might be risky. Perhaps we could use that runLimits variable and set the run number to 999999 whenever its limits are different from 1 (thus data). I believe run-dependent Monte Carlo still uses run number 1 within the agent and DBS, so it should not be a problem.
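The runLimits idea could look roughly like the sketch below. The shape of the input, a `(first_run, last_run)` pair for the harvested input, is an assumption made for illustration; the actual runLimits variable in the agent may be structured differently, and `multi_run_number` is a hypothetical name.

```python
def multi_run_number(run_limits):
    """Pick the forced run number for multiRun harvesting from run limits.

    run_limits: assumed (first_run, last_run) pair of the harvested input.
    """
    first_run, last_run = run_limits
    if (first_run, last_run) == (1, 1):
        return 1  # only run 1 present: treated as MC
    return 999999  # any other run range: treated as data
```

This heuristic only works if MC inputs really carry run number 1. If run-dependent MC inputs carry mapped run numbers (the file in the original report has the range 315257-324245 in its dataset name), they would be classified as data.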

Nonetheless, this issue hasn't been added to our to-do queue for this quarter, so unless it turns out to be an hour or two of work, we will likely not be able to work on it before Q3. Perhaps you would like to discuss this with the PPD/PdmV group, since they have other WMCore issues that we need to consider as well.

christopheralanwest · 2021-05-18T03:05:25Z

> Hi @christopheralanwest , sorry for the belated response.
>
> Yes, we still have to find out a generic and robust way to distinguish between data and MC. I'm afraid matching the global tag against _RD_ might be risky. Perhaps we could use that runLimits variable and set it to 999999 whenever its limits are different than 1 (thus data). I believe run dependent monte carlo still considers run number 1 inside the agent and DBS, so it should not be a problem.

Clearly we could choose a longer string, such as RunDep or RunDepMC, that would be less likely to accidentally match data files. The shorter string _RD_ was chosen simply to keep the global tag names (and therefore dataset names) short, as requested by PdmV.

Run-dependent MC does not have run numbers other than 1 in the GEN-SIM step:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=run+dataset%3D%2FRelValZEE_13UP18_RD%2FCMSSW_11_3_0_pre3-113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1%2FGEN-SIM

But in the DIGI-PREMIX step the lumiblock number is mapped to a run number:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=run+dataset%3D%2FRelValZEE_13UP18_RD%2FCMSSW_11_3_0_pre3-PUpmx_113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1%2FGEN-SIM-DIGI-RAW-HLTDEBUG

> Nonetheless, this issue hasn't been added to our todo queue for this quarter, so unless we can make it an hour or two of work, we will likely not be able to work on it before Q3. Perhaps you would like to discuss this with PPD/PdmV group, since they have other WMCore issues that we need to consider as well.

I think it's fine to postpone this issue until run-dependent MC relval production becomes more frequent, which would not happen before Q3.

vkuznet · 2021-05-18T11:40:50Z

I know that it may be late, but I don't really see a problem with distinguishing MC and data based on run numbers. In the end, a run number is just a number, and one can choose ranges assigned to real data and to MC. For instance, MC run numbers could start from 1M, with everything below reserved for real data run numbers. I think this is a general CMS issue and should be discussed separately. From a technical point of view, I don't see any issue with such an approach.
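Under this proposal the data/MC decision becomes a range check. The sketch below uses the 1M boundary from the example above as a hypothetical constant and rejects inputs that mix both ranges; `classify_runs` is an illustrative name, not an existing CMS tool.

```python
MC_RUN_FLOOR = 1_000_000  # hypothetical boundary: MC runs >= 1M, data runs below


def classify_runs(runs):
    """Return "mc" or "data" for a collection of run numbers; reject mixtures."""
    kinds = {"mc" if run >= MC_RUN_FLOOR else "data" for run in runs}
    if len(kinds) != 1:
        raise ValueError("expected runs of a single kind, got %s" % sorted(kinds))
    return kinds.pop()
```

This only works if MC run numbers are actually allocated above the boundary; under today's convention (MC run number 1), `classify_runs([1])` would return `"data"`.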

amaltaro added the Further Discussion label May 12, 2020

amaltaro closed this as completed Jun 15, 2020

amaltaro reopened this Jun 15, 2020

amaltaro added Medium Priority WMAgent Feature change and removed Further Discussion labels Jun 15, 2020

amaltaro added this to the June_2020 milestone Jun 15, 2020

amaltaro self-assigned this Jun 15, 2020

amaltaro linked a pull request Jun 15, 2020 that will close this issue

Change how run number is defined for harvested root files in multiRun mode #9746 (Open)

jfernan2 mentioned this issue Mar 19, 2022

The new DQM GUI file management #10287 (Open)
