Wrong setting of RunNumber in harvesting output for MC MultiRun #9690

Open
srimanob opened this issue May 11, 2020 · 13 comments · May be fixed by #9746

Comments

@srimanob

srimanob commented May 11, 2020

Impact of the bug
Output from MC MultiRun harvesting cannot be uploaded to the GUI because the RunNumber is set to 999999 on the production side, although it is forced to 1 in the cmsDriver command injected with the workflow.

Describe the bug
MC Run-Dependent relvals have been submitted, e.g.
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?campaign=CMSSW_11_1_0_pre7__RD_1HS-1588962304

All steps look OK, except that no histograms are found in the GUI. The GUI log shows:

weblog-20200509.log:[09/May/2020:03:02:00] INFO: saved file /data/srv/state/dqmgui/relval/uploads/0001/DQM_V0001_R000999999__RelValZMM_13UP18_RD__CMSSW_11_1_0_pre7-PUpmx25ns_110X_upgrade2018_realistic_v9_RD_1-v1-315257-324245__DQMIO.root size 119895428 checksum md5:6f6371b7190d61fc495fec25f736e7ed

The filename seems to be wrong. The RunNumber should be 1. We force it to be 1 in the cmsDriver command of the harvesting step in the workflow:
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/53979640a5175ff795ce39b4522dc8fe/configFile

step4 --conditions auto:phase1_2018_realistic --era Run2_2018 --customise_commands process.dqmSaver.forceRunNumber = 1 -s HARVESTING:@standardValidationNoHLT+@standardDQMFakeHLT+@miniAODValidation+@miniAODDQM+@L1TEgamma --harvesting AtJobEnd --filetype DQM --geometry DB:Extended --mc --io HARVESTUP18_PU25_L1TEgDQM_RD.io --python HARVESTUP18_PU25_L1TEgDQM_RD.py --relval 200000,100 -n 100 --no_exec --filein file:step3_inDQM.root --fileout file:step4.root"

Looking around WMCore, I found
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L546-L547

        if multiRun and isCMSSWSupported(self.getCmsswVersion(), "CMSSW_8_0_0"):
            self.process.dqmSaver.forceRunNumber = cms.untracked.int32(999999)

Does this override the harvesting cmsDriver setting we submit with the workflow?

How to reproduce it
I don't know how to reproduce it, as the problem seems to happen on the production side only and cannot be reproduced when I run locally. Running locally, I get the proper filename, i.e.
DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root
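
To compare the production and local outputs, the run number can be read back from the file name. This is a minimal sketch, assuming the `DQM_V<4 digits>_R<9 digits>__...` layout inferred from the two file names quoted in this report; the helper `run_from_dqm_filename` is hypothetical and is not part of WMCore or the DQM GUI.

```python
import os
import re

# Assumed layout, inferred from the two file names quoted in this report:
# DQM_V<version, 4 digits>_R<run, 9 digits>__<rest>.root
_DQM_NAME = re.compile(r"^DQM_V(\d{4})_R(\d{9})__.+\.root$")


def run_from_dqm_filename(path):
    """Return the run number encoded in a DQM file name, or None if it does not match."""
    match = _DQM_NAME.match(os.path.basename(path))
    return int(match.group(2)) if match else None


# File name from the GUI log (production) and from a local run.
production = ("DQM_V0001_R000999999__RelValZMM_13UP18_RD__CMSSW_11_1_0_pre7-"
              "PUpmx25ns_110X_upgrade2018_realistic_v9_RD_1-v1-315257-324245__DQMIO.root")
local = "DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root"

print(run_from_dqm_filename(production))  # 999999
print(run_from_dqm_filename(local))       # 1
```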

Expected behavior
We expect RunNumber = 1 for MC MultiRun.

@amaltaro
Contributor

@srimanob Hi Phat,
I had a look at the history and here's a reference to the PR that applied this change:
#7525

It also points to the original issue, where we described the feature requirements with Marco Rovere and Federico.

Can you please have a look at those tickets and decide whether a feature change is required? Perhaps you also want to touch base with the DQM experts that proposed it back 3 or 4 years ago.
Thanks

@srimanob
Author

srimanob commented May 12, 2020

Hi Alan @amaltaro
Thanks. I sent a message to the DQM conveners last night and hope they will confirm in this report soon. The quick idea is that we normally use this for data. To support Run-Dependent MC, this rule of 999999 needs to be reviewed: either the GUI accepts it, or the WMA is updated.

@jfernan2

Dear all,
within DQM we have agreed that it is better to stick to the current convention and keep runNumber=1 for MRH MC, since in this case MRH only concerns confDB settings but it is still MC, so this will avoid confusion in the future:
MC -> RunNumber = 1
Data -> RunNumber > 1
Run ranges already appear in the DQM dataset number for MC (and Data) MRH, so this convention would be compliant with the present DQM situation, unless this option causes much trouble to WMA.
Thank you in advance

@amaltaro
Contributor

@srimanob @jfernan2 thanks for following this up. I understand there is nothing to be changed in the agent then. Please reopen this issue if I have misinterpreted anything.

@srimanob
Author

srimanob commented Jun 15, 2020

Hi @amaltaro
Sorry, I think we need an update on
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L546-L547

Force RUN = 1 for MC, and 999999 for data for MRH.

@amaltaro
Contributor

Sorry for missing it, Phat.
So,

  • MC by run: run number is 1
  • MC multi run: run number is 1
  • data by run: run number is whatever run is being harvested
  • data multi run: run number is 999999 (no longer a run range - with lower and upper limits)

Can you please confirm it?
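
The four cases above can be written as a small decision function. This is only a sketch of the convention, not WMCore code: the `is_mc` argument is hypothetical, and the helper name `expected_run_number` is invented for illustration.

```python
MULTI_RUN_DATA_PLACEHOLDER = 999999  # value currently hard-coded in SetupCMSSWPset.py
MC_RUN_NUMBER = 1


def expected_run_number(is_mc, multi_run, harvested_run=None):
    """Run number expected in the harvesting output for each of the four cases.

    is_mc is a hypothetical flag used only for this illustration.
    """
    if is_mc:
        # MC by run and MC multi run both use run number 1.
        return MC_RUN_NUMBER
    if multi_run:
        # Data multi run: fixed placeholder instead of a run range.
        return MULTI_RUN_DATA_PLACEHOLDER
    if harvested_run is None:
        raise ValueError("data by run needs the run being harvested")
    # Data by run: whatever run is being harvested.
    return harvested_run
```

For example, `expected_run_number(is_mc=True, multi_run=True)` gives 1, while `expected_run_number(is_mc=False, multi_run=True)` gives 999999.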

@amaltaro amaltaro reopened this Jun 15, 2020
@srimanob
Author

Hi @amaltaro
Yes, I confirm that. What we would like is something like:

if multiRun and isCMSSWSupported(self.getCmsswVersion(), "CMSSW_8_0_0"):
    if data:
        self.process.dqmSaver.forceRunNumber = cms.untracked.int32(999999)
    if MC:
        self.process.dqmSaver.forceRunNumber = cms.untracked.int32(1)

(MC in MRH will have run numbers as in data, but we would like to ignore them for MC.)

Thanks in advance.

@amaltaro
Contributor

@srimanob I expected it to be a simple fix, but it turns out we need to systematically identify data and run-dep MC files in the agent. I will get back to this issue in the next week.

@christopheralanwest

AlCa is currently using a naming convention in which global tags for run-dependent MC have the string "_RD" immediately prior to the version number. From our perspective, files from run-dependent MC workflows can be identified by the string "_RD_v" in the name, such as /store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root. Would such a scheme be feasible?
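
As an illustration of this suggestion, a substring check on the file or dataset name would look roughly like the sketch below. The helper `looks_run_dependent_mc` is hypothetical, the second path in the example is made up to show a non-matching name, and the check is only as reliable as the naming convention itself.

```python
RUN_DEPENDENT_MARKER = "_RD_v"  # naming convention described above; nothing enforces it


def looks_run_dependent_mc(name):
    """Return True if a file or dataset name carries the run-dependent MC marker."""
    return RUN_DEPENDENT_MARKER in name


# Path quoted in the comment above.
rd_file = ("/store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/"
           "113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/"
           "42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root")
# Made-up path without the marker, for contrast only.
other_file = ("/store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18/DQMIO/"
              "113X_upgrade2018_realistic_v3-v1/00000/example.root")

print(looks_run_dependent_mc(rd_file))     # True
print(looks_run_dependent_mc(other_file))  # False
```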

@christopheralanwest

> AlCa is currently using a naming convention in which global tags for run-dependent MC have the string "_RD" immediately prior to the version number. From our perspective, files from run-dependent MC workflows can be identified by the string "_RD_v" in the name, such as /store/relval/CMSSW_11_3_0_pre3/RelValZEE_13UP18_RD/DQMIO/113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1/00000/42A72F0A-8B23-11EB-A3E1-19BDE183BEEF.root. Would such a scheme be feasible?

I would like to understand if the obstacles to resolving this issue are:

  • a matter of resolving the issue in an elegant and/or more general/maintainable way
  • a matter of determining a solution by any means. For example, is my hackish suggestion to identify run-dependent MC files by a particular string assigned by convention in the dataset name not feasible?
  • purely a consequence of the lower priority of this issue
  • something else?

At this point, given the small number of run-dependent MC relval samples, it is feasible to run the harvesting and upload to the DQM GUI manually. So, this issue currently is not particularly urgent. At the same time, I don't want to waste the time of those who would perform these manual actions if a fix on the WMCore side is straightforward.

@amaltaro
Contributor

Hi @christopheralanwest , sorry for the belated response.

Yes, we still have to find out a generic and robust way to distinguish between data and MC. I'm afraid matching the global tag against _RD_ might be risky. Perhaps we could use that runLimits variable and set it to 999999 whenever its limits are different than 1 (thus data). I believe run dependent monte carlo still considers run number 1 inside the agent and DBS, so it should not be a problem.
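
A rough sketch of that run-limits idea follows. It assumes, as stated above, that run-dependent Monte Carlo still carries run number 1 at this point; `first_run` and `last_run` are hypothetical integer arguments, since how the agent stores its run limits is not shown in this thread, and the data run range in the example is made up.

```python
MULTI_RUN_PLACEHOLDER = 999999


def forced_run_number(first_run, last_run):
    """Pick the forced run number from the run limits of a multi-run harvesting job.

    first_run and last_run are hypothetical arguments used for illustration only.
    """
    if first_run == 1 and last_run == 1:
        # Limits equal to 1: treated as MC.
        return 1
    # Any other limits: treated as data.
    return MULTI_RUN_PLACEHOLDER


print(forced_run_number(1, 1))            # 1
print(forced_run_number(300000, 300100))  # 999999 (made-up data run range)
```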

Nonetheless, this issue hasn't been added to our todo queue for this quarter, so unless we can make it an hour or two of work, we will likely not be able to work on it before Q3. Perhaps you would like to discuss this with PPD/PdmV group, since they have other WMCore issues that we need to consider as well.

@christopheralanwest

> Hi @christopheralanwest , sorry for the belated response.
>
> Yes, we still have to find out a generic and robust way to distinguish between data and MC. I'm afraid matching the global tag against _RD_ might be risky. Perhaps we could use that runLimits variable and set it to 999999 whenever its limits are different than 1 (thus data). I believe run dependent monte carlo still considers run number 1 inside the agent and DBS, so it should not be a problem.

Clearly we could choose a longer string, such as RunDep or RunDepMC, that would be less likely to accidentally match data files. The shorter string _RD_ was chosen simply to keep the global tag names (and therefore dataset names) short, as requested by PdmV.

Run-dependent MC does not have run numbers other than 1 in the GEN-SIM step:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=run+dataset%3D%2FRelValZEE_13UP18_RD%2FCMSSW_11_3_0_pre3-113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1%2FGEN-SIM

But in the DIGI-PREMIX step the lumiblock number is mapped to a run number:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=run+dataset%3D%2FRelValZEE_13UP18_RD%2FCMSSW_11_3_0_pre3-PUpmx_113X_upgrade2018_realistic_RD_v3_RunDep_HS-v1%2FGEN-SIM-DIGI-RAW-HLTDEBUG

> Nonetheless, this issue hasn't been added to our todo queue for this quarter, so unless we can make it an hour or two of work, we will likely not be able to work on it before Q3. Perhaps you would like to discuss this with PPD/PdmV group, since they have other WMCore issues that we need to consider as well.

I think it's fine to postpone this issue until run-dependent MC relval production becomes more frequent, which would not happen before Q3.

@vkuznet
Contributor

vkuznet commented May 18, 2021

I know that it may be late, but I don't really see a problem with distinguishing MC and data based on run numbers. In the end a run number is just a number, and someone can choose ranges to assign to real data and to MC. For instance, MC run numbers can start from 1M and above, with everything below allocated to real data run numbers. I think it is a general issue in CMS and should be discussed separately. From a technical point of view I don't see any issue with such an approach.
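
A sketch of such a range split, using the 1M boundary from the example in this comment. The boundary and the helper name `is_mc_run` are illustrative only and are not an agreed CMS convention.

```python
MC_RUN_START = 1000000  # example boundary from the comment above, not an agreed convention


def is_mc_run(run_number):
    """Classify a run number as MC (True) or real data (False) by range."""
    return run_number >= MC_RUN_START


print(is_mc_run(1000001))  # True
print(is_mc_run(999999))   # False
print(is_mc_run(1))        # False
```

Note that under this split the run number 1 currently used for MC falls in the data range, so adopting it would mean renumbering MC runs.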
