Change how run number is defined for harvested root files in multiRun mode #9746

amaltaro · 2020-06-15T20:50:53Z

Fixes #9690

Status

not-tested

Description

In short:

- if harvesting MC data (in either byRun or multiRun mode), the run number is set to 1
- if harvesting data in byRun mode, the run number is left unchanged, so it is taken from the data being harvested
- if harvesting data in multiRun mode, the run number is forced to 999999
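
The three rules above can be sketched as a small function. This is only an illustration of the intended behaviour, not the code in this PR: the function name `harvested_run_number` and its parameters (`is_mc`, `multi_run`, `data_run`) are hypothetical.

```python
# Hypothetical sketch of the run-number rules described above.
# None of these names come from WMCore; they are for illustration only.

MC_RUN = 1                  # run number used when harvesting MC
MULTIRUN_DATA_RUN = 999999  # run number forced for data in multiRun mode


def harvested_run_number(is_mc, multi_run, data_run=None):
    """Return the run number to record for a harvested root file.

    is_mc     -- True when harvesting MC, False for real data
    multi_run -- True in multiRun mode, False in byRun mode
    data_run  -- run number taken from the harvested data (byRun mode)
    """
    if is_mc:
        # MC: same value in byRun and multiRun mode
        return MC_RUN
    if multi_run:
        # data, multiRun mode: forced value
        return MULTIRUN_DATA_RUN
    # data, byRun mode: keep the run number from the data itself
    return data_run
```

For example, `harvested_run_number(False, False, data_run=278017)` leaves the run number untouched, while the same call with `multi_run=True` yields 999999.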

Is it backward compatible (if not, which system it affects?)

no, it cannot be applied to workflows with harvesting jobs already created

Related PRs

none

External dependencies / deployment changes

none


cmsdmwmbot · 2020-06-15T21:12:06Z

Jenkins results:

Unit tests: failed
- 1 new failures
- 1 tests no longer failing
Pylint check: failed
- 9 warnings and errors that must be fixed
- 15 comments to review
Pycodestyle check: succeeded
- 1 comments to review
Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10112/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2020-06-15T21:40:59Z

Jenkins results:

Unit tests: succeeded
- 1 tests no longer failing
Pylint check: failed
- 9 warnings and errors that must be fixed
- 18 comments to review
Pycodestyle check: succeeded
- 1 comments to review
Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10113/artifact/artifacts/PullRequestReport.html

amaltaro · 2020-06-18T06:57:59Z

This will be more complicated than I initially foresaw. Run-dependent MC really has run numbers > 1, whereas I thought that all of that logic was internal to CMSSW when processing data... Here is part of my harvesting job:

 'input_files': [{'checksums': {'adler32': '861551bd', 'cksum': '661964275'},
                  'events': 0,
                  'first_event': 0,
                  'last_event': 0,
                  'lfn': '/store/backfill/1/CMSSW_11_1_0_pre7/RelValTTbar_13UP18_RD/DQMIO/RECOPRMXUP18_PU25_RD_TC_MC_multiRun_June2020_Val_Alanv12-v11/00000/1ED13C52-B0E9-11EA-8109-D0CDE183BEEF.root',
                  'locations': set([]),
                  'merged': True,
                  'parents': set([]),
                  'runs': set([]),
                  'size': 68426312}],
 'jobType': 'Harvesting',
 'jobgroup': 555,
 'location': None,
 'mask': {'FirstEvent': None,
          'FirstLumi': None,
          'FirstRun': None,
          'LastEvent': None,
          'LastLumi': None,
          'LastRun': None,
          'inclusivemask': True,
          'runAndLumis': {315257: [[1, 36]]}},

so we need to find out a systematic way to identify such run-dependent MC files.
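
As a minimal sketch, the run numbers carried by a job dictionary shaped like the excerpt above can be read from its mask. The helper name `runs_in_mask` is hypothetical, and the job dictionary is reduced to the relevant keys. Note that this only exposes the run numbers; it does not by itself tell real data apart from run-dependent MC, which is the open question here.

```python
# Hypothetical helper: list the run numbers found in a job's mask.
# The dictionary layout follows the job excerpt quoted above.

def runs_in_mask(job):
    """Return the sorted run numbers listed under mask['runAndLumis']."""
    run_and_lumis = job.get("mask", {}).get("runAndLumis", {})
    return sorted(run_and_lumis)


job = {
    "jobType": "Harvesting",
    "mask": {"runAndLumis": {315257: [[1, 36]]}},
}

runs = runs_in_mask(job)
# An MC harvesting job can therefore carry a run number greater than 1
has_run_above_one = any(run > 1 for run in runs)
```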

srimanob · 2021-02-22T16:20:09Z

Hi @amaltaro
Do you mean that we don't have a way to distinguish between data and MC when harvesting on the WM side? Thanks.

jfernan2 · 2022-03-23T08:22:01Z

Just to be clear, and please correct me if I am wrong: right now, before this PR is merged,

MRH (multi-run harvesting) files, either data or MC, have a parameter in WMCore, "runLimits", "-%s-%s" % (minRun, maxRun) [1], which is used in the dataset name for DQM.
I am not sure how many of these have been uploaded to the DQM GUI. I can find only one of them in the development GUI and none in the Offline GUI. This one: https://tinyurl.com/ycj7luc9

It has RunNumber forced to 999999 in the DQM search box, although this does not match the run number displayed in the menu of the DQM GUI (278017, the longest one in the range?), while the dataset name keeps the run range used in the harvesting: /NoBPTX/Run2016F-23Sep2016-v1-277932-278193/DQMIO

This would be the desired behaviour for MRH in the DQM GUI, so that a DQM user can trace back directly from the dataset name which runs (a range) it contains, even though the search in the DQM GUI is performed with run = 999999.
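
To illustrate how a run range ends up in the dataset name, here is a minimal sketch that uses the same "-%s-%s" % (minRun, maxRun) format quoted above. The helper names `run_limits_suffix` and `dqm_dataset_name` are hypothetical, and the way the name is split into parts is an assumption made for the example, not the WMCore implementation.

```python
# Hypothetical sketch: append a "-<minRun>-<maxRun>" suffix to the
# processed dataset name, as in the example dataset quoted above.

def run_limits_suffix(runs):
    """Build the '-<minRun>-<maxRun>' suffix from an iterable of run numbers."""
    runs = sorted(runs)
    if not runs:
        raise ValueError("at least one run number is required")
    min_run, max_run = runs[0], runs[-1]
    return "-%s-%s" % (min_run, max_run)


def dqm_dataset_name(primary, processed, tier, runs):
    """Assemble /<primary>/<processed><runLimits>/<tier>."""
    return "/%s/%s%s/%s" % (primary, processed, run_limits_suffix(runs), tier)


name = dqm_dataset_name("NoBPTX", "Run2016F-23Sep2016-v1", "DQMIO",
                        [277932, 278017, 278193])
# name == "/NoBPTX/Run2016F-23Sep2016-v1-277932-278193/DQMIO"
```

Two harvesting passes over different run ranges then produce different dataset names, which is what lets them coexist even when both are stored under run 999999.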

I also see several ALCAPROMPT datasets uploaded in this way to the Offline DQM GUI, all of them with runNumber forced to 999999, but with different dataset names and a different run displayed in the header of the GUI.
E.g. /StreamExpress/Run2018A-PromptCalibProdSiStripGainsAAG-Express-v1-316702-316766/ALCAPROMPT
https://tinyurl.com/yaz6vfyt
So they can be distinguished by dataset name (run range) and even by the run number displayed in the header of the GUI, even though all of them have 999999.

After this PR (#9746) is merged, we lose all the functionality described above, and every time an MRH root file is registered for an existing dataset name, it is overwritten no matter which run range was used in the harvesting.
For single-run mode, the run number is always kept.

@ahmad3213 @emanueleusai @rvenditti please speak up, whether you agree or disagree

Thanks

[1] https://github.com/dmwm/WMCore/pull/9746/files#diff-3c13cdc9485083bb43b4e4d3d37f7310b878d36bc137ce2a7cf8f08de4e9daf0L181-R184

cmsdmwmbot · 2022-05-09T13:28:41Z

Jenkins results:

Python3 Unit tests: succeeded
- 440 tests deleted
- 19 tests no longer failing
- 13 tests added
- 3 changes in unstable tests
Python3 Pylint check: failed
- 64 warnings and errors that must be fixed
- 5 warnings
- 343 comments to review
Pylint py3k check: failed
- 102 errors and warnings that should be fixed
- 79 warnings
Pycodestyle check: succeeded
- 447 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13176/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-09-30T21:00:25Z

Can one of the admins verify this patch?

Change how run number is defined for harvested root files in multiRun mode

57ef038

amaltaro added the PR: Do not merge yet label Jun 15, 2020

fix unit tests

8357273

amaltaro added the PR: Work in progress label Jun 18, 2020

jfernan2 mentioned this pull request Mar 19, 2022

The new DQM GUI file management #10287

Open
