Change how run number is defined for harvested root files in multiRun mode #9746

Open
wants to merge 2 commits into
base: master

Conversation

amaltaro
Contributor

Fixes #9690

Status

not-tested

Description

In short:

  • if harvesting MC data (in either byRun or multiRun mode): the run number is set to 1
  • if harvesting data in byRun mode: the run number is left unchanged, so it is taken from the data being harvested
  • if harvesting data in multiRun mode: the run number is forced to 999999
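
The three rules above can be sketched as a small selection function. This is only an illustration of the intended behaviour, not the code in this PR: the function name, its parameters and the constant names are hypothetical.

```python
# Hypothetical sketch of the run-number rules described above.
# None of these names come from WMCore; they are illustrative only.
MC_RUN_NUMBER = 1
MULTIRUN_RUN_NUMBER = 999999


def harvested_run_number(is_mc, multi_run, data_run=None):
    """Return the run number to use for the harvested ROOT file.

    is_mc     -- True when the harvested input is Monte Carlo
    multi_run -- True for multiRun mode, False for byRun mode
    data_run  -- run number found in the harvested data (byRun data only)
    """
    if is_mc:
        # MC is set to 1 regardless of the harvesting mode
        return MC_RUN_NUMBER
    if multi_run:
        # Data in multiRun mode is forced to a fixed placeholder run
        return MULTIRUN_RUN_NUMBER
    # Data in byRun mode: no change, keep whatever the data carries
    return data_run
```

Note that the sketch takes `is_mc` as a given input; how to determine it reliably is a separate question.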

Is it backward compatible (if not, which system it affects?)

No. It cannot be applied to workflows whose harvesting jobs have already been created.

Related PRs

none

External dependencies / deployment changes

none

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 15 comments to review
  • Pycodestyle check: succeeded
    • 1 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10112/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 1 tests no longer failing
  • Pylint check: failed
    • 9 warnings and errors that must be fixed
    • 18 comments to review
  • Pycodestyle check: succeeded
    • 1 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/10113/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

This will be more complicated than I initially foresaw. Run-dependent MC really does have run numbers > 1, whereas I thought all of that logic was internal to CMSSW when processing the data. Here is part of my harvesting job:

 'input_files': [{'checksums': {'adler32': '861551bd', 'cksum': '661964275'},
                  'events': 0,
                  'first_event': 0,
                  'last_event': 0,
                  'lfn': '/store/backfill/1/CMSSW_11_1_0_pre7/RelValTTbar_13UP18_RD/DQMIO/RECOPRMXUP18_PU25_RD_TC_MC_multiRun_June2020_Val_Alanv12-v11/00000/1ED13C52-B0E9-11EA-8109-D0CDE183BEEF.root',
                  'locations': set([]),
                  'merged': True,
                  'parents': set([]),
                  'runs': set([]),
                  'size': 68426312}],
 'jobType': 'Harvesting',
 'jobgroup': 555,
 'location': None,
 'mask': {'FirstEvent': None,
          'FirstLumi': None,
          'FirstRun': None,
          'LastEvent': None,
          'LastLumi': None,
          'LastRun': None,
          'inclusivemask': True,
          'runAndLumis': {315257: [[1, 36]]}},

So we need to find a systematic way to identify such run-dependent MC files.
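
To make the problem concrete, here is a minimal sketch that reads the run numbers out of a job mask shaped like the dump above. The helper `runs_in_mask` and the "run equals 1" check are hypothetical and are not WMCore code; the point is only that such a check would not recognise this run-dependent MC job as MC.

```python
# Hypothetical helper, not part of WMCore: collect the run numbers
# listed under mask['runAndLumis'] of a job dictionary.
def runs_in_mask(job):
    run_and_lumis = job.get("mask", {}).get("runAndLumis", {})
    return sorted(int(run) for run in run_and_lumis)


# Trimmed copy of the harvesting job shown above
job = {
    "jobType": "Harvesting",
    "mask": {"runAndLumis": {315257: [[1, 36]]}},
}

runs = runs_in_mask(job)

# A naive "MC means run 1" assumption does not hold for this job,
# even though its input is a RelVal MC sample.
looks_like_plain_mc = runs == [1]
```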

@srimanob

Hi @amaltaro
Do you mean that we have no way to distinguish data from MC at harvesting time on the WM side? Thanks.

@jfernan2

Just to be clear, please correct me if I am wrong: right now, before this PR is merged,

  • MRH (multi-run harvesting) files, either data or MC, have a WMCore parameter "runLimits", set to "-%s-%s" % (minRun, maxRun) [1], which is used in the dataset name for DQM.
    I am not sure how many of these have been uploaded to the DQM GUI; I can find only one in the development GUI and none in the Offline GUI. This one: https://tinyurl.com/ycj7luc9

It has RunNumber forced to 999999 in the DQM search box, even though this does not match the run number displayed in the menu of the DQM GUI (278017, the longest one in the range?). The dataset name, however, keeps the run range used in the harvesting: /NoBPTX/Run2016F-23Sep2016-v1-277932-278193/DQMIO

This would be the desired behaviour for MRH in the DQM GUI: a DQM user can trace back directly from the dataset name which runs (as a range) it contains, even though the search in the DQM GUI is performed with run = 999999.

I also see several ALCAPROMPT datasets uploaded in this way to the Offline DQM GUI, all of them with the run number forced to 999999, but each with a different dataset name and a different run displayed in the header of the GUI.
E.g. /StreamExpress/Run2018A-PromptCalibProdSiStripGainsAAG-Express-v1-316702-316766/ALCAPROMPT
https://tinyurl.com/yaz6vfyt
So they can be distinguished by dataset name (run range) and even by the run number displayed in the header of the GUI, even though all of them have 999999.
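
As a sketch of the naming convention described above: the "-%s-%s" % (minRun, maxRun) suffix is the one referenced in [1], while the helper names and the list of runs below are assumptions made for illustration, not WMCore code.

```python
# Hypothetical helpers illustrating the run-range suffix in dataset names.
def run_limits_suffix(runs):
    """Build the '-<minRun>-<maxRun>' suffix from the harvested runs."""
    return "-%s-%s" % (min(runs), max(runs))


def multirun_dataset_name(primary, processed, tier, runs):
    """Append the run-range suffix to the processed dataset name."""
    return "/%s/%s%s/%s" % (primary, processed, run_limits_suffix(runs), tier)


def run_range_from_dataset_name(name):
    """Recover (minRun, maxRun) from a name built as above."""
    processed = name.split("/")[2]
    min_run, max_run = processed.rsplit("-", 2)[1:]
    return int(min_run), int(max_run)


# Illustrative run list; only the minimum and maximum affect the name
name = multirun_dataset_name(
    "NoBPTX", "Run2016F-23Sep2016-v1", "DQMIO", [277932, 278017, 278193]
)
run_range = run_range_from_dataset_name(name)
```

With this convention the run range stays recoverable from the dataset name alone, independently of the 999999 value used for the search.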

@ahmad3213 @emanueleusai @rvenditti please speak up, whether you agree or disagree.

Thanks

[1] https://github.com/dmwm/WMCore/pull/9746/files#diff-3c13cdc9485083bb43b4e4d3d37f7310b878d36bc137ce2a7cf8f08de4e9daf0L181-R184

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 440 tests deleted
    • 19 tests no longer failing
    • 13 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 64 warnings and errors that must be fixed
    • 5 warnings
    • 343 comments to review
  • Pylint py3k check: failed
    • 102 errors and warnings that should be fixed
    • 79 warnings
  • Pycodestyle check: succeeded
    • 447 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13176/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Can one of the admins verify this patch?


Successfully merging this pull request may close these issues.

Wrong setting of RunNumber in harvesting output for MC MultiRun
4 participants