Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix isData when neither --data nor --mc is used #34302

Merged
merged 1 commit into from
Jul 9, 2021

Conversation

srimanob
Copy link
Contributor

@srimanob srimanob commented Jul 1, 2021

PR description:

When neither --data nor --mc is used, cmsDriver will consider if the job should be data or MC using
https://github.com/cms-sw/cmssw/blob/CMSSW_12_0_0_pre3/Configuration/Applications/python/cmsDriverOptions.py#L214-L235
and then print out.

However, when it considers a job as data, options.isData is not set. Some config, i.e. NanoAOD,
https://github.com/cms-sw/cmssw/blob/CMSSW_12_0_0_pre3/Configuration/Applications/python/ConfigBuilder.py#L1705
will need options.isData to be defined. If not MC customization will be picked.

This bug was shown in the NanoV9 pilot sample,
https://cms-unified.web.cern.ch/cms-unified/report/pdmvserv_Run2018C_JetHT_UL2018_MiniAODv2_NanoAODv9_pilot_210617_171408_1560
with a config file in
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/5dabb3c8610cca5485ccc25118fc5c8c/configFile
shows
from PhysicsTools.NanoAOD.nano_cff import nanoAOD_customizeMC
is used, instead of data.

PR validation:

Run cmsDriver without --mc, --data, and see if it picks up the proper NanoAOD:
cmsDriver.py RECO --conditions 106X_dataRun2_v35 --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --era Run2_2018,run2_nanoAOD_106Xv2 --eventcontent NANOEDMAOD --filein placeholder.root --fileout file:ReReco-Run2018C-JetHT-UL2018_MiniAODv2_NanoAODv9_pilot-00001.root --nThreads 2 --no_exec --python_filename ReReco-Run2018C-JetHT-UL2018_MiniAODv2_NanoAODv9_pilot-00001_0_cfg.py --scenario pp --step NANO
will give

We have determined that this is real data (if not, rerun cmsDriver.py with --mc)
....
customising the process with nanoAOD_customizeData from PhysicsTools/NanoAOD/nano_cff

This is correct. Before, it picked up MC customization.

if this PR is a backport please specify the original PR and why you need to backport that PR:

Backport to 10_6

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34302/23631

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2021

A new Pull Request was created by @srimanob (Phat Srimanobhas) for master.

It involves the following packages:

Configuration/Applications

@cmsbuild, @silviodonato, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@makortel, @Martin-Grunewald, @fabiocos this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@srimanob
Copy link
Contributor Author

srimanob commented Jul 1, 2021

Please test

@mariadalfonso
Copy link
Contributor

assign xpog

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2021

New categories assigned: xpog

@fgolf,@mariadalfonso,@gouskos you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3a480/16389/summary.html
COMMIT: 44b7e47
CMSSW: CMSSW_12_0_X_2021-07-01-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/34302/16389/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2785711
  • DQMHistoTests: Total failures: 18
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2785670
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 37 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@srimanob
Copy link
Contributor Author

srimanob commented Jul 3, 2021

type bug-fix

@qliphy
Copy link
Contributor

qliphy commented Jul 4, 2021

+operations

@srimanob
Copy link
Contributor Author

srimanob commented Jul 9, 2021

Hi @cms-sw/xpog-l2 @qliphy
Why this PR is holding? It is not related to any Nano code, but CMSSW config builder.

@qliphy
Copy link
Contributor

qliphy commented Jul 9, 2021

+1

@cms-sw/xpog-l2 Let us know if you have any comment.

@qliphy
Copy link
Contributor

qliphy commented Jul 9, 2021

merge

@cmsbuild cmsbuild merged commit 91b871c into cms-sw:master Jul 9, 2021
@mariadalfonso
Copy link
Contributor

mariadalfonso commented Jul 9, 2021

Hi @cms-sw/xpog-l2 @qliphy
Why this PR is holding? It is not related to any Nano code, but CMSSW config builder.

in miniAOD when the default is not set, pick the MC configuration, and now you set a different logic in nano. so having something different Is not good at all.

This flag should not be there probably .
For what we see only the met tool (common to nano and mini) is using this flag.
definitively something to come back when we can run mini and nano together i.e. Prompt configuration

@srimanob
Copy link
Contributor Author

srimanob commented Jul 9, 2021

Hi @cms-sw/xpog-l2 @qliphy
Why this PR is holding? It is not related to any Nano code, but CMSSW config builder.

in miniAOD when the default is not set, pick the MC configuration, and now you set a different logic in nano. so having something different Is not good at all.

This flag should not be there probably .
For what we see only the met tool (common to nano and mini) is using this flag.
definitively something to come back when we can run mini and nano together i.e. Prompt configuration

Hi,

If you want to fix Nano sequence is OK too. I avoid touching it in the beginning. It was there long ago somehow.
However, setting isMC or isData should be good too in case nothing is set. If you define that workflow should be data then isData should be true anyways.

@qliphy
Copy link
Contributor

qliphy commented Jul 11, 2021

@srimanob Please note there are some unit failures after merging this PR.
#34438
Taking into account comments from @mariadalfonso , do you think we should revert this PR? Or wait for @cms-sw/dqm-l2 to fix the unit failure?

@srimanob
Copy link
Contributor Author

Thanks @qliphy
However, it does not make much sense as we don't do touch something except setting the proper data workflow. And we keep it a bit, I can look at it. I will also make PR for Nano sequence to switch the main sequence to data, as mini.

@srimanob
Copy link
Contributor Author

However, you can revert it for now. I can make another PR when the situation is clear.
FYI @smuzaffar

@qliphy
Copy link
Contributor

qliphy commented Jul 11, 2021

@srimanob Ok, thanks. Will revert it for now.

@srimanob
Copy link
Contributor Author

srimanob commented Jul 11, 2021

Answer why IB failed in
#34438 (comment)

The validation step seems to run a strange sequence, using Data as input but look for MC content (i.e. mixing). When we set isData, the issue pop up. This is wrong.

However, to avoid IB failing, I will make another PR to fix for Nano sequence for now. But we should follow up with DQM why they need to do that kind of sequence.

@srimanob
Copy link
Contributor Author

It seems we do not have a straightforward way to fix. Mini(PAT) will face the same issue if it is data workflow but run without "--data". CMSSW will guess it is data, but PAT sequence will pick MC customization because nothing is set (*).

I believe this PR is the right thing to do, by assigning isData when the guess result is data workflow. What should be fix is the DQM validation sequence.

(*)
cmsDriver.py PAT --conditions 106X_dataRun2_v35 --datatier MINIAOD --era Run2_2018 --eventcontent MINIAOD --filein placeholder.root --fileout file:fileout.root --nThreads 2 --no_exec --python_filename miniaod_cfg.py --scenario pp --step PAT

@qliphy
Copy link
Contributor

qliphy commented Jul 12, 2021

@srimanob Would you please make a new PR or issue page to move discussions there? Thanks!

@srimanob
Copy link
Contributor Author

Here it is,
#34443

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants