Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make duplicate branch check work properly with SwitchProducer that has EDAlias case #40136

Merged
merged 2 commits into from Nov 29, 2022

Conversation

makortel
Copy link
Contributor

PR description:

#40110 demonstrated that the OutputModule's duplicate branch check for EDAliases did not work properly when a SwitchProducer had an EDAlias case. The problem was that the SwitchProducer-EDAlias' alias-for BranchID was not inserted into the trueBranchIDToKeptBranchDesc, and therefore the check threw an exception only if the non-SwitchProducer-EDAlias duplicate entry was earlier in the map. This PR fixes the check to use the proper alias-for BranchID for both normal and SwitchProducer EDAliases.

The first commit consolidates some of the SwitchProducer unit test configuration files to reduce copy paste.

Resolves #40110

PR validation:

Framework unit tests run. The example HLT configuration throws now the proper exception.

…s EDAlias case

Earlier check worked only if the non-SwitchProducer branch was in the map earlier.
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40136/33129

  • This PR adds an extra 32KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

  • FWCore/Framework (core)
  • FWCore/Integration (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@missirol, @wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-420682/29201/summary.html
COMMIT: 3759817
CMSSW: CMSSW_12_6_X_2022-11-22-2300/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40136/29201/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 23-Nov-2022 04:08:03 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , reHLT)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , reHLT)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 23-Nov-2022 04:21:24 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , HLT)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , HLT)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 23-Nov-2022 04:21:24 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , HLT)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , HLT)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

RelVals-INPUT

  • 139.003139.003_RunHLTPhy2021+RunHLTPhy2021+HLTDR3_2022+RECODR3_reHLT_HLTPhysics_Offline+HARVESTD2021HLTPhy/step2_RunHLTPhy2021+RunHLTPhy2021+HLTDR3_2022+RECODR3_reHLT_HLTPhysics_Offline+HARVESTD2021HLTPhy.log
  • 139.001139.001_RunMinimumBias2021+RunMinimumBias2021+HLTDR3_2022+RECODR3_reHLT_MinBiasOffline+HARVESTD2021MB/step2_RunMinimumBias2021+RunMinimumBias2021+HLTDR3_2022+RECODR3_reHLT_MinBiasOffline+HARVESTD2021MB.log
  • 139.004139.004_RunNoBPTX2021+RunNoBPTX2021+HLTDR3_2022+RECODR3_reHLT_AlCaTkCosmics_Offline+HARVESTDR3/step2_RunNoBPTX2021+RunNoBPTX2021+HLTDR3_2022+RECODR3_reHLT_AlCaTkCosmics_Offline+HARVESTDR3.log
Expand to see more relval errors ...

AddOn Tests

----- Begin Fatal Exception 23-Nov-2022 04:12:29 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , HLTRECO)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , HLTRECO)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 23-Nov-2022 04:10:54 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , HLTRECO)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , HLTRECO)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 23-Nov-2022 04:10:37 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(HBHERecHitsSorted, hltHbherecoLegacy, , HLTRECO)
#2: BranchKey(HBHERecHitsSorted, hltHbhereco, , HLTRECO)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

@missirol
Copy link
Contributor

Looks like this will require a change in HLTrigger_EventContent_cff (either to the script that creates that cff, or to the inputs it uses).

'keep *_hltHbherecoFromGPU_*_*',
'keep *_hltHbherecoLegacy_*_*',
'keep *_hltHbhereco_*_*',

(cc: @Martin-Grunewald)

@Martin-Grunewald
Copy link
Contributor

Martin-Grunewald commented Nov 23, 2022

Is it enough to drop hltHbhereco but keeping both hltHbherecoLegacy and hltHbherecoFromGPU?
Indeed it does. So instead of the generic hltHbhereco we could add in the affected EventContents used to build the overall HLT event contents only one or both of the two others, but clients would then need to be updated to use those.
These four: hltOutputDQM,hltOutputHLTMonitor,hltOutputHIHLTMonitor,hltOutputHIDQM
have keep *_hltHbhereco_*_* and the non-HI ones are used to build the EventContent.

@missirol
Copy link
Contributor

Is it enough to drop hltHbhereco but keeping both hltHbherecoLegacy and hltHbherecoFromGPU?

I think it would solve this problem, but I think hltHbhereco is needed downstream, for example by DQM modules in RelVal wfs (e.g. HcalRecHitsValidation in step3 of wf 11634.0; although that client might just silently not fill histograms if the collection is missing, as opposed to throwing an exception).

On the HLT side:

  • It looks like we would have to manually drop some of these collections when we make the cff (as opposed to changing online EventContents in ConfDB, where hltHbhereco is required by the DQM stream (and others), and the other two collections are needed in the DQMGPUvsCPU stream). The alternative would be to remove the DQMGPUvsCPU EventContent from HLTDebugRAW/FEVT, but I think we discussed in the past that it should be in there.

More on the sw side:

  • I don't understand why we didn't see an issue like Segmentation violation in all DQM online clients during HI run #40110 when using FEVTDEBUGHLT files produced so far, where we were keeping both hltHbhereco and hltHbherecoLegacy / hltHbherecoFromGPU, e.g. step2 of wf 11634.0.

  • For the future, should we live with this limitation, or are improvements possible? I'm thinking about two files of one MC sample, one file produced on CPU and the other on GPU, and a user having to consume hltHbherecoLegacy in one case and hltHbherecoFromGPU in the other (assuming we have to drop hltHbhereco), whereas the user would just consume hltHbhereco instead.

Tagging @fwyzard in case he has opinions/suggestions/clarifications, since this involves HLT and GPUs.

@makortel
Copy link
Contributor Author

This is a good question. I see the step3 of workflow 11634.0 consumes only hltHbhereco. But also a specific test reading both hltHbhereco and hltHbherecoLegacy from a ROOT file succeeded. It could be that there is something in the ROOT output/input that allows the setup to work in some situations.

  • For the future, should we live with this limitation, or are improvements possible? I'm thinking about two files of one MC sample, one file produced on CPU and the other on GPU, and a user having to consume hltHbherecoLegacy in one case and hltHbherecoFromGPU in the other (assuming we have to drop hltHbhereco), whereas the user would just consume hltHbhereco instead.

This limitation is intended behavior (since the introduction of EDAlias a decade ago), because otherwise we'd get to issues like seen in #40110. A high-level premise has been that it doesn't make sense to store both alias and alias-for products (what would be the point of the alias?).

It seems to me we're mixing two different use cases into one

  • Portable: can be run with or without a GPU, output product has always the same label (hltHbhereco), downstream consumers don't (have to) care if the product was produced with or without a GPU
  • Non-portable: must be run with a GPU, separate module labels for the CPU and GPU modules (hltHbherecoLegacy vs hltHbherecoFromGPU), consumers must be explicitly aware

I think it would be better to deal with these use cases separately with separate files.

@missirol
Copy link
Contributor

missirol commented Nov 23, 2022

I think it would be better to deal with these use cases separately with separate files.

Thanks for clarifying. I would say this goes in the direction of keeping hltHbhereco in the existing tiers (e.g. FEVTDEBUGHLT), removing the DQMGPUvsCPU EventContent from them. If so, the update of HLT_EventContent_cff to unstuck this PR can be done by the end of the week.

We would then have to come up with a separate offline event content for wfs dedicated to HLT GPU-vs-CPU comparisons. In a way, this is similar to what is already done online, where different clients are fed by different streams (e.g. DQM with hltHbhereco, and DQMGPUvsCPU with the explicit CPU/GPU products).

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-420682/29258/summary.html
COMMIT: 3759817
CMSSW: CMSSW_12_6_X_2022-11-25-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40136/29258/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3417232
  • DQMHistoTests: Total failures: 24
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3417186
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@missirol
Copy link
Contributor

I see the step3 of workflow 11634.0 consumes only hltHbhereco.

Just for the record, step3 of wf 11634.0 also runs a GlobalRecHitsAnalyzer which consumed both hltHbhereco and hltHbherecoLegacy (on CPU), see #40155 (comment).

@makortel
Copy link
Contributor Author

I see the step3 of workflow 11634.0 consumes only hltHbhereco.

Just for the record, step3 of wf 11634.0 also runs a GlobalRecHitsAnalyzer which consumed both hltHbhereco and hltHbherecoLegacy (on CPU), see #40155 (comment).

Thanks @missirol. I was relying on the output of Tracer service with dumpPathsAndConsumes=True, and apparently that doesn't include consumesMany.

@makortel
Copy link
Contributor Author

@cmsbuild, please test

#40153 is now merged (and thanks @missirol for the update!)

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-420682/29307/summary.html
COMMIT: 3759817
CMSSW: CMSSW_13_0_X_2022-11-28-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40136/29307/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3417311
  • DQMHistoTests: Total failures: 9
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3417280
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 206 log files, 48 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor Author

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation violation in all DQM online clients during HI run
5 participants