Herwig lhe matching fix #40939

Merged
merged 1 commit into from Mar 29, 2023

Conversation

Dominic-Stafford
Contributor

PR description:

Adds a HadronizerFilter mode to the Herwig7Interface, which uses the numbering of LHE events to ensure that the LHE and Gen level events in the CMS event record match up. Previously this was not the case for processes with merging, as Herwig would silently skip events. Should be tested with cms-sw/cmsdist#8349, which propagates the LHE numbering through Herwig.
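
The matching idea described above can be illustrated with a small Python sketch. This is not the Herwig7Interface code: the function name `match_by_number` and the `(number, payload)` tuples are hypothetical, and the sketch assumes both streams are ordered by increasing event number. It only shows why pairing by number stays correct when the hadronizer drops events, while pairing by position does not.

```python
def match_by_number(lhe_events, gen_events):
    """Pair each generated event with the LHE event carrying the same number.

    Both arguments are lists of (number, payload) tuples ordered by number.
    gen_events may lack some numbers because the hadronizer skipped them.
    """
    matched = []
    lhe_iter = iter(lhe_events)
    for gen_number, gen_payload in gen_events:
        # Advance through the LHE stream until the numbers agree;
        # LHE events with no generated counterpart are dropped.
        for lhe_number, lhe_payload in lhe_iter:
            if lhe_number == gen_number:
                matched.append((lhe_payload, gen_payload))
                break
        else:
            raise ValueError(f"no LHE event with number {gen_number}")
    return matched


# Hypothetical input: the hadronizer silently skipped event 2.
lhe = [(1, "lhe-1"), (2, "lhe-2"), (3, "lhe-3"), (4, "lhe-4")]
gen = [(1, "gen-1"), (3, "gen-3"), (4, "gen-4")]

pairs = match_by_number(lhe, gen)

# Pairing by position instead would attach lhe-2 to gen-3.
positional = [(l[1], g[1]) for l, g in zip(lhe, gen)]
```

Here `pairs` keeps `lhe-3` with `gen-3`, whereas `positional` is misaligned from the first skipped event onwards.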

PR validation:

Have tested that the functionality works in CMSSW_10_6. Have also updated all Herwig validation examples with the new HadronizerFilter, and they all work.

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40939/34425

  • This PR adds an extra 48KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40939/34426

  • This PR adds an extra 48KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40939/34427

  • This PR adds an extra 48KB to repository

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

A new Pull Request was created by @Dominic-Stafford for master.

It involves the following packages:

  • Configuration/Generator (generators)
  • GeneratorInterface/Herwig7Interface (generators)
  • GeneratorInterface/LHEInterface (generators)

@SiewYan, @mkirsano, @Saptaparna, @cmsbuild, @alberto-sanchez, @menglu21, @GurpreetSinghChahal can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @alberto-sanchez, @mkirsano, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@smuzaffar
Contributor

please test with cms-sw/cmsdist#8349

@cmsbuild
Contributor

cmsbuild commented Mar 3, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31026/summary.html
COMMIT: 9c74729
CMSSW: CMSSW_13_1_X_2023-03-02-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40939/31026/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31026/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31026/git-merge-result

Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3529699
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3529671
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@Saptaparna
Contributor

please test 537 and 538

@Saptaparna
Contributor

please test 537

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31632/summary.html
COMMIT: 567218e
CMSSW: CMSSW_13_1_X_2023-03-27-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40939/31632/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31632/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31632/git-merge-result

Comparison Summary

Summary:

  • You potentially added 23 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 196 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3556112
  • DQMHistoTests: Total failures: 1410
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3554680
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 217 log files, 168 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@Saptaparna
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

Will test workflow 537.0, 538.0 again as soon as CMSSW_13_1_X_2023-03-28-1100 becomes available

There are suspicious differences reported in dqm, for example this one from wf 538 where apparently only generator tracks at positive pseudorapidity are plotted:
image

I cannot believe that it is due to this PR. Therefore, let's try another round of tests, hopefully with fewer additional PRs merged on top of this one.

A few other "curious" distributions from the same wf 538 follow:
image
image
image

@Saptaparna
Contributor

Thanks, Andrea!

@perrotta
Contributor

please test workflow 537.0, 538.0

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31646/summary.html
COMMIT: 567218e
CMSSW: CMSSW_13_1_X_2023-03-28-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40939/31646/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31646/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7862e3/31646/git-merge-result

Comparison Summary

There are some workflows for which there are errors in the baseline:
24234.61 step 2
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

  • You potentially added 35 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 2900 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3556112
  • DQMHistoTests: Total failures: 5519
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3550571
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 217 log files, 168 edm output root files, 51 DQM output files
  • TriggerResults: found differences in 1 / 49 workflows

@perrotta
Contributor

Clearly, plots in the DQM/Generator folder are not reproducible, at least in wf 538. For example, the first plot that I showed above now reads
image
with differences with respect to the previous drawing also in the baseline. (In reality, even the red histogram is not at 0 everywhere: it simply stays below the start of the y-axis range, which is at around 100 events.)

This is something @cms-sw/generators-l2 should take care of.

On the other hand, these non-reproducibilities do not seem to depend on this PR: let's have it merged, then, and the group can continue investigating their origin. Wf 537 does not seem to be affected (only changes related to this PR are visible in the DQM comparisons for that workflow), which could shed some light on the origin of the non-reproducibility.

@perrotta
Contributor

+1

@perrotta
Contributor

@Dominic-Stafford since this PR was merged, IBs have been crashing with the following error message (e.g. from wf 512.0):

[INFO] MG5 LO LHE with event_norm = sum detected. Will recalculate weights in each event block.
Unit weight: +8.6690076E+02
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-03-29-2300/bin/el8_amd64_gcc11/mergeLHE.py", line 429, in <module>
    main()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-03-29-2300/bin/el8_amd64_gcc11/mergeLHE.py", line 425, in main
    lhe_merger.merge()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-03-29-2300/bin/el8_amd64_gcc11/mergeLHE.py", line 264, in merge
    orig_wgt = float(line.split()[2])
IndexError: list index out of range
%MSG-e ExcessiveTime:  ExternalLHEProducer:externalLHEProducer@beginRun  30-Mar-2023 05:25:07 CEST Run: 1
ExcessiveTime: Module used 1978.84 seconds of time which exceeds the error threshold configured in the Timing Service of 600 seconds.
%MSG
----- Begin Fatal Exception 30-Mar-2023 05:25:07 CEST-----------------------
An exception of category 'ExternalLHEProducer' occurred while
   [0] Processing global begin Run run: 1
   [1] Calling method for module ExternalLHEProducer/'externalLHEProducer'
Exception Message:
Child failed with exit code 1.
----- End Fatal Exception -------------------------------------------------

Please notice that the very same error appears independently of the merge of cms-sw/cmsdist#8409, which I forgot for CMSSW_13_1_X_2023-03-29-1100 and merged later for CMSSW_13_1_X_2023-03-29-2300: as you can see, both IBs are crashing with the same error message.

Could you please have a look and provide a fix at your earliest convenience?
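
For readers following the traceback above: `line.split()[2]` raises `IndexError` whenever the line has fewer than three whitespace-separated fields. In the Les Houches event format the first line of an event block reads `NUP IDPRUP XWGTUP SCALUP AQEDUP AQCDUP`, so index 2 is the event weight. The sketch below only reproduces that failure mode and shows one guarded alternative. The helper name `parse_event_weight` and the sample lines are hypothetical, and nothing here is taken from mergeLHE.py beyond the single statement quoted in the traceback. Which line actually triggered the crash is not stated in this thread.

```python
def parse_event_weight(line):
    """Return XWGTUP (third field) of an LHE event header line.

    Returns None instead of raising when the line has fewer than
    three fields, so the caller can decide how to handle it.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    return float(fields[2])


# Hypothetical header line in the standard field order.
header = " 5  1  +8.6690076E+02  9.1e+01  7.8e-03  1.2e-01"
weight = parse_event_weight(header)

# A line with too few fields, which would make the unguarded
# `float(line.split()[2])` raise IndexError.
short = parse_event_weight("<event>")
```

Returning `None` is one possible design choice; raising a descriptive error that names the offending line would be equally reasonable if a malformed header should stop the merge.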
