[12_4_HLT_X] Improve memory usage in ParameterSet #45000

Merged
merged 12 commits into cms-sw:CMSSW_12_4_HLT_X on Jun 5, 2024


makortel
Contributor

@makortel makortel commented May 17, 2024

PR description:

This PR backports #42742 so that a file written in 14_0_X can be read by a 12_4_X cmsRun. This is needed to run the HLT step in 12_4_X in the upcoming 2022 MC campaign.

The last commit was needed to get the backport to compile, because #43898 was already in 13_0_X before the backport.

Note that files written with a release containing this PR will be unreadable by earlier 12_4_X releases. Therefore this PR is to be merged only in the 12_4_HLT_X branch.

Resolves cms-sw/framework-team#919

PR validation:

I modified the test_MC_22_setup test, added in #44578, to use a developer area with my local 12_4_20 plus this PR, and the test got past the ParameterSet-related error. The job still failed with another error:

----- Begin Fatal Exception 07-May-2024 18:46:04 CEST-----------------------
An exception of category 'ConditionsError' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'HLT_PPSMaxTracksPerRP4_v2'
   [2] Calling method for module L1TGlobalProducer/'hltGtStage2ObjectMap'
Exception Message:
 Error L1 menu loaded in via conditions does not match the L1 actually run 1517097079 vs 2016981387. This means that the mapping of the names to the bits may be incorrect. Please check the L1TUtmTriggerMenuRcd record supplied. Unless you know what you are doing, do not simply disable this check via the config as this a major error and the indication of something very wrong
----- End Fatal Exception -------------------------------------------------

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Backport of #42742

@cmsbuild
Contributor

cmsbuild commented May 17, 2024

A new Pull Request was created by @makortel for CMSSW_12_4_HLT_X.

It involves the following packages:

  • DQMOffline/Trigger (dqm)
  • FWCore/Framework (core)
  • FWCore/Integration (core)
  • FWCore/ParameterSet (core)
  • IOPool/Common (core)
  • IOPool/Input (core)
  • SimGeneral/HepPDTRecord (simulation)

@tjavaid, @Dr15Jones, @syuvivida, @rvenditti, @smuzaffar, @antoniovagnerini, @cmsbuild, @mdhildreth, @makortel, @nothingface0, @civanch can you please review it and eventually sign? Thanks.
@missirol, @slomeo, @trocino, @mtosi, @fabiocos, @wddgit, @cericeci, @rociovilar, @jhgoh, @HuguesBrun, @Fedespring this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented May 17, 2024

cms-bot internal usage

@makortel
Contributor Author

@cmsbuild, please test

@makortel
Contributor Author

> The job still failed with another error
>
> ----- Begin Fatal Exception 07-May-2024 18:46:04 CEST-----------------------
> An exception of category 'ConditionsError' occurred while
>    [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
>    [1] Running path 'HLT_PPSMaxTracksPerRP4_v2'
>    [2] Calling method for module L1TGlobalProducer/'hltGtStage2ObjectMap'
> Exception Message:
>  Error L1 menu loaded in via conditions does not match the L1 actually run 1517097079 vs 2016981387. This means that the mapping of the names to the bits may be incorrect. Please check the L1TUtmTriggerMenuRcd record supplied. Unless you know what you are doing, do not simply disable this check via the config as this a major error and the indication of something very wrong
> ----- End Fatal Exception -------------------------------------------------

@cms-sw/pdmv-l2 Please note the test_MC_22_setup needs further work beyond this backport PR.

@makortel
Contributor Author

cms/45000/HLT/el8_amd64_gcc10/comparison Pending — Waiting for tests to start

@smuzaffar Are the tests stuck?

@AdrianoDee
Contributor

> The job still failed with another error
>
> ----- Begin Fatal Exception 07-May-2024 18:46:04 CEST-----------------------
> An exception of category 'ConditionsError' occurred while
>    [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
>    [1] Running path 'HLT_PPSMaxTracksPerRP4_v2'
>    [2] Calling method for module L1TGlobalProducer/'hltGtStage2ObjectMap'
> Exception Message:
>  Error L1 menu loaded in via conditions does not match the L1 actually run 1517097079 vs 2016981387. This means that the mapping of the names to the bits may be incorrect. Please check the L1TUtmTriggerMenuRcd record supplied. Unless you know what you are doing, do not simply disable this check via the config as this a major error and the indication of something very wrong
> ----- End Fatal Exception -------------------------------------------------
>
> @cms-sw/pdmv-l2 Please note the test_MC_22_setup needs further work beyond this backport PR.

On it, thanks!

@makortel changed the title from "[12_4_X] Improve memory usage in ParameterSet" to "[12_4_HLT_X] Improve memory usage in ParameterSet" on May 21, 2024
@smuzaffar
Contributor

please test

@makortel, yes, the tests were stuck because there was no CMSSW_12_4_X_2024-05-14-1100 IB (for baseline generation) for CMSSW_12_4_HLT_X_2024-05-14-1100. Let's re-run so that the PR can use CMSSW_12_4_X_2024-05-19-0000.
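
The correspondence between the two IB names mentioned in this comment can be sketched as follows. This is only an illustration: `baseline_ib_for` is a hypothetical helper, and the assumption that the baseline IB carries the same timestamp as the HLT IB is read off the comment, not taken from the cms-bot implementation.

```python
import re

# Assumed naming scheme: CMSSW_<major>_<minor>_HLT_X_<YYYY-MM-DD-HHMM>
_HLT_IB_PATTERN = re.compile(r"^(CMSSW_\d+_\d+)_HLT_X_(\d{4}-\d{2}-\d{2}-\d{4})$")


def baseline_ib_for(hlt_ib: str) -> str:
    """Return the same-timestamp IB name of the parent release cycle (hypothetical helper)."""
    match = _HLT_IB_PATTERN.match(hlt_ib)
    if match is None:
        raise ValueError(f"not an HLT IB name: {hlt_ib!r}")
    cycle, timestamp = match.groups()
    return f"{cycle}_X_{timestamp}"


print(baseline_ib_for("CMSSW_12_4_HLT_X_2024-05-14-1100"))
# prints CMSSW_12_4_X_2024-05-14-1100
```

Under that assumption, a missing same-timestamp 12_4_X IB would leave the comparison step without a baseline, which matches the "Pending" status reported earlier.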

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-38052b/39456/summary.html
COMMIT: 74905fa
CMSSW: CMSSW_12_4_HLT_X_2024-05-19-0000/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45000/39456/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS
---> test test_edmPickEvents had ERRORS

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3764887
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3764857
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -308.312 KiB( 50 files compared)
  • DQMHistoSizes: changed ( 1000.0 ): -308.312 KiB HLT/EGM
  • Checked 212 log files, 167 edm output root files, 51 DQM output files
  • TriggerResults: no differences found
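
As a quick sanity check, the DQMHistoTests counts above can be verified to add up. The sketch below copies the numbers from the summary and assumes the outcome categories are mutually exclusive, which the report does not state explicitly.

```python
# Counts copied from the DQMHistoTests lines of the comparison summary.
summary = {
    "total": 3764887,
    "failures": 8,
    "nulls": 0,
    "successes": 3764857,
    "skipped": 22,
    "missing": 0,
}


def unaccounted(counts: dict) -> int:
    """Number of histograms in the total that no outcome category covers."""
    outcomes = ("failures", "nulls", "successes", "skipped", "missing")
    return counts["total"] - sum(counts[key] for key in outcomes)


print(unaccounted(summary))  # prints 0
```

Every compared histogram is therefore accounted for, and the 8 failures are a very small fraction of the 3764887 histograms compared.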

@makortel
Contributor Author

Both unit test failures are

DAS error: b'2024/05/22 00:15:34 ERROR failed to parse X509 proxy: crypto/tls: failed to parse key: asn1: syntax error: data truncated\n'

that looks transient

@makortel
Contributor Author

+core

@makortel
Contributor Author

backport of #42742

@civanch
Contributor

civanch commented May 26, 2024

+1

@makortel
Contributor Author

@cms-sw/dqm-l2 Could you review and sign? Thanks!

@AdrianoDee
Contributor

AdrianoDee commented May 30, 2024

> The job still failed with another error
>
> ----- Begin Fatal Exception 07-May-2024 18:46:04 CEST-----------------------
> An exception of category 'ConditionsError' occurred while
>    [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
>    [1] Running path 'HLT_PPSMaxTracksPerRP4_v2'
>    [2] Calling method for module L1TGlobalProducer/'hltGtStage2ObjectMap'
> Exception Message:
>  Error L1 menu loaded in via conditions does not match the L1 actually run 1517097079 vs 2016981387. This means that the mapping of the names to the bits may be incorrect. Please check the L1TUtmTriggerMenuRcd record supplied. Unless you know what you are doing, do not simply disable this check via the config as this a major error and the indication of something very wrong
> ----- End Fatal Exception -------------------------------------------------
>
> @cms-sw/pdmv-l2 Please note the test_MC_22_setup needs further work beyond this backport PR.

@makortel so this error comes from the fact that the L1T menu loaded from the 140X GT (L1Menu_Collisions2022_v1_4_0-d1_xml) is different from the one needed by HLT:2022v14 (L1Menu_Collisions2022_v1_3_0-d1_xml as can be seen from ). At the moment there is no 140X GT including the v1_3_0 menu or, vice versa, a 124X GT including the v1_4_0 one. The final GT for the campaigns still needs to be defined.

I have set up a possible workaround that forces the cmsDriver.py command running the DIGI,L1,DIGI2RAW steps to use the proper payload, but it is a bit cumbersome: to be general, it would have to load the l1Menus dictionary from the HLT-specific release and overwrite the GT in the config. Since this would be solved by a dedicated GT, I'm not sure it is really useful.

Also because, on the bright side, if I run the test_MC_22_setup

  1. adding to the DIGI,L1,DIGI2RAW step in test_MC_setup_gen_sim.sh:
     --custom_conditions L1Menu_Collisions2022_v1_3_0-d1_xml,L1TUtmTriggerMenuRcd,frontier://FrontierProd/CMS_CONDITIONS,,2022-08-01 08:47:17.000
  2. with this PR on top of 12_4_20;

the chain runs smoothly. So most probably it's not really needed.
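
For readers unfamiliar with the `--custom_conditions` string used above, here is a minimal sketch of how its fields can be read. It assumes the five comma-separated fields are, in order, tag, record, connection string, label, and snapshot time; `CustomCondition` and `parse_custom_condition` are hypothetical names for illustration and are not part of CMSSW.

```python
from typing import NamedTuple


class CustomCondition(NamedTuple):
    """One conditions override (field order is an assumption, see lead-in)."""
    tag: str
    record: str
    connect: str
    label: str
    snapshot_time: str


def parse_custom_condition(spec: str) -> CustomCondition:
    """Split a single override string into its five fields (hypothetical helper)."""
    fields = spec.split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}")
    return CustomCondition(*fields)


spec = (
    "L1Menu_Collisions2022_v1_3_0-d1_xml,L1TUtmTriggerMenuRcd,"
    "frontier://FrontierProd/CMS_CONDITIONS,,2022-08-01 08:47:17.000"
)
cond = parse_custom_condition(spec)
print(cond.tag)     # L1Menu_Collisions2022_v1_3_0-d1_xml
print(cond.record)  # L1TUtmTriggerMenuRcd
```

Read this way, the override swaps only the payload for the L1TUtmTriggerMenuRcd record (with an empty label) to the v1_3_0 menu and leaves the rest of the GT untouched.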

@makortel
Contributor Author

makortel commented Jun 3, 2024

@cms-sw/dqm-l2 Could you please review and sign? Thanks!

@tjavaid

tjavaid commented Jun 4, 2024

+1

@cmsbuild
Contributor

cmsbuild commented Jun 4, 2024

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_HLT_X IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Contributor Author

makortel commented Jun 5, 2024

@cms-sw/orp-l2 Here is my request on the 13_0_HLT_X and 12_4_HLT_X releases that I brought up in the ORP yesterday

  • 13_0_X: After the next release (CMSSW_13_0_19 ?) has been built, build a corresponding 13_0_19_HLT release (that includes Improve memory usage in ParameterSet #42742 on top of 13_0_19)
  • 12_4_X: After the next release (12_4_21 ?) has been built, and this PR has been merged, build a corresponding 12_4_21_HLT release (i.e. 12_4_21 and this PR)

@AdrianoDee I won't be able to update the tests in 14_1_X/14_0_X to use these 13_0_19_HLT and 12_4_21_HLT releases until July, so if you need to make progress sooner, someone else has to update the tests.

@antoniovilela
Contributor

+1

@antoniovilela
Contributor

merge

@cmsbuild cmsbuild merged commit 403a8d3 into cms-sw:CMSSW_12_4_HLT_X Jun 5, 2024
8 of 9 checks passed
@antoniovilela
Contributor

> @cms-sw/orp-l2 Here is my request on the 13_0_HLT_X and 12_4_HLT_X releases that I brought up in the ORP yesterday
>
>   • 13_0_X: After the next release (CMSSW_13_0_19 ?) has been built, build a corresponding 13_0_19_HLT release (that includes Improve memory usage in ParameterSet #42742 on top of 13_0_19)
>   • 12_4_X: After the next release (12_4_21 ?) has been built, and this PR has been merged, build a corresponding 12_4_21_HLT release (i.e. 12_4_21 and this PR)
>
> @AdrianoDee I won't be able to update the tests in 14_1_X/14_0_X to use these 13_0_19_HLT and 12_4_21_HLT releases until July, so if you need to make progress sooner, someone else has to update the tests.

And the 13_0_HLT_X PR is already merged (#44921).
