[12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs #37670

francescobrivio · 2022-04-25T16:41:43Z

PR description:

Backport of #37171
As reported in #37653 (comment) (and later in the AlCaDB CMS Talk forum by @Martin-Grunewald), since datataking switched to CMSSW_12_3_X (and the boost version in use is now 1_78) the HLT tests in the 12_2_X IBs are failing because they are still using the auto:run3_hlt and auto:run3_data_prompt GTs (which have an infinite snapshot).

This PR introduces, for the 12_2_X cycle, the same "relval" autoCond entries for run3 data there were already introduced in 12_3_X and 12_4_X, namely:

auto:run3_hlt_relval --> GT: 122X_dataRun3_HLT_relval_v2
auto:run3_data_relval --> GT: 122X_dataRun3_relval_v2

Following @missirol suggestion I also updated autoCondHLT.py to use the newly created GTs.

Edit:
After some discussion in @cms-sw/alca-l2 we decided it is better NOT to have "open-ended" GTs in the release as they can cause all sort of troubles in the IBs and also possibly reproducibility issues.
So I updated all the online GTs (HLT/Express/Prompt) to a "frozen" version, i.e. with a fixed snapshot at the time of the last run processed with 12_2_X in Tier0 (as announced in this CMSTalk post): Run 350279 - Snapshot: 2022-04-12 15:15:00

Differences wrt the hlt and offline Run3 data GTs are:
Run 3 HLT RelVals
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/122X_dataRun3_HLT_v4/122X_dataRun3_HLT_relval_v2

Run 3 data RelVals
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/122X_dataRun3_v3/122X_dataRun3_relval_v2

Run 3 HLT
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/122X_dataRun3_HLT_v4/122X_dataRun3_HLT_frozen_v1

Run 3 Express
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/122X_dataRun3_Express_v3/122X_dataRun3_Express_frozen_v1

Run 3 Prompt
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/122X_dataRun3_Prompt_v3/122X_dataRun3_Prompt_frozen_v1

PR validation:

Code compiles and I run the same HLT test that is running in the IBs (thanks @missirol for the recipe):

git cms-addpkg HLTrigger/Configuration
cd HLTrigger/Configuration/test
./runAll.csh dev

As expected this produces:

16:23:01 cmsRun RelVal_RECO_GRun_DATA.py >& RelVal_RECO_GRun_DATA.log
191.193u 35.487s 3:09.59 119.5%    0+0k 4524536+79240io 13954pf+0w
16:26:11 exit status: 0

Proving that the updates are correct.

Backport

backport of #37171

cmsbuild · 2022-04-25T16:42:02Z

A new Pull Request was created by @francescobrivio for CMSSW_12_2_X.

It involves the following packages:

Configuration/AlCa (alca)
Configuration/HLT (hlt)

@malbouis, @yuanchao, @cmsbuild, @missirol, @Martin-Grunewald, @francescobrivio, @tvami can you please review it and eventually sign? Thanks.
@silviodonato, @tocheng, @Martin-Grunewald, @missirol, @mmusich, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

Backported from Update autoCond.py to include GTs with fixed L1 Menu for Run3 #37171

tvami · 2022-04-25T16:44:31Z

@cmsbuild , please test

francescobrivio · 2022-04-25T16:46:43Z

backport of #37171

cmsbuild · 2022-04-25T19:53:33Z

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79a9f7/24194/summary.html
COMMIT: 6ff8cb4
CMSSW: CMSSW_12_2_X_2022-04-25-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37670/24194/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

138.4138.4_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3/step2_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3.log

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 42
DQMHistoTests: Total histograms compared: 3251324
DQMHistoTests: Total failures: 6
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3251296
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
Checked 177 log files, 37 edm output root files, 42 DQM output files
TriggerResults: no differences found

Configuration/AlCa/python/autoCond.py

Martin-Grunewald · 2022-04-28T06:38:05Z

Can we progress on this PR, so that the IB HLT-validation and TSG tests work again? The test failure can not be related to this PR.

In general the use case from the TSG side for these two special GTs is to run the (more recent and developing) HLT menus on data taken with an older HLT menu. The goal is NOT to reproduce the old HLT from the time the data was taken, but so see the performance of the new HLT menu, in turn using new calibrations etc, on that old data (to be replaced by real Run3 collisions once we have them).
A secondary goal would be if we run the same HLT menu now and in a week (in the same IB/release), we'd like to get same results, but this is a secondary goal, not the primary, and thus not need to be guaranteed for these two special GTs.

Of course, if 12_2 is no longer needed, then we could let it rot but then it should be removed from IB/relval testing, ie,
removed from https://cmssdt.cern.ch/SDT/html/cmssdt-ib/ to save resources. @perrotta @qliphy

cmsbuild · 2022-05-01T17:25:59Z

Pull request #37670 was updated. @malbouis, @yuanchao, @cmsbuild, @missirol, @Martin-Grunewald, @francescobrivio, @tvami can you please check and sign again.

francescobrivio · 2022-05-01T17:28:12Z

test parameters:

workflow = 138.4

francescobrivio · 2022-05-01T17:28:45Z

@cmsbuild please test

francescobrivio · 2022-05-01T17:33:29Z

Can we progress on this PR, so that the IB HLT-validation and TSG tests work again? The test failure can not be related to this PR

Hi Martin, sorry for the delay I was offline for a couple of days, it should be fixed now.
After discussing with @cms-sw/alca-l2 and @mmusich we agreed it's better to have a fixed snapshot for the online GTs (including the two "relval" ones). I'll update the PR description and title to give more extensive details.

We would like to test in 12_2_X IBs if everything is ok like this, if so we will update also 12_3_X and 12_4_X with the same policy. @perrotta @qliphy I know the forward porting is not the standard procedure, but in this case using the 12_2_X IBs is perfect to reproduce the boost version change between releases, I hope that is fine with you 😄

cmsbuild · 2022-05-01T20:35:54Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79a9f7/24365/summary.html
COMMIT: 5306e7f
CMSSW: CMSSW_12_2_X_2022-05-01-0000/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37670/24365/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
138.4 step 2
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals.The error does NOT come from this pull request

Summary:

You potentially added 940 lines to the logs
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 42
DQMHistoTests: Total histograms compared: 3251324
DQMHistoTests: Total failures: 0
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3251302
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
Checked 179 log files, 37 edm output root files, 42 DQM output files
TriggerResults: no differences found

Martin-Grunewald · 2022-05-02T08:25:41Z

Can we progress on this PR, so that the IB HLT-validation and TSG tests work again? The test failure can not be related to this PR

Hi Martin, sorry for the delay I was offline for a couple of days, it should be fixed now. After discussing with @cms-sw/alca-l2 and @mmusich we agreed it's better to have a fixed snapshot for the online GTs (including the two "relval" ones). I'll update the PR description and title to give more extensive details.

We would like to test in 12_2_X IBs if everything is ok like this, if so we will update also 12_3_X and 12_4_X with the same policy. @perrotta @qliphy I know the forward porting is not the standard procedure, but in this case using the 12_2_X IBs is perfect to reproduce the boost version change between releases, I hope that is fine with you smile

Sorry, but does this fixed snapshot mean the GTs do not (automatically) update for new (later) calibration records?
Would we then need to manually request from you an update of those GTs?

francescobrivio · 2022-05-02T08:46:37Z

Sorry, but does this fixed snapshot mean the GTs do not (automatically) update for new (later) calibration records?
Would we then need to manually request from you an update of those GTs?

Hi Martin, yes that is correct.
We will update manually all the "frozen" GTs every time that the "online" ones are updated. This is somehow restoring the same policy that we had back at the beginning of Run2, as pointed out by Marco few messages above. In order to keep the history of what is used in data-taking I added in the commented lines all the info needed to go back from the "frozen" GTs to their exact corresponding "online" GTs.

Please note that for the relval GTs the snapshot has always been frozen, see e.g. the current 123X relval GTs:

Martin-Grunewald · 2022-05-02T09:24:57Z

OK, thanks for the clarification and the workflow to update.

malbouis · 2022-05-02T13:34:12Z

+alca

introducing new policy for "online" GTs in autoCond.py

cmsbuild · 2022-05-02T13:34:33Z

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_2_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_4_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

mmusich · 2022-05-02T13:55:16Z

introducing new policy for "online" GTs in autoCond.py

Except is not new at all :D

francescobrivio · 2022-05-02T15:55:15Z

@perrotta @qliphy it would be nice to merge this PR and check the effect on the 12_2_X IBs, which should now be fixed :D

perrotta · 2022-05-03T16:54:17Z

Summary:

You potentially added 940 lines to the logs

Reco comparison results: 0 differences found in the comparisons

Is anybody able to trace where do those extra lines in the logs come from?
I checked a few Run3 wf logs (at random) and I didn't find any. They could potentially point out some issue, though.

tvami · 2022-05-03T17:08:21Z

Isnt that reffering to this
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79a9f7/24365/build.log
?

Most things I see here are not related to this PR

missirol · 2022-05-03T17:20:54Z

Just a guess: maybe it has to do with the wf that was failing in IB, and passes now thanks to this PR, i.e. 138.4.

Baseline :
https://cmssdt.cern.ch/SDT/jenkins-artifacts/ib-baseline-tests/CMSSW_12_2_X_2022-05-01-0000/slc7_amd64_gcc900/-GenuineIntel/matrix-results/138.4_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3/step2_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3.log

Baseline+PR :
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-79a9f7/24365/runTheMatrix-results/138.4_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3/step2_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3.log

That wf does not run HLT, and I don't know if those log messages are to be expected.

mmusich · 2022-05-03T17:29:56Z

That wf does not run HLT, and I don't know if those log messages are to be expected.

If that wf does not run HLT, why are we running the HLT tracking monitoring in it? That is producing a gazillion of product not found warnings...

missirol · 2022-05-03T17:38:27Z

Good question for DQM? @cms-sw/dqm-l2

missirol · 2022-05-03T18:43:30Z

Good question for DQM?

Actually, the question might be better directed to who designed, or maintains, that wf.

In any case, one way to fix this is [*]. I can't judge if that's the correct fix, that's left to the experts.

It's clear that this issue is not introduced by this PR, and it likely needs to be fixed in master first.

Arguably, this PR should not be delayed because of it.

[*]

diff --git a/Configuration/PyReleaseValidation/python/relval_steps.py b/Configuration/PyReleaseValidation/python/relval_steps.py
index d7a461e1bda..2dde975c0c4 100644
--- a/Configuration/PyReleaseValidation/python/relval_steps.py
+++ b/Configuration/PyReleaseValidation/python/relval_steps.py
@@ -2738,7 +2738,7 @@ steps['ALCARECOEXPR3']=merge([{'-s':'ALCAOUTPUT:SiPixelCalZeroBias+SiStripCalZer
                                '--triggerResultsProcess': 'RECO',
                                '--customise':'Configuration/DataProcessing/RecoTLR.customiseExpress'},steps['RECODR3']])
 
-steps['ALCARECOPROMPTR3']=merge([{'-s':'RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiStripCalMinBias+TkAlMinBias+HcalCalHO+HcalCalIterativePhiSym+HcalCalHBHEMuonProducerFilter+HcalCalIsoTrkProducerFilter,DQM',
+steps['ALCARECOPROMPTR3']=merge([{'-s':'RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiStripCalMinBias+TkAlMinBias+HcalCalHO+HcalCalIterativePhiSym+HcalCalHBHEMuonProducerFilter+HcalCalIsoTrkProducerFilter,DQM:@standardDQMFakeHLT',
                                   '--conditions':'auto:run3_data_prompt',
                                   '--scenario':'pp',
                                   '--era':'Run3',

missirol · 2022-05-05T09:00:30Z

@perrotta, the issue of spurious warnings in wf 138.4 is being addressed elsewhere.

Do you think this PR can now be integrated?

mmusich · 2022-05-05T09:12:30Z

@smuzaffar

You potentially added 940 lines to the logs

is there any easy way to scan from where these come from?

smuzaffar · 2022-05-05T09:19:04Z

@mmusich , please check the https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_2_X_2022-05-01-0000+79a9f7/49980/validateJR/logRootQA.log log file for details

mmusich · 2022-05-05T09:25:34Z

Just a guess: maybe it has to do with the wf that was failing in IB, and passes now thanks to this PR, i.e. 138.4.

I think this is indeed probably the cause of the additions (from the log above, thanks @smuzaffar !):

You added 2712 to /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-79a9f7/138.4_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3/step2_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3.log

perrotta · 2022-05-05T09:36:03Z

Thank you @smuzaffar : it is not an easy read, but I also think that it can confirm that the extra lines in the log mostly come from 138.4.

According to that link the extra lines in that workflow log are 2712, while #37670 (comment) says that overall the extra lines are 940. It means that all other log comprehensively remove O(1800) lines. I think they could be the oned detailed in the bottom of https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_2_X_2022-05-01-0000+79a9f7/49980/validateJR/logRootQA.log where "You removed..." is written. But I don't think that any more detailed debug of the issue is needed ;-)

Conclusion: what @missirol suggested in #37670 (comment) makes sense, and we can merge this PR, so that next IB tests in 12_2_X will be clean.

perrotta · 2022-05-05T09:37:17Z

+1

Meant to remove the errors to the HLT tests in 12_2_X IBs

Add Run3 relval GTs

6ff8cb4

cmsbuild added this to the CMSSW_12_2_X milestone Apr 25, 2022

cmsbuild added alca-pending hlt-pending orp-pending pending-signatures tests-pending labels Apr 25, 2022

cmsbuild added tests-started and removed tests-pending labels Apr 25, 2022

cmsbuild added the backport-ok label Apr 25, 2022

cmsbuild added tests-rejected and removed tests-started labels Apr 25, 2022

mmusich reviewed Apr 25, 2022

View reviewed changes

Configuration/AlCa/python/autoCond.py Outdated Show resolved Hide resolved

mmusich reviewed Apr 25, 2022

View reviewed changes

Configuration/AlCa/python/autoCond.py Outdated Show resolved Hide resolved

Use frozen snapshot for online GTs

5306e7f

cmsbuild added tests-pending and removed tests-rejected labels May 1, 2022

cmsbuild added tests-started and removed tests-pending labels May 1, 2022

francescobrivio changed the title ~~[12_2_X] Add Run3 relval GTs~~ [12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs May 1, 2022

cmsbuild added alca-approved fully-signed and removed alca-pending pending-signatures labels May 2, 2022

mmusich mentioned this pull request May 5, 2022

don't run HLT DQM monitoring in wf 138.4 #37812

Merged

cmsbuild added orp-approved and removed orp-pending labels May 5, 2022

cmsbuild merged commit 4ed73ff into cms-sw:CMSSW_12_2_X May 5, 2022

malbouis mentioned this pull request May 9, 2022

Add EGM HLT tags to data GTs and introduce fixed snapshot in online GTs #37870

Merged

francescobrivio mentioned this pull request May 10, 2022

[12_3_X] Add EGM HLT tags to data GTs and introduce fixed snapshot in online GTs #37883

Merged

francescobrivio mentioned this pull request May 23, 2022

Lock more methods of JobReport [12_1] #38018

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs #37670

[12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs #37670

francescobrivio commented Apr 25, 2022 •

edited

cmsbuild commented Apr 25, 2022 •

edited

tvami commented Apr 25, 2022

francescobrivio commented Apr 25, 2022

cmsbuild commented Apr 25, 2022

Martin-Grunewald commented Apr 28, 2022

cmsbuild commented May 1, 2022

francescobrivio commented May 1, 2022

francescobrivio commented May 1, 2022

francescobrivio commented May 1, 2022

cmsbuild commented May 1, 2022

Martin-Grunewald commented May 2, 2022

francescobrivio commented May 2, 2022

Martin-Grunewald commented May 2, 2022

malbouis commented May 2, 2022

cmsbuild commented May 2, 2022

mmusich commented May 2, 2022

francescobrivio commented May 2, 2022

perrotta commented May 3, 2022

tvami commented May 3, 2022

missirol commented May 3, 2022 •

edited

mmusich commented May 3, 2022

missirol commented May 3, 2022

missirol commented May 3, 2022

missirol commented May 5, 2022

mmusich commented May 5, 2022 •

edited

smuzaffar commented May 5, 2022

mmusich commented May 5, 2022

perrotta commented May 5, 2022

perrotta commented May 5, 2022

[12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs #37670

[12_2_X] Add Run3 relval GTs and use fixed snapshot in online GTs #37670

Conversation

francescobrivio commented Apr 25, 2022 • edited

PR description:

PR validation:

Backport

cmsbuild commented Apr 25, 2022 • edited

tvami commented Apr 25, 2022

francescobrivio commented Apr 25, 2022

cmsbuild commented Apr 25, 2022

RelVals-INPUT

Comparison Summary

Martin-Grunewald commented Apr 28, 2022

cmsbuild commented May 1, 2022

francescobrivio commented May 1, 2022

francescobrivio commented May 1, 2022

francescobrivio commented May 1, 2022

cmsbuild commented May 1, 2022

Comparison Summary

Martin-Grunewald commented May 2, 2022

francescobrivio commented May 2, 2022

Martin-Grunewald commented May 2, 2022

malbouis commented May 2, 2022

cmsbuild commented May 2, 2022

mmusich commented May 2, 2022

francescobrivio commented May 2, 2022

perrotta commented May 3, 2022

tvami commented May 3, 2022

missirol commented May 3, 2022 • edited

mmusich commented May 3, 2022

missirol commented May 3, 2022

missirol commented May 3, 2022

missirol commented May 5, 2022

mmusich commented May 5, 2022 • edited

smuzaffar commented May 5, 2022

mmusich commented May 5, 2022

perrotta commented May 5, 2022

perrotta commented May 5, 2022

francescobrivio commented Apr 25, 2022 •

edited

cmsbuild commented Apr 25, 2022 •

edited

missirol commented May 3, 2022 •

edited

mmusich commented May 5, 2022 •

edited