
[121X] include GTs for Pilot Beam Test #35593

Merged: 2 commits merged on Oct 13, 2021

Conversation

@malbouis (Contributor) commented Oct 8, 2021

PR description:

This PR is a forward port of PR #35561 that includes the online GTs for datataking and updates the DQM unit tests to use a 2021 CRUZET run.

As suggested at the AlCaDB workshop (https://indico.cern.ch/event/1069946/contributions/4511740/attachments/2313276/3937722/Condition%20Database_%20Approaching%20Run3.pdf), we are planning to replace the most populated tags (> 50k IOVs) with a new tag in the Run 3 Prompt GT, for better performance and manageability. In this PR, the newly introduced 120X Run 3 Prompt GT carries such a new tag for the EcalLaserAPDPNRatiosRcd record, which is only valid starting from Run 3 data. This was discussed and agreed upon with the Ecal group as well.

In summary, the Run 3 Prompt GTs introduced in this PR are only valid for Run 3 data.
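The practical consequence of a tag whose first IOV opens at the start of Run 3 can be pictured as a simple IOV lookup: a request for any earlier run finds no valid interval, which is exactly the "no valid IOV" failure mode seen in conditions access. A minimal sketch (plain Python, not CMSSW code; the run numbers are illustrative only):

```python
import bisect

def find_iov(iov_starts, run):
    """Return the start of the IOV covering `run`, or None when the run
    precedes the tag's first IOV (i.e. no valid payload exists)."""
    i = bisect.bisect_right(iov_starts, run)
    if i == 0:
        return None  # run is earlier than the first IOV of the tag
    return iov_starts[i - 1]

# Hypothetical tag whose first IOV opens at an (illustrative) Run 3 run:
iov_starts = [355100, 355200, 356000]

print(find_iov(iov_starts, 356500))  # -> 356000 (covered by the last IOV)
print(find_iov(iov_starts, 323775))  # -> None (a Run 2 run: no valid IOV)
```

This mirrors why the new Prompt GT cannot be used on Run 2 data: any record whose tag starts at Run 3 simply has no IOV for earlier runs.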

GT differences wrt CRUZET ones:
HLT:
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/113X_dataRun3_HLT_v3/120X_dataRun3_HLT_v3
Express:
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/113X_dataRun3_Express_v4/120X_dataRun3_Express_v2
Prompt:
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/113X_dataRun3_Prompt_v3/120X_dataRun3_Prompt_v2

The hypernews with the announcement of the Pilot Test Beam GTs is here: https://hypernews.cern.ch/HyperNews/CMS/get/calibrations/4488.html

PR validation:

This is only a sanity check, as we do not expect it to catch the errors that showed up in the online DQM:

runTheMatrix.py -l 138.1,138.2 --ibeos -j8

If this PR is a backport, please specify the original PR and why you need to backport that PR:

It is a forward port of #35561.

@malbouis (Contributor, Author) commented Oct 8, 2021

@cmsbuild , please test with #35550

@cmsbuild cmsbuild added this to the CMSSW_12_1_X milestone Oct 8, 2021
@malbouis malbouis changed the title include GTs for Pilot Beam Test [121X] include GTs for Pilot Beam Test Oct 8, 2021
@cmsbuild (Contributor) commented Oct 8, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35593/25849

  • This PR adds an extra 16KB to the repository

@cmsbuild (Contributor) commented Oct 8, 2021

A new Pull Request was created by @malbouis for master.

It involves the following packages:

  • Configuration/AlCa (alca)
  • DQM/Integration (dqm)

@malbouis, @yuanchao, @pmandrik, @emanueleusai, @ahmad3213, @tvami, @jfernan2, @rvenditti, @pbo0, @francescobrivio can you please review it and eventually sign? Thanks.
@batinkov, @battibass, @tocheng, @Martin-Grunewald, @missirol, @mmusich, @threus, @fabiocos, @francescobrivio this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild (Contributor) commented Oct 8, 2021

-1

Failed Tests: UnitTests AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5be715/19507/summary.html
COMMIT: 8a3b8fe
CMSSW: CMSSW_12_1_X_2021-10-08-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35593/19507/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test TestDQMOnlineClient-beam_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-castor_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-beampixel_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-ctpps_dqm_sourceclient had ERRORS
and more ...

AddOn Tests

----- Begin Fatal Exception 08-Oct-2021 22:45:24 CEST-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 323775 lumi: 99 event: 111372621 stream: 2
   [1] Running path 'FEVTDEBUGHLToutput_step'
   [2] Prefetching for module PoolOutputModule/'FEVTDEBUGHLToutput'
   [3] Prefetching for module CaloTowersCreator/'towerMaker'
   [4] Prefetching for module EcalRecHitProducer/'ecalRecHit@cpu'
   [5] Calling method for EventSetup module EcalLaserCorrectionService/''
   [6] While getting dependent Record from Record EcalLaserDbRecord
Exception Message:
No "EcalLaserAPDPNRatiosRcd" record found in the EventSetup.

 The Record is delivered by an ESSource or ESProducer but there is no valid IOV for the synchronization value.
 Please check 
   a) if the synchronization value is reasonable and report to the hypernews if it is not.
   b) else check that all ESSources have been properly configured. 
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 08-Oct-2021 22:40:55 CEST-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 325112 lumi: 6 event: 29040 stream: 2
   [1] Running path 'FEVTDEBUGHLToutput_step'
   [2] Prefetching for module PoolOutputModule/'FEVTDEBUGHLToutput'
   [3] Prefetching for module CaloTowersCreator/'towerMaker'
   [4] Prefetching for module EcalRecHitProducer/'ecalRecHit@cpu'
   [5] Calling method for EventSetup module EcalLaserCorrectionService/''
   [6] While getting dependent Record from Record EcalLaserDbRecord
Exception Message:
No "EcalLaserAPDPNRatiosRcd" record found in the EventSetup.

 The Record is delivered by an ESSource or ESProducer but there is no valid IOV for the synchronization value.
 Please check 
   a) if the synchronization value is reasonable and report to the hypernews if it is not.
   b) else check that all ESSources have been properly configured. 
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 08-Oct-2021 22:40:31 CEST-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 323775 lumi: 99 event: 110936866 stream: 2
   [1] Running path 'FEVTDEBUGHLToutput_step'
   [2] Prefetching for module PoolOutputModule/'FEVTDEBUGHLToutput'
   [3] Prefetching for module CaloTowersCreator/'towerMaker'
   [4] Prefetching for module EcalRecHitProducer/'ecalRecHit@cpu'
   [5] Calling method for EventSetup module EcalLaserCorrectionService/''
   [6] While getting dependent Record from Record EcalLaserDbRecord
Exception Message:
No "EcalLaserAPDPNRatiosRcd" record found in the EventSetup.

 The Record is delivered by an ESSource or ESProducer but there is no valid IOV for the synchronization value.
 Please check 
   a) if the synchronization value is reasonable and report to the hypernews if it is not.
   b) else check that all ESSources have been properly configured. 
----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 2798082
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2798047
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 39 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 169 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@malbouis (Contributor, Author) commented Oct 8, 2021

I think the AddOn tests are failing because they run the Run 3 Prompt GT on 2018 data, for example here: https://github.com/cms-sw/cmssw/blob/master/Configuration/HLT/python/addOnTestsHLT.py#L36
The Run 3 Prompt GT is not supposed to run on Run 2 data.

For the DQM unit tests, I am not so sure what could be the problem.
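The era mismatch described above can be sketched as a simple run-number check (a hedged illustration; the GT names, the boundary run, and the helper function are all hypothetical, not the actual autoCond contents):

```python
# Hypothetical era-to-GT mapping; names and boundary are illustrative only.
PROMPT_GT_BY_ERA = {
    "run2": "106X_dataRun2_v1",          # placeholder Run 2 prompt GT
    "run3": "120X_dataRun3_Prompt_v2",   # Run-3-only prompt GT from this PR
}

FIRST_RUN3_RUN = 355100  # illustrative boundary, not an official number

def prompt_gt_for_run(run):
    """Pick the prompt GT whose tags have valid IOVs for `run`."""
    era = "run3" if run >= FIRST_RUN3_RUN else "run2"
    return PROMPT_GT_BY_ERA[era]

# Run 323775 (2018 data, as seen in the failing AddOn test logs) must not
# be processed with the Run-3-only GT:
print(prompt_gt_for_run(323775))  # -> 106X_dataRun2_v1
```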

@malbouis (Contributor, Author) commented Oct 9, 2021

@cmsbuild , please test with #35550

@cmsbuild (Contributor) commented Oct 9, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35593/25862

  • This PR adds an extra 16KB to the repository

@tvami (Contributor) commented Oct 9, 2021

+alca

@jfernan2 (Contributor)

+1
Backport tested in Online DQM at P5

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta (Contributor)

@malbouis @tvami should this really get merged together with #35550, as you are requesting for the 12_0_X version of this PR? I notice that the last tests, run without #35550, ended up successfully, while the previous ones, made together with that PR, showed several errors instead...

@francescobrivio (Contributor)

@malbouis @tvami should this really get merged together with #35550, as you are requesting for the 12_0_X version of this PR? I notice that the last tests, run without #35550, ended up successfully, while the previous ones, made together with that PR, showed several errors instead...

Hi @perrotta, the last tests ran with #35593, #35600 and #35550, as far as I can tell from https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5be715/19525/summary.html

@perrotta (Contributor)

@malbouis @tvami should this really get merged together with #35550, as you are requesting for the 12_0_X version of this PR? I notice that the last tests, run without #35550, ended up successfully, while the previous ones, made together with that PR, showed several errors instead...

Hi @perrotta, the last tests ran with #35593, #35600 and #35550, as far as I can tell from https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5be715/19525/summary.html

Ah, ok, correct: I overlooked #35593 (comment)

Still, I would like to know about correlations with the other PRs, and to get some guidance about the required merge order (or an indication of which ones must be merged together).

@missirol (Contributor)

To make the HLT tests pass, I think #35600 (or something equivalent) should be merged before this PR.

@mmusich (Contributor) commented Oct 11, 2021

Just noting here for the casual future reader: the update in EcalLaserAPDPNRatiosRcd (and others) proposed in the Prompt Reco Global Tag update here squarely breaks the fairly convenient feature of having a Prompt Reco Global Tag in autoCond which is "guaranteed" to work for any past real-data run.

  • Would it be possible to document in the PR description that you are introducing such a breaking point, so that it is immediately visible?

On a related note, I read in #35600 (comment) that, having more than 50k IOVs, the tag EcalLaserAPDPNRatios_prompt_v2 had become hardly manageable. As far as I know there are other tags in the Prompt Reco Global Tag with similarly sized IOV lists (e.g. SiStripDetVOff_v6_prompt has 48842 IOVs, BeamSpotObjects_PCL_byLumi_v0_prompt has 103220 IOVs, etc.).

  • Could you elaborate more on the issue? Shall one expect similar "cuts" for other records?

@francescobrivio (Contributor)

Just noting here for the casual future reader: the update in EcalLaserAPDPNRatiosRcd (and others) proposed in the Prompt Reco Global Tag update here squarely breaks the fairly convenient feature of having a Prompt Reco Global Tag in autoCond which is "guaranteed" to work for any past real-data run.

* Would it be possible to document in the PR description that you are introducing such a breaking point, so that it is immediately visible?

Hi @mmusich, is this "guarantee" an actual policy? I.e., is it stated explicitly somewhere, and are people counting on it? The splitting of these tags was proposed by @ggovi during the AlCaDB workshop (indico contribution).

On a related note, I read in #35600 (comment) that, having more than 50k IOVs, the tag EcalLaserAPDPNRatios_prompt_v2 had become hardly manageable. As far as I know there are other tags in the Prompt Reco Global Tag with similarly sized IOV lists (e.g. SiStripDetVOff_v6_prompt has 48842 IOVs, BeamSpotObjects_PCL_byLumi_v0_prompt has 103220 IOVs, etc.).

* Could you elaborate more on the issue? Shall one expect similar "cuts" for other records?

Indeed, we have a list of tags with >50k IOVs: it's around 10 tags, and some of them are used for the UL offline reco, so they won't see any new appends. The plan was to split the other ones (used in "online" GTs) after the Pilot Beam test in November, in preparation for Run 3.

@malbouis (Contributor, Author)

I just updated the description of the PR with ~ the same reply. :-)

@mmusich (Contributor) commented Oct 11, 2021

is this "guarantee" an actual policy? I.e., is it stated explicitly somewhere, and are people counting on it?

I wouldn't say it's a policy; as I tried to phrase carefully in my previous message, it's rather a convenient feature.
If this has now become untenable (and for what reason?), I understand, but I was surprised to see that only one tag is modified, while there are far worse offenders.

@francescobrivio (Contributor)

I wouldn't say it's a policy; as I tried to phrase carefully in my previous message, it's rather a convenient feature.

Ok, yes, I see. The idea was that if one wants to re-run the Prompt reco on some old data, the old Prompt GT can be used.

If this has become now untenable (for what reason?), I understand

Quoting from @ggovi's slide 8:

  • Better performance in queries
  • Lower memory consumption in CMSSW jobs
  • Faster and more manageable response by DBBrowser

but I was surprised to see that only one tag is modified, while there are far worse offenders.

Since we were updating the EcalLaser O2O system, we took the (good) opportunity to split that tag only; the others will come later, when there is less frantic activity in preparation for data-taking 😄

@tvami (Contributor) commented Oct 11, 2021

@perrotta @qliphy indeed, #35600 needs to be merged first, and then this PR and #35550 need to be merged at the same time.

@qliphy (Contributor) commented Oct 13, 2021

+1
