Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PPS and EGM tags to dataRun3 GTs #37557

Merged
merged 1 commit into from Apr 15, 2022

Conversation

malbouis
Copy link
Contributor

@malbouis malbouis commented Apr 13, 2022

PR description:

This PR is to include the new 123X online GTs as well as updating 123X_dataRun3_HLT_relval in autoCond.py

The difference with respect to the previous version of the GTs is the inclusion of new PPS and EGM HLT supercluster regression tags (presented here: https://indico.cern.ch/event/1149147/#22-egm-hlt-supercluster-regres).

Note on the EGM tags: the current GTs do not yet have the new payloads. The new payloads will be appended once the FTV has validated them. And then, when doing the append, the corresponding flag should be changed as well.

The GT differences are listed below:
run3_hlt
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts//123X_dataRun3_HLT_v7/123X_dataRun3_HLT_v5

run3_hlt_relval
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts//123X_dataRun3_HLT_relval_v4/123X_dataRun3_HLT_relval_v3

run3_data_express
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts//123X_dataRun3_Express_v5/123X_dataRun3_Express_v4

run3_data_prompt
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts//123X_dataRun3_Prompt_v6/123X_dataRun3_Prompt_v5

PR validation:

The GTs were validated on a Tier0 replay (Express and Prompt) and a Fast Track Validation (HLT and Express).

if this PR is a backport please specify the original PR and why you need to backport that PR:

backport needed in 12_3_X

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37557/29289

  • This PR adds an extra 12KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @malbouis for master.

It involves the following packages:

  • Configuration/AlCa (alca)

@cmsbuild, @malbouis, @tvami, @yuanchao, @francescobrivio can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @mmusich, @fabiocos, @tocheng this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@tvami
Copy link
Contributor

tvami commented Apr 13, 2022

@cmsbuild , please test

@missirol
Copy link
Contributor

Mh.. I guess I need a clarification on the EGM update (@cms-sw/egamma-pog-l2).

This comment from @francescobrivio includes

we (alca) will do a PR with the new HLT GT and changing the flag to true

The 'flag' refers to what is also mentioned in the summary slide of the EGM talk:

Given the FTV is successful, along with the update of these tags in the HLT data GT, theFLAG would have to be set TRUE in [2] and [3]

Assuming the flag is changed in the source code, what happens then for MC? Since the HLT config is unchanged, wouldn't we need the new EGM-HLT tags in the Run-3 MC GTs?

Also, what about auto:run3_data and auto:run3_data_relval? Those still contain the old HLT-EGM tags, as far as I can see. I think those HLT tags are consumed from those GT (well, the relval one) in offline tests that run HLT+RECO in the same job.

@malbouis
Copy link
Contributor Author

Also, what about auto:run3_data and auto:run3_data_relval? Those still contain the old HLT-EGM tags, as far as I can see. I think those HLT tags are consumed from those GT (well, the relval one) in offline tests that run HLT+RECO in the same job.

I will let @cms-sw/egamma-pog-l2 reply about the MC GTs as we have not been requested to update them.

Concerning the offline data GTs, we thought they could be updated at a later stage, once the FTV is finished and the new payload validated. The same goes for the flag that is not set to True in this PR.

In case you think we should be updating the offline GTs right away due to the HLT offline tests, please let us know.

@missirol
Copy link
Contributor

once the FTV is finished and the new payload validated

It's not clear to me why we are updating the HLT GTs if the new payload is not yet validated (the fast-track validation based on the HLT menu for Cosmics is unlikely to exercise this update of the EGM HLT supercluster regression tags).

In any case, my understanding based on comments from others is that updating the tag without updating the flag in the source code leads to incorrect results, so I was expecting the changes to be done at the same time, as written in #37491 (review).

In case you think we should be updating the offline GTs right away due to the HLT offline tests, please let us know.

Regarding the (HLT tags in the) offline GTs, the same reasoning applies: the tags would have to change as soon as the source code is changed, if HLT is to produce the intended results. EGM experts might correct me.

@malbouis
Copy link
Contributor Author

malbouis commented Apr 13, 2022

once the FTV is finished and the new payload validated

It's not clear to me why we are updating the HLT GTs if the new payload is not yet validated (the fast-track validation based on the HLT menu for Cosmics is unlikely to exercise this update of the EGM HLT supercluster regression tags).

As of now, the GTs are updated with the new tag but that tag does not contain the new payload, so it is basically the "old" tag.
So once the payload is validated with the FTV, one can update the Flag in CMSSW in conjunction with the append of the new payload.

In any case, my understanding based on comments from others is that updating the tag without updating the flag in the source code leads to incorrect results, so I was expecting the changes to be done at the same time, as written in #37491 (review).

Indeed. The flag could be changed when the new payloads will be appended to the tags. As mentioned above, the new payload is not appended yet.

In case you think we should be updating the offline GTs right away due to the HLT offline tests, please let us know.

Regarding the (HLT tags in the) offline GTs, the same reasoning applies: the tags would have to change as soon as the source code is changed, if HLT is to produce the intended results. EGM experts might correct me.

Understood, thanks.
Let's wait then on EGM to confirm the update of the offline as well as the MC GTs with these tags.

I will update the description of the PR to make it clear that the current EGM tags in the GTs do not yet have the new payloads. That will be appended once the FTV has validated it. And then, when doing the append, the Flag should be changed as well.

@missirol
Copy link
Contributor

Okay, thanks for the clarification; clearly I didn't notice that the payload was the same.

Opening a short parenthesis for my own understanding: what's the advantage of updating/changing the tags first without changing the payloads?

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-596180/23888/summary.html
COMMIT: 19027dd
CMSSW: CMSSW_12_4_X_2022-04-13-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37557/23888/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3593039
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3593003
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@malbouis
Copy link
Contributor Author

Okay, thanks for the clarification; clearly I didn't notice that the payload was the same.

Opening a short parenthesis for my own understanding: what's the advantage of updating/changing the tags first without changing the payloads?

I would say it is so that we have autoCond.py in line with the data-taking GTs that will soon be used at P5.

@swagata87
Copy link
Contributor

(sorry I'm travelling, and can engage in this properly only early next week.. )

about the MC GTs as we have not been requested to update them.

yes we missed to request for this, my fault. Thanks Helena for clarifying. And Saumya will be requesting this soon. So HLT regression tags need to go to MC GT also, so that HLT can be properly simulated.

I think I also saw a comment in this thread about HLT tags in offline data reco GT.. why is that needed?
We have different regression for online and offline reco. In offline data GTs, we have put in the offline Run3 regression tags. Isn't that enough?

About the flag, we added a new flag in the .cc file which is set to False now,
and it should be True to get the Run3 HLT regression,
should we change it in the .cc code? or do people prefer to change it in HLT config in confdb? If config is preferred then we need to do it by today/tomorrow as HLT V2 closes tomorrow. And V2 will be used in MC production. So let us know. I don't see any issue to change it in the .cc code though..

@tvami
Copy link
Contributor

tvami commented Apr 14, 2022

Hi @missirol just to add to Helena's comment here (#37557 (comment) ) what we did here is to introduce a multi-iov tag to which it is possible to append the new payload from a future run (from a run when the new HLT menu should be used)

Regarding changing the code or changing the menu: arent we running HLT add-on tests (and relvals) on Run-2 data as well? If we change the code in cmssw then conditions for those past data should be updated too. Instead changing the HLT menu one would not care about the past and just have the current payloads used with the new code for the future HLT runs.

@missirol
Copy link
Contributor

@swagata87 , trying to clarify.

I think I also saw a comment in this thread about HLT tags in offline data reco GT.. why is that needed?

HLT tags can be consumed by HLT from the Offline GT in wfs where HLT and RECO run in the same job using the Offline GT. In fact (although it's technically no proof), the Offline GT contains the HLT-EGM regressions tags (looking for pfsc[..] here).

should we change it in the .cc code?

That was my question, but Helena correctly pointed out that only the tag has changed, not the actual payload yet. Once the payload is changed, it's clear that the source code needs to be updated.

or do people prefer to change it in HLT config in confdb? If config is preferred then we need to do it by today/tomorrow as HLT V2 closes tomorrow.

No preference from my side based on what I understood so far.

@francescobrivio
Copy link
Contributor

So HLT regression tags need to go to MC GT also, so that HLT can be properly simulated.

@swagata87 so if these (new) regression tags will be also relevant for MC production, then it brings up again the point that the setting of the flag to true or false should be under some Run3 era modifier no?

@francescobrivio
Copy link
Contributor

+alca

  • tests are clean (as expected)
  • we understood that new tags for MC are coming (soon), so we will handle the setting of the flag (through a modifier) in a subsequent PR which also updates the MC GTs

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@missirol
Copy link
Contributor

missirol commented Apr 14, 2022

just to add to Helena's comment here (#37557 (comment) ) what we did here is to introduce a multi-iov tag to which it is possible to append the new payload from a future run (from a run when the new HLT menu should be used)

Thanks, Tamas. It's clearer now. Can you also clarify for me how the update of the payload will be sync'd with the update the source code? (is it "we make a new release with flag=true, we deploy it, and the new payload is picked from the first run that uses the new release"? or..)

Regarding changing the code or changing the menu: arent we running HLT add-on tests (and relvals) on Run-2 data as well?

Mh, not Run-2, but 2021 pilot-beam data (still older data, inevitably). For even-older data in RelVals, the HLT is 'Fake', so none of this applies.

If we change the code in cmssw then conditions for those past data should be updated too. Instead changing the HLT menu one would not care about the past and just have the current payloads used with the new code for the future HLT runs.

I'm not sure I follow the distinction. In this case, changing a parameter in the HLT config is effectively the same as changing the parameter from fillDescriptions in the source code since that parameter affects only HLT.

Aside from addOnTests/RelVals, we do routinely use the latest Run-3 HLT in the latest CMSSW release to estimate rates,etc on Run-2 (2018) data. In that case, we would currently pick up the Run-2 EGM tags and the flag in question would have to remain False (while it will true as soon as the source code is updated). So, TSG would need to add a customisation for that in its workflows for these 2018-based estimates (@silviodonato). Or, would EGM prefer to somehow run the latest Run-3 EGM tags when running on 2018 data? (opinions, @swagata87 ?)

Maybe I'm missing something obvious to others. Apologies in advance.

@missirol
Copy link
Contributor

@francescobrivio

so if these (new) regression tags will be also relevant for MC production, then it brings up again the point that the setting of the flag to true or false should be under some Run3 era modifier no?

IIuc, the new flag only affects HLT (because another flag named isHLT is always False outside of HLT configs), and HLT does not really make use of modifiers. Do we have the constraint of the HLT outputs not changing in 12_3_X? I wouldn't think so, because the HLT menu is still under countinous development in 12_3_X..

@tvami
Copy link
Contributor

tvami commented Apr 14, 2022

Can you also clarify for me how the update of the payload will be sync'd with the update the source code? (is it "we make a new release with flag=true, we deploy it, and the new payload is picked from the first run that uses the new release"? or..)

Yes, what you wrote here sounds completely reasonable.

and HLT does not really make use of modifiers.

Right but that comment was about MC, like if we change the variable in the cpp code, then how will that affect Run-1 and Run-2 MC RelVals? Because IIUC changing the code to True we'd need to update the conditions, and w/o the era modifier, we'd need to update the MC conditions for Run-1/Run-2 while with the modifier, it's just the change of conditions for Run-3 (which we do have at hand -- thus the push for the modifier)

@missirol
Copy link
Contributor

Right but that comment was about MC, like if we change the variable in the cpp code, then how will that affect Run-1 and Run-2 MC RelVals?

Meaning wfs like 10824.0? Those should be completely unaffected by this change, b/c (1) RECO is insensitive to all this, and (2) the HLT menus in those wfs do not include any reconstruction ("Fake" menus), thus don't include the EGM/PF modules in question.

(1) is important: it is the reason why I think that a modifier would not be needed in this specific case. When it comes to modifiers, RECO and HLT follow very different approaches; in short, HLT does not use them, because a given release only aims to support the 'latest' HLT menu (there is no attempt to maintain a Run2-like HLT centrally in 12_4_X, for example).

Note that I'm not proposing nor opposing anything here. I'm just trying to figure out how this update unfolds and affects HLT.

@tvami
Copy link
Contributor

tvami commented Apr 15, 2022

Thanks @missirol for the clarification.
Since we have the request for MC now, let's merge this PR and I'll make another one that include the MC tags and make the switch in the flag.
@cms-sw/orp-l2 I think we are ready to get this merged

@missirol
Copy link
Contributor

missirol commented Apr 15, 2022

Aside from addOnTests/RelVals, we do routinely use the latest Run-3 HLT in the latest CMSSW release to estimate rates,etc on Run-2 (2018) data. In that case, we would currently pick up the Run-2 EGM tags and the flag in question would have to remain False (while it will true as soon as the source code is updated). So, TSG would need to add a customisation for that in its workflows for these 2018-based estimates (@silviodonato). Or, would EGM prefer to somehow run the latest Run-3 EGM tags when running on 2018 data? (opinions, @swagata87 ?)

We are left with a question to EGM on how exactly TSG should deal with this update when using Run-2 data for measurements with the latest release and Run-3 HLT menu. The easiest/minimum would be for TSG to use a customisation that ensures the flag be set to false when running on Run-2 data (I'm not talking about RelVals or central wfs, but recipes that TSG runs elsewhere).

Anyways, this can be clarified in the PR that will change the source code.

Thanks for the clarifications.

@swagata87
Copy link
Contributor

swagata87 commented Apr 15, 2022

We would like to use Run3 HLT regression while re-running HLT on 2018 RAW data from now on.
This way we will get an idea of how much it impacts the rates.
As usual, we expect that a better energy correction would lead to better turn-on for signal efficiency, but also an increase in rate is expected. And we would like to know exactly how much the rate would increase, so that we can decide if tuning of cuts is necessary or not.

Another point is that the Run2 HLT regression was trained probably with 2015 or 2016 MC, as far as I know. So, while it was optimal for 2015/16 (may be even 2017) data-taking, it was not fully optimal for 2018 data-taking conditions. Run3 and 2018 conditions are more close to each other. Thus Run3 regression is rather appropriate for 2018.

@missirol
Copy link
Contributor

We would like to use Run3 HLT regression while re-running HLT on 2018 RAW data from now on.

Then, I guess one solution for those measurements would be to overwrite the GT records similarly to how TSG is already testing the latest Run-3 HLT JECs on 2018 Data (example).

There is a difference here with respect to the HLT-JEC case, though.

For the HLT-JECs when running the latest Run-3 HLT on 2018 data, using the Run-3 or Run-2 ones is a bit of a choice, none of the two is wrong per se. In fact, that type of customisation is not in HLTrigger/Configuration/python as of now.

On the other hand, once these EGM updates are in place, if one simply follows the current TSG recipes (with customiseFor2018Input, etc), the output will be pretty much wrong iiuc, since we would have regTrainedWithPS=true and old payloads when running on 2018 events. This leads me to think that we do need some type of customisation for this here. Up to TSG and EGM to suggest the best one (cc: @silviodonato).

@perrotta
Copy link
Contributor

+1

  • Now everybody agrees that this PR can be merged: another PR will follow which includes the MC tags and makes the switch in the flag.

@cmsbuild cmsbuild merged commit 6ae122f into cms-sw:master Apr 15, 2022
@Martin-Grunewald
Copy link
Contributor

What about run3_data_relval ?

@francescobrivio
Copy link
Contributor

What about run3_data_relval ?

Hi Martin, run3_data_relval is in line with the Run 3 data offline GT (run3_data), which should not be updated according to @cms-sw/egamma-pog-l2.
See the difference in the GTs at:
https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/123X_dataRun3_v4/123X_dataRun3_relval_v3

@Martin-Grunewald
Copy link
Contributor

Hmm, I thought that was ONLY for the L1 Trigger Menu Record??

@francescobrivio
Copy link
Contributor

Hmm, I thought that was ONLY for the L1 Trigger Menu Record??

Right, so L1T should be the only difference right? The other tags (including EGM ones) should be the same, as shown in the difference in #37557 (comment). Maybe am I missing something?

@Martin-Grunewald
Copy link
Contributor

123X_dataRun3_v4 and 123X_dataRun3_relval_v3 show more differences than just the L1 Menu - labelled IOV differences. Should those be discounted?

@francescobrivio
Copy link
Contributor

123X_dataRun3_v4 and 123X_dataRun3_relval_v3 show more differences than just the L1 Menu - labelled IOV differences. Should those be discounted?

You are right, there are some IOV differences due to the fact that 123X_dataRun3_relval_v3 is snapshotted while 123X_dataRun3_v4 has an infinite snapshot. We'll create a new run3_data_relval with infinite snapshot as well.

@mmusich
Copy link
Contributor

mmusich commented Apr 20, 2022

We'll create a new run3_data_relval with infinite snapshot as well.

This is a recipe for irreproducibility in relval workflows!

@swagata87
Copy link
Contributor

Hi Martin, run3_data_relval is in line with the Run 3 data offline GT (run3_data), which should not be updated according to @cms-sw/egamma-pog-l2. See the difference in the GTs at: https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/123X_dataRun3_v4/123X_dataRun3_relval_v3

I'm getting confused now.
From the discussion in this thread above, in particular #37557 (comment) and #37557 (comment) , I was under the impression that the HLT tags are also needed in offline-data GTs. And I thought the plan was to update the tags after the FTV is complete.

@missirol
Copy link
Contributor

Indeed, this is what was asked in #37557 (comment).

Also, what about auto:run3_data and auto:run3_data_relval? Those still contain the old HLT-EGM tags, as far as I can see. I think those HLT tags are consumed from those GT (well, the relval one) in offline tests that run HLT+RECO in the same job.

By the way, auto:run3_data_prompt still contains the old HLT-EGM tags, too. I realised only now that its update in this PR only included the PPS tags (see link in the description): https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts//123X_dataRun3_Prompt_v6/123X_dataRun3_Prompt_v5. Keeping the old HLT tags there is at the very least a source of confusion.

What I don't know is whether we need to keep the HLT tags (in general, not specifically for EGM) in both run3_data(_relval) and run3_data_prompt GTs.

@malbouis
Copy link
Contributor Author

malbouis commented Apr 20, 2022

Hi, there are a few points to be clarified and a proposal to follow, for your consideration.

Concerning the Offline tags: auto:run3_data and auto:run3_data_relval

We could use the same tags as the ones used in HLT GT. They would have the right IOV structure and only pick up the new payload for new data. In case @cms-sw/egamma-pog-l2 agrees, would you please ask your AlCaDb contact to queue these tags to the Queues 123X_dataRun3_relval_Queue and 123X_dataRun3_Queue? We will be updating these GTs once the new payload is appended.

123X_dataRun3_v4 was created with the wrong snapshot time by mistake. It should not be infinite.

dataRun3_Prompt:
From this comment here: #37557 (comment) it seems that RECO does not consume this tag (or is insensitive to it)? If this is true, I think there would be no problem in updating the tags to the Prompt GT. @cms-sw/egamma-pog-l2, please also ask your contact to queue these tags to 123X_dataRun3_Prompt_Queue.

@cms-sw/egamma-pog-l2 , @missirol , @Martin-Grunewald , please let us know how does this plan sounds for you.

@missirol
Copy link
Contributor

Thanks, sounds good to me.

My understanding is that these HLT tags (i.e. the ones with labels pfscecal_*_online) are not consumed in the offline reconstruction.

To be exact, I do see one place in RECO where these labels are referenced: here. I think this is used in certain wfs that run only parts of the offline reconstruction, and that were mainly introduced for GPU-related development. I don't think this should prevent the update of the HLT tags in the Offline GTs, but I don't know these wfs well, so I thought it was worth mentioning this in case others see problems.

@swagata87
Copy link
Contributor

Helena's proposal sounds good to me too. I had not noticed before that e/gamma HLT tags are also present already in offline data GTs. My apologies for the oversight. Since the HLT tags are already there in offline GTs, it makes sense to update them too, so that all Run3 GTs are consistent.

@saumyaphor4252 , please take note of Helena's proposal above, and kindly queue egamma HLT supercluster regression tags to 123X_dataRun3_relval_Queue , 123X_dataRun3_Queue , and 123X_dataRun3_Prompt_Queue. Thanks!

@Martin-Grunewald
Copy link
Contributor

OK, sounds good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants