No "LHCInfoRcd" record found in the EventSetup.n (CTPPSProtonProducer/'ctppsProtons') #32340

Closed
silviodonato opened this issue Nov 30, 2020 · 38 comments

@silviodonato
Contributor

As reported by @cms-sw/pdmv-l2, many HIN and Run-3 workflows are crashing after ~130 events.

https://cms-unified.web.cern.ch/cms-unified/report/mmeena_RVCMSSW_11_2_0_pre10TTbar_14TeV__rsb_201129_121403_983
https://cms-unified.web.cern.ch/cms-unified/report/mmeena_RVCMSSW_11_2_0_pre10QCD_Pt_80_120_14_HI_2021_PU__rsb_201129_122841_2740

You can easily reproduce the error by copying /afs/cern.ch/work/s/sdonato/public/debug_PPS/ into your folder and running cmsRun PSet.py (I selected a single event that causes the crash):

30-Nov-2020 18:35:45 CET  Initiating request to open file file:badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root
30-Nov-2020 18:35:50 CET  Successfully opened file file:badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root
2020-11-30 18:36:21.960641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
%MSG-e TkDetLayers:   ConversionTrackCandidateProducer:uncleanedOnlyConversionTrackCandidates@streamBeginRun  30-Nov-2020 18:37:20 CET Run: 1 Stream: 0
 ForwardDiskSectorBuilderFromDet: Trying to build Petal Wedge from Dets at different z positions !! Delta_z = -0.951241
%MSG
%MSG-w BeamFitter:  AlcaBeamMonitor:AlcaBeamMonitor@endLumi  30-Nov-2020 18:37:30 CET Run: 1 Lumi: 11
No event read! No Fitting!
%MSG
Begin processing the 1st record. Run 1, Event 1126, LumiSection 12 on stream 0 at 30-Nov-2020 18:37:30.836 CET
----- Begin Fatal Exception 30-Nov-2020 18:37:58 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 1126 stream: 0
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module DQMMessageLogger/'DQMMessageLogger'
   [3] Prefetching for module LogErrorHarvester/'logErrorHarvester'
   [4] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
No "LHCInfoRcd" record found in the EventSetup.n
 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------
%MSG-w BSFitter:  AlcaBeamMonitor:AlcaBeamMonitor@endLumi  30-Nov-2020 18:37:58 CET Run: 1 Lumi: 12
need at least 150 tracks to run beamline fitter.
%MSG

LHCInfoRcd should be produced by CTPPSLHCInfoRandomXangleESSource (see https://github.com/cms-sw/cmssw/pull/28492/files#diff-d435950ce350dde1efbc324448a77f75894e0f7027a444503d500a4a93827ee1R30)

PPS was added to DIGI by #32003
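
For anyone debugging locally, a minimal sketch follows (assumptions: a CMSSW environment is set up via cmsenv, the reproducer PSet.py from the /afs directory above is in the working directory, and it defines a top-level process object) to check whether any ES module delivering LHCInfo-type records is configured at all in the failing job:

# Sketch: list LHCInfo-related ES modules in the reproducer configuration.
# Assumes PSet.py defines a top-level `process` (cms.Process) object.
import importlib.util

spec = importlib.util.spec_from_file_location("PSet", "PSet.py")
pset = importlib.util.module_from_spec(spec)
spec.loader.exec_module(pset)
process = pset.process

es_modules = {}
es_modules.update(process.es_sources_())
es_modules.update(process.es_producers_())

for name, module in es_modules.items():
    if "LHCInfo" in name or "LHCInfo" in module.type_():
        print(f"{name} -> {module.type_()}")

If nothing is printed, the job has no provider for LHCInfoRcd at all, consistent with the exception above.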

@silviodonato
Contributor Author

assign dqm, alca

@cmsbuild
Contributor

New categories assigned: dqm,alca

@jfernan2,@christopheralanwest,@andrius-k,@fioriNTU,@tlampen,@pohsun,@yuanchao,@tocheng,@kmaeshima,@ErnestaP you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@silviodonato
Contributor Author

@mundim
(please add other PPS people who might be interested)

@jfernan2
Contributor

@jan-kaspar @forthommel @nminafra @AndreaBellora @popovvp as CT-PPS DQM developers, can you please have a look?

@christopheralanwest
Contributor

There is no MC tag of record type LHCInfo (see https://cms-conddb.cern.ch/cmsDbBrowser/search/Prod/LHCInfo). Do MC tags for these records need to be created?

@silviodonato
Contributor Author

Yes, it looks like the record is supposed to be taken from the GT: https://github.com/cms-sw/cmssw/blob/master/CalibPPS/ESProducers/python/ctppsLHCInfo_cff.py#L3 (@jan-kaspar)
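
If the corresponding payload existed, the usual stopgap would be to attach it to the GlobalTag by hand in the config. A sketch of that standard pattern is below; the tag name is purely hypothetical, since (as noted above) no MC tag of type LHCInfo exists yet, and it assumes a cmsDriver-style config where process.GlobalTag is already set up.

# Sketch of the standard "add a missing record to the GT" workaround.
# The tag name is a placeholder: no MC LHCInfo tag exists at this point.
import FWCore.ParameterSet.Config as cms

process.GlobalTag.toGet.append(
    cms.PSet(
        record = cms.string("LHCInfoRcd"),
        tag = cms.string("LHCInfo_mc_placeholder_v0")  # hypothetical tag name
    )
)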

@silviodonato
Contributor Author

unassign dqm

@silviodonato
Contributor Author

assign reconstruction

After removing the DQM modules, we still get:

----- Begin Fatal Exception 30-Nov-2020 20:58:52 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 1126 stream: 0
   [1] Running path 'RECOSIMoutput_step'
   [2] Prefetching for module PoolOutputModule/'RECOSIMoutput'
   [3] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
No "LHCInfoRcd" record found in the EventSetup.n
 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------

(actually the problem looks to be a missing tag in the GT)

@cmsbuild
Contributor

New categories assigned: reconstruction

@slava77,@perrotta,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77
Contributor

slava77 commented Nov 30, 2020

assign reconstruction

The previous discussion points to the content of the GT and the CalibPPS software. Why is this a reco issue?

@mundim
Contributor

mundim commented Dec 1, 2020

@mundim
(please add other PPS people who might be interested)

@wpcarvalho @malbouis may also be interested in this topic.
The simulation should use this information in a future development, since it is used for the proton propagation. This specific issue, as already mentioned, seems related to the reconstruction.
However, I have not seen this problem in my tests yet.

@silviodonato
Contributor Author

unassign reconstruction

@silviodonato
Contributor Author

@christopheralanwest I see from #26394 that #26415 (@tocheng) added LHCInfo only for Run 1 and Run 2
(https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/106X_dataRun2_PromptLike_v4/106X_dataRun2_PromptLike_v1). I think we need something similar for Run 3.

@silviodonato
Contributor Author

urgent

This issue is blocking the validation of CMSSW_11_2_0_pre10.

@cmsbuild added the urgent label on Dec 1, 2020

@silviodonato
Contributor Author

You can also reproduce the error in this way (in CMSSW_11_2_0_pre10):

cmsDriver.py stepTest  --conditions auto:phase1_2021_realistic -s RAW2DIGI,L1Reco,RECO --datatier GEN-SIM-RECO -n 10 --geometry DB:Extended --era Run3 --eventcontent RECOSIM --filein file:/afs/cern.ch/work/s/sdonato/public/debug_PPS/badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root 

@jan-kaspar
Contributor

Let me add some potentially relevant info.

The LHCInfo is part of the conditions essential for PPS. Among other things, the LHCInfo contains the LHC crossing angle (xangle), which influences many aspects of the proton propagation from the IP to the PPS detectors (RPs). This info is important for both data and simulation.

For LHC data, the LHCInfo should be stored in the DB, as provided by the LHC. Let me emphasize that this info changes during every LHC fill and is thus time dependent.

For simulation, in principle we may store the info in the DB, too. However, due to the time-dependent nature, it is somewhat difficult. We wish the MC conditions to be compatible with the LHC ones. In order to prepare the MC payloads accordingly, we would need to know the number of events/LS to be used in the MC simulation, and this is often not known or even variable. Therefore, we tend to prefer another option: to have an ES module which generates the conditions on the fly (based on fundamental ingredients which can be stored in the DB). This new ES module is now in PR #32207.

@silviodonato
Contributor Author

@cms-sw/alca-l2 do you have a rough estimate of the timescale for the new global tag? I would like to start the RelVals before the weekend. I prepared #32346 in case it is not possible to have the global tag in time.

@silviodonato
Contributor Author

@jan-kaspar and @fabferro agreed to remove PPS from Run-3 reco (#32207 (comment)). So this issue is temporarily solved by #32346 and #32352

@christopheralanwest
Contributor

Why is this a problem only for Run 3 workflows? There is no LHCInfoRcd in any MC global tag.

@mundim
Contributor

mundim commented Dec 1, 2020

Hi @christopheralanwest, we are implementing a different way to get the optics information, in order to have as accurate a representation of the real optics as possible. @jan-kaspar can give more detailed information. Thanks.

@slava77
Contributor

slava77 commented Dec 1, 2020

For simulation, in principle we may store the info in the DB, too. However, due to the time-dependent nature, it is somewhat difficult. We wish the MC conditions to be compatible with the LHC ones. In order to prepare the MC payloads accordingly, we would need to know the number of events/LS to be used in the MC simulation, and this is often not known or even variable. Therefore, we tend to prefer another option: to have an ES module which generates the conditions on the fly (based on fundamental ingredients which can be stored in the DB). This new ES module is now in PR #32207.

I'm not sure I understand the arguments.
MC (so far, at least) has only one IOV; there is no time dependence. A single payload in the GT for MC would then suffice (at least for some single "representative" point; consider the analogy of a stopped clock being right twice a day).
It sounds like the solution is to make MC even more variable than data.
Perhaps I misread, and the plan is to make both data and MC use the dynamic mechanism?

@jan-kaspar
Contributor

I'm not sure I understand the arguments.
MC (so far, at least) has only one IOV; there is no time dependence.

This is exactly what is difficult for PPS - because in reality (at the LHC) the conditions do vary. If we wish the simulation to be realistic, we need to split the MC data into chunks and use a different set of conditions for each chunk (both for simulation and reco), just as in LHC data one chunk was acquired with one value of the xangle and another chunk with another.

The proposed solution (in a simplified manner) to fulfil our needs (varying conditions) within the existing constraints (single IOV) is to extract from LHC data the distribution of the relevant parameters (e.g. xangle) and store it in the DB (a single IOV is sufficient). Then we introduce an ES module which, every given number of lumisections, generates a random xangle according to the distribution extracted from data. With a sufficient number of xangle samples, the simulation will be done with a reasonably similar xangle distribution.

Is our idea any clearer now?
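
As a purely illustrative sketch (not the actual ES module from PR #32207), the mechanism can be pictured in a few lines of Python: a crossing angle is drawn from a data-derived distribution, held fixed for a block of lumisections, and redrawn at the next block boundary, so that over many blocks the sampled xangle spectrum approaches the one observed in data. All numbers below are made up.

# Illustrative sketch only; bin values and weights are invented.
import random

xangle_bins    = [120.0, 130.0, 140.0, 150.0, 160.0]  # urad, hypothetical
xangle_weights = [0.10, 0.25, 0.35, 0.20, 0.10]       # hypothetical fractions from data

GENERATE_EVERY_N_LS = 10  # hold one xangle value for this many lumisections
SEED = 1234               # fixed seed so different samples can stay synchronised

def xangle_for_lumisection(ls):
    """Return the crossing angle (urad) to use for lumisection `ls`.

    A new value is drawn only at the start of each block of
    GENERATE_EVERY_N_LS lumisections, mimicking conditions that can
    change only at LS boundaries."""
    block = ls // GENERATE_EVERY_N_LS
    rng = random.Random(SEED + block)  # reproducible per block
    return rng.choices(xangle_bins, weights=xangle_weights, k=1)[0]

# Example: constant within a block, varying across blocks.
for ls in range(0, 40, 5):
    print(ls, xangle_for_lumisection(ls))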

@boudoul
Contributor

boudoul commented Dec 1, 2020

Is this another use case to converge on an IOV-based MC?

@slava77
Contributor

slava77 commented Dec 1, 2020

Is our idea any clearer now?

No, not really.
Different MC samples have different numbers of lumisections; clearly, all samples would have to have the random values synchronised.

I'm not sure the situation is that much different from anything else in CMS: conditions vary for all detectors; ECAL has perhaps the most significant variation of response vs. time (every fill), and we are still OK (not perfect, and we can do better) with MC having just one payload.
(Perhaps here the point is that we cannot even place the sim-hits in the right place for PPS, but apparently that's not to be addressed at PPS proton reco; shouldn't it be done upstream?)

Indeed, a run/IOV-based MC strategy would improve the agreement with data, but I do not see a conceptual difference with respect to other detectors.

@jan-kaspar
Contributor

Conceptually, I can imagine the situation is similar for every sub-detector. What may be different is the size of the variations. For PPS, different xangles can mean a sizeable difference in acceptance, for instance. AFAIK, we in PPS don't think that a single set of conditions is sufficient - I've asked the Proton POG conveners to support this (personal) statement.

@antoniovilela
Contributor

As Jan Kaspar already mentioned, we need some sort of dynamically generated conditions. As an example, the crossing angle changes continually during a fill (in steps of 1 urad or so). In the simulation, the crossing angle affects where a forward proton will end up in the detectors downstream.
As an extreme example, imagine that the CMS magnetic field were changing significantly during a run. You would not be able to use a simulation with a single representative point to describe the simulated tracks (by the way, we did not ask for changing beam conditions, we only cope with them).

@slava77
Contributor

slava77 commented Dec 1, 2020

Considering that the cost of running ctppsProtons is fairly small, would it still be useful to have low, middle, and high points (present in the GT with different labels or via a derived ES producer) and consistently produce three variants of protons?
Or do we really need random scatter?
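
For concreteness, a sketch (hypothetical tag names and labels, assuming a standard GlobalTag-based config) of how a few fixed working points could be exposed under distinct EventSetup labels, so that downstream modules could build the corresponding proton variants:

# Sketch only: tag names and labels are hypothetical placeholders.
import FWCore.ParameterSet.Config as cms

for label, tag in [("xangleLow",  "LHCInfo_xangle_low_mc_placeholder"),
                   ("xangleMid",  "LHCInfo_xangle_mid_mc_placeholder"),
                   ("xangleHigh", "LHCInfo_xangle_high_mc_placeholder")]:
    process.GlobalTag.toGet.append(
        cms.PSet(
            record = cms.string("LHCInfoRcd"),
            tag = cms.string(tag),
            label = cms.untracked.string(label),
        )
    )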

@antoniovilela
Contributor

We are open to suggestions, but I still do not see how we can have a representative MC simulation with a small number of working points. This is why we went in the direction of the random conditions.
This is in a way similar to the continuous distribution of the number of pileup events that we have in the simulation.

@slava77
Contributor

slava77 commented Dec 1, 2020

We are open to suggestions, but I still do not see how we can have a representative MC simulation with a small number of working points. This is why we went in the direction of the random conditions.
This is in a way similar to the continuous distribution of the number of pileup events that we have in the simulation.

I disagree with the analogy: pileup is intrinsically different event by event.
The optics, even though it varies during the fill, is still well defined and should be correlated (or even identical) between signal and background. AFAIK, there is no well-defined mechanism to correlate these parameters across generated samples.

@antoniovilela
Contributor

Yes, correct, but the crossing angle is still varying rapidly.
Typically, how many lumi-section transitions happen in simulation? Can we tune that, together with the (binned) input distributions, such that they are properly populated for all samples?

@christopheralanwest
Contributor

As far as I know, there is one lumi section per (GEN-SIM?) job in production and the frequency of lumi section transitions is not independently configurable. There is ongoing work to develop run-dependent MC according to the recommendations of the Time Dependent MC Working Group, which uses a similar method of generating run dependence based on lumi-sections. An example of the implementation of time-dependent conditions can be found in PR #28214. @Dr15Jones can provide additional information about the time-dependent MC implementation.

That said, I don't understand why you have chosen an implementation based on lumi-sections rather than random distributions of the relevant quantities. For run-dependent MC, the primary difficulty with random sampling is that one needs the conditions with which the pileup distribution is generated to match those used in the simulation of the rest of the event. Is that relevant here?

I suggest that we have a meeting that includes all relevant groups. We can use the AlCaDB meeting on Monday at 16:00 for this purpose. Would that work for everyone?

@davidlange6
Contributor

davidlange6 commented Dec 1, 2020 via email

@mundim
Contributor

mundim commented Dec 1, 2020

Hi everyone. Can we postpone this discussion until after a meeting already booked among the PPS people involved, please? There are some aspects that still need internal discussion.
Thanks.

@clemencia
Contributor

I would call this "metaconditions", and I think it could give the result that PPS needs within the constraints that the simulation conditions have.

The proposed solution (in a simplified manner) to fulfil our needs (varying conditions) within the existing constraints (single IOV) is to extract from LHC data the distribution of the relevant parameters (e.g. xangle) and store it in the DB (a single IOV is sufficient). Then we introduce an ES module which, every given number of lumisections, generates a random xangle according to the distribution extracted from data. With a sufficient number of xangle samples, the simulation will be done with a reasonably similar xangle distribution.

Is our idea any clearer now?

@jan-kaspar
Contributor

@christopheralanwest Many thanks for the detailed information and apologies for the silence - yesterday we had a discussion within PPS on how to continue. We decided on two lines of action:

  • For the full simulation, follow the fastest/simplest solution, which is using the usual single-IOV DB tag. At the moment, the full simulation of PPS only aims at Run 3, where we currently have only an approximate idea of the conditions anyway, so the single-IOV approach is not a limitation.
  • In parallel, we wish to continue investigating what is really needed for PPS simulations (e.g. by following the recommendation from @davidlange6) and, if confirmed, further investigate the most appropriate technical solution.

We appreciate your invitation for a discussion. The next Monday (7 Dec) seems a bit too tight. What about the one after (14 Dec)?

A quick answer to your questions: currently, we have all conditions data in the EventSetup. AFAIK, CMSSW only allows updating ES data at LS boundaries; hence our choice. We have checked that typical CMS simulations have enough LS to reasonably sample our condition distributions. Also, thanks for pointing out the possible correlation with PU. Indeed, the LHC introduces a correlation between PU and xangle - both decrease with time, PU due to burn-off and xangle due to the choice of the lumi-levelling scheme. We think it is interesting to include this effect in our investigations.

@mundim
Contributor

mundim commented Dec 4, 2020

Just another comment on top of @jan-kaspar's. We have agreed upon a strategy to provide a DB tag, to be included in the GT for the simulation, with the desired conditions AND following the current convention. No new code will be needed on the full-simulation side apart from an update in a config file. We hope to have this in place soon, but it will likely take a couple of weeks.
Further discussion might be needed involving the AlCaDB people.
Thanks for all your support.
Luiz

@silviodonato
Contributor Author

Solved by #32352
Issue moved to #32356
