No "LHCInfoRcd" record found in the EventSetup.n (CTPPSProtonProducer/'ctppsProtons') #32340

Closed
silviodonato opened this issue Nov 30, 2020 · 38 comments

@silviodonato
Contributor

As reported by @cms-sw/pdmv-l2, many HIN and Run-3 workflows are crashing after ~130 events.

https://cms-unified.web.cern.ch/cms-unified/report/mmeena_RVCMSSW_11_2_0_pre10TTbar_14TeV__rsb_201129_121403_983
https://cms-unified.web.cern.ch/cms-unified/report/mmeena_RVCMSSW_11_2_0_pre10QCD_Pt_80_120_14_HI_2021_PU__rsb_201129_122841_2740

You can easily reproduce the error by copying /afs/cern.ch/work/s/sdonato/public/debug_PPS/ into your folder and running cmsRun PSet.py (I selected a single event that causes the crash):

30-Nov-2020 18:35:45 CET  Initiating request to open file file:badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root
30-Nov-2020 18:35:50 CET  Successfully opened file file:badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root
2020-11-30 18:36:21.960641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
%MSG-e TkDetLayers:   ConversionTrackCandidateProducer:uncleanedOnlyConversionTrackCandidates@streamBeginRun  30-Nov-2020 18:37:20 CET Run: 1 Stream: 0
 ForwardDiskSectorBuilderFromDet: Trying to build Petal Wedge from Dets at different z positions !! Delta_z = -0.951241
%MSG
%MSG-w BeamFitter:  AlcaBeamMonitor:AlcaBeamMonitor@endLumi  30-Nov-2020 18:37:30 CET Run: 1 Lumi: 11
No event read! No Fitting!
%MSG
Begin processing the 1st record. Run 1, Event 1126, LumiSection 12 on stream 0 at 30-Nov-2020 18:37:30.836 CET
----- Begin Fatal Exception 30-Nov-2020 18:37:58 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 1126 stream: 0
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module DQMMessageLogger/'DQMMessageLogger'
   [3] Prefetching for module LogErrorHarvester/'logErrorHarvester'
   [4] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
No "LHCInfoRcd" record found in the EventSetup.n
 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------
%MSG-w BSFitter:  AlcaBeamMonitor:AlcaBeamMonitor@endLumi  30-Nov-2020 18:37:58 CET Run: 1 Lumi: 12
need at least 150 tracks to run beamline fitter.
%MSG

LHCInfoRcd should be produced by CTPPSLHCInfoRandomXangleESSource (see https://github.com/cms-sw/cmssw/pull/28492/files#diff-d435950ce350dde1efbc324448a77f75894e0f7027a444503d500a4a93827ee1R30)

PPS was added to DIGI by #32003
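
For anyone debugging locally, a minimal sketch follows (assumptions: a CMSSW environment is set up via cmsenv, the reproducer PSet.py from the /afs directory above is in the working directory, and it defines a top-level process object) to check whether any ES module delivering LHCInfo-type records is configured at all in the failing job:

# Sketch: list LHCInfo-related ES modules in the reproducer configuration.
# Assumes PSet.py defines a top-level `process` (cms.Process) object.
import importlib.util

spec = importlib.util.spec_from_file_location("PSet", "PSet.py")
pset = importlib.util.module_from_spec(spec)
spec.loader.exec_module(pset)
process = pset.process

es_modules = {}
es_modules.update(process.es_sources_())
es_modules.update(process.es_producers_())

for name, module in es_modules.items():
    if "LHCInfo" in name or "LHCInfo" in module.type_():
        print(f"{name} -> {module.type_()}")

If nothing is printed, the job has no provider for LHCInfoRcd at all, consistent with the exception above.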

@silviodonato
Contributor Author

assign dqm, alca

@cmsbuild
Contributor

New categories assigned: dqm,alca

@jfernan2,@christopheralanwest,@andrius-k,@fioriNTU,@tlampen,@pohsun,@yuanchao,@tocheng,@kmaeshima,@ErnestaP you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@silviodonato
Contributor Author

@mundim
(please add other PPS people who might be interested)

@jfernan2
Contributor

@jan-kaspar @forthommel @nminafra @AndreaBellora @popovvp as CT-PPS DQM developers, can you please have a look?

@christopheralanwest
Contributor

There is no MC tag of record type LHCInfo (see https://cms-conddb.cern.ch/cmsDbBrowser/search/Prod/LHCInfo). Do MC tags for these records need to be created?

@silviodonato
Contributor Author

Yes, it looks like the record is supposed to be taken from the GT: https://github.com/cms-sw/cmssw/blob/master/CalibPPS/ESProducers/python/ctppsLHCInfo_cff.py#L3 (@jan-kaspar)
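
If the corresponding payload existed, the usual stopgap would be to attach it to the GlobalTag by hand in the config. A sketch of that standard pattern is below; the tag name is purely hypothetical, since (as noted above) no MC tag of type LHCInfo exists yet, and it assumes a cmsDriver-style config where process.GlobalTag is already set up.

# Sketch of the standard "add a missing record to the GT" workaround.
# The tag name is a placeholder: no MC LHCInfo tag exists at this point.
import FWCore.ParameterSet.Config as cms

process.GlobalTag.toGet.append(
    cms.PSet(
        record = cms.string("LHCInfoRcd"),
        tag = cms.string("LHCInfo_mc_placeholder_v0")  # hypothetical tag name
    )
)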

@silviodonato
Contributor Author

unassign dqm

@silviodonato
Contributor Author

assign reconstruction

After removing the DQM modules, we still get:

----- Begin Fatal Exception 30-Nov-2020 20:58:52 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing  Event run: 1 lumi: 12 event: 1126 stream: 0
   [1] Running path 'RECOSIMoutput_step'
   [2] Prefetching for module PoolOutputModule/'RECOSIMoutput'
   [3] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
No "LHCInfoRcd" record found in the EventSetup.n
 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------

(actually the problem looks to be a missing tag in the GT)

@cmsbuild
Contributor

New categories assigned: reconstruction

@slava77,@perrotta,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77
Contributor

slava77 commented Nov 30, 2020

assign reconstruction

The previous discussion points to the content of the GT and the CalibPPS software. Why is this a reco issue?

@mundim
Contributor

mundim commented Dec 1, 2020

@mundim
(please add other PPS people who might be interested)

@wpcarvalho @malbouis may also be interested in this topic.
The simulation should use this information in a future development, since it is used for the proton propagation. This specific issue, as already mentioned, seems related to the reconstruction.
However, I have not seen this problem in my tests yet.

@silviodonato
Contributor Author

unassign reconstruction

@silviodonato
Contributor Author

@christopheralanwest I see from #26394 that #26415 (@tocheng) added LHCInfo only for Run 1 and Run 2
(https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/106X_dataRun2_PromptLike_v4/106X_dataRun2_PromptLike_v1). I think we need something similar for Run 3.

@silviodonato
Contributor Author

urgent

This issue is blocking the validation of CMSSW_11_2_0_pre10.

@cmsbuild added the urgent label on Dec 1, 2020

@silviodonato
Contributor Author

You can also reproduce the error in this way (in CMSSW_11_2_0_pre10):

cmsDriver.py stepTest  --conditions auto:phase1_2021_realistic -s RAW2DIGI,L1Reco,RECO --datatier GEN-SIM-RECO -n 10 --geometry DB:Extended --era Run3 --eventcontent RECOSIM --filein file:/afs/cern.ch/work/s/sdonato/public/debug_PPS/badd237c-4857-4f48-b10f-52c832f57f02_ev1126.root 

@jan-kaspar
Contributor

Let me add some potentially relevant info.

The LHCInfo is part of the conditions essential for PPS. Among other things, the LHCInfo contains the LHC crossing angle (xangle), which influences many aspects of the proton propagation from the IP to the PPS detectors (RPs). This info is important for both data and simulation.

For LHC data, the LHCInfo should be stored in the DB, as provided by the LHC. Let me emphasize that this info changes during every LHC fill and is thus time dependent.

For simulation, in principle we may store the info in the DB, too. However, due to the time-dependent nature, it is somewhat difficult. We wish the MC conditions to be compatible with the LHC ones. In order to prepare the MC payloads accordingly, we would need to know the number of events/LS to be used in the MC simulation, and this is often not known or even variable. Therefore, we tend to prefer another option: to have an ES module which generates the conditions on the fly (based on fundamental ingredients which can be stored in the DB). This new ES module is now in PR #32207.

@silviodonato
Contributor Author

@cms-sw/alca-l2 do you have a rough estimate of the timescale for the new global tag? I would like to start the RelVals before the weekend. I prepared #32346 in case it is not possible to have the global tag in time.

@silviodonato
Contributor Author

@jan-kaspar and @fabferro agreed to remove PPS from Run-3 reco (#32207 (comment)). So this issue is temporarily solved by #32346 and #32352

@christopheralanwest
Contributor

Why is this a problem only for Run 3 workflows? There is no LHCInfoRcd in any MC global tag.

@mundim
Contributor

mundim commented Dec 1, 2020

Hi @christopheralanwest, we are implementing a different way to get the optics information, in order to have as accurate a representation of the real optics as possible. @jan-kaspar can give more detailed information. Thanks.

@slava77
Contributor

slava77 commented Dec 1, 2020

For simulation, in principle we may store the info in the DB, too. However, due to the time-dependent nature, it is somewhat difficult. We wish the MC conditions to be compatible with the LHC ones. In order to prepare the MC payloads accordingly, we would need to know the number of events/LS to be used in the MC simulation, and this is often not known or even variable. Therefore, we tend to prefer another option: to have an ES module which generates the conditions on the fly (based on fundamental ingredients which can be stored in the DB). This new ES module is now in PR #32207.

I'm not sure I understand the arguments.
MC (so far, at least) has only one IOV; there is no time dependence. A single payload in the GT for MC would then suffice (at least for some single "representative" point; consider the analogy of a stopped clock being right twice a day).
It sounds like the solution is to make MC even more variable than data.
Perhaps I misread, and the plan is to make both data and MC use the dynamic mechanism?

@jan-kaspar
Contributor

I'm not sure I understand the arguments.
MC (so far, at least) has only one IOV; there is no time dependence.

This is exactly what is difficult for PPS - because in reality (at the LHC) the conditions do vary. If we wish the simulation to be realistic, we need to split the MC data into chunks and use a different set of conditions for each chunk (both for simulation and reco), just as in LHC data one chunk was acquired with one value of the xangle and another chunk with another.

The proposed solution (in a simplified manner) to fulfil our needs (varying conditions) within the existing constraints (single IOV) is to extract from LHC data the distribution of the relevant parameters (e.g. xangle) and store it in the DB (a single IOV is sufficient). Then we introduce an ES module which, every given number of lumisections, generates a random xangle according to the distribution extracted from data. With a sufficient number of xangle samples, the simulation will be done with a reasonably similar xangle distribution.

Is our idea any clearer now?
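
As a purely illustrative sketch (not the actual ES module from PR #32207), the mechanism can be pictured in a few lines of Python: a crossing angle is drawn from a data-derived distribution, held fixed for a block of lumisections, and redrawn at the next block boundary, so that over many blocks the sampled xangle spectrum approaches the one observed in data. All numbers below are made up.

# Illustrative sketch only; bin values and weights are invented.
import random

xangle_bins    = [120.0, 130.0, 140.0, 150.0, 160.0]  # urad, hypothetical
xangle_weights = [0.10, 0.25, 0.35, 0.20, 0.10]       # hypothetical fractions from data

GENERATE_EVERY_N_LS = 10  # hold one xangle value for this many lumisections
SEED = 1234               # fixed seed so different samples can stay synchronised

def xangle_for_lumisection(ls):
    """Return the crossing angle (urad) to use for lumisection `ls`.

    A new value is drawn only at the start of each block of
    GENERATE_EVERY_N_LS lumisections, mimicking conditions that can
    change only at LS boundaries."""
    block = ls // GENERATE_EVERY_N_LS
    rng = random.Random(SEED + block)  # reproducible per block
    return rng.choices(xangle_bins, weights=xangle_weights, k=1)[0]

# Example: constant within a block, varying across blocks.
for ls in range(0, 40, 5):
    print(ls, xangle_for_lumisection(ls))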

@boudoul
Contributor

boudoul commented Dec 1, 2020

Is this another use case to converge on an IOV-based MC?

@slava77
Contributor

slava77 commented Dec 1, 2020

Is our idea any clearer now?

No, not really.
Different MC samples have different numbers of lumisections; clearly, all samples would have to have the random values synchronised.

I'm not sure the situation is that much different from anything else in CMS: conditions vary for all detectors; ECAL has perhaps the most significant variation of response vs. time (every fill), and we are still OK (not perfect, and we can do better) with MC having just one payload.
(Perhaps here the point is that we cannot even place the sim-hits in the right place for PPS, but apparently that's not to be addressed at PPS proton reco; shouldn't it be done upstream?)

Indeed, a run/IOV-based MC strategy would improve the agreement with data, but I do not see a conceptual difference with respect to other detectors.

@jan-kaspar
Contributor

Conceptually, I can imagine the situation is similar for every sub-detector. What may be different is the size of the variations. For PPS, different xangles can mean a sizeable difference in acceptance, for instance. AFAIK, we in PPS don't think that a single set of conditions is sufficient - I've asked the Proton POG conveners to support this (personal) statement.

@antoniovilela
Contributor

As Jan Kaspar already mentioned, we need some sort of dynamically generated conditions. As an example, the crossing angle changes continually during a fill (in steps of 1 urad or so). In the simulation, the crossing angle affects where a forward proton will end up in the detectors downstream.
As an extreme example, imagine that the CMS magnetic field were changing significantly during a run. You would not be able to use a simulation with a single representative point to describe the simulated tracks (by the way, we did not ask for changing beam conditions, we only cope with them).

@slava77
Contributor

slava77 commented Dec 1, 2020

Considering that the cost of running ctppsProtons is fairly small, would it still be useful to have low, middle, and high points (present in the GT with different labels or via a derived ES producer) and consistently produce three variants of protons?
Or do we really need random scatter?
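
For concreteness, a sketch (hypothetical tag names and labels, assuming a standard GlobalTag-based config) of how a few fixed working points could be exposed under distinct EventSetup labels, so that downstream modules could build the corresponding proton variants:

# Sketch only: tag names and labels are hypothetical placeholders.
import FWCore.ParameterSet.Config as cms

for label, tag in [("xangleLow",  "LHCInfo_xangle_low_mc_placeholder"),
                   ("xangleMid",  "LHCInfo_xangle_mid_mc_placeholder"),
                   ("xangleHigh", "LHCInfo_xangle_high_mc_placeholder")]:
    process.GlobalTag.toGet.append(
        cms.PSet(
            record = cms.string("LHCInfoRcd"),
            tag = cms.string(tag),
            label = cms.untracked.string(label),
        )
    )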

@antoniovilela
Contributor

We are open to suggestions, but I still do not see how we can have a representative MC simulation with a small number of working points. This is why we went in the direction of the random conditions.
This is in a way similar to the continuous distribution of the number of pileup events that we have in the simulation.

@slava77
Contributor

slava77 commented Dec 1, 2020

We are open to suggestions, but I still do not see how we can have a representative MC simulation with a small number of working points. This is why we went in the direction of the random conditions.
This is in a way similar to the continuous distribution of the number of pileup events that we have in the simulation.

I disagree with the analogy: pileup is intrinsically different event by event.
The optics, even though it varies during the fill, is still well defined and should be correlated (or even identical) between signal and background. AFAIK, there is no well-defined mechanism to correlate these parameters across generated samples.

@antoniovilela
Contributor

Yes, correct, but the crossing angle is still varying rapidly.
Typically, how many lumi-section transitions happen in simulation? Can we tune that, together with the (binned) input distributions, such that they are properly populated for all samples?

@christopheralanwest
Contributor

As far as I know, there is one lumi section per (GEN-SIM?) job in production and the frequency of lumi section transitions is not independently configurable. There is ongoing work to develop run-dependent MC according to the recommendations of the Time Dependent MC Working Group, which uses a similar method of generating run dependence based on lumi-sections. An example of the implementation of time-dependent conditions can be found in PR #28214. @Dr15Jones can provide additional information about the time-dependent MC implementation.

That said, I don't understand why you have chosen an implementation based on lumi-sections rather than random distributions of the relevant quantities. For run-dependent MC, the primary difficulty with random sampling is that one needs the conditions with which the pileup distribution is generated to match those used in the simulation of the rest of the event. Is that relevant here?

I suggest that we have a meeting that includes all relevant groups. We can use the AlCaDB meeting on Monday at 16:00 for this purpose. Would that work for everyone?

@davidlange6
Contributor

davidlange6 commented Dec 1, 2020 via email

@mundim
Contributor

mundim commented Dec 1, 2020

Hi everyone. Can we postpone this discussion until after a meeting already booked among the PPS people involved, please? There are some aspects that still need internal discussion.
Thanks.

@clemencia
Contributor

I would call this "metaconditions", and I think it could give the result that PPS needs within the constraints that the simulation conditions have.

The proposed solution (in a simplified manner) to fulfil our needs (varying conditions) within the existing constraints (single IOV) is to extract from LHC data the distribution of the relevant parameters (e.g. xangle) and store it in the DB (a single IOV is sufficient). Then we introduce an ES module which, every given number of lumisections, generates a random xangle according to the distribution extracted from data. With a sufficient number of xangle samples, the simulation will be done with a reasonably similar xangle distribution.

Is our idea any clearer now?

@jan-kaspar
Contributor

@christopheralanwest Many thanks for the detailed information and apologies for the silence - yesterday we had a discussion within PPS on how to continue. We decided on two lines of action:

  • For the full simulation, follow the fastest/simplest solution, which is using the usual single-IOV DB tag. At the moment, the full simulation of PPS only aims at Run 3, where we currently have only an approximate idea of the conditions anyway, so the single-IOV approach is not a limitation.
  • In parallel, we wish to continue investigating what is really needed for PPS simulations (e.g. by following the recommendation from @davidlange6) and, if confirmed, further investigate the most appropriate technical solution.

We appreciate your invitation for a discussion. The next Monday (7 Dec) seems a bit too tight. What about the one after (14 Dec)?

A quick answer to your questions: currently, we have all conditions data in the EventSetup. AFAIK, CMSSW only allows updating ES data at LS boundaries; hence our choice. We have checked that typical CMS simulations have enough LS to reasonably sample our condition distributions. Also, thanks for pointing out the possible correlation with PU. Indeed, the LHC introduces a correlation between PU and xangle - both decrease with time, PU due to burn-off and xangle due to the choice of the lumi-levelling scheme. We think it is interesting to include this effect in our investigations.

@mundim
Contributor

mundim commented Dec 4, 2020

Just another comment on top of @jan-kaspar's. We have agreed upon a strategy to provide a DB tag, to be included in the GT for the simulation, with the desired conditions AND following the current convention. No new code will be needed on the full-simulation side apart from an update in a config file. We hope to have this in place soon, but it will likely take a couple of weeks.
Further discussion might be needed involving the AlCaDB people.
Thanks for all your support.
Luiz

@silviodonato
Contributor Author

Solved by #32352
Issue moved to #32356
