Exception when calling the method EventSetup module RPCConeBuilder #36786

Closed
elfontan opened this issue Jan 24, 2022 · 40 comments

Comments

@elfontan
Contributor

Hi all,
while testing the new CMSSW_12_3_0_pre4 release to run the L1 emulation, I encountered the following error that I was not seeing before:

The mapping of bin lower bounds to indices does not contain all possible entries!!!
%MSG
----- Begin Fatal Exception 23-Jan-2022 16:52:55 CET-----------------------
An exception of category 'RPCInternal' occurred while
[0] Processing global begin Run run: 1
[1] Prefetching for module EventSetupRecordDataGetter/'hltGetConditions'
[2] Calling method for EventSetup module RPCConeBuilder/''
Exception Message:
Size differs for ring 3014 +- 100
----- End Fatal Exception -------------------------------------------------

Here is the command that I used:
hltGetConfiguration /dev/CMSSW_12_3_0/GRun --mc --full --unprescale --globaltag auto:phase1_2021_realistic --process MYHLT --output minimal --l1Xml L1Menu_Collisions2022_v0_1_2.xml --l1-emulator uGT --input root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_1/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/120X_mcRun3_2021_realistic_v7-v2/10000/b67e121b-b29f-4eb0-8628-b3aa1cb76720.root >& hlt_uGT.py

Does anyone have an idea of the reason?
Thank you very much in advance,
--Elisa

@cmsbuild
Contributor

A new Issue was created by @elfontan Elisa Fontanesi.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@francescobrivio
Contributor

@bsunanda @cms-sw/geometry-l2 RPC geometry was changed in pre4, could this be the cause of this issue?

@bsunanda
Contributor

Could you specify which workflow sees this exception? Please also include the RPC experts in this thread.

@francescobrivio
Contributor

Hi @bsunanda! The command to reproduce this is given by Elisa in the description:

hltGetConfiguration /dev/CMSSW_12_3_0/GRun --mc --full --unprescale --globaltag auto:phase1_2021_realistic --process MYHLT --output minimal --l1Xml L1Menu_Collisions2022_v0_1_2.xml --l1-emulator uGT --input root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_1/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/120X_mcRun3_2021_realistic_v7-v2/10000/b67e121b-b29f-4eb0-8628-b3aa1cb76720.root >& hlt_uGT.py

@francescobrivio
Contributor

FYI @cms-sw/rpc-dpg-l2

@perrotta
Contributor

assign hlt
(hltGetConfiguration is a tool used by TSG)

I suspect that the issue may come from running hltGetConfiguration in CMSSW_12_3_0_pre4 on an input RelVal file that was produced in CMSSW_12_0_1, i.e. with the previous (RPC) geometry.

@cmsbuild
Contributor

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@missirol
Contributor

missirol commented Jan 24, 2022

Thanks for reporting, Elisa.

I have tried to check a few things, and here's my current understanding.

  • The error is not specific to the fact that an old input file was used in the original report; even if I create a RAW file in 12_3_0_pre4 from scratch, and run the problematic command on that, I get the same error.

  • I ran the same type of test in 12_3_0_pre3 and 12_3_0_pre4, in both cases with GT auto:phase1_2021_realistic (so, different GTs), and I see the problem only in 12_3_0_pre4.

  • The error is not there if one removes the requirement of re-emulating the L1T menu from the original hltGetConfiguration command; removing the specification of the xml file does not remove the error, so the error doesn't seem to be specific to that particular L1T menu.

  • The error comes from RPCConeBuilder, which is an ESProducer used for the L1Trigger.

  • I tried to run the L1T re-emulation standalone with cmsDriver (so without the HLT step), and I could not reproduce the error (but my expertise in L1T configs is limited); this would suggest to me that the ESProducer is not actually called in the L1T re-emulation, at least for MC (thus, no crash), but I would need L1T experts to check this. Is this ESProducer needed anywhere?

  • In my understanding, the reason the crash might occur only in conjunction with HLT is that HLT pre-fetches all ES data via the module hltGetConditions; in that case, the problematic ES producer would run even though no module would actually consume its products (note: assuming this is so, the ES module is unnecessarily introduced by the L1T emulation, not by the HLT menu); this is confirmed by the fact that, if I manually remove the module hltGetConditions, the problem disappears.

  • If I run the problematic command in 12_3_0_pre4 using 123X_mcRun3_2021_realistic_v3 instead of 123X_mcRun3_2021_realistic_v4 (i.e. auto:phase1_2021_realistic), the problem disappears [for this, I also needed to use --output none, instead of --output minimal, otherwise I run into a segmentation-fault caused by another L1T module, something which I haven't debugged]. As mentioned by Francesco, the diff of the 2 GTs includes a change related to RPC.

So, I think HLT exposes the problem because of the use of hltGetConditions, but besides that, the issue appears to be related to the RPC geometry, and the L1T re-emulation.

Experts are free to correct any of the above, and can hopefully provide further feedback.
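As a rough illustration of the prefetching argument above, here is a toy sketch in plain Python. It is not CMSSW code: `ToyEventSetup`, `broken_cone_builder` and the record names are hypothetical, and the only thing taken from this issue is the exception message. It shows why a producer that nobody consumes can still abort a job once everything is prefetched.

```python
# Toy model only: these classes and names are hypothetical, not CMSSW APIs.
class ToyEventSetup:
    def __init__(self):
        self._producers = {}  # record name -> callable that builds the data
        self._cache = {}      # record name -> already built data

    def register(self, name, producer):
        self._producers[name] = producer

    def get(self, name):
        """Lazy access: run a producer only when its product is requested."""
        if name not in self._cache:
            self._cache[name] = self._producers[name]()
        return self._cache[name]

    def prefetch_all(self):
        """Eager access: run every registered producer, consumed or not."""
        for name in self._producers:
            self.get(name)


def broken_cone_builder():
    # Stand-in for a producer that fails on the new geometry.
    raise RuntimeError("Size differs for ring 3014 +- 100")


def make_setup():
    es = ToyEventSetup()
    es.register("menu", lambda: "L1 menu")
    es.register("cones", broken_cone_builder)  # registered but never consumed
    return es


# Lazy: only "menu" is requested, so the broken producer never runs.
lazy_result = make_setup().get("menu")

# Eager: prefetching everything also runs the broken producer and fails.
try:
    make_setup().prefetch_all()
    eager_error = None
except RuntimeError as err:
    eager_error = str(err)
```

In this toy, the lazy path corresponds to the standalone L1T re-emulation (no crash), and the eager path to a configuration that prefetches all conditions.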

@makortel
Contributor

assign geometry,l1

@cmsbuild
Contributor

New categories assigned: geometry,l1

@cvuosalo,@mdhildreth,@epalencia,@ianna,@Dr15Jones,@rekovic,@makortel,@cecilecaillol,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@perrotta
Contributor

Thank you @missirol for the detailed investigation.
This has to be fully understood before we can fully backport to 12_2.

@perrotta
Contributor

urgent

@cvuosalo
Contributor

In CMSSW_12_3_0_pre4, I tried the command, but no error occurs.

hltGetConfiguration /dev/CMSSW_12_3_0/GRun --mc --full --unprescale --globaltag auto:phase1_2021_realistic --process MYHLT --output minimal --l1Xml L1Menu_Collisions2022_v0_1_2.xml --l1-emulator uGT --input root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_1/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/120X_mcRun3_2021_realistic_v7-v2/10000/b67e121b-b29f-4eb0-8628-b3aa1cb76720.root >& hlt_uGT.py

How can the error be reproduced?

@Martin-Grunewald
Contributor

Martin-Grunewald commented Jan 25, 2022

You would need to run the extracted python file:
cmsRun hlt_uGT.py

However, the L1T xml file is not in the release itself:

----- Begin Fatal Exception 25-Jan-2022 06:36:34 CET-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESProducer: class=L1TUtmTriggerMenuESProducer label='L1TriggerMenu'
Exception Message:
edm::FileInPath unable to find file L1Trigger/L1TGlobal/data/Luminosity/startup/L1Menu_Collisions2022_v0_1_2.xml anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /scratch/CMS3/12/CMSSW_12_3_0_pre4/poison:/scratch/CMS3/12/CMSSW_12_3_0_pre4/src:/scratch/CMS3/12/CMSSW_12_3_0_pre4/external/slc7_amd64_gcc10/data:/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/src:/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/external/slc7_amd64_gcc10/data
Current directory is: /scratch/CMS3/12/CMSSW_12_3_0_pre4/src/HLTrigger/Configuration/test
----- End Fatal Exception -------------------------------------------------

@elfontan Please post complete instructions to reproduce!

@elfontan
Contributor Author

elfontan commented Jan 25, 2022

Hi @Martin-Grunewald, sure, and sorry; here are the instructions to use the new menu:

cmsrel CMSSW_12_3_0_pre4
cd CMSSW_12_3_0_pre4/src
cmsenv
git cms-init
git-cms-addpkg L1Trigger/L1TGlobal
mkdir -p L1Trigger/L1TGlobal/data/Luminosity/startup
cd L1Trigger/L1TGlobal/data/Luminosity/startup
wget https://raw.githubusercontent.com/cms-l1-dpg/L1MenuRun3/master/development/L1Menu_Collisions2022_v0_1_2/L1Menu_Collisions2022_v0_1_2.xml
cd -
git-cms-addpkg L1Trigger/Configuration

Edit the file L1Trigger/Configuration/python/customiseUtils.py to change the L1TriggerMenuFile in process.TriggerMenu.L1TriggerMenuFile

scram b -j 8

Thank you for the help,
--Elisa

@Martin-Grunewald
Contributor

I assume in the above you mean git-cms-addpkg L1Trigger/Configuration?

@elfontan
Contributor Author

Yes!

@civanch
Contributor

civanch commented Jan 25, 2022

Hi all, I am not an expert, but the code contains a check on whether the geometry has a complete ring. In the case of the newly added demo chambers, the ring is not filled: only 2 sectors are added.

Maybe these lines of code should be commented out?

@perrotta
Contributor

Hi all, I am not an expert, but the code contains a check on whether the geometry has a complete ring. In the case of the newly added demo chambers, the ring is not filled: only 2 sectors are added.

Maybe these lines of code should be commented out?

Thank you @civanch , I think it makes sense.
Would it be possible to disable that check only for the demo chambers, instead?

@civanch
Contributor

civanch commented Jan 25, 2022

@elfontan, @missirol, can you please comment out these lines with the check and exception (they are at the end of the file) and repeat your test? We cannot exclude that another crash will occur somewhere else.

@elfontan
Contributor Author

Hi @civanch,
thank you for your comment!
I confirm that, with these lines commented out, the code runs correctly.

@civanch
Contributor

civanch commented Jan 25, 2022

Great! Who will make the PR with the fix?

@bsunanda
Contributor

bsunanda commented Jan 25, 2022 via email

@elfontan
Contributor Author

I can prepare the PR, sure.
But for my understanding: does it mean that this check is not useful anymore? Is there a specific reason for this?
Thank you,
--Elisa

@perrotta
Contributor

I can prepare the PR, sure. But for my understanding: does it mean that this check is not useful anymore? Is there a specific reason for this? Thank you, --Elisa

I had the same question above.
Could you please get in touch with the RPC experts and arrange the fix with them?

@civanch
Contributor

civanch commented Jan 25, 2022

In this somewhat stressful situation, I would not remove these lines but comment them out, and add a comment in the code with a reference to this issue. The code authors may improve the check later if needed.

@mileva
Contributor

mileva commented Jan 25, 2022

Hi all. I agree with @civanch. Meanwhile, we can try to improve the check. Personally, I don't know where this producer is used, and I need some time to test. @kbunkow, do you know if this producer is still needed? Roumyana

@cvuosalo
Contributor

I tested by changing line 135 of RPCConeBuilder.cc from

if (key != 2000) {  // Hey 2100 has no counterring

to

if (key != 2000 && key != 3014 && key != 4014) {

This change prevents the exception, but then the program encounters a segmentation violation soon after this section of the code. I think 3014 and 4014 lack counter-rings.
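To make the effect of the extended condition concrete, here is a toy sketch in Python. It is not the logic of RPCConeBuilder.cc: the data layout, the `counter_ring` mapping, the function name, and the ring keys 2001 and 2101 are all hypothetical. The only elements taken from this thread are the exempted keys (2000, 3014, 4014), the "2 sectors" of the demo chambers, and the wording of the error.

```python
# Toy model only: data layout and names are hypothetical, not RPCConeBuilder's.
EXEMPT_KEYS = {2000, 3014, 4014}  # keys skipped by the modified condition


def check_rings(ring_sizes, counter_ring, exempt=EXEMPT_KEYS):
    """Raise if a non-exempt ring and its counter-ring differ in size."""
    for key, size in ring_sizes.items():
        if key in exempt:
            continue  # no counter-ring expected, so nothing to compare
        other = counter_ring.get(key)
        if other is None or ring_sizes.get(other) != size:
            raise ValueError(f"Size differs for ring {key}")


# Hypothetical geometry: two full rings that mirror each other, one ring
# without a counter-ring, and a partially filled demo ring with 2 sectors.
ring_sizes = {2000: 12, 2001: 12, 2101: 12, 3014: 2}
counter_ring = {2001: 2101, 2101: 2001}

check_rings(ring_sizes, counter_ring)  # passes: 3014 is exempt

try:
    check_rings(ring_sizes, counter_ring, exempt={2000})  # original condition
    old_check_error = None
except ValueError as err:
    old_check_error = str(err)
```

With only 2000 exempted, the toy check fails on key 3014, which mirrors the exception reported at the top of this issue.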

@perrotta
Contributor

I tested by changing line 135 of RPCConeBuilder.cc from

if (key != 2000) {  // Hey 2100 has no counterring

to

if (key != 2000 && key != 3014 && key != 4014) {

This change prevents the exception, but then the program encounters a segmentation violation soon after this section of the code. I think 3014 and 4014 lack counter-rings.

I'm puzzled. If we comment out the check as a whole, the keys 3014 and 4014 pass as well, but they apparently do not cause a segmentation violation afterwards, see #36786 (comment)

Can anybody explain? Carl, are you getting the same segmentation violation if you simply comment out the whole check?

@kbunkow
Contributor

kbunkow commented Jan 26, 2022

Hi all,

RPCConeBuilder.cc is part of L1Trigger/RPCTrigger, the emulator of the Run 1 RPC PAC Trigger, which has not been used in the L1 Trigger decision since 2016.
Maybe I am wrong, but I cannot see a reason to run this emulator for the Run 2 and Run 3 data. It would be needed for re-emulation of the Run 1 data, but - if such re-emulation is needed at all - is it done with the latest CMSSW version?

Adding @konec just in case.

@missirol
Contributor

Can anybody explain?

Maybe. As noted in #36786 (comment) [*], if I use the command in the original bug report, and work around the RPC problem with either an older GT or commenting out the exception, I do see a segmentation fault, but coming from a different L1T producer [**]. At first glance, this looks unrelated to the RPC issue at hand, but that's not to say it should not be debugged (I'm assuming this requires a look by L1T experts). I didn't catch in Carl's message if he thinks that 'his' segmentation violation comes from RPCConeBuilder, or from somewhere else.

[*]

for this, I also needed to use --output none, instead of --output minimal, otherwise I run into a segmentation-fault caused by another L1T module, something which I haven't debugged

[**]

Pseudo-recipe:

# in 12_3_0_pre4, after commenting out the RPCConeBuilder exception
hltGetConfiguration /dev/CMSSW_12_3_0/GRun --mc --full --unprescale --globaltag 123X_mcRun3_2021_realistic_v4 \
 --process HLTX --output minimal --l1-emulator uGT --max-events 1 \
 --input root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_1/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/120X_mcRun3_2021_realistic_v7-v2/10000/b67e121b-b29f-4eb0-8628-b3aa1cb76720.root \
  > tmp.py && cmsRun tmp.py &> tmp.log

Stack trace attached. Some maybe-relevant parts:

%MSG-e L1TExtCondProducer:  L1TExtCondProducer:simGtExtFakeStage2Digis  26-Jan-2022 09:36:05 CET Run: 1 Event: 1
Unexpectedly small L1A history from TCDSRecord
%MSG

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

[..]

Thread 9 (Thread 0x7efb9bbb4700 (LWP 50188) "cmsRun"):
#0  0x00007efbfc2f2ddd in poll () from /lib64/libc.so.6
#1  0x00007efbf2d9d10f in full_read.constprop () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007efbf2d9da9c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007efbf2da03eb in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007efbc4e00da9 in L1TExtCondProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginL1TriggerL1TGlobalPlugins.so
#6  0x00007efbfed1ac63 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/libFWCoreFramework.so

[..]

Thread 8 (Thread 0x7efb9c5b5700 (LWP 50187) "cmsRun"):
[..]
#16 0x00007efbc4e698cb in l1t::TriggerMenuParser::parseCondFormats(L1TUtmTriggerMenu const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginL1TriggerL1TGlobalPlugins.so                                                                                                                                                                         
#17 0x00007efbc4e35be6 in L1TGlobalProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre4/lib/slc7_amd64_gcc10/pluginL1TriggerL1TGlobalPlugins.so                                                                                                                                                                            

[..]

Current Modules:

Module: L1TExtCondProducer:simGtExtFakeStage2Digis (crashed)
Module: L1TGlobalProducer:simGtStage2Digis
Module: MaskedMeasurementTrackerEventProducer:hltTripletRecoveryMaskedMeasurementTrackerEventDisplacedNRReg
Module: HLTPrescaler:hltPrePFMET120PFMHT120IDTightPFHT60

A fatal system signal has occurred: segmentation violation

@elfontan
Contributor Author

Hi @missirol , all,
thank you for all comments!
I confirm that @cvuosalo's suggestion works and that the segmentation fault after the fix is NOT related to RPCConeBuilder anymore.
It is related to another issue discussed in parallel, where the work-around mentioned by Marino (to use --output none instead of --output minimal in the hltGetConfiguration command) works [1].

I would then suggest introducing Carl's fix instead of simply commenting out the check (and I would include Karol's explanation as a description of the module). Let me know what you think. If you agree, I can update the PR soon after.

Cheers,
--Elisa

[1]
hltGetConfiguration /dev/CMSSW_12_3_0/GRun --mc --full --unprescale --globaltag auto:phase1_2021_realistic --process MYHLT --output none --l1Xml L1Menu_Collisions2022_v0_1_2.xml --l1-emulator uGT --input root://xrootd-cms.infn.it//store/relval/CMSSW_12_3_0_pre2/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/122X_mcRun3_2021_realistic_v5-v1/2580000/04b2f8e0-57d6-449c-aa9f-a416c9926f31.root >& hlt_uGT_outputNone.py

@cvuosalo
Contributor

+1

@missirol
Contributor

+hlt

#36803 fixed the problem at hand, but based on comments in this issue

  • the RPCConeBuilder ESProducer belongs to the pre-2016 L1T, so (I dare say) it should not be loaded in the configuration when one tries to run the L1T re-emulation for Run 3;

  • the segmentation violation coming from L1TExtCondProducer requires a separate follow-up (I think Elisa is already in contact with the relevant L1T experts; thanks!).

@missirol
Contributor

Just for completeness, regarding the crash from L1TExtCondProducer (see #36786 (comment)):

  • workaround: it was noted elsewhere that, for the particular use case at the top of this issue, adding the following line to the output of hltGetConfiguration avoids the seg-fault:

    process.hltOutputMinimal.outputCommands.append('drop GlobalExtBlkBXVector_simGtExtFakeStage2Digis_*_*')

    (so it looks like the execution of the problematic producer is triggered by the output module)

  • the problem only appears with MC, not with Data;

  • the crash occurs here, because tcdsRecord.getFullL1aHistory() (and eventHistory) are empty; this suggests that this LogError call should at least be changed to throwing an exception.
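The last point can be illustrated with a toy Python sketch. It is not the L1TExtCondProducer code: both function names and the list-based "history" are hypothetical, and Python raises an `IndexError` where the C++ producer crashes with a segmentation violation. The sketch contrasts logging an error and carrying on with failing early on an empty history.

```python
# Toy model only: function names and the list-based history are hypothetical.
import logging

log = logging.getLogger("toy_ext_cond")


def latest_l1a_log_only(history):
    """Log the problem, then access the history anyway."""
    if len(history) == 0:
        log.error("Unexpectedly small L1A history from TCDSRecord")
    return history[0]  # still executed on an empty history


def latest_l1a_strict(history):
    """Stop with a descriptive exception before touching the history."""
    if len(history) == 0:
        raise ValueError("Unexpectedly small L1A history from TCDSRecord")
    return history[0]
```

With an empty list, the first variant logs the message and then fails on the access itself, while the second stops with an error that names the actual cause.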

@perrotta
Contributor

perrotta commented Feb 4, 2022

@missirol can you confirm that after the merging of #36803 and #36839 this issue can be considered as fully solved?

@missirol
Contributor

missirol commented Feb 4, 2022

For completeness: a fix for the seg-fault discussed in #36786 (comment) and #36786 (comment) was provided by @elfontan in #36839.

@perrotta Yes, this issue is resolved (Elisa should correct me if I'm wrong). Both problems came from L1T and Elisa provided fixes for those.

I still find it sub-optimal that the L1T re-emulation for Run 3 brings in L1T ESProducers from Run 1 (see #36786 (comment)), but this goes beyond this issue.

@missirol
Contributor

missirol commented Feb 9, 2022

@elfontan @perrotta Unless you think differently, this issue could be closed.

@elfontan
Contributor Author

elfontan commented Feb 9, 2022

Thank you @missirol, I agree.

@perrotta
Contributor

perrotta commented Feb 9, 2022

Thank you @missirol, I agree.

@perrotta perrotta closed this as completed Feb 9, 2022