Crash in DD4hep geometry, step2 #36837

Closed
fabiocos opened this issue Jan 30, 2022 · 38 comments

Comments

@fabiocos
Contributor

As reported by @bsunanda at the latest SIM meeting, DD4hep geometries succeed in step1 but fail in Phase-2 step2; see geometry D77, wf 35034.911:

PersistencyIO    INFO  +++ Set Streamer to dd4hep::OpaqueDataBlock
DD4hep           WARN  ++ Using globally Geant4 unit system (mm,ns,MeV)
CompactLoader    INFO  +++ Processing compact file: /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-01-27-2300/src/Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D77.xml with flag (null)
DD4CMS           INFO  +++ Processing the CMS detector description file:///gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-01-27-2300/src/Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D77.xml
Detector         INFO  *********** Created World volume with size: 101000 101000 450000
PlacedVolume     INFO  REFLECTION: (x.Cross(y)).Dot(z):       -1 Parent: eregalgo:EFAW [TGeoVolume] Daughter: eregalgo:EHAWR [TGeoVolumeAssembly]
Detector         INFO  +++ Patching names of anonymous shapes....
DDDefinition     INFO  +++ Finished processing file:///gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-01-27-2300/src/Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D77.xml
DD4hep           WARN  ++ Using globally Geant4 unit system (mm,ns,MeV)
DD4CMS           INFO  +++ Processing the CMS detector description xml-memory-buffer
Detector         INFO  *********** Created World volume with size: 101000 101000 450000
Detector         INFO  +++ Patching names of anonymous shapes....
DDDefinition     INFO  +++ Finished processing xml-memory-buffer
----- Begin Fatal Exception 28-Jan-2022 19:05:20 CET-----------------------
An exception of category 'InvalidReference' occurred while
   [0] Processing global begin Run run: 1
   [1] Prefetching for module L1FPGATrackProducer/'TTTracksFromExtendedTrackletEmulation'
   [2] Calling method for EventSetup module trackerDTC::ProducerES/'TrackTriggerSetup'
Exception Message:
NullPointer 
----- End Fatal Exception -------------------------------------------------
@cmsbuild
Contributor

A new Issue was created by @fabiocos Fabio Cossutti.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fabiocos
Contributor Author

assign geometry

@cmsbuild
Contributor

New categories assigned: geometry

@cvuosalo,@mdhildreth,@ianna,@Dr15Jones,@makortel,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@srimanob
Contributor

assign upgrade

@cmsbuild
Contributor

New categories assigned: upgrade

@AdrianoDee,@srimanob you have been requested to review this Pull request/Issue and eventually sign? Thanks

@srimanob
Contributor

srimanob commented Jan 31, 2022

Hi @fabiocos @bsunanda

This issue seems a bit strange: I had not seen it before, although I have been reporting on this since last October. I now have additional clues on step 2 (DIGI):

  • I see the issue that you reported when I run the job with 1 core. With 8 cores, I see a different issue (the same one I reported last October (*)). In a test with the DT alignment turned off, I now get the same error as mentioned in this issue.
  • I can pass the DIGI step if I use the DDD sequence instead (i.e. taken from wf 35034.0; both 1 and 8 cores are fine). I get the MSG-e shown in (**), but the job runs to the end.

One can try with the following commands.
For 1 core:
runTheMatrix.py --what upgrade -l 35034.911,35034.0 --wm init
For 8 cores:
runTheMatrix.py --what upgrade -l 35034.911,35034.0 -t 8 --wm init

(*)
----- Begin Fatal Exception 12-Oct-2021 10:19:31 CEST-----------------------
An exception of category 'GeometryMismatch' occurred while
[0] Processing global begin Run run: 1
[1] Prefetching for module L1TMuonOverlapPhase1TrackProducer/'simOmtfDigis'
[2] Calling method for EventSetup module DTGeometryESModule/''
Exception Message:
Size mismatch between geometry (size=2780) and alignments (size=3650)
----- End Fatal Exception -------------------------------------------------
12-Oct-2021 10:19:31 CEST Closed file file:step1.root

(**)
%MSG-e HcalDigitizer: MixingModule:mix 31-Jan-2022 19:23:43 CET Run: 1 Event: 2
bad hcal id found in digitizer. Skipping 1161838611 (HE -16,19,4)
%MSG

@srimanob
Contributor

srimanob commented Feb 1, 2022

Hi,
I think I found the source of the issue by comparing the DDD and DD4hep configurations (*), which are available in
/afs/cern.ch/user/s/srimanob/public/ForGeometry/L1Track

  • L1Track_DDD_2026D77.py: output of cmsDriver
  • L1Track_DD4hep_2026D77.py: output of cmsDriver
  • L1Track_DD4hep_2026D77_mod.py: cmsDriver output, manually updated

Both out-of-the-box configs come with a call to XMLIdealGeometryESSource in TrackTriggerSetup. The config is at https://github.com/cms-sw/cmssw/blob/master/L1Trigger/TrackerDTC/python/ProducerES_cfi.py#L13-L17

    ProcessHistory = cms.PSet(
        GeometryConfiguration = cms.string('XMLIdealGeometryESSource@'),
        TTStubAlgorithm = cms.string('TTStubAlgorithm_official_Phase2TrackerDigi_@')
    ),

But only the DDD config contains the following ESSource:

process.XMLIdealGeometryESSource = cms.ESSource("XMLIdealGeometryESSource",
    geomXMLFiles = cms.vstring(
        'Geometry/CMSCommonData/data/materials/2021/v1/materials.xml',
        'Geometry/CMSCommonData/data/rotations.xml',
        'Geometry/CMSCommonData/data/extend/v2/cmsextent.xml',
        'Geometry/CMSCommonData/data/cavernData/2021/v1/cavernData.xml',
        'Geometry/CMSCommonData/data/cms/2026/v5/cms.xml',
        'Geometry/CMSCommonData/data/cmsMother.xml',

For the DD4hep case, there is instead:

process.DDDetectorESProducer = cms.ESSource("DDDetectorESProducer",
     appendToDataLabel = cms.string(''),
     confGeomXMLFiles = cms.FileInPath('Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D77.xml')
)

I think that for the DD4hep case there should be a modifier somewhere, so that this ideal geometry source is added to the config.
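The mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use CMSSW: plain Python sets stand in for the module labels of the two dumped configs, and the helper names `referenced_label` and `missing_labels` are hypothetical. It also assumes that the string in `ProcessHistory` is a module label followed by `@`, which is only a guess based on the snippet quoted above.

```python
def referenced_label(config_string):
    """Strip the trailing '@' from a string such as 'XMLIdealGeometryESSource@'."""
    return config_string.rstrip("@")


def missing_labels(references, defined_labels):
    """Return the referenced labels that have no matching definition."""
    wanted = {referenced_label(ref) for ref in references}
    return sorted(wanted - set(defined_labels))


# Toy stand-ins for the labels defined in each dumped config (not the real dumps).
ddd_labels = {"XMLIdealGeometryESSource", "TrackTriggerSetup"}
dd4hep_labels = {"DDDetectorESProducer", "TrackTriggerSetup"}

# The reference taken from the ProcessHistory PSet quoted above.
references = ["XMLIdealGeometryESSource@"]

print("DDD missing:", missing_labels(references, ddd_labels))
print("DD4hep missing:", missing_labels(references, dd4hep_labels))
```

In this toy model only the DD4hep label set fails to provide the referenced source, which mirrors the difference seen between the two configs.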

With the manual update (L1Track_DD4hep_2026D77_mod.py), the job gets past this point. It then crashes with
Module: DTDigitizer:simMuonDTDigis (crashed)

FYI @cvuosalo @cms-sw/l1-l2 @bsunanda
Not sure who will fix the L1Track config part.
Do you want to keep following the simMuonDTDigis crash in this issue?

(*)
Note on the cmsDriver commands I use: they are a shortened version of step2 (stopping at L1TrackTrigger).
cmsDriver.py step2 -s DIGI:pdigi_valid,L1TrackTrigger --conditions auto:phase2_realistic_T21 --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry DD4hepExtended2026D77 --era Phase2C11I13M9 --procModifiers dd4hep --python L1Track_DD4hep_2026D77.py --no_exec --filein file:step1.root --fileout file:step2_DD4hep.root --dump_python

cmsDriver.py step2 -s DIGI:pdigi_valid,L1TrackTrigger --conditions auto:phase2_realistic_T21 --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry Extended2026D77 --era Phase2C11I13M9 --python L1Track_DDD_2026D77.py --no_exec --filein file:step1.root --fileout file:step2_DDD.root --dump_python

@srimanob
Contributor

srimanob commented Feb 1, 2022

By the way, this is the stack trace from the crash in DTDigitizer:simMuonDTDigis:

Thread 2 (Thread 0x7f2f80dac700 (LWP 25960) "cmsRun"):
#0  0x00007f2fabfa31d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f2f9ce5be07 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007f2f9ce5c97a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007f2fac59af90 in std::execute_native_thread_routine (__p=0x7f2f9ebae5e0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007f2fabf9bea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f2fabcc4b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f2fa9e19540 (LWP 25939) "cmsRun"):
#0  0x00007f2fabcb9ddd in poll () from /lib64/libc.so.6
#1  0x00007f2f9ce5c0bf in full_read.constprop () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007f2f9ce5ca4c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007f2f9ce5f39b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f2f213b884e in DTDigitizer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/pluginSimMuonDTDigitizer.so
#6  0x00007f2fae730573 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#7  0x00007f2fae71988f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#8  0x00007f2fae674c65 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#9  0x00007f2fae674f5b in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007f2fae677545 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007f2fae86f7d5 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so
#12 0x00007f2facdc1b8c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7f2fa874a500, this=0x7f2fa873fe00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#13 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f2fa873fe00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#14 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#15 0x00007f2fae5e5908 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#16 0x00007f2fae5f063b in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/nweek-02718/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-01-30-0000/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#17 0x000000000040a266 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007f2facdb015b in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/arena.cpp:698
#19 0x000000000040b094 in main::{lambda()#1}::operator()() const ()
#20 0x000000000040971c in main ()

Current Modules:

Module: DTDigitizer:simMuonDTDigis (crashed)

FYI @cms-sw/dt-dpg-l2

@srimanob
Contributor

srimanob commented Feb 7, 2022

Hi @perrotta @qliphy @civanch @makortel @cvuosalo @bsunanda

I've listed the issues that I found in the DD4hep Phase-2 workflow in
https://docs.google.com/document/d/1es0C2gH8KVt87iDoPRpdq8VVDjuEKtdd4kPjTsfW_RU/edit?usp=sharing

How do you want me to handle the reporting in GitHub issues? Should we move them one by one into separate issues, or do something else? Since some crashes appear only after my manual fixes for other issues, it will be a bit complicated to explain them ticket by ticket.

Thanks for your advice.

@perrotta
Contributor

perrotta commented Feb 7, 2022

@srimanob thank you for your investigations.
All the issues are either correlated or, in any case, affect the same steps and workflows. I think that a cumulative ticket would be more appropriate to keep track of them, with the different issues representing possible milestones towards the solution of the whole problem.

@fabiocos
Contributor Author

Following the discussion at the last simulation meeting, I checked scenario D88 for MTD only (see https://github.com/fabiocos/cmssw/tree/fc-testmtdD88 ) and I confirm there is a crash in the reconstruction geometry of MTD:

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 14-Feb-2022 10:15:53.539 CET
PersistencyIO    INFO  +++ Set Streamer to dd4hep::OpaqueDataBlock
DD4hep           WARN  ++ Using globally Geant4 unit system (mm,ns,MeV)
CompactLoader    INFO  +++ Processing compact file: /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-02-11-1100/src/Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D88.xml with flag (null)
DD4CMS           INFO  +++ Processing the CMS detector description file:///cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-02-11-1100/src/Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2026D88.xml
Detector         INFO  *********** Created World volume with size: 101000 101000 450000
PlacedVolume     INFO  REFLECTION: (x.Cross(y)).Dot(z):       -1 Parent: eregalgo:EFAW [TGeoVolume] Daughter: eregalgo:EHAWR [TGeoVolumeAssembly]
Detector         INFO  +++ Patching names of anonymous shapes....


A fatal system signal has occurred: bus error
The following is the call stack containing the origin of the signal.

Mon Feb 14 10:16:01 CET 2022
Thread 2 (Thread 0x7fa2ddab6700 (LWP 22715) "cmsRun"):
#0  0x00007fa3061cb1d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007fa2f8e5ce07 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007fa2f8e5d97a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007fa3067c3f90 in std::execute_native_thread_routine (__p=0x7fa2e8972540) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007fa3061c3ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fa305eecb0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fa30403f540 (LWP 22518) "cmsRun"):
#0  0x00007fa305ee1ddd in poll () from /lib64/libc.so.6
#1  0x00007fa2f8e5d0bf in full_read.constprop () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007fa2f8e5da4c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007fa2f8e6039b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fa2dcb99900 in cms::DDFilteredView::parameters() const () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/libDetectorDescriptionDDCMS.so
#6  0x00007fa2dac8427b in GeometricTimingDet::GeometricTimingDet(cms::DDFilteredView*, GeometricTimingDet::GTDEnumType) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/libGeometryMTDNumberingBuilder.so
#7  0x00007fa2da51eae0 in CmsMTDConstruction<cms::DDFilteredView>::buildSubdet(cms::DDFilteredView&) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#8  0x00007fa2da52026d in DDCmsMTDConstruction::construct(cms::DDCompactView const&) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#9  0x00007fa2da5245a3 in MTDGeometricTimingDetESModule::produce(IdealGeometryRecord const&) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#10 0x00007fa2da52ac79 in edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#11 0x00007fa2da52b336 in decltype ({parm#1}()) edm::convertException::wrap<edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const::{lambda()#1}>(edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const::{lambda()#1}) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#12 0x00007fa2da52b454 in edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}::operator()() const () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#13 0x00007fa2da52c44f in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&>(edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&) () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#14 0x00007fa2da52c4c1 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::eventsetup::Callback<MTDGeometricTimingDetESModule, std::unique_ptr<GeometricTimingDet, std::default_delete<GeometricTimingDet> >, IdealGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<IdealGeometryRecord> >::runProducerAsync(tbb::detail::d1::task_group*, std::__exception_ptr::exception_ptr const*, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /gpfs/cms/users/cossutti/Timing/geometry/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/pluginGeometryMTDNumberingBuilderPlugins.so
#15 0x00007fa308a8f075 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02719/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-02-10-1100/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so
#16 0x00007fa306feeb8c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7fa302a9f300, this=0x7fa302acb700) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#17 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fa302acb700) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#18 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#19 0x00007fa30880a988 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#20 0x00007fa3088157db in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-02-11-1100/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#21 0x000000000040a266 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#22 0x00007fa306fdd15b in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.4.0-75e6d730601d8461f20893321f4f7660/tbb-v2021.4.0/src/tbb/arena.cpp:698
#23 0x000000000040b094 in main::{lambda()#1}::operator()() const ()
#24 0x000000000040971c in main ()

This remains to be understood and fixed.

@srimanob
Contributor

I've also pinged @cms-sw/dt-dpg-l2 @cms-sw/l1-l2 about the issue in the DIGI-L1_HLT step.
Thanks to @fabiocos @cms-sw/mtd-dpg-l2 for following up on the MTD issue in the RECO step.

@fabiocos
Contributor Author

I believe I have understood the cause of the crash in the dd4hep version of MTD D88, but I am not sure about the way out. In D88 ETL unfortunately needs to use boolean solids, starting with the main mother volume EndcapTimingLayer, see https://github.com/cms-sw/cmssw/blob/master/Geometry/MTDCommonData/data/etl/v7/etl.xml#L184 . When building the reconstruction geometry, a GeometricTimingDet is created, retrieving the shape parameters, see https://github.com/cms-sw/cmssw/blob/master/Geometry/MTDNumberingBuilder/src/GeometricTimingDet.cc#L99

When a boolean solid is found, this uses an obscure (at least for me) logic within our DD4hep interface, see https://github.com/cms-sw/cmssw/blob/master/DetectorDescription/DDCMS/src/DDFilteredView.cc#L544

In the ETL case there is a boolean union of a boolean union of a polycone and of a trapezoid. With the implemented logic, the loop keeps descending into the left shape, treating it as a boolean composite, until the left shape is a box, which never happens for ETL. So when the left shape is a polycone and the static_cast forces it to be a TGeoCompositeShape, a crash happens because there is no protection against this.

This logic was implemented by @ianna some 3 years ago, and I am not sure to what extent it was tested in practice. In any case, there seems to be an implicit assumption that any boolean composition must collapse into the parameters of a box (unless I misinterpret it). Why?
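The failure mode described above can be reproduced in a toy model. The sketch below is plain Python, not ROOT or DD4hep: the classes `Box`, `Polycone`, `Trapezoid` and `Composite` and both helper functions are hypothetical stand-ins, and the exact nesting of the ETL union is an assumption. In C++ the unchecked cast leads to undefined behaviour (here a bus error), while in this model it shows up as an `AttributeError`.

```python
from dataclasses import dataclass


@dataclass
class Box:
    dx: float
    dy: float
    dz: float


@dataclass
class Polycone:
    rmax: float
    dz: float


@dataclass
class Trapezoid:
    dz: float


@dataclass
class Composite:
    left: object
    right: object


def leftmost_leaf_unchecked(shape):
    """Descend until the left shape is a box, assuming every other left shape is a composite."""
    node = shape
    while not isinstance(node.left, Box):
        node = node.left  # wrong as soon as node.left is a non-box leaf
    return node.left


def leftmost_leaf_checked(shape):
    """Descend only while the left shape really is a composite."""
    node = shape
    while isinstance(node.left, Composite):
        node = node.left
    return node.left


# Assumed ETL-like nesting: union(union(polycone, trapezoid), trapezoid).
etl_like = Composite(Composite(Polycone(1.0, 2.0), Trapezoid(3.0)), Trapezoid(3.0))

print(type(leftmost_leaf_checked(etl_like)).__name__)
try:
    leftmost_leaf_unchecked(etl_like)
except AttributeError as err:
    print("unchecked descent failed:", err)
```

The checked version stops at the first non-composite left shape, whatever its type, so the caller still has to decide which parameters to extract from a leaf that is not a box.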

@fabiocos
Contributor Author

fabiocos commented Feb 14, 2022

This modification to the DDFilteredView::parameters() method allows the code to run until the end providing a meaningful geometry dump (for this specific case):

diff --git a/DetectorDescription/DDCMS/src/DDFilteredView.cc b/DetectorDescription/DDCMS/src/DDFilteredView.cc
index d656338ca47..59852d50ea2 100644
--- a/DetectorDescription/DDCMS/src/DDFilteredView.cc
+++ b/DetectorDescription/DDCMS/src/DDFilteredView.cc
@@ -536,16 +536,30 @@ bool DDFilteredView::accept(std::string_view name) {
 const std::vector<double> DDFilteredView::parameters() const {
   assert(node_);
   Volume currVol = node_->GetVolume();
+  edm::LogInfo("DD4hep") << "Volume = " << currVol->GetName();
+  edm::LogInfo("DD4hep") << "Shape  = " << currVol->GetShape()->GetName();
   // Boolean shapes are a special case
   if (currVol->GetShape()->IsA() == TGeoCompositeShape::Class() and
       not dd4hep::isA<dd4hep::PseudoTrap>(currVol.solid())) {
     const TGeoCompositeShape* shape = static_cast<const TGeoCompositeShape*>(currVol->GetShape());
     const TGeoBoolNode* boolean = shape->GetBoolNode();
-    while (boolean->GetLeftShape()->IsA() != TGeoBBox::Class()) {
+    edm::LogInfo("DD4hep") << "Class  = " << boolean->GetLeftShape()->IsA();
+    edm::LogInfo("DD4hep") << "Comp   = " << TGeoCompositeShape::Class();
+    edm::LogInfo("DD4hep") << "Box    = " << TGeoBBox::Class();
+    edm::LogInfo("DD4hep") << "Pcon   = " << TGeoPcon::Class();
+    edm::LogInfo("DD4hep") << "Shape  = " << boolean->GetLeftShape()->GetName();
+    //while (boolean->GetLeftShape()->IsA() != TGeoBBox::Class()) {
+    while (boolean->GetLeftShape()->IsA() == TGeoCompositeShape::Class()) {
       boolean = static_cast<const TGeoCompositeShape*>(boolean->GetLeftShape())->GetBoolNode();
-    }
-    const TGeoBBox* box = static_cast<const TGeoBBox*>(boolean->GetLeftShape());
-    return {box->GetDX(), box->GetDY(), box->GetDZ()};
+      edm::LogInfo("DD4hep") << "lClass     = " << boolean->GetLeftShape()->IsA();
+      edm::LogInfo("DD4hep") << "lBoolNode  = " << boolean->GetName();
+      edm::LogInfo("DD4hep") << "lShape     = " << boolean->GetLeftShape()->GetName();
+    }
+    //const TGeoBBox* box = static_cast<const TGeoBBox*>(boolean->GetLeftShape());
+    //return {box->GetDX(), box->GetDY(), box->GetDZ()};
+    double boundCyl[4];
+    boolean->GetLeftShape()->GetBoundingCylinder(boundCyl);
+    return {boundCyl[0],boundCyl[1],boundCyl[2],boundCyl[3]};
   } else
     return currVol.solid().dimensions();
 }

I cannot say whether it makes any sense for other use cases; in ETL the boolean solids involve just the main envelope and passive materials (of course the printouts are there just for debugging purposes).

@fabiocos
Contributor Author

@cvuosalo @civanch any comment?

@civanch
Contributor

civanch commented Feb 14, 2022

@cvuosalo, @ianna, does the problem investigated by Fabio originate from CMSSW or from DD4hep? If the former, we need a fix; if the latter, we need to open a ticket for DD4hep asap.

@fabiocos
Contributor Author

@civanch well, the above code is in DetectorDescription/DDCMS, so it is definitely part of CMSSW. To what extent this is forced by DD4hep is for others to comment.

@ianna
Contributor

ianna commented Feb 14, 2022

@fabiocos - there used to be only one case of boolean shapes that needed that logic. I think one can add an ETL-specific case to retrieve the parameters, or generalise the logic there. The latter will be less performant. This applies if we do not want to use the DD4hep native parameters (which came later).

@civanch
Contributor

civanch commented Feb 14, 2022

@ianna, do I understand you correctly that DD4hep has limitations for union solids? If yes, is it a fundamental problem or simply one that has not been addressed?

@cvuosalo
Contributor

@fabiocos Is this problem an opportunity to remove the error-prone sequence of anonymous numbers called params_ from the MTD GeometricTimingDet? This code depends on all components obeying an implicit convention for the order these numbers are returned. It would be more reliable for each parameter to be named and returned by name, rather than relying on an implicit convention.
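
To illustrate the concern about positional parameters, here is a minimal sketch. All names in it (`BoxParams`, `halfZFromPositional`, `halfZFromNamed`) are hypothetical and do not exist in CMSSW; it only contrasts reading a value by index with reading it by name.

```cpp
#include <vector>

// Hypothetical illustration only: none of these names exist in CMSSW.

// Named alternative: each parameter carries its meaning in its name.
struct BoxParams {
  double halfX;
  double halfY;
  double halfZ;
};

// Positional convention: the caller must know that index 2 is the half-length in z.
// If the producer returns a different layout (for example four bounding-cylinder
// values), this still compiles and silently reads the wrong quantity.
inline double halfZFromPositional(const std::vector<double>& params) {
  return params.at(2);  // throws std::out_of_range if fewer than 3 values
}

// Named access: a layout change becomes a compile-time error, not a wrong number.
inline double halfZFromNamed(const BoxParams& params) { return params.halfZ; }
```

With a box layout `{1.0, 2.0, 3.0}` both accessors agree on 3.0. If the same vector instead holds four values in a different convention, the positional accessor returns whatever sits at index 2 without complaint, which is the failure mode described above.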

@fabiocos
Contributor Author

@ianna are you referring to the internal structure of PGeometricTimingDet and https://github.com/cms-sw/cmssw/blob/master/Geometry/MTDNumberingBuilder/src/GeometricTimingDet.cc#L148 ? To my knowledge this was brutally copied from the tracker code, but it is not actually used so far, and it will have to be reviewed when thinking about a persistent geometry model. The sensitive elements of MTD are always simple boxes. In any case I do not think I have the time now to redesign this part, although I agree it will have to be reviewed.

In practice at present what would do the job and unblock scenario D88 is to add a specific protection in the code above accounting for both the case you designed and this one.
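
A minimal sketch of what such a protection could look like, under stated assumptions: `Shape`, `Box` and `Tube` are hypothetical stand-ins for the ROOT shape classes, and this is not the code of any actual PR. The point is only that the type of the left operand is checked before its box parameters are read, instead of being assumed through an unconditional `static_cast`.

```cpp
#include <vector>

// Hypothetical stand-ins for a shape hierarchy; not ROOT or CMSSW classes.
struct Shape {
  virtual ~Shape() = default;
};

struct Box : Shape {
  Box(double halfX, double halfY, double halfZ) : dx(halfX), dy(halfY), dz(halfZ) {}
  double dx, dy, dz;
};

struct Tube : Shape {
  Tube(double innerR, double outerR, double halfZ) : rMin(innerR), rMax(outerR), dz(halfZ) {}
  double rMin, rMax, dz;
};

// Return the parameters of the left operand of a boolean solid.
// The dynamic type is tested first, so a non-box shape is never
// reinterpreted as a box.
std::vector<double> leftShapeParameters(const Shape& left) {
  if (const auto* box = dynamic_cast<const Box*>(&left)) {
    return {box->dx, box->dy, box->dz};  // the case the original logic was designed for
  }
  if (const auto* tube = dynamic_cast<const Tube*>(&left)) {
    return {tube->rMin, tube->rMax, tube->dz};  // an additional, explicitly handled case
  }
  return {};  // unknown shape: report nothing rather than guess
}
```

Returning an empty vector for unhandled shapes is one possible design choice; throwing an exception would make an unexpected shape fail loudly instead.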

@fabiocos
Contributor Author

sorry, I see that the question was by @cvuosalo ...

@fabiocos
Contributor Author

@ianna @cvuosalo @civanch #36970 proposes a possible solution to the problem found by @srimanob, which should allow us to move forward in catching other geometry problems in the DD4hep version of D88 step2.

@ianna
Contributor

ianna commented Feb 15, 2022

> @ianna @cvuosalo @civanch #36970 proposes a possible solution to the problem found by @srimanob, which should allow us to move forward in catching other geometry problems in the DD4hep version of D88 step2.

@fabiocos - the solution looks good to me. Thanks, @srimanob !

@srimanob
Contributor

Tested with @fabiocos: step 3 can now pass the MTD reco geometry construction. We move to the next issue. The Gdoc is updated.

@srimanob
Contributor

The fix on GEM RECO geometry construction comes in #36941

@cvuosalo
Contributor

The fix for MTD in #36970 looks good. Thanks @fabiocos. I'll create an issue for the possible re-design of Geometry/MTDNumberingBuilder/src/GeometricTimingDet.cc, as mentioned by Fabio: #36837 (comment)

@srimanob
Contributor

srimanob commented Feb 20, 2022

Following up on the DT issue (from email discussion with @cms-sw/dt-dpg-l2), we first focus on simMuonDTDigis.

The DT detIDs have not been built properly in Phase2 DD4hep. We got a total of 2780 (Chamber+SL+Layer) instead of 3650. Looking at Geometry/DTGeometryBuilder/src/DTGeometryBuilderFromDD4hep.cc, these lines

bool doSL = fview.nextSibling();
bool doLayers = fview.sibling();

in https://github.com/cms-sw/cmssw/blob/master/Geometry/DTGeometryBuilder/src/DTGeometryBuilderFromDD4hep.cc#L60-L65 do not return as expected. For example, doSL = FALSE for some chambers, so no super layers have been built for those chambers.

Considering that

  • the same class is used for Run-3, and DT detIDs returns properly in Run-3 wfs (=3650),
  • and the same XML used for DDD and DD4hep in Phase-2,

we need to debug more since the issue seems to appear only on Phase-2 DD4hep.
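
For reference, the expected total of 3650 can be reproduced with a short calculation. This is a sketch that assumes the usual DT layout, which is not spelled out in this thread: 5 wheels, 12 sectors with one chamber each in stations 1 to 3, 14 chambers per wheel in station 4, three superlayers per chamber in stations 1 to 3, two in station 4, and four layers per superlayer.

```cpp
// Assumed DT layout (not stated in this thread); all constants are illustrative.
constexpr int kWheels = 5;
constexpr int kChambersSt1to3PerWheel = 3 * 12;  // stations 1-3, 12 sectors each
constexpr int kChambersSt4PerWheel = 14;         // station 4 has two extra chambers
constexpr int kSuperLayersSt1to3 = 3;
constexpr int kSuperLayersSt4 = 2;               // no theta superlayer in station 4
constexpr int kLayersPerSuperLayer = 4;

constexpr int expectedChambers() {
  return kWheels * (kChambersSt1to3PerWheel + kChambersSt4PerWheel);
}

constexpr int expectedSuperLayers() {
  return kWheels * (kChambersSt1to3PerWheel * kSuperLayersSt1to3 +
                    kChambersSt4PerWheel * kSuperLayersSt4);
}

constexpr int expectedLayers() { return expectedSuperLayers() * kLayersPerSuperLayer; }

// Chamber + SL + Layer, the quantity quoted above.
constexpr int expectedDTDetIds() {
  return expectedChambers() + expectedSuperLayers() + expectedLayers();
}

static_assert(expectedDTDetIds() == 3650, "250 chambers + 680 superlayers + 2720 layers");
```

Under these assumptions the breakdown is 250 chambers, 680 superlayers and 2720 layers, so a per-type count in the DD4hep job can show which level is incomplete.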

To reproduce the issue one can use the following cmsDriver with CMSSW_12_3_0_pre5,
DD4hep:
cmsDriver.py step2 -s DIGI:pdigi_valid --conditions auto:phase2_realistic_T21 --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry DD4hepExtended2026D88 --era Phase2C17I13M9 --procModifiers dd4hep --python Digi_DD4hep_Dump_2026D88.py --no_exec --filein file:/afs/cern.ch/user/s/srimanob/public/ForPhase2SW/DD4hep/GEN-SIM/step1-DD4hep.root --fileout file:step2.root --dump_python

Digi-only DDD:
cmsDriver.py step2 -s DIGI:pdigi_valid --conditions auto:phase2_realistic_T21 --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry Extended2026D88 --era Phase2C17I13M9 --python Digi_DDD_Dump_2026D88.py --no_exec --filein file:/afs/cern.ch/user/s/srimanob/public/ForPhase2SW/DD4hep/GEN-SIM/step1-DD4hep.root --fileout file:step2.root --dump_python

FYI @slomeo @ianna

@slomeo
Contributor

slomeo commented Feb 20, 2022

Hi @srimanob: for Run3 all the geometry checks made (i.e. DDD vs DD4hep) were ok, so the presence of this issue for Phase2 is really strange. I'll try my best to perform a check. @ianna, do you have any suggestions?

@slomeo
Contributor

slomeo commented Feb 20, 2022

Hi @srimanob can you please write the cmsDriver.py options to reproduce your step1-DD4hep.root ? I'd like to reproduce the issue starting from a GEN-SIM .root file.

@srimanob
Contributor

srimanob commented Feb 20, 2022

I picked it from the wf 39434.911, i.e.

runTheMatrix.py --what upgrade -l 39434.0 -t 8 --wm init
or
cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 10 --conditions auto:phase2_realistic_T21 --beamspot HLLHC14TeV --datatier GEN-SIM --eventcontent FEVTDEBUG --geometry DD4hepExtended2026D88 --era Phase2C17I13M9 --procModifiers dd4hep --python TTbar_14TeV_TuneCP5_2026D88_GenSimHLBeamSpot14_DD4hep.py --no_exec --fileout file:step1.root --nThreads 8

@srimanob
Contributor

My test PR on DD4hep for L1Track step #37005
The local test shows that I can pass the issue in #36837 (comment)

@srimanob
Contributor

srimanob commented Feb 27, 2022

Hi All,

I've proposed an update to the dtSpecsFilter.xml used in Phase-2; with it, the workflow 39434.911 can now run until the end. Needed PRs include

Now we have error/warning messages from HCAL to look at. FYI @cms-sw/hcal-dpg-l2

Message in DIGI step:

%MSG-e HcalDigitizer:  MixingModule:mix 27-Feb-2022 11:23:14 CET Run: 1 Event: 3
bad hcal id found in digitizer. Skipping 1161838639 (HE -16,47,4)
%MSG

Message in RECO step:

%MSG-w HcalDetId:  HcalDigisValidation:AllHcalDigisValidation  27-Feb-2022 11:30:04 CET Run: 1 Event: 9
HcalDetID(SimHit) presents conflicting information. Depth: 4, iphi: 53, ieta: 16. Max depth from geometry is: 0. TestNumber = 1
%MSG
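
As a cross-check of the DIGI message, the raw ID 1161838639 can be unpacked by hand. The sketch below assumes the bit layout of the newer HcalDetId packing as I recall it (detector in bits 28-31, subdetector in bits 25-27, a format flag in bit 24, depth in bits 20-23, z-side in bit 19, |ieta| in bits 10-18, iphi in bits 0-9). This layout is an assumption, checked here only against this one log line; `HcalFields` and `decodeHcalId` are hypothetical names, not CMSSW code.

```cpp
#include <cstdint>

// Hypothetical helper; the bit layout below is an assumption, not taken from this thread.
struct HcalFields {
  int det;        // 4 is expected for HCAL
  int subdet;     // 2 is expected for HE
  bool newFormat; // format flag, bit 24
  int depth;
  int zside;      // +1 or -1
  int ietaAbs;
  int iphi;
};

inline HcalFields decodeHcalId(std::uint32_t raw) {
  HcalFields f;
  f.det = static_cast<int>((raw >> 28) & 0xF);
  f.subdet = static_cast<int>((raw >> 25) & 0x7);
  f.newFormat = (raw & 0x1000000u) != 0;
  f.depth = static_cast<int>((raw >> 20) & 0xF);
  f.zside = (raw & 0x80000u) ? 1 : -1;
  f.ietaAbs = static_cast<int>((raw >> 10) & 0x1FF);
  f.iphi = static_cast<int>(raw & 0x3FF);
  return f;
}
```

Applied to 1161838639 (0x4540402F), this gives det 4, subdet 2, depth 4, negative z-side, |ieta| 16 and iphi 47, consistent with the "(HE -16,47,4)" printed by the digitizer.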

The summary on how to reproduce the result is in (*)

(*)
Please use a recent IB release, as the fix for the MTD reco geometry is not yet in an official release.

cmsrel CMSSW_12_3_X_2022-02-26-1100
cd CMSSW_12_3_X_2022-02-26-1100/src/
cmsenv
git cms-merge-topic srimanob:123_L1TDD4HepWf
git cms-merge-topic srimanob:123_FixPhase2DTSpecsFilter
scram b -j 8
runTheMatrix.py --what upgrade -l 39434.911 -t 8 --wm init

@fabiocos
Contributor Author

@srimanob the issue in itself looks closed now that the workflow can run through all steps without crashes. Residual warning/error messages seem to go beyond the scope of this issue, if meaningful results can be generally produced. Therefore once all the fixes are integrated, I would prefer to close this issue, and move possible further discussions about residual problems to some dedicated thread/issue.

@srimanob
Contributor

Hi @fabiocos
I agree with you. Once the 3 PRs (#37005, #37078, #37079) are merged, we can close this issue. I will open new issue(s) to follow up on the error/warning messages of the workflow.

@srimanob
Contributor

I've opened the issue #37087 to collect issues/problems in DDD-DD4hep Phase-2 validation.

@fabiocos
Contributor Author

fabiocos commented Mar 3, 2022

@srimanob since all the PRs listed above have been merged, I consider this issue solved.

@fabiocos fabiocos closed this as completed Mar 3, 2022
@cvuosalo
Contributor

cvuosalo commented Mar 8, 2022

+1
