Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer overflow in RecHitsSortedInPhi on NaN #35489

Closed
dan131riley opened this issue Sep 30, 2021 · 21 comments · Fixed by #36852
Closed

buffer overflow in RecHitsSortedInPhi on NaN #35489

dan131riley opened this issue Sep 30, 2021 · 21 comments · Fixed by #36852

Comments

@dan131riley
Copy link

ASAN found a heap buffer overflow in RecHitsSortedInPhi::RecHitsSortedInPhi(), very likely because phi for the track was NaN, which doesn't sort well.

%MSG-e TrajectoryNaN:  TrackProducer:hltIterL3OIMuCtfWithMaterialTracksNoVtx  30-Sep-2021 06:32:04 CEST Run: 1 Event: 109
Trajectory has NaN
%MSG
%MSG-e TrajectoryNaN:  TrackProducer:hltIterL3OIMuCtfWithMaterialTracksNoVtx  30-Sep-2021 06:32:04 CEST Run: 1 Event: 109
Trajectory has NaN
%MSG
%MSG-e TrajectoryNaN:  TrackProducer:hltIterL3OIMuCtfWithMaterialTracks  30-Sep-2021 06:32:05 CEST Run: 1 Event: 109
Trajectory has NaN
%MSG
%MSG-e TrajectoryNaN:   TrackProducer:hltIterL3OIMuCtfWithMaterialTracksScoutingNoVtx  30-Sep-2021 06:32:11 CEST Run: 1 Event: 109
Trajectory has NaN
%MSG
%MSG-e TrajectoryNaN:   TrackProducer:hltIterL3OIMuCtfWithMaterialTracksScoutingNoVtx  30-Sep-2021 06:32:11 CEST Run: 1 Event: 109
Trajectory has NaN
%MSG
=================================================================
==3648==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61b0013c9d78 at pc 0x2b614f17ac19 bp 0x2b616f95a810 sp 0x2b616f95a808
READ of size 4 at 0x61b0013c9d78 thread T7
    #0 0x2b614f17ac18 in RecHitsSortedInPhi::RecHitsSortedInPhi(std::vector<BaseTrackerRecHit const*, std::allocator<BaseTrackerRecHit const*> > const&, Point3DBase<float, GlobalTag> const&, DetLayer const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x82c18)
    #1 0x2b614f16984a in LayerHitMapCache::operator()(SeedingLayerSetsHits::SeedingLayer const&, TrackingRegion const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x7184a)
    #2 0x2b614f163383 in HitPairGeneratorFromLayerPair::doublets(TrackingRegion const&, edm::Event const&, edm::EventSetup const&, SeedingLayerSetsHits::SeedingLayer const&, SeedingLayerSetsHits::SeedingLayer const&, LayerHitMapCache&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x6b383)
    #3 0x2b6180807612 in (anonymous namespace)::Impl<(anonymous namespace)::DoNothing, (anonymous namespace)::ImplIntermediateHitDoublets, (anonymous namespace)::RegionsLayersSeparate>::produce(bool, edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/pluginRecoTrackerTkHitPairsPlugins.so+0x31612)
    #4 0x2b61807ed63a in HitPairEDProducer::produce(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/pluginRecoTrackerTkHitPairsPlugins.so+0x1763a)
    #5 0x2b611cdb3f1e in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x904f1e)
    #6 0x2b611cd1ee72 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x86fe72)
    #7 0x2b611ca13004 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x564004)
    #8 0x2b611ca1384a in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x56484a)
    #9 0x2b611ca1f2a6 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x5702a6)
    #10 0x2b611d719231 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so+0x11231)
    #11 0x2b611ed433de in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/task_dispatcher.h:321
    #12 0x2b611ed433de in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/task_dispatcher.h:462
    #13 0x2b611ed433de in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/arena.cpp:138
    #14 0x2b611ed528ea in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/market.cpp:596
    #15 0x2b611ed562d3 in tbb::detail::r1::rml::private_worker::run() /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/private_server.cpp:267
    #16 0x2b611ed564f8 in tbb::detail::r1::rml::private_worker::thread_routine(void*) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/private_server.cpp:221
    #17 0x2b611f955ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
    #18 0x2b611fc689fc in __clone (/lib64/libc.so.6+0xfe9fc)

0x61b0013c9d78 is located 8 bytes to the left of 1552-byte region [0x61b0013c9d80,0x61b0013ca390)
allocated by thread T7 here:
    #0 0x2b611bb923d7 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x2b614f17a99a in RecHitsSortedInPhi::RecHitsSortedInPhi(std::vector<BaseTrackerRecHit const*, std::allocator<BaseTrackerRecHit const*> > const&, Point3DBase<float, GlobalTag> const&, DetLayer const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x8299a)
    #2 0x2b614f16984a in LayerHitMapCache::operator()(SeedingLayerSetsHits::SeedingLayer const&, TrackingRegion const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x7184a)
    #3 0x2b614f163383 in HitPairGeneratorFromLayerPair::doublets(TrackingRegion const&, edm::Event const&, edm::EventSetup const&, SeedingLayerSetsHits::SeedingLayer const&, SeedingLayerSetsHits::SeedingLayer const&, LayerHitMapCache&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x6b383)
    #4 0x2b6180807612 in (anonymous namespace)::Impl<(anonymous namespace)::DoNothing, (anonymous namespace)::ImplIntermediateHitDoublets, (anonymous namespace)::RegionsLayersSeparate>::produce(bool, edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/pluginRecoTrackerTkHitPairsPlugins.so+0x31612)
    #5 0x2b61807ed63a in HitPairEDProducer::produce(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/pluginRecoTrackerTkHitPairsPlugins.so+0x1763a)
    #6 0x2b611cdb3f1e in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x904f1e)
    #7 0x2b611cd1ee72 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x86fe72)
    #8 0x2b611ca13004 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x564004)
    #9 0x2b611ca1384a in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x56484a)
    #10 0x2b611ca1f2a6 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so+0x5702a6)
    #11 0x2b611d719231 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so+0x11231)
    #12 0x2b611ed433de in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/task_dispatcher.h:321
    #13 0x2b611ed433de in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/task_dispatcher.h:462
    #14 0x2b611ed433de in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/arena.cpp:138
    #15 0x2b611ed528ea in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/market.cpp:596

Thread T7 created by T0 here:
    #0 0x2b611bb3c0a2 in __interceptor_pthread_create ../../../../libsanitizer/asan/asan_interceptors.cpp:214
    #1 0x2b611ed5608e in tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/rml_thread_monitor.h:208
    #2 0x2b611ed5608e in tbb::detail::r1::rml::private_worker::wake_or_launch() /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/private_server.cpp:299
    #3 0x2b611ed5608e in tbb::detail::r1::rml::private_server::wake_some(int) /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc10/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc10/external/tbb/v2021.3.0-d08d4722ccbb5e1470ca0b76226c3cf7/tbb-v2021.3.0/src/tbb/private_server.cpp:397

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/nweek-02700/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_1_ASAN_X_2021-09-29-2300/lib/slc7_amd64_gcc10/libRecoTrackerTkHitPairs.so+0x82c18) in RecHitsSortedInPhi::RecHitsSortedInPhi(std::vector<BaseTrackerRecHit const*, std::allocator<BaseTrackerRecHit const*> > const&, Point3DBase<float, GlobalTag> const&, DetLayer const*)
Shadow bytes around the buggy address:
  0x0c3680271350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3680271360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3680271370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3680271380: 00 00 00 00 fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c3680271390: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c36802713a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x0c36802713b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c36802713c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c36802713d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c36802713e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c36802713f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3648==ABORTING
@cmsbuild
Copy link
Contributor

A new Issue was created by @dan131riley Dan Riley.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor

assign hlt,reconstruction

@cmsbuild
Copy link
Contributor

New categories assigned: hlt,reconstruction

@slava77,@Martin-Grunewald,@jpata,@missirol you have been requested to review this Pull request/Issue and eventually sign? Thanks

@missirol
Copy link
Contributor

@dan131riley @makortel how can i reproduce the issue? (forgive my ignorance here..)

@dan131riley
Copy link
Author

It was seen in step2 of the ASAN builds of WF 11834.0, logs here:

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc10/CMSSW_12_1_ASAN_X_2021-09-29-2300/pyRelValMatrixLogs/run/11834.0_TTbar_14TeV+2021PU+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoPU+HARVESTPU+Nano/step2_TTbar_14TeV+2021PU+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoPU+HARVESTPU+Nano.log

To reproduce, you'd need an ASAN build, e.g. CMSSW_12_1_ASAN_X_2021-09-29-2300, and probably the exact step1 input for that relval, as this appears to be a rare condition. Since this is a rare condition, I'd recommend just putting in some protection against NaNs.

@slava77
Copy link
Contributor

slava77 commented Sep 30, 2021

@vmariani @mmusich
please take a note

@mmusich
Copy link
Contributor

mmusich commented Oct 1, 2021

so... it took a lot of time to run in an ASAN_X build but this change:

diff --git a/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h b/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
index cadec4e0d1d..b952a873cf4 100644
--- a/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
+++ b/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
@@ -3,6 +3,7 @@
 
 #include "DataFormats/TrackerRecHit2D/interface/BaseTrackerRecHit.h"
 #include "TrackingTools/DetLayers/interface/DetLayer.h"
+#include "FWCore/Utilities/interface/isFinite.h"
 
 #include <vector>
 #include <array>
@@ -33,7 +34,13 @@ public:
   };
 
   struct HitLessPhi {
-    bool operator()(const HitWithPhi& a, const HitWithPhi& b) { return a.phi() < b.phi(); }
+    bool operator()(const HitWithPhi& a, const HitWithPhi& b) {
+      if (!edm::isFinite(a.phi()))
+        return false;
+      if (!edm::isFinite(b.phi()))
+        return true;
+      return a.phi() < b.phi();
+    }
   };
   typedef std::vector<HitWithPhi>::const_iterator HitIter;
   typedef std::pair<HitIter, HitIter> Range;

seems to be able to remove the address sanitizer issues in step2 of wf. 11834.0.
Would it be acceptable for a PR?

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

or

int ia, ib;
memcpy(&ia, &a.phi(),4); memcpy(&ib, &b.phi(),4);
return ia<ib;

@mmusich
Copy link
Contributor

mmusich commented Oct 1, 2021

or

int ia, ib;
memcpy(&ia, &a.phi(),4); memcpy(&ib, &b.phi(),4);
return ia<ib;

as is , it does not compile.

 error: lvalue required as unary '&' operand

the interface is:

bool operator()(const HitWithPhi& a, const HitWithPhi& b) { return a.phi() < b.phi(); }

This, instead:

diff --git a/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h b/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
index cadec4e0d1d..e3936af535e 100644
--- a/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
+++ b/RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h
@@ -33,7 +33,13 @@ public:
   };
 
   struct HitLessPhi {
-    bool operator()(const HitWithPhi& a, const HitWithPhi& b) { return a.phi() < b.phi(); }
+    bool operator()(const HitWithPhi& a, const HitWithPhi& b) {
+      int ia, ib;
+      auto phia = a.phi();
+      auto phib = b.phi();
+      memcpy(&ia,&phia,4); memcpy(&ib,&phib,4);
+      return ia<ib;
+    }
   };
   typedef std::vector<HitWithPhi>::const_iterator HitIter;
   typedef std::pair<HitIter, HitIter> Range;

does compile and resolves the issue, though I observe some more warnings that were not before with the solution at #35489 (comment)

%MSG-w Muon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 13:22:46 CEST Run: 1 Event: 9
     Failed to load tracker track trajectory
%MSG
%MSG-w RecoMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 13:22:46 CEST Run: 1 Event: 9
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 13:22:46 CEST Run: 1 Event: 9
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalTrajectoryBuilderBase:   L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx 01-Oct-2021 13:22:46 CEST  Run: 1 Event: 9
     Failed to load tracker track trajectory
%MSG

I can open the PR with any of these solutions, please @cms-sw/reconstruction-l2 let me know.

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

I think we should cure the cause not the symptoms..

  1. understand how could be that the position of a HIT becomes NaN
  2. eventually filter them out in filling the RecHitsSortedInPhi data structure

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

HLT is full of those warning (even in MC). Again would be useful to understand their origin

@dan131riley
Copy link
Author

  •  memcpy(&ia,&phia,4); memcpy(&ib,&phib,4);
    
  •  return ia<ib;
    

Definitely not, that was @VinInn trolling. Preferences, in order best to least preferred:

  1. fix the underlying cause--how are we even getting NaNs?
  2. filter out NaNs in RecHitsSortedInPhi
  3. modify the comparison function to sort NaNs in a consistent order (your first proposal)

Right out: anything with memcpy().

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021 via email

@mmusich
Copy link
Contributor

mmusich commented Oct 1, 2021

@dan131riley :

Preferences, in order best to least preferred:

I have difficulties to reconcile this what you wrote earlier:

Since this is a rare condition, I'd recommend just putting in some protection against NaNs.

Also I don't understand how

filter out NaNs in RecHitsSortedInPhi

can be OK. I guess they still need to be put somewhere in the list.

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

Hits with NaN (or other illness) do not require to be loaded in RecHitsSortedInPhi that is used to build Doublets for seeds.

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

I am uneasy with adding two ifs in the less operator...
The condition is indeed very rare: (not able to trap it yet)
maybe we just trap nan in the constructor of "HitWithPhi" and assign 0 (the logic is wrapping so assigning pi+eps will not help
at least is done once and not twice for each couple of hits..

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

back to 1)

added

iff --git a/RecoTracker/TkHitPairs/src/RecHitsSortedInPhi.cc b/RecoTracker/TkHitPairs/src/RecHitsSortedInPhi.cc
index 3d806501f56..0969c0daa2a 100644
--- a/RecoTracker/TkHitPairs/src/RecHitsSortedInPhi.cc
+++ b/RecoTracker/TkHitPairs/src/RecHitsSortedInPhi.cc
@@ -1,5 +1,6 @@
 #include "RecoTracker/TkHitPairs/interface/RecHitsSortedInPhi.h"
 #include "DataFormats/TrackerRecHit2D/interface/BaseTrackerRecHit.h"
+#include "FWCore/Utilities/interface/isFinite.h"

 #include <algorithm>
 #include <cassert>
@@ -21,9 +22,13 @@ RecHitsSortedInPhi::RecHitsSortedInPhi(const std::vector<Hit>& hits, GlobalPoint
   // assert(origin.x()==0 && origin.y()==0);

   theHits.reserve(hits.size());
-  for (auto const& hp : hits)
+  for (auto const& hp : hits) {
+    auto gp = hp->globalPosition();
+    auto phi = gp.barePhi();
+    if (!edm::isFinite(phi))
+      std::cout << "NAN in layer " << layer << ' ' << hp->rawId() << ' ' << hp-> type() << ' ' << phi << ' ' << gp << ' ' << hp->localPosition() << std::endl;
     theHits.emplace_back(hp);
-
+  }
   std::sort(theHits.begin(), theHits.end(), HitLessPhi());

   for (unsigned int i = 0; i != theHits.size(); ++i) {

generated this

[innocent@patatrack02 innocent]$ dasgoclient --limit 0 --query 'file dataset=/RelValTTbar_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM' | sort -u > step1_dasquery.log
[innocent@patatrack02 innocent]$ cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021 --conditions auto:phase1_2021_realistic --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry DB:Extended --era Run3 --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM --filein filelist:step1_dasquery.log --fileout file:step2.root --no_exec
DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry filelist:step1_dasquery.log
found files:  ['/store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/361e4fff-3935-4325-9891-62841915ccd1.root', '/store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5a5f9bff-e48b-47c3-9081-5b77513d1bff.root', '/store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/826f9b32-706e-467e-a51b-f3636d7d4147.root', '/store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/99843aa4-3246-443b-88be-da487ad3edc8.root']
the query is file dataset = /RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM
DAS succeeded after 1 attempts 0
found files:  ['/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/1f2432c4-b04b-4be9-a5f4-d6121657ad61.root', '/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root', '/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/8ffd7938-eb07-44b5-9694-ea018ac892db.root', '/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root', '/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/b1db77ed-9af1-41e1-8acf-6aeb3a4ffefd.root', '/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root']

and see no error report

Begin processing the 2nd record. Run 1, Event 101, LumiSection 2 on stream 0 at 01-Oct-2021 14:59:26.932 CEST
[2021-10-01 15:00:42.020547 +0200][Error  ][PostMaster        ] [cmsxrootd-site3.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
[2021-10-01 15:00:51.071862 +0200][Error  ][PostMaster        ] [cmsxrootd-site1.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
Begin processing the 3rd record. Run 1, Event 103, LumiSection 2 on stream 0 at 01-Oct-2021 15:02:11.713 CEST
Begin processing the 4th record. Run 1, Event 104, LumiSection 2 on stream 0 at 01-Oct-2021 15:05:33.020 CEST
Begin processing the 5th record. Run 1, Event 105, LumiSection 2 on stream 0 at 01-Oct-2021 15:08:32.481 CEST
[2021-10-01 15:09:39.030113 +0200][Error  ][PostMaster        ] [cmsxrootd-site2.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
%MSG-e TrajectoryNotPosDef:   TrackProducer:hltDoubletRecoveryPFlowCtfWithMaterialTracksForBTag  01-Oct-2021 15:11:16 CEST Run: 1 Event: 105
Trajectory covariance is not positive-definite
%MSG
Begin processing the 6th record. Run 1, Event 106, LumiSection 2 on stream 0 at 01-Oct-2021 15:11:28.326 CEST
[2021-10-01 15:12:56.048811 +0200][Error  ][PostMaster        ] [eoscms.cern.ch:1094 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
Begin processing the 7th record. Run 1, Event 107, LumiSection 2 on stream 0 at 01-Oct-2021 15:14:13.954 CEST
[2021-10-01 15:15:42.037775 +0200][Error  ][PostMaster        ] [cmsxrootd-site3.fnal.gov:1094 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
Begin processing the 8th record. Run 1, Event 109, LumiSection 2 on stream 0 at 01-Oct-2021 15:16:57.392 CEST
Begin processing the 9th record. Run 1, Event 108, LumiSection 2 on stream 0 at 01-Oct-2021 15:19:48.384 CEST
[2021-10-01 15:21:56.335535 +0200][Error  ][PostMaster        ] [cmsxrootd-site2.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
Begin processing the 10th record. Run 1, Event 110, LumiSection 2 on stream 0 at 01-Oct-2021 15:22:42.009 CEST
01-Oct-2021 15:25:42 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/361e4fff-3935-4325-9891-62841915ccd1.root
01-Oct-2021 15:25:47 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root

We can put a warning (or even an error) and see if ever triggers...

@VinInn
Copy link
Contributor

VinInn commented Oct 1, 2021

is step2 reproducible?
I run again on a different release (and with 4 threads) and got a different output

01-Oct-2021 16:12:46 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/8ffd7938-eb07-44b5-9694-ea018ac892db.root
01-Oct-2021 16:12:46 CEST  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:47 CEST  Run: 1 Event: 104
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root?xrdcl.requuid=2b399f7b-b13f-4da4-b6ec-fe3e2e250c36.
%MSG
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:47 CEST  Run: 1 Event: 104
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root?tried=&xrdcl.requuid=317ddd99-0c16-48d5-9230-2a9621634aa2.
%MSG
01-Oct-2021 16:12:47 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/b1db77ed-9af1-41e1-8acf-6aeb3a4ffefd.root
01-Oct-2021 16:12:47 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/8ffd7938-eb07-44b5-9694-ea018ac892db.root
01-Oct-2021 16:12:47 CEST  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root
01-Oct-2021 16:12:47 CEST  Initiating request to open file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
01-Oct-2021 16:12:47 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/b1db77ed-9af1-41e1-8acf-6aeb3a4ffefd.root
01-Oct-2021 16:12:47 CEST  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root
01-Oct-2021 16:12:47 CEST  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:48 CEST  Run: 1 Event: 103
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root?xrdcl.requuid=2a221798-34c4-4030-8a07-bf32b1664fea.
%MSG
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:48 CEST  Run: 1 Event: 103
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root?tried=&xrdcl.requuid=db720972-958d-486c-86eb-9f59831ec492.
%MSG
01-Oct-2021 16:12:48 CEST  Initiating request to open file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:12:51 CEST Run: 1 Event: 104
Data is served from T2_BR_SPRACE instead of original site T2_FR_GRIF_LLR
%MSG
01-Oct-2021 16:12:52 CEST  Successfully opened file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:53 CEST  Run: 1 Event: 101
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root?xrdcl.requuid=2bf38a50-173b-4610-8623-a24b37486c9c.
%MSG
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:53 CEST  Run: 1 Event: 101
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root?tried=&xrdcl.requuid=18ab90ff-bd2d-4097-824d-a81608fba7f7.
%MSG
01-Oct-2021 16:12:53 CEST  Initiating request to open file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:54 CEST  Run: 1 Event: 102
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root?xrdcl.requuid=e3005ae8-8446-4b4a-b0df-2e65b7bb42e3.
%MSG
%MSG-w XrdAdaptorInternal:  MixingModule:mix 01-Oct-2021 16:12:54 CEST  Run: 1 Event: 102
Failed to open file at URL root://eoscms.cern.ch:1094//eos/cms/store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root?tried=&xrdcl.requuid=e5d01cb8-39b7-4eed-9111-b6bdf78f6996.
%MSG
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:13:08 CEST Run: 1 Event: 103
Data is served from T2_BR_SPRACE instead of original site T2_FR_GRIF_LLR
%MSG
01-Oct-2021 16:13:09 CEST  Successfully opened file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root
01-Oct-2021 16:13:09 CEST  Initiating request to open file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:13:12 CEST Run: 1 Event: 101
Data is served from T2_BR_SPRACE instead of original site T2_FR_GRIF_LLR
%MSG
01-Oct-2021 16:13:12 CEST  Successfully opened file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:13:15 CEST Run: 1 Event: 102
Data is served from T2_BR_SPRACE instead of original site T2_FR_GRIF_LLR
%MSG
01-Oct-2021 16:13:16 CEST  Successfully opened file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
%MSG-w XrdAdaptor:  (NoModuleName) 01-Oct-2021 16:13:27 CEST pre-events
Data is now served from T2_BR_SPRACE, [2620 instead of previous T2_BR_SPRACE
%MSG
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:14:39 CEST Run: 1 Event: 101
Data is now served from [2620 instead of previous T2_BR_SPRACE, [2620
%MSG
%MSG-w XrdAdaptor:  (NoModuleName) 01-Oct-2021 16:15:28 CEST pre-events
Data is now served from T2_BR_SPRACE, [2620 instead of previous T2_BR_SPRACE
%MSG
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:15:44 CEST Run: 1 Event: 102
Data is now served from T2_BR_SPRACE instead of previous T2_BR_SPRACE, [2620
%MSG
%MSG-w XrdAdaptor:  (NoModuleName) 01-Oct-2021 16:16:03 CEST pre-events
Data is now served from T2_BR_SPRACE, [2620 instead of previous T2_BR_SPRACE
%MSG
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:16:04 CEST Run: 1 Event: 103
Data is now served from [2620 instead of previous T2_BR_SPRACE, [2620
%MSG
%MSG-w XrdAdaptor:  (NoModuleName) 01-Oct-2021 16:16:15 CEST pre-events
Data is now served from T2_BR_SPRACE, [2620 instead of previous T2_BR_SPRACE
%MSG
%MSG-w XrdAdaptor:  MixingModule:mix 01-Oct-2021 16:16:20 CEST Run: 1 Event: 104
Data is now served from [2620 instead of previous T2_BR_SPRACE, [2620
%MSG
2021-10-01 16:16:35.849590: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
%MSG-e TkDetLayers:  LXXXCorrectorProducer:hltAK4PFRelativeCorrector  01-Oct-2021 16:16:37 CEST Run: 1 Event: 102
 ForwardDiskSectorBuilderFromDet: Trying to build Petal Wedge from Dets at different z positions !! Delta_z = -0.951241
%MSG
Begin processing the 5th record. Run 1, Event 105, LumiSection 2 on stream 2 at 01-Oct-2021 16:16:59.018 CEST
Begin processing the 6th record. Run 1, Event 106, LumiSection 2 on stream 3 at 01-Oct-2021 16:17:12.351 CEST
[2021-10-01 16:18:02.014870 +0200][Error  ][PostMaster        ] [[2620:6a:0:8421::4b]:31939 #0] Forcing error on disconnect: [ERROR] Operation interrupted.
Begin processing the 7th record. Run 1, Event 107, LumiSection 2 on stream 0 at 01-Oct-2021 16:18:02.284 CEST
Begin processing the 8th record. Run 1, Event 109, LumiSection 2 on stream 2 at 01-Oct-2021 16:19:02.335 CEST
Begin processing the 9th record. Run 1, Event 108, LumiSection 2 on stream 1 at 01-Oct-2021 16:19:16.281 CEST
Begin processing the 10th record. Run 1, Event 110, LumiSection 2 on stream 0 at 01-Oct-2021 16:19:40.436 CEST
Begin processing the 11th record. Run 1, Event 111, LumiSection 2 on stream 3 at 01-Oct-2021 16:19:51.297 CEST
Begin processing the 12th record. Run 1, Event 112, LumiSection 2 on stream 2 at 01-Oct-2021 16:20:37.233 CEST
Begin processing the 13th record. Run 1, Event 113, LumiSection 2 on stream 2 at 01-Oct-2021 16:21:58.613 CEST
Begin processing the 14th record. Run 1, Event 114, LumiSection 2 on stream 0 at 01-Oct-2021 16:22:12.964 CEST
Begin processing the 15th record. Run 1, Event 115, LumiSection 2 on stream 1 at 01-Oct-2021 16:22:24.424 CEST
%MSG-w Muon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w RecoMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalTrajectoryBuilderBase:   L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx 01-Oct-2021 16:23:07 CEST  Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w Muon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w RecoMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalMuon:  L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx  01-Oct-2021 16:23:07 CEST Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
%MSG-w GlobalTrajectoryBuilderBase:   L3MuonProducer:hltL3MuonsIterL3OIScoutingNoVtx 01-Oct-2021 16:23:07 CEST  Run: 1 Event: 113
     Failed to load tracker track trajectory
%MSG
Begin processing the 16th record. Run 1, Event 116, LumiSection 2 on stream 2 at 01-Oct-2021 16:23:22.479 CEST
Begin processing the 17th record. Run 1, Event 118, LumiSection 2 on stream 3 at 01-Oct-2021 16:23:38.004 CEST
Begin processing the 18th record. Run 1, Event 117, LumiSection 2 on stream 0 at 01-Oct-2021 16:24:16.057 CEST
Begin processing the 19th record. Run 1, Event 120, LumiSection 2 on stream 2 at 01-Oct-2021 16:24:45.240 CEST
Begin processing the 20th record. Run 1, Event 119, LumiSection 2 on stream 1 at 01-Oct-2021 16:24:57.404 CEST
01-Oct-2021 16:28:24 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValTTbar_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/361e4fff-3935-4325-9891-62841915ccd1.root
01-Oct-2021 16:28:28 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
01-Oct-2021 16:28:28 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/e9794f10-c40e-45ad-8677-0f3f7410d966.root
01-Oct-2021 16:28:28 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/5188ff57-8de1-4731-958d-0206eefb725e.root
01-Oct-2021 16:28:28 CEST  Closed file root://xrootd-cms.infn.it//store/relval/CMSSW_12_0_0_pre4/RelValMinBias_14TeV/GEN-SIM/120X_mcRun3_2021_realistic_v2-v1/00000/ace00a2c-259f-4ac1-b449-9fed3dc70e17.root

@makortel
Copy link
Contributor

makortel commented Jan 5, 2022

This issue could be the culprit for random failures in 11834.X step 2 workflows being followed up in #36336 (thanks to @swagata87 for making the connection).

@makortel
Copy link
Contributor

makortel commented Jan 5, 2022

is step2 reproducible?

Pileup mixing is not fully reproducible when run with multiple threads because the "assignment" of input GEN events to edm streams (MinBias files for mixing are processed per stream) is not deterministic.

@missirol
Copy link
Contributor

missirol commented Jan 24, 2022

+hlt

  • to the best of my knowledge, #36669 #36852 fixes this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants