Revert Provenance Prefetching #17556

Merged
merged 1 commit into cms-sw:CMSSW_9_0_X on Feb 18, 2017

Dr15Jones
Contributor

When running on very many threads, the framework sometimes appears to think that the prefetching for the PoolOutputModule never finishes, and therefore the module is never run. Until the cause is found, we need to disable the prefetching.

The problem was seen when running on KNL with 48 or 64 threads. Reverting only this part avoids a large recompilation and allows the fix to be added later with only minor recompilation.

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_9_0_X.

It involves the following packages:

FWCore/Integration
IOPool/Output

@cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@davidlange6, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@Dr15Jones
Contributor Author

@davidlange6 this needs to be in for pre5 to avoid problems using the release on Cori.

@Dr15Jones
Contributor Author

please test

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

cmsbuild commented Feb 17, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17852/console Started: 2017/02/17 23:44

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_9_0_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar

@cmsbuild
Contributor

Comparison job queued.

@davidlange6
Contributor

+1

@cmsbuild cmsbuild merged commit c00c4e8 into cms-sw:CMSSW_9_0_X Feb 18, 2017
@Dr15Jones
Contributor Author

@davidlange6 new information about this. The problem does not appear to be caused by the code this pull request reverts; that code appears only to trigger it. The PoolOutputModule doesn't run because the module unsortedOfflinePrimaryVertices4D, of type PrimaryVertexProducer, starts on the stream but never stops.

@Dr15Jones
Contributor Author

@lgray The problem appears to be in
http://cmslxr.fnal.gov/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_0_pre4#0505

Our 200 pileup job appears to be stuck in this routine (or one it calls) for hours. The tracebacks taken after that time that include that module are:

#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a9ab in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5fd38 in DAClusterizerInZ_vect::purge(DAClusterizerInZ_vect::vertex_t&, DAClusterizerInZ_vect::track_t&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e63411 in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e34375 in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5e587 in DAClusterizerInZ_vect::update(double, DAClusterizerInZ_vect::track_t&, DAClusterizerInZ_vect::vertex_t&, bool, double&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e63183 in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e353b1 in DAClusterizerInZT::purge(std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a911 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e353b1 in DAClusterizerInZT::purge(std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a911 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c81 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c69 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478b04 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e342ca in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e343db in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c69 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e3436c in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5e550 in DAClusterizerInZ_vect::update(double, DAClusterizerInZ_vect::track_t&, DAClusterizerInZ_vect::vertex_t&, bool, double&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e6343f in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so

#4  0x00002afcef478ca9 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478b00 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e342d5 in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so

@lgray
Contributor

lgray commented Feb 18, 2017

@Dr15Jones OK - interesting. I think this can be solved by introducing a max-iterations cut, since simulated annealing converges smoothly towards the end of the cooling process.
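
As a rough illustration of the max-iterations idea, here is a minimal sketch. It is not the DAClusterizerInZT code: `IterationResult`, `iterateWithCap`, and the `step` callback are hypothetical names introduced only for this example, and the assumption is that each annealing update can report how much it changed the solution.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical result type: how many updates ran and whether the
// tolerance was reached before the cap.
struct IterationResult {
  unsigned int iterations;
  bool converged;
};

// Hypothetical helper: calls `step` until the change it reports is below
// `tolerance`, or until `maxIterations` updates have been performed.
inline IterationResult iterateWithCap(const std::function<double()>& step,
                                      double tolerance,
                                      unsigned int maxIterations) {
  assert(tolerance > 0.0);
  unsigned int n = 0;
  while (n < maxIterations) {
    ++n;
    const double delta = step();  // size of the change made by this update
    if (std::abs(delta) < tolerance) {
      return {n, true};
    }
  }
  // Cap reached: give up instead of looping for hours.
  return {n, false};
}
```

The caller can check `converged` and decide whether to accept the last state or flag the event; which of those is appropriate for the clusterizer is not settled in this thread.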

@Dr15Jones
Contributor Author

@gartung in the two jobs I looked at, the event in question was event number 10 (which is the 7th in the input file). Could you point people to the input file used as well as the configuration file?

@gartung
Member

gartung commented Feb 18, 2017

On cmslpc

/eos/uscms/store/user/gartung/step2/pu200/step2.root
/uscms_data/d2/gartung/tev/step3_RAW2DIGI_L1Reco_RECO_PU_64_64.py

@Dr15Jones
Contributor Author

@davidlange6 could we revert this change, since we discovered the problem has nothing to do with this pull request? This change does appear to have a significant impact on the threading efficiency for large numbers of threads.

@Dr15Jones
Contributor Author

I ran the job for 10 events (using 4 threads and 4 streams) on a standard Xeon system (cmslpc27) with the releases CMSSW_9_0_X_2017-02-17-2300 (which has the prefetching) and CMSSW_9_0_X_2017-02-18-1100 (which doesn't have the prefetching). Both ran to completion just fine, including processing one of the events that had gotten stuck on the KNL system. While running those versions I noticed that the number of the modules was different, so the configuration did change slightly. This leads me to conclude that

  1. the problem could be in how we build the code we are using, or
  2. the problem could be in how the KNL and Xeon cores differ in processing (e.g. math functions), or
  3. the change in the configuration avoids the problem.

In all of these cases the prefetching plays no role in the problem.

@Dr15Jones
Contributor Author

Dr15Jones commented Feb 18, 2017

I've now also run the configuration using CMSSW_9_0_0_pre4 with 4 threads/streams on a Xeon machine. The job finishes just fine.

Interestingly, vanilla pre4 also has a different module numbering scheme than the KNL test.

@Dr15Jones
Contributor Author

#17564 reinstates the prefetching

@Dr15Jones Dr15Jones deleted the revertProveancePrefetch branch February 20, 2017 18:54
@Dr15Jones
Contributor Author

@lgray I was able to watch the job in the debugger. The code isn't in an infinite loop; it is just in an extremely slowly converging loop (i.e. 8+ hours for one event). The job is 'stuck' in the purge while loop:
http://cmslxr.fnal.gov/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_0_pre4#0587
When I checked, tks.size() = 1480 and y.size() = 1172. More than 5 minutes later, y.size() = 1168.
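
To put those two readings in perspective, here is a back-of-the-envelope sketch. The helper names are hypothetical, and it assumes a constant removal rate, which nothing in this thread establishes. Because the second reading came more than 5 minutes after the first, the rate computed with exactly 5 minutes is an upper bound.

```cpp
#include <cassert>

// Hypothetical helper: entries removed from y per minute between two
// observations of y.size().
inline double removalRatePerMinute(unsigned int sizeBefore,
                                   unsigned int sizeAfter,
                                   double minutes) {
  assert(sizeBefore >= sizeAfter && minutes > 0.0);
  return static_cast<double>(sizeBefore - sizeAfter) / minutes;
}

// Hypothetical helper: entries removed over `hours` at a constant rate.
inline double removedOverHours(double ratePerMinute, double hours) {
  return ratePerMinute * 60.0 * hours;
}
```

With the numbers above, `removalRatePerMinute(1172, 1168, 5.0)` gives at most about 0.8 entries per minute, and `removedOverHours` at that rate for 8 hours gives at most about 384 entries, so a run time of many hours for one event is plausible at this pace.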

@gartung
Member

gartung commented Feb 21, 2017

Running with CMSSW_9_0_X_2017-02-21-1100 over the first 10 events, the job completed in a reasonable amount of time. I am now trying with 320 events.

@lgray
Contributor

lgray commented Feb 21, 2017 via email
