Revert Provenance Prefetching #17556

Dr15Jones · 2017-02-17T22:40:26Z

When running on very many threads, it appears that the framework sometimes thinks the prefetching for the PoolOutputModule never finishes, and therefore the module is never run. Until the problem is found, we need to disable the prefetching.

The problem was seen when running on KNL with 48 or 64 threads. Reverting only this part avoids a large recompilation and allows the fix to be added later with only minor recompilation.

cmsbuild · 2017-02-17T22:40:44Z

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_9_0_X.

It involves the following packages:

FWCore/Integration
IOPool/Output

@cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@davidlange6, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

Dr15Jones · 2017-02-17T22:40:51Z

@davidlange6 this needs to be in for pre5 to avoid problems using the release on Cori.

Dr15Jones · 2017-02-17T22:44:02Z

please test

Dr15Jones · 2017-02-17T22:44:07Z

+1

cmsbuild · 2017-02-17T22:44:17Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/17852/console Started: 2017/02/17 23:44

cmsbuild · 2017-02-17T22:44:18Z

This pull request is fully signed and it will be integrated in one of the next CMSSW_9_0_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar

cmsbuild · 2017-02-17T23:45:12Z

+1
Tested at: 160b21b
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17556/17852/summary.html

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
a2fc9ea
62c3f26
cde1e04
You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17556/17852/git-log-recent-commits
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17556/17852/git-merge-result

cmsbuild · 2017-02-17T23:45:16Z

Comparison job queued.

cmsbuild · 2017-02-18T01:10:57Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-17556/17852/summary.html

davidlange6 · 2017-02-18T08:35:09Z

+1

Dr15Jones · 2017-02-18T14:55:14Z

@davidlange6 new information about this. The problem does not appear to be in the code this pull request reverts; that code only appears to trigger it. The PoolOutputModule doesn't run because the module unsortedOfflinePrimaryVertices4D of type PrimaryVertexProducer starts on the stream but never stops.

Dr15Jones · 2017-02-18T15:46:38Z

@lgray The problem appears to be in
http://cmslxr.fnal.gov/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_0_pre4#0505

Our 200 pileup job appears to be stuck in this routine (or one it calls) for hours. The tracebacks we get after that time that include that module are:

#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a9ab in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5fd38 in DAClusterizerInZ_vect::purge(DAClusterizerInZ_vect::vertex_t&, DAClusterizerInZ_vect::track_t&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e63411 in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e34375 in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5e587 in DAClusterizerInZ_vect::update(double, DAClusterizerInZ_vect::track_t&, DAClusterizerInZ_vect::vertex_t&, bool, double&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e63183 in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e353b1 in DAClusterizerInZT::purge(std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a911 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341af in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e353b1 in DAClusterizerInZT::purge(std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, double&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a911 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c81 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c69 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a522 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478b04 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e342ca in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e343db in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478c69 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e3436c in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e341b4 in DAClusterizerInZT::e_ik(DAClusterizerInZT::track_t const&, DAClusterizerInZT::vertex_t const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e342cf in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e5e550 in DAClusterizerInZ_vect::update(double, DAClusterizerInZ_vect::track_t&, DAClusterizerInZ_vect::vertex_t&, bool, double&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e6343f in DAClusterizerInZ_vect::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e654d5 in DAClusterizerInZ_vect::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so

#4  0x00002afcef478ca9 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afcef478b00 in __ieee754_exp_avx () from /lib64/libm.so.6
#5  0x00002afd99e342da in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3aa14 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#8  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so


#4  0x00002afd99e342d5 in DAClusterizerInZT::update(double, std::vector<DAClusterizerInZT::track_t, std::allocator<DAClusterizerInZT::track_t> >&, std::vector<DAClusterizerInZT::vertex_t, std::allocator<DAClusterizerInZT::vertex_t> >&, double) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#5  0x00002afd99e3a940 in DAClusterizerInZT::vertices(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, int) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#6  0x00002afd99e3ba04 in DAClusterizerInZT::clusterize(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&) const () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/libRecoVertexPrimaryVertexProducer.so
#7  0x00002afd99e0f388 in PrimaryVertexProducer::produce(edm::Event&, edm::EventSetup const&) () from /scratch/gartung/CMSSW_9_0_0_pre4/lib/slc7_amd64_gcc530/pluginRecoVertexPrimaryVertexProducerPlugins.so

lgray · 2017-02-18T16:00:32Z

@Dr15Jones OK - interesting. I think this can be solved by introducing a max-iterations cut, since simulated annealing converges smoothly towards the end of the cooling process.
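
A minimal sketch of what a max-iterations cut on an iterative loop could look like, for illustration only. This is not the DAClusterizerInZT code: `iterateWithCap`, `LoopResult`, and all parameter names are hypothetical, and the update step is a stand-in lambda rather than the annealing update.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Outcome of an iteration-capped loop (hypothetical type, not from CMSSW).
struct LoopResult {
  double value;            // last value reached
  std::size_t iterations;  // number of update steps actually performed
  bool converged;          // true if the tolerance was met before the cap
};

// Repeatedly apply x <- step(x) until the change drops below `tolerance`
// or `maxIterations` steps have been taken, whichever comes first.
template <typename Step>
LoopResult iterateWithCap(Step step, double start, double tolerance, std::size_t maxIterations) {
  assert(tolerance > 0.0);
  double x = start;
  for (std::size_t i = 0; i < maxIterations; ++i) {
    const double next = step(x);
    const double delta = std::abs(next - x);
    x = next;
    if (delta < tolerance) {
      return {x, i + 1, true};
    }
  }
  // Cap reached: hand back the current estimate and let the caller decide.
  return {x, maxIterations, false};
}
```

Returning the `converged` flag instead of silently stopping would let the caller log or count the events that hit the cap, which seems useful when the slow cases are rare.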

Dr15Jones · 2017-02-18T17:20:19Z

@gartung in the two jobs I looked at, the event in question was event number 10 (which is the 7th in the input file). Could you point people to the input file used as well as the configuration file?

gartung · 2017-02-18T17:25:23Z

On cmslpc

/eos/uscms/store/user/gartung/step2/pu200/step2.root
/uscms_data/d2/gartung/tev/step3_RAW2DIGI_L1Reco_RECO_PU_64_64.py

Dr15Jones · 2017-02-18T20:42:27Z

@davidlange6 could we revert this change, since we discovered the problem has nothing to do with this pull request? This change does appear to have a significant impact on the threading efficiency for large numbers of threads.

Dr15Jones · 2017-02-18T21:54:17Z

I ran the job for 10 events (using 4 threads and 4 streams) on a standard Xeon system (cmslpc27) using the releases CMSSW_9_0_X_2017-02-17-2300 (which has the prefetching) and CMSSW_9_0_X_2017-02-18-1100 (which doesn't have the prefetching). Both ran to completion just fine, processing one of the events which had gotten stuck on the KNL system. While running those versions I noticed that the number of the modules was different, so the configuration did change slightly. This leads me to conclude that

the problem could be in how we build the code we are using, or
the problem could be in how the KNL cores vs. the Xeon cores process things (e.g. math functions), or
the change in the configuration avoids the problem.

In all of these cases, the prefetching plays no role in the problem.

Dr15Jones · 2017-02-18T22:34:55Z

I've now also run the configuration using CMSSW_9_0_0_pre4 with 4 threads/streams on a Xeon machine. The job finishes just fine.

Interestingly, vanilla pre4 also has a different module numbering scheme than the KNL test.

Dr15Jones · 2017-02-18T22:56:29Z

#17564 reinstates the prefetching

Dr15Jones · 2017-02-21T19:52:48Z

@lgray I was able to watch the job in the debugger. The code isn't in an infinite loop; it is just in an incredibly slowly converging loop (i.e. 8+ hours for one event). The job is 'stuck' in the purge while loop
http://cmslxr.fnal.gov/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_0_pre4#0587
When I checked, tks.size() = 1480 and y.size() = 1172. Then, more than 5 minutes later, y.size() = 1168.

gartung · 2017-02-21T20:00:47Z

Running with CMSSW_9_0_X_2017-02-21-1100 over the first 10 events the job completed in a reasonable amount of time. I am trying now with 320 events.

lgray · 2017-02-21T20:04:17Z

Hi Chris, We also found some numerical issues that might be related to this. Best, -Lindsey

Revert Provenance Prefetching

160b21b

cmsbuild added this to the Next CMSSW_9_0_X milestone Feb 17, 2017

cmsbuild added comparison-pending core-pending orp-pending pending-signatures tests-pending labels Feb 17, 2017

cmsbuild added core-approved fully-signed and removed core-pending pending-signatures labels Feb 17, 2017

cmsbuild added tests-started and removed tests-pending labels Feb 17, 2017

cmsbuild added tests-approved and removed tests-started labels Feb 17, 2017

cmsbuild added comparison-available and removed comparison-pending labels Feb 18, 2017

cmsbuild added orp-approved and removed orp-pending labels Feb 18, 2017

cmsbuild merged commit c00c4e8 into cms-sw:CMSSW_9_0_X Feb 18, 2017

Dr15Jones deleted the revertProveancePrefetch branch February 20, 2017 18:54

Dr15Jones mentioned this pull request Feb 22, 2017

Excessive time spent in PrimaryVertexProducer in 200PU #17604

Closed
