Excessive time spent in PrimaryVertexProducer in 200PU #17604
Comments
A new Issue was created by @Dr15Jones Chris Jones. @davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign upgrade |
New categories assigned: upgrade @kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@bendavid also |
Thanks. I suspect this might be due to numerical issues on the KNL compared to Xeon. Josh found a nice solution to those that we may want to try. It should be cherry-pickable so we could test it very quickly. |
On 2/22/17 6:56 AM, Chris Jones wrote:
Using a 200 pileup based file, we are finding that the module labelled
unsortedOfflinePrimaryVertices4D, which is of type
PrimaryVertexProducer, can run for an excessively long time. Using
CMSSW_9_0_X_2017-02-21-1100 on a KNL machine the module ran for 4 hours
before the job was killed.
This is a problem since we want to run this workflow on NERSC to test
algorithmic performance, but this algorithm makes the jobs prohibitively
long to run.
For tests you can switch to D4 geometry layout for the existing release.
Next would be to fix the tails in unsortedOfflinePrimaryVertices4D
We were able to isolate the problem to this line
https://github.com/cms-sw/cmssw/blob/CMSSW_9_0_X/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc#L587
The problem is that it takes way too long for purge to ever return false
in order to break out of that while loop.
|
For reference, this discussion originated in #17556 |
I ran the same event on a Xeon system and the module completed in a few minutes. It does appear that the problem is related to a difference between the KNL and Xeon systems when running this algorithm. Numeric instability could definitely trigger such a problem. @lgray @bendavid could either of you point us to the code modification that was mentioned? |
Yes, I'll send you the branch in a moment, unless Josh has already
cherry-picked the appropriate commits.
-L
|
I've seen (at least an earlier version) getting stuck on Intel as well
(whatever cmsdev02 was early last fall).
|
I noticed that you use exp in one case and std::exp in another case. The unscoped one will call the C exp function. Is that what was intended? |
Hi,
It should be uniformly called everywhere, in case we want to sub-in the vdt
libraries.
-Lindsey
|
Which CMSSW version are you presently using on the KNL system? (so I can
base this branch correctly)
|
There are quite a few instances of unscoped exp function calls. |
CMSSW_9_0_X_2017-02-21-1100 |
Hi,
Could you please try the branch lgray:fix_2d_vertex_stability
Best,
-Lindsey
|
There are also a few other simple things that could speed up the calculation.
Move the following outside of the loop: https://cmssdt.cern.ch/lxr/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_X_2017-02-22-0000#0100
Change the loop to be a range-based for, since you never use i for anything but tks[i]: https://cmssdt.cern.ch/lxr/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_X_2017-02-22-0000#0097
Move rho0*exp(-beta*dzCutOff_*dzCutOff_) to outside the loop: https://cmssdt.cern.ch/lxr/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_X_2017-02-22-0000#0217
Break out of the loop if ++nUnique >= 2, since k0 will not change after that happens: https://cmssdt.cern.ch/lxr/source/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc?v=CMSSW_9_0_X_2017-02-22-0000#0222
Making this more memory-access friendly would take more effort than these few changes. |
There is a version for 1D clustering that's been vectorized. I was
intending to use that at some point as a template to do the 2D clustering
as well (which should vectorize equally well). I believe in the end they
got a factor 2 out of it in speed.
|
Ran with just 2 events, one being a problematic event. It took over an hour to complete even with the patch.
TimeReport 0.000167 0.000167 0.000167 uncleanedOnlyMulti5x5SuperClustersWithPreshower
TimeReport 0.000119 0.000119 0.000119 uncleanedOnlyOutInConversionTrackProducer
TimeReport 0.001844 0.001844 0.001844 uncleanedOnlyPfConversions
TimeReport 0.429079 0.429079 0.429079 uncleanedOnlyPfTrack
TimeReport 0.000198 0.000198 0.000198 uncleanedOnlyPfTrackElec
TimeReport 96.963743 96.963743 96.963743 unsortedOfflinePrimaryVertices
TimeReport 21.858344 21.858344 21.858344 unsortedOfflinePrimaryVertices1D
TimeReport 4654.898395 4654.898395 4654.898395 unsortedOfflinePrimaryVertices4D
TimeReport 0.022437 0.022437 0.022437 vertexMerger
TimeReport 0.000000 0.000000 0.000000 zdcreco
TimeReport per event per exec per visit Name
T---Report end!
++ finished: end job
MemoryReport> Peak virtual size 5019.48 Mbytes
Key events increasing vsize:
[0] run: 0 lumi: 0 event: 0 vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[0] run: 0 lumi: 0 event: 0 vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[1] run: 1 lumi: 1 event: 127 vsize = 5019.48 deltaVsize = 0 rss = 4501.61 delta = 0
[0] run: 0 lumi: 0 event: 0 vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[0] run: 0 lumi: 0 event: 0 vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[0] run: 0 lumi: 0 event: 0 vsize = 0 deltaVsize = 0 rss = 0 delta = 0
[2] run: 1 lumi: 1 event: 125 vsize = 5019.48 deltaVsize = 0 rss = 3902.14 delta = -599.469
[1] run: 1 lumi: 1 event: 127 vsize = 5019.48 deltaVsize = 0 rss = 4501.61 delta = 0
TimeReport> Time report complete in 9773.01 seconds
Time Summary:
- Min event: 3728.18
- Max event: 8907.42
- Avg event: 6317.8
- Total loop: 9769.8
- Total job: 9773.01
Event Throughput: 0.000204712 ev/s
CPU Summary:
- Total loop: 17734.8
- Total job: 17737.9
=============================================
|
Trying this with vdt::fast_exp in place of std::exp. |
Sadly it made it worse.
TimeReport 97.496880 97.496880 97.496880 unsortedOfflinePrimaryVertices
T---Report end!
++ finished: end job
============================================= |
For reference, on the Xeon system the module took 869 seconds to complete that event. Given the speed differences between the Xeon and KNL I'd say they are roughly the same performance. |
If you add in my commits from #17622
you should see an even better improvement.
The same issue was present in the vectorized code.
On Thu, Feb 23, 2017 at 3:10 PM, Patrick Gartung ***@***.***> wrote:
64s/64t 128 events on KNL. Much better...
TimeReport 0.303056 0.303056 0.303056 uncleanedOnlyPfTrackElec
TimeReport 114.332794 114.332794 114.332794 unsortedOfflinePrimaryVertices
TimeReport 23.326733 23.326733 23.326733 unsortedOfflinePrimaryVertices1D
TimeReport 3.480172 3.480172 3.480172 unsortedOfflinePrimaryVertices4D
TimeReport 0.087038 0.087038 0.087038 vertexMerger
TimeReport 0.000000 0.000000 0.000000 zdcreco
TimeReport per event per exec per visit Name
T---Report end!
MemoryReport> Peak virtual size 110867 Mbytes
Key events increasing vsize:
[6] run: 1 lumi: 1 event: 12 vsize = 57902.4 deltaVsize = 1591 rss = 42653
delta = 1129.43
[4] run: 1 lumi: 1 event: 26 vsize = 55452.2 deltaVsize = 2076.75 rss =
41523.5 delta = 1572.27
[8] run: 1 lumi: 1 event: 42 vsize = 60788.4 deltaVsize = 2886 rss =
44623.2 delta = 3099.68
[12] run: 1 lumi: 1 event: 17 vsize = 68068.3 deltaVsize = 2743.5 rss =
50991.2 delta = 6368.01
[10] run: 1 lumi: 1 event: 63 vsize = 64189.1 deltaVsize = 2296.86 rss =
48034.1 delta = 3410.86
[127] run: 1 lumi: 1 event: 123 vsize = 110867 deltaVsize = 0 rss =
15993.3 delta = -2678.49
[126] run: 1 lumi: 1 event: 113 vsize = 110867 deltaVsize = 0 rss =
17400.2 delta = -1271.62
[125] run: 1 lumi: 1 event: 127 vsize = 110867 deltaVsize = 0.25 rss =
18671.8 delta = -4313.08
TimeReport> Time report complete in 7727.02 seconds
Time Summary:
- Min event: 1051.5
- Max event: 4408.94
- Avg event: 2646.23
- Total loop: 7711.99
- Total job: 7727.02
Event Throughput: 0.0165975 ev/s
CPU Summary:
- Total loop: 271172
- Total job: 271187
StallMonitor> Module label # of stalls Total stalled time Max stalled time
StallMonitor> ------------ ----------- ------------------ ----------------
StallMonitor> AODSIMoutput 122 91798 s 1121.46 s
|
Xeon with #17622:
TimeReport 18.831715 18.831715 18.831715 unsortedOfflinePrimaryVertices
T---Report end!
++ finished: end job
|
Using range-for statements where possible might improve #17622 slightly. |
With #17622 on Xeon:
With the old algorithm on Xeon:
With Patrick's full changes (which include changing to range-based for loops):
So clearly the range-based for loops are an essential component. |
So the improvement comes from the range fors and not the loop clipping?
Somehow that doesn't strike me as true.
|
The range-based for may make it much easier for the optimizer. |
I am double-checking the test with no modifications now. It is possible that the compiler knows how to vectorize the range-for statement. |
Jesus take the wheel...
|
The loop clipping happened in purge, but all the debugging and profiling we did said the algorithm was spending the absolute majority of its time in update. |
OK, there's also a bug: Zi is never reset per track, when it should be.
https://github.com/gartung/cmssw/blob/b061093a4212690088b9f4273223da6fcc9991c1/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc#L95
|
Updated #17622, making sure to move the initial value of Zi out of
the loop and to reset it each time to the correct value.
Please give it a try.
This was already done in the vectorized version of the code, so no speed-up
is expected there.
|
The clang-modernize extension can change for loops to range-for loops automatically when appropriate, i.e. when the index value is not referenced. |
Results are the same as before with your branch. I can only speculate that I created a vectorized for loop somewhere. I will look at the difference in the assembly generated by the compiler for your file and mine.
TimeReport 18.140637 18.140637 18.140637 unsortedOfflinePrimaryVertices
TimeReport 4.118666 4.118666 4.118666 unsortedOfflinePrimaryVertices1D
TimeReport 519.004210 519.004210 519.004210 unsortedOfflinePrimaryVertices4D
TimeReport 0.005757 0.005757 0.005757 vertexMerger
T---Report end!
++ finished: end job
TimeReport> Time report complete in 1276.2 seconds
|
Oh well, speed improvement was a bug.
|
KNL 64/64 128 events. One event lasted 10 hours. Logs and stall graph on cmslpc at /uscms_data/d2/gartung/132414.tev.fnal.gov
TimeReport 109.267564 109.267564 109.267564 unsortedOfflinePrimaryVertices
TimeReport 22.353710 22.353710 22.353710 unsortedOfflinePrimaryVertices1D
TimeReport 1181.681153 1181.681153 1181.681153 unsortedOfflinePrimaryVertices4D
T---Report end!
MemoryReport> Peak virtual size 116593 Mbytes
TimeReport> Time report complete in 30722.7 seconds
- Min event: 1300.36
- Max event: 29809.4
StallMonitor> Module label # of stalls Total stalled time Max stalled time
StallMonitor> AODSIMoutput 102 |
Fixing the bug on my branch and running on Xeon, performance is worse than the original.
TimeReport 18.888834 18.888834 18.888834 unsortedOfflinePrimaryVertices
T---Report end!
++ finished: end job
|
At this point I'd have to look in detail at what is going on with the
convergence of the algorithm.
If it turns out that a large portion of the vertices are staying the same
iteration to iteration in these slowly converging events, then caching
might be a way around it (if vertex has been altered or is new recalculate,
else use cached value), since the energy of the system is defined by the
vertex state vector and not the tracks (which are immutable anyway).
I can give it a try but a fix may take some time.
|
I also ran variations of the implementation last night.
I then compared the values in the output ROOT file for the offlinePrimaryVertices4D branch (since that branch uses as input the results from unsortedOfflinePrimaryVertices4D). The original, 1. and 2. versions of the algorithm all gave different results. I expected the original to be different (because of the change in combining |
I think one of the problems is the behavior of That just compounds the problem where a job with a large number of vertices takes longer to calculate each call to |
As a test, I changed the return value of The returned result had 8 additional vertices but the distributions looked to be pretty much the same. |
As one last test, for all cases where there was a comparison TimeReport 1208.423231 1208.423231 1208.423231 unsortedOfflinePrimaryVertices4D |
+1 |
This issue is fully signed and ready to be closed. |
resolved |
Using a 200 pileup based file, we are finding that the module labelled unsortedOfflinePrimaryVertices4D, which is of type PrimaryVertexProducer, can run for an excessively long time. Using CMSSW_9_0_X_2017-02-21-1100 on a KNL machine the module ran for 4 hours before the job was killed.
This is a problem since we want to run this workflow on NERSC to test algorithmic performance, but this algorithm makes the jobs prohibitively long to run.
We were able to isolate the problem to this line
https://github.com/cms-sw/cmssw/blob/CMSSW_9_0_X/RecoVertex/PrimaryVertexProducer/src/DAClusterizerInZT.cc#L587
The problem is that it takes way too long for purge to ever return false in order to break out of that while loop.