HPX performance degrades with time since execution begins #1753

Closed
dmarce1 opened this issue Sep 17, 2015 · 34 comments

@dmarce1
Member

dmarce1 commented Sep 17, 2015

In the runs where octotiger runs for a bit before crashing or locking up, I am noticing that the performance gets worse as time progresses.

Example: http://s7.postimg.org/5x1kah1wb/degrade.png

This seems to be the case for every run I have looked at, each with 100 or 128 processors on SuperMIC. There is no computational reason this should be happening - the number of floating point operations required for each time-step is nearly constant.

@sithhell
Member

How do I run the code with a different number of timesteps?

@neutrino504

The code executes forever until it crashes. Do you want to set a maximum number? The main loop is in main.cpp. Have you been able to execute it at all? You can change the maximum refinement level with MAX_LEVEL in defs.hpp

@sithhell
Member

On Sep 22, 2015, at 5:14 PM, "Neutrino" notifications@github.com wrote:

> The code executes forever until it crashes. Do you want to set a maximum
> number? The main loop is in main.cpp. Have you been able to execute it at
> all? You can change the maximum refinement level with MAX_LEVEL in defs.hpp

Yes, I'm able to build and run the code. I was just wondering how you obtained the numbers for the different time steps.

@sithhell
Member

@dmarce1, how do I get the output showing how long a single timestep took?

@dmarce1
Member Author

dmarce1 commented Sep 24, 2015

There should be a line of output for every timestep showing

step number - current simulation time - simulation timestep size - wall clock time for the timestep

The same numbers are written to "step.dat".
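A quick way to visualize the degradation is to plot the wall clock time (fourth column) against the step number (first column). A minimal sketch using gnuplot, assuming step.dat is whitespace-delimited in the column order listed above:

    # plot per-step wall clock time (column 4) against step number (column 1)
    gnuplot -p -e "set xlabel 'timestep'; set ylabel 'wall clock [s]'; plot 'step.dat' using 1:4 with lines title 'wall clock per step'"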

@sithhell
Member

I was able to successfully run your code with #1772 now. There were a bunch of problems in your code as well. I'll open a PR ASAP (see #1752 for more information). What you can see is that I can't reproduce your regression in the wall clock time of the timesteps:

[plot: wall clock time per step]

However, plotting dt gives a similar graph to what you showed:

[plot: dt]

@dmarce1
Member Author

dmarce1 commented Sep 29, 2015

I think if you replot the wall clock time plot with the yrange set to something like [1.5:2.0] you'll get a similar picture to what I got.
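For example, with gnuplot (assuming the step.dat layout described above, with the wall clock time in column 4):

    gnuplot -p -e "set yrange [1.5:2.0]; plot 'step.dat' using 1:4 with lines title 'wall clock per step'"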

@sithhell
Member

You are most certainly correct; the y-axis scale gave a misleading impression. So we see a slowdown of a factor of roughly 1.4. To give a better impression, I plotted the relative slowdown (omitting the peaks where the checkpoint data is being written):

[plot: relative slowdown]

So we are getting down to about 70% of the initial performance... Does this have to be caused by HPX, or could it also be caused by the algorithm used (work imbalances, etc.)?
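Such a relative-slowdown curve can be produced by normalizing each step's wall clock time by that of an early step. A minimal sketch, assuming the whitespace-delimited step.dat layout from above (the checkpoint peaks would still have to be filtered out separately):

    # print step number and wall clock time relative to the first step
    awk 'NR == 1 { base = $4 } { print $1, $4 / base }' step.dat > relative.dat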

@hkaiser
Member

hkaiser commented Sep 29, 2015

We're currently investigating this issue in another application and we think we have some hints about what's causing it. I hope to have more information available soon.

@hkaiser
Member

hkaiser commented Oct 1, 2015

This should be fixed once #1775 is merged.

@hkaiser
Member

hkaiser commented Oct 24, 2015

@dmarce1 #1811 fixed another set of problems causing slowdowns. Please report back whether this issue can be closed now.

@hkaiser
Member

hkaiser commented Nov 5, 2015

@dmarce1 Can this be closed now? From what I understand these issues should have been solved by now.

@hkaiser
Member

hkaiser commented Nov 7, 2015

I'm closing this now; please reopen if appropriate.

@dmarce1
Member Author

dmarce1 commented Dec 21, 2015

I am still getting performance degradation over time. Here is a plot:

http://s17.postimg.org/6h8qkbvn3/performance.png

This is with 64 nodes on SuperMIC, using a version of octotiger that doesn't rely on serialization of std::array: dmarce1/octotiger@7214efe

@dmarce1
Member Author

dmarce1 commented Dec 22, 2015

Here's a longer plot:

[plot: wall clock time per step over the longer run]

The little spikes aren't a problem; those are steps where octotiger writes output to the file system. The wall clock time for non-output steps increases linearly up to a point and then levels off. There are small changes in the total load over the run as AMR refinement grids are created and destroyed, but this is less than +/- 10% of the starting load - not nearly enough to account for this.

Could this be a problem in octo-tiger itself, and if so, what would it look like?

On a positive note, this run represents the longest I have ever run octo-tiger using distributed HPX without a crash or freeze-up.

@sithhell
Member

Is this with a release build?
Which column represents the wall clock time of a step?

@dmarce1
Member Author

dmarce1 commented Dec 22, 2015

This is with a debug build. Each data point is one timestep, with the y-axis being the wall clock time for that timestep in seconds, and the x-axis being the number of the timestep since the start of the evolution.

It's as if there is some resource not being freed.

@sithhell
Member

> This is with a debug build. Each data point is one timestep, with the y-axis being the wall clock time for that timestep in seconds, and the x-axis being the number of the timestep since the start of the evolution.

Sorry, I meant the output in octotiger...

@dmarce1
Member Author

dmarce1 commented Dec 22, 2015

My bad - wall clock time is the 4th column in step.dat; the timestep number is the 1st column.

Whatever is causing the issue doesn't appear to be taking up much memory. I started the run over and have been tracking how much memory is used; it hasn't changed much, while the wall clock time per step has already gone up to roughly 10x its original value.

@hkaiser
Member

hkaiser commented Dec 22, 2015

Frankly, I wouldn't draw any conclusions from performance measurements done in debug-builds.
HPX is doing a lot of book-keeping and other additional things when compiled in debug-mode which might cause the slowdown you're seeing. Could you please verify whether the slowdown is still happening when run in release-mode?

@dmarce1
Member Author

dmarce1 commented Dec 22, 2015

I am running in release mode now and still getting the same problem.

@dmarce1
Member Author

dmarce1 commented Jan 4, 2016

Update:

I have been looking at performance counters and how their per-timestep values change as the run progresses; I am resetting the counters at each time-step. This is what I have found:

Counters with a normalized increase over time similar to the normalized increase over time for wall clock per time-step:

/threads/count/cumulative-phases
/threads/count/pending-accesses
/threads/count/pending-misses

Counters that increase over time, but at a slower rate:
/threads/count/cumulative
/threads/count/stack-recycles
/threads/count/stack-unbinds
/threads/count/stolen-from-pending

Counters that do not change much over time:
/threads/count/instantaneous/
/threads/count/objects

Counters that increase a little bit at first, then come back down, then go back up again:
/threads/count/stolen-from-staged
/threads/count/stolen-to-staged

I have not tested any other counters. Can anyone suggest which might be the most useful?
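For reference, this is roughly how per-interval samples of these counters can be collected from the command line. A minimal sketch, assuming a recent HPX (the exact option names may differ between versions) and a hypothetical output file counters.csv:

    # sample the listed counters every 1000 ms and append the values to counters.csv
    ./octotiger <application arguments> --hpx:threads=20 \
        --hpx:print-counter=/threads/count/cumulative-phases \
        --hpx:print-counter=/threads/count/pending-accesses \
        --hpx:print-counter=/threads/count/pending-misses \
        --hpx:print-counter-interval=1000 \
        --hpx:print-counter-destination=counters.csv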

@hkaiser
Member

hkaiser commented Jan 4, 2016

When using the plot script I linked above there wouldn't be any need to reset counters (resetting might be tricky, as it is a 'global' operation which does not take computational phases into account).

As for other counters you might want to look at, I'd suggest the action invocation counts:

/runtime/count/action_invocation@*
/runtime/count/remote_action_invocation@*

which count all (local and remote) action invocations individually. Note that remote action invocations are counted as local as well (on the locality where they are run).

Also, can you provide a link to those graphs you described above? The increases for those counters make sense, as all of them are related to thread execution. Everything hints at an overall increase of some operation counts. The action counters above might give us a better (higher-level) insight.
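Building on the hypothetical command line sketched earlier, these counters could be requested like this (the wildcard should be quoted so the shell does not expand it):

    ./octotiger <application arguments> \
        --hpx:print-counter='/runtime/count/action_invocation@*' \
        --hpx:print-counter='/runtime/count/remote_action_invocation@*' \
        --hpx:print-counter-interval=1000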

@dmarce1
Member Author

dmarce1 commented Jan 5, 2016

I have found evidence of the culprit. When I turn off AGAS caching, the initial performance is of course a little worse - but it does not degrade with time.

BTW - I had some trouble getting the python script to run at first, so above I was just looking at the output csv with gnuplot. I'm able to get the script to run partially now, except for some reason it does not print all of the counters that are output.
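For reference, AGAS caching can be toggled through the runtime configuration. A minimal sketch, assuming the hpx.agas.use_caching configuration key (the exact key may differ between HPX versions):

    # disable the local AGAS address cache for this run
    ./octotiger <application arguments> --hpx:ini=hpx.agas.use_caching=0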

@dmarce1
Member Author

dmarce1 commented Jan 5, 2016

Attached is a zip file with results from the agas cache performance counters when agas caching is turned "on".

plots.zip

Here is what appears to me to be the only plot of note:

[plot: /agas/time/cache/get_entry]

The relative increase in time it takes for "get_entry" to run is roughly proportional to the increase in wall clock per timestep over that same time period.
[plot: wall clock time per step]

@hkaiser
Member

hkaiser commented Jan 6, 2016

The first plot makes total sense, as the counter /agas/time/cache/get_entry is cumulative; I'd expect it to go up linearly. Were you resetting counters while gathering that data?
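Since the counter is cumulative, per-step costs can also be recovered offline by differencing consecutive samples instead of resetting. A minimal sketch, assuming the hypothetical counters.csv from above with the counter value in column 5 (adjust to the actual layout):

    # print the sample index and the increase of the cumulative counter since the previous sample
    awk -F, '{ if (prev != "") print NR, $5 - prev; prev = $5 }' counters.csv > get_entry_diff.dat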

@dmarce1
Member Author

dmarce1 commented Jan 6, 2016

Yes - I am resetting the counters every time-step. The plot should be roughly horizontal in that case, no?

@hkaiser
Member

hkaiser commented Jan 6, 2016

Could you rerun without resetting counters, please? I'd like to exclude the possibility of problems related to that.

@dmarce1
Member Author

dmarce1 commented Jan 7, 2016

Here is the re-run without resetting. Note I am using 'large' stacks, and output is turned on (it was turned off above); this is why there are spikes in the 'diff' plot.

[plot: per-step differences of /agas/time/cache/get_entry]
[plot: /agas/time/cache/get_entry, cumulative]

@hkaiser
Member

hkaiser commented Jan 9, 2016

I think I know what's going on. @dmarce1 How many overall localities/cores were you running on for the latest graph above?

@dmarce1
Member Author

dmarce1 commented Jan 11, 2016

64 nodes, 20 cores per node, 1 locality per node, 20 threads per locality.
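For context, this is one way that configuration could be expressed at launch time; a hypothetical sketch, assuming an MPI-style launcher (the actual launcher and flags depend on the MPI stack and batch system on SuperMIC):

    # 64 localities, one per node, 20 worker threads per locality
    mpirun -np 64 -ppn 1 ./octotiger --hpx:threads=20 <application arguments>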

@dmarce1
Member Author

dmarce1 commented Jan 22, 2016

Is someone working on this?

@hkaiser
Member

hkaiser commented Jan 23, 2016

Yes, I think I have a solution, working on it.

@hkaiser
Member

hkaiser commented Feb 22, 2016

@dmarce1 reported today that he believes the slowdown has been fixed.

hkaiser closed this as completed on Feb 22, 2016