New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

VTK Performance Improvements #1637

Merged
merged 10 commits into from Nov 4, 2015

Conversation

Projects
None yet
5 participants
@chaosphere2112
Contributor

chaosphere2112 commented Oct 26, 2015

This pull request addresses performance issues experienced in the VCS interactive features.

TL;DR

I made significant cuts to the number of renderers and actors (we now use fewer than in 2.2), but performance of prop picking is still worse than in 2.2. Need help getting it lower, @sankhesh @aashish24 @doutriaux1 @jbeezley.

The Problem

The problem boils down to determining what exactly is under the mouse. To do that, I use VTK's vtkPropPicker, which, as far as I can tell, is the best tool for determining what actors are at a given (X, Y) coordinate. This object's PickProp method is called on every mouse move event and every mouse click event (this can be optimized out, since a mouse move will always happen before a mouse click, but is such a small fraction of the events triggering the PickProp that it's basically irrelevant).

The amount of time we spend hunting for actors has ballooned dramatically since 2.2; using methods detailed below, I discovered that the amount of time we spend in actor_at_point has nearly doubled. There are two main axes which we can adjust to effect the numbers on the profile; the number of renderers (each renderer has PickProp called on it separately) and the number of props. I discovered through experimentaiton the PickProp's performance is linear with regards to the number of props, so reducing the number of renderers doesn't accomplish anything besides decreasing the total number of calls to PickProp (and maybe some fringe benefits with regards to render time that I haven't explored too much). That means that as far as I can tell, the only way to decrease the time spent picking is to reduce the total number of props in use. So, below is the detailed rundown of how I went about making some improvements to the performance of this function.

Methodology

import vcs, cdms2

x = vcs.init()
try:
    f = cdms2.open(vcs.sample_data + '/clt.nc')
except AttributeError:
    # vcs.sample_data is new in 2.4
    f = cdms2.open(vcs.prefix + "/sample_data/clt.nc")
s = f('clt')

import cProfile
x.bgX = 1536
x.bgY = 1186
x.plot(s)
x.configure()
s = x.backend.renWin.GetSize()

p = cProfile.Profile()
p.enable()
for i in range(s[0]):
    x1, y1 = i, i % s[1]
    a = x.configurator.actor_at_point(x1, y1)
p.disable()
p.print_stats(sort="tottime")

This calls the relevant function (actor_at_point) a bunch of times, descending across to the right (wraps around eventually).

2.2.0 Profile

This release had quite tolerable performance. I'd say it's basically the goal.

Dividing ncalls of PickProp by ncalls of actor_at_point, we get 17 renderers in use.

ncalls tottime percall cumtime percall filename:lineno(function)
26112 72.052 0.003 72.211 0.003 {built-in method PickProp}
1536 0.264 0.000 72.720 0.047 configurator.py:275(actor_at_point)
30720 0.124 0.000 0.124 0.000 {built-in method GetNextItem}
52224 0.091 0.000 0.159 0.000 manager.py:38(__place)
52224 0.067 0.000 0.067 0.000 {built-in method GetSize}
29184 0.038 0.000 0.048 0.000 manager.py:144(get_manager)
30720 0.036 0.000 0.163 0.000 vcs2vtk.py:1466(vtkIterate)
29184 0.014 0.000 0.014 0.000 {built-in method GetLayer}
1536 0.013 0.000 0.013 0.000 {built-in method GetRenderers}
29184 0.010 0.000 0.010 0.000 {method 'get' of 'dict' objects}
1536 0.004 0.000 0.006 0.000 configurator.py:112(render_window)
1536 0.002 0.000 0.002 0.000 {built-in method InitTraversal}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
815 0.002 0.000 0.002 0.000 {built-in method GetViewProp}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

2.4-RC2 Profile

This is where we're at right now. It takes a smidge less than twice as long to do the same task, which is super annoying while you're mousing around. You'll notice that the percall time has increased, as well as the ncalls time. The reason ncalls has gone up is because we are operating with quite a lot more renderers, and we're picking across all of them. This means that the function PickProp, which has gotten slower since 2.2.0, will take longer and is called more. 馃槥

Dividing ncalls of PickProp by ncalls of actor_at_point, we get 23 renderers in use.

ncalls tottime percall cumtime percall filename:lineno(function)
35311 129.447 0.004 129.673 0.004 {built-in method PickProp}
1536 0.335 0.000 130.356 0.085 configurator.py:301(actor_at_point)
39936 0.191 0.000 0.191 0.000 {built-in method GetNextItem}
70622 0.128 0.000 0.226 0.000 manager.py:49(__place)
70622 0.098 0.000 0.098 0.000 {built-in method GetSize}
38400 0.049 0.000 0.062 0.000 manager.py:158(get_manager)
39936 0.047 0.000 0.240 0.000 vcs2vtk.py:1670(vtkIterate)
38400 0.021 0.000 0.021 0.000 {built-in method GetLayer}
1536 0.014 0.000 0.014 0.000 {built-in method GetRenderers}
38400 0.013 0.000 0.013 0.000 {method 'get' of 'dict' objects}
837 0.004 0.000 0.004 0.000 {built-in method GetViewProp}
1536 0.004 0.000 0.006 0.000 configurator.py:113(render_window)
1536 0.002 0.000 0.002 0.000 {built-in method InitTraversal}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Reduced Renderers Profile

I started with the low-hanging fruit; I dramatically reduced the number of times we're going to call PickProp by reducing the number of renderers we're using. This dropped the ncalls to even lower than 2.2.0 馃憤 , but it tripled the percall time 馃憥 ; looks like PickProp is roughly linear with the number of props. So, we're going to have to examine the use of props.

Dividing ncalls of PickProp by ncalls of actor_at_point, we get 8 renderers in use.

ncalls tottime percall cumtime percall filename:lineno(function)
12279 125.505 0.010 125.597 0.010 {built-in method PickProp}
1536 0.159 0.000 125.868 0.082 configurator.py:301(actor_at_point)
24558 0.054 0.000 0.092 0.000 manager.py:49(__place)
24558 0.038 0.000 0.038 0.000 {built-in method GetSize}
16896 0.023 0.000 0.046 0.000 vcs2vtk.py:1670(vtkIterate)
15360 0.022 0.000 0.029 0.000 manager.py:158(get_manager)
16896 0.021 0.000 0.021 0.000 {built-in method GetNextItem}
1536 0.017 0.000 0.017 0.000 {built-in method GetRenderers}
15360 0.010 0.000 0.010 0.000 {built-in method GetLayer}
15360 0.006 0.000 0.006 0.000 {method 'get' of 'dict' objects}
815 0.004 0.000 0.004 0.000 {built-in method GetViewProp}
1536 0.004 0.000 0.006 0.000 configurator.py:113(render_window)
1536 0.003 0.000 0.003 0.000 {built-in method InitTraversal}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Prop Analysis

import vcs, cdms2

x = vcs.init()
try:
    f = cdms2.open(vcs.sample_data + '/clt.nc')
except AttributeError:
    f = cdms2.open(vcs.prefix + "/sample_data/clt.nc")
s = f('clt')

x.bgX = 1536
x.bgY = 1186
x.plot(s)

actors = []
rens = []
for ren in vcs.vcs2vtk.vtkIterate(x.backend.renWin.GetRenderers()):
    for act in vcs.vcs2vtk.vtkIterate(ren.GetActors()):
        actors.append(act)
    rens.append(ren)

print "# Actors:", len(actors)
print "# Renderers:", len(rens)
Version Actor Count Renderer Count
2.2.0 62 17
2.4-RC2 286 23
Reduced Renderers 286 8

Uh, wow. That's a big uptick in actors. Decided to double check my renderer count just in case the math was fuzzy above.

Alright, so where are those coming from? Guess it's time to dive into the types of actors in use...

There wasn't any easy way to gather this info externally, so I just added some print statements to every place I found that created a vtkActor() in the codebase and tallied them up afterwards.

2.2.0

Actor Source Count
vcs2vtk.prepLine 59
vcs2vtk.prepFillarea 1
VTKPlots.plot2D 1
VTKPlots.plotContinents 1

So, a lot of lines, and a few other things. Not too bad.

2.4-RC2

Actor Source Count
vcs2vtk.prepLine 59
vcs2vtk.prepFillarea 225
boxfillpipeline._plotInternal 1
VTKPlots.plotContinents 1

Whoa

The code in prepFillarea was changed during the pattern/hatch PR to create a separate vtkPolyData/vtkPolyDataMapper/vtkActor per filled box, which is why we have so many actors. I cleaned that up a bit (now will only make one for each patterned fill and one that does all of the "solid" fills). This brought our actor count back down to where it was for 2.2, but, sadly, the performance gains did not keep pace.

Reduced Renderers + FillArea Refactor Actor Counts

Actor Source Count
vcs2vtk.prepLine 59
vcs2vtk.prepFillarea 1
boxfillpipeline._plotInternal 1
VTKPlots.plotContinents 1

Reduced Renderers + FillArea Refactor Profile

ncalls tottime percall cumtime percall filename:lineno(function)
12279 106.924 0.009 107.009 0.009 {built-in method PickProp}
1536 0.157 0.000 107.275 0.070 configurator.py:301(actor_at_point)
24558 0.050 0.000 0.085 0.000 manager.py:49(__place)
24558 0.035 0.000 0.035 0.000 {built-in method GetSize}
15360 0.023 0.000 0.029 0.000 manager.py:158(get_manager)
16896 0.023 0.000 0.045 0.000 vcs2vtk.py:1687(vtkIterate)
16896 0.019 0.000 0.019 0.000 {built-in method GetNextItem}
1536 0.016 0.000 0.016 0.000 {built-in method GetRenderers}
15360 0.009 0.000 0.009 0.000 {built-in method GetLayer}
15360 0.006 0.000 0.006 0.000 {method 'get' of 'dict' objects}
1536 0.004 0.000 0.006 0.000 configurator.py:113(render_window)
815 0.004 0.000 0.004 0.000 {built-in method GetViewProp}
1536 0.003 0.000 0.003 0.000 {built-in method InitTraversal}
1536 0.003 0.000 0.003 0.000 {built-in method GetRenderWindow}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

Alright, we shaved 25 seconds off of the execution time of the test, and .001 seconds off the percall time. That is still a pretty sizable gap between here and 2.2 performance, though...

Line Polydata Optimization

So, let's start getting wild and crazy and actually try and be better behaved than 2.2. 59 lines is a lot of actors/polydata. How about we combine a few of those? As is, every single line that is drawn on screen gets its own actor/polydata pipeline. I merged them down so that each vcs Line that is passed in will only generate one actor per line width and line pattern (since those are set on the actor itself). In practice, this pretty much means one actor per vcs Line object plotted.

This shaves another 11 seconds off of the testcase.

Line Polydata Optimization Actor Counts

Actor Source Count
vcs2vtk.prepLine 6
vcs2vtk.prepFillarea 1
boxfillpipeline._plotInternal 1
VTKPlots.plotContinents 1

Line Polydata + Reduced Renderers + Fillarea Polydata Profile

ncalls tottime percall cumtime percall filename:lineno(function)
12279 95.333 0.008 95.415 0.008 {built-in method PickProp}
1536 0.150 0.000 95.668 0.062 configurator.py:301(actor_at_point)
24558 0.048 0.000 0.082 0.000 manager.py:49(__place)
24558 0.034 0.000 0.034 0.000 {built-in method GetSize}
15360 0.021 0.000 0.027 0.000 manager.py:158(get_manager)
16896 0.019 0.000 0.040 0.000 vcs2vtk.py:1711(vtkIterate)
1536 0.018 0.000 0.018 0.000 {built-in method GetRenderers}
16896 0.017 0.000 0.017 0.000 {built-in method GetNextItem}
15360 0.008 0.000 0.008 0.000 {built-in method GetLayer}
15360 0.006 0.000 0.006 0.000 {method 'get' of 'dict' objects}
1536 0.004 0.000 0.006 0.000 configurator.py:113(render_window)
815 0.004 0.000 0.004 0.000 {built-in method GetViewProp}
1536 0.003 0.000 0.003 0.000 {built-in method InitTraversal}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

So now I've reduced the number of actors to as few as possible, reduced the number of renderers to as few as possible, and performance is still worse than it was (and at this point I would expect it to be better). I'm not sure where to go from here to get those last 20 seconds of execution time back, so I'm opening it up to suggestions.

@sankhesh

This comment has been minimized.

Contributor

sankhesh commented Oct 27, 2015

馃弳 @chaosphere2112 Great job with the analyses.

An alternative approach, I would suggest, is just getting the world position of the picked point from the interactor and then looping through the possible actors to determine if they would be of interest. This eliminates the n calls to calculate the world pick position (this is done internally in PickProp) which does a bunch of coordinate transform calculations. You could also fine tune which props to select from using this approach.

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 27, 2015

@sankhesh

I gathered up the actors we're checking for and passed them in as a vtkPropCollection to the PickProp call and performance held steady; I'll give your alternative a try next.

ncalls tottime percall cumtime percall filename:lineno(function)
12288 97.684 0.008 97.771 0.008 {built-in method PickProp}
1536 0.236 0.000 98.119 0.064 configurator.py:301(actor_at_point)
24576 0.053 0.000 0.088 0.000 manager.py:49(__place)
24576 0.035 0.000 0.035 0.000 {built-in method GetSize}
16896 0.021 0.000 0.021 0.000 {built-in method GetNextItem}
16896 0.019 0.000 0.043 0.000 vcs2vtk.py:1711(vtkIterate)
15360 0.019 0.000 0.024 0.000 manager.py:158(get_manager)
1536 0.010 0.000 0.010 0.000 {built-in method GetRenderers}
15360 0.008 0.000 0.008 0.000 {built-in method GetLayer}
13824 0.008 0.000 0.008 0.000 {built-in method AddItem}
26112 0.006 0.000 0.006 0.000 displayplot.py:212(_get_backend)
15360 0.005 0.000 0.005 0.000 {method 'get' of 'dict' objects}
13824 0.005 0.000 0.005 0.000 {built-in method IsA}
1536 0.003 0.000 0.005 0.000 configurator.py:113(render_window)
1536 0.003 0.000 0.003 0.000 {built-in method InitTraversal}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
@sankhesh

This comment has been minimized.

Contributor

sankhesh commented Oct 27, 2015

@chaosphere2112 Yeah I would've expected that. Each renderer internally maintains a vtkPropCollection too.

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 27, 2015

@sankhesh I managed to shave off a considerable chunk of execution time by running through the significant actors and checking if any of them are in the renderer prior to using PickProp; it's well below 2.2 levels now, so I'm probably happy with that. Now I just need to fix up the busted tests.

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 27, 2015

ncalls tottime percall cumtime percall filename:lineno(function)
3072 48.615 0.016 48.646 0.016 {built-in method PickProp}
1536 0.220 0.000 48.986 0.032 configurator.py:301(actor_at_point)
115200 0.037 0.000 0.037 0.000 {built-in method HasViewProp}
6144 0.019 0.000 0.031 0.000 manager.py:49(__place)
1536 0.016 0.000 0.016 0.000 {built-in method GetRenderers}
16896 0.015 0.000 0.015 0.000 {built-in method GetNextItem}
16896 0.014 0.000 0.032 0.000 vcs2vtk.py:1711(vtkIterate)
6144 0.012 0.000 0.012 0.000 {built-in method GetSize}
26112 0.005 0.000 0.005 0.000 displayplot.py:212(_get_backend)
3072 0.005 0.000 0.008 0.000 manager.py:158(get_manager)
13824 0.005 0.000 0.005 0.000 {built-in method IsA}
13824 0.004 0.000 0.004 0.000 {method 'append' of 'list' objects}
1536 0.003 0.000 0.005 0.000 configurator.py:113(render_window)
1536 0.003 0.000 0.003 0.000 {built-in method InitTraversal}
3072 0.002 0.000 0.002 0.000 {method 'get' of 'dict' objects}
1536 0.002 0.000 0.002 0.000 {built-in method GetRenderWindow}
3072 0.002 0.000 0.002 0.000 {built-in method GetLayer}
9 0.000 0.000 0.000 0.000 {built-in method GetViewProp}
1 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
@sankhesh

This comment has been minimized.

Contributor

sankhesh commented Oct 27, 2015

馃憦

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 29, 2015

OK, so now it should pass all but three tests. @doutriaux1 and I had a conversation yesterday, and we've decided that we're going to keep pattern behaving in the way it did in the old UV-CDAT (where patterns are always black and white), and make the opacity attributes apply the same way they do for hatches (essentially making pattern a hatch that is forced to be black). I also had to revert a minor optimization (condensed renderers) due to some small graphical glitches (like, say, losing isoline labels, which I'm sure Jerry would have loved). I'm uploading new baselines shortly, and I'll send up another commit to retrigger the builds.

@sankhesh

This comment has been minimized.

Contributor

sankhesh commented Oct 29, 2015

@chaosphere2112 @doutriaux1 Mind if I ask why change the pattern behavior?

I am asking since there were extensive discussions when that was being implemented and @durack1 and other users had liked the idea of having those additional features.

Additionally the current way, a translucent black hatch with white background is an easily created option rather than the definition of a pattern.

@aashish24

This comment has been minimized.

Contributor

aashish24 commented Oct 29, 2015

Hey alll @sankhesh @chaosphere2112 @doutriaux1 its not a great idea to change the behavior without a general discussion. If you want to do that, then the best thing is to start a discussion here (on a separate issue) or in the mailing list. For the project management and community effort, that's the best approach as there are lot of people out there using UV-CDAT.

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 29, 2015

@sankhesh

My biggest problem is that it's a weird and inconsistent API. opacity and fillareaopacity (henceforth referred to as opacity) have different behaviors depending on style, and default to 0 for one of the three styles but 255 for the other two. That's incredibly odd. I spent like 2 hours trying to figure out why my patterned fills were coming out black and white, when I knew that there were colored patterns in the tests. I was assuming that opacity referred to the overall opacity of the pattern and background color, but isn't the case. Having this kind of inconsistency in an API is really frustrating, and for the most part I think we've managed to keep the inconsistencies to a minimum.

I'll let @doutriaux1 fill in his arguments when he gets in this afternoon (he's got the morning off).

@aashish24 I'd be happy to move this change into a different PR; it was convenient to do since I was already rooting around in that code here. The issue mostly came up because @doutriaux1 is working on his colormap/color update, and was confused by some behavior in the pattern tests.

Sam Fries
@doutriaux1

This comment has been minimized.

Member

doutriaux1 commented Oct 29, 2015

I agree with @chaosphere2112. Pattern are usually used to highlight an area of interest the background color is introducing inconsistencies in the API and can easily be dealt with by plotting a regular isofill underneath it.

@durack1

This comment has been minimized.

Member

durack1 commented Oct 29, 2015

@chaosphere2112 @doutriaux1 I'd be more than happy to provide my input on this one.. But I'm a little lost as to what the proposed change is, and how it affects my ability to plot colors and overplot hatches/patterns.. @chaosphere2112 can we talk about this next week?

@doutriaux1

This comment has been minimized.

Member

doutriaux1 commented Oct 29, 2015

@durack1 it does not affects any capability but makes it a little less confusing since the opacity parameters now consistently has the same meaning.

@durack1

This comment has been minimized.

Member

durack1 commented Oct 29, 2015

@doutriaux1, thanks for clarifying.. It did seem to me from reading the above that you're aiming at standardizing behaviors.. And as long these changes also allow overplotting and opacity control, I'm all for making behaviors more consistent and consequently easier for a user to interpret and use..

@aashish24

This comment has been minimized.

Contributor

aashish24 commented Oct 30, 2015

@aashish24 I'd be happy to move this change into a different PR; it was convenient to do since I was already rooting around in that code here.

I think in general its better to do something like this in another PR mostly because of clarity and consistency. This PR which is for performance improvements now is going to the change the API and the behavior which is not best as it makes it harder to draw the line and to review the code.

@aashish24

This comment has been minimized.

Contributor

aashish24 commented Oct 30, 2015

@doutriaux1 @chaosphere2112 please create another issue and discussion topic for the pattern and opacity stuff. I would like @sankhesh and @durack1 to be involved as well and let's just focus on the PI on this branch.

@chaosphere2112

This comment has been minimized.

Contributor

chaosphere2112 commented Oct 30, 2015

Sure, I'll go ahead and make another issue for discussing that. I've already removed the offending code, and I'm passing all tests (ish). The NOGUI build failed for some unrelated reason (wasn't able to pull CMOR), FULL on crunchy failed on an unrelated test, and I'm not sure what garant's issue is.

@aashish24

This comment has been minimized.

Contributor

aashish24 commented Nov 2, 2015

thanks @chaosphere2112 much appreciated!.. you are awesome 馃槂 :

doutriaux1 added a commit that referenced this pull request Nov 4, 2015

Merge pull request #1637 from UV-CDAT/reduced_renderers
VTK Performance Improvements

@doutriaux1 doutriaux1 merged commit d2ac967 into master Nov 4, 2015

6 of 9 checks passed

cont-int/LLNL/Darwin-Mac1 10.10.5 (NOGUI) running 'make -j4' (Thu Oct 29 15:46:02 2015)
Details
cont-int/LLNL/Linux-crunchy RH6 (FULL) running 'ctest -j12 -D Experimental' (Thu Oct 29 11:32:55 2015)
Details
continuous-integration/kitware-buildbot/uvcdat-garant-linux-release/ Build done.
Details
cont-int/LLNL/Darwin-Mac 10.10.5 (LEAN) running 'ctest -j4 -D Experimental' (Thu Oct 29 11:13:43 2015)
Details
cont-int/LLNL/Darwin-Mac2 10.10.5 (FULL) running 'ctest -j4 -D Experimental' (Thu Oct 29 14:26:05 2015)
Details
cont-int/LLNL/Linux-annie Ubuntu 15.04 (FULL/MESA) running 'ctest -j15 -D Experimental' (Thu Oct 29 12:08:49 2015)
Details
cont-int/LLNL/Linux-oceanonly RH6 (MESA/NOGUI) running 'ctest -j12 -D Experimental' (Thu Oct 29 11:37:34 2015)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@doutriaux1 doutriaux1 deleted the reduced_renderers branch Nov 4, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment