macosx backend slowdown with 1.2.0 #1563
This script will not give you an accurate time measurement.
Agreed. Would you be able to provide the complete runtime @jwoillez?
Not sure how I can time things any other way in a non-interactive mode, since show() blocks. For now I can report no noticeable time difference up to show(), whereas for 10000 points it takes a fraction of a second for the window with the figure to appear in 1.1.1 and 20 seconds in 1.2. That delay increases with the number of points. If you can suggest a more accurate script, I will be happy to run it.
Would it be better to put
On 2012/12/05 7:14 AM, Julien Woillez wrote:
You are right, this problem does not require any fancy timing to see it.
@jwoillez the next step is to use bisection to track down the commit that caused the slowdown. Would you be able to do this?
I don't see this problem. The tk backend has never worked for me, so I'm using qt4agg instead. The macosx backend is noticeably quicker than the qt4 backend.
It might be helpful to track down whether path simplification is being performed.
@mdboom, yes, absence of path simplification seems like a possible explanation. I haven't tried to track it down. Still, a delay of 20 s or so to plot 10,000 points seems excessive on an i7 even without path simplification. Something else seems to be going on here. @dmcdougall, with "ipython --pylab=qt" I get the same instantaneous response as with "--pylab=tk", on Mountain Lion. But with plain "ipython --pylab", so it is using the macosx backend, it takes 23 seconds, roughly timed with a watch. The processor is doing something the whole time: the machine is drawing about 32 watts instead of its idle value of 18 or so. All I am doing is starting ipython, executing "plot(np.random.randn(10000))", and timing from when I hit return to when the plot appears. The ipython prompt returns immediately, and the plot window is created; the delay is in waiting for the actual plot to appear in the window. The same delay occurs with any pan attempt.
@efiring I have tried both
Crude measurements with 1000 pts confirm this.
Bisection points to commit 51611c8 as the first that triggers the slowdown. Looking at that commit, however, I have no idea how it could cause a slowdown specifically for plotting a large number of points.
Trying on a different Mac, I was able to replicate the slowdown with the current matplotlib on GitHub compared to 1.1.1. It seems to be an actual problem with Quartz itself, as the slowdown depends on the exact line width of the curve being drawn. If I set the line width to 1.0 with CGContextSetLineWidth just before the call to CGContextStrokePath, the drawing is fast. If I set it to 1.00001, it is slow. Actually, in gc.set_linewidth, should we scale the line width by dpi/72.0, or should we use the line width as is? Also, could you guys run jwoillez's script and report back whether it is painfully slow (measured as the time between starting the script and seeing the output on screen), and which version of Mac OS X you are using? I don't remember seeing any slowdown on Mac OS X 10.8 with jwoillez's script, only with Mac OS X 10.5, so this may occur only on older Macs.
@mdehoon, no need to run the script. I have verified that within my ipython session, plotting 10000 points, I get a slowdown (20 seconds or so) with figure.dpi = 80 (the default), and fast plotting (essentially instantaneous) with figure.dpi = 72. This is with the same kind of recent machine as @dmcdougall has, OS X 10.8. Line width in mpl is in points, so assuming Quartz is working with line width in dots, the present scaling is correct. It looks like this is a major Quartz limitation that we are stuck with: slow plotting when the line width is not an integer number of dots. I suppose we could simply round it to an integer number of dots, with a minimum of zero. This would probably be a good move, since it would solve the speed problem, and I suspect it would have only a minor and acceptable effect on the screen display quality. As a separate issue, @mdboom asked whether path simplification is used in this backend. I think the answer is "no". @mdehoon, is this correct, and if so, could path simplification be added?
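The points-to-dots scaling discussed above can be written out explicitly. This is a minimal sketch, assuming Matplotlib line widths are in points (1/72 inch) and that Quartz strokes in device pixels; `linewidth_in_pixels` and `rounded_linewidth` are hypothetical helpers for illustration, not functions in the macosx backend. For a 1.0 pt line, a 72 dpi figure gives exactly 1 pixel while 80 dpi gives about 1.11 pixels, which lines up with the fast/slow split reported here. (A later comment in the thread found the slowdown tracks widths greater than 1 pixel rather than non-integer widths, so rounding alone may not be enough.)

```python
# Matplotlib line widths are in points; there are 72 points per inch.
POINTS_PER_INCH = 72.0


def linewidth_in_pixels(linewidth_pt, dpi):
    """Convert a line width in points to device pixels at the given dpi."""
    return linewidth_pt * dpi / POINTS_PER_INCH


def rounded_linewidth(linewidth_pt, dpi):
    """Hypothetical workaround from the thread: round to whole pixels, minimum 0."""
    return max(0, round(linewidth_in_pixels(linewidth_pt, dpi)))


if __name__ == "__main__":
    for dpi in (72, 80):
        px = linewidth_in_pixels(1.0, dpi)
        print(f"dpi={dpi}: {px:.4f} px, rounded to {rounded_linewidth(1.0, dpi)}")
```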
Path simplification is used in this backend. If I switch off path simplification, the script takes almost twice as long. Some more testing revealed that Quartz is slow if the line width is greater than 1; it doesn't matter whether it is an integer or not. But it also turns out that drawing 10000 points as 100 separate paths of 100 points each is much faster than drawing all 10000 points as one path. So we can speed up the Mac OS X backend that way. We'd have to be careful, though, to make sure the end result is exactly the same, in particular at the end points if alpha != 1.
By the way, I won't be able to implement any fixes for this issue myself any time soon, so if somebody else wants to give it a try, please go ahead.
Given this rather horrible behavior by Quartz, would it make sense to use Agg for the rendering instead? I realize this would be a big change, and I am in no position to contribute to it. Apart from the cost of making the change, what is the advantage of Quartz rendering?
An unrelated problem with the macosx backend is that it doesn't respond continuously to pan/zoom the way the other backends do. Is this an inherent limitation? Has it already been reported?
At this point, I would not switch to Agg for the rendering, as there are more straightforward options to explore first. My first step would be to get some feedback from the Apple developers to see if there is a simple way to get better performance. If not, simply drawing long paths as multiple shorter paths is the simplest solution. Or we could try QuickDraw instead of Quartz rendering. Also, I haven't tried whether switching off anti-aliasing for long paths makes a difference. With regard to the macosx backend not responding to panning and zooming, let's open a separate issue for that, if it has not been reported yet.
I don't think QuickDraw is an option. According to http://en.wikipedia.org/wiki/QuickDraw it has been deprecated since OS X 10.4.
If I use SNAP_TRUE instead of SNAP_AUTO in the call to get_path_iterator in GraphicsContext_draw_path, I get much better (near-instantaneous) performance. Are there any other (better?) ways to tweak the call to get_path_iterator? If not, I suggest that we simply use SNAP_TRUE instead of SNAP_AUTO for long paths.
Is the slowdown still gone if you have SNAP_TRUE and thicker linewidths?
A linewidth of 1 is fastest, but a linewidth of 10 still gives an acceptable speed.
But isn't the drawing less accurate with snapping on? Snapping is only intended for rectilinear (i.e. axis-aligned) lines, and the AUTO mode first does a test to determine if the path falls into that category. And the testing is turned off when the path has more than 1024 points, so it shouldn't be triggered in this case. It would be good to see some images with snapping on and off to determine whether we're losing quality there.
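To make the SNAP_AUTO behavior described above concrete, here is a simplified Python sketch. It is an illustration under stated assumptions, not Matplotlib's C++ path snapper: a path is a list of (x, y) pixel coordinates, snapping moves vertices to pixel centers with `floor(x) + 0.5`, and the 1024-vertex cutoff is the one mentioned in this thread. It also shows the accuracy cost: each snapped coordinate can move by up to half a pixel.

```python
import math

# Cutoff taken from the discussion: auto-detection is skipped for longer paths.
MAX_AUTO_SNAP_VERTICES = 1024


def is_rectilinear(vertices):
    """Return True if every segment is horizontal or vertical."""
    return all(
        x0 == x1 or y0 == y1
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])
    )


def should_snap_auto(vertices):
    """Simplified stand-in for SNAP_AUTO: snap short, axis-aligned paths only."""
    if len(vertices) > MAX_AUTO_SNAP_VERTICES:
        return False
    return is_rectilinear(vertices)


def snap_to_pixel_centers(vertices):
    """Move every vertex to the center of the pixel that contains it."""
    return [(math.floor(x) + 0.5, math.floor(y) + 0.5) for x, y in vertices]


if __name__ == "__main__":
    box = [(10.2, 10.2), (10.2, 50.7), (90.9, 50.7)]
    print(should_snap_auto(box))  # axis-aligned and short, so snapping is allowed
    print(snap_to_pixel_centers(box))
    noisy = [(float(i), math.sin(i)) for i in range(10000)]
    print(should_snap_auto(noisy))  # too many vertices: auto mode declines
```

In these terms, forcing SNAP_TRUE corresponds to calling `snap_to_pixel_centers` unconditionally, which is why it is fast but only accurate to within a pixel.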
I don't think the issue is that the auto-detection of path snapping is
Below are three screenshots. All three look very similar to me, but current and simplify_threshold_tenfold look a bit closer to each other. Perhaps we could simplify paths more when they contain many points?
On 2012/12/07 6:16 PM, Damon McDougall wrote:
The question is of accuracy, not crispness. Snapping makes it crisp, at
So, 1) is unacceptably slow but very accurate. 2) is crisp but accurate only to within a pixel. 3) retains sub-pixel accuracy and looks similar to 1), but discards data for the sake of speed. How do we objectively decide which option is the most appropriate?
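Option 3 trades vertices for speed. Matplotlib has its own simplification algorithm, tuned by the `path.simplify_threshold` rcParam; the sketch below instead uses the classic Ramer-Douglas-Peucker algorithm, a swapped-in technique chosen only because it is short, to illustrate the trade-off. Vertices whose distance from the simplified line is at most `threshold` (assumed to be in pixels) are dropped, so a larger threshold means fewer vertices to stroke and a coarser curve.

```python
import math
import random


def _distance_to_line(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def simplify_rdp(points, threshold):
    """Ramer-Douglas-Peucker simplification, iterative to avoid deep recursion."""
    points = list(points)
    if len(points) < 3:
        return points
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        best_index, best_dist = None, threshold
        for i in range(first + 1, last):
            d = _distance_to_line(points[i], points[first], points[last])
            if d > best_dist:
                best_index, best_dist = i, d
        if best_index is not None:
            # This vertex deviates too much: keep it and refine both halves.
            keep[best_index] = True
            stack.append((first, best_index))
            stack.append((best_index, last))
    return [p for p, k in zip(points, keep) if k]


if __name__ == "__main__":
    rng = random.Random(0)
    y, walk = 0.0, []
    for i in range(2000):
        y += rng.gauss(0.0, 1.0)
        walk.append((i * 0.05, y))
    for threshold in (0.1, 1.0):
        print(threshold, len(simplify_rdp(walk, threshold)))
```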
I tried a different approach, which is to divide the paths into subpaths and draw each of them separately, which is much faster. This seems the cleanest solution to me. I'll do some more testing to make sure it doesn't cause problems in other types of plots. I noticed that CLOSEPOLY is defined differently in lib/matplotlib/path.py (CLOSEPOLY = 0x4f) from src/_backend_agg.h (CLOSEPOLY = 5). In src/_backend_agg.h it says that these constants should be kept in sync with path.py. Perhaps it is better if we put these definitions in a separate .h file, and make them available to path.py via src/path_cleanup.cpp? I am asking since I would like to add EMPTY to these to signify an empty path. Then in src/_macosx.m I can distinguish more easily between an empty path, a partially finished path, and a finished path (which is what I need to draw parts of the path separately).
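The splitting approach can be sketched in a few lines of Python. This is a hypothetical helper, not the actual change to src/_macosx.m: it cuts a polyline (a list of (x, y) vertices) into consecutive pieces of at most `chunk_size` segments, and neighboring pieces share one vertex so the drawn line has no gaps. In the backend, each piece would then be stroked with its own call instead of one call for the whole path.

```python
def split_path(vertices, chunk_size=1000):
    """Split a polyline into pieces of at most `chunk_size` segments.

    Consecutive pieces share their boundary vertex, so stroking every piece
    reproduces every segment of the original line exactly once.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    vertices = list(vertices)
    if len(vertices) <= chunk_size + 1:
        return [vertices]
    return [
        vertices[start:start + chunk_size + 1]
        for start in range(0, len(vertices) - 1, chunk_size)
    ]


if __name__ == "__main__":
    line = [(float(i), float(i % 7)) for i in range(10001)]
    pieces = split_path(line, chunk_size=100)
    print(len(pieces), len(pieces[0]), len(pieces[-1]))  # prints: 100 101 101
```

Every segment survives the split; what changes is that each boundary vertex now belongs to two separately stroked pieces, which is the source of the alpha caveat raised earlier in the thread.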
Done.
It is interesting that it's faster if you break up the path. It might be that CoreGraphics is multithreading the processing.
As far as I know it is not due to multithreading, but it has to do with CoreGraphics needing to perform more calculations for longer paths to find out if they overlap or self-intersect.
Ahhh, interesting. I wonder what speed difference, if any, there is between passing one huge line with lots of (say, 10^4) points and 10^4 - 1 lines of two points each. If they're comparable, finding the 'sweet spot' might be beneficial. Also, on the topic of multithreading, since most Macs have at least two cores now, I wonder if we can utilise Grand Central Dispatch to ship out parts of the line to various threads. The main things I'd be concerned about there are whether or not the drawing is guaranteed to end before the event loop does, and also ensuring that shipping out parts of the drawing doesn't implicitly change the z-order. That's probably beyond the scope of this pull request; I just wanted to think out loud.
The downside of splitting the path is alpha blending (which I think @mdehoon already pointed out). Maybe it makes sense to turn off splitting when alpha != 1.0? It seems rather unlikely that someone would be alpha-blending a high-vertex-count line. I think I prefer that to snapping everything, which does diminish accuracy. And just curious: is there anything in Apple's bug tracker or knowledge base about this? I wonder whether there are other suggested workarounds.
The suggestion from Apple developers was to split up the path. I agree that it is unlikely that someone would be alpha-blending a line with many vertices. But then we don't have to switch off splitting when alpha != 1, since splitting is only done if the path has many vertices (I have done some tests with splitting every 100 or every 1000 vertices. Both of them are fast, though I want to do some more testing before committing this). So for now I would prefer to always split the path if it has more than e.g. 1000 vertices, regardless of the alpha value. If some day we find a case where the results for alpha != 1 are unacceptably bad, we can reconsider.
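The alpha caveat can be quantified with a little arithmetic. When a path is stroked as a single shape, overlapping parts are painted once; when it is split, the area around each shared vertex is covered by two separate strokes. Assuming ordinary source-over compositing of the same color, painting with opacity a twice gives an effective opacity of 1 - (1 - a)^2. The sketch below uses hypothetical helpers and assumes pieces of at most `chunk_size` segments that share one boundary vertex; it counts those joints and computes the resulting opacity. For alpha = 1 nothing changes, which is why only alpha != 1 is at risk.

```python
def composite_alpha(alpha, times):
    """Effective opacity after painting the same color `times` times (source-over)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return 1.0 - (1.0 - alpha) ** times


def shared_joint_count(n_vertices, chunk_size):
    """Number of vertices shared by two adjacent pieces when a polyline is split
    into pieces of at most `chunk_size` segments."""
    n_segments = max(n_vertices - 1, 0)
    if n_segments <= chunk_size:
        return 0
    n_pieces = -(-n_segments // chunk_size)  # ceiling division
    return n_pieces - 1


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.2):
        print(alpha, composite_alpha(alpha, 2))
    print(shared_joint_count(10001, 1000))  # 9 joints for 10,000 segments
```

With splitting every 1000 vertices, a 10,000-point line has only 9 such joints, so any artifact is confined to a handful of small spots.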
Just pitching in to say I've had several local Python users tell me they were having issues plotting even 10,000 points with the MacOS X backend. So for now, the solution is to recommend using a different backend?
@astrofrog: is that with or without alpha? I think the problem here is alpha-specific. Usually, when there's a slowdown, the first thing to check is whether path simplification is turned on and being applied.
I didn't check; I just noticed users trying to do
@mdboom, I don't think this is an alpha problem; at least based on this issue's history, it looks like @mdehoon's path-splitting fix never made it to the PR stage. Michiel verified earlier in this thread that path simplification is being used. The underlying problem is that Quartz is slow. (This also shows up in Preview; I have run into PDF documents with line plots that are rendered nearly instantaneously using evince on Linux, but that are unusably slow on the Mac with Preview.) I think that the path-splitting fix is likely the best way to get around it for mpl 1.3. @astrofrog: yes, the workaround is to use one of the agg-based backends.
Indeed, I never got around to creating a pull request for this fix. Sorry about that. I can do so over the next 2 weeks.
@astrofrog, using "figure.dpi : 72" in my matplotlibrc worked for me (following @efiring's suggestion), as long as I am not playing with linewidth and alpha.
See this pull request:
#1816 merged; closing.
I've noticed a major slowdown of the macosx backend between version 1.1.1 and 1.2.0, when using pylab.show(). The following code times the call and gives very different results between the two versions:
dt ~ 454 pts/s for 1.2
dt ~ 168,480 pts/s for 1.1.1
A difference of a factor of several hundred (168,480 / 454 ≈ 370). Can others test and confirm? The results did not change much when using TkAgg.
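A points-per-second measurement of this kind can be sketched as follows; it is a stand-in, not @jwoillez's script. It times an arbitrary `draw` callable with `time.perf_counter()`, which sidesteps the fact that `show()` blocks. With Matplotlib you might pass something like `fig.canvas.draw`, although whether that call waits until the window is actually updated depends on the backend, so treat the numbers as rough.

```python
import time


def points_per_second(n_points, draw):
    """Time one call to `draw` and report throughput in points per second."""
    start = time.perf_counter()
    draw()
    elapsed = time.perf_counter() - start
    return n_points / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a real drawing call; replace with e.g. fig.canvas.draw.
    rate = points_per_second(10000, lambda: time.sleep(0.05))
    print(f"{rate:,.0f} pts/s")
    # Ratio between the two rates reported above.
    print(f"1.1.1 vs 1.2: {168480 / 454:.0f}x")
```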