
Faster image generation in WebAgg/NbAgg backends #5389

Merged

merged 2 commits into matplotlib:master from mdboom:webagg-speed on Nov 4, 2015

Conversation


@mdboom mdboom commented Nov 3, 2015

  • Write the PNG data to string buffer, rather than an io.BytesIO
    to eliminate the cost of memory reallocation
  • Add compression and filter arguments to _png.write_png
  • Use compression=3 and filter=NONE, which seems to be a sweet spot
    for processing time vs. file size

Most of the speed increase comes from this last point. There are two knobs that libpng provides: compression level, which is just the zlib lossless compression level 0-9, and filter, which is described here: http://www.w3.org/TR/PNG-Filters.html

While the PNG standard says that the filter is to prepare the image for optimum compression, for typical plotting images which have lots of solid colors, it actually appears to make compression worse. (I suspect the filtering helps mainly on photographic images). Worse yet, the algorithm that automatically determines which filtering algorithm will be optimal takes a significant amount of runtime overhead. By selecting an algorithm explicitly, we can eliminate that overhead entirely.
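The compression knob maps directly onto zlib's lossless levels 0-9, so the time-vs-size trade-off can be sketched outside libpng entirely. A rough illustration (plain zlib on synthetic flat-color data standing in for typical plot output — this is not the actual PNG code path):

```python
import time
import zlib

# Synthetic "plot-like" data: long runs of a few solid RGBA colors,
# similar to typical Agg output with flat backgrounds.
data = (b"\xff\xff\xff\xff" * 200000) + (b"\x1f\x77\xb4\xff" * 50000)

for level in (1, 3, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level={level} size={len(out)} time={elapsed:.4f}s")
```

On data like this, higher levels buy little extra compression while costing noticeably more time, which is consistent with the sweet spot described above.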

To benchmark, I used a series of 16 difference images collected directly from a WebAgg session. I then ran these images through different combinations of compression and filter and compared the runtime and resulting file sizes. These benchmark images are available at mdboom/webagg-speed.

The results are below:

[figure_1: benchmark plot of runtime vs. resulting file size for each compression/filter combination]

The behavior prior to this PR was compression of 6, filter of "auto", which corresponds to the "X" at time ~2.2s in the plot. After this PR, I've changed it to compression of 6, filter of "NONE", which seems like a good low risk choice. It is not significantly worse in terms of data size, and more than 4x better in runtime. One could be more aggressive on runtime by decreasing compression to 1 or 2 at the expense of a little file size, but that could be detrimental over a low bandwidth connection.

@@ -227,8 +274,13 @@ static PyObject *Py_write_png(PyObject *self, PyObject *args, PyObject *kwds)
}

if (PyErr_Occurred()) {
Py_XDECREF(buff.str);
Member

Should you check that buff.str is not NULL?

Member Author

Py_XDECREF already does that (as opposed to Py_DECREF which does not).

@WeatherGod
Member

In principle, this all makes sense, and the changes look good to me. My C/C++ skills are rusty, though, so we probably want another +1 on this.

Maybe we should use the new memcheck script and exercise the webagg/nbagg backends with it, somehow?

@mdboom
Member Author

mdboom commented Nov 3, 2015

WebAgg is darn-tootin' snappy after this change. This was the major bottleneck.

It's a tricky one to test automatically. We'd either have to use something like Selenium or write a stubby websocket "browser-like thing". Though I agree it would be nice long term.

- Write the PNG data to string buffer, rather than an io.BytesIO
  to eliminate the cost of memory reallocation

- Add compression and filter arguments to `_png.write_png`

- Use compression=3 and filter=NONE, which seems to be a sweet spot
  for processing time vs. file size
@@ -131,7 +164,14 @@ static PyObject *Py_write_png(PyObject *self, PyObject *args, PyObject *kwds)
py_file = filein;
}

if ((fp = mpl_PyFile_Dup(py_file, (char *)"wb", &offset))) {
if (filein == Py_None) {
buff.size = width * height * 4 + 1024;
Member

Presumably, this is large enough for uncompressed + PNG header and other overhead?

Member Author

Exactly.
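An illustrative reading of that sizing line (not code from the PR): the initial allocation covers the whole uncompressed 8-bit RGBA payload plus a generous fixed pad for the PNG signature, IHDR/IEND chunks, and other overhead, so the common case never needs a resize.

```python
def png_buffer_estimate(width, height):
    # 4 bytes per RGBA pixel, uncompressed, plus a generous 1024 bytes
    # for the PNG signature, IHDR/IEND chunks, and other overhead.
    # Mirrors `buff.size = width * height * 4 + 1024` from the diff.
    return width * height * 4 + 1024

print(png_buffer_estimate(640, 480))  # 1229824
```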

@tacaswell
Member

Does the 'normal' save to png go through the same code path? We should probably tweak those parameters / expose them out there as well.

Do we document these internal functions at all?

It is a bit of a code smell that by passing in None you can make this function return the string buffer, but if you pass in a buffer it returns None.

@tacaswell tacaswell added this to the proposed next point release (2.1) milestone Nov 4, 2015
@mdboom
Member Author

mdboom commented Nov 4, 2015

Does the 'normal' save to png go through the same code path? We should probably tweak those parameters / expose them out there as well.

Yes, though it's not passing any compression or filter parameters. It probably does make sense to expose that up to the surface. I don't know if we have a standard way to pass file-format-specific parameters down through savefig, though. If not, we should: JPEG is another format where people like to tweak.

All that said, though, I think for normal use, this kind of speed savings isn't important. Though it is interesting that the automatic filter selection doesn't really help on this particular data set. It might make sense to turn off filtering by default.

Do we document these internal functions at all?

No. _png is explicitly privatized. Internal documentation wouldn't be a bad idea, but I like the idea of keeping the low-level C extension private and building porcelain on top.

It is a bit of a code smell that by passing in None you can make this function return the string buffer, but if you pass in a buffer it returns None.

Yes, but I'm not sure how else to do it. The speed trick here is that the extension creates the buffer itself so that it can resize it more efficiently. The docs explicitly say you can't resize a bytes object that was passed in from Python or that is referenced anywhere else:

https://docs.python.org/3/c-api/bytes.html#c._PyBytes_Resize

Could make a new method that returns a buffer, I suppose, that just does exactly this underneath. But given that _png is already private, maybe not worth it.
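The grow-in-place idea can be sketched in pure Python, with a bytearray standing in for the privately owned C buffer (the real code resizes a PyBytes object it created itself; the exact growth policy here is an assumption for illustration):

```python
class GrowingBuffer:
    """Toy stand-in for the C extension's private, resizable buffer."""

    def __init__(self, initial_size):
        self._buf = bytearray(initial_size)  # preallocated up front
        self._pos = 0

    def write(self, data):
        needed = self._pos + len(data)
        if needed > len(self._buf):
            # Grow geometrically (at least doubling) rather than
            # reallocating per write, which is the kind of repeated
            # reallocation cost the PR avoids.
            self._buf.extend(bytearray(max(len(self._buf), needed)))
        self._buf[self._pos:needed] = data
        self._pos = needed

    def getvalue(self):
        # Trim to the bytes actually written, analogous to the final
        # shrinking resize before handing the bytes back to Python.
        return bytes(self._buf[:self._pos])
```

Because the buffer is never visible to anyone else until `getvalue`, resizing it in place is safe — which is exactly the invariant `_PyBytes_Resize` demands of the C code.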

@tacaswell
Member

I know @efiring has talked about running scripts that generate many output figures so there might be use-cases where this helps.

Main reason for internal docs is bus factor. Completely agree about keeping it private. The exposed C API seems to be one of the most painful parts of moving numpy forward. [This is unrelated but true.]

Fair enough re code-smell.

@mdboom
Member Author

mdboom commented Nov 4, 2015

I've added docstrings to _png.cpp.

@tacaswell
Member

Great, I'll give this a try and merge it once Travis passes again (looks like unrelated network failures).

@mdboom
Member Author

mdboom commented Nov 4, 2015

Should I expose this to the outside world (in savefig and friends) in this PR or in a follow-on?

@jenshnielsen
Member

I think it's fine to leave to a follow-up PR.

jenshnielsen added a commit that referenced this pull request Nov 4, 2015
Faster image generation in WebAgg/NbAgg backends
@jenshnielsen jenshnielsen merged commit cb3c6f3 into matplotlib:master Nov 4, 2015
@mdboom mdboom deleted the webagg-speed branch November 10, 2015 02:46
tacaswell pushed a commit to tacaswell/matplotlib that referenced this pull request Jun 30, 2016
Faster image generation in WebAgg/NbAgg backends

Conflicts:
	src/_png.cpp
	  Due to having previously backported matplotlib#5910 ( 3675841 ) as
	  ( 99ad89d )
tacaswell added a commit that referenced this pull request Jun 30, 2016
Merge pull request #5389 from mdboom/webagg-speed