ImageCache/ImageBuf read performance #480
lgritz merged 9 commits into AcademySoftwareFoundation:master
… than std::vector in order to reduce needless initialization of the buffers.
…::vector in order to avoid useless initialization of the memory when it's allocated.
…tead of read_scanline
This was an artifact of an old restriction of TextureSystem which is no longer necessary.
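The buffer change in the commits above rests on a C++ detail: `std::vector<float>(n)` value-initializes (zero-fills) every element, while `new float[n]` leaves the memory untouched. When a reader is about to overwrite the whole buffer anyway, the zero-fill is wasted work. A minimal sketch of the difference, using `std::unique_ptr<float[]>` as a standard-library stand-in for the scoped_array mentioned in the discussion; `alloc_uninitialized`, `alloc_zeroed`, and `fake_read` are hypothetical names for illustration, not OIIO functions:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Allocate n floats without initializing them. `new float[n]` performs
// default-initialization, which for float writes nothing to the memory.
std::unique_ptr<float[]> alloc_uninitialized(std::size_t n) {
    return std::unique_ptr<float[]>(new float[n]);
}

// Allocate n floats through std::vector. The sized constructor
// value-initializes, so every element is written as 0.0f up front.
std::vector<float> alloc_zeroed(std::size_t n) {
    return std::vector<float>(n);
}

// Hypothetical stand-in for an image read: overwrites every element,
// which is why any earlier zero-fill would have been redundant.
void fake_read(float* dst, std::size_t n) {
    assert(dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<float>(i);
}
```

With a buffer from `alloc_uninitialized`, the contents are indeterminate until `fake_read` (or a real read call) fills them, so nothing may read the buffer before that point. The trade-off is one fewer pass over memory per allocation.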
This is awesome! Great speedups, and a great write-up as well. To confirm, this is binary incompatible, correct? (I note the change to scoped_array inside the ImageBuf public header.) So is the assumption that this would be rolled into a new major version (or the next major version)? (It's also interesting to note that if ImageBuf were implemented using the pimpl pattern, this probably could have been done in a binary-compatible manner.)
You're right: because this breaks binary compatibility of public types, it has to be a 1.2 feature and can't be backported to 1.1. You're also correct that ImageBuf has become too complex to have the implementation in the public headers; it really needs PIMPL. I had avoided that to make it as performant as possible, rather than needing a function call and a pointer indirection for absolutely every method, even trivial getters and setters, but maybe I shouldn't be worried about that in the grand scheme of things. Should I make PIMPL-ifying IB a part of this review, or do it separately?
This commit looks good to me (master branch only). Definitely a separate review for pimpl-ing. We don't necessarily have to pimpl everything, so we should look at real performance numbers and see if the indirection hurts us on the little functions (getpixel, setpixel, etc.). But there are a whole bunch of classes in OIIO which are not performance limited; we should pimpl those as well.
OK, I will merge this and look into pimpling separately. Maybe there's a hybrid approach where a small number of POD-and-unlikely-to-change fields can be public, keeping the frequently called access routines as fast as possible, while the tricky stuff can be pimpled without any performance impact.
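The hybrid idea floated in this thread could look roughly like the sketch below. It is not the real ImageBuf: `Image`, its members, and its constructor are hypothetical, chosen only to show a few stable POD fields kept in the class for inline access while everything else sits behind an opaque pointer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class for illustration; not the actual ImageBuf interface.
class Image {
public:
    Image(std::string name, int width, int height);
    ~Image();  // defined below, where Impl is a complete type

    // Hot accessors: plain inline reads, no function call or indirection.
    int width() const { return m_width; }
    int height() const { return m_height; }

    // Cold accessor: goes through the opaque pointer.
    const std::string& name() const;

private:
    // POD fields that are unlikely to change stay in the public layout.
    int m_width;
    int m_height;

    // Everything else is hidden; it can change without altering
    // sizeof(Image) or the layout seen by client code.
    struct Impl;
    std::unique_ptr<Impl> m_impl;
};

// In a real library the following would live in the .cpp file.
struct Image::Impl {
    std::string name;
    // Buffers, cache handles, and similar details could be added here
    // later without breaking binary compatibility.
};

Image::Image(std::string name, int width, int height)
    : m_width(width), m_height(height), m_impl(new Impl{std::move(name)}) {}

Image::~Image() = default;

const std::string& Image::name() const { return m_impl->name; }
```

The design choice is that only the fields kept in the class body are frozen into the ABI. Whether the extra indirection on the remaining methods matters is the open question raised above, and it would need real performance numbers to settle.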
ImageCache/ImageBuf read performance improvements
I spent some time carefully profiling image read performance of scanline OpenEXR images in OIIO. I was particularly interested in making sure that reading via an ImageBuf or ImageCache::get_pixels did not have too much overhead compared to a raw ImageInput::read_image (presumed to be "speed of light", the barest wrapping of the underlying library calls), and how autotile/autoscanline affected things.
The first thing I noticed was a big flaw in my libOpenImageIO/imagespeed_test: I had neglected to fully flush the ImageCache between sub-tests, and that was throwing off my numbers, making IB and IC look rosier than they really were. I also augmented imagespeed_test to explicitly cover several more of the combinations I describe above. Here is example output from my MacBook Pro on a 2336x1198 4-channel float OpenEXR file:
Some interesting things we can say already:
OK, so I carefully profiled those code paths and made a number of improvements. I'll give you the results first:
Bottom line is that I've DOUBLED the image reading speed of ImageBuf and ImageCache::get_pixels when autotile is off; they are now almost as fast as raw calls to ImageInput::read_scanlines or read_image. Autotile is still much slower, though much improved compared to before. And the combination of autotile and autoscanline has nearly tripled in performance, and is now only incrementally slower than when autotile is off (that is, when the ImageCache reads the whole image as one tile).
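One way to get a feel for why the three configurations differ is to count how many cache tiles each one produces for the 2336x1198 test image. The sketch below is plain arithmetic, not OIIO code. It assumes an autotile size of 64 (the post does not state the value used) and assumes that autoscanline makes each tile span the full image width. The real ImageCache may round or adjust tile sizes, so treat the numbers as illustrative.

```cpp
#include <cstdint>

// Ceiling division for positive integers.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
    return (a + b - 1) / b;
}

// Number of tiles needed to cover a width x height image
// with tiles of size tile_w x tile_h.
constexpr std::int64_t tile_count(int width, int height, int tile_w, int tile_h) {
    return ceil_div(width, tile_w) * ceil_div(height, tile_h);
}

// Image size is from the post; the tile size of 64 is an assumption.
constexpr int kWidth = 2336;
constexpr int kHeight = 1198;
constexpr int kAutotile = 64;

// Autotile off: the whole image is one tile.
constexpr std::int64_t whole_image = tile_count(kWidth, kHeight, kWidth, kHeight);

// Autotile only: 64x64 tiles, 37 across by 19 down.
constexpr std::int64_t autotile_only = tile_count(kWidth, kHeight, kAutotile, kAutotile);

// Autotile plus autoscanline: full-width strips, 64 rows each.
constexpr std::int64_t autotile_scanline = tile_count(kWidth, kHeight, kWidth, kAutotile);
```

Under these assumptions the counts are 1, 703, and 19 tiles respectively. That matches the ordering of the timings described above: the fewer separate tiles a scanline file is chopped into, the less per-tile overhead there is.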
So, how did I do it? Basically it broke down to three main improvements:
And there were a few other minor refactors with smaller effects, and some minor bug fixes for things that broke with these changes. Look at the individual commit comments if you care.
The final takeaway is: