[WIP] Global caching #20712
@masterpiga : Very interesting approach. A couple of questions:

Would be happy to test if that's useful.
Just my two cents ...
Thanks for chiming in, @andriiryzhkov and @jenshannoschwalm. As I mentioned above, I didn't find a lot of obvious benefits in interactive usage. OTOH I have a very fast M4 Pro, so maybe my setup is not one that would benefit much from this change. Hence my question: is there an established way of comparing interactive execution speed after pipeline refactorings? I know I can run a benchmark with darktable-cli, but it's not batch-processing speed that I am after (even though, of course, I wouldn't mind getting some improvements there as well).

@andriiryzhkov: (1) my question exactly. (2) Yes, ideally you should see the most benefit when you have a long chain of modules and you edit something in the top half.

@jenshannoschwalm: indeed, I am very ignorant about this space; I understand only superficially what is happening, and I am not sure the ideas have much value given the current state of dt. Consider this PR an excuse to have a conversation about this topic.
If you want to go into pipe performance, the point to dig into would be mask distortion in the pipe :-) That might be beneficial for all processing ...
Yes, I fully agree with @jenshannoschwalm; in current dt we have done a lot for speed since then. Much of this is @jenshannoschwalm's work on the pipe, and @ralfbrown's on many iops, squeezing CPU cycles as much as possible. There may be room for improvement, as always, but we need figures for this.
Some more cents :-)
Ok, thanks for your feedback. Closing this for now, as I understand there is lower-hanging fruit and more promising directions to explore.
There is one (rarely-used) module which would be considerably helped by improved caching - liquify. It computes its displacement map at least twice for every pipe run as well as every time the shapes for drawn masks are updated on screen, and that calculation can take hundreds of milliseconds for large/numerous warps. That's part of the reason why I spent a fair bit of time minimizing refreshes a couple of years ago. So we really should be caching liquify's displacement map as well as its output.
In this post AP documents some dramatic pipeline improvements using more aggressive global caching. Modulo the name-calling and unpleasant attitude, it is quite an interesting writeup.
Out of curiosity, I asked Claude to incorporate these changes in darktable's pixelpipe. I asked it to decompose the changes into self-contained WPs, and then I asked it to implement the first 3. See `pixelpipe_caching.md` for the full analysis. I played with the resulting binary a bit and fixed a couple of crashes; I think it's pretty stable now. If I run with `-d pipe` I see quite a lot of cache hits, so the change appears to be effective. However, I don't find a huge difference in interactive usage, but I didn't really try to stress the pipeline.

@TurboGit @jenshannoschwalm are there any benchmarks that you would like to try, to measure whether there are noticeable interactive speedups? @kofa73 you may also be interested in taking a look.
I don't have a strong opinion about this PR. I consider it to be still WIP, and I am not even sure that it's something we want to incorporate. I decided to start the discussion here because I thought that having it in a PR with a testable binary would be more productive.