[WIP] Global caching #20712

Closed
masterpiga wants to merge 2 commits into darktable-org:master from masterpiga:caching

Conversation

@masterpiga
Collaborator

@masterpiga masterpiga commented Mar 31, 2026

In this post AP documents some dramatic pipeline improvements from more aggressive global caching. Modulo the name-calling and unpleasant attitude, it is quite an interesting write-up.

Out of curiosity, I asked Claude to incorporate these changes into darktable's pixelpipe. I asked it to decompose the changes into self-contained WPs, and then to implement the first three. See pixelpipe_caching.md for the full analysis.

I played with the resulting binary a bit and fixed a couple of crashes; I think it's pretty stable now. If I run with -d pipe, I see quite a lot of cache hits, so the change appears to be effective. However, I don't see a huge difference in interactive usage, though I didn't really try to stress the pipeline.

@TurboGit @jenshannoschwalm are there any benchmarks that you would like to try to measure if there are noticeable interactive speedups? @kofa73 you may also be interested in taking a look.

I don't have a strong opinion about this PR. I still consider it WIP, and I am not even sure it's something we want to incorporate. I decided to start the discussion here because I thought that having a PR with a testable binary would be more productive.

@andriiryzhkov
Collaborator

andriiryzhkov commented Mar 31, 2026

@masterpiga : Very interesting approach.

A couple of questions:

  1. Benchmarking: What's the recommended way to measure the performance difference? Would -d pipe log timestamps be enough to compare pipeline stage timings between this branch and master? Or is there a more structured benchmark scenario you'd suggest?

  2. Test scenarios: Which editing workflows would best demonstrate the caching benefits? I'm guessing images with many active modules (filmic, tone equalizer, denoise profile, masks) and interactive operations like zoom/pan/slider changes would show the biggest difference.

Would be happy to test if that's useful.
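For a rough first pass at (1), one could simply count cache hits in the `-d pipe` logs from both builds. A minimal sketch follows; note that the exact log wording varies between darktable versions, so the `cache.*hit` pattern and the `count_cache_hits` helper name are assumptions, demonstrated here against a synthetic log rather than real darktable output:

```shell
#!/bin/sh
# Hypothetical helper: count pixelpipe cache hits in a `-d pipe` debug log,
# e.g. one captured with `darktable -d pipe 2> pipe.log`.
# The log wording differs between darktable versions, so the pattern below
# is an assumption -- adjust it to match your build's actual output.
count_cache_hits() {
    grep -c -i 'cache.*hit' "$1"
}

# Demo on a synthetic log standing in for a real `-d pipe` capture:
cat > /tmp/pipe_demo.log <<'EOF'
[pixelpipe] module exposure: cache HIT
[pixelpipe] module filmic: cache MISS
[pixelpipe] module denoise: cache HIT
EOF
count_cache_hits /tmp/pipe_demo.log   # prints 2
```

Running the same edit session against this branch and master and comparing the two counts (together with the per-stage timestamps already in the log) would give a crude but reproducible signal.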

@jenshannoschwalm
Collaborator

Just my two cents ...

  1. AP was using a malfunctioning pipe caching as in dt 4.0. We have done a lot since then, so I honestly don't care what AP rants about dt.
  2. BTW, he reorganized the pipe quite a lot (where scaling happens, for example), so a different caching strategy might be appropriate.
  3. In dt we use other very efficient ways to speed up interactive use.
  4. Did you test cache efficiency with current dt at all? The hit rate is very high as I see it.
  5. We don't cache OpenCL memory buffers so far. We could thus possibly avoid a copy to the CL device, although that is not very costly.
  6. A unified cache would simplify things, although the code burden for that is very low.

@masterpiga
Collaborator Author

masterpiga commented Mar 31, 2026

Thanks for chiming in, @andriiryzhkov and @jenshannoschwalm.

As I mentioned above, I didn't find a lot of obvious benefits in interactive usage. OTOH, I have a very fast M4 Pro, so maybe my setup is not one that would benefit much from this change. Hence my question: is there an established way of comparing interactive execution speed after pipeline refactorings? I know I can run a benchmark with darktable-cli, but it's not batch-processing speed that I am after (even though, of course, I wouldn't mind getting some improvements there as well).

@andriiryzhkov (1) my question exactly. (2) yes, ideally you should see most benefits when you have a long chain of modules and you edit something in the top half.

@jenshannoschwalm indeed, I am very ignorant about this space, I understand only superficially what is happening and I am not sure if the ideas have a lot of value given the current state of dt. Consider this PR as an excuse to have a conversation about this topic.

@jenshannoschwalm
Collaborator

If you want to go into pipe performance, the place to start would be mask distortion in the pipe :-) That might be beneficial for all processing ...

@TurboGit
Member

Yes, I fully agree with @jenshannoschwalm: in current dt we have done a lot for speed since then. All this is @jenshannoschwalm's work on the pipe, and @ralfbrown's on many iops, squeezing CPU cycles as much as possible. There may be room for improvement, as always, but we need figures for this.

@jenshannoschwalm
Collaborator

Some more cents :-)

  1. I don't know how and what exactly AP is benchmarking; the worse dt results for module processing could well be explained by the "over-developing", which lets us move the darkroom canvas around without any recalculation.
  2. Using a pinned CL image for caching might be a good one.

@masterpiga
Collaborator Author

OK, thanks for your feedback. Closing this for now, as I understand there are lower-hanging fruit and more promising directions to explore.

@masterpiga masterpiga closed this Apr 1, 2026
@ralfbrown
Collaborator

There is one (rarely-used) module which would be considerably helped by improved caching - liquify. It computes its displacement map at least twice for every pipe run as well as every time the shapes for drawn masks are updated on screen, and that calculation can take hundreds of milliseconds for large/numerous warps. That's part of the reason why I spent a fair bit of time minimizing refreshes a couple of years ago.

So we really should be caching liquify's displacement map as well as its output.
