Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opencl] opencl_synch_cache might be buggy? #2514

Closed
AxelG-DE opened this issue May 4, 2019 · 58 comments

Comments

Projects
None yet
5 participants
@AxelG-DE
Copy link

commented May 4, 2019

Describe the bug
What I describe here, might be related to this #2230 but can also be finally triggered by any other function or several issues are related to each other.

Why I say so? Several weired things happen and to say it in advance, I cannot track them with -d perf

When I set opencl_synch_cache=active module instead of = false

  • Moving WB slider causes a slight stuttering in screen update (setting back to false and it is smooth again)
  • sometimes adjusting the sliders in one module several times for fine tuning, each touch makes a new entry in the history stack
    (somewhere I saw an issue about such, here on github, once I found it, I would add reference to this)
  • When I use, e.g. levels at the end of a heavily edited picutre, like this one, on playraw (incl. xmp), in the first place, it is really sticky and update are lagged by ~1s-2s on my quite fast machine. Leaving the module and come back or switching it of and on, all over sudden, the cache seems to work

To Reproduce
Steps to reproduce the behavior:

  1. activate opencl_synch_cache=active module
  2. pick a heavily edited pic like this from above
  3. try to slide the black levels of the module levels
  4. Switch levels off an on again and slide the black levels again

Expected behavior

  • opencl_synch_cache=active module should work reliably, not only after re-approach the module or switching it off and on
  • Multiple touches of modules (indeed without touching others) should not create a new history stack entry
  • opencl_synch_cache=active module maybe logically cost some performance, acc. to this usermanual/en/darktable_and_opencl_optimization ideally it does not create lag

Platform (please complete the following information):

  • OS: [gentoo linux] up to date
  • Version [dt-master +1113]

Additional context
I hope @edgardoh and @dtorop could look into this? :-)

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 5, 2019

To check if this setting is working, on an image with several modules and with -d perf, update one of the last modules (on the pipe):

With opencl_synch_cache=active module or true the modules before the one being modified should not be reprocessed, with opencl_synch_cache=false modules should be reprocessed.

If that's working, then, on the same image, update one of the first modules (WB should do it)

opencl_synch_cache=active module should be faster than opencl_synch_cache=true

What we're looking for here is the total pipe time, not each module in particular.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 5, 2019

@edgardoh
I'll provide this test either tomorrow or Tuesday.

Please keep in mind, what I said above:

Leaving the module and come back or switching it of and on, all over sudden, the cache seems to work

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

dear @edgardoh
first of all, thank you very much for your patiences!

I do think, here I figure a inconsistent behaviour. Yet again, I got to say, it is difficult to fetch/nail down, what I experience:

  • I test with the pic I mentioned above. It is heavily edited
  • "levels" is on top and hence at the end of the pixelpipe
  • sometimes (not always), when I touch levels and I move the left (black-level) slider towards 25%, I experiance a lag of more than 1s until screen update. From there I move back and forth and the lag persists.
  • than I switch levels off and on again and suddenly the slider reacts with way more lag
  • sometimes this lag-free situation comes after a second touch of the slider (without off/on in between)
  • I do not feel so much differences compared to "synch_cache=true"

Attached I will get a zip with my -d perf recordings, inside you find:

dt-synch_cache.csv from 5th May

  • active module, dt-2.7+1113
  • 74,092228s pulled black level -> ~25% (with lag)
  • 136,896308 levels off (with lag)
  • 172,991478 pulled black level back to left border (no lag)

dt-synch_cache:20190506.csv from 6th May

  • active module, dt-2.7+1129 (sorry that I updated in between)
  • no time transcript as behaviour was inconsistent, so hard to track, which behaviour

dt-synch_cache:20190506b.csv from 6th May

  • synch_cache=true , dt-2.7+1129 (sorry that I updated in between)
  • no time transcript as behaviour was inconsistent, so hard to track, which behaviour

dt-synch_cache:20190506b.csv from 6th May

  • active module, dt-2.7+1129 (sorry that I updated in between)
  • several approache with lag and where the module runs right from the beginning

I do wonder, whether a -d all would help...
dt-synch.zip

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

@AxelG-DE , this is not what I was asking, I just want you to check if for each scenario that I described the modules are re-processed or not.

For future references, when sharing this logs (or print screen), set the language to english, more or less, is the only language that everybody speaks here.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

@edgardoh
only switching off/on is what obviously does not reprocess but also for synch_cache=true.

  • So I do not see a difference between "true" and "active module"
  • More severe is that inconsistent behaviour that lag is there and after off/on of module it disappears

Sorry for German in my logs. Is there a way to change the logs to English without setting the dt-UI to English?

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

EN-dt-logs.zip

If I understand correctly, no matter the value of synch_cache, when you change the valued of a module, all previous modules are reprocessed?

Almost yes, I did new tests

  • active modue, as long I have the "lag", yes the whole pipe gets reprocessed.
  • after 49s I switched levels off / on
  • from than on only view items get reprocessed, but even some, which are before levels and several times in one command
  • but I achive the same with synch_cache=true

If that's working, then, on the same image, update one of the first modules (WB should do it) opencl_synch_cache=active module should be faster than opencl_synch_cache=true

Unfortunately not. I didn't normalise yet, but from just counting the time, it is 3s in both cases, when I change WB from 3888 to 25000 or back

Find my ENGLISH logs attached
EN-dt-logs.zip

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

Be sure that the setting in darktablerc is: opencl_synch_cache=active module

confirmed

Also be sure to close darktable before changing the setting.

confirmed

You can also check if with opencl disabled you still get all modules reprocessed.

Here we go :-(
--disable-opencl and the thing works as it should, faster than with opencl activated. DARNED

I will make a fresh git clone

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

after new install, still the same .......
nvidia-drivers 418.56 since April 13th (former 418.43 not in repo anymore)

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

debug_and_rc-file.zip

Thanks for today. As it is 1:14 a.m. I gotta sleep now :-)

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 6, 2019

tomorrow I'll try with fresh rc-file and merely merge my opencl-tweaks acc. to handbook

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 7, 2019

thanks again for all your help

  • that means, I need to bisect it :-/
  • wonder about a good starting point as "good" (which commit implemented active module?)
  • disable opencl made such noticeable change in response :-( (maybe bigger leaps in opencl? Whom to ask?)
  • wonder about weather or not it might be related to I have two GPUs

I quickly looked in to my logs again from EN-dt-logs.zip and cannot get a smell...

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 7, 2019

I still wonder about two things:

  1. why I have same behaviour when I use synch_cache=true?
  2. why "active module" is seems to getting to work, when I switch a module on and off for several times ?
  • velvia off (whole pixelpipe)
  • velvia on (whole pixelpipe)
  • velvia off (only velvia and above)
  • velvia on (only velvia and above)
    similar for sharpen, colorzones, local contrast and indeed levels....
@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 7, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 7, 2019

Dear @edgardoh
Thank you so much for all your support!

I'm proud we already solved one issue and one day we will have this topic nailed down too :-)

I'll handle opencl on my own. It is really very suspicious, not only on my end opencl does not really accelerate rather than slow down.

Would you anyway mind to think a bit about my questions above #1 & #2. That behaviour should be in my logs

Synch_cache=false is definitely slow on larger edits

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 8, 2019

@dtorop
Could you have a look into this please or guide me to the right dev, if there is a suspiciousy, that opencl makes things worse than better?
Tnx!

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 8, 2019

@edgardoh

I realized, the issue might not be with active module, anyway, would you mind having a look into my English :-) log.
What I did:

  • open above mentioned pic with heavy edit
  • deactivated / activated velvia
  • activated / deactivated (or vice versa, forgot) levels for several times

you can see, active module is not reacting for the first two times and than it suddenly works as expected.

I am really lost to describe that phenomenon better...

dt-perf_20190508.zip

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 8, 2019

With =true the same things happen.

So hard to nail it, but something is weird in terms of performance.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 16, 2019

@edgardoh @dtorop

I think I made a step to get this issue more reproducible...
I am still using the pic from my initial post

What I did all the time was:

  • the levels module was opened up already but module still deactivated
  • I just grabbed the the black-level
  • pulled it to the right, so that with that action the module gets switched on
  • at that moment the reaction from dt (still) is very very sticky and not to be correctly reported in -d perf (means there I see short reaction timing but you can count to 3-4 s prior screen update)

When I switch on the levels firstly and then grabb the sliders, everything works smoothly from the beginning.

Remember, above I said, need to switch the module one and off again, several times, prior the lag disappears.

Again, this behaviour is not limited to active module, I also have it with opencl_synch_cache=true

I still think, it should be possible to drag sliders directly, as this is the daily operation, if a module was in use already...

@ClaesGitHub would you do me a favour, try to reproduce with pic from above and finally feedback here?
TNX!!!

[OT]
In any case, for levels, or all opencl, I dream of a higher performance, as in this test, my two GTX 1060 are loaded ~35% but the screenupdate is not really smooth, even the cache is "switched on" :) But this is probably another issue 😊

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 16, 2019

@AxelG-DE Yes, I confirm the sluggishness in dt 2.7.0+1158
when I performed the steps you indicated above. Note that I had
to turn the levels module on/off a couple of times before the
sluggishness occurred. I tested it with opencl_synch_cache=true.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 16, 2019

Dear @ClaesGitHub
that is hyper-weired, as for me is the opposite, I need to switch on/off several times, so the sluggishness disappears ....???

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 17, 2019

At this point it seems that the problem is with the levels, can you duplicate with any other module?

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 18, 2019

But you said that contrast brightness saturation works fine for you.

Correct. But since I have not managed to find a pattern yet, any far-fetched
idea is a good idea :-)

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 18, 2019

I am suspicious about whether some performance issues are merely seen on high-end machines maybe due to different command set on CPU or (not meant arrogantly) they reach another level of smoothness which suddenly drops back

@edgardoh which nvidia? lspci may help or a dive in /proc. Also your CPU would be interesting which is in /proc/cpuinfo...

@ClaesGitHub my monitor is on 2569x1440, donno how many k is that (actually unhappy about TV marketing speech seems to become computer spec) :evil: 😄

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 18, 2019

Now after breakfast, as well as after lunch.
Slow progress: Clues/Logic/Thoughts so far:

  1. No problem with my own photos/edits (at least not yet).
  2. Axel's test image history contains about 22 steps.
  3. Axel's xmp -> my image = sluggishness!
  4. Histogram and small "position view" update much faster than the main image window.
  5. [Added] ... BUT if i click History step 0, all three windows update more or less immediately.
  6. [Added] ... AND if I then click History step 23, main window updates more or less immediately, while the two small ones take their time to update.

Buffering? 4k (or UHD, if you prefer)?

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 18, 2019

Just for fun, try PR #2569

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 18, 2019

Just for fun? Of course! I tried PR #2569. Unfortunately,
I did not spot any better/nicer behaviour.

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 18, 2019

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 18, 2019

Good evening!
Sorry for vague reply -- this is difficult; too many parameters :-(((

Using Axel's original image with the long history list and PR #2569:
a) Turning levels on/off = almost instantaneous change in all three windows (main win, position win, and histogram).
b) Turning levels OFF and turning Contr bright sat ON = main win changes fast, the two small wins are sluggish.

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 18, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 18, 2019

I tested #2569 as well. First I thought it did help a little, as only the first move is sluggish and afterwards, without on/off it moves smoother already. However, after git checkout master it is the same on current master (+1201 which is 8b69990f7 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #2587 from AlicVB/preview_frame). So I donno how, but currently a little bit better, as merely the first move is sluggish and then better, even though it could be still further smooth, also that level, I don't need to wait 3s anymore but still could be more smooth.

And BTW: I do see that same lag also on other pics and also with much shorter history. Most painful on WB. Here the lag really interrupts the creative flow.

@ClaesGitHub thank you for supporting that much

@ClaesGitHub

This comment has been minimized.

Copy link

commented May 19, 2019

Good morning, all!

Today, I have carefuly checked memory usage while playing with settings on Axel's image.
Nothing significant turned up :-(
Tried to find some difference between UHD and "standard" resolution.
Nothing significant turned up :-(

I will keep on looking -- after all, it would be nice to find some kind of pattern...

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 19, 2019

Dear @ClaesGitHub

TNX

Unfortunately '-d all' throw tooooo many information, otherwise this could be a trial...

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 23, 2019

Actually this issue also duplicates on an empty history stack and just using white balance.

It firstly reacts sluggish. WB off/on and it behaves more smooth, eventhough it can be still more smooth (why it scatters, when I have the almost fastest machine available on intel side?)

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 28, 2019

@edgardoh
you where suspicious, wether it might be related to module levels only and asked me, whether I can dublicate it to other modules... I reported, I could duplicate to other modules, by means not related to levels only. What might be the reading from that? Do we need to involve an opencl-guru on this? If yes, who might this be?

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 28, 2019

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 29, 2019

Then, how to proceed here, whom to ask, how to nail it down further?

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 30, 2019

@TurboGit Dear Pascal, do you haven an idea, whom we could involve here, to figure, why modules are sluggish in the first usage and after switching them off and on again, suddenly they work???

I remember, some days/weeks ago, somebody commited a PR which was intended to improve the performance, when a module get touched/changed several times, which is normal editing behaviours (actually was described with similar words). That person could be the right one?

Thanks to you all for your patient help always!

@dtorop

This comment has been minimized.

Copy link
Contributor

commented May 30, 2019

@AxelG-DE I'm finally reading through this. I can see here what I believe you've reported:

  • open sample .NEF with levels deactivated
  • drag black slider in levels to the right
  • dt is slow to respond
  • continue to drag slider
  • dt is faster to respond

I'm using the .xmp you provided on pixls.us.

This may be replicating what everyone knows, but here's what I see in the logs:

  • When open the image, the full pixelpipe runs.
  • When drag the black slider and activate the module, the full pixelpipe runs again
  • After that, when drag the black slider only gamma and successive modules run in the pixelpipe.

(I don't see this problem if I discard the history stack, deactive white balance, then drag temperature in white balance -- no lag on initial activation/dragging of white balance.)

Of the top of my head, I think this behavior is a longstanding feature of darktable, since before 2.6. But I'm interested that you're seeing it as new in 2.7.

My guess is that the caching logic hasn't saved the output of tone curve (the module before levels in this xmp's pipeline). This might well be expected, the full pixelpipe cache is really shallow -- I believe five entries -- and it may not know to cache the active module.

A worse case would be if when levels iop is turned on the cache was no longer considered valid for levels and its predecessors. This would certainly be a bug!

I have to run out now (again) but will look at src/develop/pixelpipe_cache.c when I can to try to better understand its logic of what it preserves and doesn't.

It's possible darktable -d dev would dump some interested data, but I think it's not informative enough.

Apologies, I'm still being too brief here, and may have missed the whole point of this thread or some subtlety. More in a bit.

@edgardoh

This comment has been minimized.

Copy link
Contributor

commented May 31, 2019

@dtorop , I'm working on this issue on the levels, not sure is the same you are looking into so I let you know so we don't duplicate efforts:

-sometimes when dragging the handles the full pipe is processed instead of using the cache.

This is because the module does not grab the focus, so the pipe don't treat it as the active module. But this is not the issue reported here, @AxelG-DE already test a tentative fix and saw no difference.

Hopefully I'll be pushing a fix for the levels shortly.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented May 31, 2019

@dtorop
thank you for your very comprehensive reply.

dt is slow to respond
continue to drag slider
dt is faster to respond

Not really like this.
dt remains behave slow even when re-adjusting that slider until I switch the module off and on again.

This behaviour I can also see on an empty history stack (only sharpen, basecurve and rotation) and merely whitebalance is used.

Why I never saw this on 2.6.2? Everything started with my #2484, where we made things improve a lot and somehow I still think 2.6.2 was faster... I need to re-read above whether I tried this topic here in 2.6.2. But 2.6.2 cannot have "active module" which I thought is the culprit (have doubts now, in general I think opencl is far away from max performant)

Cheers Axel

@dtorop

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2019

@edgardoh Great that you've found a problem with the levels, even if it isn't yet fixing this. From what @AxelG-DE writes, I'm not even really seeing the problem yet, so there's no overlap between us at the moment.

This is definitely an interesting puzzle. I don't think I'll be back to it until the middle of next week at the earliest, but will watch this conversation and do what I can then.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented Jun 1, 2019

@edgardoh
That the module is not being recognised as the active one, that is the most promising theory

Keep in mind please, that I have similar effect on an almost empty pixelpipe and merely adjusting WB

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented Jun 3, 2019

@edgardoh
I am now on 2.7.0+1374~gd76b91877, so your PR2650 is in. It improved tramandously, even though, the very first move of the black slider still lags a bit (~1.5s), but right away the levels get very responsive. That is o.k. for me at this moment, especially if I take into account how levels usually is been used.

One thing remains the same (understandably, as you didn't touch it), I have similar effect on WB, even on an almost (basecurve, sharpen orientation) empty pixle pipe. I move the slider, and it is sluggish forward and backward, if I just grab it. After switching off and on the WB it becomes more responsive...

Strange thing is, it is not always.

  • I shut down dt and on again and it was responsive right away.
  • Repeat "levels test" and go back to the tulip picture (the one in the link not empty pixle pipe, I know, just that RAW) and sluggish again
  • dt off and on again, still sluggish in the first place....
  • -d perf always repeats the exact same (very very fast) processing time of 0.092s

In the sluggish moment watch nvidia-smi shows my two GPUs are used in total ~60% (45%+13% alternating). After I switch WB off and on, the alternating usage is floating around 83%+15% !!!
That's why I keep saying in different places, we have an opencl issue, somewhere in general

@dtorop I know, now you are unavailable, I still hope, you or @TurboGit suddenly have a good idea...

Sincerely your performance challenger 😄

@TurboGit

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

@AxelG-DE : no, no idea about the possible issue. Could this be a CPU cache issue? I mean if it works after the first enable? Maybe I'm completely wrong as I have not followed this closely.

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented Jun 15, 2019

Hey gentlemen,

I keep testing about this topic, what I described above with white balance. Sluggish, when you touch it for the first time, relieve it and touch it again, it is fast and responsive (also that "working..." indicator is flickering more faster then)....

  • opencl_synch_cache=false --> the issue persists (that would be reason enough to close this issue)
  • start dt with --disable-opencl --> the issue persists = not an opencl-issue

please do not close this issue too fast, before that topic has a new home.

How could I adress it best, so the right one might pick it up? I cannot provide clear performance measures (I tried).

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented Jun 15, 2019

I just tested 2.6.2, same issue...

@AxelG-DE

This comment has been minimized.

Copy link
Author

commented Jun 15, 2019

With #2724 I close this one here
@edgardoh thank you very much for all your patience and finally you actually did optimize levels :-)

@AxelG-DE AxelG-DE closed this Jun 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.