Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thumbnail generation after applying profile sometimes really slow #4732

Open
Thanatomanic opened this issue Aug 14, 2018 · 16 comments
Open

Thumbnail generation after applying profile sometimes really slow #4732

Thanatomanic opened this issue Aug 14, 2018 · 16 comments
Labels
scope: performance Performance issues and improvements

Comments

@Thanatomanic
Copy link
Contributor

I notice that often applying a profile on a selection of unprocessed images in the file browser locks up RT for a LONG time (just now: 29 NEF's took almost 3 minutes). Writing the pp3's is virtually instantaneous and therefore not the issue, but I suspect generating the thumbnails is.

Unfortunately, attempts to find a really reproducible way to demonstrate the slow processing have failed.
Sometimes it's fast, sometimes it's slow. Though it only seems to happen when there are no pp3's present yet.
The other consistent thing I see during the lock up is very little CPU use, hardly any disk I/O, but a lot of changes in the amount of memory in use.

FYI: am using latest dev build, Windows 7 x64.

Any clues? Threading issue with a bottleneck maybe?

@Thanatomanic Thanatomanic changed the title Thumbnail generation after applying profile really slow Thumbnail generation after applying profile sometimes really slow Aug 14, 2018
@heckflosse
Copy link
Collaborator

I confirm this on Windows 7 when using a network drive. Using a SSD it's alwyas fast

@Thanatomanic
Copy link
Contributor Author

I did not use a network drive, but a regular HDD. I gave my SSD a try, and it's much faster indeed, but even so there is a noticeable hiccup in which RT freezes. Maybe a second or two.

@Thanatomanic
Copy link
Contributor Author

To add to this: my workflow is unbearably slow because of this issue. Imagine having 20 folders with 50 images and for each folder it takes 8 minutes to simply apply my default profile...
I would not like to adopt a change in workflow where I first copy stuff to my SSD, edit and then move to my HDD.

But since I am the only one ever reporting this, maybe it's really only me. Would it be thinkable that for example disk fragmentation slows down the processing and generation of the thumbnails?
How could I easily check what is taking up most of the processing time? Maybe to narrow down which function is the culprit?

@heckflosse
Copy link
Collaborator

@Thanatomanic

applying a profile to a bunch of images, which don't have a profile yet, does decode all the images.
If you're running rt on a machine with more than one core (e.g. 4), it will process 4 images in parallel, which can lead to heavy disc accesses slowing down the decoding a lot.

I guess you're mainly using raw files of one camera. Can you upload one raw please? I will check then which decoder is used and place a mutex in the decoder to serialise the decoding (just to test that).

@Thanatomanic
Copy link
Contributor Author

Thanks @heckflosse, there's an example here: https://discuss.pixls.us/t/play-raw-sunset-sea/7103/10
It's a Nikon D750.

@heckflosse
Copy link
Collaborator

@Thanatomanic Can you test this patch please. It serialises the decoding (and only the decoding) of NEF files.

diff --git a/rtengine/dcraw.cc b/rtengine/dcraw.cc
index da66bd6d..b6a35044 100644
--- a/rtengine/dcraw.cc
+++ b/rtengine/dcraw.cc
@@ -27,6 +27,7 @@
 #include <utility>
 #include <vector>
 #include "opthelper.h"
+#include "../rtgui/threadutils.h"
 
 /*
    dcraw.c -- Dave Coffin's raw photo decoder
@@ -1188,6 +1189,9 @@ void CLASS pentax_load_raw()
 
 void CLASS nikon_load_raw()
 {
+    static MyMutex decodeMutex;
+    MyMutex::MyLock lock(decodeMutex);
+
   static const uchar nikon_tree[][32] = {
     { 0,1,5,1,1,1,1,1,1,2,0,0,0,0,0,0,	/* 12-bit lossy */
       5,4,3,6,2,7,1,0,8,9,11,10,12 },
@@ -1249,6 +1253,7 @@ void CLASS nikon_load_raw()
     }
   }
   free (huff);
+  lock.release();
 }
 
 void CLASS nikon_yuv_load_raw()

@heckflosse
Copy link
Collaborator

@Thanatomanic This patch adds some timing code to the decoder. Maybe this will be helpfull for your tests.

diff --git a/rtengine/dcraw.cc b/rtengine/dcraw.cc
index da66bd6d..17a19191 100644
--- a/rtengine/dcraw.cc
+++ b/rtengine/dcraw.cc
@@ -27,7 +27,9 @@
 #include <utility>
 #include <vector>
 #include "opthelper.h"
-
+#include "../rtgui/threadutils.h"
+#define BENCHMARK
+#include "StopWatch.h"
 /*
    dcraw.c -- Dave Coffin's raw photo decoder
    Copyright 1997-2018 by Dave Coffin, dcoffin a cybercom o net
@@ -1188,6 +1190,10 @@ void CLASS pentax_load_raw()
 
 void CLASS nikon_load_raw()
 {
+    static MyMutex decodeMutex;
+    MyMutex::MyLock lock(decodeMutex);
+    BENCHFUN
+
   static const uchar nikon_tree[][32] = {
     { 0,1,5,1,1,1,1,1,1,2,0,0,0,0,0,0,	/* 12-bit lossy */
       5,4,3,6,2,7,1,0,8,9,11,10,12 },
@@ -1249,6 +1255,7 @@ void CLASS nikon_load_raw()
     }
   }
   free (huff);
+  lock.release();
 }
 
 void CLASS nikon_yuv_load_raw()

@Thanatomanic
Copy link
Contributor Author

Before I can even think about testing your patch, I run into reproducibility issues... I play with clearing the cache, removing pp3's, switchting folders, going back to previous folders, and all kinds of combinations. However, when I started playing some 15 minutes ago, I first got processing times of 5-10 seconds at most for 25-30 raws. And then suddenly a folder without pp3's and 23 raws with 2:30 min in processing time. And right now, no matter what I try to do, processing is always fast.
I am really puzzled by this. However, I think I noticed that whenever the initial loading of the folder is slow, the generation of the thumbnails will be too. That might be a clue?

I'll let this rest for a bit now, but I really hope to find a reproducible behavior...

@Thanatomanic
Copy link
Contributor Author

Some clue (might be useless), is the behavior I captured in Windows Task Manager while RT is doing stuff quickly and when it locks.
image

The spikes occur when thumbnails are generated quickly after applying a profile in a pp3-less folder. The flat line afterwards is the lock-up occuring in a different folder and applying a profile there.

@heckflosse
Copy link
Collaborator

@Thanatomanic I opened a folder with some nef files from HD and got some outliers

nikon_load_raw took 1970 ms
nikon_load_raw took 2960 ms
nikon_load_raw took 9089 ms
nikon_load_raw took 8244 ms
nikon_load_raw took 7474 ms
nikon_load_raw took 5113 ms
nikon_load_raw took 5530 ms
nikon_load_raw took 5743 ms
nikon_load_raw took 1474 ms
nikon_load_raw took 1243 ms
nikon_load_raw took 1448 ms
nikon_load_raw took 3504 ms
nikon_load_raw took 6519 ms
nikon_load_raw took 20331 ms
nikon_load_raw took 43118 ms
nikon_load_raw took 1733 ms
nikon_load_raw took 1306 ms
nikon_load_raw took 1331 ms
nikon_load_raw took 2621 ms
nikon_load_raw took 749 ms
nikon_load_raw took 3450 ms
nikon_load_raw took 1390 ms
nikon_load_raw took 3733 ms
nikon_load_raw took 5386 ms
nikon_load_raw took 3581 ms
nikon_load_raw took 4064 ms
nikon_load_raw took 4787 ms
nikon_load_raw took 5673 ms
nikon_load_raw took 6260 ms
nikon_load_raw took 776 ms
nikon_load_raw took 2680 ms
nikon_load_raw took 638 ms
nikon_load_raw took 1335 ms
nikon_load_raw took 1517 ms
nikon_load_raw took 155837 ms
nikon_load_raw took 606 ms
nikon_load_raw took 748 ms

@Thanatomanic
Copy link
Contributor Author

Thanatomanic commented Aug 21, 2018

With your patch I get very consistent timings between 450-800 ms each. This is both when I just applied a profile on a image with a prior pp3, and when the cache is cleared but a pp3 is present and the thumbnail gets updated.

The main difference between the two, is that when pp3's are already present, the thumbnails get updated one by one and RT doesn't lock up. While you apply a profile and pp3's are not present yet, the thumbnails get updated all at once and RT locks up until that's done (BENCHFUN outputs its values to the console in the meanwhile). If that lock up could be removed, I think I would be very happy 😄

@Thanatomanic
Copy link
Contributor Author

Might be related: #4576

@Beep6581 Beep6581 added the scope: performance Performance issues and improvements label Aug 21, 2018
@Floessie
Copy link
Collaborator

@Thanatomanic Indeed. #4576 was a nightmare to solve and my only contribution to 5.5 as yet. I'm very much interested that someone else takes a look at this thing, maybe you get a clue about what's going on. My gut feeling tells me, there is at least one thread layer too much, but this damn place of code is a real time-killer, so I fully understand there hasn't been much activity yet.

Trying to nail down that lock would be a nice and welcome first step. 👍

Best,
Flössie

@Thanatomanic
Copy link
Contributor Author

Thanatomanic commented Aug 22, 2018

@heckflosse Off-topic, but I just noticed something odd when I still had your benchmark code present. As soon as I open a file from the browser it calls nikon_load_raw between 1 and 10 times, 400 ms each, before it shows me my image. All thumbs in the folder were fully generated, so I did not expect to see that. What is it making the calls for?
Edit: Figured out what was going on

@heckflosse
Copy link
Collaborator

@Thanatomanic up to 10 times?

If you open a nef which has no pp3 assigned I would expect:

one call to nikon_load_raw for opening the file
another call for the automatched tone curve if that's enabled in your default profile
another call to build the thumb

@Thanatomanic
Copy link
Contributor Author

Thanatomanic commented Aug 22, 2018

@heckflosse Disregard my comment. Of course it should load multiple nefs if I have both flat-field and dark-frame corrections.

You know what, let me test if this is also the reason why RT locks up for me in general.

Edit: Not really. It helps to disable flat-field and dark-frame, but the fact remains that after applying a profile to multiple images RT locks up while generating the thumbnails. And when a thumbnail has been generate before, applying a new profile is done without locking up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope: performance Performance issues and improvements
Projects
None yet
Development

No branches or pull requests

4 participants