Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tone equalizer #1904

Merged
merged 10 commits into from
Sep 22, 2019
Merged

tone equalizer #1904

merged 10 commits into from
Sep 22, 2019

Conversation

aurelienpierre
Copy link
Member

@aurelienpierre aurelienpierre commented Dec 16, 2018

introduce a new "tone equalizer " module, intended to allow selective exposure adjustements in linear RGB (supposed to be color-safe). This is very similar to Lightroom, with blacks/shadows/midtones/highlights/whites cursors.

The luminance channels are blended with gaussian masks and the modifications are applied on a gaussian pyramid to preserve the local contrast (luminance modified in low frequency layers).

For now, nothing is optimized (naive C implementation) and the module is expected to be slow. This is just a proof of concept. I will make it faster once the core algorithm is validated.

@aurelienpierre aurelienpierre force-pushed the toneequalizer branch 3 times, most recently from ee2c591 to 9dd8541 Compare December 18, 2018 21:08
@TurboGit TurboGit self-requested a review December 28, 2018 17:56
@TurboGit TurboGit self-assigned this Dec 28, 2018
@TurboGit TurboGit added the feature: enhancement current features to improve label Dec 28, 2018
@TurboGit TurboGit added this to the 2.8 milestone Dec 28, 2018
@TurboGit
Copy link
Member

Sounds very promising!

@TurboGit
Copy link
Member

TurboGit commented Jan 2, 2019

Just tested, quite nice indeed. The transitions are "smoothed" and so the tonality are pleasant. Nice work.

@aurelienpierre
Copy link
Member Author

Thanks ! I have yet to squeeze in there either a laplacian pyramid or a wavelet decomposition to affect only the low frequencies, because I find it flattens a bit too much the local contrast.

@rawfiner
Copy link
Collaborator

Just tested and liked it :-)
It is probably already on your todolist, but it would be nice to have soft bounds on the sliders
Very nice addition Aurélien, thanks !

@aurelienpierre
Copy link
Member Author

Just tested and liked it :-)
It is probably already on your todolist, but it would be nice to have soft bounds on the sliders
Very nice addition Aurélien, thanks !

Nice ! Soft bounds could quickely backfire. Given that each channel is separated from the others by 2 EV, pushing is beyond will move it to another channel, and the local contrast will suffer beyond repair.

But, increasing the channels around a third one will have the same effect as pushing it beyond/over ± 2 EV while preserving details.

I just stabilized the laplacian filter, I think it works great. There is only one scale for now, so the difference is only visible fully zoomed.

@rawfiner
Copy link
Collaborator

Thanks for the answer :-)
That makes sense.
I did not understand your remark about increasing the channels around a third one however, could you say it differently?

@aurelienpierre
Copy link
Member Author

aurelienpierre commented Mar 3, 2019

@rawfiner

What the algo does is several band-pass filters between 0 and - 14 EV:
image

Each band-pass filter is defined by a gaussian window so that for every pixel, the sum of the weights of all windows are constant (= 1.77).

Each band is associated with an exposure compensation, which is a factor (+2 EV "=" × 4). Say you want to add 2 EV on the - 4 EV band, and remove 2 EV on the -2 EV and 0 EV bands, your parameters translate like this:
image

For every pixel, we then compute the sum of all the channels at the pixel's luminance and use the result as a correction factor (between 0.125 and 4):
image

See here, at -4 EV, the correction factor is only ~ 2.5 (+1.30 EV) because the neighbouring channels have some weight to avoid breaking the local contrast. So we can increase the -6 EV channel as well:
image
Now, the -4 EV band gets a factor of ~ 3 (+1.5 EV)

Once you multiply the RGB vectors by the factor corresponding to their exposure, you get something very similar to a tone curve:
image
This approach shows none of the interpolation artifacts (cusps) of the splines interpolation, gives a smooth curve no matter what, and deal with pixels as full RGB vectors (not separating their components). But you see here how the operation is mildly damaging the local contrast near whites (oscillation) ? That can be made less problematic by using a laplacian pyramid (for now, there are only 2 scales), but still, you don't want to force exposure oscillations.

@rawfiner
Copy link
Collaborator

rawfiner commented Mar 4, 2019

Thanks a lot for this detailed explanation 👍
This makes everything clear 😊

@aurelienpierre
Copy link
Member Author

I have an error double free or corruption (!prev) affecting the line 508:

dt_free_align(luma_in);

It happens when using the laplacian mode with the power weighted norm (you can also see the thumbnail is wrong). I have spent 3h trying to troubleshot it, if someone has an idea ;-)

@rawfiner
Copy link
Collaborator

rawfiner commented Mar 4, 2019

@aurelienpierre
In process_scale, if memory allocation failed, you will goto clean and free the 3 pointers while you just checked that one of them is NULL, hence you will free a NULL pointer. The error maybe comes from here?

@aurelienpierre
Copy link
Member Author

In process_scale, if memory allocation failed, you will goto clean and free the 3 pointers while you just checked that one of them is NULL, hence you will free a NULL pointer. The error maybe comes from here?

No, that part was already an attempt to fix this problem, it arised before (and also freeing a pointer set to NULL is supposed to be harmless).

@ff2000
Copy link

ff2000 commented Mar 4, 2019

Wanted to give it a try, for me it crashes in L.406:

Thread 19 "worker 1" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fff6bd51700 (LWP 22899)]
0x00007fffa0e37556 in process_image (luminance=<optimized out>, method=<optimized out>, factors=<optimized out>, 
    height=<optimized out>, width=<optimized out>, out=<optimized out>, in=<optimized out>)
    at /home/karl/src/darktable/src/iop/toneequal.c:406
406         for(int i = 0; i < 16; ++i) pixel_out[i] = correction[i / ch] * pixel_in[i];

(gdb) bt full
#0  0x00007fffa0e37556 in process_image (luminance=<optimized out>, method=<optimized out>, factors=<optimized out>, height=<optimized out>, width=<optimized out>, out=<optimized out>, in=<optimized out>)
    at /home/karl/src/darktable/src/iop/toneequal.c:406
        i = <optimized out>
        i = <optimized out>
        i = <optimized out>
        i = <optimized out>
        pixel_in = <optimized out>
        correction = <optimized out>
        pixel_in = 0x7fff686783c0
        correction = {0.314601362, 0.314340442, 0.314838082, 0.314575702}
        pixel_lum = <optimized out>
        luma = <optimized out>
        exposure = <optimized out>
        pixel_lum = 0x7fff683bdd20
        luma = {0.478164583, 0.479542047, 0.47692275, 0.478299618}
        exposure = {-1.06442082, -1.06027079, -1.06817245, -1.06401348}
        pixel_out = <optimized out>
        pixel_out = 0x7fff684503c0
        k = <optimized out>
        k = 105696
        ch = <optimized out>
#1  process_scale (method=<optimized out>, factors=<optimized out>, kernel=<optimized out>, height=<optimized out>, width=<optimized out>, out=<optimized out>, in=<optimized out>) at /home/karl/src/darktable/src/iop/toneequal.c:502
        in_toned = <optimized out>
        luma_in = <optimized out>
        luma_toned = <optimized out>
        ch = <optimized out>
        ch = <optimized out>
        in_toned = <optimized out>
        luma_in = <optimized out>
        luma_toned = <optimized out>
#2  process (self=<optimized out>, piece=<optimized out>, ivoid=<optimized out>, ovoid=<optimized out>, roi_in=<optimized out>, roi_out=<optimized out>) at /home/karl/src/darktable/src/iop/toneequal.c:739
        sigma = <optimized out>
        scale = <optimized out>
        gauss_1D = {0, 0, 0, 0, 0, 0, 0}
        gauss_kernel = {{0, 0, 0, 0, 0, 0, 0}, {0, 1.9748219e-40, 0, 0, 1.40129846e-45, 5.12985358e+26, 4.59163468e-41}, {3.72150149e+24, 4.59163468e-41, 3.51818148e+24, 4.59163468e-41, 4.37318806e+24, 4.59163468e-41, 
            1.48111642e-40}, {0, 5.1298831e+26, 4.59163468e-41, 5.12985949e+26, 4.59163468e-41, -1.06401348, 0}, {0.478164583, 0.479542047, 0.47692275, 0.478299618, -1.06442082, -1.06027079, -1.06817245}, {-1.06401348, 
            0.314601362, 0.314340442, 0.314838082, 0.314575702, 0.468128145, 6.01167674e-25}, {6.78136728e-19, 1.03526312e-13, 2.12414908e-09, 6.23464575e-06, 0.00226316601, 0.0872310847, 7.21254181e-32}}
        width_padded = <optimized out>
        height_padded = <optimized out>
        ch = <optimized out>
        in_padded = <optimized out>
        out_padded = <optimized out>
        d = <optimized out>
        in = <optimized out>
        out = <optimized out>
#3  0x00007ffff6dab89e in gomp_thread_start (xdata=<optimized out>) at /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libgomp/team.c:120
        team = 0x7fffc80127f0
        task = 0x7fffc8012fa0
        data = <optimized out>
        thr = <optimized out>
        pool = 0x7fffc8012720
        local_fn = 0x7fffa0e37200 <process_image._omp_fn.0>
        local_data = 0x7fffdd2639a0
#4  0x00007ffff772296a in start_thread (arg=0x7fff6bd51700) at pthread_create.c:463
        pd = 0x7fff6bd51700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735002515200, -920184838796788664, 140736903705022, 140736903705023, 8396800, 140737337107936, 919877179956870216, 920201243785509960}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007ffff74571bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Can't tell you my parameters currently.

#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
for(size_t k = 0; k < (size_t)ch * width * height; k += 4 * ch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be k += ch...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I process 4 pixels at each loop step to vectorize the fmaxf(log2f(...)...) but, indeed, when the pixel number is not a multiple of 4, that could end out of bounds

@aurelienpierre
Copy link
Member Author

aurelienpierre commented Mar 4, 2019

Thanks @ff2000 and @edgardoh, that's very helpful 🙂

@LebedevRI
Copy link
Member

LebedevRI commented Sep 22, 2019

Step one is to not merge code that doesn't pass CI, especially if it was a PR. Seriously.
Step two - how does all other code deal with that? (s/simd/SIMD()/)

@aurelienpierre
Copy link
Member Author

Step two - how does all other code deal with that? (s/simd/SIMD())

GCC will only issue a warning if -fopt-info-vec-missed flag is passed, and it seems Clang 8 does the same. Otherwise they just ignore the pragma and keep moving.

@TurboGit
Copy link
Member

Step one is to not merge code that doesn't pass CI, especially if it was a PR. Seriously.

I did not only merged but also fixed compilation issues on my side using clang-7. So better looking at what has been done before saying something wrong.

@TurboGit
Copy link
Member

Now this not compiling because my clang setup for building darktable is using v8 ! And so I did not catch the issues :( My bad.

@LebedevRI
Copy link
Member

Clang 8 works fine though

I can assure you it does not. I suspect you build without OpenMP there.

@aurelienpierre
Copy link
Member Author

I can assure you it does not. I suspect you build without OpenMP there.

No, that would be painfully slow and quite obvious.

I just tested right now, it builds on Clang 8/Fedora and runs on-par with GCC 9 build. It seems @MStraeten is able to build on MacOS too.

But Clang 7 is a princess. Same issue here : #2679 (comment)

@TurboGit
Copy link
Member

Maybe ! What is the error BTW? Where can I see the log?

@TurboGit
Copy link
Member

Then, what is the incantation to build with OpenMP with clang?

@TurboGit
Copy link
Member

Ok, found and I have reproduce the error. Next step is to fix.

@aurelienpierre
Copy link
Member Author

Maybe ! What is the error BTW? Where can I see the log?

My bad, the log (https://travis-ci.org/darktable-org/darktable/jobs/588119877) says:

In file included from src/iop/introspection_toneequal.c:77:
In file included from /build/darktable/src/iop/toneequal.c:83:
/build/darktable/src/common/fast_guided_filter.h:36:13: error: unknown pragma ignored [-Werror,-Wunknown-pragmas]
#pragma GCC optimize ("unroll-loops", "tree-loop-if-convert", \
            ^

But, that faulty pragma is supposed to get triggered only for GCC:

#if defined(__clang__)
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
#endif

#if defined(__GNUC__)
#pragma GCC optimize ("unroll-loops", "tree-loop-if-convert", \
                      "tree-loop-distribution", "no-strict-aliasing", \
                      "loop-interchange", "loop-nest-optimize", "tree-loop-im", \
                      "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", \
                      "split-ivs-in-unroller", "variable-expansion-in-unroller", \
                      "split-loops", "ivopts", "predictive-commoning",\
                      "tree-loop-linear", "loop-block", "loop-strip-mine", \
                      "finite-math-only", "fp-contract=fast", "fast-math")
#endif

so Clang defines __GNUC__ ???

Then, what is the incantation to build with OpenMP with clang?

Just using sh build.sh should enable OpenMP because -fopenmp is set in CMakeLists.

@TurboGit
Copy link
Member

No the error is:

In file included from /home/obry/dev/repositories/git/darktable/build/src/iop/introspection_toneequal.c:77:
/home/obry/dev/repositories/git/darktable/src/iop/toneequal.c:764:9: error: 
      loop not vectorized: failed explicitly specified loop vectorization
      [-Werror,-Wpass-failed=loop-vectorize]
#pragma omp parallel for simd default(none) \
        ^

Looks like "omp simd" cannot be honored here.

@TurboGit
Copy link
Member

@aurelienpierre : the error you quote should be fixed on master... I don't see it after my patch before merging.

@aurelienpierre
Copy link
Member Author

aurelienpierre commented Sep 22, 2019

@TurboGit what if you replace the loop in toneequal.c:764 by this one:

#ifdef _OPENMP
#pragma omp parallel for simd default(none) \
dt_omp_firstprivate(luminance, out, in_width, out_width, out_height, offset_x, offset_y, ch) \
schedule(static) aligned(luminance, out:64) collapse(2)
#endif
  for(size_t i = 0 ; i < out_height; ++i)
    for(size_t j = 0; j < out_width; ++j)
    {
      const float lum = luminance[(i + offset_y) * in_width  + (j + offset_x)];
      const size_t index = (i * out_width + j) * ch;
      out[index] = out[index + 1] = out[index + 2] = lum;
    }

@TurboGit
Copy link
Member

TurboGit commented Sep 22, 2019

Same error ! This does not work either with clang 8 & 9.

If we disable openmp it is slow.

@TurboGit
Copy link
Member

I have disabled OpenMP for clang for toneequalizer. Not nice but this fixes the build issue and will let us time to come with a proper fix. To test on your side Aurélien, to build with clang-7 and OpenMP:

rm -fr build ; CC=clang-7 CXX=clang++-7 ./build.sh --enable-openmp

@aurelienpierre
Copy link
Member Author

Not sure I can install Clang 7 on Fedora 30.

Did you have issues with Clang > 7 ? This is going to be so slow for Mac users…

@TurboGit TurboGit removed the wip pull request in making, tests and feedback needed label Sep 23, 2019
@TurboGit
Copy link
Member

Yes, same issue with clang 8 & 9.

@MStraeten
Copy link
Collaborator

with OSX i didn't found issues with clang and openmp. (used XCode 10.3 and 11; build was 21.09.19 before merge of #1904)

@TurboGit
Copy link
Member

@MStraeten : maybe OpenMP was disabled? Can you show the config log? Also which clang version are you using?

@aurelienpierre
Copy link
Member Author

No OpenMP support means no vectorization and no parallelization. If the module runs in less than 2 s for a display preview, it's pretty sure both are enabled.

I don't get why Clang 8 runs fine on my system. Isn't that just a warning treated as an error ? A pragma doesn't change the logic of the code, just how the preprocessor is supposed to do with it. There is no way to just bypass the warning ?

@aurelienpierre
Copy link
Member Author

aurelienpierre commented Sep 23, 2019

@TurboGit -Wno-error=loop-vectorize might disable the error

@TurboGit
Copy link
Member

I'll double check ASAP.

@cryptomilk
Copy link
Contributor

Why do you try to hide the error?

@aurelienpierre
Copy link
Member Author

because it makes the compilation crash completely for something that is not blocking

@TurboGit
Copy link
Member

No this option does not work. I still get the same issue.

@aurelienpierre
Copy link
Member Author

Found it @TurboGit ! In CMakeLists.txt:

if(NOT SOURCE_PACKAGE AND NOT APPLE)
  # apple build fails on deprecated warnings..
  add_definitions(-Werror)
endif()

so warnings are turned into errors for any system but Apple, hence why @MStraeten could build with OpenMP and not care about simd fails.

@aurelienpierre
Copy link
Member Author

See #2977 for a solution.

@MStraeten
Copy link
Collaborator

just in case you're still interested in the osx config here the current output of
cmake .. -DCMAKE_OSX_DEPLOYMENT_TARGET=10.14 -DCMAKE_C_FLAGS=-I/opt/local/include/libomp -DCMAKE_CXX_FLAGS="-stdlib=libc++ -I/opt/local/include/libomp -Ofast" -DCMAKE_LIBRARY_PATH=/opt/local/lib/libomp -DBINARY_PACKAGE_BUILD=ON -DRAWSPEED_ENABLE_LTO=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake_out.txt

@TurboGit
Copy link
Member

Thanks, so that's clang 11 and you indeed have OpenMP supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature: enhancement current features to improve
Projects
None yet
Development

Successfully merging this pull request may close these issues.