
OpenCL / Local Contrast / Local Laplacian - issue with AMD/ROCM: 'amplified effect' #3756

Closed
@arigit

Description

This issue was reported in Redmine a year ago but didn't get any traction. I'm bringing it here since bug reporting seems to have moved, and so that other affected people can find it.

This issue affects darktable when used with AMD GPUs, OpenCL and the open-source ROCM driver, which is the currently supported OpenCL driver for all new AMD discrete and integrated GPUs. It has been confirmed on both Polaris and Vega GPUs.

This issue does not happen with the legacy closed-source amdgpu-pro OpenCL driver (that many still use).

Original issue: https://redmine.darktable.org/issues/12423

Describe the bug

The result of applying Local Contrast > Local Laplacian is totally different with OpenCL "on" and OpenCL "off", for the exact same settings of the module. The higher the Detail percentage, the more striking the difference.

This issue does NOT happen for Local Contrast > Bilateral Grid, it is specific to Local Laplacian. When I select Bilateral Grid, the result is the same regardless of whether OpenCL is enabled or not.

It looks like the Local Contrast > Local Laplacian OpenCL implementation has a problem. See the attached JPG snapshots, taken with the exact same settings of the module, one with OpenCL "on" and one with OpenCL "off". Basically, Local Contrast + Local Laplacian is unusable (and destroys the image) when OpenCL is on.

With OpenCL enabled, the "Detail" effect of the Local Contrast filter appears to be grossly amplified.

Example:
Local Contrast + Local Laplacian, OpenCL = "off"
image

Local Contrast + Local Laplacian, OpenCL = "on"
image

No other OpenCL kernel from darktable has problems with the official ROCM driver from AMD in my setup. I have been using ROCM OpenCL with darktable for 1.5 years now, meaning thousands of pictures developed using profiled denoise, filmic, retouch, basecurve, etc.

My workaround in order to keep OpenCL "on" has been to remove the LocalLaplacian kernel.

sudo mv /usr/share/darktable/kernels/locallaplacian.cl /usr/share/darktable/kernels/locallaplacian.cl.temporarilyRemovedDueToROCM
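A reversible variant of the same workaround could look like the sketch below. The function names `disable_kernel` and `restore_kernel` and the `.disabled` suffix are made up for illustration and are not part of darktable. The kernel directory is passed as an argument, assumed here to be `/usr/share/darktable/kernels` as in my setup.

```shell
#!/bin/sh
# Hypothetical helpers (not part of darktable): rename the Local Laplacian
# kernel file so it can be put back later with a matching restore step.
KERNEL_NAME="locallaplacian.cl"
DISABLED_SUFFIX=".disabled"   # arbitrary suffix chosen for this sketch

# disable_kernel DIR: rename DIR/locallaplacian.cl out of the way.
disable_kernel() {
    dir="$1"
    if [ -f "$dir/$KERNEL_NAME" ]; then
        mv "$dir/$KERNEL_NAME" "$dir/$KERNEL_NAME$DISABLED_SUFFIX"
    else
        echo "no $KERNEL_NAME found in $dir" >&2
        return 1
    fi
}

# restore_kernel DIR: undo disable_kernel by renaming the file back.
restore_kernel() {
    dir="$1"
    if [ -f "$dir/$KERNEL_NAME$DISABLED_SUFFIX" ]; then
        mv "$dir/$KERNEL_NAME$DISABLED_SUFFIX" "$dir/$KERNEL_NAME"
    else
        echo "no disabled $KERNEL_NAME found in $dir" >&2
        return 1
    fi
}
```

Since the system kernel directory is root-owned, the functions would need to be sourced and called from a root shell, e.g. `disable_kernel /usr/share/darktable/kernels`, and later `restore_kernel /usr/share/darktable/kernels` once a fixed driver is available.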

The issue happens with both current stable and GitHub master (darktable 3.0RC2). It was first found on DT 2.4 / Ubuntu 18.10, stock kernel 4.18, and rocm-opencl 1.6 from AMD. It has been reproducible 100% of the time since then, with DT 2.6 and now DT 3.0, across dozens of ROCM releases (all the way to the current 2.10, kernel 5.3).

There are no errors at all shown in the logs by darktable when compiling the kernel and when using it:

0.280302 [opencl_init] compiling program `locallaplacian.cl' ..
0.280608 [opencl_load_program] loaded cached binary program from file `/home/ariel/.cache/darktable/cached_kernels_for_gfx803/locallaplacian.cl.bin'
0.280611 [opencl_load_program] successfully loaded program from `/usr/share/darktable/kernels/locallaplacian.cl'
0.281739 [opencl_build_program] successfully built program
0.281746 [opencl_build_program] BUILD STATUS: 0

It is possible that the bug is in the ROCM driver, since the same OpenCL kernel works on NVIDIA and with the old amdgpu-pro driver. However, to get a ROCM developer engaged we need more details on what is wrong with the driver. We have this ROCM bug open with them:

ROCm/ROCm#704

However, no detail = no action.

It is also possible that some peculiarity of darktable's Local Laplacian OpenCL kernel code is triggering the problem. In that case, the problem could hopefully be worked around by refactoring that code.

Platform (please complete the following information):

  • OS: Ubuntu 19.10
  • OpenCL activated
  • GPU: AMD RX-560; Driver: rocm-opencl 2.10
    Same issue reported with AMD Vega 56

Labels

  • bug: upstream (the bug needs a fix outside of the scope of darktable, in an external lib or in a driver)
  • no-issue-activity
  • reproduce: confirmed (a way to make the bug re-appear 99% of times has been found)
  • scope: hardware support (dealing with drivers and external devices: GPU, printers)
  • understood: unclear (devs lack most or all important info and can do nothing, the report will be closed after 2 weeks)
