OpenCL function argument passing refactoring #12677

dterrahe · 2022-10-20T23:28:52Z

Inspired by this old comment

Line 194 in 086d7b2

// this is so dumb:

this replaces this

  size_t sizes[] = { ROUNDUPDWD(p->width, devid), ROUNDUPDHT(p->height, devid), 1 };
  dt_opencl_set_kernel_arg(devid, kernel, 0, sizeof(cl_mem), (void *)&bl);
  dt_opencl_set_kernel_arg(devid, kernel, 1, sizeof(cl_mem), (void *)&bh);
  dt_opencl_set_kernel_arg(devid, kernel, 2, sizeof(int), (void *)&(width));
  dt_opencl_set_kernel_arg(devid, kernel, 3, sizeof(int), (void *)&(height));
  dt_opencl_set_kernel_arg(devid, kernel, 4, sizeof(float), (void *)&lpass_mult);
  err = dt_opencl_enqueue_kernel_2d(devid, kernel, sizes);

with this

  err = dt_opencl_enqueue_kernel_2d_args(devid, kernel, p->width, p->height,
    CLARG(bl), CLARG(bh), CLARG((width)), CLARG((height)), CLARG(lpass_mult));

This hopefully makes it simpler and less error prone to set up the link to opencl kernels.

The actual refactoring was done with a set of 7 regexps, with the first commit making sure that a sizes array declared immediately before the first argument is not used later on, so that it can be safely folded into the "enqueue" call and deleted.

All arguments need to be wrapped in a CLARG or CLLOCAL macro, which takes care of the sizeof calculation (and adds some error checking, but of course it is still not aware of the actual opencl function parameters).
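To illustrate the idea of a wrapper macro that supplies the sizeof for you, here is a minimal sketch. It is not darktable's code: the names SK_ARG, SK_LOCAL, SK_END, sk_arg_t and sk_collect_args are hypothetical, and the real definitions in src/common/opencl.h may be structured differently. The sketch expands each wrapped argument into a (size, pointer) pair and walks the pairs with a variadic function until it hits a terminator.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names come from darktable. */
#define SK_ARG(var)    sizeof(var), (const void *)&(var)   /* by-value argument       */
#define SK_LOCAL(size) (size_t)(size), (const void *)NULL  /* local memory: size only */
#define SK_END         (size_t)0, (const void *)NULL       /* terminates the list     */

typedef struct
{
  size_t size;
  const void *ptr;
} sk_arg_t;

/* Stores up to `max` (size, pointer) pairs in `out` and returns how many were seen.
   A real implementation would forward each pair to the OpenCL runtime instead. */
static int sk_collect_args(sk_arg_t *out, int max, ...)
{
  va_list ap;
  va_start(ap, max);
  int n = 0;
  for(;;)
  {
    const size_t size = va_arg(ap, size_t);
    const void *ptr = va_arg(ap, const void *);
    if(size == 0 && ptr == NULL) break; /* SK_END reached */
    assert(n < max);
    out[n].size = size;
    out[n].ptr = ptr;
    n++;
  }
  va_end(ap);
  return n;
}
```

A call such as sk_collect_args(buf, 4, SK_ARG(width), SK_ARG(lpass_mult), SK_LOCAL(64), SK_END) then records sizeof(int), sizeof(float) and 64 without the caller spelling out any sizeof. As the comment above notes, this still cannot check the sizes against what the kernel actually declares.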

This should give identical results and my limited testing doesn't immediately show problems. A full regression test should be run before considering merging this, but I never manage to do this on my machine, so if somebody with a faster one all set up already would be willing to help, this would be much appreciated!

The only actual change/bug fix (maybe; this is the 3rd commit) is in nlmeans_core.c: a reference to an array of two ints was passed, which now generates a warning that the sizeof is probably not what you want. Previously that went unnoticed, as the compiler had no way of linking the two together. I can't quickly figure out what these functions are supposed to do or how to debug this, so it would be great if somebody more knowledgeable could have a look. It does seem that elsewhere a const int2 is just passed as the array (pointer) itself and not as a reference to it.
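One way such a sizeof warning can arise in C is when the "array" is really a function parameter. Whether this is exactly the situation in nlmeans_core.c is an assumption; the helper names below are hypothetical. A parameter written as const int q[2] has the type const int *, so sizeof(q) is the size of a pointer and &q is the address of that pointer, not of the caller's data.

```c
#include <assert.h>
#include <stddef.h>

/* `const int *q` is the same parameter type as `const int q[2]`:
   array parameters are adjusted to pointers. */
static size_t size_seen_by_callee(const int *q)
{
  assert(q != NULL);
  return sizeof(q); /* size of a pointer, not of the caller's array */
}

/* Returns 1 if taking the address of the parameter yields the caller's array.
   It does not: &q is the address of the local pointer variable. */
static int address_of_param_is_array(const int *q, const int *callers_array)
{
  return (const void *)&q == (const void *)callers_array;
}
```

Inside the callee, passing q itself (rather than &q) together with an explicit element count is the form that hands the actual data to the consumer.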

dterrahe · 2022-10-20T23:33:16Z

The regexps used for the actual conversion.
They were run over src/common,src/develop,src/iop in visual studio code.

wrap "normal" argument:

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n].*sizeof\(.*,[ \n]*.*&(.*)\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLARG($4));
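For readers who want to try this first rewrite outside an editor, here is a rough sketch using POSIX regex.h. It is a simplified approximation of the search pattern above, not the regexp that was actually run: it only handles calls that fit on a single line (the [ \n] alternatives are reduced to optional spaces), and the function name wrap_normal_argument is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Rewrites one single-line dt_opencl_set_kernel_arg(...) call into the
   dt_opencl_set_kernel_args(..., CLARG(x)) form, keeping the indentation.
   Returns 0 on success, -1 if the line does not match or `out` is too small. */
static int wrap_normal_argument(const char *line, char *out, size_t outlen)
{
  assert(line != NULL && out != NULL);
  /* Simplified pattern: only spaces between arguments, no line breaks. */
  static const char pattern[] =
    "dt_opencl_set_kernel_arg\\((.*), (.*), *(.*), *sizeof\\(.*, *.*&(.*)\\);";
  regex_t re;
  regmatch_t m[5];
  if(regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
  const int rc = regexec(&re, line, 5, m, 0);
  regfree(&re);
  if(rc != 0) return -1;

/* length and start of capture group i, for use with "%.*s" */
#define GROUP(i) (int)(m[i].rm_eo - m[i].rm_so), line + m[i].rm_so
  const int n = snprintf(out, outlen,
                         "%.*sdt_opencl_set_kernel_args(%.*s, %.*s, %.*s, CLARG(%.*s));",
                         (int)m[0].rm_so, line, GROUP(1), GROUP(2), GROUP(3), GROUP(4));
#undef GROUP
  return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Anything after the closing ");" on the line is dropped, which mirrors the trailing .* in the search pattern.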

wrap local variable allocation:

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n][ ]*(.*),[ \n][ ]*NULL\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLLOCAL($4));

wrap "other" (array pointers usually):

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n](.*),[ \n]*.*\(void \*\)(.*)\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLWRAP($4, $5));

merge multiple arguments (run, at least, 6 times):

Search: dt_opencl_set_kernel_args\((.*), (.*), (.*), (.*)\);\n[\n]*.*dt_opencl_set_kernel_args\(\1, \2, \d*, (.*)\);
Replace: dt_opencl_set_kernel_args($1, $2, $3, $4, $5);

swap kernel= and sizes= lines:

Search: (.*ROUNDUPDWD.*ROUNDUPDHT.*\n)[\n]*(.*kernel =.*\n)
Replace: $2$1

bring sizes= line to the front:

Search: (.*dt_opencl_set_kernel_args.*\n)(.*ROUNDUPDWD.*ROUNDUPDHT.*\n)(.*dt_opencl_enqueue_kernel_2d.*\n)
Replace: $2$1$3

move sizes= and args into enqueue:

Search: ([ ]*).*size_t (.*)\[.*=.*\{ ROUNDUPDWD\((.*), (.*)\), ROUNDUPDHT\((.*), \4\).*\};[\n]*
[ ]*dt_opencl_set_kernel_args\(\4, (.*), [ ]?0, (.*\n)[\n]*(.*)dt_opencl_enqueue_kernel_2d\(\4, \6, \2\);\n
Replace: $8dt_opencl_enqueue_kernel_2d_args($4, $6, $3, $5,\n$1  $7

Introduce separate wrapper for arrays (which was only use case for CLWRAP in user code):

Search: CLWRAP\((\d*).*?sizeof.*?, ([^&]*?)\)
Replace: CLARRAY($1, $2)

wrap long dt_opencl_set_kernel_args calls to second line:

Search: ([ ]*)(dt_opencl_set_kernel_args.{70,}?)(CLARG|CLLOCAL|CLWRAP|CLARRAY)
Replace: $1$2\n$1  $3

split long argument lists over two lines. call as many times as needed:

Search: ([ ]*)(CLARG|CLLOCAL|CLWRAP|CLARRAY)(.{90,}?)(CLARG|CLLOCAL|CLWRAP|CLARRAY)(.*\n)
Replace: $1$2$3\n$1$4$5

jenshannoschwalm · 2022-10-21T06:35:52Z

Whow!

src/common/opencl.h

TurboGit · 2022-10-21T11:17:53Z

This should give identical results and my limited testing doesn't immediately show problems. A full regression test should be run before considering merging this, but I never manage to do this on my machine, so if somebody with a faster one all set up already would be willing to help, this would be much appreciated!

I'll run the testsuite and will report.

TurboGit · 2022-10-21T11:43:47Z

@dterrahe : The regression tests are all OK.

dterrahe · 2022-10-21T13:21:38Z

Any comments on structure/naming conventions etc? Since most of this was auto generated, it should/could be relatively easy to make changes. Don't worry that I spent weeks doing this by hand and would be sensitive about criticism/suggestions.

TurboGit · 2022-10-21T13:27:03Z

Any comments on structure/naming conventions etc? Since most of this was auto generated, it should/could be relatively easy to make changes. Don't worry that I spent weeks doing this by hand and would be sensitive about criticism/suggestions.

Only one point about the generated code. If you could somehow split the lines it would be nice; they are too long for me, and a diff in a console will be hard to read.

dterrahe · 2022-10-21T13:44:07Z

If you could somehow split the lines it would be nice

That's the response I was expecting! :-)

EDIT: I think I figured out how to do this in a regexp...

TurboGit · 2022-10-21T15:48:58Z

That's the response I was expecting! :-)

A pleasure to see you happy :)

dterrahe · 2022-10-21T19:40:38Z

While eyeballing the final result after splitting long lines, I noticed that in two cases a single parameter setting in an else was merged with the subsequent parameters (so that they would never be set in the if case).

There was also one case where, in the original code in highlights.c, one of the parameters in a call was actually set in a different kernel (so it didn't get merged with the others). That seemed "obviously wrong" to me, so I fixed it (in bd5c8a5), but it would still be good if this was reviewed by someone who actually understands opencl (same with db111b0).

Regression tests can't find everything...

jenshannoschwalm · 2022-10-21T21:51:16Z

That seemed "obviously wrong" to me,

In highlights: It was wrong :-) Thanks
The other one is also good.

TurboGit

While testing I got :
[local laplacian cl] failed: -51

EDIT: with my change to display the error message we have:

12,860130 [local laplacian cl] couldn't enqueue kernel! CL_INVALID_ARG_SIZE
12,860174 [opencl_pixelpipe] [preview] could not run module `bilat' on gpu. falling back to cpu path

src/common/locallaplaciancl.c

TurboGit

I also get an error reported with -d opencl on the exposure module:

8,193075 [dt_opencl_check_tuning] use 3503MB (tunemem=ON, pinning=ON) on device `Quadro T1000' id=0
8,350365 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,351739 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS
8,351744 [opencl_pixelpipe] [full] could not run module `exposure' on gpu. falling back to cpu path
8,553919 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,555488 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS

dterrahe · 2022-10-22T21:17:05Z

error reported with -d opencl on exposure module

That seems to be in blend.c
I suspect CLARG(offs), three times. That is not a correct translation of
dt_opencl_set_kernel_arg(devid, kernel_mask, 11, 2 * sizeof(int), (void *)&offs);
Probably should be CLARRAY(2, offs)
Was the original "&" correct?

jenshannoschwalm · 2022-10-22T21:29:39Z

&offs[0] ?
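A small sketch of why the "&" question matters less than the size for a genuine array. Whether offs in blend.c is a true array or a pointer is not shown in this thread, so treat that as an assumption; SK_ARRAY_BYTES and same_start_address are hypothetical names, not darktable's. For a real int[2], the spellings offs, &offs[0] and &offs all refer to the same first byte and only their types differ, while the byte count for an array argument is the element count times the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro, not darktable's: byte size of `num` elements of `array`. */
#define SK_ARRAY_BYTES(num, array) ((size_t)(num) * sizeof(*(array)))

/* Returns 1 if all three spellings refer to the same first byte of a real array. */
static int same_start_address(void)
{
  int offs[2] = { 3, 4 };
  const void *a = offs;     /* decays to int *      */
  const void *b = &offs[0]; /* int *                */
  const void *c = &offs;    /* int (*)[2]           */
  assert(SK_ARRAY_BYTES(2, offs) == sizeof(offs));
  return a == b && b == c;
}
```

If offs were a pointer instead, &offs would be the address of the pointer variable itself, which is the case where the original "&" would be wrong.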

johnny-bit · 2022-10-23T08:42:30Z

The size_t changes were mostly to make sure that any multiplications/operations on things that should be both big and unsigned stayed in the realm of unsigned. If I remember correctly, that mostly started from a maths error where a big enough image could cause a memory allocation request to be negative. That, plus comparing signed and unsigned ints in C code, can lead to problems... If one makes sure to be cognizant of both, properly handles casts and math issues, and stays consistent, then it's fine.

On Sat, 22 Oct 2022 at 19:54, dterrahe wrote:

In src/common/locallaplaciancl.c:

       err = dt_opencl_enqueue_kernel_2d(b->devid, b->global->kernel_gauss_reduce, sizes);
       if(err != CL_SUCCESS) goto error;
     }
     for(int k=0;k<num_gamma;k++)
     { // process images
       const float g = (k+.5f)/(float)num_gamma;
-      dt_opencl_set_kernel_arg(b->devid, b->global->kernel_process_curve, 0, sizeof(cl_mem), &b->dev_padded[0]);

size_t seems awfully big for one dimension of an image, but I guess the advantage is you can safely multiply without having to cast first. If I remember correctly @johnny-bit did a lot of work on that at some point, but enforcing consistency, where *all* widths and heights would be size_t, is a long way off. So then maybe having them all be ints and enforcing casts everywhere is the more consistent way? Or just deal with them case by case. Copy/pastes from a situation using size_t to one with ints will continue to introduce problems though. At least I feel that this PR making this explicit, rather than just accepting mismatched sizeof(type)s and actual types, is "better"?

-- Regards, Hubert Kowalski
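The two signed/unsigned hazards mentioned in this comment can be sketched in a few lines of C. This is a generic illustration with hypothetical helper names (dim_to_size, less_than_as_unsigned), not code from darktable: a negative int turned into a size_t becomes a huge allocation request, and a mixed int/unsigned comparison converts the int to unsigned before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Converts an int dimension to size_t, rejecting negative values instead of
   letting them wrap around to a very large unsigned number. */
static bool dim_to_size(int dim, size_t *out)
{
  assert(out != NULL);
  if(dim < 0) return false;
  *out = (size_t)dim;
  return true;
}

/* Spells out what `a < b` does when a is int and b is unsigned int:
   a is converted to unsigned first, so -1 becomes UINT_MAX. */
static bool less_than_as_unsigned(int a, unsigned int b)
{
  return (unsigned int)a < b;
}
```

With these, dim_to_size(-1, &s) fails rather than producing a size near SIZE_MAX, and less_than_as_unsigned(-1, 1u) is false even though -1 is mathematically smaller than 1.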

TurboGit · 2022-10-23T17:32:34Z

Probably should be CLARRAY(2, offs)

Sounds like it should yes !

TurboGit · 2022-10-23T21:10:54Z

src/common/bilateralcl.h

@@ -35,7 +35,7 @@ typedef struct dt_bilateral_cl_t
 {
  dt_bilateral_cl_global_t *global;
  int devid;
-  size_t size_x, size_y, size_z;
+  int size_x, size_y, size_z;


wondering if unsigned int won't be better here? Same question below.

In theory, I guess yes. In practice, "all" width/height variables and struct members seem to be using int, so it would "look" weird and may lead to more surprising behavior than just having to keep one thing in your head. If it ever makes a difference anywhere...

Personally I'd rather be consistent than partially more "correct", but if you'd like me to change it, I won't object.

dterrahe · 2022-10-23T21:19:05Z

I ran a before/after comparison of this refactor with all modules active and a few different blending options and got different argument sizes for these:

splat
blur_line
blur_line_z
slice
pad_input
process_curve
colorreconstruction_splat
colorreconstruction_blur_line
colorreconstruction_slice

I looked into all of them and they all turned out to use size_t, from the definitions of shared structs. I changed those to int and checked whether that would have an impact on any calculations using them. In a few cases I moved a multiplication with a sizeof in front of the other factors, instead of after them, so that the type promotion would happen in time to give the same result. In many cases, the value in one of these size_ts simply gets assigned to int variables, so there's not much point in their added precision anyway.
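The reordering of the sizeof multiplication can be shown with a short sketch. The function name float_buffer_bytes is hypothetical and the example is generic C, not taken from the changed files. Multiplication is evaluated left to right, so the position of the size_t operand decides whether the first product is computed in int or in size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Buffer size in bytes for a single-channel float image with int dimensions. */
static size_t float_buffer_bytes(int width, int height)
{
  assert(width >= 0 && height >= 0);
  /* sizeof(float) is a size_t, so putting it first makes every following
     multiplication happen in size_t. Written as width * height * sizeof(float),
     the first product would be computed in int and could overflow for large
     images before the promotion to size_t takes place. */
  return sizeof(float) * width * height;
}
```

An explicit cast such as (size_t)width * height * sizeof(float) achieves the same thing; moving the sizeof to the front simply avoids the cast.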

Also made the CLARRAY(2, offs) change as discussed. I'm now no longer seeing any size differences.

BIG CAVEAT!!! My test clearly didn't pick up all kernels, as the mismatch requiring the CLARRAY changes didn't show up in the first place. I don't know if there is a runtime test to catch all opencl kernels? Some of them may be in deprecated modules. Maybe there's a style that uses all of them? (otherwise would be good to have for testing; but I wouldn't know how to create one except by painstakingly going over all the (combinations of) options in all the modules and blending and making sure all code is covered...)

TurboGit · 2022-10-28T16:21:24Z

I've tested, here is what I've done to gain confidence that this does not break something.

I have run the regression testsuite without this PR and with this PR. I have then compared the logs. Note that the testsuite contains tests for most if not all modules even the deprecated ones.

$ diff -c logs/test-20221028-173323.log logs/test-20221028-175534.log 
*** logs/test-20221028-173323.log	2022-10-28 17:53:18.852979596 +0200
--- logs/test-20221028-175534.log	2022-10-28 18:15:26.841402568 +0200
***************
*** 528,534 ****
  
  Test 0016-lowpass-bilateral
        Image mire1.cr2
!       CPU & GPU version differ by 104786 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 21.58795
--- 528,534 ----
  
  Test 0016-lowpass-bilateral
        Image mire1.cr2
!       CPU & GPU version differ by 104784 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 21.58795
***************
*** 561,572 ****
  
  Test 0017-monochrome
        Image mire1.cr2
!       CPU & GPU version differ by 22809 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 2.16921
        Avg dE                   : 0.00225
!       Std dE                   : 0.02646
        ----------------------------------
        Pixels below avg + 0 std : 99.18 %
        Pixels below avg + 1 std : 99.18 %
--- 561,572 ----
  
  Test 0017-monochrome
        Image mire1.cr2
!       CPU & GPU version differ by 22808 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 2.16921
        Avg dE                   : 0.00225
!       Std dE                   : 0.02645
        ----------------------------------
        Pixels below avg + 0 std : 99.18 %
        Pixels below avg + 1 std : 99.18 %
***************
*** 627,637 ****
  
  Test 0019-color-mapping
        Image mire1.cr2
!       CPU & GPU version differ by 107356 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 27.93468
!       Avg dE                   : 0.03419
        Std dE                   : 0.26758
        ----------------------------------
        Pixels below avg + 0 std : 96.16 %
--- 627,637 ----
  
  Test 0019-color-mapping
        Image mire1.cr2
!       CPU & GPU version differ by 107358 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 27.93468
!       Avg dE                   : 0.03420
        Std dE                   : 0.26758
        ----------------------------------
        Pixels below avg + 0 std : 96.16 %
***************
*** 1716,1735 ****
  
  Test 0052-color-reconstruction
        Image mire1.cr2
!       CPU & GPU version differ by 888465 pixels
        CPU vs. GPU report :
        ----------------------------------
!       Max dE                   : 19.23852
!       Avg dE                   : 0.71821
!       Std dE                   : 1.40734
        ----------------------------------
!       Pixels below avg + 0 std : 75.01 %
!       Pixels below avg + 1 std : 86.67 %
!       Pixels below avg + 3 std : 97.33 %
        Pixels below avg + 6 std : 100.00 %
        Pixels below avg + 9 std : 100.00 %
        ----------------------------------
!       Pixels above tolerance   : 12.46 %
   
        Expected CPU vs. current CPU report :
        ----------------------------------
--- 1716,1735 ----
  
  Test 0052-color-reconstruction
        Image mire1.cr2
!       CPU & GPU version differ by 1.01173e+06 pixels
        CPU vs. GPU report :
        ----------------------------------
!       Max dE                   : 51.34364
!       Avg dE                   : 4.24255
!       Std dE                   : 7.87937
        ----------------------------------
!       Pixels below avg + 0 std : 75.54 %
!       Pixels below avg + 1 std : 82.56 %
!       Pixels below avg + 3 std : 98.66 %
        Pixels below avg + 6 std : 100.00 %
        Pixels below avg + 9 std : 100.00 %
        ----------------------------------
!       Pixels above tolerance   : 27.61 %
   
        Expected CPU vs. current CPU report :
        ----------------------------------
***************
*** 2508,2514 ****
  
  Test 0076-retouch-blur-fill
        Image mire1.cr2
!       CPU & GPU version differ by 143682 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 16.46743
--- 2508,2514 ----
  
  Test 0076-retouch-blur-fill
        Image mire1.cr2
!       CPU & GPU version differ by 143681 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 16.46743
***************
*** 2673,2684 ****
  
  Test 0081-mask-groups
        Image mire1.cr2
!       CPU & GPU version differ by 73776 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 32.35596
        Avg dE                   : 0.03100
!       Std dE                   : 0.30957
        ----------------------------------
        Pixels below avg + 0 std : 97.36 %
        Pixels below avg + 1 std : 97.68 %
--- 2673,2684 ----
  
  Test 0081-mask-groups
        Image mire1.cr2
!       CPU & GPU version differ by 73751 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 32.35596
        Avg dE                   : 0.03100
!       Std dE                   : 0.30956
        ----------------------------------
        Pixels below avg + 0 std : 97.36 %
        Pixels below avg + 1 std : 97.68 %
***************
*** 2739,2745 ****
  
  Test 0083-colorbalancergb
        Image mire1.cr2
!       CPU & GPU version differ by 27722 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 1.33753
--- 2739,2745 ----
  
  Test 0083-colorbalancergb
        Image mire1.cr2
!       CPU & GPU version differ by 27723 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 1.33753

Everything looks good except the test 0052-color-reconstruction, the diff is huge so clearly a bug.

dterrahe · 2022-10-28T16:25:22Z

Great! Thank you; I'll immediately look at that regression.

src/iop/colorreconstruction.c

TurboGit

Thanks, working on my side and validated against the testsuite. We should be pretty safe; if some breakage remains because of holes in the testsuite, we hopefully still have quite some field testing to catch it.

dterrahe added 4 commits October 20, 2022 17:08

harmonize opencl calls

ac15954

expand dt_opencl_enqueue_kernel_2d to also set all arguments

0211f4c

"fix" possible bugs in opencl array argument passing

db111b0

actual opencl calls refactoring

a3cc624

dterrahe added the scope: codebase making darktable source code easier to manage label Oct 20, 2022

dterrahe added the bugfix pull request fixing a bug label Oct 20, 2022

TurboGit reviewed Oct 21, 2022

src/common/opencl.h

TurboGit added this to the 4.2 milestone Oct 21, 2022

dterrahe added the priority: low core features work as expected, only secondary/optional features don't label Oct 21, 2022

dterrahe added 3 commits October 21, 2022 14:56

introduce CLARRAY OpenCL argument wrapped macro

87dd7eb

wrap long OpenCL calls over several lines

a21c7eb

three manual opencl merged calls fixes

bd5c8a5

TurboGit requested changes Oct 22, 2022

src/common/locallaplaciancl.c

TurboGit reviewed Oct 22, 2022

src/common/locallaplaciancl.c

src/common/locallaplaciancl.c

TurboGit reviewed Oct 22, 2022

TurboGit added the wip pull request in making, tests and feedback needed label Oct 22, 2022

fix some opencl argument sizes

31dc5ca

TurboGit reviewed Oct 23, 2022

dterrahe requested a review from TurboGit October 28, 2022 15:02

TurboGit reviewed Oct 28, 2022

src/iop/colorreconstruction.c (outdated)

fix breakage in previous commit which accidentally removed 4

0a7cd9e

TurboGit approved these changes Oct 28, 2022

TurboGit merged commit a4bd3cd into darktable-org:master Oct 28, 2022

dterrahe mentioned this pull request Oct 29, 2022

Highlight reconstruction incorrectly applies mask with OpenCL #11968

Closed

jenshannoschwalm mentioned this pull request Dec 11, 2022

Masking: couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS #13120

Closed

dterrahe deleted the opencl_refactor branch February 4, 2023 19:53
