OpenCL function argument passing refactoring #12677

Merged
merged 9 commits into from
Oct 28, 2022

dterrahe
Member

Inspired by this old comment:

// this is so dumb:

This replaces this:

  size_t sizes[] = { ROUNDUPDWD(p->width, devid), ROUNDUPDHT(p->height, devid), 1 };
  dt_opencl_set_kernel_arg(devid, kernel, 0, sizeof(cl_mem), (void *)&bl);
  dt_opencl_set_kernel_arg(devid, kernel, 1, sizeof(cl_mem), (void *)&bh);
  dt_opencl_set_kernel_arg(devid, kernel, 2, sizeof(int), (void *)&(width));
  dt_opencl_set_kernel_arg(devid, kernel, 3, sizeof(int), (void *)&(height));
  dt_opencl_set_kernel_arg(devid, kernel, 4, sizeof(float), (void *)&lpass_mult);
  err = dt_opencl_enqueue_kernel_2d(devid, kernel, sizes);

with this:

  err = dt_opencl_enqueue_kernel_2d_args(devid, kernel, p->width, p->height,
    CLARG(bl), CLARG(bh), CLARG((width)), CLARG((height)), CLARG(lpass_mult));

This hopefully makes setting up the link to OpenCL kernels simpler and less error-prone.

The actual refactoring was done with a set of 7 regexps, with the first commit making sure that a sizes array declared immediately before the first argument is not used later on, and so can safely be folded into the "enqueue" call and deleted.
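For context on what gets folded in: the deleted sizes array is the global work size, with width and height rounded up by ROUNDUPDWD/ROUNDUPDHT to a device-dependent multiple, while dt_opencl_enqueue_kernel_2d_args is handed the plain width and height (presumably rounding them itself). A minimal sketch of that rounding, assuming a fixed block size per dimension; the real macros look the multiple up per device, and round_up and global_size_2d are hypothetical helpers, not darktable functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round x up to the next multiple of m (m > 0). */
static size_t round_up(size_t x, size_t m)
{
  assert(m > 0);
  return ((x + m - 1) / m) * m;
}

/* Hypothetical: fill a 3-element global work size the way the old code
 * did by hand with ROUNDUPDWD/ROUNDUPDHT, here with explicit block sizes
 * instead of per-device values. */
static void global_size_2d(size_t sizes[3], int width, int height,
                           size_t block_w, size_t block_h)
{
  sizes[0] = round_up((size_t)width, block_w);
  sizes[1] = round_up((size_t)height, block_h);
  sizes[2] = 1; /* 2D launch: third dimension is unused */
}
```

With a 16x16 block, global_size_2d(sizes, 1000, 667, 16, 16) yields { 1008, 672, 1 }.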

All arguments need to be wrapped in a CLARG or CLLOCAL macro, which takes care of the sizeof calculation (and adds some error checking, although it is of course still not aware of the actual OpenCL kernel parameters).
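To make the macro idea concrete, here is a minimal sketch of how such wrappers could work. These are hypothetical definitions, not the ones in darktable's opencl.h, and the setter writes into a mock table instead of calling clSetKernelArg so the example runs without an OpenCL driver. Each CLARG/CLLOCAL expands to a (size, pointer) pair, and a variadic function walks the pairs, assigning consecutive argument indices until a sentinel:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrappers in the spirit of this PR: each expands to a
 * (size, pointer) pair, so the size comes from the argument itself. */
#define CLWRAP(size, ptr) (size_t)(size), (const void *)(ptr)
#define CLARG(arg)        CLWRAP(sizeof(arg), &(arg))
#define CLLOCAL(size)     CLWRAP((size), NULL)
#define CLARGS_END        ((size_t)SIZE_MAX) /* sentinel closing the list */

#define MAX_ARGS 16

/* Mock stand-in for clSetKernelArg: records what would be passed. */
typedef struct { size_t size; const void *value; } recorded_arg_t;
static recorded_arg_t recorded[MAX_ARGS];

static int mock_set_kernel_arg(int index, size_t size, const void *value)
{
  if(index < 0 || index >= MAX_ARGS) return -1;
  recorded[index].size = size;
  recorded[index].value = value; /* NULL means "local memory of this size" */
  return 0;
}

/* Sets consecutive arguments from `first` on, consuming (size, pointer)
 * pairs until the sentinel; stops at the first failing call. */
static int set_kernel_args_va(int first, ...)
{
  va_list ap;
  va_start(ap, first);
  int index = first, err = 0;
  for(;;)
  {
    const size_t size = va_arg(ap, size_t);
    if(size == CLARGS_END) break;
    const void *value = va_arg(ap, const void *);
    err = mock_set_kernel_arg(index++, size, value);
    if(err != 0) break;
  }
  va_end(ap);
  return err;
}

/* Appends the sentinel so callers cannot forget it. */
#define set_kernel_args(first, ...) \
  set_kernel_args_va((first), __VA_ARGS__, CLARGS_END)
```

For example, set_kernel_args(0, CLARG(bl), CLARG(width), CLLOCAL(256)) with int bl and width records sizeof(int), sizeof(int) and 256 bytes of local memory at indices 0 to 2. Closing the list with a sentinel added by the wrapper macro, rather than passing a count, is just one possible design; it means a count can never disagree with the list.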

This should give identical results, and my limited testing doesn't show any obvious problems. A full regression test should be run before considering merging this, but I never manage to do that on my machine, so if somebody with a faster machine that is already set up would be willing to help, it would be much appreciated!

The only actual change/bug fix (maybe; this is the 3rd commit) is in nlmeans_core.c: a reference to an array of two ints was passed, which now generates a warning that the sizeof is probably not what you want, whereas previously this was just ignored, as the compiler had no way of linking the two together. I can't quickly figure out what these functions are supposed to do or how to debug this, so it would be great if somebody more knowledgeable could have a look. Elsewhere, a const int2 does seem to be passed as the array (pointer) itself rather than as a reference to it.
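A likely explanation of the warning, sketched under the assumption (not checked against nlmeans_core.c) that the int2 value arrives as a function parameter declared like const int q[2]: such a parameter is really a const int *, so sizeof(q) is the size of a pointer and &q is the address of that pointer. With an explicit 2 * sizeof(int) the size stayed right, but the bytes copied were the pointer value rather than q[0] and q[1]. mock_set_arg is a hypothetical stand-in for the real argument setter:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock of an argument setter that copies `size` bytes from `value`,
 * roughly what clSetKernelArg does with a by-value argument. */
static unsigned char arg_bytes[16];

static void mock_set_arg(size_t size, const void *value)
{
  assert(size <= sizeof(arg_bytes));
  memcpy(arg_bytes, value, size);
}

/* Suspect pattern: as a parameter, q is adjusted to `const int *`, so &q
 * is the address of that local pointer and the bytes copied are the
 * pointer value, not q[0] and q[1]. (With 4-byte pointers this would
 * even read past the pointer.) */
void pass_int2_by_reference(const int q[2])
{
  mock_set_arg(2 * sizeof(int), (const void *)&q);
}

/* Intended pattern: pass the pointer itself, copying the two ints. */
void pass_int2_as_array(const int q[2])
{
  mock_set_arg(2 * sizeof(int), (const void *)q);
}
```

Applying sizeof to such a parameter is also what compilers flag (for example GCC's -Wsizeof-array-argument), which fits the new warning appearing once the size is derived from the argument.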

@dterrahe added the "scope: codebase" label (making darktable source code easier to manage) on Oct 20, 2022
@dterrahe
Member Author

dterrahe commented Oct 20, 2022

The regexps used for the actual conversion. They were run over src/common,src/develop,src/iop in Visual Studio Code.

wrap "normal" argument:

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n].*sizeof\(.*,[ \n]*.*&(.*)\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLARG($4));

wrap local variable allocation:

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n][ ]*(.*),[ \n][ ]*NULL\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLLOCAL($4));

wrap "other" (array pointers usually):

Search: dt_opencl_set_kernel_arg\((.*), (.*),[ ]*(.*),[ \n](.*),[ \n]*.*\(void \*\)(.*)\);.*
Replace: dt_opencl_set_kernel_args($1, $2, $3, CLWRAP($4, $5));

merge multiple arguments (run at least 6 times):

Search: dt_opencl_set_kernel_args\((.*), (.*), (.*), (.*)\);\n[\n]*.*dt_opencl_set_kernel_args\(\1, \2, \d*, (.*)\);
Replace: dt_opencl_set_kernel_args($1, $2, $3, $4, $5);

swap kernel= and sizes= lines:

Search: (.*ROUNDUPDWD.*ROUNDUPDHT.*\n)[\n]*(.*kernel =.*\n)
Replace: $2$1

bring sizes= line to the front:

Search: (.*dt_opencl_set_kernel_args.*\n)(.*ROUNDUPDWD.*ROUNDUPDHT.*\n)(.*dt_opencl_enqueue_kernel_2d.*\n)
Replace: $2$1$3

move sizes= and args into enqueue:

Search: ([ ]*).*size_t (.*)\[.*=.*\{ ROUNDUPDWD\((.*), (.*)\), ROUNDUPDHT\((.*), \4\).*\};[\n]*
[ ]*dt_opencl_set_kernel_args\(\4, (.*), [ ]?0, (.*\n)[\n]*(.*)dt_opencl_enqueue_kernel_2d\(\4, \6, \2\);\n
Replace: $8dt_opencl_enqueue_kernel_2d_args($4, $6, $3, $5,\n$1  $7

introduce a separate wrapper for arrays (which was the only use case for CLWRAP in user code):

Search: CLWRAP\((\d*).*?sizeof.*?, ([^&]*?)\)
Replace: CLARRAY($1, $2)

wrap long dt_opencl_set_kernel_args calls to second line:

Search: ([ ]*)(dt_opencl_set_kernel_args.{70,}?)(CLARG|CLLOCAL|CLWRAP|CLARRAY)
Replace: $1$2\n$1  $3

split long argument lists over two lines (run as many times as needed):

Search: ([ ]*)(CLARG|CLLOCAL|CLWRAP|CLARRAY)(.{90,}?)(CLARG|CLLOCAL|CLWRAP|CLARRAY)(.*\n)
Replace: $1$2$3\n$1$4$5
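To illustrate what the first rule ("wrap normal argument") does, here is a simplified adaptation in POSIX extended regex syntax (regex.h, so POSIX systems only), assuming one call per line, ", " separators and an explicit (void *) cast. It is narrower than the VS Code pattern above and only meant to show the capture groups, not to reproduce the exact conversion; wrap_normal_arg is a hypothetical helper:

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Simplified POSIX ERE adaptation of the "wrap normal argument" rule.
 * Groups: 1 = devid, 2 = kernel, 3 = index, 4 = expression after '&'. */
static const char *const WRAP_NORMAL =
  "dt_opencl_set_kernel_arg\\(([^,]*), ([^,]*), ([^,]*), "
  "sizeof\\([^)]*\\), \\(void \\*\\)&(.*)\\);";

/* Writes the rewritten line into out; returns 0 on success, -1 if the
 * pattern does not match (or fails to compile). Text before and after
 * the match, such as indentation, is kept as-is. */
static int wrap_normal_arg(const char *line, char *out, size_t outlen)
{
  regex_t re;
  regmatch_t m[5];
  if(regcomp(&re, WRAP_NORMAL, REG_EXTENDED) != 0) return -1;
  const int rc = regexec(&re, line, 5, m, 0);
  regfree(&re);
  if(rc != 0) return -1;

/* length and start of capture group i, for a "%.*s" conversion */
#define GROUP(i) (int)(m[i].rm_eo - m[i].rm_so), line + m[i].rm_so
  snprintf(out, outlen,
           "%.*sdt_opencl_set_kernel_args(%.*s, %.*s, %.*s, CLARG(%.*s));%s",
           (int)m[0].rm_so, line, GROUP(1), GROUP(2), GROUP(3), GROUP(4),
           line + m[0].rm_eo);
#undef GROUP
  return 0;
}
```

Fed "dt_opencl_set_kernel_arg(devid, kernel, 2, sizeof(int), (void *)&(width));", it produces "dt_opencl_set_kernel_args(devid, kernel, 2, CLARG((width)));". The parentheses written after & in the original are captured along with the name, which is presumably where the doubled parentheses in CLARG((width)) in the example at the top come from. A local-memory line (size expression, NULL value) does not match, which is why it needs its own rule.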

@dterrahe added the "bugfix" label (pull request fixing a bug) on Oct 20, 2022
@jenshannoschwalm
Collaborator

Wow!

@TurboGit added this to the 4.2 milestone on Oct 21, 2022
@TurboGit
Member

I'll run the testsuite and will report.

@TurboGit
Member

@dterrahe : The regression tests are all OK.

@dterrahe
Member Author

Any comments on structure/naming conventions etc.? Since most of this was auto-generated, it should be relatively easy to make changes. Don't worry that I spent weeks doing this by hand and would be sensitive to criticism/suggestions.

@TurboGit
Member

Only one point about the generated code: if you could somehow split the lines, that would be nice. They are too long for me, and reading the diff in a console will be hard.

@dterrahe
Member Author

dterrahe commented Oct 21, 2022

> If you could somehow split the lines it would be nice

That's the response I was expecting! :-)

EDIT: I think I figured out how to do this in a regexp...

@dterrahe added the "priority: low" label (core features work as expected, only secondary/optional features don't) on Oct 21, 2022
@TurboGit
Member

> That's the response I was expecting! :-)

A pleasure to see you happy :)

@dterrahe
Member Author

While eyeballing the final result after splitting long lines, I noticed that in two cases a single parameter setting in an else branch was merged with the subsequent parameters (so that those would never be set in the if case).

There was also one case in highlights.c where, in the original code, one of the parameters in a call was actually set on a different kernel (so it didn't get merged with the others). That seemed "obviously wrong" to me, so I fixed it (in bd5c8a5), but it would still be good if this were reviewed by someone who actually understands OpenCL (same with db111b0).

Regression tests can't find everything...

@jenshannoschwalm
Collaborator

> That seemed "obviously wrong" to me,

In highlights: it was wrong :-) Thanks.
The other one is also good.

Member

@TurboGit TurboGit left a comment

While testing I got:
[local laplacian cl] failed: -51

EDIT: with my change to display the error message we have:

12,860130 [local laplacian cl] couldn't enqueue kernel! CL_INVALID_ARG_SIZE
12,860174 [opencl_pixelpipe] [preview] could not run module `bilat' on gpu. falling back to cpu path
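For reading these logs: -51 is CL_INVALID_ARG_SIZE, which clSetKernelArg returns when the size passed does not match the size of the kernel parameter, and CL_INVALID_KERNEL_ARGS (-52, in the next log) is what clEnqueueNDRangeKernel returns when some kernel argument values have not been specified. A small lookup in the spirit of the error-message change mentioned above; cl_errstr is a hypothetical name, and the numeric values are copied from the OpenCL headers so the sketch compiles without them:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the OpenCL error codes relevant to kernel
 * argument setup. Values mirror the constants in the OpenCL cl.h. */
static const char *cl_errstr(int err)
{
  switch(err)
  {
    case 0:   return "CL_SUCCESS";
    case -48: return "CL_INVALID_KERNEL";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    default:  return "unknown OpenCL error";
  }
}
```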

Copy link
Member

@TurboGit TurboGit left a comment

I also get errors reported with -d opencl on the exposure module:

8,193075 [dt_opencl_check_tuning] use 3503MB (tunemem=ON, pinning=ON) on device `Quadro T1000' id=0
8,350365 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,351739 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS
8,351744 [opencl_pixelpipe] [full] could not run module `exposure' on gpu. falling back to cpu path
8,553919 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,555488 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS

@TurboGit added the "wip" label (pull request in making, tests and feedback needed) on Oct 22, 2022
@dterrahe
Member Author

> error reported with -d opencl on exposure module

That seems to be in blend.c.
I suspect CLARG(offs), used three times. That is not a correct translation of
dt_opencl_set_kernel_arg(devid, kernel_mask, 11, 2 * sizeof(int), (void *)&offs);
It should probably be CLARRAY(2, offs).
Was the original "&" correct?
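Why CLARG can be wrong here, sketched under the assumption (not verified against blend.c) that offs reaches this code as a pointer to two ints, for example via an int offs[2] parameter: CLARG(offs) then describes the pointer variable (its size and its address), while an array wrapper describes the two ints. The macro definitions below are hypothetical, written only to mirror the names used in this PR:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (size, pointer) wrappers, not darktable's definitions. */
#define CLWRAP(size, ptr)  (size_t)(size), (const void *)(ptr)
#define CLARG(arg)         CLWRAP(sizeof(arg), &(arg))
#define CLARRAY(num, ptr)  CLWRAP((num) * sizeof(*(ptr)), (ptr))

typedef struct { size_t size; const void *value; } kernel_arg_t;

/* Captures one (size, pointer) pair the way a setter would receive it. */
static kernel_arg_t make_arg(size_t size, const void *value)
{
  kernel_arg_t a = { size, value };
  return a;
}

/* Assumption: offs arrives as a pointer to two ints. */
static size_t clarg_size(const int *offs)
{
  return make_arg(CLARG(offs)).size; /* size of the pointer itself */
}

static int clarg_points_at_ints(const int *offs)
{
  /* CLARG takes &offs: the address of the local pointer variable */
  return make_arg(CLARG(offs)).value == (const void *)offs;
}

static kernel_arg_t clarray_arg(const int *offs)
{
  return make_arg(CLARRAY(2, offs)); /* 2 * sizeof(int) bytes at offs */
}
```

On a 64-bit host sizeof(int *) equals 2 * sizeof(int), so the two variants can agree on size and differ only in which bytes are passed; &offs[0] (or plain offs) points at the data either way.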

@jenshannoschwalm
Collaborator

&offs[0] ?

@johnny-bit
Member

johnny-bit commented Oct 23, 2022 via email

@TurboGit
Member

> Probably should be CLARRAY(2, offs)

Sounds like it should, yes!

@@ -35,7 +35,7 @@ typedef struct dt_bilateral_cl_t
 {
   dt_bilateral_cl_global_t *global;
   int devid;
-  size_t size_x, size_y, size_z;
+  int size_x, size_y, size_z;
Member

Wondering if unsigned int wouldn't be better here? Same question below.

Member Author

In theory, I guess yes. In practice, "all" width/height variables and struct members seem to be using int, so it would "look" weird and may lead to more surprising behavior than just having to keep one thing in your head. If it ever makes a difference anywhere...

Personally I'd rather be consistent than partially more "correct", but if you'd like me to change it, I won't object.

@dterrahe
Member Author

I ran a before/after comparison of this refactor with all modules active and a few different blending options, and got different argument sizes for these kernels:

splat
blur_line
blur_line_z
slice
pad_input
process_curve
colorreconstruction_splat
colorreconstruction_blur_line
colorreconstruction_slice

I looked into all of them, and they all turned out to use size_t, from the definitions of shared structs. I changed those to int and checked whether that would have an impact on any calculations using them. In a few cases I moved a multiplication by a sizeof to the front of the expression instead of the end, so that the type promotion would happen in time to give the same result. In many cases the value in one of these size_ts simply gets assigned to int variables, so there's not much point in their added precision anyway.
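Two points above can be made concrete. A CLARG-style wrapper takes the size from the host variable, so a size_t member passes sizeof(size_t) bytes (8 on common 64-bit hosts); if the kernel declares that parameter as int (4 bytes in OpenCL C), clSetKernelArg rejects it with CL_INVALID_ARG_SIZE. Switching the members to int then changes where promotion to size_t happens in products like width * height * sizeof(float), which is what moving the sizeof to the front addresses. A sketch of that ordering effect; buffer_bytes and int_product_overflows are illustrative helpers, not darktable code:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Written as `width * height * sizeof(float)`, the product width * height
 * would be computed in int first and could overflow (undefined behaviour)
 * before the size_t operand is reached. With sizeof first, width and then
 * height are converted to size_t, so every multiplication is done in
 * size_t. Assumes non-negative inputs. */
static size_t buffer_bytes(int width, int height)
{
  return sizeof(float) * width * height;
}

/* Reports whether width * height would overflow int, using a wider type. */
static int int_product_overflows(int width, int height)
{
  return (long long)width * height > INT_MAX;
}
```

For a 50000 x 50000 buffer the int product would overflow, while buffer_bytes returns 10000000000 on a host with a 64-bit size_t.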

Also made the CLARRAY(2, offs) change as discussed. I'm now no longer seeing any size differences.

BIG CAVEAT!!! My test clearly didn't pick up all kernels, as the mismatch requiring the CLARRAY change didn't show up in the first place. Is there a runtime test that exercises all OpenCL kernels? Some of them may be in deprecated modules. Maybe there's a style that uses all of them? (If not, one would be good to have for testing, but I wouldn't know how to create it except by painstakingly going over all the (combinations of) options in all the modules and blending, making sure all code is covered...)
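The thread doesn't say how the argument sizes were collected. One way to run such a before/after comparison, assuming each argument-setting call is logged as a (kernel name, argument index, size) record; arg_log_t and count_size_mismatches are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One logged argument-setting call: which kernel, which slot, how big. */
typedef struct { const char *kernel; int index; size_t size; } arg_log_t;

/* Counts entries in `after` whose size differs from the matching
 * (kernel, index) entry in `before`; entries with no match are skipped. */
static int count_size_mismatches(const arg_log_t *before, size_t nb,
                                 const arg_log_t *after, size_t na)
{
  int mismatches = 0;
  for(size_t i = 0; i < na; i++)
    for(size_t j = 0; j < nb; j++)
      if(after[i].index == before[j].index
         && strcmp(after[i].kernel, before[j].kernel) == 0)
      {
        if(after[i].size != before[j].size) mismatches++;
        break; /* first matching record decides */
      }
  return mismatches;
}
```

Only kernels that actually run produce records, so a kernel that no module or blend option exercised in the session stays invisible to this check, which is the limitation the caveat describes.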

@TurboGit
Member

I've tested it; here is what I did to gain confidence that this does not break anything.

I ran the regression testsuite without this PR and with this PR, and then compared the logs. Note that the testsuite contains tests for most, if not all, modules, even the deprecated ones.

$ diff -c logs/test-20221028-173323.log logs/test-20221028-175534.log 
*** logs/test-20221028-173323.log	2022-10-28 17:53:18.852979596 +0200
--- logs/test-20221028-175534.log	2022-10-28 18:15:26.841402568 +0200
***************
*** 528,534 ****
  
  Test 0016-lowpass-bilateral
        Image mire1.cr2
!       CPU & GPU version differ by 104786 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 21.58795
--- 528,534 ----
  
  Test 0016-lowpass-bilateral
        Image mire1.cr2
!       CPU & GPU version differ by 104784 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 21.58795
***************
*** 561,572 ****
  
  Test 0017-monochrome
        Image mire1.cr2
!       CPU & GPU version differ by 22809 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 2.16921
        Avg dE                   : 0.00225
!       Std dE                   : 0.02646
        ----------------------------------
        Pixels below avg + 0 std : 99.18 %
        Pixels below avg + 1 std : 99.18 %
--- 561,572 ----
  
  Test 0017-monochrome
        Image mire1.cr2
!       CPU & GPU version differ by 22808 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 2.16921
        Avg dE                   : 0.00225
!       Std dE                   : 0.02645
        ----------------------------------
        Pixels below avg + 0 std : 99.18 %
        Pixels below avg + 1 std : 99.18 %
***************
*** 627,637 ****
  
  Test 0019-color-mapping
        Image mire1.cr2
!       CPU & GPU version differ by 107356 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 27.93468
!       Avg dE                   : 0.03419
        Std dE                   : 0.26758
        ----------------------------------
        Pixels below avg + 0 std : 96.16 %
--- 627,637 ----
  
  Test 0019-color-mapping
        Image mire1.cr2
!       CPU & GPU version differ by 107358 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 27.93468
!       Avg dE                   : 0.03420
        Std dE                   : 0.26758
        ----------------------------------
        Pixels below avg + 0 std : 96.16 %
***************
*** 1716,1735 ****
  
  Test 0052-color-reconstruction
        Image mire1.cr2
!       CPU & GPU version differ by 888465 pixels
        CPU vs. GPU report :
        ----------------------------------
!       Max dE                   : 19.23852
!       Avg dE                   : 0.71821
!       Std dE                   : 1.40734
        ----------------------------------
!       Pixels below avg + 0 std : 75.01 %
!       Pixels below avg + 1 std : 86.67 %
!       Pixels below avg + 3 std : 97.33 %
        Pixels below avg + 6 std : 100.00 %
        Pixels below avg + 9 std : 100.00 %
        ----------------------------------
!       Pixels above tolerance   : 12.46 %
   
        Expected CPU vs. current CPU report :
        ----------------------------------
--- 1716,1735 ----
  
  Test 0052-color-reconstruction
        Image mire1.cr2
!       CPU & GPU version differ by 1.01173e+06 pixels
        CPU vs. GPU report :
        ----------------------------------
!       Max dE                   : 51.34364
!       Avg dE                   : 4.24255
!       Std dE                   : 7.87937
        ----------------------------------
!       Pixels below avg + 0 std : 75.54 %
!       Pixels below avg + 1 std : 82.56 %
!       Pixels below avg + 3 std : 98.66 %
        Pixels below avg + 6 std : 100.00 %
        Pixels below avg + 9 std : 100.00 %
        ----------------------------------
!       Pixels above tolerance   : 27.61 %
   
        Expected CPU vs. current CPU report :
        ----------------------------------
***************
*** 2508,2514 ****
  
  Test 0076-retouch-blur-fill
        Image mire1.cr2
!       CPU & GPU version differ by 143682 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 16.46743
--- 2508,2514 ----
  
  Test 0076-retouch-blur-fill
        Image mire1.cr2
!       CPU & GPU version differ by 143681 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 16.46743
***************
*** 2673,2684 ****
  
  Test 0081-mask-groups
        Image mire1.cr2
!       CPU & GPU version differ by 73776 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 32.35596
        Avg dE                   : 0.03100
!       Std dE                   : 0.30957
        ----------------------------------
        Pixels below avg + 0 std : 97.36 %
        Pixels below avg + 1 std : 97.68 %
--- 2673,2684 ----
  
  Test 0081-mask-groups
        Image mire1.cr2
!       CPU & GPU version differ by 73751 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 32.35596
        Avg dE                   : 0.03100
!       Std dE                   : 0.30956
        ----------------------------------
        Pixels below avg + 0 std : 97.36 %
        Pixels below avg + 1 std : 97.68 %
***************
*** 2739,2745 ****
  
  Test 0083-colorbalancergb
        Image mire1.cr2
!       CPU & GPU version differ by 27722 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 1.33753
--- 2739,2745 ----
  
  Test 0083-colorbalancergb
        Image mire1.cr2
!       CPU & GPU version differ by 27723 pixels
        CPU vs. GPU report :
        ----------------------------------
        Max dE                   : 1.33753

Everything looks good except test 0052-color-reconstruction; the difference is huge, so it is clearly a bug.

@dterrahe
Member Author

Great! Thank you; I'll immediately look at that regression.

Member

@TurboGit TurboGit left a comment

Thanks, it works on my side and is validated against the testsuite. We should be pretty safe; if some breakage remains because of holes in the testsuite, we still have quite some field testing ahead.

Labels: bugfix (pull request fixing a bug); priority: low (core features work as expected, only secondary/optional features don't); scope: codebase (making darktable source code easier to manage); wip (pull request in making, tests and feedback needed)

4 participants