-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenCL function argument passing refactoring #12677
Conversation
The regexps used for the actual conversion. wrap "normal" argument:
wrap local variable allocation:
wrap "other" (array pointers usually):
merge multiple arguments (run, at least, 6 times):
swap kernel= and sizes= lines:
bring sizes= line to the front:
move sizes= and args into enqueue:
Introduce separate wrapper for arrays (which was only use case for CLWRAP in user code):
wrap long
split long argument lists over two lines. call as many times as needed:
|
Whow! |
I'll run the testsuite and will report. |
@dterrahe : The regression tests are all OK. |
Any comments on structure/naming conventions etc? Since most of this was auto generated, it should/could be relatively easy to make changes. Don't worry that I spend weeks doing this by hand and would be sensitive about criticism/suggestions. |
Only one point about the generated code. If you could somehow split the lines it would be nice, the length is too much for me and a diff in a console will be hard. |
That's the response I was expecting! :-) EDIT: I think I figured out how to do this in a regexp... |
A pleasure to see you happy :) |
While eyeballing the final result after splitting long lines, I noticed that in two cases a single parameter setting in an There was also one case where in the original code in highlights.c one of the pararmeters in a call was actually set in a different kernel (so it didn't get merged with the others). That seemed "obviously wrong" to me, so I fixed it (in bd5c8a5) but it would still be good if this was reviewed by someone who actually understands opencl (same with db111b0). Regression tests can't find everything... |
In highlights: It was wrong :-) Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing I got :
[local laplacian cl] failed: -51
EDIT: with my change to display the error message we have:
12,860130 [local laplacian cl] couldn't enqueue kernel! CL_INVALID_ARG_SIZE
12,860174 [opencl_pixelpipe] [preview] could not run module `bilat' on gpu. falling back to cpu path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also error reported with -d opencl on exposure module:
8,193075 [dt_opencl_check_tuning] use 3503MB (tunemem=ON, pinning=ON) on device `Quadro T1000' id=0
8,350365 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,351739 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS
8,351744 [opencl_pixelpipe] [full] could not run module `exposure' on gpu. falling back to cpu path
8,553919 [dt_opencl_enqueue_kernel_2d_with_local] kernel 24 on device 0: CL_INVALID_KERNEL_ARGS
8,555488 [opencl_blendop] couldn't enqueue kernel! CL_INVALID_KERNEL_ARGS
That seems to be in |
&offs[0] ? |
the size_t changes were mostly to make sure that any
multiplications/operations on things that should be both big and unsigned
stayed in realm of unsigned. That mostly started if i remember correctly
from maths error where big enough image could cause mem allocation request
to be netagive. That plus comparing signed and unsigned ints in C code can
lead to problems... If one makes sure to be cognizant of both, properly
handle casts & math issues and keep consitent then it's fine.
sob., 22 paź 2022 o 19:54 dterrahe ***@***.***> napisał(a):
… ***@***.**** commented on this pull request.
------------------------------
In src/common/locallaplaciancl.c
<#12677 (comment)>
:
> err = dt_opencl_enqueue_kernel_2d(b->devid, b->global->kernel_gauss_reduce, sizes);
if(err != CL_SUCCESS) goto error;
}
for(int k=0;k<num_gamma;k++)
{ // process images
const float g = (k+.5f)/(float)num_gamma;
- dt_opencl_set_kernel_arg(b->devid, b->global->kernel_process_curve, 0, sizeof(cl_mem), &b->dev_padded[0]);
size_t seems awfully big for one dimension of an image, but I guess the
advantage is you can safely multiply without having to cast first. If I
remember correctly @johnny-bit <https://github.com/johnny-bit> did a lot
of work on that at some point, but enforcing consistency, where *all*
widths and heights would be size_t is a long way off. So then maybe
having them all be ints and enforcing casts everywhere is the more
consistent way? Or just deal with them case by case. Copy/pastes from a
situation using size_t to one with ints will continue to introduce problems
though.
At least I feel that this PR making this explicit, rather than just
accepting mismatched sizeof(type)s and actual types is "better"?
—
Reply to this email directly, view it on GitHub
<#12677 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACRRKFPGOZZ67RX5TJ76JXLWEQS6DANCNFSM6AAAAAARKUHQTY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Pozdrawiam,
Hubert Kowalski
|
Sounds like it should yes ! |
@@ -35,7 +35,7 @@ typedef struct dt_bilateral_cl_t | |||
{ | |||
dt_bilateral_cl_global_t *global; | |||
int devid; | |||
size_t size_x, size_y, size_z; | |||
int size_x, size_y, size_z; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if unsigned int won't be better here? Same question below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, I guess yes. In practice, "all" width/height variables and struct members seem to be using int
, so it would "look" weird and may lead to more surprising behavior than just having to keep one thing in your head. If it ever makes a difference anywhere...
Personally I'd rather be consistent than partially more "correct", but if you'd like me to change it, I won't object.
I ran a before/after this refactor with all modules active and a few different blending options and got different argument sizes for these:
I looked into all of them and they all turned out to use size_t, from the definitions of shared structs. I changed those and checked if that would have an impact on any calculations using them. In a few cases I moved a multiplication with a sizeof in front, instead of after, it so that the type promotion would happen in time to give the same result. In many cases, the value in one of these size_ts gets simply assigned to int variables, so there's not much point in their added precision anyway. Also made the BIG CAVEAT!!! My test clearly didn't pick up all kernels, as the mismatch requiring the CLARRAY changes didn't show up in the first place. I don't know if there is a runtime test to catch all opencl kernels? Some of them may be in deprecated modules. Maybe there's a style that uses all of them? (otherwise would be good to have for testing; but I wouldn't know how to create one except by painstakingly going over all the (combinations of) options in all the modules and blending and making sure all code is covered...) |
I've tested, here is what I've done to gain confidence that this does not break something. I have run the regression testsuite without this PR and with this PR. I have then compared the logs. Note that the testsuite contains tests for most if not all modules even the deprecated ones.
Everything looks good except the test |
Great! Thank you; I'll immediately look at that regression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, working on my side and validated against the testsuite. We should be pretty safe, hopefully if it remains some breakages because of some holes in the testsuite we still have quite some filed testing.
Inspired by this old comment
darktable/src/common/locallaplaciancl.c
Line 194 in 086d7b2
this replaces this
with this
This hopefully makes it simpler and less error prone to set up the link to opencl kernels.
The actual refactoring was done with a set of 7 regexps, with the first commit making sure that a
sizes
declared immediately before the first argument is not used later on, and so can be safely integrated in the "enqueue" call and deleted.All arguments need to be wrapped in a CLANG or CLLOCAL macro, which takes care of the
sizeof
calculation (and adds some error checking, but of course it is still not aware of the actual opencl function parameters).This should give identical results and my limited testing doesn't immediately show problems. A full regression test should be run before considering merging this, but I never manage to do this on my machine, so if somebody with a faster one all set up already would be willing to help, this would be much appreciated!
The only actual change/bug fix (maybe; this is the 3rd commit) is in
nlmeans_core.c
; a reference to an array of two ints was passed, which now generates a warning that thesizeof
is probably not what you want, whereas previously that was just ignored, as the compiler had no way of linking the two together. I can't quickly figure out what these functions are supposed to do and how to debug this, so it would be great if somebody more knowledgeable could have a look. It does seem that elsehwereconst int2
is just passed as the array (pointer) itself and not a reference to it.