Kernel overall local size #171

Open
forreg16 opened this issue Dec 24, 2022 · 3 comments

@forreg16

Good afternoon.

When I run my Java code on an NVIDIA GeForce RTX 3060 Ti graphics card, I get this error:
Kernel overall local size: 1000 exceeds maximum kernel allowed local size of: 256 failed
The same code runs fine on an Intel HD Graphics 630 or an AMD Radeon R7 450.
If I pass a number less than 256 in this part of the code, it also works on the NVIDIA GeForce RTX 3060 Ti:

Range range = needDevice.createRange(255);
kernel.execute(range);

The NVIDIA GeForce RTX 3060 Ti is a newer card than the Intel HD Graphics 630 or the AMD Radeon R7 450, yet for some reason the largest value it accepts in createRange is smaller than on the older cards.
What could be the problem?

@trayanmomkov

trayanmomkov commented Mar 11, 2023

Hey @forreg16, I have the same problem. My card is an RTX 3070 and I run it on Linux.
The problem happens because the max group size is hardcoded to 256:
public static final int MAX_OPENCL_GROUP_SIZE = 256;
I don't know why this is the max; I don't have experience with OpenCL.
I hope the maintainers of the project will answer here.
Maybe one option is to change the value and recompile the library, but I don't know whether there are any instructions for doing that.

@forreg16
Author

Hey @trayanmomkov.
Try this version of the code. With it, I can set size to more than 256.

Range range = needDevice.createRange2D(size, 1);
kernel.execute(range);

You can see more details here:
https://stackoverflow.com/questions/75365328/error-exceeds-maximum-kernel-allowed-local-size

@trayanmomkov

trayanmomkov commented Mar 19, 2023

But @forreg16, you can achieve that with create(size, localSize), where localSize <= 256 and size % localSize == 0.
The real problem is that localSize cannot be greater than 256.
My card has 5888 cores, so I want a larger localSize to get better performance.
In fact, Aparapi automatically chooses a localSize of 640, but when it tries to set it I get this error:

!!!!!!! Kernel overall local size: 640 exceeds maximum kernel allowed local size of: 256 failed (null)
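
The constraint described in the comment above (localSize <= 256 and size % localSize == 0) can be satisfied mechanically. Below is a minimal sketch in plain Java, with no Aparapi dependency, that picks the largest local size that divides a given global size without exceeding a cap. The class name LocalSizes and the method largestLocalSize are hypothetical helpers made up for illustration; the cap of 256 is simply the limit reported in the error messages in this thread, not a value confirmed for every device.

```java
public final class LocalSizes {
    private LocalSizes() {}

    /**
     * Hypothetical helper: returns the largest d such that
     * 1 <= d <= maxLocalSize and globalSize % d == 0.
     */
    public static int largestLocalSize(int globalSize, int maxLocalSize) {
        if (globalSize < 1 || maxLocalSize < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Scan downward from the cap; the first divisor found is the largest.
        for (int d = Math.min(globalSize, maxLocalSize); d > 1; d--) {
            if (globalSize % d == 0) {
                return d;
            }
        }
        return 1; // 1 divides every global size
    }

    public static void main(String[] args) {
        int cap = 256; // assumption: the limit quoted in the error message
        for (int size : new int[] {1000, 640, 5888}) {
            System.out.println(size + " -> " + largestLocalSize(size, cap));
        }
        // prints:
        // 1000 -> 250
        // 640 -> 160
        // 5888 -> 256
    }
}
```

The result could then be passed as the localSize argument of the create(size, localSize) call mentioned above. This only works around the 256 cap; it does not lift it, so it does not address the wish for a larger local size.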
