
size limitation in FilterActs? #4

Open
GoogleCodeExporter opened this issue May 12, 2015 · 3 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. use a big image like 512 x 512
2. put lots of filters (like 64)
3. have lots of color channels (again 64?)

What is the expected output? What do you see instead?
I expect a big filtered image, but instead it crashes. 

The blocks are defined such that blocks.y > (2^16) so CUDA refuses to launch 
the kernel.
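For illustration, here is a minimal standalone check (not part of cuda-convnet; the block counts are made up) showing how a launch configuration like this runs into the device's grid-size limits:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // The y and z grid dimensions are capped at 65535.
    printf("max grid dims: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Illustrative block count only, not filterActs' real blocking:
    // a 512x512 input with 4x4 output blocks and 64/16 filter groups
    // already needs (512/4) * (512/4) * (64/16) = 65536 blocks in y.
    dim3 blocks(4, (512 / 4) * (512 / 4) * (64 / 16));
    if (blocks.y > (unsigned)prop.maxGridSize[1]) {
        printf("blocks.y = %u exceeds %d; the kernel launch would fail\n",
               blocks.y, prop.maxGridSize[1]);
    }
    return 0;
}
```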

I'm not sure I understand how to set the number of modules when doing a normal 
convolution, but it seems that an outer loop is required. The trouble with an 
outer loop is that the data is arranged in such a way that it is impossible to 
apply just a fraction of the filters, or to process just part of each image. 
The data arrangement makes it natural to process just some of the image 
channels... but the color channels don't come into the blocking structure.

Basically... can I use this kernel to perform big convolutions?

Original issue reported on code.google.com by james.be...@gmail.com on 7 Mar 2012 at 6:55

@GoogleCodeExporter

Hi James,

You're right, there is a maximum-grid-size-imposed limitation on the size of 
the convolution that can be performed in filterActs. I'm not really sure what 
to do about it yet, though. One hacky solution, if you really need to perform 
such a big convolution, is to split your filters into several sets, each set in 
its own matrix, and then call filterActs for each set. The target matrix can be 
the same for all calls (just a different offset into the same array). The 
gradient computation routines would have to be called once per set as well.
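For illustration only, a rough sketch of that splitting idea; the function name, signature, and memory layout below are hypothetical stand-ins, not the real cuda-convnet API:

```cuda
#include <algorithm>
#include <cstddef>

// Stand-in for the real filterActs call; here it only marks where the
// convolution kernel for one filter set would be launched.
static void filterActsChunk(const float* images, const float* filters,
                            float* targets, int numImages, int numModules,
                            int numFiltersInChunk) {
    (void)images; (void)filters; (void)targets;
    (void)numImages; (void)numModules; (void)numFiltersInChunk;
}

// Split the filters into sets and write each set's output at a different
// offset into the same target buffer (layout assumed for illustration).
void convolveInChunks(const float* images, const float* filters, float* targets,
                      int numImages, int numModules, int numFilters,
                      int filtersPerChunk, int filterNumel) {
    for (int f = 0; f < numFilters; f += filtersPerChunk) {
        int chunk = std::min(filtersPerChunk, numFilters - f);
        const float* filterSet = filters + (std::size_t)f * filterNumel;
        float* targetSet = targets + (std::size_t)f * numModules * numImages;
        filterActsChunk(images, filterSet, targetSet,
                        numImages, numModules, chunk);
    }
}
```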

I'll probably come up with something better in the future but for now it's an 
unfixed bug. 

Alex

Original comment by akrizhev...@gmail.com on 10 Mar 2012 at 9:57

@GoogleCodeExporter

CUDA compute capability 2.x or lower has a limitation that the x, y and z 
dimensions of a grid must each be smaller than 65536. On compute capability 
3.x, however, the x dimension of a grid can be as large as 2^31 - 1, which is 
enough for larger pictures. The original function could be fixed just by 
swapping the x and y dimensions.
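A rough sketch of that suggestion (illustrative only, not the actual filterActs code): the large block count moves from grid.y to grid.x, and the kernel reads blockIdx.x where it previously read blockIdx.y.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel only: after the swap, the index that used to come
// from blockIdx.y is read from blockIdx.x instead.
__global__ void swappedKernel() {
    unsigned int bigIdx = blockIdx.x;    // previously blockIdx.y
    unsigned int smallIdx = blockIdx.y;  // previously blockIdx.x
    (void)bigIdx;
    (void)smallIdx;
}

int main() {
    // 100000 blocks would exceed the 65535 limit in grid.y on any device,
    // but fits easily in grid.x on compute capability 3.x and later.
    dim3 blocks(100000, 4);
    dim3 threads(32, 4);
    swappedKernel<<<blocks, threads>>>();
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```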

Original comment by haohuali...@gmail.com on 26 Dec 2013 at 12:37

@GoogleCodeExporter

The suggested fix sounds very promising. Any plans to swap x and y in the 
future?

Original comment by brosch....@gmail.com on 7 Jan 2014 at 7:59
