High-level description of RAVU algorithm? #9
I'd like to see if I can figure out any tricks to make it faster, especially with compute shaders.
How does RAVU work on a high level? It has four passes; what do those passes do? Why the weird weights texture?
Actually I already have an idea of how to improve RAVU with a compute shader, but it isn't possible with the current user shader interface. Anyway, RAVU works like this. The convolution kernel covers a small window around each source pixel (its size depends on the chosen radius), and RAVU will upscale an image by 2x in each dimension.
The convolution kernels are extracted from the weights texture based on a key computed per pixel from the local gradients. Let me know if you have further questions.
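To make that concrete, here is a rough GLSL sketch of the per-pixel kernel lookup. This is not the actual RAVU code: the key computation is only a crude stand-in (I'm assuming a RAISR-style classification from local gradients), and the weights texture layout (one row of kernel taps per key) is an assumption for illustration.

```glsl
// Sketch only, not the actual RAVU source. Assumptions: RAISR-style key from a
// local gradient, and a weights texture with one row of kernel taps per key.
const int radius = 2;                       // kernel covers a 2*radius x 2*radius window
const int quant_angle = 8, quant_strength = 4;

int key_for(sampler2D src, ivec2 pos) {
    // Crude central-difference gradient (stand-in for the real classification,
    // which would also quantise gradient strength/coherence over a window).
    float gx = texelFetch(src, pos + ivec2(1, 0), 0).r - texelFetch(src, pos - ivec2(1, 0), 0).r;
    float gy = texelFetch(src, pos + ivec2(0, 1), 0).r - texelFetch(src, pos - ivec2(0, 1), 0).r;
    int angle    = int(floor((atan(gy, gx) + 3.14159265) / 6.2831853 * float(quant_angle))) % quant_angle;
    int strength = clamp(int(length(vec2(gx, gy)) * float(quant_strength)), 0, quant_strength - 1);
    return angle * quant_strength + strength;
}

float predict(sampler2D src, sampler2D weights_tex, ivec2 pos) {
    int key = key_for(src, pos);
    float acc = 0.0;
    for (int dy = -radius; dy < radius; dy++)
        for (int dx = -radius; dx < radius; dx++) {
            int tap = (dy + radius) * 2 * radius + (dx + radius);
            float w = texelFetch(weights_tex, ivec2(tap, key), 0).r;   // trained weight
            acc    += w * texelFetch(src, pos + ivec2(dx, dy), 0).r;   // convolve
        }
    return acc;
}
```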
My idea for using a compute shader for RAVU is basically the same as your compute shader sample.
It sounds like you're essentially doing three big convolutions, and then combining the three intermediate results in the final image - yes? This seems like it involves a bunch of redundant sampling work. It seems like it would be faster if you did the sampling once and generated all three intermediate results at the same time, then wrote them out to the resulting image from a single thread. Basically, what you could do (in principle) is: sample the source region into shared memory once, compute all three intermediate results for each source pixel, and write the whole 2x2 output block from one thread.
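For illustration, here is a minimal compute-shader sketch of that idea. It is not mpv's or RAVU's actual code: the workgroup size, bindings, and the predict0/1/2 helpers are placeholders. The workgroup loads a padded tile of the source into shared memory once, then each thread produces the three missing pixels for its source pixel and stores the whole 2x2 output block itself.

```glsl
#version 450
// Placeholder sketch: load once into shared memory, compute all three
// intermediate results per source pixel, write the 2x2 output block from a
// single thread. predict0/1/2 stand in for the per-position convolutions.
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D src_tex;                // low-res luma
layout(binding = 1, r16f) writeonly uniform image2D out_img;  // 2x-sized output

const int radius = 2, pad = radius;
shared float tile[16 + 2 * pad][16 + 2 * pad];

float predict0(ivec2 p) { return tile[p.y][p.x]; }  // placeholder convolution
float predict1(ivec2 p) { return tile[p.y][p.x]; }  // placeholder convolution
float predict2(ivec2 p) { return tile[p.y][p.x]; }  // placeholder convolution

void main() {
    ivec2 base = ivec2(gl_WorkGroupID.xy) * 16 - pad;         // top-left of padded tile
    ivec2 size = textureSize(src_tex, 0);

    // Cooperative load: the whole workgroup strides over the padded tile once.
    for (int y = int(gl_LocalInvocationID.y); y < 16 + 2 * pad; y += 16)
        for (int x = int(gl_LocalInvocationID.x); x < 16 + 2 * pad; x += 16)
            tile[y][x] = texelFetch(src_tex, clamp(base + ivec2(x, y), ivec2(0), size - 1), 0).r;
    barrier();

    ivec2 local = ivec2(gl_LocalInvocationID.xy) + pad;        // this thread's source pixel
    ivec2 opos  = 2 * ivec2(gl_GlobalInvocationID.xy);         // top-left of its 2x2 block

    imageStore(out_img, opos,               vec4(tile[local.y][local.x]));  // original pixel
    imageStore(out_img, opos + ivec2(1, 0), vec4(predict0(local)));
    imageStore(out_img, opos + ivec2(0, 1), vec4(predict1(local)));
    imageStore(out_img, opos + ivec2(1, 1), vec4(predict2(local)));
}
```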
Right now, the user convolution shaders implicitly insert an […]
It's an interesting idea. With direct access to shared memory, most of the redundant sampling could be avoided. How about this: […]
The reason we still need two passes is that […]
I think a compute shader based two-pass version of RAVU will bring a huge performance boost.
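For reference, the combining step described earlier (folding the three intermediate results and the original pixels into the doubled image) could look roughly like the following fragment shader. This is only a sketch: the texture names and the mapping from output-pixel parity to each intermediate plane are assumptions, not RAVU's actual layout.

```glsl
#version 450
// Sketch of an interleaving/combining pass. Each pixel of the 2x-sized output
// takes either the original sample or one of the three interpolated planes,
// selected by the parity of its coordinates. Names and mapping are assumed.
layout(binding = 0) uniform sampler2D orig_tex;     // original low-res luma
layout(binding = 1) uniform sampler2D interp0_tex;  // first convolution result
layout(binding = 2) uniform sampler2D interp1_tex;  // second convolution result
layout(binding = 3) uniform sampler2D interp2_tex;  // third convolution result
out float color;

void main() {
    ivec2 opos = ivec2(gl_FragCoord.xy);  // position in the doubled image
    ivec2 ipos = opos / 2;                // corresponding low-res position
    ivec2 par  = opos % 2;

    if      (par == ivec2(0, 0)) color = texelFetch(orig_tex,    ipos, 0).r;
    else if (par == ivec2(1, 0)) color = texelFetch(interp0_tex, ipos, 0).r;
    else if (par == ivec2(0, 1)) color = texelFetch(interp1_tex, ipos, 0).r;
    else                         color = texelFetch(interp2_tex, ipos, 0).r;
}
```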
@haasn Actually, if we don't worry about the size of shared memory, we could even share the calculation of gradients. Could you share more information about the size limit of shared memory? How large is it typically, maybe on different cards? Is it a hard limit or a soft limit? What happens if the size is exceeded, a huge performance drop? By how much?
I think you can already do that by just inserting the appropriate GLSL into the top of the pass.
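Just to illustrate what such prepended GLSL might look like, here is a hypothetical snippet (the 16x16 workgroup, apron size, and names are all invented) that computes gradients once per source pixel into shared memory so every thread in the workgroup can reuse them:

```glsl
// Hypothetical helper that could be prepended to a compute pass. Assumes a
// 16x16 workgroup and a one-pixel apron around the tile; not actual mpv/RAVU code.
shared vec2 grad[16 + 2][16 + 2];

void fill_gradients(sampler2D src, ivec2 base) {
    // Each thread fills a few entries; central-difference gradients.
    for (uint y = gl_LocalInvocationID.y; y < uint(16 + 2); y += 16u)
        for (uint x = gl_LocalInvocationID.x; x < uint(16 + 2); x += 16u) {
            ivec2 p = base + ivec2(x, y);
            grad[y][x] = vec2(
                texelFetch(src, p + ivec2(1, 0), 0).r - texelFetch(src, p - ivec2(1, 0), 0).r,
                texelFetch(src, p + ivec2(0, 1), 0).r - texelFetch(src, p - ivec2(0, 1), 0).r);
        }
    barrier();  // make the shared gradients visible to the whole workgroup
}
```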
Typically 32-64 kB. On my nvidia GPU it's 49 kB. You can see a list here: http://opengl.gpuinfo.org/gl_stats_caps_single.php?listreportsbycap=GL_MAX_COMPUTE_SHARED_MEMORY_SIZE
Compile failure: […]
See https://www.khronos.org/opengl/wiki/Core_Language_(GLSL)#Extensions, these extension directives are required to come immediately after the `#version` directive.
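A small illustration of the placement rule (the extension name here is just an example):

```glsl
#version 450
// Extension directives must come before any other non-preprocessor tokens,
// which in practice means right after #version:
#extension GL_ARB_gpu_shader5 : enable

layout(local_size_x = 8, local_size_y = 8) in;
void main() { }
```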
Seems to be enough for RAVU, but good to have it in mind.
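As a rough sanity check with made-up numbers: a 32x32 workgroup that keeps a source tile plus a 3-pixel border on each side in shared memory needs (32 + 6)^2 = 1444 floats, i.e. 5776 bytes, far below the ~49 kB limit quoted above; keeping per-pixel gradients alongside it would still fit comfortably.

```glsl
// Illustration only: shared storage for a 32x32 tile with a 3-pixel apron.
shared float tile[32 + 6][32 + 6];  // 1444 floats = 5776 bytes, well under ~49 kB
```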
Closing, since the compute shader port was implemented and optimized to a degree. @haasn feel free to ignore the […]