Commit 907ab0b

committed
ssbo
1 parent 2b2f414 commit 907ab0b

18 files changed

+641
-47
lines changed

opencl/introduction.md

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -28,6 +28,10 @@ OpenCL, like any other language has versions. As of 2013 the latest version is O
2828
- <http://stackoverflow.com/questions/4005935/mix-opencl-with-opengl>
2929
- <http://stackoverflow.com/questions/7907510/opengl-vs-opencl-which-to-choose-and-why>
3030
- <http://stackoverflow.com/questions/8824269/gl-cl-interoperability-shared-texture>
31-
- <https://github.com/9prady9/CLGLInterop>
31+
- <https://github.com/9prady9/CLGLInterop> Works! Started minifying example with: <https://github.com/cirosantilli/CLGLInterop/tree/minify>
32+
- <https://github.com/nvpro-samples/gl_cl_interop_pingpong_st> Build failed with: <https://github.com/nvpro-samples/gl_cl_interop_pingpong_st/issues/1> likely only tested on Windows.
33+
- <https://github.com/halcy/simpleflow> VS build, fluid simulation, preview: <https://www.youtube.com/watch?v=KD2UqBCqfjA>
34+
- <https://github.com/Twinklebear/OpenCL-OpenGL-Interop> VS build
35+
- <http://stackoverflow.com/questions/33575715/opencl-opengl-interop-how-to-fill-a-climagegl>
3236

3337
Also see compute shaders for OpenGL 4.X, they seem to integrate better.

opengl/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,7 @@
3838
1. Compute shader
3939
1. [compute-shader.md](compute-shader.md)
4040
1. [compute_shader.c](glfw_compute_shader.c)
41+
1. [compute_shader_.c](glfw_compute_shader.c)
4142
1. GLUT
4243
1. [glutBitmapCharacter](bitmap_character.c)
4344
1. [Triangle rotate](triangle_rotate.c)

opengl/compute-shader.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@
22

33
Vs OpenCL: <http://wili.cc/blog/opengl-cs.html>
44

5+
Vs fragment shader: <http://computergraphics.stackexchange.com/questions/54/when-is-a-compute-shader-more-efficient-than-a-pixel-shader-for-image-filterinig>
6+
57
> But why did Khronos introduce compute shaders in OpenGL when they already had OpenCL and its OpenGL interoperability API? Well, OpenCL (and CUDA) are aimed for heavyweight GPGPU projects and offer more features. Also, OpenCL can run on many different types of hardware (apart from GPUs), which makes the API thick and complicated compared to light compute shaders. Finally, the explicit synchronization between OpenGL and OpenCL/CUDA is troublesome to do without crudely blocking (some of the required extensions are not even supported yet). With compute shaders, however, OpenGL is aware of all the dependencies and can schedule things smarter. This aspect of overhead might, in the end, be the most significant benefit for graphics algorithms which often execute for less than a millisecond.
68
79
Examples:
@@ -12,7 +14,30 @@ Examples:
1214

1315
Most interesting files are `ParticleSystem.cpp` and `cs.glsl`.
1416

17+
- <https://community.arm.com/groups/arm-mali-graphics/blog/2014/04/17/get-started-with-compute-shaders>, runnable from their SDK
18+
1519
Applications:
1620

1721
- ray tracing
1822
- ignore objects too far away
23+
24+
## Work group
25+
26+
TODO: what is the advantage of work groups?
27+
28+
Ideally, we would have a single work group, but that hits hardware design limitations (memory locality): <http://stackoverflow.com/questions/39380986/opengl-is-there-a-benefit-to-using-multiple-global-work-groups-for-compute-shad>
29+
30+
- http://gamedev.stackexchange.com/questions/66198/optimal-number-of-work-groups-for-compute-shaders
31+
- https://www.cg.tuwien.ac.at/courses/Realtime/repetitorium/rtr_rep_2014_ComputeShader.pdf
32+
33+
More work groups do not necessarily mean faster execution. TODO why? CL exposes `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`, but
34+
35+
### Shared memory
36+
37+
Shared memory (SM).
38+
39+
Allocated per work group, and faster to access from within that group. This is what distinguishes one work group from another.
40+
41+
General algorithm: copy global memory to shared, and then process there.
42+
43+
Only useful if the given memory is accessed several times.
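
The "copy global memory to shared, then process there" pattern can be sketched on the CPU in plain C. This is only an illustration under assumptions: `GROUP_SIZE`, `run_work_group` and `neighbor_sum` are hypothetical names, the local array `tile` stands in for a GLSL `shared` array, and the loops stand in for invocations that would run in parallel on a GPU. Each output is the sum of an element and its two neighbors, so every tile cell is read up to three times after being loaded from global memory only once per group.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4 /* Hypothetical work group size (local_size_x). */

/* CPU stand-in for one work group. */
static void run_work_group(const float *global_in, float *global_out,
                           size_t n, size_t group_id) {
    /* Stands in for a `shared` array: the tile plus one halo cell per side.
     * tile[i] holds global element (base + i - 1). */
    float tile[GROUP_SIZE + 2];
    size_t base = group_id * GROUP_SIZE;
    size_t i;

    /* Step 1: copy global -> shared. Out-of-range cells are set to 0. */
    for (i = 0; i < GROUP_SIZE + 2; ++i) {
        size_t g = base + i; /* One past the global index, to stay unsigned. */
        tile[i] = (g >= 1 && g - 1 < n) ? global_in[g - 1] : 0.0f;
    }

    /* A real compute shader would need a barrier() here, so that the whole
     * tile is loaded before any invocation starts reading it. */

    /* Step 2: process from shared memory only. */
    for (i = 0; i < GROUP_SIZE; ++i) {
        size_t g = base + i;
        if (g < n)
            global_out[g] = tile[i] + tile[i + 1] + tile[i + 2];
    }
}

/* Runs enough work groups to cover all n elements. */
void neighbor_sum(const float *in, float *out, size_t n) {
    size_t num_groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    size_t group_id;
    assert(in != NULL && out != NULL);
    for (group_id = 0; group_id < num_groups; ++group_id)
        run_work_group(in, out, n, group_id);
}
```

If each element were read only once, the copy in step 1 would be pure overhead, which matches the note above that shared memory only helps when data is accessed several times.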

opengl/glfw_color_array.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,8 @@ Color interpolation on the fragment shader is automatic.
88

99
#include "common.h"
1010

11-
static const GLuint WIDTH = 500;
12-
static const GLuint HEIGHT = 500;
11+
static const GLuint WIDTH = 512;
12+
static const GLuint HEIGHT = 512;
1313
/* fragColor is passed on to the fragment shader. */
1414
static const GLchar *vertex_shader_source =
1515
"#version 330 core\n"

opengl/glfw_compute_shader.c

Lines changed: 24 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,17 @@
11
/*
22
Compute shader hello world.
33
4+
Does a simple computation, and writes it directly to the
5+
texture seen by the fragment shader.
6+
47
This could be done easily on a fragment shader,
58
so this is just a useless sanity check example.
69
710
The main advantage of compute shaders (which we are not doing here),
8-
shader can do is keep state data on the GPU between draw calls.
11+
is that they can keep state data on the GPU between draw calls.
12+
13+
This is basically the upper limit speed of compute to texture operations,
14+
since we are only doing a very simple operation on the shader.
915
1016
TODO understand:
1117
@@ -55,10 +61,10 @@ static const char *compute_shader_source =
5561
"layout (local_size_x = 1, local_size_y = 1) in;\n"
5662
"layout (rgba32f, binding = 0) uniform image2D img_output;\n"
5763
"void main () {\n"
58-
" ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);\n"
64+
" ivec2 gid = ivec2(gl_GlobalInvocationID.xy);\n"
5965
" ivec2 dims = imageSize(img_output);\n"
60-
" vec4 pixel = vec4(pixel_coords.x / float(dims.x), pixel_coords.y / float(dims.y), 1.0, 1.0);\n"
61-
" imageStore(img_output, pixel_coords, pixel);\n"
66+
" vec4 pixel = vec4(gid.x / float(dims.x), gid.y / float(dims.y), 1.0, 1.0);\n"
67+
" imageStore(img_output, gid, pixel);\n"
6268
"}\n";
6369

6470
int main(void) {
@@ -77,8 +83,8 @@ int main(void) {
7783
vao
7884
;
7985
unsigned int
80-
width = 512,
81-
height = 512
86+
width = WIDTH,
87+
height = HEIGHT
8288
;
8389

8490
/* Window. */
@@ -126,6 +132,8 @@ int main(void) {
126132
glBindTexture(GL_TEXTURE_2D, texture);
127133
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
128134
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
135+
/* Same internal format as compute shader input.
136+
* data=NULL to just allocate the memory but not set it to anything. */
129137
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
130138
/* Bind to image unit, to allow writing to it from the compute shader. */
131139
glBindImageTexture(0, texture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
@@ -136,20 +144,22 @@ int main(void) {
136144
glDispatchCompute((GLuint)width, (GLuint)height, 1);
137145
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
138146

139-
/* Draw. */
147+
/* Global state. */
140148
glViewport(0, 0, width, height);
141149
glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
142-
glClear(GL_COLOR_BUFFER_BIT);
143-
glUseProgram(program);
144-
glUniform1i(textureSampler_location, 0);
145-
glBindVertexArray(vao);
146-
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
147-
glBindVertexArray(0);
148-
glfwSwapBuffers(window);
149150

150151
/* Main loop. */
152+
common_fps_init();
151153
while (!glfwWindowShouldClose(window)) {
154+
glClear(GL_COLOR_BUFFER_BIT);
155+
glUseProgram(program);
156+
glUniform1i(textureSampler_location, 0);
157+
glBindVertexArray(vao);
158+
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
159+
glBindVertexArray(0);
160+
glfwSwapBuffers(window);
152161
glfwPollEvents();
162+
common_fps_print();
153163
}
154164

155165
/* Cleanup. */
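
The example dispatches `width` x `height` work groups of local size 1 x 1, so each invocation maps to exactly one pixel. As a minimal sketch of how the dispatch counts would change with a larger local size, the helpers below compute the rounded-up group count and the per-axis relation `gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID`. The function names `num_work_groups` and `global_invocation_id` are hypothetical and not part of this commit or of OpenGL.

```c
#include <assert.h>

/* Number of work groups needed to cover `size` invocations along one axis,
 * rounding up when `size` is not a multiple of `local_size`. */
unsigned int num_work_groups(unsigned int size, unsigned int local_size) {
    assert(local_size > 0);
    return (size + local_size - 1) / local_size;
}

/* One component of gl_GlobalInvocationID, computed from the work group
 * index, the local size, and the index inside the group. */
unsigned int global_invocation_id(unsigned int work_group_id,
                                  unsigned int local_size,
                                  unsigned int local_id) {
    assert(local_id < local_size);
    return work_group_id * local_size + local_id;
}
```

With a hypothetical local size of 16 x 16, the call would become `glDispatchCompute(num_work_groups(width, 16), num_work_groups(height, 16), 1)`. When the image size is not a multiple of the local size, the shader would also need to skip invocations whose `gid` falls outside `dims`.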
