Permalink
Fetching contributors…
Cannot retrieve contributors at this time
120 lines (91 sloc) 5.71 KB
Dynamic ambient occlusion, based on GPU Gems 2 chapter 14, online on
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter14.html
(PDF version on
http://http.download.nvidia.com/developer/GPU_Gems_2/GPU_Gems2_ch14.pdf).
Idea:
- Crude assumption: model is actually a number of elements, each element
corresponding to a vertex of orig model, with normal and area correctly
calculated.
See menu "Program -> Show Elements".
- Encode elements info into the texture.
- On each vertex do (in vertex shader): calculate AO using elements info.
Practice:
- First of all, this AO will have too much shadow (multiple shadows
on the same item). So calculate 2nd time, multiplying by 1st result.
The key is that "if element is already in shadow, then it also doesn't
contribute anything to the other shadows".
2nd pass will make the result too much light (because we multiplied
by results of 1st pass, which were too dark...). Theoretically,
more passes are needed. In practice: the result is between 1st
and 2nd passes, so just at the end of 2nd pass take the results
and average.
- Note that you cannot actually do this in vertex shader: with arch 3,
texture2D calls in vertex shader are not allowed or
very very horribly-terribly slow. And you want to read texture a lot
to get all element infos.
(I tried to do the 1st testing implementation
of one pass on vertex shader, it simply doesn't work when tested on older
NVidia and works with 1 frame / couple seconds (even with 1 element count!!!)
on fglrx.)
- So do this on fragment shader. Also, render such that 1 screen pixel =
one element, and simply write result back to buffer. Using the texture,
1st shader passes results to the 2nd, and 2nd passes it to the CPU.
For communication between shaders, texture data doesn't have
to go through CPU --- glCopyTexImage2D is perfectly designed for such tricks.
For actual rendering, I grab the colors to the CPU --- while it would be possible
to avoid this, I would then run into trouble trying to access textures
from vertex shader, which is practically prohibited for shaders gen <= 3,
see above notes.
- Shader optimization notes:
- Remember to expand constant variables for the shader, instead
of making them uniforms. This is important, not only for speed,
also for correctness: older GPUs need to unroll "for" loop
(to elements_count) at compilation.
(I didn't expand all possible vars, because fglrx doesn't tolerate
floats with fractional part in GLSL code; so to be safe for fglrx too,
only ints can be expanded such.)
- Above unrolling also means that this may just get too difficult
for older GPUs for larger models: too many instructions after unrolling.
E.g. my NVidia GPU (kocury) "GeForce FX 5200/AGP/SSE2/3DNOW!" handles
smaller models (simplico), but fails with
"error: too many instructions" on larger ones (peach, chinchilla).
More tests show that the border is somewhere between
24 (Ok) and 42 (not Ok) verts (for this GPU, of course;
much newer Radeon on MacBookPro can handle at least 20k (although
naive implementation gets really slow then).
- Note that GPU Gems text gives two (different) equations for calculating
element-to-element shadowing, both of them wrong IMO...
Yes, I did implemented and tried them. It's a matter of changing
"color -= ..." line.
Math eq on page 3 is equivalent to
color -= sqrt(sqr_distance) * cos_emitter_angle *
max(1.0, 4.0 * cos_current_angle) /
sqrt(element_area / pi + sqr_distance);
Code on page 4 is (possibly --- not sure, as they put "1 - " part
already there, so I don't see a reasonable way to use it?) equivalent to
color -= inversesqrt((element_area/pi)/sqr_distance + 1) *
cos_emitter_angle *
min(1.0, 4.0 * cos_current_angle);
If you grok reasoning behind their equations, and/or if there's an
error in my implementations above, mail me. My implementation follows
normal approximation for solid angle:
element_area * cos_emitter_angle / sqr_distance
We also multiply by cos_current_angle
(as light coming from an angle is scaled down, so blocker for
which cos_current_angle is small blocks less light).
Advantages:
- Since all work is done on the fly, the scene may be absolutely dynamic. (On changes, list of elements must be updated, but this shouldn't be so bad with a few optimizations?). Any deformations, transformations, changes --- all is allowed, nothing needs to be precomputed.
For example, see:
data/chinchilla_awakens (timesensor animation) and
data/dynamic_world_ifs (interactive world changing, press left mouse down, use keys werxdf, uiojkl)
- Can make bent normals, indirect lighting almost for free.
- Although the work is done on shaders, resulting colors are grabbed by CPU, and may be fed to fixed-function gl pipeline at the end. This is important, makes it much easier to plug this technique into existing renderer.
- Can work with practically any 3d model.
For example, check out castle/data/levels/fountain/fountain_final.wrl
Disadvantages:
- Like PRT and shadow fields, this requires a large number of vertexes (elements) to look good, and still number vertexes is also the key thing that determines speed. (On the up side, it's not difficult to make lower LOD versions of elements, since they are so simple? Hierarchy idea will also help a lot.)
On dynamic models, when too few vertexes are available, it can be seen as moving shadows are "sticky" to vertexes.
- Requires a good GPU. Fragment shader must do a *lot* of work, a lot of texture lookups. Simply not possible on older GPUs.
TODO: our current method of generating elements works only for nodes
with explicit vertexes, so will not work for VRML primitives (sphere,
cone and and such).