Material system: Implement GPU frustum culling #1137
base: master
I've fixed the culling now. Just need to deal with the memory issue.
Also made it work with r_lockPVS, so it can be tested now by just doing /r_lockPVS 1 and looking around.
How do I enable GPU culling? I tried the branch, but a look in Orbit shows me that it still uses CPU culling on my end. I enabled
Do you have /r_arb_bindless_textures 1? I've disabled it by default because of the perf problems on AMD that we observed earlier, but it's required for the material system to work.
Also, it might crash or something if there are surfaces with more than 4 stages; this can be patched by changing MAX_SURFACE_COMMANDS (both in
If I do this it sets the cvar:
But I see no difference after both this and
Sorry, got the wrong cvar name.
Thanks. I now get some flickering and lines like this (atcshd):
And no … So I guess I'm getting this branch running! 👍️
I got the same thing... I think there are somehow concurrent operations going on where the
Ah, that's just some debug output :)
Yup! I'm not sure how performance is going to be right now, as the rendering shaders have to wait for the compute ones to complete, but it should be better either way once I make it double- or triple-buffered.
Fixed most of the flickering issues now and made it double-buffered. Some of the flickering issues still remain, but it's probably an off-by-one error somewhere. Also, it now crashes on map change again.
Made this properly double-buffered now: R_RenderView() will queue views for surface culling, then RB_RenderPostProcess() will dispatch the compute shaders for the queued views. I haven't tested that it works with multiple views yet, though; I need to clean up and fix some things first.
I'm not getting any flickering now. Also fixed a crash and, hopefully, the incorrect memory accesses.
Excluding the missing fog, I see no visual glitches anymore in ATCSHD, and I don't experience GPU page faults anymore. This is getting into good shape! 😃️ Framerate is around 370 fps on the medium preset @ 4K with a Radeon PRO W7600 on Mesa 23.2.1 radeonsi. I usually get around 530 fps on master.
I get 400 fps on Mesa 24.0.7 (still 530 fps on master).
Hmm, I wonder if it's still because of bindless textures... Well, it's also still culling fewer surfaces right now, because the far plane is ignored (we also have it as (0, 0, 0)) and because there's no occlusion culling here yet, so if you're looking e.g. towards one of the sides on atcs it will render all of the surfaces behind walls etc.
Yes, I test with the default spectator scene, so in ATCSHD that means the whole outdoor area and the whole alien base are in line of sight.
Yea... I'll make a separate PR later for occlusion culling; that should fix that part :)
This works slightly faster for me now after removing some unnecessary branching. |
I've also made this work with multiple different views (i.e. portals) and moved the defines to GLHeaders. Surface commands will now use the minimal array size for the maximum number of stages used on any compatible surface on the map (padded out to a multiple of 4 for alignment). This required making the
Frustum culling can now be toggled with
Builds on #1105.
Implement frustum culling in compute shaders for the material system.
The culling works in 3 steps (performed in 3 different shaders):
1. `clearSurfaces_cp.glsl`: all the atomic command counters for the next frame are cleared.
2. `cull_cp.glsl`: every surface's bounding sphere is checked against the 5 frustum planes (the far plane is skipped because we always have it set to { 0, 0, 0, 0 } for some reason, and we set zFar to encompass the whole map anyway), and the corresponding `enabled` field in the surface commands buffer is set for the next frame.
3. `processSurfaces_cp.glsl`: goes over batches of 64 surfaces for all of the materials. If a material has a number of surfaces that is not an integer multiple of 64, it is padded out to one with fake surface commands (all of their fields are always 0). Each material has a corresponding atomic counter in an array. For each enabled surface command, the material's atomic counter is incremented by 1, and the returned value plus a static per-material offset gives the offset in the indirect draw buffer where that command's indirect command is written.

The culling and surface processing shaders both work in groups of 64 because compute threads are launched in groups of 32 (warps on Nvidia) or 64 (wavefronts on AMD). The threads that go past the last surface just return. Additionally, surface commands have to be processed in batches of 64 (surface batches) for this reason and because atomic counter arrays can only be indexed with a dynamically uniform integral expression: data read from a UBO using the global workgroup ID is dynamically uniform, while per-thread data wouldn't be.
Surface culling also requires the full set of surface commands for every surface: a surface command here corresponds to a stage in the drawSurf shader. The additional ones (set to id: 0) are the "fake" surface commands, which are never actually used, but they have to be there because indirection is not possible here: buffer writes have to happen in dynamically uniform control flow.
All of this is double-buffered (MAX_FRAMES == 2) and holds information for MAX_VIEWS * MAX_FRAMES views.
Graph of how this system works:
![image](https://private-user-images.githubusercontent.com/10687142/331795103-7ab0af01-01e3-428e-9477-4b99d5090a0a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzNjA2NjcsIm5iZiI6MTcxODM2MDM2NywicGF0aCI6Ii8xMDY4NzE0Mi8zMzE3OTUxMDMtN2FiMGFmMDEtMDFlMy00MjhlLTk0NzctNGI5OWQ1MDkwYTBhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjE0VDEwMTkyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ4NjM1MjEwOWJiZDZmNzEzODJiY2I2YjZlYzZkMTdiZGIzMzM5YjQ2NjVhN2MzZGVhYjdiYzM3NTcxYzVjZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0._sVBEEcBb7U2CFJ_zGyb419JeEWVywLAKsgJdOXi8Nk)