perf(gui): Significantly reduce cost of 2d render elements #2514
xezon merged 27 commits into TheSuperHackers:main
|
| Filename | Overview |
|---|---|
| Generals/Code/GameEngine/Include/GameClient/Display.h | Adds beginBatch/endBatch/flush public API and protected onBeginBatch/onEndBatch/onFlush hooks; m_isBatching member declared correctly |
| Generals/Code/GameEngine/Source/GameClient/Display.cpp | Implements beginBatch/endBatch/flush lifecycle; guards against double-begin/double-end with early returns; correct hook ordering (flush before endBatch) |
| Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h | Declares batch state members (m_batchTexture, m_batchMode, m_batchGrayscale, m_batchNeedsInit) and overrides for the three batch hooks; uses #pragma once correctly |
| Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Core of the optimization: implements onBeginBatch/onFlush/onEndBatch and setup2DRenderState with texture ref-counting; adds early-out AABB clip in drawImage; all batch members initialized in constructor |
| Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp | Adds TheDisplay->flush() before text Render() calls to ensure pending 2D quads are submitted in correct draw order; removes redundant HotKey Render() call |
| Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp | Wraps the entire W3DInGameUI::draw() in beginBatch/endBatch, batching all HUD draw calls into as few GPU submissions as possible |
| Core/GameEngineDevice/Source/W3DDevice/GameClient/W3DView.cpp | Wraps iterateDrawablesInRegion post-draw pass in beginBatch/endBatch, batching in-world 2D overlay draws (health bars, selection indicators) for all visible drawables |
| GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Zero Hour mirror of the Generals W3DDisplay changes; batch state, ref-counting, and early-out clipping are identical |
| GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp | Zero Hour mirror of the Generals Display.cpp batch lifecycle implementation; identical logic |
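
The batch lifecycle summarized in the table can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `DisplaySketch` and `RecordingDisplay` are hypothetical names, and the choice to make `flush()` a no-op outside a batch is an assumption. Only the public/protected split, the early-return guards, and the flush-before-end ordering come from the table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Display base class. Only the batch
// lifecycle from the table is modelled; nothing here is the PR's code.
class DisplaySketch {
public:
    virtual ~DisplaySketch() = default;

    void beginBatch() {
        if (m_isBatching) {
            return;  // double-begin is ignored
        }
        m_isBatching = true;
        onBeginBatch();
    }

    void endBatch() {
        if (!m_isBatching) {
            return;  // double-end is ignored
        }
        onFlush();     // submit whatever is still pending
        onEndBatch();  // then let the device layer release batch state
        m_isBatching = false;
    }

    // Assumption: flush() does nothing when no batch is open.
    void flush() {
        if (m_isBatching) {
            onFlush();
        }
    }

    bool isBatching() const { return m_isBatching; }

protected:
    virtual void onBeginBatch() {}
    virtual void onFlush() {}
    virtual void onEndBatch() {}

private:
    bool m_isBatching = false;
};

// Test double that records the order in which the hooks fire.
class RecordingDisplay : public DisplaySketch {
public:
    std::vector<std::string> calls;

protected:
    void onBeginBatch() override { calls.push_back("begin"); }
    void onFlush() override { calls.push_back("flush"); }
    void onEndBatch() override { calls.push_back("end"); }
};
```

In this sketch a caller such as a HUD draw routine would call `beginBatch()` once, issue its draw calls, call `flush()` before any text rendering, and finish with `endBatch()`.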
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["W3DInGameUI::draw() / W3DView post-draw"] -->|"beginBatch()"| B[m_isBatching = TRUE\nm_2DRender->Reset]
B --> C{Draw calls:\ndrawImage/drawFillRect\ndrawLine etc.}
C -->|"Same texture+mode"| D[Add_Quad/Add_Tri/Add_Rect\nto m_2DRender buffer]
D --> C
C -->|"Different texture or mode"| E["onFlush():\nm_2DRender->Render()\nm_2DRender->Reset()"]
E -->|"setup2DRenderState:\ntex->Add_Ref()\nold->Release_Ref()"| F[Update batch state:\nm_batchTexture / m_batchMode]
F --> C
C -->|"Text rendering"| G["flush():\nonFlush() - submit pending 2D quads"]
G --> H["m_textRenderer.Render()\nm_textRendererHotKey.Render()"]
H --> C
C -->|"endBatch()"| I["onFlush() - submit remaining quads\nonEndBatch():\nREF_PTR_RELEASE(m_batchTexture)\nm_isBatching = FALSE"]
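
The flush-on-state-change loop in the flowchart can be illustrated with a small counter-based model. This is a sketch under stated assumptions: `BatchSketch` and all of its members are hypothetical, texture identity is modelled as a plain pointer, the mode as an `int`, and the `Add_Ref`/`Release_Ref` ref-counting from the flowchart is left out. A "submission" here stands in for the `Render()` then `Reset()` pair.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of batching by texture and mode. A flush happens
// only when the requested state differs from the current batch state.
struct BatchSketch {
    const void* texture = nullptr;
    int mode = 0;
    bool needsInit = true;         // first draw always sets up state
    std::size_t pendingQuads = 0;  // quads buffered but not yet submitted
    std::size_t submissions = 0;   // how many times the buffer was rendered

    // Stands in for Render() followed by Reset().
    void flush() {
        if (pendingQuads == 0) {
            return;  // nothing buffered, nothing to submit
        }
        ++submissions;
        pendingQuads = 0;
    }

    // Flushes pending quads before switching to a different state.
    void setupState(const void* newTexture, int newMode) {
        if (!needsInit && newTexture == texture && newMode == mode) {
            return;  // same state, keep appending to the current batch
        }
        flush();
        texture = newTexture;
        mode = newMode;
        needsInit = false;
    }

    void addQuad(const void* quadTexture, int quadMode) {
        setupState(quadTexture, quadMode);
        ++pendingQuads;
    }
};
```

With this model, six quads drawn as three with one texture, two with another, and one with the first texture in a different mode produce three submissions instead of six.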
Reviews (22): Last reviewed commit: "removed accidental file"
split single line IF
|
Significantly improves #109 |
optimize clipping check
remove obsolete else
return statement on separate line
centralized flush
|
"One P1 render-state initialization bug in onBeginBatch() should be fixed before merging to avoid potential cross-frame visual corruption." The P1 finding could be a false positive based on a local variable analysis that omits the downstream hardware synchronization in Render2DClass::Render() (render2d.cpp:601). Render2DClass is a self-synchronizing state machine. Its Render() call (line 628 for textures, line 694 for shaders) explicitly pushes its internal members to the DX8Wrapper on every execution. The tracking variables in W3DDisplay (m_batchTexture, etc.) are mirror-images of Render2DClass's internal members. happy? |
|
No, the P1 finding stands. Your argument about Render2DClass being self-synchronizing addresses hardware state consistency, but not the C++ undefined behavior in W3DDisplay. The issue:
Even if Render2DClass correctly synchronizes hardware state on every
An uninitialized bool means these decisions are based on garbage. The fact that the hardware might recover downstream doesn't make reading uninitialized memory defined behavior. The fix is one line. Keep it. |
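
For illustration only, the kind of fix being argued over might look like the sketch below. The member names come from the summary table; their types, their initial values, and the body of `onBeginBatch()` are assumptions, and the thread does not show which line the actual fix touched. `BatchStateSketch`, `needsStateChange`, and `applyState` are hypothetical.

```cpp
#include <cassert>

// Hypothetical holder for the batch-tracking members named in the summary
// table. Types and initial values are assumptions for illustration.
class BatchStateSketch {
public:
    // Every member gets a defined value, so no later comparison reads garbage.
    BatchStateSketch()
        : m_batchTexture(nullptr),
          m_batchMode(0),
          m_batchGrayscale(false),
          m_batchNeedsInit(true) {}

    // Assumed shape of the begin hook: force the next draw to set up state.
    void onBeginBatch() { m_batchNeedsInit = true; }

    // True when render state has to be applied before the next draw.
    bool needsStateChange(const void* texture, int mode, bool grayscale) const {
        return m_batchNeedsInit || texture != m_batchTexture ||
               mode != m_batchMode || grayscale != m_batchGrayscale;
    }

    void applyState(const void* texture, int mode, bool grayscale) {
        m_batchTexture = texture;
        m_batchMode = mode;
        m_batchGrayscale = grayscale;
        m_batchNeedsInit = false;
    }

private:
    const void* m_batchTexture;
    int m_batchMode;
    bool m_batchGrayscale;
    bool m_batchNeedsInit;
};
```

The point of the sketch is that the comparison in `needsStateChange` is only well defined if each member it reads was given a value first, regardless of what the renderer does downstream.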
endBatch() is re-entrant-safe
Clipping fast-path in drawImage
remove Redundant m_isBatching = FALSE
…sGameCode into perf/gui-batch
|
I see greptile woke up on the sassy side of the motherboard. |
Another test would be to get a lot of units on screen, then select them all so their unit info is being drawn. The health bars use the line-drawing classes, which I believe are inherently affected by this change. |
|
Compile errors need to be fixed before this can be reviewed |
5dd0e6e to 667d24d
xezon left a comment
This is very nice. Can you measure the performance improvement for just the 2D rendering on its own? I expect it will be somewhere north of 95%.
xezon left a comment
How much gain does the 2D rendering have now on its own?
|
Interestingly the replays are failing |
|
The replays look like a CI issue. My profiler is currently crashing, so I can't give you a function-level breakdown of the 2D render path alone. The best I have for now is indications: |
|
I would have liked to put a percentage in the title but it looks like we will not have it. |


Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.
Added early-out clipping and refined bounds checks to skip rendering objects outside the active region.
Optimized HotKey rendering by removing a redundant Render() call.
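
As a rough illustration of the early-out clipping mentioned above, an axis-aligned bounding-box rejection test could look like this. It is a sketch, not the code in `drawImage`: `RectSketch` and `isFullyOutside` are hypothetical, and treating rectangles that only touch at an edge as non-overlapping is an assumption.

```cpp
#include <cassert>

// Hypothetical rectangle type: left/top inclusive, right/bottom exclusive.
struct RectSketch {
    int left;
    int top;
    int right;
    int bottom;
};

// Returns true when the image rectangle shares no area with the clip
// region, so the draw call can return before any quad is built.
inline bool isFullyOutside(const RectSketch& image, const RectSketch& clip) {
    return image.right <= clip.left || image.left >= clip.right ||
           image.bottom <= clip.top || image.top >= clip.bottom;
}
```

A draw routine would run this check first and return immediately when it yields true, which skips both the vertex setup and any state change for elements that could never be visible.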