Skip to content

perf(gui): Significantly reduce cost of 2d render elements#2514

Merged
xezon merged 27 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch
Apr 19, 2026
Merged

perf(gui): Significantly reduce cost of 2d render elements#2514
xezon merged 27 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch

Conversation

@githubawn
Copy link
Copy Markdown

@githubawn githubawn commented Mar 31, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS
Main Machine Selecting a thousand units in skirmish 45 FPS 56 FPS

Added early-out clipping and refined bounds checks to skip rendering objects outside the active region.

Optimized HotKey rendering by removing a redundant Render() call.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR implements a 2D render batching system that significantly reduces GPU draw calls by accumulating geometry sharing the same texture/mode into a single Render2DClass flush, rather than submitting one draw call per element. A new beginBatch/endBatch/flush API is added to Display, with W3D-specific implementations managing texture reference counting and render state changes. Additional optimizations include early-out bounds clipping in drawImage and removal of a redundant Render() call for HotKey text rendering. The perf numbers in the description are compelling (477→918 FPS on the skirmish screen).

Confidence Score: 5/5

Safe to merge — no correctness, data, or security issues found; all batch member lifetimes and texture ref-counts are balanced

Texture reference counting through the batch lifecycle is correct for both RAW_TEXTURE and asset-manager-owned textures. The early-out AABB bounds check in drawImage is correct for screen-space coordinates. The flush() interlock before text rendering preserves draw order. No P0 or P1 findings; prior review concerns about m_batchNeedsInit initialization and Release_Ref ordering have both been resolved.

No files require special attention

Important Files Changed

Filename Overview
Generals/Code/GameEngine/Include/GameClient/Display.h Adds beginBatch/endBatch/flush public API and protected onBeginBatch/onEndBatch/onFlush hooks; m_isBatching member declared correctly
Generals/Code/GameEngine/Source/GameClient/Display.cpp Implements beginBatch/endBatch/flush lifecycle; guards against double-begin/double-end with early returns; correct hook ordering (flush before endBatch)
Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Declares batch state members (m_batchTexture, m_batchMode, m_batchGrayscale, m_batchNeedsInit) and overrides for the three batch hooks; uses #pragma once correctly
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Core of the optimization: implements onBeginBatch/onFlush/onEndBatch and setup2DRenderState with texture ref-counting; adds early-out AABB clip in drawImage; all batch members initialized in constructor
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Adds TheDisplay->flush() before text Render() calls to ensure pending 2D quads are submitted in correct draw order; removes redundant HotKey Render() call
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp Wraps the entire W3DInGameUI::draw() in beginBatch/endBatch, batching all HUD draw calls into as few GPU submissions as possible
Core/GameEngineDevice/Source/W3DDevice/GameClient/W3DView.cpp Wraps iterateDrawablesInRegion post-draw pass in beginBatch/endBatch, batching in-world 2D overlay draws (health bars, selection indicators) for all visible drawables
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Zero Hour mirror of the Generals W3DDisplay changes; batch state, ref-counting, and early-out clipping are identical
GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Zero Hour mirror of the Generals Display.cpp batch lifecycle implementation; identical logic

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["W3DInGameUI::draw() / W3DView post-draw"] -->|"beginBatch()"| B[m_isBatching = TRUE\nm_2DRender->Reset]
    B --> C{Draw calls:\ndrawImage/drawFillRect\ndrawLine etc.}
    C -->|"Same texture+mode"| D[Add_Quad/Add_Tri/Add_Rect\nto m_2DRender buffer]
    D --> C
    C -->|"Different texture or mode"| E["onFlush():\nm_2DRender->Render()\nm_2DRender->Reset()"]
    E -->|"setup2DRenderState:\ntex->Add_Ref()\nold->Release_Ref()"| F[Update batch state:\nm_batchTexture / m_batchMode]
    F --> C
    C -->|"Text rendering"| G["flush():\nonFlush() - submit pending 2D quads"]
    G --> H["m_textRenderer.Render()\nm_textRendererHotKey.Render()"]
    H --> C
    C -->|"endBatch()"| I["onFlush() - submit remaining quads\nonEndBatch():\nREF_PTR_RELEASE(m_batchTexture)\nm_isBatching = FALSE"]
Loading

Reviews (22): Last reviewed commit: "removed accidental file" | Re-trigger Greptile

Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
@githubawn
Copy link
Copy Markdown
Author

githubawn commented Mar 31, 2026

Significantly improves #109
Fully fixing this requires significant refactoring to Render2DSentenceClass for fonts

optimize clipping check
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Outdated
return statement on seperate line
centralized flush
@githubawn
Copy link
Copy Markdown
Author

@greptileai

"One P1 render-state initialization bug in onBeginBatch() should be fixed before merging to avoid potential cross-frame visual corruption."

The P1 finding could be a false positive based on a local variable analysis that omits the downstream hardware synchronization in Render2DClass::Render() (render2d.cpp:601).

Render2DClass is a self-synchronizing state machine. Its Render() call (line 628 for textures, line 694 for shaders) explicitly pushes its internal members to the DX8Wrapper on every execution. The tracking variables in W3DDisplay (m_batchTexture, etc.) are mirror-images of Render2DClass's internal members.

happy?

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 31, 2026

No, the P1 finding stands.

Your argument about Render2DClass being self-synchronizing addresses hardware state consistency, but not the C++ undefined behavior in W3DDisplay.

The issue:

  • onFlush() reads !m_batchNeedsInit (line 617 of your diff)
  • setup2DRenderState() reads !m_batchNeedsInit (line 627)
  • If uninitialized, these branches are unpredictable

Even if Render2DClass correctly synchronizes hardware state on every Render() call, W3DDisplay's control flow still depends on m_batchNeedsInit to decide:

  1. Whether to call Render() in onFlush()
  2. Whether to skip redundant state setup in setup2DRenderState()

An uninitialized bool means these decisions are based on garbage. The fact that the hardware might recover downstream doesn't make reading uninitialized memory defined behavior.

The fix is one line. Keep it.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I see greptile woke up on the sassy side of the motherboard.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

Another test would be to get a lot of units on screen then select them all so their unit info is being drawn.

The health bars make use of the line drawing classes which i believe are inherently affected by this change.

@githubawn
Copy link
Copy Markdown
Author

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.

image

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.
image

Nice ~30% improvement there

@Skyaero42
Copy link
Copy Markdown

Compile errors need to be fixed before this can be reviewed

Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Outdated
Comment thread GeneralsMD/Code/GameEngine/Include/GameClient/Display.h Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
@xezon xezon changed the title perf(gui): implement batched UI rendering perf(gui): Reduce cost of 2d render elements by 30% to 70% Apr 14, 2026
@xezon xezon added GUI For graphical user interface Major Severity: Minor < Major < Critical < Blocker Performance Is a performance concern Gen Relates to Generals ZH Relates to Zero Hour labels Apr 14, 2026
@xezon xezon changed the title perf(gui): Reduce cost of 2d render elements by 30% to 70% perf(gui): Significantly reduce cost of 2d render elements Apr 14, 2026
Copy link
Copy Markdown

@xezon xezon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice. Can you measure the performance improvement for just the 2D rendering on its own? I expect it will be somewhere north of 95%.

Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngine/Include/GameClient/Display.h Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Copy link
Copy Markdown

@xezon xezon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much gains does the 2d rendering have now on its own?

Comment thread Generals/Code/GameEngine/Include/GameClient/Display.h Outdated
@xezon
Copy link
Copy Markdown

xezon commented Apr 18, 2026

Interestingly the replays are failing

@githubawn
Copy link
Copy Markdown
Author

The replays look like an CI issue.

My profiler is currently crashing, so I can't give you a function-level breakdown of the 2D render path alone.

Best I have for now is indications:
The Skirmish screen showing 918 fps instead of 477 fps when using -quickstart.
When in a match with the 3d viewport disabled it jumped from 2200-3300 fps to 2700-3700 fps.
Selecting 1,000 hackers caused an immediate drop from 56 FPS to 45 FPS, with this PR, the stays near the same even with all 1,000 units selected.

Comment thread GameClient_Reference Outdated
@xezon xezon changed the title perf(gui): Significantly reduce cost of 2d render elements perf(gui): Reduce cost of 2d render elements Apr 19, 2026
@xezon
Copy link
Copy Markdown

xezon commented Apr 19, 2026

I would have liked to put a percentage in the title but it looks like we will not have it.

@xezon xezon changed the title perf(gui): Reduce cost of 2d render elements perf(gui): Significantly reduce cost of 2d render elements Apr 19, 2026
@xezon xezon merged commit 471dc26 into TheSuperHackers:main Apr 19, 2026
17 checks passed
@githubawn githubawn deleted the perf/gui-batch branch April 19, 2026 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Gen Relates to Generals GUI For graphical user interface Major Severity: Minor < Major < Critical < Blocker Performance Is a performance concern ZH Relates to Zero Hour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants