Dynamically allocate arrays using MAX_SCREENWIDTH/HEIGHT #45

Closed
wants to merge 10 commits into from

Conversation

MrAlaux
Owner

@MrAlaux MrAlaux commented Jun 20, 2023

Alright, let's get some automated builds and test performance.

It's not without its bugs: flat rendering is distorted and weapon sprites can be drawn cut off when changing resolutions.
If `cachedheight` is a dynamic array, that `memset()` call is problematic. We'll just keep it fixed-size for now.
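A minimal sketch of why such a call can break, assuming the `memset()` in question sizes its clear with `sizeof` on the array itself (the thread does not show the call). The names `cache_fixed`, `cache_dynamic`, `cache_alloc` and `DEMO_MAX_HEIGHT` are hypothetical stand-ins, not identifiers from the codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_HEIGHT 800  /* hypothetical stand-in for MAX_SCREENHEIGHT */

/* Fixed-size form: sizeof sees the whole array. */
int cache_fixed[DEMO_MAX_HEIGHT];

/* Dynamic form: sizeof sees only the pointer, so the length is tracked. */
int *cache_dynamic = NULL;
size_t cache_dynamic_len = 0;

/* Allocate (or resize) the dynamic cache; returns 0 on success. */
int cache_alloc(size_t rows)
{
    int *p = realloc(cache_dynamic, rows * sizeof(*p));
    if (p == NULL)
        return -1;
    cache_dynamic = p;
    cache_dynamic_len = rows;
    return 0;
}

/* Byte count a sizeof-based memset would clear in each case. */
size_t bytes_cleared_fixed(void)   { return sizeof(cache_fixed); }
size_t bytes_cleared_pointer(void) { return sizeof(cache_dynamic); }

/* Correct clear for the dynamic form: use the tracked length. */
void cache_clear(void)
{
    if (cache_dynamic != NULL)
        memset(cache_dynamic, 0, cache_dynamic_len * sizeof(*cache_dynamic));
}
```

With the pointer form, a `sizeof`-based clear would touch only as many bytes as the pointer itself occupies, so most of the cache would keep stale values.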
Well, that was quick...
@MrAlaux MrAlaux changed the title Dynamically allocate screen arrays Dynamically allocate most rendering-related arrays Jun 20, 2023
@liPillON

liPillON commented Jun 20, 2023

Hi, I finally had the time to try the build resulting from this PR; here are the results.
For comparison, I've included a bunch of data from previous tests.

It seems that this PR has brought Nugget's performance in line with Woof 11.2.
The only exceptions are:

  • demonastery (better than nugg-master, worse than woof-11.2)
  • alien vendetta (better than both nugg-master and woof-11.2)
  • pln.wad (same as nugg-master, worse than woof-11.2)

Hopefully, once the optimizations from Woof are merged into Nugget, the performance boost will compensate for the cost of making 800p and 1600p available.

I'm still curious about testing the current PR with MAX_HIRES 2

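| pwad | woof 11.2 | woof master | nugg master | nugg pr45 |
| --- | --- | --- | --- | --- |
| pln | 63 | 106 | 33 | 34 |
| eviternity | 315 | 338 | 290 | 312 |
| alien vendetta | 575 | 609 | 550 | 596 |
| saturnine chapel | 260 | 268 | 245 | 258 |
| grove | 249 | 258 | 238 | 250 |
| demonastery | 471 | 479 | 415 | 431 |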

@MrAlaux
Owner Author

MrAlaux commented Jun 20, 2023

Hmm... I take it these tests were performed by warping to some maps and standing still, right? Any chance you could try some timedemos for each build?

@liPillON

liPillON commented Jun 20, 2023

With the exception of pln.wad, all tests were timedemos (three runs each, average fps).
This is the same methodology I've used for the results posted in #44

win11 x64
intel i7 1260p
16gb ddr4 dual channel
iris xe graphics
1920x1200 display
-noautoload -nosound
clean-from-scratch cfg (only changes: vsync off, widescreen auto, demo progressbar on)
between runs I waited for the cpu/gpu to cool off, so no thermal throttling should have happened

@MrAlaux
Owner Author

MrAlaux commented Jun 20, 2023

I see, thanks.

@ceski-1 thoughts?

@ceski-1

ceski-1 commented Jun 20, 2023

Looks like it's time to review visplane_t again.

@liPillON

liPillON commented Jun 21, 2023

for the record, the optimizations I've mentioned were introduced in Woof with these PRs:
fabiangreffrath#1108
fabiangreffrath#1110
fabiangreffrath#1111

@MrAlaux MrAlaux changed the title Dynamically allocate most rendering-related arrays Dynamically allocate arrays using MAX_SCREENWIDTH/HEIGHT Jun 22, 2023
@@ -124,7 +124,7 @@ static void R_ClipWallSegment(int first, int last, boolean solid)
 
 void R_ClearClipSegs (void)
 {
-  memset(solidcol, 0, MAX_SCREENWIDTH);
+  memset(solidcol, 0, SCREENWIDTH << hires); // [Nugget] Dynamic arrays


You should be able to get away with viewwidth whenever you use SCREENWIDTH << hires in this code change.
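The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: that the column buffer holds one byte per column, that the active view width never exceeds the allocated width, and that nothing reads columns beyond the active width. `demo_solidcol`, `demo_viewwidth`, `demo_init` and `demo_clear_clip_segs` are hypothetical names standing in for the real ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

static byte *demo_solidcol;   /* hypothetical stand-in for solidcol */
static int   demo_viewwidth;  /* columns actually drawn, stand-in for viewwidth */

/* Allocate the column buffer; capacity plays the role of SCREENWIDTH << hires.
   Returns 0 on success, -1 on bad input or allocation failure. */
int demo_init(int capacity, int viewwidth)
{
    if (capacity <= 0 || viewwidth < 0 || viewwidth > capacity)
        return -1;
    byte *p = malloc((size_t)capacity);
    if (p == NULL)
        return -1;
    free(demo_solidcol);
    demo_solidcol = p;
    demo_viewwidth = viewwidth;
    return 0;
}

/* Clear only the columns the renderer will touch this frame. */
void demo_clear_clip_segs(void)
{
    if (demo_solidcol != NULL)
        memset(demo_solidcol, 0, (size_t)demo_viewwidth);
}
```

Clearing by the active width touches fewer bytes whenever the view is smaller than the allocated buffer, at the cost of leaving the remaining columns untouched.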

@liPillON

liPillON commented Jun 23, 2023

BTW I re-ran all timedemos because while testing the latest PR I noticed some inconsistencies.
Here are the results: I believe they are more trustworthy since this time I used the same methodology for all tests (the previous table was simply an accumulation of old results).

All timedemos were run 5 times (letting the cpu cool down between runs).
The best and worst results were excluded.
I then calculated the average of the remaining 3 results.
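The averaging described above (drop the best and worst run, then average the rest) can be sketched as a small helper. The function name `trimmed_mean` and its error convention are assumptions made for illustration; the thread does not say how the numbers were actually computed.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the samples after discarding one minimum and one maximum.
   Needs at least 3 samples; returns -1.0 otherwise. */
double trimmed_mean(const double *fps, size_t n)
{
    if (fps == NULL || n < 3)
        return -1.0;

    double sum = fps[0], lo = fps[0], hi = fps[0];
    for (size_t i = 1; i < n; i++) {
        sum += fps[i];
        if (fps[i] < lo) lo = fps[i];
        if (fps[i] > hi) hi = fps[i];
    }
    /* Remove the single lowest and highest values from the total. */
    return (sum - lo - hi) / (double)(n - 2);
}
```

For five runs this averages the middle three, which damps one-off outliers such as a run that started while the cpu was still warm.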

For PLN.WAD I simply warped into the level and stood still in one corner of the map, looking at the opposite corner.

Noteworthy considerations:

  • nugg 1.14 performing better than woof 11.2
  • noticeable performance gain in woof master (artifacts run 798: various optimizations)
  • noticeable performance hit in nugget master (run69: supporting rendering resolutions up to 1600p)
  • even stronger performance hit with this PR (pr45, run67: dynamic array allocations)

In particular, the last point confirms what @MrAlaux was anticipating here #44 (comment)

Sorry for the misleading previous results...

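| TIMEDEMOS | woof 11.2 | woof run798 | nugg 1.14 | nugg run69 | nugg pr45 |
| --- | --- | --- | --- | --- | --- |
| alien vendetta MAP20 | 605 | 643 | 637 | 612 | 582 |
| demonastery | 458 | 508 | 498 | 460 | 448 |
| eviternity MAP26 | 337 | 358 | 347 | 324 | 312 |
| grove | 256 | 266 | 264 | 253 | 248 |
| planisfere 2 | 363 | 379 | 370 | 343 | 331 |
| saturnine chapel | 272 | 279 | 275 | 263 | 258 |
| sunlust MAP30 | 275 | 285 | 278 | 258 | 256 |

| VISPLANE TEST MAP | woof 11.2 | woof run798 | nugg 1.14 | nugg run69 | nugg pr45 |
| --- | --- | --- | --- | --- | --- |
| PLN.WAD | 60 | 100 | 59 | 35 | 35 |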

DEMOS:
https://dsdarchive.com/files/demos/av/28588/av20-1357.zip
https://dsdarchive.com/files/demos/demonastery/48438/demonastery-1326.zip
https://dsdarchive.com/files/demos/eviternity/44536/evit26-658.zip
https://dsdarchive.com/files/demos/grove/12807/grov-724.zip
https://dsdarchive.com/files/demos/planisf2/48969/planisf2os910.zip
https://dsdarchive.com/files/demos/satchap/20510/satchap-706.zip
https://dsdarchive.com/files/demos/sunlust/57817/sl30m2321.zip

@MrAlaux
Owner Author

MrAlaux commented Jun 23, 2023

Thanks for this, @liPillON.

It seems I was right about this PR's effects, then. If these changes are not improving performance in their current state, I don't think I'll be able to figure out how to make them do so. I'll try tweaking the allocation process as @fabiangreffrath suggested, but as it stands, that's as far as I can go.

Nugget 1.14 performing better than Woof 11.2 was a complete surprise; I wonder how that happened.

The performance hit of allowing higher resolutions is certainly unwanted, but I guess we can live with it considering the unexpected performance gain over Woof, and also the fact that further down the road, Woof's latest optimizations will be merged. I might make a branch merging those optimizations early so you can test it.

@liPillON

liPillON commented Jun 23, 2023

Sure! When the time comes, feel free to @ me in the discussion for issue 44

While we're here: what about benchmarking a MAX_HIRES 2 branch?

@MrAlaux
Owner Author

MrAlaux commented Jun 25, 2023

I changed some things based on most of Fabian's suggestions, but to no avail it seems; the only change that seems to have increased performance was making negonearray static again, hinting at the fact that making arrays dynamic causes a performance loss. I don't think these last changes are worth benchmarking.

I'm pretty sure that I'm the one doing something wrong here, and I'm certain I won't be able to figure out what exactly. As it stands, I can't do anything else with this branch, but I guess I'll leave the PR open for a while in case anyone wants to chime in.
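For readers comparing the two forms mentioned above, here is a minimal sketch of a static array next to a heap-allocated one. It only shows the difference in how the data is reached; it is not a measurement, and the thread does not establish what actually caused the slowdown. All names (`demo_static`, `demo_dynamic`, `DEMO_WIDTH`, and the helper functions) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_WIDTH 1024  /* hypothetical compile-time maximum */

/* Static form: the array's address and length are fixed at build time. */
static short demo_static[DEMO_WIDTH];

/* Dynamic form: the base address is a run-time value read from a variable. */
static short *demo_dynamic;

/* Allocate the dynamic array, zero-initialised; returns 0 on success. */
int demo_alloc(void)
{
    free(demo_dynamic);
    demo_dynamic = calloc(DEMO_WIDTH, sizeof(*demo_dynamic));
    return demo_dynamic != NULL ? 0 : -1;
}

void demo_fill_static(short v)
{
    for (int x = 0; x < DEMO_WIDTH; x++)
        demo_static[x] = v;
}

void demo_fill_dynamic(short v)
{
    for (int x = 0; x < DEMO_WIDTH; x++)
        demo_dynamic[x] = v;   /* goes through the pointer variable */
}

long demo_sum_static(void)
{
    long s = 0;
    for (int x = 0; x < DEMO_WIDTH; x++)
        s += demo_static[x];
    return s;
}

long demo_sum_dynamic(void)
{
    long s = 0;
    for (int x = 0; x < DEMO_WIDTH; x++)
        s += demo_dynamic[x];
    return s;
}
```

Both forms produce the same results; whether the extra pointer indirection is measurable depends on the compiler, the optimization level, and how hot the loop is, so it should be treated as one candidate explanation to profile rather than a conclusion.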

@MrAlaux MrAlaux deleted the screen_alloc branch February 23, 2024 08:26