taito/taito_f3_v.cpp: regain performance after major rewrite #12312

Open: wants to merge 16 commits into master

y-ack commented Apr 26, 2024

Addresses my own concerns about the speed regression that #11811 introduced relative to the previous implementation.

  • switch the array-of-structs (AoS) z-buffers and per-pixel blend info to struct-of-arrays (SoA)
  • allow the line blending operation to be vectorized
  • regain the empty-line optimization by tracking tilemap row usage
  • consolidate the sprite framebuffers (we still read from the merged buffer multiple times, once for each sprite priority group)
  • other minor wins from safe logic reorderings
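The SoA, blend-vectorization and row-usage bullets can be illustrated with a small, self-contained sketch. This is not code from taito_f3_v.cpp: the names (`LineSoA`, `blend_channel`, `Layer`, `mark_tile_row`, `mix_layer`), the 320x232 line size, the 8-bit alpha encoding and the unscrolled tile-row-to-line mapping are all assumptions made for illustration. Under those assumptions, it shows why SoA helps: each field lives in its own contiguous array, so the blend loop runs branch-free with unit stride, which is the shape an auto-vectorizer can turn into SIMD code. A per-line "used" flag then lets the mixer skip lines that no tile touched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes; not taken from taito_f3_v.cpp.
constexpr std::size_t kWidth  = 320;
constexpr std::size_t kHeight = 232;

// AoS (for contrast): one pixel's fields are interleaved, so a loop that
// only needs `alpha` and `r` still strides over every other field.
struct PixelAoS {
    std::uint8_t pri, alpha, r, g, b;
};

// SoA: every field of the line lives in its own contiguous array.
struct LineSoA {
    std::array<std::uint8_t, kWidth> alpha{};  // 0 = keep dst, 255 = take src
    std::array<std::uint8_t, kWidth> r{}, g{}, b{};
};

// Branch-free blend of one colour channel across a line. Iterations are
// independent and use unit stride, the shape auto-vectorizers handle well.
inline void blend_channel(std::uint8_t* dst, const std::uint8_t* src,
                          const std::uint8_t* alpha, std::size_t n) {
    for (std::size_t x = 0; x < n; ++x) {
        const unsigned a = alpha[x];
        dst[x] = static_cast<std::uint8_t>((src[x] * a + dst[x] * (255u - a)) / 255u);
    }
}

inline void blend_line(LineSoA& dst, const LineSoA& src) {
    blend_channel(dst.r.data(), src.r.data(), src.alpha.data(), kWidth);
    blend_channel(dst.g.data(), src.g.data(), src.alpha.data(), kWidth);
    blend_channel(dst.b.data(), src.b.data(), src.alpha.data(), kWidth);
}

// Hypothetical row-usage tracking: a layer remembers which screen lines
// received any tile data, so the mixer can skip the rest.
struct Layer {
    std::array<LineSoA, kHeight> lines{};
    std::array<bool, kHeight> line_used{};

    // Mark every screen line covered by tile row `tile_row` (no scrolling assumed).
    void mark_tile_row(std::size_t tile_row, std::size_t tile_height) {
        assert(tile_height > 0);
        const std::size_t first = tile_row * tile_height;
        for (std::size_t y = first; y < first + tile_height && y < kHeight; ++y)
            line_used[y] = true;
    }
};

// Blend `layer` onto `screen`, skipping unused lines; returns lines blended.
inline std::size_t mix_layer(std::array<LineSoA, kHeight>& screen, const Layer& layer) {
    std::size_t mixed = 0;
    for (std::size_t y = 0; y < kHeight; ++y) {
        if (!layer.line_used[y]) continue;  // empty line: nothing to do
        blend_line(screen[y], layer.lines[y]);
        ++mixed;
    }
    return mixed;
}
```

Whether a given compiler actually vectorizes `blend_channel` is worth checking (for example with GCC's `-fopt-info-vec`). The sketch only shows the data layout that makes vectorization possible. A real renderer would also have to map tile rows through the scroll registers before marking lines.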

y-ack commented Apr 26, 2024

`-window -nomaximize -bench 240 <set>` (1 run per set)
Windows 11 / CI (Windows) / AMD Ryzen 7 7840HS

| set | e967a70 (pre-rewrite) | 563b63f (rewrite) | 55c60e5 (this PR) |
|---|---|---|---|
| ringrage | 606.71% | 533.30% | 630.98% |
| arabianm | 685.52% | 574.87% | 695.64% |
| ridingf | 635.20% | 520.65% | 608.64% |
| gseeker | 683.36% | 618.98% | 743.95% |
| commandw | 630.25% | 560.72% | 634.03% |
| hthero93 | 717.49% | 588.87% | 710.68% |
| scfinals | 694.80% | 587.99% | 720.88% |
| trstar | 690.49% | 578.01% | 706.09% |
| gunlock | 609.94% | 553.95% | 639.41% |
| lightbr | 668.88% | 545.62% | 651.80% |
| kaiserkn | 637.06% | 558.43% | 649.98% |
| dariusg | 724.94% | 580.85% | 733.24% |
| bubsymphj | 686.23% | 533.20% | 646.04% |
| spcinvdj | 721.39% | 577.61% | 729.34% |
| hthero95 | 667.58% | 559.33% | 681.61% |
| qtheater | 692.09% | 535.16% | 660.61% |
| elvactr | 738.29% | 611.27% | 736.44% |
| spcinv95 | 670.14% | 561.78% | 655.50% |
| twinqix | 721.19% | 581.23% | 699.54% |
| tcobra2 | 639.72% | 544.45% | 607.79% |
| bubblem | 616.16% | 570.59% | 661.85% |
| cleopatr | 593.05% | 497.74% | 606.97% |
| arkretrn | 599.81% | 525.94% | 599.10% |
| kirameki | 698.47% | 570.16% | 673.45% |
| puchicar | 585.04% | 511.76% | 591.27% |
| popnpop | 598.05% | 514.02% | 606.06% |
| landmakr | 735.18% | 578.79% | 700.83% |

Windows 10 / CI (Windows) / Intel Core i5-7300U

| set | e967a70 (pre-rewrite) | 563b63f (rewrite) | 55c60e5 (this PR) |
|---|---|---|---|
| ringrage | 289.16% | 248.27% | 295.81% |
| arabianm | 313.46% | 272.34% | 333.73% |
| ridingf | 285.56% | 226.03% | 263.36% |
| gseeker | 299.63% | 283.37% | 337.42% |
| commandw | 277.50% | 249.16% | 282.81% |
| dariusg | 322.64% | 277.24% | 332.32% |
| bubblem | 294.36% | 261.99% | 302.12% |
| kirameki | 311.88% | 274.02% | 312.47% |
| puchicar | 251.74% | 206.12% | 267.41% |

Per-commit benchmark

`-window -nomaximize -sound none -bench 60 commandw` (3 runs per commit)
WSL 2.0.9.0 / AMD Ryzen 7 7840HS

| commit | description | mean | std. dev. |
|---|---|---|---|
| 072367d | pre-fredyeye cleanup | 502.53% | 2.90% |
| 59ae6c1 | pre-rewrite | 537.51% | 0.71% |
| 563b63f | f3 video rewrite | 466.17% | 1.42% |
| 5936644 | vas cleanup | 455.61% | 3.51% |
| f91b896 | [rebase point] | 467.18% | 2.77% |
| e5e3bd8 | SoA/blend vectorization* | 520.14% | 3.58% |
| 7dcaecd | AoSoA mistake fix* | 519.07% | 8.68% |
| cbc92f3 | merge sprite framebuffers | 526.05% | 0.84% |
| 734879e | tilemap line usage* | 529.54% | 2.75% |
| 49e45bd | mix_line ref params | 534.88% | 4.01% |
| 7241a37 | text line usage* | 544.14% | 2.16% |
| 7fce16a | fix extend+alt case | 535.00% | 5.65% |
| 8412c51 | savestate correctness | 532.07% | 1.80% |
| 53d541d | strategic uint or layout jostling? | 541.07% | 3.48% |

Commits marked * were validated in an -O1 build by callgrind cycle counting.

I found commandw to be a good test case because it does heavy playfield and sprite scaling work in most scenes of its attract sequence; however, it does have a completely blank 6-second boot.
As the tables show, most sets regain more unthrottled speed than the rewrite lost, and the ones that do not still regain most of it.
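To make "regain" concrete, one can compute, per set, the share of the rewrite's loss that this PR wins back: (this PR - rewrite) / (pre-rewrite - rewrite). A value of 1.0 means back to pre-rewrite speed, and above 1.0 means faster than before the rewrite. The sketch below applies that to the Ryzen 7 7840HS numbers from the first table, copied verbatim; `BenchRow`, `recovered_fraction` and `summarize` are illustrative names, not part of MAME.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// One row of the unthrottled-speed table (percent, higher is faster).
struct BenchRow {
    std::string_view set;
    double pre_rewrite;  // e967a70
    double rewrite;      // 563b63f
    double this_pr;      // 55c60e5
};

// Fraction of the rewrite's slowdown that has been won back.
// 1.0 = back to pre-rewrite speed, > 1.0 = faster than pre-rewrite.
inline double recovered_fraction(double pre, double rewrite, double now) {
    assert(pre != rewrite);  // undefined if the rewrite lost nothing
    return (now - rewrite) / (pre - rewrite);
}

// Windows 11 / AMD Ryzen 7 7840HS, -bench 240 (values from the table above).
constexpr BenchRow kRyzen[] = {
    {"ringrage", 606.71, 533.30, 630.98},  {"arabianm", 685.52, 574.87, 695.64},
    {"ridingf", 635.20, 520.65, 608.64},   {"gseeker", 683.36, 618.98, 743.95},
    {"commandw", 630.25, 560.72, 634.03},  {"hthero93", 717.49, 588.87, 710.68},
    {"scfinals", 694.80, 587.99, 720.88},  {"trstar", 690.49, 578.01, 706.09},
    {"gunlock", 609.94, 553.95, 639.41},   {"lightbr", 668.88, 545.62, 651.80},
    {"kaiserkn", 637.06, 558.43, 649.98},  {"dariusg", 724.94, 580.85, 733.24},
    {"bubsymphj", 686.23, 533.20, 646.04}, {"spcinvdj", 721.39, 577.61, 729.34},
    {"hthero95", 667.58, 559.33, 681.61},  {"qtheater", 692.09, 535.16, 660.61},
    {"elvactr", 738.29, 611.27, 736.44},   {"spcinv95", 670.14, 561.78, 655.50},
    {"twinqix", 721.19, 581.23, 699.54},   {"tcobra2", 639.72, 544.45, 607.79},
    {"bubblem", 616.16, 570.59, 661.85},   {"cleopatr", 593.05, 497.74, 606.97},
    {"arkretrn", 599.81, 525.94, 599.10},  {"kirameki", 698.47, 570.16, 673.45},
    {"puchicar", 585.04, 511.76, 591.27},  {"popnpop", 598.05, 514.02, 606.06},
    {"landmakr", 735.18, 578.79, 700.83},
};

struct Summary {
    std::size_t above_pre;       // sets now faster than pre-rewrite
    double worst_fraction;       // smallest recovered_fraction seen
    std::string_view worst_set;  // set with that fraction
};

template <std::size_t N>
Summary summarize(const BenchRow (&rows)[N]) {
    Summary s{0, 1e9, {}};
    for (const BenchRow& r : rows) {
        if (r.this_pr > r.pre_rewrite) ++s.above_pre;
        const double f = recovered_fraction(r.pre_rewrite, r.rewrite, r.this_pr);
        if (f < s.worst_fraction) {
            s.worst_fraction = f;
            s.worst_set = r.set;
        }
    }
    return s;
}
```

With these rows, 15 of the 27 sets end up faster than pre-rewrite, and the weakest recovery is tcobra2, which wins back roughly two thirds of what it lost.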

This system generally runs slower than many other arcade systems in MAME (the test Ryzen gets ~1700% on ibara and ~4000% on mrdo), but we found that this is not actually due to graphics bottlenecks.
Skipping all screen_update work, starting from this PR, achieved only a ~18% increase (about 100 unthrottled percentage points),
while disabling the Ensoniq subdevices yielded a ~180% increase (about +1000 unthrottled percentage points, reaching 1635%), since the sound hardware has to emulate roughly three or four processors with synchronization between them.

y-ack marked this pull request as ready for review April 28, 2024 14:58
y-ack changed the title from "taito/taito_f3_v.cpp: improve performance" to "taito/taito_f3_v.cpp: regain performance after major rewrite" Apr 28, 2024
y-ack marked this pull request as draft April 28, 2024 23:09
y-ack marked this pull request as ready for review April 29, 2024 00:10
y-ack marked this pull request as draft April 30, 2024 20:25
y-ack commented Apr 30, 2024

Found a pivot layer regression in vertical games; marking as draft again.

…e etc.)

because we are translating from pre-flipped tilemap coordinates to non-flipped memory coordinates.
(In general, this handling of flipscreen is bug-prone and deserves more thought.)
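The commit message above attributes the fix to translating pre-flipped tilemap coordinates back to non-flipped memory coordinates. As a hedged illustration only, suppose row usage is recorded per tile row in tile-RAM (memory) order and flipscreen simply mirrors lines. Then any lookup made with a renderer-side, already-flipped line must be mirrored before indexing. `RowUsage`, `to_memory_line`, `line_used`, the 512-line map and the 16-pixel tile height are hypothetical; the real driver also folds in scroll offsets and may lay things out differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical dimensions, chosen only for illustration.
constexpr std::size_t kMapLines = 512;  // tilemap height in pixels
constexpr std::size_t kTileH    = 16;   // tile height in pixels

// Row usage stored in memory (non-flipped) order, one flag per tile row.
struct RowUsage {
    std::array<bool, kMapLines / kTileH> tile_row_used{};
};

// Map a line in pre-flipped tilemap space back to its line in tile RAM,
// assuming flipscreen is a plain vertical mirror.
constexpr std::size_t to_memory_line(std::size_t flipped_line, bool flip_y) {
    return flip_y ? (kMapLines - 1 - flipped_line) : flipped_line;
}

// Query usage with a renderer-side line; indexing tile_row_used with
// `flipped_line` directly would read the wrong row whenever flip_y is set.
inline bool line_used(const RowUsage& u, std::size_t flipped_line, bool flip_y) {
    const std::size_t mem_line = to_memory_line(flipped_line % kMapLines, flip_y);
    assert(mem_line < kMapLines);
    return u.tile_row_used[mem_line / kTileH];
}
```

If only tile row 0 is marked used, `line_used(u, 0, true)` is false and `line_used(u, 511, true)` is true: with the flip active, the top line seen by the renderer comes from the bottom of tile RAM.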
y-ack marked this pull request as ready for review May 2, 2024 22:58