SPU LLVM: Improve runtime SPU compilation preferences #15250

Merged
merged 5 commits into from Feb 28, 2024


elad335
Contributor

@elad335 elad335 commented Feb 27, 2024

  1. Prefer inactive worker threads when compiling new SPU blocks, for maximum concurrency.
  2. Postpone thread notifications until the block queue is drained, so that all threads start at once and the managing thread is not delayed while pushing more blocks.
  3. If several blocks have the same use count, compile the one that was queued earlier first. Similar to speed = distance / time, the queue time is compared here as an estimate of the rate of block usage.

Some of these affect only CPUs with 12 or more threads in the current implementation; point 3 affects all CPUs.
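
The tie-break rule in point 3 could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `pending_block`, its fields, `compile_before` and `compile_order` are hypothetical names, and the sketch assumes each queued block carries a use count and a monotonically increasing enqueue stamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for a queued SPU block (illustrative names only,
// not RPCS3's actual types).
struct pending_block
{
    std::uint32_t addr;      // block entry address
    std::uint64_t use_count; // how often the block has been hit so far
    std::uint64_t queued_at; // enqueue stamp; a smaller value means queued earlier
};

// Returns true when `a` should be compiled before `b`.
inline bool compile_before(const pending_block& a, const pending_block& b)
{
    if (a.use_count != b.use_count)
        return a.use_count > b.use_count; // the more-used block goes first
    return a.queued_at < b.queued_at;     // tie: the earlier-queued block goes first
}

// Returns the blocks in the order in which they would be handed to the compiler.
inline std::vector<pending_block> compile_order(std::vector<pending_block> blocks)
{
    std::stable_sort(blocks.begin(), blocks.end(), compile_before);
    assert(std::is_sorted(blocks.begin(), blocks.end(), compile_before));
    return blocks;
}
```

With blocks `{0x100, 5, 2}`, `{0x200, 9, 3}` and `{0x300, 5, 1}`, this ordering yields `0x200` first (highest use count), then `0x300` before `0x100`, because both have a use count of 5 and `0x300` was queued earlier.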

Tests: Tested on a 14600KF, where it massively improved the performance of in-game SPU LLVM compilation for Red Dead Redemption.

@elad335 elad335 added the LLVM Related to LLVM instruction decoders label Feb 27, 2024
@Megamouse
Contributor

Instead of just making changes that supposedly are faster, how about adding some benchmarks for once, so people can actually grasp the benefits when seeing such a PR?

@@ -390,9 +390,19 @@ class lf_queue final
 item->m_link = load(oldv);
 }

-if (!oldv)
+if (!oldv && Notify)
Contributor

Wouldn't this reduce the number of instructions if it were behind if constexpr (Notify)?

Contributor Author

Not really; compilers detect that Notify is known at compile time. if constexpr is technically not meant for optimization, but for avoiding compilation of code in template functions that would otherwise fail to compile.
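
To illustrate the distinction drawn in this reply, here is a minimal sketch under stated assumptions. `toy_queue`, `length_of` and `push_batch_then_notify` are hypothetical names, and `toy_queue` is a single-threaded stand-in rather than RPCS3's `lf_queue`. The plain `if` is sufficient where both branches are valid code, while `if constexpr` is needed where one branch would not compile for some template arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <type_traits>

// Hypothetical single-threaded stand-in for a work queue (not lf_queue).
struct toy_queue
{
    std::deque<int> items;
    std::size_t notifications = 0;

    void notify() { ++notifications; }

    // Notify is a template parameter, so `was_empty && Notify` has a constant
    // operand that an optimizing compiler can fold. A plain `if` is enough
    // because the body is valid code for both values of Notify.
    template <bool Notify = true>
    void push(int value)
    {
        const bool was_empty = items.empty();
        items.push_back(value);

        if (was_empty && Notify)
            notify();
    }
};

// Here `if constexpr` is required: `value.size()` is ill-formed for int,
// so the discarded branch must not be instantiated.
template <typename T>
std::size_t length_of(const T& value)
{
    if constexpr (std::is_arithmetic_v<T>)
        return 1;
    else
        return value.size();
}

// Queue a batch without notifying, then notify once after the batch is queued.
inline std::size_t push_batch_then_notify(toy_queue& q, int count)
{
    assert(count > 0);
    for (int i = 0; i < count; ++i)
        q.push<false>(i);
    q.notify(); // single deferred notification
    return q.notifications;
}
```

Calling `push_batch_then_notify` on a fresh `toy_queue` records exactly one notification regardless of the batch size, which mirrors the idea of postponing notifications until all blocks have been pushed.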

@elad335
Contributor Author

elad335 commented Feb 27, 2024

> Instead of just making changes that supposedly are faster, how about adding some benchmarks for once, so people can actually grasp the benefits when seeing such a PR?

Need someone with 12 or more threads to test this on empty SPU cache.

@elad335
Contributor Author

elad335 commented Feb 27, 2024

I guess I'll also make it possible to test its performance somewhat accurately mid-game, because the only way the user can tell right now is by how stuttery the game is while compiling.

@readywer

> > Instead of just making changes that supposedly are faster, how about adding some benchmarks for once, so people can actually grasp the benefits when seeing such a PR?
>
> Need someone with 12 or more threads to test this on empty SPU cache.

I have a 13600K. Later today I will try to compare the performance to master in some games (GT5, GT6, R&C games).

@elad335
Contributor Author

elad335 commented Feb 27, 2024

Keep in mind it's not an FPS test; it's meant to test how long SPU compilation takes, relatively speaking (those green "compiled block successfully" messages).

@EmulationChannel

EmulationChannel commented Feb 27, 2024

@elad335 I tested on a 14600KF (14 cores / 20 threads) with The Last of Us and Red Dead Redemption: the SPU cache compiled very quickly.

@elad335 elad335 merged commit 75ef154 into RPCS3:master Feb 28, 2024
6 checks passed
Labels
LLVM Related to LLVM instruction decoders
4 participants