Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsx: PS3 Native frame limiter improvements, add Infinite frame limiter #12052

Merged
merged 2 commits into from
Jun 25, 2022

Conversation

elad335
Copy link
Contributor

@elad335 elad335 commented May 21, 2022

  • Do not wait on DEVICE 0x30 semaphore, it seems like it is something to do with queue command synchronization.
  • This also fixes cellGcmSetFlipWithWaitLabel which is built specifically to enable accurate RSX flipping time, its waiting command is confirmed to be placed AFTER DEVICE 0x30 waiting.
  • Fix default VSYNC state to be enabled. (and set it to be enabled in cellGcmSetVBlankFrequency and cellVideOutConfigure as well)
  • Add experimental "Infinite" frame limiter mode.
  • Fix spurious enabling of second vblank.

@elad335
Copy link
Contributor Author

elad335 commented May 21, 2022

I've added a new frame limiter called "Infinite" which basically imposes no limits whatsoever and sends an additional VBLANK each frame .
This is an extreme "Off" frame limiter type.
I've also added "Vblank" type which is the same behavior as before #12045.

@kd-11
Copy link
Contributor

kd-11 commented May 21, 2022

I've added a new frame limiter called "Infinite" which basically imposes no limits whatsoever and sends an additional VBLANK each frame . This is an extreme "Off" frame limiter type. I've also added "Vblank" type which is the same behavior as before #12045.

We already know games are using vblank count for timing. This makes things worse imo.
The correct solution here is to not wait for the flip semaphore but to implement hardware triple buffering for RSX. A display queue is used and we check the semaphore during flip()

@kd-11
Copy link
Contributor

kd-11 commented May 21, 2022

If the semaphore is not set and we have a frame to consume, go ahead and consume it. Hopefully by the time that frame is ready the old one will have been cleared

@kd-11
Copy link
Contributor

kd-11 commented May 21, 2022

The display queue already exists by the way, it just needs some tweaks to bind it to a RSX semaphore. I'd like to extend this in future by using the native GPU's semaphore system which does a very similar thing so its important we get the architecture right.

@solarmystic
Copy link

With Framelimit set to VBLANK, performance is back to normal again. Frame times are also smoother.

image

@Asinin3
Copy link
Contributor

Asinin3 commented May 21, 2022

God of War 3 still has more spikes and slower than before when using auto framelimiter.

Before HW Framelimiter:
13626_neozcull_9EEgKAK67A

After:
13627_hw_framelimit_77xzUCiHis

After this fixup PR:
framelimiter_g9TJhHnWTh

@elad335
Copy link
Contributor Author

elad335 commented May 21, 2022

I've switched the auto and vblank's demeanor, also please test the new "infinite" frame limiter which is supposed to be a replacement for upping the vblank rate for higher framerates in many games.

@Asinin3
Copy link
Contributor

Asinin3 commented May 21, 2022

Vblank limiter is worse now.

old commit:
framelimiter_YA32mtm87j
New Commit:
hw_tsEd1a16WN

Infinite in new commit:
hw_2Yl0TzH8HZ
1-3fps worse than Off for me.

@elad335
Copy link
Contributor Author

elad335 commented May 21, 2022

For clarification - off is the fastest option, infinite is supposed to be used with games that have an internal vblank-based frame limiter.

@solarmystic
Copy link

solarmystic commented May 21, 2022

With these latest commits, God of War Ascension behaves as follows with each framelimiter option:-

Auto - frame time graph is not so spikey but performance is still lower compared to pre HW limiter behaviour
Auto FL

Vblank - the spikiest frametimes, and the least performant option.
VBLANK FL

Infinite - better than Vblank and Auto, and matches pre HW limiter behaviour in that scene.
Infinite FL

Pre HW limiter build (0.0.22-13626) for comparison:-
rpcs3_2022_05_21_17_52_50_543

@Megamouse Megamouse added the RSX label May 21, 2022
@elad335 elad335 changed the title rsx: Redesign frame limiter rsx: Implement frame semaphore May 21, 2022
@elad335 elad335 changed the title rsx: Implement frame semaphore rsx: Implement frame semaphore (cellGcmSetWaitFlip) May 21, 2022
@elad335 elad335 changed the title rsx: Implement frame semaphore (cellGcmSetWaitFlip) rsx: Implement frame semaphore May 22, 2022
rpcs3/Emu/system_config.h Outdated Show resolved Hide resolved
@@ -3258,6 +3275,11 @@ namespace rsx
flip_status = CELL_GCM_DISPLAY_FLIP_STATUS_DONE;
m_queued_flip.in_progress = false;

if (g_cfg.video.vblank_loop)
{
post_vblank_event(rsx::uclock());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it should just be part of "infinite" vblank mode and does not need the extra loop option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

rpcs3/Emu/RSX/RSXThread.cpp Outdated Show resolved Hide resolved
@elad335 elad335 requested a review from kd-11 June 21, 2022 17:25
@elad335 elad335 changed the title rsx: Implement frame semaphore rsx: PS3 Native frame limiter improvements, add Infinite frame limiter Jun 21, 2022
@elad335 elad335 marked this pull request as ready for review June 21, 2022 17:25
rpcs3/rpcs3qt/tooltips.h Outdated Show resolved Hide resolved
if (a5 == 1u)
{
// This function resets vsync state to enabled
render->requested_vsync = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the counterpart to disable this? We know PS3 had engines that had tearing when fps dropped below a threshold (e.g cryengine 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see sometimes in logs that the games reset it afterward and after cellVideoOutConfigure, I will add to it as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it is unlikely that they are calling videoOutConfigure during gameplay as that is a heavy operation. It is more likely to be toggled using this method. Is it known what happens if a5 is 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that games also react to the unused CellVideoOutCallback. (maybe it's called when the display is changed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's controlled by CellGcmSetFlipMode. You can choose 3 modes there, 1 for vsync, 1 for scanline sync and the last mode for all out anarchy. We only need to set vsync to engage when the first mode is chosen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that games also react to the unused CellVideoOutCallback. (maybe it's called when the display is changed)

I've checked and no game uses cellVideoOutRegisterCallback.

@@ -3251,18 +3259,25 @@ namespace rsx
}
}
}
else if (wait_for_flip_sema)
else if (frame_limit == frame_limit_type::_ps3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly the logic here is essentially "do not flip until a fresh vblank signal is received". You have the vblank signal as the flip release and this is an acquire operation here. Two problems:

  1. There is still the correct event sent to us which you are now short-circuiting in semaphore-acquire. I'm not sure why this approach is to be considered better as it is essentially a hackier approach. Note that this "PS3 native" sync is to be used with games that have trouble with the normal PC-friendly framelimiter system, so extra bells and whistles are unneeded for this path. Is it that it doesn't work properly with HLE? I'm scratching my head on why I should prefer this version over the existing one which more closely follows what the PS3 does.
  2. This whole set of variables and checks is not correctly labeled to match how display systems are architected with clients acquiring the display and the server releasing it in a cycle. Not a blocker by itself, but its not ideal and I would end up having to change it.

Copy link
Contributor Author

@elad335 elad335 Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite confident DEVICE 0x30 is something to do with the queue command and not vblank, as it's inserted by gcm for all display sync options and occurs before semaphore wait by cellGcmSetFlipWithWaitLabel.
This function exists specifically to allow accurate timing of flipping and this is a usage example:
image
Here is label 0 is used, notice that it resides after DEVICE 0x30 waiting
Furthermore, flips are executed along with the vblank signal, not after, I guess that's why TLoU regressed by the original pr.

Copy link
Contributor

@kd-11 kd-11 Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the trace above, we can see there are 2 signals:
The only difference is that there are 2 cond vars that I see, one to ack the enqueue and another to wait for vertical sync.
Server:

	label_0x30.wait_signal();
	handle_enqueued_head();
	label_0x30.reset()

Client:

	enqueue();
	label_0x30.signal();
	label_0x30.wait_ack();
	call(); <- CB reset??
	label_0x0.wait_signal();
	flip();

My guess here is that 0x0 is signaled by the hardware vblank system, 0x30 just acknowledges that a head has been enqueued. So, your PR addresses this by having the system use vblank instead of enqueue signal.
While this is fine, the second critique still holds. The implementation is convoluted and not clear at all.
My suggestion to solve this once and for all is to have a special kind of cond var with waiting for known values. Simply replace my rushed solution with the wait_for_flip_sema with proper condvar.wait() semantics to make the flow clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity I have omitted the 0x10 logic, I have no idea where that signal is for.
For 0x30 handling, we have the frame enqueue logic in sys_rsx and rsx_thread::end_frame() for that.
For 0x0 we can signal from vblank thread.

Copy link
Contributor Author

@elad335 elad335 Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I have HLEd the game this is a cellGcmSetFlipWithWaitLabel call with 0 address. It doesn't appear in other games which uses other flip functions.

Copy link
Contributor Author

@elad335 elad335 Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is named _cellGcmSetFlipCommandWithWaitLabel on firmware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an LLE flip without waiting on label.
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for "normal" flip the 0x0 var is absent? That simplifies the design then and makes the 0x30_ack a wait for vblank.
Again, my biggest problem with the way it is now is that the "documentation is the code" philosophy doesn't hold up because the design is all over the place. This isn't a new problem, but if it is now well understood, we have to do it properly and this PR already modifies the whole thing anyway. I only understand what is going on because I'm familiar with most of the code there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x30 has nothing to do with vblank, it appears in all sync modes even with VSYNC turned off and besides the whole purpose of _cellGcmSetFlipCommandWithWaitLabel is to time the flip very closely to what CELL decides. So if 0x30 is a vblank semaphore it defies all logic as to why the function exists in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems my point is getting lost in translation, that doesn't matter it's just an ack, whether its from vblank or not. I'm not happy with the implementation, but if we're being honest, you're unlikely to restructure it, so I'll merge and change it later when I have time.

* Do not wait on DEVICE 0x30 semaphore, it seems like it is something to do with queue command synchronization.
 - This also fixes cellGcmSetFlipWithWaitLabel which is built specifically to enable accurate RSX flipping time, its waiting command is confirmed to be placed **AFTER** DEVICE 0x30 waiting.
* Fix default vsync state to be enabled. (and set it to enabled in cellGcmSetVBlankFrequency as well)
* Add experimental "Infinite" frame limiter mode.
* Fix spurious enabling of second vblank.
@elad335 elad335 force-pushed the vblank branch 3 times, most recently from 23d662d to ffa4342 Compare June 25, 2022 12:29
@Megamouse Megamouse merged commit 7422ab9 into RPCS3:master Jun 25, 2022
elad335 added a commit to elad335/rpcs3 that referenced this pull request Jun 25, 2022
kd-11 pushed a commit that referenced this pull request Jun 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants