Choppy sound in NieR when rsx thread is too slow, only on the Vulkan backend #6882

Closed
linkmauve opened this issue Oct 26, 2019 · 14 comments

@linkmauve
Contributor

I’m using rpcs3 f3ed26e, on an i5-8350u sporting a UHD 620 GPU.

When using Vulkan, the sound gets choppy as soon as the framerate drops under 30fps and the rsx thread uses 100% of a core, but this isn’t the case when using OpenGL despite the framerate being approximately identical.

I’m in the process of optimising the rsx thread so that it runs better on this laptop, but in the meantime it’d be nice to be able to play with better sound. :)

@Illynir

Illynir commented Oct 26, 2019

On what game? There is no useful information or log. Vulkan uses multithreading more efficiently and is more CPU intensive than expected. The audio thread becomes "hungry". It also depends on how the game was coded.

Your processor is quite weak and is limited in threads, even if there are 8 they are not very powerful.

There is the buffering option (Audio buffer duration) for that. It doesn't work miracles, but it helps relieve the problem; try it with 125ms or 150ms, for example on Nier. Since your PR is about this game, I guess that's what you're testing. Nier uses SPUs to control audio, and the game is very prone to "choppy sound" even on powerful PCs.

That being said, optimizations are still possible yes, to relieve the CPU and threads.

@linkmauve
Contributor Author

Ah right, it’s indeed NieR, here is a log of it running on Vulkan: RPCS3.log.gz and doing approximately the same on OpenGL: RPCS3.log.gz

I am fully aware that my CPU isn’t the strongest, but I’m surprised your Vulkan backend is that much slower (on the CPU) than your OpenGL backend, which uses about 65% of a core at the same area. I’ll have a look at more in-depth profiling when it isn’t 3:30am anymore. :-°

Audio buffering doesn’t do much sadly, I already tested the OpenAL backend previously.

@Whatcookie
Member

This may be somewhat Nier-specific behavior: the game itself isn't super threaded, so stalls when drawing the graphics can cause the audio to crackle. Turning the resolution scaling way up will cause crackling on my RX 570.

I don't regularly use the integrated graphics on my 7700K, but last I tried, it was much faster at drawing stuff in RPCS3 with OpenGL and Mesa vs with Vulkan and ANV. This may be what you're seeing.

@linkmauve
Contributor Author

The two main functions I see on the Vulkan backend are VKGSRender::do_local_task() and vk::wait_for_event(), at ~25% and ~16% respectively, each apparently spending 99% of its time in _mm_pause() waiting for the GPU to signal that it is done with its task. Why do you use _mm_pause() instead of a wait on the event, btw? When I replace it with a yield, it obviously doesn’t improve the situation much, but the CPU usage of this thread gets somewhat lower, and the profile becomes very different, with syscall leading. Could it be that your Vulkan rendering is much more expensive than OpenGL, so that the problem is GPU-side rather than CPU-side?

What seems strange though is that OpenGL also gets sub-20fps and ~never desyncs the sound thread.

@Whatcookie
Member

Whatcookie commented Oct 26, 2019

Yes, my expectation is that the game ends up waiting too long for your GPU to do something, and since Nier isn't well threaded, it ends up writing the audio too late. Even on my 7700K which can effortlessly run the game well above 60FPS with ~25% CPU usage, it will crackle if I push the resolution too high.

EDIT: Actually, the problem is clear here, with 800% scaling and write colour buffers on, the game runs at 14fps with terrible audio. With write colour buffers off and 800% scaling, the game still runs at 14fps, but now with perfect audio. Of course this breaks the visuals, but it's clear what the game is waiting on now.

@Illynir

Illynir commented Oct 26, 2019

Yes, this has always been the case on Nier: WCB on Vulkan causes audio problems. I know that KD or Ruipin explained the reason to me, but I don't remember it.

The buffer option was there to relieve this problem, but apparently it's not enough for linkmauve's CPU.

Keep in mind that Nier was coded by demons, the code of this game is incomprehensible. :P

@linkmauve
Contributor Author

Another interesting datapoint is that the music runs perfectly when the rsx thread is entirely blocked, for instance on io.

I tried to profile the Vulkan backend but I don’t have enough RAM to use renderdoc on a given game frame, despite it working properly on OpenGL, so I’m working on pure guesses so far.

I tried to figure out the differences between when it runs fine and when it doesn’t, and it seems to upload a swizzled 512×512 texture from the cell to Vulkan taking quite a lot of (CPU) time, and when it stops the framerate increases back to some 30 fps. Do you know why this game is uploading a texture (the same?) every frame but only in certain areas? Have you tried deswizzling the texture on the GPU after upload instead, for instance using a compute shader? Or am I following a red herring and it’s totally unrelated?

I’ve tested with write colour buffers off and I can indeed reproduce your findings. What exactly does this mean? The game is rendering something on the GPU, and then copies it to the main RAM for the CPU to do things with, but where/when does this happen? Is it a situation where we have no idea when to do the copy and have to do it all the time, or is there some way to know the CPU access patterns and such?

Thanks a lot for your answers, they are greatly appreciated. :)

@kd-11
Contributor

kd-11 commented Oct 28, 2019

> Have you tried deswizzling the texture on the GPU after upload instead, for instance using a compute shader?

The deswizzling algorithm is not well-suited for GPUs. Can it be done? Yes, but it will hurt graphics performance.
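
To make the access pattern concrete, here is a minimal CPU-side sketch of deswizzling. It assumes a plain Z-order (Morton) layout for a square, power-of-two texture with 32-bit texels, with x in the even bits; the names `morton_index` and `deswizzle_square` are hypothetical and this is not RPCS3's actual implementation. Note how consecutive output texels read from scattered source positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave the low 16 bits of x and y into a Morton (Z-order) index:
// bit i of x lands on bit 2i, bit i of y lands on bit 2i + 1.
// (Which coordinate takes the even bits is an assumption of this sketch.)
inline std::uint32_t morton_index(std::uint32_t x, std::uint32_t y)
{
    std::uint32_t result = 0;
    for (std::uint32_t bit = 0; bit < 16; ++bit)
    {
        result |= ((x >> bit) & 1u) << (2 * bit);
        result |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return result;
}

// Convert a square, power-of-two texture from Z-order into row-major order.
inline std::vector<std::uint32_t> deswizzle_square(const std::vector<std::uint32_t>& swizzled,
                                                   std::uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);          // size must be a power of two
    assert(swizzled.size() == std::size_t(size) * size);    // one texel per (x, y)

    std::vector<std::uint32_t> linear(swizzled.size());
    for (std::uint32_t y = 0; y < size; ++y)
    {
        for (std::uint32_t x = 0; x < size; ++x)
        {
            // The write is sequential, but the read jumps around the source buffer.
            linear[std::size_t(y) * size + x] = swizzled[morton_index(x, y)];
        }
    }
    return linear;
}
```

For a 512×512 texture this loop performs 262,144 scattered reads per upload, which is the kind of per-frame CPU cost discussed above.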

> Do you know why this game is uploading a texture (the same?) every frame but only in certain areas?

Texture modification is detected using page faults. It's usually faster to queue an upload than to hash the texture data, since on PS3 the textures are changed a lot every frame to make use of the small memory available. The hardware is designed to make this cheap.
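
As a rough illustration of that idea, the sketch below tracks dirty pages instead of hashing texture contents. It is a toy model only: the real page fault (a write to a protected page) is simulated by an explicit `on_write_fault` call, the 4096-byte page size is an assumption, and `ToyTextureCache` is a hypothetical name, not an RPCS3 class.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model of fault-based dirty tracking for cached textures.
class ToyTextureCache
{
public:
    static constexpr std::uint64_t page_size = 4096; // assumed page size

    // Start tracking a texture range; its pages are considered "write-protected" and clean.
    void track(std::uint64_t address, std::uint64_t length)
    {
        assert(length != 0);
        for (std::uint64_t page = address / page_size; page <= (address + length - 1) / page_size; ++page)
            m_dirty[page] = false;
    }

    // Stand-in for the page-fault handler: called when guest code writes to `address`.
    // Returns true if the write hit a tracked page (which is then marked dirty).
    bool on_write_fault(std::uint64_t address)
    {
        const auto it = m_dirty.find(address / page_size);
        if (it == m_dirty.end())
            return false;
        it->second = true;
        return true;
    }

    // Returns true if any page of the range was written since the last call,
    // meaning a new upload should be queued. Clears the dirty state.
    bool needs_upload(std::uint64_t address, std::uint64_t length)
    {
        assert(length != 0);
        bool dirty = false;
        for (std::uint64_t page = address / page_size; page <= (address + length - 1) / page_size; ++page)
        {
            const auto it = m_dirty.find(page);
            if (it != m_dirty.end() && it->second)
            {
                dirty = true;
                it->second = false;
            }
        }
        return dirty;
    }

private:
    std::unordered_map<std::uint64_t, bool> m_dirty; // page number -> written since last upload
};
```

The check costs one lookup per page rather than a pass over every texel, which is why a write anywhere in the range triggers a full re-upload even if the data ends up identical.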

> What exactly does this mean? The game is rendering something on the GPU, and then copies it to the main RAM for the CPU to do things with, but where/when does this happen?

In the texture cache, only identifiable by a page fault.

> Is it a situation where we have no idea when to do the copy and have to do it all the time, or is there some way to know the CPU access patterns and such?

There is no innate hardware access pattern, but we do have a very competent predictor that actually initiates the transfer before we need the data. If you're seeing GPU->CPU flushes, they're likely triggering based on prediction, not actual page faults. If you start queuing DMA requests after a page fault you're going to have a very bad time.

Which brings us to the main question: Why is OpenGL immune?
Because of how the API is designed. The commands are continuously streamed to the graphics hardware by the driver, which is perfect for emulators. Vulkan is a buffering API by design, which is not a good fit for emulators. This means Vulkan records a huge list of things to do, but the hardware doesn't actually know about them until they're submitted in one large group. This causes a situation where, if you're caught unawares and need data from the GPU, you may end up having to send a huge todo list and then wait for results to come back. In this case, we use wait_for_event, which polls the hardware status and is what is showing you the stalling. And no, command streaming on Vulkan is a bad choice due to how long the submit command itself takes.
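
The difference can be shown with a toy model. This is only a sketch under simplifying assumptions: `ToyCommandQueue` is a hypothetical class, not a Vulkan or OpenGL API, and "executing" a command is just calling a function. It shows how an unexpected readback on a buffered queue has to flush everything recorded so far, while a streamed queue has nothing left to flush.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model: the "GPU" only runs commands once they have been submitted.
class ToyCommandQueue
{
public:
    explicit ToyCommandQueue(bool streaming) : m_streaming(streaming) {}

    // Record a command. A streaming queue runs it right away;
    // a buffered queue keeps it until submit() is called.
    void record(std::function<void()> command)
    {
        assert(static_cast<bool>(command)); // empty commands are not allowed
        if (m_streaming)
            command();
        else
            m_pending.push_back(std::move(command));
    }

    // Submit all pending commands in one group; returns how many were flushed.
    std::size_t submit()
    {
        const std::size_t flushed = m_pending.size();
        for (auto& command : m_pending)
            command();
        m_pending.clear();
        return flushed;
    }

    // A CPU readback needs every earlier command to have run first.
    // Returns the number of commands the caller had to wait for.
    std::size_t read_back()
    {
        return submit();
    }

private:
    bool m_streaming;
    std::vector<std::function<void()>> m_pending;
};
```

In the buffered case the cost of the readback grows with the amount of work recorded since the last submit, which matches the "huge todo list" stall described above.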

But why not yield/sleep? Most of these stalls are "small" in threading timescales (ranging from around 40us to several milliseconds) and you cannot know beforehand how long you need to wait. Wait too long and performance drops for some people; busy-wait and some weak CPUs may sometimes have audio stuttering. This is actually because the graphics and audio threads run at the same higher-than-normal priority, but the audio thread has a fixed 'tick rate' that is large on threading timescales, causing a situation where (in your case) RSX and Audio are contending with each other.
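
One common compromise for waits of unknown length is to spin briefly and then start yielding. The sketch below illustrates that general technique only; it is not what RPCS3 does, and `wait_spin_then_yield` and its parameters are hypothetical. The flag stands in for the GPU event status that would otherwise be polled.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Wait until `flag` becomes true. Busy-poll for the first `spin_budget`,
// then yield between polls so other threads (e.g. audio) can run.
// Returns false if `timeout` elapses first.
inline bool wait_spin_then_yield(const std::atomic<bool>& flag,
                                 std::chrono::microseconds spin_budget,
                                 std::chrono::milliseconds timeout)
{
    assert(spin_budget <= timeout);

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();

    while (!flag.load(std::memory_order_acquire))
    {
        const auto elapsed = clock::now() - start;
        if (elapsed >= timeout)
            return false;

        if (elapsed >= spin_budget)
        {
            // Past the spin budget: give up the core between polls.
            std::this_thread::yield();
        }
        // Otherwise keep busy-polling; on x86 a pause hint such as _mm_pause() would go here.
    }
    return true;
}
```

The spin budget is the tuning knob: too small and short stalls pay scheduler latency, too large and a weak CPU is back to starving the audio thread.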

@kd-11 kd-11 self-assigned this Oct 28, 2019
@kd-11
Contributor

kd-11 commented Oct 28, 2019

Just for thoroughness, I may add a GPU-side deswizzling option since most people have much better GPUs than CPUs. Not sure how well it will work on a UHD chip due to the high amount of non-continuous memory access and the deep loops which suck on a graphics card, but at the very least, it could be better than executing this on a weaker CPU. Will probably default to off though.

@linkmauve linkmauve changed the title from "Choppy sound when rsx thread is too slow, only on the Vulkan backend" to "Choppy sound in NieR when rsx thread is too slow, only on the Vulkan backend" Oct 28, 2019
@AniLeo
Member

AniLeo commented Jul 13, 2021

@linkmauve can you recheck this one?

@rtentser

I can confirm, the sound is choppy sometimes. Don't know anything about rsx threads.

@kd-11
Contributor

kd-11 commented Dec 16, 2023

I have an architectural solution for this. Reviving.

@yuiiio

yuiiio commented Mar 15, 2024

Sorry if this is a dumb comment.
wait_for_event still takes a lot of CPU time (>15%) (second: spu_thread::do_dma_transfer).
I'm testing with wait_for_event commented out in cached_texture_section::imp_flush (the only place it is used).
20fps (a lot of stutter) => 50fps on a 5700U (Vega 8/RADV). Yes, rendering may be broken (but I haven't noticed it yet).
wait_for_event seems to poll while dma_fence->status == VK_EVENT_RESET.

edit:
It seems that by design ps3 has no choice but to wait for vkevent to complete the cpu<=gpu transfer.
Perhaps the intermittent use of the gpu was causing the frequency of the igpu to not increase, resulting in poorer performance.
Manually fixing the igpu frequency improved performance.

@kd-11
Contributor

kd-11 commented Mar 15, 2024

Yea, commenting it out is obviously wrong: you're not actually doing the requested operation (WCB/WDB/GPU blit), instead just reading whatever junk was in memory. Sometimes this works fine; most of the time everything gets fucked and you get flashing or flicker.
wait_for_event is not a CPU event, it is a GPU one. The CPU time lost there will go down significantly on an RTX 4090, for example. It is literally just a GPU wait that we have to service before we start reading things from VRAM.
I already made changes that allow threads to skip waiting if conditions are favorable. That fix should actually close this issue (#15205).
Previously, while the wait_for_event thing was happening, the entire RSX and part of CELL was frozen. Now only the unit or thread that needs the data has to wait for it to become available.

Closing as fixed by #15205

@kd-11 kd-11 closed this as completed Mar 15, 2024

7 participants