Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pulseaudio: rewrite the pa backend with the async api #20

Merged
merged 2 commits into from Feb 7, 2014

Conversation

degasus
Copy link
Member

@degasus degasus commented Jan 31, 2014

The default async api allow us to set some latency options. The old one (simple API) was the lazy way to go for usual audio where latency doesn't matter.

This also streams audio, so it should be a bit faster then the old one.

@degasus
Copy link
Member Author

degasus commented Feb 1, 2014

delroth: updated

@SizzlingCalamari
Copy link
Contributor

You might need to worry about instruction reordering and coherency issues with the m_run_thread variable.

m_run_thread = false;
m_thread.join();

The compiler is free to move the assignment of m_run_thread around the join(). Also, the m_run_thread variable could be set, but not be visibly set on the audio thread if it's on a separate cpu core. The variable seemed to be [incorrectly] marked as volatile before to attempt to get rid of these errors. Volatile solves the coherence, but not the ordering issue, at least on x86.

I would say to use the atomic helper functions in Common/Atomic.h, specifically AtomicStoreAquire/AtomicStoreRelease, since I don't think the win32 versions of AtomicStore/AtomicLoad are correct.

@delroth
Copy link
Member

delroth commented Feb 3, 2014

Idea if someone is bored: implement a proper Event class in Dolphin that we
can use for this kind of stuff. When writing the threaded mode of AX HLE
(and before, when writing the async-dvd thingy) I tried to use
std::condition_variable to get stop events without having to poll... but
it's really horrible to use (have to use a mutex as well, kinda
complicated, bleh).

In general we're lacking a few useful multithreading constructs in Dolphin:
events, queues (FifoQueue is nice but I'm not sure if it handles all the
needs we have, on top of being very badly named :p),
WaitForMultipleObjects-style function, etc.

On Mon, Feb 3, 2014 at 4:35 PM, Jordan Cristiano
notifications@github.comwrote:

You might need to worry about instruction reordering and coherency issues
with the m_run_thread variable.

m_run_thread = false;m_thread.join();

The compiler is free to move the assignment of m_run_thread around the
join(). Also, the m_run_thread variable could be set, but not be visibly
set on the audio thread if it's on a separate cpu core. The variable seemed
to be [incorrectly] marked as volatile before to attempt to get rid of
these errors. Volatile solves the coherence, but not the ordering issue, at
least on x86.

I would say to use the atomic helper functions in Common/Atomic.h,
specifically AtomicStoreAquire/AtomicStoreRelease, since I don't think the
win32 versions of AtomicStore/AtomicLoad are correct.

Reply to this email directly or view it on GitHubhttps://github.com//pull/20#issuecomment-33965880
.

Pierre "delroth" Bourdon delroth@gmail.com
Software Engineer @ Zürich, Switzerland
http://code.delroth.net/

@degasus
Copy link
Member Author

degasus commented Feb 4, 2014

Updated with the AtomicStore/AtomicLoad ones, if they don't work, we should fix them.

But do we need a MemoryBarrier between the two calls SizzlingCalamari has pasted?

Sorry I don't know how to do threaded code. If you want, I'll change it back to the volatile one like all other audio backends.

@SizzlingCalamari
Copy link
Contributor

After taking another look, I'd say that the easiest, most correct way would be to use

std::atomic<bool> m_run_thread;

This type may be a bit slower than doing the synchronization manually, but it takes care of all the gotchas and threading tricks as far as I know. I really think this should be used until someone looks at the atomic operations and creates a fully functional interface for them, along with some documentation on their usage. The only thing we're losing by using the std::atomic types is speed.

To answer your question about the memory barrier, saying that the atomic calls need a memory barrier is what I was going for in the previous comment. std::atomic types by default use a full memory barrier in both directions and synchronize with all other threads.

If you want to look in to doing it manually still, I'm pretty sure that the AtomicStoreAquire/AtomicStoreRelease functions are strict enough with their memory ordering constraints to get the job done correctly. Someone chime in if this isn't correct.

Last thing, about the AtomicStore/AtomicLoad functions. They seem to be misrepresented in what they can accomplish. These functions use a relaxed memory model, which only provide atomic access and give no protection against compiler/hardware instruction reordering. They are fine for some uses, but an event signal like this isn't one of them. Also, since they are template functions, they may not even be atomic in the win32 case because of their implementation just being a variable read/assign. If someone puts a 64 bit type through them, the 64 bit reads/writes could turn out non atomic by being split into two 32 bit reads/writes.

@delroth
Copy link
Member

delroth commented Feb 4, 2014

Might be worth integrating https://chromium.googlesource.com/chromium/src/base/+/master/atomicops.h (+ implementation and related files) into Common? We would at least have a robust, well tested implementation at hand instead of the current API which has problems as Jordan mentioned.

BSD licensed, so this should be possible without too much trouble. Might need some changes to better match our codebase (namespace changes, etc.).

@delroth
Copy link
Member

delroth commented Feb 4, 2014

Of course, I still think the proper solution would be to not use atomics at all here and rely on a proper Event object that would encapsulate all that (this Event object might internally use atomics - but users shouldn't care).

@SizzlingCalamari
Copy link
Contributor

Another alternative could be the Mintomic lock free atomics library.
http://mintomic.github.io/
This library only seems to use relaxed atomics though, leaving it up to the user to implement the needed memory model using the provided memory fences and relaxed atomics.

It uses a BSD license as well.

@degasus
Copy link
Member Author

degasus commented Feb 5, 2014

updated with std::atomic, so imo this branch is ready for merging.

For history: The first PA backend used the simple (push based) api. This was changed to use this async api for volume control, but without using callbacks. So it was basicly the same code flow but with a more complicated api. This was reverted sometimes to use a nicer api.
So I'll use the async api again, but this time in a callback way.

@delroth
Copy link
Member

delroth commented Feb 5, 2014

LGTM, would appreciate seeing some user testing results though.

@JMC47
Copy link
Contributor

JMC47 commented Feb 5, 2014

I did some testing with this, and on Linux it's very hard for me record/get actual data when it comes to the latency. I think it's actually slightly better than Windows xaudio2. The latency of audio is the main reason I took interest in testing this. The main thing I did was check Zelda Four Swords Adventures menus, and basically time how long it took the sound to play after I hit the button. With Zelda-HLE, it was instantaneous, which is great. But the issues with LLE/AX-HLE + Latency are still bad. Regardless, that is likely not backend specific, and wouldn't be fixed by rewriting an audio backend. Still unfortunate.

As for performance in games, pulse audio outperformed all the other audio backends in Linux when it comes to latency, choppiness, among other things. I tested a bunch of GC games.

Compared to the Windows, audio latency is the same, which is a huge improvement over Alsa and the other backends I tried.

Now, while writing this, I actually discovered why I got such great audio latency when first testing this. Holding tab when starting a game will temporarily (until slowdown) fix the audio latency. It improves it by at least 2 - 3 frames according to my Four Swords Adventures test. I don't know how stupid that is, but, whatever. That just tells me the audio latency problem is separate from backends. This backend performs well, gets a thumbs up from me.

m_pa_mlapi = pa_mainloop_get_api(m_pa_ml);
m_pa_ctx = pa_context_new(m_pa_mlapi, "dolphin-emu");
m_pa_error = pa_context_connect(m_pa_ctx, NULL, PA_CONTEXT_NOFLAGS, NULL);
pa_context_set_state_callback(m_pa_ctx, StateCallback, this);

This comment was marked as off-topic.

@degasus
Copy link
Member Author

degasus commented Feb 6, 2014

I store the return value of every non-void call in m_pa_error. If this error flag is set, we will break both on connection with the pa server and in the mainloop.

The default async api allow us to set some latency options. The old one (simple API) was the lazy way to go for usual audio where latency doesn't matter.

This also streams audio, so it should be a bit faster then the old one.
delroth added a commit that referenced this pull request Feb 7, 2014
Pulseaudio: rewrite the pa backend with the async api
@delroth delroth merged commit 4f97666 into dolphin-emu:master Feb 7, 2014
@degasus degasus deleted the pulseaudioRewrite branch February 7, 2014 11:42
@ghost ghost mentioned this pull request Feb 11, 2021
rapito pushed a commit to rapito/dolphin that referenced this pull request Sep 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants