Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share PCM between multiple threads #33

Closed
supercurio opened this issue Apr 9, 2018 · 14 comments
Closed

Share PCM between multiple threads #33

supercurio opened this issue Apr 9, 2018 · 14 comments

Comments

@supercurio
Copy link

supercurio commented Apr 9, 2018

Hi @diwic !

I'd like to access PCMs from multiple threads (at least two) in order to:

  • Only read/write buffers at real-time priority
  • Gather PCM Status for timestamp / avail data at a specified frequency and at the highest real-time priority. The frequency should not depend on capture or playback frame sizes.

In order to do that, there will be three threads.

  1. Capture thread
  2. Playback thread
  3. Timestamp/avail status gathering thread.

Therefore, both capture and playback PCMs need to be shared with the status gathering thread, that'll periodically adjust an asynchronous resampler ratio.

In alsa-lib tests, the pcm-multi-thread.c example shows how it is possible in C.
It's simple, a single static snd_pcm_t *pcm is declared on top and all threads call methods on it.

Would it be possible to do that with alsa-rs?
Right now, I get this error:

error[E0277]: the trait bound `*mut alsa_sys::snd_pcm_t: std::marker::Sync` is not satisfied in `&alsa::PCM`
   --> src/alsa_audio_time.rs:155:9
    |
155 |         thread::spawn(|| {
    |         ^^^^^^^^^^^^^ `*mut alsa_sys::snd_pcm_t` cannot be shared between threads safely
@diwic
Copy link
Owner

diwic commented Apr 10, 2018

So...this sounds like a case for my direct PCM API, which bypasses alsa-lib and talks directly to the kernel. That way I can guarantee that the structs are Send + Sync. Have you seen the direct::pcm module?

@diwic
Copy link
Owner

diwic commented Apr 10, 2018

(There is also a synth example to help you understand how direct::pcm works)

@supercurio
Copy link
Author

supercurio commented Apr 10, 2018

Yes thanks for the suggestion, I hope to use the direct PCM API later when optimizing for performance and latency.
Unfortunately the ALSA kernel driver of the HDMI output of the ODROID-C2 is not capable of mmap IO and likely won't ever. This board is one of my target hardware platform as it supports 8 channels 24-bit PCM out of the box out via HDMI, I hope to support it.

Therefore alsa-lib readi/writei it is at least for now!

I added

unsafe impl Sync for PCM {}

in src/pcm.rs with good results, for the audio-time rust port: https://github.com/supercurio/asrc-rs-research/blob/master/src/alsa_audio_time.rs
Multi-threaded behavior is activated with the command line -f parameter.

Unfortunately, on the Odroid C2 multi-threaded access tends to introduce xruns. It's the same with alsa-lib pcm-multi-thread.c utility. On the Raspberry Pi also but to a much lesser extent, on my desktop machine there's xrun sometimes showing as well.
I'm not sure why but Xruns show up if the frequency is low <= 10Hz, but with a high frequency they don't.

Do you think there would be reasons for not adding impl Sync for PCM?

@diwic
Copy link
Owner

diwic commented Apr 10, 2018

I added unsafe impl Sync for PCM {} in src/pcm.rs with good results,

When it comes to thread safety bugs, they usually don't show up immediately, but rather every half a year or so at a customer site. You have been warned.

Unfortunately the ALSA kernel driver of the HDMI output of the ODROID-C2 is not capable of mmap IO

You can use direct::pcm::Status::new even on non-mmap PCMs, I believe. At least it's worth a try. That struct can then be sent across threads.

Do you think there would be reasons for not adding impl Sync for PCM?

Yes. For one, I use a Cell myself. :-) But that can be changed if necessary. More specifically, the "thread safety" of the snd_pcm_xxx APIs were a bit retrofitted in ways I'm not entirely comfortably with. It's not documented exactly what functions can be called in parallel with others. Better stay safe.

@supercurio
Copy link
Author

supercurio commented Apr 10, 2018

Ohh I understand better now the synth example, where you only deal with the file descriptor instead of the PCM struct. And from this fd it is possible to call the direct rust implementation that's not built on top of alsa-lib but uses ioctl with the kernel.
And this fd is not an issue to share between threads.

When it comes to thread safety bugs, they usually don't show up immediately, but rather every half a year or so at a customer site. You have been warned.

True. I'm not sure how to interpret the doc's statement: "The function is thread-safe when built with the proper option." What does proper option mean?

You can use direct::pcm::Status::new even on non-mmap PCMs, I believe. At least it's worth a try. That struct can then be sent across threads.

I'll try that now, thanks :)

For one, I use a Cell myself. :-)

I'm still a beginner with Rust concurrency: do you mean that by Cell you have the intent of only one use at the time as the content of the cell is take away while in use?

@supercurio
Copy link
Author

You can use direct::pcm::Status::new even on non-mmap PCMs, I believe. At least it's worth a try. That struct can then be sent across threads.

On the ODROID-C2 HDMI, Raspberry Pi3+ HDMI and USB audio: Error("driver memory mmap", UnsupportedOperation)'
Ahh, too bad. Works on on the PCI-E sound card of my desktop tho.

I might have to do without this multi-threaded approach then.

@diwic
Copy link
Owner

diwic commented Apr 11, 2018

And from this fd it is possible to call the direct rust implementation that's not built on top of alsa-lib but uses ioctl with the kernel.
And this fd is not an issue to share between threads.

Correct. Assuming the kernel is free from thread safety bugs, that is.

I'm not sure how to interpret the doc's statement: "The function is thread-safe when built with the proper option." What does proper option mean?

I believe it refers to this config option although it might not tell you much if you're not used to rebuilding C libraries. Long story short, should be enabled by default given a recent enough alsa-lib.

I'm still a beginner with Rust concurrency: do you mean that by Cell you have the intent of only one use at the time as the content of the cell is take away while in use?

Explaining what a Cell is used for and why it's not Sync is a bit more than what I have time to explain here and now :-)

On the ODROID-C2 HDMI, Raspberry Pi3+ HDMI and USB audio: Error("driver memory mmap", UnsupportedOperation)'

Really? HDMI drivers (especially for embedded stuff) may vary, but the standard USB audio driver does definitely support mmap. And it does not make sense for Raspberry Pi3 kernels to disable it.

@diwic
Copy link
Owner

diwic commented Apr 11, 2018

I might have to do without this multi-threaded approach then.

You can still protect the PCM struct in a mutex (as long as you use poll for waiting instead of blocking I/O). That's what alsa-lib sometimes does "when built with the proper option" anyway, so...

@supercurio
Copy link
Author

I believe it refers to this config option although it might not tell you much if you're not used to rebuilding C libraries. Long story short, should be enabled by default given a recent enough alsa-lib.

You're right, thread safety should be enabled in most setups, where alsa-lib is likely built with pthread support and thread safety is not explicitly disabled. Tho not all setups.
I wonder if it would be possible to detect the installed alsa-lib build configuration via alsa-rs build.rs, and then activate multi-threaded PCM operations in alsa-rs when safe.

Really? HDMI drivers (especially for embedded stuff) may vary, but the standard USB audio driver does definitely support mmap. And it does not make sense for Raspberry Pi3 kernels to disable it.

ODROID-C2 HDMI audioI does not support mmap, but The Pi HDMI and USB audio does indeed.
In this case, from direct::pcm::Status::new it should not return an error. Is it because the program access the PCM via alsa-lib by blocking readi/writei simultaneously?

You can still protect the PCM struct in a mutex (as long as you use poll for waiting instead of blocking I/O). That's what alsa-lib sometimes does "when built with the proper option" anyway, so...

Yes I tried that too, however since readi and writei are blocking calls looped, they almost always hold the mutex loop which effectively makes the PCM single-threaded ;)

For the use case described, experimentation showed that a multi-threaded approach is not suitable. The main reason observed is that sometimes, the pcm.status() calls happens at the same time as readi or writei which introduces a delay leading to Xruns.
It was the case to some extent on all kernels, architectures and hardware tested unless increasing significantly the period size and count (which prevents low-latency operation).
So that's fine, should I close the issue?

@diwic
Copy link
Owner

diwic commented Apr 12, 2018

I wonder if it would be possible to detect the installed alsa-lib build configuration via alsa-rs build.rs, and then activate multi-threaded PCM operations in alsa-rs when safe.

No idea. But even if it was, I'm not 100% certain that alsa-lib's notion of "thread safe" is the same as Rust's Sync.

In this case, from direct::pcm::Status::new it should not return an error. Is it because the program access the PCM via alsa-lib by blocking readi/writei simultaneously?

I doubt it. It should be possible to get a better error message out of mmap. Let me see if I have time to fix that.

Yes I tried that too, however since readi and writei are blocking calls looped, they almost always hold the mutex loop which effectively makes the PCM single-threaded ;)

I mean like this (where s is the minimum amount of frames you want to write at a time, typically the same as a period size:

loop {
    use PollDescriptors;
    while pcm.lock().avail() < s { poll_all(pcm.get()); }
    pcm.lock().writei(samples);
}

...here the waiting is done inside poll and the lock is not held. Because there is enough samples available when doing writei it will not block.

@diwic
Copy link
Owner

diwic commented Apr 12, 2018

Out of curiousity, what kind of latency are you trying with (both unsuccessfully, and what you have to back down to, to make it work)?

As a side track, you might be interested in a low-latency experiment I did some time ago: https://github.com/diwic/core-isolation

@supercurio
Copy link
Author

Awesome, thanks for looking into the mmap error.
I'll use that later and will check again then.

And the polling approach you showed could get handy at some point as well :)

There's some latency targets for MultiDSP, but I kinda made up some.
However a few days ago I got it down to 1.377ms on the Terratec AUREON XFIRE8.0 HD, at 192 kHz
period size: 24 frames, 3 periods.
Via this alsa_simple_loopback.rs code
Not yet relying on none of the approach you described in the presentation! Which is awesome BTW!

@diwic
Copy link
Owner

diwic commented Apr 12, 2018

I doubt it. It should be possible to get a better error message out of mmap. Let me see if I have time to fix that.

I just pushed a commit that should help us understand why mmap fails, or at least give us a hint in the right direction.

@diwic
Copy link
Owner

diwic commented Oct 25, 2018

Closing due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants