Use new IAudioClient3 interface for low-latency audio in shared mode #385

Open
adzm opened this issue Dec 22, 2020 · 15 comments
Labels
enhancement New feature or request P3 Priority: Normal src-wasapi MS WASAPI Host API /src/hostapi/wasapi windows Affects MS Windows

Comments

@adzm

adzm commented Dec 22, 2020

The new IAudioClient3 interface supports lower-latency audio, as described here, although I am not sure offhand how to apply this to pa_win_wasapi.c. The IAudioClient3 interface is already used, but the functions InitializeSharedAudioStream and GetSharedModeEnginePeriod / GetCurrentSharedModeEnginePeriod apparently are not.

@RossBencina RossBencina added the src-wasapi MS WASAPI Host API /src/hostapi/wasapi label Dec 22, 2020
@dmitrykos
Collaborator

To my understanding, the additional IAudioClient3 API is just a set of helper functions (a facade) that make it easier to create a shared stream with a desired latency. The PA WASAPI implementation does its best to provide the lowest supported latency in Shared mode, and using IAudioClient3::InitializeSharedAudioStream is not required to achieve it.

If you have a comparison of the lowest latency you could achieve with IAudioClient3::InitializeSharedAudioStream and with PA WASAPI, please provide it.

@adzm
Author

adzm commented Dec 22, 2020

I was under the impression this could let us get around the usual minimum latency in shared mode; however I may indeed be misunderstanding. I'll try to give it a shot though. Thanks for the input.

@rakosrudolf

I believe using IAudioClient3 could reduce WASAPI shared mode latency to <10ms which would be great for tools like FlexASIO.
See dechamps/FlexASIO#55 .

It looks like this might reduce latency by a few milliseconds, as the system switches to small buffers for that endpoint.
https://docs.microsoft.com/en-us/windows-hardware/drivers/audio/low-latency-audio#faq

... By default, all applications in Windows 10 will use 10ms buffers to render and capture audio. If an application needs to use small buffers, then it needs to use the new AudioGraph settings or the WASAPI IAudioClient3 interface, in order to do so. However, if one application in Windows 10 requests the usage of small buffers, then the Audio Engine will start transferring audio using that particular buffer size. In that case, all applications that use the same endpoint and mode will automatically switch to that small buffer size. When the low latency application exits, the Audio Engine will switch to 10ms buffers again.
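
To make the "request small buffers" step concrete: Microsoft's documentation for IAudioClient3::InitializeSharedAudioStream requires PeriodInFrames to be an integral multiple of the fundamental period and to lie between the minimum and maximum periods reported by IAudioClient3::GetSharedModeEnginePeriod. The sketch below shows one way an application could pick a valid period from those values. The helper name pick_shared_period is hypothetical (it exists in neither PortAudio nor WASAPI), and the sketch assumes the reported minimum and maximum are themselves multiples of the fundamental period.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of PortAudio or WASAPI: chooses a value for
   PeriodInFrames from the numbers reported by
   IAudioClient3::GetSharedModeEnginePeriod.
   Assumption: min_period and max_period are multiples of fundamental. */
uint32_t pick_shared_period(uint32_t requested,
                            uint32_t fundamental,
                            uint32_t min_period,
                            uint32_t max_period)
{
    uint32_t period = requested;

    assert(fundamental > 0);
    assert(min_period <= max_period);

    /* Clamp the request into the range the audio engine reports. */
    if (period < min_period) period = min_period;
    if (period > max_period) period = max_period;

    /* Round down to a multiple of the fundamental period. */
    period -= period % fundamental;

    /* If rounding fell below the minimum, step up by one fundamental. */
    if (period < min_period)
        period += fundamental;

    return period;
}
```

For example, a request of 200 frames with a fundamental period of 32 and a range of 128 to 480 frames yields 192. On a driver that reports 480 for all three values, every request ends up at 480 frames, which is the default 10 ms at 48000 Hz.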

@dmitrykos
Collaborator

dmitrykos commented Dec 29, 2020

@rakosrudolf, thank you for referencing the Microsoft docs on this issue. According to the documentation, low latency in Shared mode is not guaranteed by the platform:

  • it is driver dependent (e.g. whether the driver supports <10 ms in Shared mode)
  • there is a race condition: if another Shared stream is opened in non-low-latency mode, then low latency cannot be achieved by the stream initialized with the InitializeSharedAudioStream API

Anyway, since low latency might be achievable, this new API can be incorporated into PA WASAPI, as previously proposed by @adzm, to give PA WASAPI users that option.

@dmitrykos dmitrykos added the windows Affects MS Windows label Dec 29, 2020
@RossBencina RossBencina added enhancement New feature or request P3 Priority: Normal labels Apr 5, 2022
@RossBencina
Collaborator

We've set this to priority P3 (Normal) but @dmitrykos can change it to whatever he likes.

@danryu

danryu commented Jan 28, 2023

Was there any progress on this? It would be a very welcome enhancement if so.

@dmitrykos
Collaborator

As I mentioned earlier, if someone developed a small test contrasting IAudioClient3::Initialize with IAudioClient3::InitializeSharedAudioStream and showing that a lower latency really can be achieved, then it would make sense to add support for IAudioClient3::InitializeSharedAudioStream as an additional option.

Also, IAudioClient3::InitializeSharedAudioStream may introduce some uncertainty and bugs on the WASAPI side, so adding it blindly wouldn't be a great idea given that the PA WASAPI backend is in a fairly stable condition.

@danryu

danryu commented Jan 29, 2023

The best I can do that is within reasonable time-scope for me (I'm not familiar with either Windows audio or PortAudio codebase and I'm severely time-constrained) is a practical round-trip latency experiment.
I measured round-trip latency with RTL Utility, comparing IAudioClient2 (via FlexASIO/PortAudio) against IAudioClient3 (via the built-in "Shared Low Latency" mode of RTL Utility).

Setup:

Software:

  • RTL Utility - round-trip latency tester (uses current JUCE internally)
  • KoordASIO (FlexASIO clone with configurator for WASAPI Shared and Exclusive modes)

IAudioClient2
KoordASIO/FlexASIO link against current PortAudio, and thus use IAudioClient2 (when configured with WASAPI)

IAudioClient3
RTL Utility uses JUCE which since 2020 has supported IAudioClient3 (referred to as "Shared Low Latency" mode in JUCE/RTL Utility)

Test Results
(All tests were repeated several times to ensure a consistent result was being delivered.)
[Image: table of round-trip latency test results]

The results are really interesting!
Firstly: IAudioClient3 is impressive - both the low Exclusive Mode time, and the sub-20ms Shared Low Latency Mode result. This immediately suggests that IAudioClient3 has had a marked improvement on Shared Mode performance.

Then for the PA/IAudioClient2 results - firstly, the sub-10ms result for Exclusive Mode is absolutely phenomenal (if anywhere near accurate!).
Incredibly unfortunately, when actually recording and playing back with this configuration, the output is full of a crackly distortion which makes it unusable (and doesn't disappear when varying buffer size). This is so frustrating as it shows how close we are to having usable sub-10ms round-trip latency with generic Windows hardware.

Then the Shared Mode result is as previously expected. Interestingly it is basically double the IAudioClient3 Shared Mode result.

CONCLUSION
Considering the available configurations tested above, and assuming the above-mentioned crackly distortion problem with PortAudio WASAPI Exclusive mode is a "WON'T FIX", being able to do 13ms Exclusive Mode or 17ms Shared Mode round-trip latency with generic Windows hardware would be immensely important and useful to countless real-time audio applications.

@dechamps
Contributor

I'm sceptical you can compare RTL Utility's built-in "Windows Audio" mode with KoordASIO/PA. The code paths are very different and in particular they likely have different internal buffer sizes which would act as confounding factors. If I understand your protocol correctly, you didn't even pick the same buffer sizes between the two. This makes it difficult to draw any meaningful conclusions.

What would be much more interesting is to compare RTL Utility's "Windows Audio" with "Windows Audio (Shared Low Latency Mode)". Presumably that's just switching between IAudioClient::Initialize() and IAudioClient3::Initialize() while keeping everything else the same, thus producing an apples-to-apples comparison. You didn't include "Windows Audio" in your results.

@dmitrykos
Collaborator

dmitrykos commented Jan 29, 2023

@danryu it is great to see test results.

Exclusive mode testing is not useful in relation to IAudioClient3::InitializeSharedAudioStream, as per the docs it applies to Shared mode only. Therefore, the difference you see is the difference between the implementations of two different apps.

On Windows 10 and later, PA uses IAudioClient3 for both Shared and Exclusive modes.

According to the docs IAudioClient3::InitializeSharedAudioStream is simply a wrapper for IAudioClient3::Initialize which calculates hnsBufferDuration for IAudioClient3::Initialize internally:
"Unlike IAudioClient3::Initialize, this method does not allow you to specify a buffer size. The buffer size is computed based on the periodicity requested with the PeriodInFrames parameter. It is the client app's responsibility to ensure that audio samples are transferred in and out of the buffer in a timely manner."

The difference in latency in Shared mode that you observed, 34 ms vs 17 ms, is due to the double buffering used by the PA WASAPI implementation. I will check if double buffering can be safely omitted and if yes, propose to add an additional PA WASAPI option to switch off double buffering, so that user would be able to achieve lowest possible latency in Shared mode at expense of some pops & clicks of course if CPU of the machine gets loaded with other tasks.
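
As a rough illustration of why double buffering doubles the buffering delay, the sketch below computes the time covered by a given number of host buffers. It is a simplified model, not PA WASAPI code: the function name buffered_latency_ms is hypothetical, and it deliberately ignores driver, hardware and engine delays, so it will not reproduce the measured 17 ms and 34 ms figures, only their 1:2 ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: milliseconds of audio held in `period_count` buffers
   of `frames_per_period` frames each. Simplified model that only accounts
   for buffering, not for driver or hardware delays. */
double buffered_latency_ms(uint32_t frames_per_period,
                           uint32_t period_count,
                           double sample_rate)
{
    assert(sample_rate > 0.0);
    return (double)frames_per_period * period_count * 1000.0 / sample_rate;
}
```

With 480-frame buffers at 48000 Hz, one buffer covers 10 ms and two buffers cover 20 ms, so dropping the second buffer halves the buffering contribution to latency.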

@danryu

danryu commented Jan 29, 2023

@dechamps thanks for weighing in. I fully admit that the test was not very rigorous, and involved apples and oranges - it was simply intended as a quick-and-dirty indicator of the different configurations' potential. I am primarily interested in getting sub-10ms RTL on generic Windows hardware - so any route that can get me there is interesting.
Hence why I just set to lowest practical buffers with whatever configuration I had available (128 in Windows Audio, 32 in FlexASIO).
I actually couldn't get reliable results from RTL Utility with plain "Windows Audio" at less than 256 buffer size - I'm not sure why. At 256 "Windows Audio" delivered RTL of ~51ms, and "Windows Audio Shared Low Latency" gave ~36ms.

@dmitrykos Thanks for all the notes. I appreciate now why InitializeSharedAudioStream would not serve a purpose here.
I'm glad the tests were in some way helpful.

"I will check if double buffering can be safely omitted and if yes, propose to add an additional PA WASAPI option to switch off double buffering, so that user would be able to achieve lowest possible latency in Shared mode at expense of some pops & clicks of course if CPU of the machine gets loaded with other tasks."

That would be very welcome - many thanks.

@danryu

danryu commented Feb 9, 2023

"I will check if double buffering can be safely omitted and if yes, propose to add an additional PA WASAPI option to switch off double buffering"

@dmitrykos Would it be useful if I opened a separate issue for this, for tracking purposes?

Also, I'm very happy to fork and do some quick hacks/tests. I was wondering if there was perhaps a simple hack to do in _RecalculateBuffersCount() which I could test out?

@dmitrykos
Collaborator

dmitrykos commented Feb 25, 2023

@danryu I had a chance to debug the PA WASAPI implementation. In my tests I am not able to get a Shared Mode latency lower than 22 ms for a 48000 Hz input stream.

IAudioClient::Initialize() was called with a period equal to 10000, that is 480 frames, the same value that IAudioClient3::GetSharedModeEnginePeriod() reported in pFundamentalPeriodInFrames, pMinPeriodInFrames, and pMaxPeriodInFrames.
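
For reference when reading these numbers: the duration parameters of IAudioClient::Initialize() are REFERENCE_TIME values, which count 100-nanosecond units, while the IAudioClient3 period values are in frames. The sketch below converts between the two. The helper names are hypothetical and int64_t stands in for REFERENCE_TIME so that it builds without the Windows SDK. By this arithmetic, 480 frames at 48000 Hz is 10 ms, or 100000 units of 100 ns; the figure 10000 quoted above would match the same 10 ms if read as microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* One second expressed in 100-nanosecond units (the REFERENCE_TIME unit).
   int64_t is used as a stand-in for REFERENCE_TIME in this sketch. */
#define HNS_PER_SECOND 10000000LL

/* Hypothetical helper: duration of `frames` at `sample_rate`, rounded to
   the nearest 100 ns unit. */
int64_t frames_to_hns(uint32_t frames, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return ((int64_t)frames * HNS_PER_SECOND + sample_rate / 2) / sample_rate;
}

/* Hypothetical helper: number of frames covered by `hns` units of 100 ns,
   rounded to the nearest frame. */
uint32_t hns_to_frames(int64_t hns, uint32_t sample_rate)
{
    assert(hns >= 0);
    return (uint32_t)((hns * sample_rate + HNS_PER_SECOND / 2) / HNS_PER_SECOND);
}
```

Here frames_to_hns(480, 48000) gives 100000, and hns_to_frames(10000, 48000) gives 48, i.e. 1 ms rather than 10 ms.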

As an experiment, I also replaced IAudioClient::Initialize() with IAudioClient3::InitializeSharedAudioStream(), with PeriodInFrames equal to 480.

I also tried both polling and event mode (AUDCLNT_STREAMFLAGS_EVENTCALLBACK).

In all cases the initialized audio client instance returns 1056 frames as the maximum endpoint buffer size via IAudioClient::GetBufferSize(). So on my PC I am not able to get below 22 ms latency for a Shared Mode stream, for input or output (I checked both). It seems that WASAPI internally uses a double-buffering approach.
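
The 22 ms figure follows directly from the buffer size, and expressing the buffer in engine periods shows why it looks like double buffering. The sketch below does that arithmetic; describe_buffer and BufferBreakdown are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this illustration. */
typedef struct {
    uint32_t whole_periods;   /* complete engine periods that fit */
    uint32_t leftover_frames; /* frames beyond the last whole period */
    uint32_t duration_us;     /* total buffer duration in microseconds */
} BufferBreakdown;

/* Hypothetical helper: breaks an endpoint buffer size (as returned by
   IAudioClient::GetBufferSize) down into engine periods and duration. */
BufferBreakdown describe_buffer(uint32_t buffer_frames,
                                uint32_t period_frames,
                                uint32_t sample_rate)
{
    BufferBreakdown b;

    assert(period_frames > 0 && sample_rate > 0);

    b.whole_periods = buffer_frames / period_frames;
    b.leftover_frames = buffer_frames % period_frames;
    b.duration_us =
        (uint32_t)(((uint64_t)buffer_frames * 1000000u) / sample_rate);
    return b;
}
```

For 1056 frames with a 480-frame period at 48000 Hz this gives two whole periods plus 96 leftover frames (2 ms), for a total of 22000 microseconds.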

If you are interested, could you modify the _GetFramesPerHostBuffer function as follows and check whether you are able to get lower latency, i.e. 10 ms, on your machine:

static PaUint32 _GetFramesPerHostBuffer(PaUint32 userFramesPerBuffer, PaTime suggestedLatency, double sampleRate, PaUint32 TimerJitterMs)
{
    PaUint32 frames = userFramesPerBuffer + max( 0, (PaUint32)(suggestedLatency * sampleRate) );
    frames += (PaUint32)((sampleRate * 0.001) * TimerJitterMs);
    return frames;
}
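
To see what host buffer sizes the proposed function would request, here is a standalone copy that can be compiled outside PortAudio. It is only a sketch under stated assumptions: the two typedefs are stand-ins for the real PortAudio types, the function is renamed frames_per_host_buffer, the sanity check is an addition, and max(0, ...) is dropped because it has no effect on an unsigned value.

```c
#include <assert.h>

/* Stand-ins for the PortAudio types, so the sketch builds on its own. */
typedef unsigned int PaUint32;
typedef double PaTime;

/* Standalone copy of the proposed calculation (renamed for this sketch):
   user buffer + suggested latency in frames + timer jitter in frames. */
PaUint32 frames_per_host_buffer(PaUint32 userFramesPerBuffer,
                                PaTime suggestedLatency,
                                double sampleRate,
                                PaUint32 timerJitterMs)
{
    PaUint32 frames;

    /* Sanity check added for the sketch; not in the proposed code. */
    assert(sampleRate > 0.0 && suggestedLatency >= 0.0);

    frames = userFramesPerBuffer + (PaUint32)(suggestedLatency * sampleRate);
    frames += (PaUint32)((sampleRate * 0.001) * timerJitterMs);
    return frames;
}
```

With a 480-frame user buffer, zero suggested latency and no jitter allowance at 48000 Hz, the result is 480 frames, which is the 10 ms target mentioned above. Adding a 3 ms jitter allowance raises it to 624 frames.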

@davidebeatrici

davidebeatrici commented Apr 14, 2023

// Use built-in PCM converter (channel count and sample rate) if requested
if ((GetWindowsVersion() >= WINDOWS_7_SERVER2008R2) &&
    (stream->in.shareMode == AUDCLNT_SHAREMODE_SHARED) &&
    ((inputStreamInfo != NULL) && (inputStreamInfo->flags & paWinWasapiAutoConvert)))
    stream->in.streamFlags |= (AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY);

Please note that AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM and AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY are not accepted: MicrosoftDocs/sdk-api#1498
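
One way to act on this, if PA WASAPI were to call IAudioClient3::InitializeSharedAudioStream, would be to mask the two flags out before the call. The sketch below assumes the restriction applies to that method, which is how the note above reads but is not spelled out in this thread. The helper names are hypothetical; the flag values are those defined in the Windows SDK header audiosessiontypes.h and are redefined only so the sketch builds without the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the Windows SDK (audiosessiontypes.h), redefined here only so
   that the sketch compiles without the SDK headers. */
#ifndef AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
#define AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM      0x80000000u
#endif
#ifndef AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY
#define AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY 0x08000000u
#endif
#ifndef AUDCLNT_STREAMFLAGS_EVENTCALLBACK
#define AUDCLNT_STREAMFLAGS_EVENTCALLBACK       0x00040000u
#endif

/* Flags assumed (per the report referenced above) to be rejected. */
#define REJECTED_STREAM_FLAGS \
    ((uint32_t)(AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | \
                AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY))

/* Hypothetical helper: returns 1 if none of the rejected flags are set. */
int flags_ok_for_shared_stream(uint32_t stream_flags)
{
    return (stream_flags & REJECTED_STREAM_FLAGS) == 0;
}

/* Hypothetical helper: clears the rejected flags, leaves the rest intact. */
uint32_t strip_rejected_flags(uint32_t stream_flags)
{
    uint32_t cleaned = stream_flags & ~REJECTED_STREAM_FLAGS;
    assert(flags_ok_for_shared_stream(cleaned));
    return cleaned;
}
```

A caller would then have to perform any channel-count or sample-rate conversion itself, since the built-in converter requested by paWinWasapiAutoConvert would no longer be engaged.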

I encountered the issue while experimenting with IAudioClient3 in libcrossaudio.

@mirh

mirh commented Aug 17, 2024

FWIW, people here and here reported a minimal latency of 2.67 ms in shared mode, although the testing conditions weren't exactly clear (conversely, some of the best scholars in the world had to give up testing precisely because of some kind of bug in this library).


8 participants