Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WDM-KS audio is choppy with some values of suggestedLatency #763

Closed
dechamps opened this issue Jan 21, 2023 · 9 comments · Fixed by #931
Closed

WDM-KS audio is choppy with some values of suggestedLatency #763

dechamps opened this issue Jan 21, 2023 · 9 comments · Fixed by #931
Labels
P3 Priority: Normal src-wdmks MS WDM/KS Host API /src/hostapi/wdmks

Comments

@dechamps
Copy link
Contributor

dechamps commented Jan 21, 2023

Describe the bug
When using the WDM-KS host API, audio is choppy with some values of suggestedLatency. The distribution of good and bad values seems random. This is not specific to small values.

To Reproduce

  1. Change examples/paex_sine.c:
    • Set device to the index of a WDM-KS device.
      • The hardware I'm testing with is an ASUS Xonar U7 using the Microsoft standard usbaudio2.sys driver (WaveRT).
    • Try various reasonable values for suggestedLatency.
  2. Run the example.

Expected behavior
Smooth audio. In my testing this happens for some values of suggestedLatency, for example 0.030 or 0.031, but not 0.029 nor 0.100.

This is an example debug output with 0.031 which works:

 0.067: OpenStream:sampleRate = 44100.000000
 0.067: OpenStream:framesPerBuffer = 64
 0.067: Pin create result = 0x00000491
 0.067: Pin create result = 0x00000491
 0.067: Pin create result = 0x00000491
 0.067: Pin create result = 0x00000491
 0.067: Pin create result = 0x00000491
 0.067: Pin create result = 0x00000491
 0.068: Pin create result = 0x00000491
 0.068: Pin create result = 0x00000491
 0.070: Pin create result = 0x00000000
 0.070: Render pin frames: 0
 0.070: Output frames chosen:1367
 0.070: PinGetBuffer: SubType_kNotification
 0.070: Output buffer start = 000001CF37D60000, size = 16404, membarrier = 0
 0.070: BytesPerInputFrame = 0
 0.071: BytesPerOutputFrame = 6
 0.071: Failed to register position register
 0.071: Failed to register rendering position register, using PinGetAudioPositionViaIOCTLRead
Latency: 0.030998s
 0.072: In  buffer len: 0.000 ms
 0.072: Out buffer len: 61.995 ms
 0.072: Timeout = 496 ms
 0.074: PreparePinsForStart = 0
 0.074: Processing thread started!
Play for 5 seconds.

Actual behavior
Choppy audio with many discontinuities per second. This only happens with some values of suggestedLatency but not others, with no apparent logic.

This is an example debug output with 0.029 which is choppy:

 0.072: OpenStream:sampleRate = 44100.000000
 0.073: OpenStream:framesPerBuffer = 64
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.073: Pin create result = 0x00000491
 0.075: Pin create result = 0x00000000
 0.076: Render pin frames: 0
 0.076: Output frames chosen:1278
 0.076: PinGetBuffer: SubType_kNotification
 0.076: Output buffer start = 0000020AD3CB0000, size = 15336, membarrier = 0
 0.076: BytesPerInputFrame = 0
 0.076: BytesPerOutputFrame = 6
 0.076: Failed to register position register
 0.076: Failed to register rendering position register, using PinGetAudioPositionViaIOCTLRead
Latency: 0.028980s
 0.077: In  buffer len: 0.000 ms
 0.077: Out buffer len: 57.959 ms
 0.077: Timeout = 464 ms
 0.079: PreparePinsForStart = 0
 0.079: Processing thread started!
Play for 5 seconds.

Desktop:

  • OS Version: Windows 11 21H2 22000.1455
  • PortAudio version: latest master (bbe2b5a)
  • Host API: WDM-KS

Additional context

I noticed #752 from @feoff3 which seemed promising, but I can still reproduce with that PR. This is unsurprising because that PR only touches the capture code path, but the example code does playback.

@feoff3
Copy link
Contributor

feoff3 commented Jan 21, 2023 via email

@dechamps
Copy link
Contributor Author

Hello. Well, my fix is for "event" path, not for "polling" path.

Yes, sorry, I misread your patch. The real reason why your PR doesn't change anything is because it's changing the capture path, not the render path. I edited my post.

How about 22ms?

0.022 is choppy.

@RossBencina
Copy link
Collaborator

Possibly related to #764

@dechamps
Copy link
Contributor Author

dechamps commented Jun 9, 2024

Just FYI, I am still able to reproduce this on latest PortAudio master and an up-to-date fresh system with the same audio driver and hardware.

I've also just tried this with Virtual Audio Cable (in WaveRT mode). To my surprise, with that driver, any reasonable value of suggestedLatency works fine, despite the WDM-KS Host API debug output looking similar to the glitchy case in my original report:

 0.051: OpenStream:sampleRate = 44100.000000
 0.051: OpenStream:framesPerBuffer = 64
 0.051: Pin create result = 0x00000000
 0.051: Render pin frames: 0
 0.051: Output frames chosen:1278
 0.051: PinGetBuffer: SubType_kNotification
 0.051: Output buffer start = 00000251CC190000, size = 20448, membarrier = 0
 0.051: BytesPerInputFrame = 0
 0.051: BytesPerOutputFrame = 8
 0.051: Failed to register position register
 0.051: Failed to register rendering position register, using PinGetAudioPositionViaIOCTLRead
 0.052: In  buffer len: 0.000 ms
 0.052: Out buffer len: 57.959 ms
 0.052: Timeout = 464 ms
 0.052: PreparePinsForStart = 0
 0.052: Processing thread started!
Play for 5 seconds.

The only difference I can see is BytesPerOutputFrame is reported as 6 with the original hardware I tested with (ASUS Xonar U7 using the Microsoft standard usbaudio2.sys driver), but with VAC it is 8. Not sure why. Maybe that's relevant, maybe not.

@dechamps
Copy link
Contributor Author

If I force the WDM-KS Host API code to open the KS pin with paInt16 instead of paInt24, I cannot reproduce.

When running against VAC the Host API code would normally select a "24-bit in 32-bit container" format, whereas with the offending hardware it selects a "24-bit in 24-bit container" format.

I wondered if the issue had something to do with that, so I forced the code to select 24-bit in 24-bit with VAC too. No dice: it doesn't reproduce in that setup.

I can't test 24-bit in 32-bit with the offending hardware because it rejects that format.

So basically, this is only reproducible on specific hardware (ASUS Xonar U7 using the Microsoft standard usbaudio2.sys driver), and only when feeding it 24-bit samples - not 16-bit samples.

@dechamps
Copy link
Contributor Author

dechamps commented Jun 10, 2024

Okay here's something a bit more interesting.

If I comment out this code:

if (pPin->pinKsSubType != SubType_kPolled)
{
/* In case of unknown (or notification), we try both modes */
result = PinGetBufferWithNotification(pPin, pBuffer, pRequestedBufSize, pbCallMemBarrier);
if (result == paNoError)
{
PA_DEBUG(("PinGetBuffer: SubType_kNotification\n"));
pPin->pinKsSubType = SubType_kNotification;
break;
}
}

Then the issue disappears(!) and the debug log reads:

 0.052: OpenStream:sampleRate = 44100.000000
 0.052: OpenStream:framesPerBuffer = 64
 0.052: Trying output channels 2, format 4 WAVEFORMATEXTENSIBLE
 0.064: Pin create result = 0x00000000
 0.064: Render pin frames: 0
 0.064: Output frames chosen:1278
 0.065: PinGetBuffer: SubType_kPolled
 0.065: Output buffer start = 00000230433C0000, size = 24576, membarrier = 0
 0.065: Buffer length changed by driver from 15336 to 24576 !
 0.065: BytesPerInputFrame = 0
 0.065: BytesPerOutputFrame = 6
 0.065: Failed to register position register
 0.065: Failed to register rendering position register, using PinGetAudioPositionViaIOCTLRead
 0.065: In  buffer len: 0.000 ms
 0.065: Out buffer len: 92.880 ms
 0.065: Timeout = 744 ms
 0.074: PreparePinsForStart = 0
 0.074: Timer event handles=0x0000,0x0204 period=9 ms 0.074: Waitable timer started, period = 9 ms
 0.075: Processing thread started!

Of particular note is the following:

Buffer length changed by driver from 15336 to 24576 !

If I put the original code back in, but I force it to use 24576 as the buffer size, then it works fine.

In other words: if I force the code to go through PinGetBufferWithoutNotification (i.e. KSPROPERTY_RTAUDIO_BUFFER) then the driver adjusts the buffer size and the resulting buffer size works. But if the code uses PinGetBufferWithNotification (i.e. KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION) then it looks like the driver wrongly accepts a buffer size that it does not actually support, resulting in choppy audio.

That behavior where the driver rejects some buffer sizes when going through PinGetBufferWithoutNotification only happens when configured with a 24-bit sample type. When using 16-bit, the driver seems to accept any buffer size and behaves properly with all the buffer sizes it accepts.

As far as I can tell it doesn't look like PortAudio is doing anything wrong here. This may just be a driver bug which we may want to deploy a workaround for.

One thing to note is I'm unable to trigger this issue with the normal Windows audio pipeline (i.e. Windows Audio Engine), even when using a WASAPI Exclusive event mode client (e.g. foobar2000). It may be worth looking into the KS calls that WASAPI is making in that case, as it looks like it's capable of avoiding that apparent driver bug somehow.

@dechamps
Copy link
Contributor Author

One thing to note is I'm unable to trigger this issue with the normal Windows audio pipeline (i.e. Windows Audio Engine), even when using a WASAPI Exclusive event mode client (e.g. foobar2000)

Turns out I was doing it wrong - foobar2000 has a separate option for the WASAPI Exclusive buffer size which is distinct from the main buffer size. If I use the correct option, then I can reproduce the issue there to some extent, but the symptoms don't seem to match exactly (e.g. foobar2000 can trigger it with 16-bit) so I can't confirm it's the exact same issue.

Next, I looked into which WaveRT calls the Windows Audio Engine is making to set up the stream. For the curious, this can be done by attaching WinDbg to audiodg.exe (which is the system process under audiosrv that hosts the shared Windows audio pipeline) and then running:

ld Windows_Media_Devices
dx @$IOCTL_KS_PROPERTY = 0x002f0003
dx @$KSPROPSETID_RTAudio_Data1 = 0xa855a48c
dx @$KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION = 5
bp /w "@rdx == @$IOCTL_KS_PROPERTY && ((KSPROPERTY*)@r8)->Set.Data1 == @$KSPROPSETID_RTAudio_Data1 && ((KSPROPERTY*)@r8)->Id == @$KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION" KERNELBASE!DeviceIoControl "dx ((KSRTAUDIO_BUFFER_PROPERTY_WITH_NOTIFICATION*)@r8); bp /1 $ra \"dx *(KSRTAUDIO_BUFFER**)(@rsp+4*sizeof(void*)); g\"; g"

The output, when the device is configured as 24-bit 2-channel 44.1 kHz in the Windows audio settings, is:

((KSRTAUDIO_BUFFER_PROPERTY_WITH_NOTIFICATION*)@r8)                 : 0xc9ce9fda10 [Type: KSRTAUDIO_BUFFER_PROPERTY_WITH_NOTIFICATION *]
    [+0x000] Property         [Type: KSIDENTIFIER]
    [+0x018] BaseAddress      : 0x0 [Type: void *]
    [+0x020] RequestedBufferSize : 0x14ac [Type: unsigned long]
    [+0x024] NotificationCount : 0x2 [Type: unsigned long]
*(KSRTAUDIO_BUFFER**)(@rsp+4*sizeof(void*))                 : 0xc9ce9fda00 [Type: KSRTAUDIO_BUFFER *]
    [+0x000] BufferAddress    : 0x1e832460000 [Type: void *]
    [+0x008] ActualBufferSize : 0x14ac [Type: unsigned long]
    [+0x00c] CallMemoryBarrier : 0 [Type: int]

The buffer size is exactly 20 ms long, which is unsurprising as the periodicity of the Windows Audio Engine is 10 ms and this is a double buffer.

Now, since the Windows Audio Engine is able to deliver non-choppy audio with a 20 ms 24-bit 2-channel 44.1 kHz buffer, I would have expected paex_sine to be able to do the same, but that is not the case. Indeed, if I set suggestedLatency to 0.010, resulting in the exact same WaveRT buffer size as above, the audio is choppy.

This would seem to suggest that this is not just a driver limitation/bug - the Windows Audio Engine is able to play audio just fine on that device using the exact same WaveRT parameters as PortAudio, but PortAudio can't. We now need to figure out what is the difference between the two that triggers this.

@dechamps
Copy link
Contributor Author

The above makes me suspect that this doesn't really have anything to do with the buffer size itself, and more to do with how the PortAudio WDM-KS code is handling the buffers during playback.

Specifically, I started to suspect that the code may be getting confused about which WaveRT buffer half it's supposed to write to. If the code writes to the wrong buffer half, this would of course explain the symptoms. What made me even more suspicious is that the problem disappears if I manipulate the sine wave to make both halves the same (e.g. a 10 ms sine wave with a 10 ms half buffer size), which would indeed hide that kind of problem.

I looked at the relevant code and was shocked to discover that if I comment out the following line, I can't reproduce the issue anymore!

/* And align it, not sure its really needed though */
pos &= ~(pRender->bytesPerFrame - 1);

It looks like this "alignment" is not just unnecessary, it's downright harmful.

Taking a closer look, in the offending case bytesPerFrame is 6 (2-channel 24-bit), meaning pos is masked with 0x5, i.e. 0b0101, which is nonsensical.

For example, in the case where the sample rate is 44100 Hz and suggestedLatency is 0.010, the total buffer size is 5292, and the driver-reported position will flip back and forth between 0 and 2646 (half the buffer size). But the reported position is then wrongly "adjusted" to 2642, which means the code will always write into the first half and never write into the second half!

This explains all the symptoms, including why the issue seems to relate randomly to suggestedLatency values (different buffer sizes will result in different "bit flips" from the wrong mask), and also why it is not reproducible on 16-bit samples (because in that case the mask is 0x1 which is actually correct).

dechamps added a commit to dechamps/portaudio that referenced this issue Jun 15, 2024
The previous code is only correct if `bytesPerFrame` is a power of two.
If it's not (e.g. 2-channel 24-bit = 6 bytes per frame), then the
computed buffer position is corrupted. This can lead the code to use
the wrong buffer half, resulting in glitchy audio.

Fixes PortAudio#763
@RossBencina
Copy link
Collaborator

Nice investigation there Etienne

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 Priority: Normal src-wdmks MS WDM/KS Host API /src/hostapi/wdmks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants