auto-discovery for input device fails on some linux systems #2292

drmaniac · 2024-03-08T07:47:21Z

Mic not found
Exception with an error code: 0xe (SPXERR_MIC_NOT_AVAILABLE)

The Speech SDK generally works with Arch Linux and PipeWire. However, I encountered an issue where the auto-discovery feature didn't detect my microphone (SPXERR_MIC_NOT_AVAILABLE).
On the same computer, it works with Ubuntu 23.10, which also uses PipeWire.

Example: https://gist.github.com/drmaniac/23a4faf462caabd57f0175f281739ada lines 25-26

This is related to microsoft/vscode#205758

Expected behavior
I would expect auto-discovery on Linux to be more robust.

Version of the Cognitive Services Speech SDK

SpeechSDK-Linux-1.36.0

Platform, Operating System, and Programming Language

OS: Linux (Arch, fully patched)
Hardware - x64
Programming language: C++

Additional context

terminate called after throwing an instance of 'std::runtime_error'
  what():  Exception with an error code: 0xe (SPXERR_MIC_NOT_AVAILABLE)
[CALL STACK BEGIN]

/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so(+0xe1e9) [0x7b100da0e1e9]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1e60ed) [0x7b100fde60ed]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1010f4) [0x7b100fd010f4]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1a8613) [0x7b100fda8613]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0xf2841) [0x7b100fcf2841]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1e60ed) [0x7b100fde60ed]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1010f4) [0x7b100fd010f4]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x195f5c) [0x7b100fd95f5c]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x19babf) [0x7b100fd9babf]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x133cef) [0x7b100fd33cef]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x133cef) [0x7b100fd33cef]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x187563) [0x7b100fd87563]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x140af9) [0x7b100fd40af9]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1f9508) [0x7b100fdf9508]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x13e585) [0x7b100fd3e585]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x216e34) [0x7b100fe16e34]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(recognizer_create_speech_recognizer_from_config+0x10b) [0x7b100fcbe8fd]
[CALL STACK END]


aitor · 2024-03-08T08:30:55Z

I just wanted to confirm that this issue is present in Pop!_OS 22.04 LTS, too.

pankopon · 2024-03-11T19:17:13Z

Hi, the SDK uses the ALSA API (snd_pcm_info, snd_pcm_open etc.) to access audio devices on Linux.
If an application does not specify the input device, a system default microphone is assumed.
Alternatively the application can use a device id to specify the audio device.
Based on microsoft/vscode#205758 (comment) it seems the microphone in the example case was not the default microphone for ALSA.
As noted, this can be configured as described in ALSA documentation and e.g. ArchWiki.

aitor · 2024-03-11T19:33:26Z

I can confirm that using the id obtained by arecord -l or cat /proc/asound/cards to set the default microphone in ~/.asoundrc as described by @drmaniac in microsoft/vscode#205758 (comment) works correctly in PopOS 🎉 . Thank you for the info @pankopon!

drmaniac · 2024-03-11T20:57:17Z

The .asoundrc workaround (microsoft/vscode/issues/205758) highlights a potential discrepancy in device selection between the Speech SDK and standard ALSA tools. Since arecord and others correctly use Pipewire/WirePlumber's default device, it suggests the SDK may have a unique selection logic.

Can the Speech SDK development team investigate and potentially adapt its device selection to align with Pipewire/WirePlumber conventions?

pankopon · 2024-03-11T21:49:34Z

@drmaniac Are you sure that arecord can record audio from a non-default microphone, possibly among multiple input devices, without explicitly specifying the device?

The SDK uses the standard ALSA API. Sound servers like PulseAudio, PipeWire etc. work on top of ALSA, not the other way round.
Unfortunately we cannot plan to support potentially dozens of different Linux environments.
Instead, there is a list of a select few reference distributions.
These are verified to work with the Speech SDK in their default configurations. Any other environments are expected to be configurable (by the application or the user) for the same effect.

drmaniac · 2024-03-11T22:36:24Z

I experimented a bit.

It looks like AudioConfig::FromDefaultMicrophoneInput() might not always correctly detect the default device. Using AudioConfig::FromMicrophoneInput("default") seems to work reliably. I confirmed this behavior by changing my default microphone settings in GNOME.

My test with snd_pcm_open(&capture_handle, "default", SND_PCM_STREAM_CAPTURE, 0) also shows the same working behavior.

So maybe it's only missing documentation that seems to cause so much confusion.

> @drmaniac Are you sure that arecord can record audio from a non-default microphone, possibly among multiple input devices, without explicitly specifying the device?

As far as I can see, arecord does the same with the 'default' string: https://git.alsa-project.org/?p=alsa-utils.git;a=blob;f=aplay/aplay.c

@pankopon To answer your question more specifically, WirePlumber sets the default ALSA audio device dynamically, so there is a default device, but it is not in any configuration file.

Edit:
I installed an older Ubuntu 20.04 in a VirtualBox VM and tested the AudioConfig with the "default" string. It also works on systems without PipeWire.

pankopon · 2024-03-13T00:37:49Z

@drmaniac Thank you for the details.
Currently the SDK tries to find a capture device, if not explicitly specified, based on snd_device_name_hint enumeration.
But maybe this does not work in all environments today.
We could improve detection reliability by directly trying "default" before the existing fallback mechanism.
I believe this does require that the system indeed has a device named "default" as in e.g.

$ arecord -L
default
    Playback/recording through the PulseAudio sound server
null
    Discard all samples (playback) or generate zero samples (capture)
pulse
    PulseAudio Sound Server
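
The lookup order proposed here (try "default" first, then fall back to the enumerated device names) could be sketched roughly as follows. This is only an illustration, not the SDK's actual code: pickCaptureDevice and the TryOpen callback are hypothetical names, and the callback stands in for whatever really opens a device, e.g. a wrapper around snd_pcm_open with SND_PCM_STREAM_CAPTURE.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical callback: returns true if the named capture device can be opened.
// In a real program this might wrap snd_pcm_open(..., SND_PCM_STREAM_CAPTURE, 0).
using TryOpen = std::function<bool(const std::string&)>;

// Try the PCM named "default" first, then each enumerated name in order.
// Returns the first name that opens, or std::nullopt if none does.
std::optional<std::string> pickCaptureDevice(const std::vector<std::string>& enumerated,
                                             const TryOpen& tryOpen) {
    if (tryOpen("default")) {
        return std::string("default");
    }
    for (const auto& name : enumerated) {
        if (name == "default") {
            continue;  // already tried above
        }
        if (tryOpen(name)) {
            return name;
        }
    }
    return std::nullopt;
}
```

With a callback that succeeds for both "default" and "pipewire", the function returns "default". If "default" cannot be opened, it returns the first enumerated name that can, and an empty optional if nothing opens.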

Would you be willing to try out the change in environments where you previously were able to reproduce the issue, so that we can get more coverage?
There would be only an updated libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so library file to add on top of the latest Speech SDK release (1.36.0).

drmaniac · 2024-03-13T07:50:20Z

> @drmaniac Thank you for the details. Currently the SDK tries to find a capture device, if not explicitly specified, based on snd_device_name_hint enumeration. But maybe this does not work in all environments today. We could improve detection reliability by directly trying "default" before the existing fallback mechanism. I believe this does require that the system indeed has a device named "default" as in e.g.
>
> $ arecord -L
> default
>     Playback/recording through the PulseAudio sound server
> null
>     Discard all samples (playback) or generate zero samples (capture)
> pulse
>     PulseAudio Sound Server

I agree that this must then be a minimum requirement. On the other hand, this would normally be managed by a sound server like PulseAudio or PipeWire.

For more data points, here are my lists of configured PCM devices.

Arch Linux (completely updated)

❯ arecord -L
null
    Discard all samples (playback) or generate zero samples (capture)
pipewire
    PipeWire Sound Server
pulse
    PulseAudio Sound Server
default
    Default ALSA Output (currently PipeWire Media Server)
...

Ubuntu 20.04 LTS (fresh installation, no specific sound configuration done)

$ arecord -L
default
    Playback/recording through the PulseAudio sound server
null
    Discard all samples (playback) or generate zero samples (capture)
pulse
    PulseAudio Sound Server

Ubuntu 23.10 (fresh installation, no specific sound configuration done)

$ LC_ALL=C arecord -L 
null
    Discard all samples (playback) or generate zero samples (capture)
pipewire
    PipeWire Sound Server
default
    Default ALSA Output (currently PipeWire Media Server)
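
To check listings like these for the requirement mentioned above (a PCM literally named "default"), one option is to parse the arecord -L text. The sketch below assumes the layout seen in the outputs above: each PCM name starts in the first column and its description is indented on the following line. pcmNames and hasPcm are hypothetical helper names, not part of ALSA or the Speech SDK.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect PCM names from `arecord -L` style text.
// Assumption: names start in column 0, description lines are indented.
std::vector<std::string> pcmNames(const std::string& listing) {
    std::vector<std::string> names;
    std::istringstream in(listing);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        if (line.empty() || line[0] == ' ' || line[0] == '\t') {
            continue;  // blank line or indented description
        }
        names.push_back(line);
    }
    return names;
}

// True if the listing contains a PCM with exactly the given name.
bool hasPcm(const std::string& listing, const std::string& wanted) {
    for (const auto& name : pcmNames(listing)) {
        if (name == wanted) {
            return true;
        }
    }
    return false;
}
```

Fed the Ubuntu 23.10 output above, hasPcm(listing, "default") would be true and hasPcm(listing, "pulse") false. Any unindented line counts as a name, so a truncation marker such as "..." would need to be filtered out separately.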

> Would you be willing to try out the change in environments where you previously were able to reproduce the issue, so that we can get more coverage? There would be only an updated libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so library file to add on top of the latest Speech SDK release (1.36.0).

Sure, I can do it.

pankopon · 2024-03-13T20:33:48Z

@drmaniac Please see attached a zip with libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so for x64 and arm64: libaudiosys.zip

Replace the original library file from the Speech SDK 1.36.0 release with this updated one:

If you download the .tar.gz from https://aka.ms/csspeech/linuxbinary, just overwrite the original extracted file in lib/x64 or lib/arm64.
If you use the .nupkg from https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/, you may need to replace the library file in the build output directory every time you compile the project.

If this version does not work in some environment, please post the output of arecord -L and, if possible, check whether AudioConfig::FromMicrophoneInput("default") works there.

drmaniac · 2024-03-14T20:56:12Z

@pankopon I have tested [library name] on the following systems:

Ubuntu 20.04 LTS ✅
Ubuntu 22.04 LTS ✅
Ubuntu 23.10 ✅
Arch x64 (fully updated) ✅

Initial results indicate positive compatibility. I'll also be testing on a Raspberry Pi 4 with Arch. Will provide results soon.

pankopon · 2024-03-21T22:05:48Z

@drmaniac Many thanks for testing this. Based on your results so far it seems the change is good. So if there are no further updates, we'll include it in the next Speech SDK release (1.37.0).

pankopon · 2024-03-22T19:20:24Z

Internal work item ref. 6862878.

github-actions · 2024-04-11T02:08:33Z

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

pankopon · 2024-04-11T23:35:03Z

Changes have been released (Speech SDK 1.37.0).

bpasero · 2024-04-15T11:37:13Z

Thanks a lot ❤️ !

pankopon self-assigned this Mar 11, 2024

pankopon added the "in-review" and "pending close" labels Mar 11, 2024

pankopon removed the "pending close" label Mar 13, 2024

pankopon added the "enhancement" and "accepted" labels and removed the "in-review" label Mar 21, 2024

pankopon added the "to be released" label Mar 22, 2024

github-actions bot added the "update needed" label Apr 11, 2024

pankopon closed this as completed Apr 11, 2024
