auto-discovery for input device fails on some linux systems #2292

Closed
drmaniac opened this issue Mar 8, 2024 · 15 comments
Labels: accepted (Issue moved to product team backlog. Will be closed when addressed.), enhancement (New feature or request), to be released (The fix is merged, to be released.), update needed (For items that are in progress but have not been updated)

Comments

@drmaniac

drmaniac commented Mar 8, 2024

Mic not found
Exception with an error code: 0xe (SPXERR_MIC_NOT_AVAILABLE)

The Speech SDK generally works on Arch Linux with PipeWire. However, the auto-discovery feature did not detect my microphone (SPXERR_MIC_NOT_AVAILABLE).
On the same computer, it works on Ubuntu 23.10, which also uses PipeWire.

Example: https://gist.github.com/drmaniac/23a4faf462caabd57f0175f281739ada lines 25-26

This is related to microsoft/vscode#205758

Expected behavior
I would expect auto-discovery on Linux to be more robust.

Version of the Cognitive Services Speech SDK

SpeechSDK-Linux-1.36.0

Platform, Operating System, and Programming Language

  • OS: Linux (Arch, fully patched)
  • Hardware: x64
  • Programming language: C++

Additional context

terminate called after throwing an instance of 'std::runtime_error'
  what():  Exception with an error code: 0xe (SPXERR_MIC_NOT_AVAILABLE)
[CALL STACK BEGIN]

/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so(+0xe1e9) [0x7b100da0e1e9]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1e60ed) [0x7b100fde60ed]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1010f4) [0x7b100fd010f4]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1a8613) [0x7b100fda8613]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0xf2841) [0x7b100fcf2841]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1e60ed) [0x7b100fde60ed]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1010f4) [0x7b100fd010f4]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x195f5c) [0x7b100fd95f5c]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x19babf) [0x7b100fd9babf]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x133cef) [0x7b100fd33cef]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x133cef) [0x7b100fd33cef]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x187563) [0x7b100fd87563]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x140af9) [0x7b100fd40af9]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x1f9508) [0x7b100fdf9508]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x13e585) [0x7b100fd3e585]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(+0x216e34) [0x7b100fe16e34]
/home/christian/git/test/SpeechSDK-Linux-1.36.0/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so(recognizer_create_speech_recognizer_from_config+0x10b) [0x7b100fcbe8fd]
[CALL STACK END]
@aitor

aitor commented Mar 8, 2024

I just wanted to confirm that this issue is present in Pop!_OS 22.04 LTS, too.

@pankopon
Contributor

Hi, the SDK uses the ALSA API (snd_pcm_info, snd_pcm_open etc.) to access audio devices on Linux.
If an application does not specify the input device, a system default microphone is assumed.
Alternatively the application can use a device id to specify the audio device.
Based on microsoft/vscode#205758 (comment) it seems the microphone in the example case was not the default microphone for ALSA.
As noted, this can be configured as described in ALSA documentation and e.g. ArchWiki.

@pankopon self-assigned this on Mar 11, 2024
@pankopon added the in-review (In review) and pending close (Closed soon without new activity) labels on Mar 11, 2024
@aitor

aitor commented Mar 11, 2024

I can confirm that using the id obtained by arecord -l or cat /proc/asound/cards to set the default microphone in ~/.asoundrc as described by @drmaniac in microsoft/vscode#205758 (comment) works correctly in PopOS 🎉 . Thank you for the info @pankopon!

@drmaniac
Author

The .asoundrc workaround (microsoft/vscode/issues/205758) highlights a potential discrepancy in device selection between the Speech SDK and standard ALSA tools. Since arecord and others correctly use Pipewire/WirePlumber's default device, it suggests the SDK may have a unique selection logic.

Can the Speech SDK development team investigate and potentially adapt its device selection to align with Pipewire/WirePlumber conventions?

@pankopon
Contributor

@drmaniac Are you sure that arecord can record audio from a non-default microphone, possibly among multiple input devices, without explicitly specifying the device?

The SDK uses the standard ALSA API. Sound servers like PulseAudio, PipeWire etc. work on top of ALSA, not the other way round.
Unfortunately we cannot plan to support potentially dozens of different Linux environments.
Instead, there is a list of a select few reference distributions.
These are verified to work with the Speech SDK in their default configurations. Any other environments are expected to be configurable (by the application or the user) for the same effect.

@drmaniac
Author

drmaniac commented Mar 11, 2024

I played a little bit.

It looks like AudioConfig::FromDefaultMicrophoneInput() might not always correctly detect the default device. Using AudioConfig::FromMicrophoneInput("default") seems to work reliably. I confirmed this behavior by changing my default microphone settings in GNOME.

My test with snd_pcm_open(&capture_handle, "default", SND_PCM_STREAM_CAPTURE, 0) also shows the same working behavior.

So maybe it is only missing documentation that seems to be causing so much confusion.

> @drmaniac Are you sure that arecord can record audio from a non-default microphone, possibly among multiple input devices, without explicitly specifying the device?

As far as I can see, arecord does the same with the 'default' string: https://git.alsa-project.org/?p=alsa-utils.git;a=blob;f=aplay/aplay.c

@pankopon To answer your question more specifically, WirePlumber sets the default ALSA audio device dynamically, so there is a default device, but it is not defined in any configuration file.

Edit:
I installed an older Ubuntu 20.04 in VirtualBox and tested AudioConfig with the "default" string. It also works on systems without PipeWire.

@pankopon
Contributor

@drmaniac Thank you for the details.
Currently the SDK tries to find a capture device, if not explicitly specified, based on snd_device_name_hint enumeration.
But maybe this does not work in all environments today.
We could improve detection reliability by directly trying "default" before the existing fallback mechanism.
I believe this does require that the system indeed has a device named "default" as in e.g.

$ arecord -L
default
    Playback/recording through the PulseAudio sound server
null
    Discard all samples (playback) or generate zero samples (capture)
pulse
    PulseAudio Sound Server

Would you be willing to try out the change in environments where you previously were able to reproduce the issue, so that we can get more coverage?
There would be only an updated libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so library file to add on top of the latest Speech SDK release (1.36.0).
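
The selection order proposed above (try "default" directly, then fall back to the enumerated names) could be sketched roughly as follows. This is only an illustration of the idea, not the SDK's actual code: `pick_capture_device` and the `can_open` callback are hypothetical names, and the callback stands in for whatever real check is used (for example, an attempt to open the device for capture via ALSA).

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of the Speech SDK.
// Returns the first capture device name accepted by `can_open`, trying
// "default" before the names obtained from enumeration. Returns an empty
// string if no device can be opened.
std::string pick_capture_device(
    const std::vector<std::string>& enumerated,
    const std::function<bool(const std::string&)>& can_open) {
    // Step 1: try the "default" device directly.
    if (can_open("default")) {
        return "default";
    }
    // Step 2: fall back to the enumerated names, in their original order.
    for (const auto& name : enumerated) {
        if (name != "default" && can_open(name)) {
            return name;
        }
    }
    return std::string();  // nothing usable was found
}
```

In a real application `can_open` would wrap an actual open attempt; keeping it as a callback makes the ordering logic easy to test without audio hardware.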

@drmaniac
Author

> @drmaniac Thank you for the details. Currently the SDK tries to find a capture device, if not explicitly specified, based on snd_device_name_hint enumeration. But maybe this does not work in all environments today. We could improve detection reliability by directly trying "default" before the existing fallback mechanism. I believe this does require that the system indeed has a device named "default" as in e.g.
>
> $ arecord -L
> default
>     Playback/recording through the PulseAudio sound server
> null
>     Discard all samples (playback) or generate zero samples (capture)
> pulse
>     PulseAudio Sound Server

I agree that this must then be a minimum requirement. On the other hand, this would normally be managed by a sound server like PulseAudio or PipeWire.

For more data points, here are my lists of configured PCM devices.

Arch Linux (completely updated)

❯ arecord -L
null
    Discard all samples (playback) or generate zero samples (capture)
pipewire
    PipeWire Sound Server
pulse
    PulseAudio Sound Server
default
    Default ALSA Output (currently PipeWire Media Server)
...

Ubuntu 20.04 LTS (fresh installation, no specific sound configuration done)

$ arecord -L
default
    Playback/recording through the PulseAudio sound server
null
    Discard all samples (playback) or generate zero samples (capture)
pulse
    PulseAudio Sound Server

Ubuntu 23.10 (fresh installation, no specific sound configuration done)

$ LC_ALL=C arecord -L 
null
    Discard all samples (playback) or generate zero samples (capture)
pipewire
    PipeWire Sound Server
default
    Default ALSA Output (currently PipeWire Media Server)

> Would you be willing to try out the change in environments where you previously were able to reproduce the issue, so that we can get more coverage? There would be only an updated libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so library file to add on top of the latest Speech SDK release (1.36.0).

Sure, I can do it.
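
For anyone comparing their own system against the listings above, the check for a PCM named "default" can also be done programmatically on captured `arecord -L` output. The sketch below assumes the layout shown in this thread (device names start in the first column, description lines are indented); `parse_pcm_names` and `has_default_pcm` are hypothetical helper names, and the code only parses text, it does not call ALSA.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts PCM names from `arecord -L` style text.
// Assumes names start in column 0 and description lines are indented.
std::vector<std::string> parse_pcm_names(const std::string& listing) {
    std::vector<std::string> names;
    std::istringstream in(listing);
    std::string line;
    while (std::getline(in, line)) {
        // Skip empty lines and indented description lines.
        if (line.empty() ||
            std::isspace(static_cast<unsigned char>(line[0]))) {
            continue;
        }
        names.push_back(line);
    }
    return names;
}

// Returns true if the listing contains a PCM named exactly "default".
bool has_default_pcm(const std::string& listing) {
    for (const auto& name : parse_pcm_names(listing)) {
        if (name == "default") {
            return true;
        }
    }
    return false;
}
```

The text passed in should be the raw command output without the shell prompt line.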

@pankopon
Contributor

@drmaniac Please see attached a zip with libMicrosoft.CognitiveServices.Speech.extension.audio.sys.so for x64 and arm64: libaudiosys.zip

Replace the original library file from the Speech SDK 1.36.0 release with this updated one.

If this version does not work in some environment, please post the output of arecord -L and if possible, check whether AudioConfig::FromMicrophoneInput("default") would have worked either.

@pankopon removed the pending close (Closed soon without new activity) label on Mar 13, 2024
@drmaniac
Author

@pankopon I have tested [library name] on the following systems:

  • Ubuntu 20.04 LTS ✅
  • Ubuntu 22.04 LTS ✅
  • Ubuntu 23.10 ✅
  • Arch x64 (fully updated) ✅

Initial results indicate positive compatibility. I'll also be testing on a Raspberry Pi 4 with Arch. Will provide results soon.

@pankopon
Contributor

@drmaniac Many thanks for testing this. Based on your results so far it seems the change is good. So if there are no further updates, we'll include it in the next Speech SDK release (1.37.0).

@pankopon added the enhancement (New feature or request) and accepted (Issue moved to product team backlog. Will be closed when addressed.) labels and removed the in-review (In review) label on Mar 21, 2024
@pankopon
Contributor

Internal work item ref. 6862878.

@pankopon added the to be released (The fix is merged, to be released.) label on Mar 22, 2024

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

@github-actions bot added the update needed (For items that are in progress but have not been updated) label on Apr 11, 2024
@pankopon
Contributor

Changes have been released (Speech SDK 1.37.0).

@bpasero

bpasero commented Apr 15, 2024

Thanks a lot ❤️ !
