Summary
Since cpal v0.17.2, default_input_config() for "Communications-class" USB microphones on Windows 11 24H2 returns 16 kHz mono F32 (the system Communications mix format), and the resulting WASAPI capture stream delivers genuine zero/near-zero samples — i.e. silence at the noise floor — while the same physical microphone records normal speech levels via DirectShow on the same machine at the same moment.
The regression bisects to PR #1097 — "wasapi: Enable resampling and rate adjustment" (merged 2026-01-29, released in v0.17.2 on 2026-02-08). My downstream users started reporting "audio looks like it's capturing, but the files are basically silent with a bit of white noise" exactly when their auto-updater pulled them through the cpal-bump release.
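A precise measurement with the same mic, the same speaker, and in the same minute (the DirectShow side captured with ffmpeg -f dshow -i audio="<mic>") shows a delta of about 82 dB, i.e. roughly 12,600× attenuation in amplitude. It's not an offset; the samples are genuinely zero, not misinterpreted bytes from a format mismatch (see hex dump below).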
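For anyone checking the arithmetic, the 82 dB figure maps to an amplitude ratio through the usual 20·log10 relation. A minimal sketch follows; the function names are mine for illustration and are not part of cpal or any other library.

```rust
/// Convert a level difference in decibels to a linear amplitude ratio,
/// using ratio = 10^(dB / 20).
fn db_to_amplitude_ratio(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// Inverse of the above: a linear amplitude ratio expressed in decibels.
fn amplitude_ratio_to_db(ratio: f64) -> f64 {
    20.0 * ratio.log10()
}
```

Calling db_to_amplitude_ratio(82.0) gives a ratio a little under 12,600, which is where the "~12,600×" comes from.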
Environment
OS: Windows 11 Pro 24H2 (build 10.0.26200, Insider)
cpal: v0.18.0 (downstream fork pinned to a commit based on upstream main; behavior is the same as v0.17.2+)
Hardware (reproduced on both): USB headset (Jabra Evolve 75) and USB webcam (Logi C270 HD WebCam)
Working baseline: Same machine, same mics, ffmpeg via DirectShow → normal speech levels
Not affected on the same machine: Built-in Microphone Array (Intel Smart Sound Technology) — exposes 48 kHz stereo via WASAPI and records normally. Only the USB Communications-class endpoints are silent.
Reproduction
Use a USB headset or USB webcam mic that Windows registers as a Communications-class endpoint on Win11 24H2 (verifiable: mmsys.cpl → Recording → properties → the device is set as both Default Device AND Default Communications Device).
Enumerate via cpal:
    use cpal::traits::{DeviceTrait, HostTrait};

    fn main() {
        let host = cpal::default_host();
        // Walk every capture device the default host exposes.
        for d in host.input_devices().unwrap() {
            let name = d.name().unwrap_or("?".into());
            println!("=== {} ===", name);
            // The config that default_input_config() hands back.
            if let Ok(c) = d.default_input_config() {
                println!("  default: {:?} {} ch @ {} Hz",
                    c.sample_format(), c.channels(), c.sample_rate().0);
            }
            // Every config range the backend reports as supported.
            if let Ok(configs) = d.supported_input_configs() {
                for c in configs {
                    println!("  supported: {:?} {} ch @ {}-{} Hz",
                        c.sample_format(), c.channels(),
                        c.min_sample_rate().0, c.max_sample_rate().0);
                }
            }
        }
    }
Note: cpal exposes only 16 kHz for these devices — which is not a native hardware rate. ffmpeg -f dshow -list_options true for the same devices lists 8000 / 11025 / 22050 / 32000 / 44100 / 48000 / 96000 Hz × 1/2 ch × 8/16-bit. 16 kHz is the Windows Communications-class mix format, and AUTOCONVERTPCM is what makes WASAPI accept that rate via server-side resampling.
Build an input stream with default_input_config() and dump samples. Result: stream callbacks fire at the expected rate, but every sample value is 0, ±1, or extremely-near-zero noise. Decoded as s16le, the first 256 bytes of one capture look like:
For comparison, the same mic captured via DirectShow in the same second (decoded to s16le):
C2 FE 53 FE 87 FE A0 FE EE FE 4F FF 7E FF D9 FF 20 00 23 00 83 00 1A 01
4E 01 5F 01 8D 01 BF 01 F3 01 1F 02 3C 02 5D 02 8E 02 CF 02 1C 03 73 03
[…normal speech signal continues]
The cpal samples are not misinterpreted bytes from a format mismatch — they are genuinely zero. The format negotiation succeeds; the stream just doesn't carry any signal.
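To make the comparison repeatable, here is a rough sketch of how a dump like the ones above can be decoded and measured. It assumes raw little-endian signed 16-bit PCM with no header; decode_s16le and rms_dbfs are illustrative helper names, not cpal APIs.

```rust
/// Decode little-endian signed 16-bit PCM bytes into samples.
/// A trailing odd byte, if present, is ignored.
fn decode_s16le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// RMS level in dB relative to i16 full scale (32768).
/// Returns negative infinity for an empty or all-zero buffer.
fn rms_dbfs(samples: &[i16]) -> f64 {
    if samples.is_empty() {
        return f64::NEG_INFINITY;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    20.0 * (sum_sq / samples.len() as f64).sqrt().log10()
}
```

Fed the first DirectShow bytes from the dump (C2 FE 53 FE ...), decode_s16le yields -318 as the first sample, which is an ordinary speech-level value. A buffer of zeros and ±1 values lands tens of dB lower.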
Why I believe PR #1097 is the cause
PR #1097 sets AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM in WASAPI Initialize so non-native rates can be requested through the server-side resampler.
On Win11 24H2 specifically, the WASAPI audio engine appears to apply a privacy/Communications policy when a non-Communications consumer opens a Communications-class endpoint at the Communications mix format (16 kHz F32 mono): Initialize succeeds, the stream "plays," callbacks fire, but the samples delivered are zero.
Reverting to v0.15.3 (the last release before AUTOCONVERTPCM) restores normal capture on the exact same hardware. (We confirmed the timeline: our downstream stopped working when users were rolled past the cpal v0.17.2+ release; no other audio-code changes correlate.)
The Intel Smart Sound mic array on the same machine is NOT a Communications-class endpoint, exposes 48 kHz stereo via WASAPI (without AUTOCONVERTPCM in play), and records normally.
Suggested fix directions
The fix for "build_output_stream fails on Windows 10 if the specified sample rate does not match the output device's default sample rate" (#593) was solving a build-time failure when users request non-native rates; AUTOCONVERTPCM is one valid solution, but for callers who request the device's native rate (or who use default_input_config() expecting a usable stream), the flag introduces silent-failure risk on Win11 24H2.
Or: probe for silent streams during stream setup. A 100–500 ms post-Start check: if RMS is exactly zero over the first N buffers, retry with AUTOCONVERTPCM off and use the device's actual hardware mix format from GetMixFormat on the endpoint's eMultimedia role (instead of eCommunications).
Or: pick the endpoint role explicitly. IMMDeviceEnumerator::GetDefaultAudioEndpoint(eCapture, eMultimedia) returns a different audio session policy than eCommunications, even for the same physical device. cpal currently doesn't expose role selection; exposing it (or defaulting to eMultimedia for non-RT use cases) sidesteps the policy gate entirely.
Happy to test patches against the affected hardware here. cc @yeah-its-gloria @roderickvd.
Downstream context
We're screenpipe, a Rust + Tauri app that records audio + accessibility text continuously. We started seeing user reports immediately after our auto-updater rolled cpal v0.17.2+ to Windows users. Diagnosis credit to one of our users (William Lucas) who built the DirectShow baseline + WASAPI hex dump to isolate the regression to the cpal capture layer.
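As a footnote to the silent-stream probe idea under the fix directions, the decision step could look roughly like the sketch below. It is backend-agnostic and only inspects buffers that were already collected from the callback. ProbeResult, probe_silence, min_buffers and rms_floor are hypothetical names, and rms_floor is my addition so that ±1-LSB noise can also be treated as silence (pass 0.0 for the strict "exactly zero" rule).

```rust
/// Outcome of the hypothetical post-start probe.
#[derive(Debug, PartialEq)]
enum ProbeResult {
    /// At least one early buffer carried signal above the floor.
    Live,
    /// Every inspected buffer was at or below the floor: retry with another config.
    Silent,
    /// Not enough buffers collected yet to decide.
    Inconclusive,
}

/// Root-mean-square of one block of f32 samples (0.0 for an empty block).
fn rms(samples: &[f32]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / samples.len() as f64).sqrt()
}

/// Inspect the first `min_buffers` buffers delivered after the stream started.
/// `rms_floor` is an assumed tuning knob; 0.0 means "exactly zero".
fn probe_silence(buffers: &[Vec<f32>], min_buffers: usize, rms_floor: f64) -> ProbeResult {
    if min_buffers == 0 || buffers.len() < min_buffers {
        return ProbeResult::Inconclusive;
    }
    let all_silent = buffers[..min_buffers].iter().all(|b| rms(b) <= rms_floor);
    if all_silent {
        ProbeResult::Silent
    } else {
        ProbeResult::Live
    }
}
```

A caller would gather buffers for the 100–500 ms window, run probe_silence, and on Silent tear the stream down and rebuild it with the fallback configuration. That retry part depends on backend internals and is not shown here.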