fix(audio): add preferredFormat() fallback for WASAPI Float32 output devices#3231
Conversation
|
@AetherClaude does this touch the main audio sink or just the pudu recorder? |
|
@jensenpat Just the pudu recorder playback path — the change is scoped to Concretely, the current code at 73, Jeremy KK7GWY & Claude (AI dev partner) 🤖 aethersdr-agent · cost: $1.5956 · model: claude-opus-4-7 |
|
@AetherClaude should we move audio sink negotiation to it's own factory, so that we are not replicating sample rate issues in 3 places? what is the current number of sample rate issues and PRs? build a history table. |
|
@jensenpat Yes — a factory is the right call, and PR #3231 is the third instance that proves it. Grepping
Sample-rate / format-negotiation PR + issue history
That's 11 PRs/issues across 6 sinks, every one of them re-discovering some variant of "Qt's Proposed factory shapeSomething like // src/core/audio/AudioSinkNegotiator.h
namespace aether::audio {
struct NegotiatedSink {
QAudioFormat format; // actually accepted by the device
int actualRate = 0; // == format.sampleRate()
bool fallbackOccurred = false;
QStringList fallbackReasons; // human-readable, for AudioSummaryLogger
QStringList attemptedFormats; // ordered probe trail
};
enum class FormatPreference {
Int16First, // QSO recorder, Pudu monitor: source PCM is Int16
FloatFirst, // Quindar, AudioEngine RX: source is Float32 already
NativeFirst, // CW sidetone: prefer whatever the device wants
};
struct NegotiationRequest {
int desiredRateHz = 48000;
int channels = 2;
FormatPreference preference = FormatPreference::Int16First;
QList<int> rateCandidates; // empty => {desired, 48000, 44100, 24000}
QString pathLabel; // for the AudioSummaryLogger summary line
};
// Negotiates an output format, including preferredFormat() as the final
// candidate. Always logs the open-success/open-failure summary via
// AudioSummaryLogger so the path is inspectable from support bundles.
std::optional<NegotiatedSink>
negotiateOutput(const QAudioDevice& dev, const NegotiationRequest& req);
} // namespace aether::audioCallers shrink from ~25 lines of probe-ladder + summary-logger plumbing to: auto sink = aether::audio::negotiateOutput(dev, {
.desiredRateHz = kSampleRate,
.channels = kChannels,
.preference = aether::audio::FormatPreference::Int16First,
.pathLabel = QStringLiteral("Aetherial monitor playback"),
});
if (!sink) return; // factory already logged the open-failure summary…and the Int16→Float32 conversion-on-demand logic from #3231 belongs alongside it (a Recommended sequencing
The risk of doing it inside #3231 is that the refactor changes behaviour for CW sidetone and the main RX engine (different probe order, different log lines), and any regression there is a much louder failure than a silent Pudu monitor — keep the blast radius small. A maintainer can authorize the orchestrator to land #3231's fix by adding the 73, Jeremy KK7GWY & Claude (AI dev partner) 🤖 aethersdr-agent · cost: $4.1747 · model: claude-opus-4-7 |
There was a problem hiding this comment.
Thanks for the fix, @NF0T — the precedent referenced (QuindarLocalSink.cpp:61 and the preferredFormat() use in AudioEngine.cpp:306) is the right pattern to follow here, and the root-cause analysis (WASAPI shared mode running a Float32 mix on a hardware device that itself accepts Int16) lines up with what the diagnostics in DeviceDiagnostics.cpp already track.
A few small things worth a second look before merge:
1. Sample-format scope of the preferredFormat branch (src/core/ClientPuduMonitor.cpp:235-249 in the new code)
The conversion block only handles QAudioFormat::Float:
if (fmt.sampleFormat() == QAudioFormat::Float) { /* int16 → float */ }…but the acceptance check earlier accepts any format the device reports as supported:
if (preferred.isValid() && dev.isFormatSupported(preferred)) { fmt = preferred; ... }The PR comment explicitly notes Float is the only format observed in practice on WASAPI/CoreAudio. That's a reasonable assumption — but the code doesn't enforce it. If a device's preferredFormat() ever returns Int32, UInt8, etc., the Int16 payload will be sent to a sink expecting a different layout and you'll get garbage rather than a clean failure.
Suggestion: gate the preferred acceptance on sampleFormat() == Int16 || sampleFormat() == Float (Int16 needs no conversion, Float uses the existing branch), and otherwise fall through to the existing failure log. Single extra condition, removes a latent footgun.
2. Interaction with #3230 (the bail/mute-lift fix)
The new preferredFormat-rejection failure path (the new else branch that logs "output device supports neither Int16 stereo nor its own preferredFormat") is a 7th early-return path. Once #3230 lands with its bail() helper, this branch will also need to call it or it'll re-introduce the stranded-mute bug on this specific (rare) failure mode. Worth flagging in whichever PR merges second so it doesn't slip.
3. Minor: summary.fallbackReason already covers it
The post-success summary will include "48000Hz Int16 unsupported -> preferredFormat (…)" in the fallback reason and the negotiated sampleFormat field on the summary — so the path is already inspectable from a support bundle as the test plan expects. Nothing to change, just confirming it works.
Otherwise the conversion math (/ 32768.0f) matches preparePlaybackPcm()'s existing scaling at ClientPuduMonitor.cpp:150-151, the buffer-size arithmetic is correct (frames * kChannels * sizeof(float)), and the scope is appropriately tight — single file, single function, mirrors prior art.
🤖 aethersdr-agent · cost: $6.9063 · model: claude-opus-4-7
There was a problem hiding this comment.
Thanks @NF0T — the diagnosis (WASAPI shared mixer locking out Int16 despite hardware capability) is solid, and reusing the preferredFormat() pattern from QuindarLocalSink / AudioEngine is the right move. The Int16→Float32 conversion is straightforward and the m_playPcm = std::move(floatPcm) happens before m_playBuffer.setBuffer(&m_playPcm), so the QBuffer pointer is fine.
Two things worth tightening before merge:
1. Narrow what preferredFormat() is allowed to return
The post-resample conversion only handles QAudioFormat::Float:
if (fmt.sampleFormat() == QAudioFormat::Float) {
// Int16 → Float32 conversion
}…but the acceptance check above will accept anything isFormatSupported() confirms — Int32, UInt8, future formats, etc. If preferredFormat ever returns one of those (uncommon but not impossible on exotic CoreAudio aggregates or pro-audio drivers), the Int16 payload would be played through a sink expecting a different layout → garbage. The comment claims Float32 is the only realistic case, but the code doesn't enforce that claim. Suggest gating acceptance to Int16 or Float only:
const auto sf = preferred.sampleFormat();
const bool acceptable = (sf == QAudioFormat::Int16 || sf == QAudioFormat::Float);
if (preferred.isValid() && acceptable && dev.isFormatSupported(preferred)) {
...
}That keeps the diagnostic path (logOpenFailure) honest for the rare unsupported case and prevents silent corruption.
2. Mute lifting on the new bail path
The new else branch logs the failure and returns without emitting muteRxRequested(false) — same class of bug as #3230. If #3230 lands first you'll need to wire the new path through its bail() lambda; if this lands first the same fix needs to be repeated here. Worth a coordinating note across the two PRs or rebasing to use the lambda once both are aligned.
Otherwise looks good — minor enough that I'd be happy to see it land once #2 in particular is resolved (either via #3230 or inline here).
🤖 aethersdr-agent · cost: $5.1402 · model: claude-opus-4-7
…devices (aethersdr#3231) - Principle VIII. ClientPuduMonitor::startPlayback() accepted only Int16 stereo at 24 kHz or 48 kHz. On Windows WASAPI shared mode, Float32 mix pipelines can reject both Int16 candidates even when the device works through AudioEngine. When both Int16 candidates fail, query preferredFormat(), force stereo, and accept it if supported. Convert the prepared Int16 payload to Float32 when the sink negotiated Float32 so playback can start on WASAPI Float32 devices. The fallback is logged through AudioSummaryLogger, and the failure path preserves an inspectable reason if preferredFormat() is rejected. Fixes aethersdr#3229 Refs aethersdr#3222 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: jensenpat <patjensen@gmail.com>
a25aae5 to
7de3d5e
Compare
…devices (aethersdr#3231) - Principle VIII. ClientPuduMonitor::startPlayback() accepted only Int16 stereo at 24 kHz or 48 kHz. On Windows WASAPI shared mode, Float32 mix pipelines can reject both Int16 candidates even when the device works through AudioEngine. When both Int16 candidates fail, query preferredFormat(), force stereo, and accept it if supported. Convert the prepared Int16 payload to Float32 when the sink negotiated Float32 so playback can start on WASAPI Float32 devices. This keeps the existing bail() paths from PR 3230 so RX mute is still lifted on every playback-start failure. The fallback is logged through AudioSummaryLogger, and the failure path preserves an inspectable reason if preferredFormat() is rejected. Fixes aethersdr#3229 Refs aethersdr#3222 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: jensenpat <patjensen@gmail.com>
7de3d5e to
aaf85cf
Compare
…3397) ## What First, **pure-addition** step of the consolidated audio sink factory (#3306): a single shared, OS-parameterized format/rate **negotiation policy** that every audio sink and source will migrate onto, replacing the ~9 divergent per-sink fallback ladders and per-OS `#ifdef` branches that are the root of a recurring class of platform audio bugs. Zero behaviour change — this is the maintainer-endorsed *"land the helper first, migrate sinks incrementally"* approach (#3306, and the #3194 thread's "sensitive audio-stack changes must be separate PRs with soak time"). ## Why this shape The historical bugs escaped CI because the per-OS rate ladders were compiled behind `Q_OS_*`, so a Linux runner could only ever exercise the Linux path. The new policy is a **pure function over an injected `DeviceCaps` snapshot with `TargetOs` as a parameter, not an `#ifdef`** — so one headless, hardware-free test binary exercises the Windows, macOS *and* Linux ladders on any runner. ## Contents - **`src/core/AudioFormatNegotiator.{h,cpp}`** — dependency-free policy (links only `Qt6::Core`). One ladder owns: - Windows force-48k + r8brain (#2120/#2123), macOS 48k/A2DP (#1705), Linux native-24k — divergences preserved as **data**, not `#ifdef`. - The **universal 44.1 kHz rung** that RX/Quindar/QSO are missing today (#3385). - `Int16`↔`Float32` + `preferredFormat()` catch-all (#2669 / #1090 / #3231). - macOS mic preferred-rate-first (#2930) and Bluetooth-HFP native rate (#2615); Windows probe-at-open (#2929). - The two stereo resampler strategies — **PreservePan** (RX/QSO) vs **MonoCollapse** (TCI-TX/RADE) — kept deliberately distinct (#2403/#2459). - **`tests/audio_format_negotiation_test.cpp`** — table-driven golden matrix, registered under CTest. **25/25 pass**, including the 44.1k-only-device regression guard. - **`docs/audio-sink-factory.md`** — the full three-layer design (pure policy → live Qt wrapper → device-ownership router), the device-following **"uncoupling"** fix (CW / RADE / Aetherial-Audio "Pudu" recorder following the selected output), the regression-guard invariants, and the one-sink-per-PR migration plan. ## Test ``` $ ./build/audio_format_negotiation_test ... 25/25 checks passed ``` ## Follow-ups (per the migration plan in the doc) Live Qt wrapper → migrate RX speaker → migrate PC mic → `AudioOutputRouter` (closes the uncoupling class) → TCI-TX client-rate → PipeWire DAX shared resampler. Each a separate, soakable PR. Refs #3306 💻 Generated with Claude Code (Opus 4.8) with architecture by @jensenpat Co-authored-by: Codex <noreply@openai.com>
Summary
ClientPuduMonitor::startPlayback()accepted only Int16 stereo at 24 kHz or 48 kHz. On Windows 11 with WASAPI shared mode, the audio engine mixer runs internally in Float32. When the shared pipeline is in Float32 mode,IAudioClient::IsFormatSupported()(surfaced viaQAudioDevice::isFormatSupported()) rejects Int16 regardless of hardware capability — causingstartPlayback()to bail at format negotiation on any Windows machine whose WASAPI mix format is not Int16.Root cause confirmed: the same C-Media USB device opens successfully in
AudioEngineas Float32 stereo 48 kHz — the hardware is not the constraint, the WASAPI shared pipeline is. This is the same class of format-negotiation problem already solved inQuindarLocalSink.cpp:61andAudioEngine.cpp:306viadev.preferredFormat().Fix
When both Int16 @ 24 kHz and Int16 @ 48 kHz are rejected, query
dev.preferredFormat(), enforce stereo channel count, and accept it ifisFormatSupported()confirms it. AfterpreparePlaybackPcm()produces the resampled Int16 buffer, convert Int16 → Float32 in-place when the negotiated sink format is Float32 (trivial per-sample multiply; the r8brain resampler already handles any rate conversion).The accepted format and fallback reason are recorded in the
AudioSummaryLoggeropen-success summary, making the path inspectable from future support bundles.Files changed
src/core/ClientPuduMonitor.cpppreferredFormat()as third format candidate; add Int16→Float32 conversion post-resample when sink format is Float32Test plan
Audio Auxiliary sink summarylog line now appears for "Aetherial monitor playback" with fallback reason showingpreferredFormatpathConstitution
Principle VIII — Evidence Over Assertion: the WASAPI Float32 behavior is confirmed by the log in #3222 showing
format=FloatforAudioEngine's successful open of the same device thatClientPuduMonitorfails on. The fix followsQuindarLocalSink's establishedpreferredFormat()pattern rather than asserting a novel approach.Fixes #3229
Refs #3222
🤖 Generated with Claude Code