feat(diarizer): add opt-in embedding skip strategy for offline pipeline by adamsro · Pull Request #480 · FluidInference/FluidAudio

adamsro · 2026-04-03T23:17:48Z

Why is this change needed?

This PR adds an opt-in EmbeddingSkipStrategy to the offline diarization pipeline. When consecutive segmentation windows produce highly similar speaker masks, the embedding model call is skipped and the previously computed embedding is reused.

At the current default config (stepRatio=0.20), this has minimal effect because windows don't overlap enough to produce significant redundancy. The feature becomes valuable at higher-overlap configurations (e.g., stepRatio=0.15), where it offsets much of the extra embedding cost (1.16x to 2.29x faster embedding on the benchmark files) with no measured quality loss.

What changed

- New `EmbeddingSkipStrategy` enum on `OfflineDiarizerConfig.Embedding` (`.none` default, `.maskSimilarity(threshold:)`)
- Convenience setter `embeddingSkipStrategy` on `OfflineDiarizerConfig`
- `skipStrategy` parameter added to the flat initializer with `.none` default (backward compatible)
- Skip logic in `OfflineEmbeddingExtractor` with cache clearing between FBANK batches
- `maskCosineSimilarity` helper using existing `VDSPOperations.dotProduct`
- Skip count in profiling log when active
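
As a rough illustration of the opt-in surface listed above, the sketch below uses simplified stand-in types rather than the real FluidAudio declarations. Only the names `EmbeddingSkipStrategy`, `.none`, `.maskSimilarity(threshold:)`, `skipStrategy`, and `embeddingSkipStrategy` come from this PR; the `SketchConfig` struct and its layout are hypothetical.

```swift
// Stand-in types for illustration only; the real declarations live in FluidAudio.
enum EmbeddingSkipStrategy: Sendable, Equatable {
    /// Extract every embedding (default).
    case none
    /// Reuse the cached embedding when mask cosine similarity >= threshold.
    case maskSimilarity(threshold: Float)
}

// Hypothetical, heavily reduced mirror of OfflineDiarizerConfig.
struct SketchConfig {
    struct Embedding {
        var skipStrategy: EmbeddingSkipStrategy = .none
    }
    var embedding = Embedding()

    /// Convenience accessor mirroring the one described in the PR.
    var embeddingSkipStrategy: EmbeddingSkipStrategy {
        get { embedding.skipStrategy }
        set { embedding.skipStrategy = newValue }
    }
}

// Default behaviour is unchanged: nothing is skipped unless the caller opts in.
var config = SketchConfig()
let defaultStrategy = config.embeddingSkipStrategy

// Opt in with the threshold recommended in this PR.
config.embeddingSkipStrategy = .maskSimilarity(threshold: 0.95)
```

Because the default is `.none`, existing callers that never touch the new field keep the previous behaviour.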

Design decisions

Cache-pinned comparison, not rolling: The similarity check compares against the mask that produced the cached embedding, not the most recent mask. This prevents drift accumulation: if each mask in M1→M2→M3→M4 differs from its predecessor by 5%, M4 could differ from M1 by about 15%, yet a rolling comparison would pass at every step.
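
The difference between the two comparison modes can be sketched with synthetic masks. This is a minimal illustration, not the code in `OfflineEmbeddingExtractor`: `cosineSimilarity`, `skippedWindows`, and `slidingMasks` are hypothetical helpers, and the similarity is computed in plain Swift rather than through `VDSPOperations.dotProduct`.

```swift
/// Cosine similarity between two equal-length masks; returns 0 for empty or all-zero input.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Counts how many windows would reuse a cached embedding.
/// pinned == true: compare against the mask that produced the cached embedding.
/// pinned == false: compare against the immediately preceding mask (rolling).
func skippedWindows(masks: [[Float]], threshold: Float, pinned: Bool) -> Int {
    var reference: [Float]? = nil
    var skipped = 0
    for mask in masks {
        if let ref = reference, cosineSimilarity(mask, ref) >= threshold {
            skipped += 1
            // A rolling comparison advances the reference even on a skip,
            // which lets small per-window changes accumulate unnoticed.
            if !pinned { reference = mask }
        } else {
            // Cache miss: the embedding is recomputed and this mask becomes the reference.
            reference = mask
        }
    }
    return skipped
}

/// Synthetic masks: an active region of `active` frames sliding one frame per window.
func slidingMasks(count: Int, frames: Int = 40, active: Int = 30) -> [[Float]] {
    (0..<count).map { shift in
        (0..<frames).map { frame in
            (frame >= shift && frame < shift + active) ? Float(1) : Float(0)
        }
    }
}

let masks = slidingMasks(count: 6)
let rollingSkips = skippedWindows(masks: masks, threshold: 0.95, pinned: false)  // 5
let pinnedSkips = skippedWindows(masks: masks, threshold: 0.95, pinned: true)    // 3
```

With these masks, neighbouring windows have similarity 29/30 (about 0.967), while windows two steps apart have 28/30 (about 0.933). The rolling variant therefore skips all five later windows even though the last mask has drifted well away from the first, whereas the pinned variant recomputes on every second window.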

Cache cleared between FBANK batches: Speaker indices are local to each powerset chunk (0, 1, 2), not global IDs. Within a batch, consecutive overlapping windows share audio so the ordering is stable. Across batch boundaries, speaker assignments may change.
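
A minimal sketch of the per-batch cache reset, assuming a cache keyed by local speaker index. The function shape, the `CachedEmbedding` type, and the injected `similarity` and `embed` closures are all hypothetical stand-ins; the real extractor works on FBANK features and a Core ML model.

```swift
/// Cached result for one local speaker slot: the mask that produced the embedding, and the embedding.
struct CachedEmbedding {
    let mask: [Float]
    let embedding: [Float]
}

/// Hypothetical extractor loop. Each batch is a list of windows, and each window
/// maps a local speaker index to that speaker's mask.
func extractEmbeddings(
    batches: [[[Int: [Float]]]],
    threshold: Float,
    similarity: ([Float], [Float]) -> Float,
    embed: ([Float]) -> [Float]
) -> (embeddings: [[Float]], skipped: Int) {
    var cache: [Int: CachedEmbedding] = [:]
    var embeddings: [[Float]] = []
    var skipped = 0

    for batch in batches {
        // Local speaker indices are not comparable across batches, so start clean.
        cache.removeAll(keepingCapacity: true)
        for window in batch {
            for (speaker, mask) in window.sorted(by: { $0.key < $1.key }) {
                if let cached = cache[speaker], similarity(mask, cached.mask) >= threshold {
                    // Cache hit: reuse the embedding computed for the pinned mask.
                    embeddings.append(cached.embedding)
                    skipped += 1
                } else {
                    // Cache miss: run the model and pin this mask as the new reference.
                    let embedding = embed(mask)
                    cache[speaker] = CachedEmbedding(mask: mask, embedding: embedding)
                    embeddings.append(embedding)
                }
            }
        }
    }
    return (embeddings, skipped)
}

// Toy run: two batches, each with two identical windows for local speaker 0.
var modelCalls = 0
let toyMask: [Float] = [1, 1, 0, 0]
let toyWindow: [Int: [Float]] = [0: toyMask]
let result = extractEmbeddings(
    batches: [[toyWindow, toyWindow], [toyWindow, toyWindow]],
    threshold: 0.95,
    similarity: { $0 == $1 ? 1 : 0 },  // stand-in for a cosine similarity
    embed: { input in
        modelCalls += 1                // stand-in for the embedding model call
        return input
    }
)
```

In this toy run the model is called once per batch (twice in total) and two windows are skipped, because the cache is emptied at the batch boundary even though the masks are identical.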

Recommended threshold: 0.95 based on cross-corpus benchmarking (VoxConverse, SCOTUS oral arguments, Earnings-21 calls).

Benchmarks

All benchmarks on Apple M1 Max, macOS 26.5, 4 files across 3 corpora.

At default config (`stepRatio=0.20`, `excludeOverlap=true`)

| File | Duration | Speakers | Baseline | Skip-95 | Speedup |
|---|---|---|---|---|---|
| sbrmv (VoxConverse) | 3 min | 3 | 2.6s | 2.6s | 1.0x |
| duvox (VoxConverse) | 16 min | 6 | 13.8s | 13.7s | 1.0x |
| 22-842 (SCOTUS) | 74 min | 12 | 92.6s | 92.7s | 1.0x |
| 4320211 (Earnings-21) | 55 min | 10 | 59.6s | 58.4s | 1.0x |

Quality: identical SAA/DER on all files. No effect at default overlap.

At higher-overlap config (`stepRatio=0.15`, `excludeOverlap=false`)

Embedding model time only:

| File | Duration | No skip | Skip-95 | Skipped | Speedup |
|---|---|---|---|---|---|
| sbrmv | 3 min | 2,527ms | 1,756ms | 116/378 (31%) | 1.44x |
| duvox | 16 min | 13,691ms | 7,662ms | 816/1983 (41%) | 1.79x |
| 22-842 | 74 min | 58,057ms | 25,355ms | 5102/8934 (57%) | 2.29x |
| 4320211 | 55 min | 43,120ms | 37,131ms | 793/6573 (12%) | 1.16x |

Quality (DER scored with pyannote.metrics, collar=0.25s):

| File | No skip SAA | Skip-95 SAA | Delta |
|---|---|---|---|
| sbrmv | 87.4% | 87.4% | 0pp |
| duvox | 96.9% | 96.9% | 0pp |
| 22-842 | 96.1% | 96.1% | 0pp |
| 4320211 | 94.0% | 94.0% | 0pp |

Zero quality loss across all files. Skip rate scales with audio stability — long monologues (SCOTUS) skip 57%, frequent speaker changes (Earnings) skip 12%.

Add EmbeddingSkipStrategy to OfflineDiarizerConfig that skips redundant speaker embedding model calls when consecutive segmentation windows have highly similar speaker masks. At the default config (stepRatio=0.20) this has minimal effect. At higher-overlap configs (e.g., stepRatio=0.15) it provides a roughly 1.2-2.3x embedding speedup with no measured quality loss on the benchmarked files.

devin-ai-integration

Devin Review found 2 potential issues.


devin-ai-integration · 2026-04-03T23:23:40Z

🟡 Missing validation for EmbeddingSkipStrategy threshold in validate()

The validate() method at Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerTypes.swift:299 validates every other config parameter (clustering threshold, step ratio, batch size, onset/offset thresholds, etc.) but does not validate the new embedding.skipStrategy. A .maskSimilarity(threshold:) with a NaN, negative, or > 1.0 value will pass validation uncaught. This is called by the pipeline at Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerManager.swift:120 before processing begins. A NaN threshold would cause maskCosineSimilarity(...) >= threshold to always evaluate false (harmless but surprising), while a negative threshold would cause every mask comparison to hit the cache (skipping nearly all embeddings, severely degrading diarization quality).

(Refers to lines 388-389)
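
One way the missing check could look, as a sketch only: the enum below is a stand-in for the one added in this PR, and `SkipStrategyValidationError`, `validateSkipStrategy`, and `isValid` are hypothetical names. How the real `validate()` reports failures is not shown in this thread, so the error type here is an assumption.

```swift
// Stand-in for the enum added in this PR.
enum EmbeddingSkipStrategy {
    case none
    case maskSimilarity(threshold: Float)
}

/// Hypothetical error type; the real validate() may report failures differently.
struct SkipStrategyValidationError: Error, Equatable {
    let message: String
}

/// Rejects thresholds that are NaN, negative, or greater than 1.0.
func validateSkipStrategy(_ strategy: EmbeddingSkipStrategy) throws {
    // `.none` carries no threshold, so there is nothing to check.
    guard case .maskSimilarity(let threshold) = strategy else { return }
    guard !threshold.isNaN, threshold >= 0, threshold <= 1 else {
        throw SkipStrategyValidationError(
            message: "maskSimilarity threshold must be in [0, 1], got \(threshold)"
        )
    }
}

/// Small helper that turns the throwing check into a Bool for quick experiments.
func isValid(_ strategy: EmbeddingSkipStrategy) -> Bool {
    (try? validateSkipStrategy(strategy)) != nil
}
```

Under these assumptions, `isValid(.maskSimilarity(threshold: 0.95))` is true, while NaN, `-0.1`, and `1.5` are all rejected before the pipeline starts.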


devin-ai-integration · 2026-04-03T23:23:41Z

+    public enum EmbeddingSkipStrategy: Sendable {
+        /// No skipping — extract every embedding (default).
+        case none
+        /// Skip if the speaker mask has cosine similarity ≥ threshold compared to the mask
+        /// that produced the currently cached embedding for this speaker. Prevents drift by
+        /// always comparing against the mask that generated the cached embedding, not a
+        /// rolling previous mask.
+        ///
+        /// Recommended threshold: 0.95 (≤1pp DER cost across VoxConverse/SCOTUS/Earnings-21).
+        case maskSimilarity(threshold: Float)
+    }


🔴 No unit tests added for new EmbeddingSkipStrategy feature (AGENTS.md rule violation)

AGENTS.md mandates "Add unit tests when writing new code." This PR introduces a new public enum EmbeddingSkipStrategy, a new config field Embedding.skipStrategy, a new convenience accessor embeddingSkipStrategy, and non-trivial caching logic with maskCosineSimilarity — but no test files are included in the change. There is an existing Tests/FluidAudioTests/Diarizer/Offline/OfflineConfigTests.swift that tests other config parameters, making this a natural place to add coverage for the new strategy (e.g., config round-trip, validation of threshold bounds, default value).


devin-ai-integration bot reviewed Apr 3, 2026


Alex-Wengg merged commit fe4b4df into FluidInference:main Apr 4, 2026
12 checks passed

Alex-Wengg merged 1 commit into FluidInference:main from MimicScribe:feat/embedding-skip-strategy
