
feat(diarizer): add opt-in embedding skip strategy for offline pipeline #480

Merged
Alex-Wengg merged 1 commit into FluidInference:main from MimicScribe:feat/embedding-skip-strategy
Apr 4, 2026

@adamsro adamsro commented Apr 3, 2026

Why is this change needed?

This PR adds an opt-in EmbeddingSkipStrategy to the offline diarization pipeline. When consecutive segmentation windows produce highly similar speaker masks, the embedding model call is skipped and the previously computed embedding is reused.

At the current default config (stepRatio=0.20), this has minimal effect — windows don't overlap enough to produce significant redundancy. The feature becomes valuable at higher-overlap configurations (e.g., stepRatio=0.15) where it recovers the extra embedding cost with zero quality loss.

What changed

  • New EmbeddingSkipStrategy enum on OfflineDiarizerConfig.Embedding (.none default, .maskSimilarity(threshold:))
  • Convenience setter embeddingSkipStrategy on OfflineDiarizerConfig
  • skipStrategy parameter added to the flat initializer with .none default (backward compatible)
  • Skip logic in OfflineEmbeddingExtractor with cache clearing between FBANK batches
  • maskCosineSimilarity helper using existing VDSPOperations.dotProduct
  • Skip count in profiling log when active

Design decisions

Cache-pinned comparison, not rolling: The similarity check compares each new mask against the mask that produced the cached embedding, not against the most recent mask. This prevents drift from accumulating: if masks M1→M2→M3 each differ from their predecessor by 5%, M3 can differ from M1 by roughly 10%, yet a rolling comparison would pass at every step.
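The difference between the two comparison modes can be illustrated with a small standalone sketch. It is not the code from this PR: `cosineSimilarity`, `embeddingCalls`, the synthetic sliding masks, and the 0.85 threshold are all hypothetical and chosen only to make the drift visible.

```swift
/// Cosine similarity between two equal-length masks; returns 0 if either is all zeros.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "masks must have equal length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = normA.squareRoot() * normB.squareRoot()
    return denom > 0 ? dot / denom : 0
}

/// Counts how many embedding computations a mask sequence triggers.
/// pinned == true:  compare against the mask that produced the cached embedding.
/// pinned == false: compare against the most recent mask (rolling).
func embeddingCalls(masks: [[Float]], threshold: Float, pinned: Bool) -> Int {
    var reference: [Float]? = nil
    var calls = 0
    for mask in masks {
        if let ref = reference, cosineSimilarity(ref, mask) >= threshold {
            // Skip: the cached embedding would be reused here.
            if !pinned { reference = mask }  // rolling reference follows the latest mask
            continue
        }
        calls += 1          // stand-in for an embedding model call
        reference = mask    // this mask now backs the cached embedding
    }
    return calls
}

// Synthetic drift: a 10-frame active region sliding one frame per window.
// Neighbouring masks have similarity 0.9; the first and last do not overlap at all.
let masks: [[Float]] = (0...10).map { k in
    (0..<20).map { i -> Float in (i >= k && i < k + 10) ? 1 : 0 }
}

let rollingCalls = embeddingCalls(masks: masks, threshold: 0.85, pinned: false)
let pinnedCalls = embeddingCalls(masks: masks, threshold: 0.85, pinned: true)
print("rolling:", rollingCalls, "pinned:", pinnedCalls)  // rolling: 1 pinned: 6
```

In this toy sequence the rolling variant never recomputes, so the last window reuses an embedding built from a mask it shares no frames with, while the pinned variant recomputes as soon as the mask has moved far enough from the one behind the cache.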

Cache cleared between FBANK batches: Speaker indices are local to each powerset chunk (0, 1, 2), not global IDs. Within a batch, consecutive overlapping windows share audio, so the index ordering is stable. Across batch boundaries, speaker assignments may change, so a cached embedding could otherwise be reused for a different speaker.
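A minimal sketch of a cache keyed by chunk-local speaker index and cleared at batch boundaries might look like the following. `SkipCache`, `fakeModel`, and the sample mask are hypothetical stand-ins; the actual logic in `OfflineEmbeddingExtractor` is not shown in this conversation and may be structured differently.

```swift
/// Cosine similarity between two equal-length masks; returns 0 if either is all zeros.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "masks must have equal length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = normA.squareRoot() * normB.squareRoot()
    return denom > 0 ? dot / denom : 0
}

/// Hypothetical cache: one entry per chunk-local speaker index (0, 1, 2, ...).
struct SkipCache {
    private var entries: [Int: (mask: [Float], embedding: [Float])] = [:]
    private(set) var skipCount = 0

    init() {}

    mutating func embedding(
        forSpeaker speaker: Int,
        mask: [Float],
        threshold: Float,
        compute: ([Float]) -> [Float]
    ) -> [Float] {
        if let cached = entries[speaker],
           cosineSimilarity(cached.mask, mask) >= threshold {
            skipCount += 1
            return cached.embedding          // reuse; the pinned mask stays unchanged
        }
        let fresh = compute(mask)
        entries[speaker] = (mask: mask, embedding: fresh)
        return fresh
    }

    /// Called at a batch boundary, where local speaker indices may be reassigned.
    mutating func clear() { entries.removeAll() }
}

var modelCalls = 0
let fakeModel: ([Float]) -> [Float] = { mask in
    modelCalls += 1
    return mask  // stand-in for a real embedding
}

var cache = SkipCache()
let mask: [Float] = [1, 1, 0, 0]

// Batch 1: the second identical mask for speaker 0 hits the cache.
_ = cache.embedding(forSpeaker: 0, mask: mask, threshold: 0.95, compute: fakeModel)
_ = cache.embedding(forSpeaker: 0, mask: mask, threshold: 0.95, compute: fakeModel)

// Batch boundary: index 0 may now refer to someone else, so start fresh.
cache.clear()
_ = cache.embedding(forSpeaker: 0, mask: mask, threshold: 0.95, compute: fakeModel)

print("model calls:", modelCalls, "skips:", cache.skipCount)  // model calls: 2 skips: 1
```

Even though the mask after the boundary is identical, the sketch recomputes, because similarity of masks says nothing about whether index 0 still denotes the same speaker.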

Recommended threshold: 0.95 based on cross-corpus benchmarking (VoxConverse, SCOTUS oral arguments, Earnings-21 calls).

Benchmarks

All benchmarks on Apple M1 Max, macOS 26.5, 4 files across 3 corpora.

At default config (stepRatio=0.20, excludeOverlap=true)

| File | Duration | Speakers | Baseline | Skip-95 | Speedup |
|---|---|---|---|---|---|
| sbrmv (VoxConverse) | 3 min | 3 | 2.6s | 2.6s | 1.0x |
| duvox (VoxConverse) | 16 min | 6 | 13.8s | 13.7s | 1.0x |
| 22-842 (SCOTUS) | 74 min | 12 | 92.6s | 92.7s | 1.0x |
| 4320211 (Earnings-21) | 55 min | 10 | 59.6s | 58.4s | 1.0x |

Quality: identical SAA/DER on all files. No effect at default overlap.

At higher-overlap config (stepRatio=0.15, excludeOverlap=false)

Embedding model time only:

| File | Duration | No skip | Skip-95 | Skipped | Speedup |
|---|---|---|---|---|---|
| sbrmv | 3 min | 2,527ms | 1,756ms | 116/378 (31%) | 1.44x |
| duvox | 16 min | 13,691ms | 7,662ms | 816/1983 (41%) | 1.79x |
| 22-842 | 74 min | 58,057ms | 25,355ms | 5102/8934 (57%) | 2.29x |
| 4320211 | 55 min | 43,120ms | 37,131ms | 793/6573 (12%) | 1.16x |
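The speedup and skip-rate columns follow directly from the raw timings and counts, which the short sketch below recomputes. The `BenchmarkRow` type and the `idealSpeedup` estimate are illustrative additions: the estimate assumes every embedding call costs the same, which the PR does not claim.

```swift
struct BenchmarkRow {
    let file: String
    let noSkipMs: Double
    let skipMs: Double
    let skipped: Int
    let total: Int

    /// Measured speedup: embedding time without skipping over time with skipping.
    var speedup: Double { noSkipMs / skipMs }
    /// Fraction of embedding calls that were skipped.
    var skipRate: Double { Double(skipped) / Double(total) }
    /// Idealized speedup under a uniform per-call cost assumption: 1 / (1 - skipRate).
    var idealSpeedup: Double { 1 / (1 - skipRate) }
}

// Values copied from the embedding-time table.
let rows = [
    BenchmarkRow(file: "sbrmv", noSkipMs: 2_527, skipMs: 1_756, skipped: 116, total: 378),
    BenchmarkRow(file: "duvox", noSkipMs: 13_691, skipMs: 7_662, skipped: 816, total: 1_983),
    BenchmarkRow(file: "22-842", noSkipMs: 58_057, skipMs: 25_355, skipped: 5_102, total: 8_934),
    BenchmarkRow(file: "4320211", noSkipMs: 43_120, skipMs: 37_131, skipped: 793, total: 6_573),
]

func round2(_ x: Double) -> Double { (x * 100).rounded() / 100 }

for row in rows {
    let percent = Int((row.skipRate * 100).rounded())
    print(row.file,
          "speedup:", round2(row.speedup),
          "skip rate:", "\(percent)%",
          "ideal:", round2(row.idealSpeedup))
}
```

The recomputed speedups and skip rates match the table after rounding; the idealized figure is only a rough cross-check and need not agree with the measured one.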

Quality (DER scored with pyannote.metrics, collar=0.25s):

| File | No skip SAA | Skip-95 SAA | Delta |
|---|---|---|---|
| sbrmv | 87.4% | 87.4% | 0pp |
| duvox | 96.9% | 96.9% | 0pp |
| 22-842 | 96.1% | 96.1% | 0pp |
| 4320211 | 94.0% | 94.0% | 0pp |

Zero quality loss across all files. Skip rate scales with audio stability — long monologues (SCOTUS) skip 57%, frequent speaker changes (Earnings) skip 12%.



Add EmbeddingSkipStrategy to OfflineDiarizerConfig that skips redundant
speaker embedding model calls when consecutive segmentation windows have
highly similar speaker masks.

At the default config (stepRatio=0.20) this has minimal effect. At
higher-overlap configs (e.g., stepRatio=0.15) it provides 1.4-2.3x
embedding speedup with zero quality loss.

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.


🟡 Missing validation for EmbeddingSkipStrategy threshold in validate()

The validate() method at Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerTypes.swift:299 validates every other config parameter (clustering threshold, step ratio, batch size, onset/offset thresholds, etc.) but does not validate the new embedding.skipStrategy. A .maskSimilarity(threshold:) with a NaN, negative, or > 1.0 value will pass validation uncaught. This is called by the pipeline at Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerManager.swift:120 before processing begins. A NaN threshold would cause maskCosineSimilarity(...) >= threshold to always evaluate false (harmless but surprising), while a negative threshold would cause every mask comparison to hit the cache (skipping nearly all embeddings, severely degrading diarization quality).

(Refers to lines 388-389)
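One way the missing check could look is sketched below. The `SkipStrategy` enum is a local stand-in for `EmbeddingSkipStrategy`, and `SkipStrategyError`, `validateSkipStrategy`, and the accepted range of 0 to 1 inclusive are assumptions based on the review comment, not the project's actual `validate()` implementation or error types.

```swift
/// Local stand-in mirroring the shape of EmbeddingSkipStrategy from the diff.
enum SkipStrategy {
    case none
    case maskSimilarity(threshold: Float)
}

/// Hypothetical error type; the real config may report validation failures differently.
enum SkipStrategyError: Error {
    case invalidThreshold(Float)
}

/// Rejects NaN, infinite, negative, and > 1.0 thresholds.
func validateSkipStrategy(_ strategy: SkipStrategy) throws {
    switch strategy {
    case .none:
        return
    case .maskSimilarity(let threshold):
        // NaN fails every comparison, so check isFinite explicitly for clarity.
        guard threshold.isFinite, threshold >= 0, threshold <= 1 else {
            throw SkipStrategyError.invalidThreshold(threshold)
        }
    }
}

/// Convenience wrapper for the examples below.
func isValid(_ strategy: SkipStrategy) -> Bool {
    (try? validateSkipStrategy(strategy)) != nil
}

print(isValid(.none))                                 // true
print(isValid(.maskSimilarity(threshold: 0.95)))      // true
print(isValid(.maskSimilarity(threshold: .nan)))      // false
print(isValid(.maskSimilarity(threshold: -0.1)))      // false
print(isValid(.maskSimilarity(threshold: 1.5)))       // false
```

Whether very low but in-range thresholds should also be rejected or merely warned about is a separate design question, since they would still skip most embeddings.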


Comment on lines +82 to +92
public enum EmbeddingSkipStrategy: Sendable {
    /// No skipping — extract every embedding (default).
    case none
    /// Skip if the speaker mask has cosine similarity ≥ threshold compared to the mask
    /// that produced the currently cached embedding for this speaker. Prevents drift by
    /// always comparing against the mask that generated the cached embedding, not a
    /// rolling previous mask.
    ///
    /// Recommended threshold: 0.95 (≤1pp DER cost across VoxConverse/SCOTUS/Earnings-21).
    case maskSimilarity(threshold: Float)
}

🔴 No unit tests added for new EmbeddingSkipStrategy feature (AGENTS.md rule violation)

AGENTS.md mandates "Add unit tests when writing new code." This PR introduces a new public enum EmbeddingSkipStrategy, a new config field Embedding.skipStrategy, a new convenience accessor embeddingSkipStrategy, and non-trivial caching logic with maskCosineSimilarity — but no test files are included in the change. There is an existing Tests/FluidAudioTests/Diarizer/Offline/OfflineConfigTests.swift that tests other config parameters, making this a natural place to add coverage for the new strategy (e.g., config round-trip, validation of threshold bounds, default value).


@Alex-Wengg Alex-Wengg merged commit fe4b4df into FluidInference:main Apr 4, 2026
12 checks passed