
Update README.md #42

Merged

BrandonWeng merged 1 commit into main from BrandonWeng-patch-1 on Jul 27, 2025

Conversation

@BrandonWeng
Member

No description provided.

@BrandonWeng merged commit 9f18a1b into main on Jul 27, 2025
5 checks passed
@BrandonWeng deleted the BrandonWeng-patch-1 branch on July 27, 2025, 16:44
@github-actions

🗣️ Speaker Diarization Benchmark Results

Speaker Diarization Performance

Evaluating "who spoke when" detection accuracy

| Metric | Value | Target | Description |
| --- | --- | --- | --- |
| DER | 18.7% | <30% | Diarization Error Rate (lower is better) |
| JER | 22.6% | <25% | Jaccard Error Rate (lower is better) |
| RTF | 0.06x | <1.0x | Real-Time Factor (lower is faster) |
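For reference, DER is conventionally the sum of false-alarm, missed-speech, and speaker-confusion durations over the total reference speech time. A minimal sketch of that definition (function and argument names are illustrative, not FluidAudio's API):

```python
def der(false_alarm_s: float, missed_s: float, confusion_s: float,
        total_speech_s: float) -> float:
    """Diarization Error Rate: (false alarm + missed speech +
    speaker confusion) / total reference speech. Lower is better."""
    return (false_alarm_s + missed_s + confusion_s) / total_speech_s

# e.g. 10 s false alarm, 5 s missed, 3 s confused, over 100 s of speech
print(der(10.0, 5.0, 3.0, 100.0))  # 0.18, i.e. 18% DER
```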

⏱️ Diarization Pipeline Timing Breakdown

Time spent in each stage of speaker diarization

| Stage | Time (s) | % | Description |
| --- | --- | --- | --- |
| Model Download | 0.000 | 0.0 | Fetching diarization models |
| Model Compile | 5.032 | 7.7 | CoreML compilation |
| Audio Load | 0.106 | 0.2 | Loading audio file |
| Segmentation | 14.588 | 22.2 | Detecting speech regions |
| Embedding | 46.001 | 70.0 | Extracting speaker voices |
| Clustering | 0.028 | 0.0 | Grouping same speakers |
| Total | 65.756 | 100.0 | Full pipeline |
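The percentage column above can be sanity-checked by summing the stage times (the stages total 65.755 s; the table's 65.756 s differs only by rounding). A quick sketch:

```python
# Stage timings from the table above (seconds)
stages = {
    "Model Download": 0.000,
    "Model Compile": 5.032,
    "Audio Load": 0.106,
    "Segmentation": 14.588,
    "Embedding": 46.001,
    "Clustering": 0.028,
}

total = sum(stages.values())  # 65.755 s; table reports 65.756 s (rounding)
for name, t in stages.items():
    print(f"{name}: {100 * t / total:.1f}%")  # Embedding dominates at ~70%
```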

📊 Speaker Diarization Research Comparison

Comparing against state-of-the-art diarization methods

| Method | DER | Year | Notes |
| --- | --- | --- | --- |
| FluidAudio | 18.7% | 2025 | On-device CoreML |
| Powerset BCE | 18.5% | 2023 | Research baseline |
| EEND | 25.3% | 2019 | End-to-end neural |
| x-vector clustering | 28.7% | 2018 | Traditional approach |

🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.4s meeting audio • 60.6s diarization time • Test runtime: 1m 31s • 07/27/2025, 12:48 PM EST

@github-actions

VAD Benchmark Results

Performance Comparison

| Metric | FluidAudio VAD | Industry Standard |
| --- | --- | --- |
| Accuracy | 98.0% | 85-90% |
| Precision | 96.2% | 85-95% |
| Recall | 100.0% | 80-90% |
| F1-Score | 98.0% | 85.9% (Sohn's VAD) |
| Processing Time | 424.0s (100 files) | ~1ms per 30ms chunk |

Industry Leaders:

  • Silero VAD: ~90-95% F1 (DNN-based, 1.8MB model)
  • WebRTC VAD: ~75-80% F1 (GMM-based, fast but lower accuracy)
  • Sohn's VAD: 77.5% F1 (traditional approach)
  • Modern DNNs: 85-97% F1 (varies by SNR conditions)
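The F1 scores quoted above are the standard harmonic mean of precision and recall; plugging in the table's precision (96.2%) and recall (100.0%) reproduces the ~98% figure. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (both in [0, 1])."""
    return 2 * precision * recall / (precision + recall)

# Table values: precision 96.2%, recall 100.0%
print(round(f1_score(0.962, 1.0), 3))  # 0.981, consistent with ~98% F1
```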
📊 Detailed Research Comparisons

| Paper | Dataset | F1-Score | Method |
| --- | --- | --- | --- |
| Silero VAD (2021) | TEDx | 88.1% | LSTM-based lightweight model |
| WebRTC VAD | MUSAN | 64.4% | GMM-based (traditional) |
| pyannote.audio (2020) | AMI | 85.9% | SincTDNN architecture |
| MarbleNet (2020) | AVA-Speech | 87.8% | 1D time-channel separable CNN |
| FluidAudio VAD | MUSAN-mini | 98.0% | CoreML-optimized Silero |

Note: Direct comparisons should consider dataset differences. MUSAN contains challenging noise conditions.

@github-actions

ASR Benchmark Results

| Dataset | WER (avg) | WER (median) | RTFx |
| --- | --- | --- | --- |
| test-clean | 4.44% | 0.00% | 1.54x |
| test-other | 8.26% | 2.78% | 1.47x |
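WER in the table above is the usual word error rate: edit-distance errors (substitutions, deletions, insertions) divided by the number of reference words. A minimal sketch of that definition (not the benchmark's actual scoring code):

```python
def wer(substitutions: int, deletions: int, insertions: int,
        reference_words: int) -> float:
    """Word Error Rate: (S + D + I) / N reference words. Lower is better."""
    return (substitutions + deletions + insertions) / reference_words

# e.g. 2 substitutions, 1 deletion, 1 insertion over 100 reference words
print(wer(2, 1, 1, 100))  # 0.04, i.e. 4% WER
```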

500 files per dataset • Test runtime: 39m36s • 07/27/2025, 01:26 PM EST

RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Processing time includes: Model inference on Apple Neural Engine, audio preprocessing, state resets between files, token-to-text conversion, and file I/O
Example: RTFx of 2.0x means 10 seconds of audio processed in 5 seconds (2x faster than real-time)
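The RTFx calculation described above can be sketched directly:

```python
def rtfx(audio_duration_s: float, processing_time_s: float) -> float:
    """Real-Time Factor: audio duration / processing time.
    Higher is better; >1.0 means faster than real time."""
    return audio_duration_s / processing_time_s

print(rtfx(10.0, 5.0))  # 2.0: 10 s of audio processed in 5 s
```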

Expected RTFx Performance on Physical M1 Hardware:

• M1 Mac: ~28x (clean), ~25x (other)
• CI shows ~0.5-3x due to virtualization limitations

Testing methodology follows HuggingFace Open ASR Leaderboard

Alex-Wengg pushed a commit that referenced this pull request Jan 1, 2026
SGD2718 pushed a commit that referenced this pull request Jan 4, 2026
Alex-Wengg pushed a commit that referenced this pull request Jan 5, 2026