Split CLI and fix datasets by BrandonWeng · Pull Request #24 · FluidInference/FluidAudio

BrandonWeng · 2025-07-16T19:15:57Z

Split up the massive CLI file and fix the benchmarking failure from the previous PR. This also makes the run fail when it is below a threshold.
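A minimal sketch of what failing below a threshold could look like, assuming the score is a fraction where higher is better. `BenchmarkGate`, `minimumScore`, and the 0.85 value are hypothetical and are not taken from this PR.

```swift
import Foundation

// Hypothetical names throughout; the PR does not show the real implementation.
struct BenchmarkGate {
    /// Minimum acceptable score as a fraction in 0...1 (illustrative value only).
    let minimumScore: Double

    /// Returns the exit code a CLI could use: 0 when the score meets the
    /// threshold, 1 when it falls below it.
    func exitCode(forScore score: Double) -> Int32 {
        score < minimumScore ? 1 : 0
    }
}

let gate = BenchmarkGate(minimumScore: 0.85)
let passing = gate.exitCode(forScore: 0.98)
let failing = gate.exitCode(forScore: 0.60)
print("passing run exits with \(passing), failing run exits with \(failing)")
// A real CLI would pass this value to exit(_:) so that CI marks the job as failed.
```

A non-zero exit code is what lets a CI workflow report the benchmark step as failed.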

github-actions · 2025-07-16T21:53:01Z

VAD Benchmark Results

Performance Comparison

Metric	FluidAudio VAD	Industry Standard	Status
Accuracy	98.0%	85-90%	✅
Precision	96.2%	85-95%	✅
Recall	100.0%	80-90%	✅
F1-Score	98.0%	85.9% (Sohn's VAD)	✅
Processing Time	634.3s (100 files)	~1ms per 30ms chunk	✅

Industry Leaders:

Silero VAD: ~90-95% F1 (DNN-based, 1.8MB model)
WebRTC VAD: ~75-80% F1 (GMM-based, fast but lower accuracy)
Sohn's VAD: 77.5% F1 (traditional approach)
Modern DNNs: 85-97% F1 (varies by SNR conditions)
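For reference, the four VAD metrics in the table above can all be derived from one set of confusion counts. The sketch below uses the standard definitions; the counts (50 true positives, 2 false positives, 48 true negatives, 0 false negatives) are an assumption chosen because they are consistent with 100 files and the reported percentages. They are not taken from the benchmark output.

```swift
import Foundation

struct ConfusionCounts {
    let truePositives: Int
    let falsePositives: Int
    let trueNegatives: Int
    let falseNegatives: Int

    var total: Int { truePositives + falsePositives + trueNegatives + falseNegatives }

    /// Fraction of all files classified correctly.
    var accuracy: Double {
        total == 0 ? 0 : Double(truePositives + trueNegatives) / Double(total)
    }
    /// Fraction of predicted speech files that really contain speech.
    var precision: Double {
        let predicted = truePositives + falsePositives
        return predicted == 0 ? 0 : Double(truePositives) / Double(predicted)
    }
    /// Fraction of real speech files that were detected.
    var recall: Double {
        let actual = truePositives + falseNegatives
        return actual == 0 ? 0 : Double(truePositives) / Double(actual)
    }
    /// Harmonic mean of precision and recall.
    var f1: Double {
        let sum = precision + recall
        return sum == 0 ? 0 : 2 * precision * recall / sum
    }
}

// Assumed counts, picked to agree with the reported figures.
let counts = ConfusionCounts(truePositives: 50, falsePositives: 2,
                             trueNegatives: 48, falseNegatives: 0)

func percent(_ value: Double) -> String { String(format: "%.1f%%", value * 100) }

print("Accuracy: \(percent(counts.accuracy))")    // 98.0%
print("Precision: \(percent(counts.precision))")  // 96.2%
print("Recall: \(percent(counts.recall))")        // 100.0%
print("F1-Score: \(percent(counts.f1))")          // 98.0%
```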

github-actions · 2025-07-16T22:12:23Z

🎯 Single File Benchmark Results

Test File: ES2004a (1049.4s audio)
Overall Result: ✅ PASSED

📊 Accuracy Metrics

Metric	Value
DER (Diarization Error Rate)	18.7%
JER (Jaccard Error Rate)	22.6%
RTF (Real-Time Factor)	0.07x
Speakers Detected	4/4
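As a worked example of the RTF figure: the sketch below assumes RTF is inference time divided by audio duration. That assumption is consistent with the numbers in this comment (71.799s of inference for 1049.4s of audio), but the PR itself does not state the formula. `realTimeFactor` is a hypothetical helper.

```swift
import Foundation

/// Real-time factor: seconds of processing per second of audio.
/// Values below 1.0 mean processing is faster than real time.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

let audioSeconds = 1049.4      // ES2004a duration from this report
let inferenceSeconds = 71.799  // "Inference Time" from this report
let rtf = realTimeFactor(processingSeconds: inferenceSeconds, audioSeconds: audioSeconds)
print(String(format: "RTF: %.2fx", rtf))  // RTF: 0.07x
```

Under that assumption, an RTF of 0.07x means the file is processed roughly 14.6 times faster than real time.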

⏱️ Performance Timing

Stage	Time (s)	% of Total
Model Download	2.334	3.1%
Model Compilation	0.935	1.2%
Audio Loading	0.136	0.2%
Segmentation	15.755	21.0%
Embedding Extraction	56.007	74.5%
Speaker Clustering	0.036	0.0%
Total Processing	75.204	100%

Inference Time: 71.799s (95.5% of total)
Setup Overhead: 3.269s (4.3% of total)
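The two summary lines above can be reproduced from the stage table. This sketch assumes setup overhead covers model download and compilation, and inference covers segmentation, embedding extraction, and clustering, with audio loading counted in neither. That grouping is inferred from the numbers, not stated in the PR, and the `Bucket` and `Stage` types are hypothetical.

```swift
import Foundation

// Which summary line a stage is assumed to count toward.
enum Bucket { case setup, loading, inference }

struct Stage {
    let name: String
    let seconds: Double
    let bucket: Bucket
}

// Stage timings in seconds, copied from the table above.
let stages = [
    Stage(name: "Model Download", seconds: 2.334, bucket: .setup),
    Stage(name: "Model Compilation", seconds: 0.935, bucket: .setup),
    Stage(name: "Audio Loading", seconds: 0.136, bucket: .loading),
    Stage(name: "Segmentation", seconds: 15.755, bucket: .inference),
    Stage(name: "Embedding Extraction", seconds: 56.007, bucket: .inference),
    Stage(name: "Speaker Clustering", seconds: 0.036, bucket: .inference),
]

/// Sums the time spent in all stages assigned to one bucket.
func seconds(in bucket: Bucket, of stages: [Stage]) -> Double {
    stages.filter { $0.bucket == bucket }.map(\.seconds).reduce(0, +)
}

let total = stages.map(\.seconds).reduce(0, +)
for stage in stages {
    let share = stage.seconds / total * 100
    print("\(stage.name): \(String(format: "%.1f", share))%")
}
print(String(format: "Setup: %.3fs", seconds(in: .setup, of: stages)))
print(String(format: "Inference: %.3fs", seconds(in: .inference, of: stages)))
```

The sums agree with the reported totals to within 0.001s. The per-stage shares can differ from the table in the last digit, presumably because the report was computed from unrounded timings.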

Research Comparison:

Powerset BCE (2023): 18.5% DER
EEND (2019): 25.3% DER
x-vector clustering: 28.7% DER

Alex-Wengg · 2025-07-16T22:20:01Z

+            print("   Debug mode: \(debugMode ? "enabled" : "disabled")")
+            print("   Auto-download: \(autoDownload ? "enabled" : "disabled")")
+            print("   VAD: \(disableVad ? "disabled" : "enabled")")
+            if iterations > 1 {


Is this just an arbitrary number? It seems like the 1 represents the minimum number of iterations, for consistency.

https://github.com/FluidInference/FluidAudio/pull/24/files/f56b59d56c7e88ad45a95422340985e88fb9a279#diff-eafa60fcbd28ed77f777519cbbb96b9fe180ee35f906349e307cf45d22ec6ba4R61

Not arbitrary. The 1 here just means that we're running it more than once, so we print out the iteration number.
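To illustrate the point of the `iterations > 1` check, here is a small sketch. Only the `iterations` variable comes from the diff; `progressLines` and the log text are hypothetical, since the body of the `if` is not shown in the thread.

```swift
/// Builds the log lines for a benchmark run. The iteration header is only
/// included when the benchmark runs more than once, which is all that
/// `iterations > 1` checks.
func progressLines(iterations: Int) -> [String] {
    var lines: [String] = []
    for iteration in 1...max(iterations, 1) {
        if iterations > 1 {
            lines.append("Iteration \(iteration)/\(iterations)")
        }
        lines.append("Running benchmark")
    }
    return lines
}

// A single run prints no iteration header.
print(progressLines(iterations: 1).joined(separator: "\n"))
// Multiple runs label each pass.
print(progressLines(iterations: 3).joined(separator: "\n"))
```

With a single run the header would be noise, so comparing against 1 is a condition on "more than one run" rather than a tunable limit.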

github-actions · 2025-07-16T23:29:06Z

🎯 Single File Benchmark Results

Test File: ES2004a (1049.4s audio)
Overall Result: ✅ PASSED

📊 Accuracy Metrics

Metric	Value
DER (Diarization Error Rate)	18.7%
JER (Jaccard Error Rate)	22.6%
RTF (Real-Time Factor)	0.04x
Speakers Detected	4/4

⏱️ Performance Timing

Stage	Time (s)	% of Total
Model Download	2.698	5.3%
Model Compilation	0.766	1.5%
Audio Loading	0.099	0.2%
Segmentation	11.471	22.7%
Embedding Extraction	35.578	70.3%
Speaker Clustering	0.025	0.0%
Total Processing	50.637	100%

Inference Time: 47.074s (93.0% of total)
Setup Overhead: 3.464s (6.8% of total)

Research Comparison:

Powerset BCE (2023): 18.5% DER
EEND (2019): 25.3% DER
x-vector clustering: 28.7% DER

github-actions · 2025-07-16T23:34:31Z

Single File Benchmark Results

Test File: ES2004a (1049.4s audio)
Overall Result: ✅

Accuracy Metrics

Metric	Value
DER (Diarization Error Rate)	18.7%
JER (Jaccard Error Rate)	22.6%
RTF (Real-Time Factor)	0.06x
Speakers Detected	4/4

⏱️ Performance Timing

Stage	Time (s)	% of Total
Model Download	3.179	5.0%
Model Compilation	0.934	1.5%
Audio Loading	0.091	0.1%
Segmentation	13.240	21.0%
Embedding Extraction	45.536	72.3%
Speaker Clustering	0.032	0.1%
Total Processing	63.011	100%

Inference Time: 58.807s (93.3% of total)
Setup Overhead: 4.113s (6.5% of total)

Research Comparison:

Powerset BCE (2023): 18.5% DER
EEND (2019): 25.3% DER
x-vector clustering: 28.7% DER

github-actions · 2025-07-16T23:45:38Z

VAD Benchmark Results

Performance Comparison

Metric	FluidAudio VAD	Industry Standard	Status
Accuracy	98.0%	85-90%	✅
Precision	96.2%	85-95%	✅
Recall	100.0%	80-90%	✅
F1-Score	98.0%	85.9% (Sohn's VAD)	✅
Processing Time	636.2s (100 files)	~1ms per 30ms chunk	✅

Industry Leaders:

Silero VAD: ~90-95% F1 (DNN-based, 1.8MB model)
WebRTC VAD: ~75-80% F1 (GMM-based, fast but lower accuracy)
Sohn's VAD: 77.5% F1 (traditional approach)
Modern DNNs: 85-97% F1 (varies by SNR conditions)

BrandonWeng added 2 commits July 16, 2025 15:15

Split CLI and fix datasets

2bfb643

Fix

2988645

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

BrandonWeng added 3 commits July 16, 2025 15:34

more fixeS

ec416ce

download

2aab9a5

download dataset

bfa08e5

BrandonWeng added the invalid, documentation, and enhancement labels and removed the invalid label Jul 16, 2025

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

BrandonWeng added 3 commits July 16, 2025 16:02

Fail threshold is below some threshold

787cacb

Support PR

9d2025b

Comment

be8b4e5

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

BrandonWeng added 4 commits July 16, 2025 17:14

Add comments

6c08a26

Remove duplicate outputs

41372b7

cleanup table

f3906da

clean up vad table too

d12f010

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

BrandonWeng added 2 commits July 16, 2025 17:33

Remove

15ea9f0

concurrency setting for diarization

f56b59d

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

Alex-Wengg reviewed Jul 16, 2025

Comment thread Sources/DiarizationCLI/BenchmarkRunner.swift Outdated

BrandonWeng commented Jul 16, 2025

Comment thread Sources/DiarizationCLI/BenchmarkRunner.swift Outdated

Apply suggestion from @BrandonWeng

97a7e84

BrandonWeng enabled auto-merge (squash) July 16, 2025 23:29

Update benchmark.yml

b332a94

BrandonWeng requested a review from Alex-Wengg July 16, 2025 23:31

FluidInference deleted a comment from github-actions Bot Jul 16, 2025

Alex-Wengg approved these changes Jul 17, 2025

BrandonWeng merged commit 64bd19a into main Jul 17, 2025
4 checks passed

BrandonWeng deleted the split-cli-2 branch July 17, 2025 00:03

Alex-Wengg pushed a commit that referenced this pull request Jan 1, 2026

Split CLI and fix datasets (#24)

5aebdc9

SGD2718 pushed a commit that referenced this pull request Jan 4, 2026

Split CLI and fix datasets (#24)

a577b50

Alex-Wengg pushed a commit that referenced this pull request Jan 5, 2026

Split CLI and fix datasets (#24)

fb11090

BrandonWeng merged 17 commits into main from split-cli-2