Refactor: Rename Repo.parakeetCtcJa to Repo.parakeetJa for accuracy #520
Alex-Wengg merged 2 commits into main
The HuggingFace repo 'parakeet-ctc-0.6b-ja-coreml' contains BOTH CTC and TDT models, so calling it 'parakeetCtcJa' is misleading. Renamed to 'parakeetJa' to accurately reflect that it's the Japanese models repository containing both decoder variants.

Repository contents verified:
- CtcDecoder.mlmodelc (CTC)
- Decoderv2.mlmodelc (TDT)
- Jointerv2.mlmodelc (TDT)
- Preprocessor.mlmodelc
- Encoder.mlmodelc

Changes:
- ModelNames.swift: Renamed Repo.parakeetCtcJa → Repo.parakeetJa
- AsrModels.swift: Updated .ctcJa and .tdtJa to use .parakeetJa
- CtcJaModels.swift: Updated repository reference
- TdtJaModels.swift: Updated repository reference
The HuggingFace repo 'parakeet-tdt-0.6b-ja-coreml' doesn't exist (404). Both CTC and TDT Japanese models are in 'parakeet-ctc-0.6b-ja-coreml', which is now correctly referenced as Repo.parakeetJa.

Removed all references to the non-existent parakeetTdtJa:
- Enum case definition
- folderName property case
- shortName property case
- getRequiredModelNames() case
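The effect of the two commits above can be sketched as follows. This is a simplified illustration, not the project's actual code: the `JapaneseAsrVariant` type, the raw-value layout of `Repo`, and the `repo` property are hypothetical stand-ins, and the real enums in ModelNames.swift and AsrModels.swift carry more cases and properties.

```swift
// Hypothetical, simplified stand-ins for the real enums.
enum Repo: String {
    // A single repository holds both Japanese decoder variants, so there is
    // no separate parakeetTdtJa case.
    case parakeetJa = "FluidInference/parakeet-ctc-0.6b-ja-coreml"
}

enum JapaneseAsrVariant {
    case ctcJa
    case tdtJa

    // Both variants resolve to the same repository.
    var repo: Repo {
        switch self {
        case .ctcJa, .tdtJa:
            return .parakeetJa
        }
    }
}

let ctcRepo = JapaneseAsrVariant.ctcJa.repo
let tdtRepo = JapaneseAsrVariant.tdtJa.repo
```

With a single case there is nothing left that can point at the 404 repository, which is the motivation given for removing the dead code.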
VAD Benchmark Results
Performance Comparison
Dataset Details
✅: Average F1-Score above 70%
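Updated: Removed Dead Code
Added a second commit that removes the non-existent Repo.parakeetTdtJa case, since the HuggingFace repo 'parakeet-tdt-0.6b-ja-coreml' doesn't exist (404).
Changes in second commit: removed the parakeetTdtJa enum case definition along with its folderName, shortName, and getRequiredModelNames() cases.
✅ Build still succeeds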
    case .parakeetJa:
        return ModelNames.CTCJa.requiredModels
🔴 getRequiredModelNames for .parakeetJa only returns CTC models, breaking TDT Japanese model download and loading
After merging .parakeetCtcJa and .parakeetTdtJa into a single .parakeetJa repo, the getRequiredModelNames function at Sources/FluidAudio/ModelNames.swift:675-676 only returns ModelNames.CTCJa.requiredModels (Preprocessor.mlmodelc, Encoder.mlmodelc, CtcDecoder.mlmodelc). The TDT-specific models (Decoderv2.mlmodelc, Jointerv2.mlmodelc from ModelNames.TDTJa.requiredModels) are never included.
This causes two failures when TdtJaModels.downloadAndLoad() is called (via ParakeetLanguageModels<TdtJaConfig>):
- Download is skipped when CTC models are cached: DownloadUtils.loadModelsOnce at Sources/FluidAudio/DownloadUtils.swift:191-195 checks getRequiredModelNames(.parakeetJa) (only CTC models) to decide if download is needed. If CTC models exist, it skips download even though Decoderv2.mlmodelc and Jointerv2.mlmodelc are missing.
- Download omits TDT models on fresh install: DownloadUtils.downloadRepo at Sources/FluidAudio/DownloadUtils.swift:279-290 uses the same getRequiredModelNames to build download patterns. Only CTC model patterns are generated, so TDT model files are never fetched from HuggingFace.
In both scenarios, loading fails with a file-not-found error when the loop at DownloadUtils.swift:211-213 tries to access Decoderv2.mlmodelc.
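The first failure mode can be reproduced in isolation with plain sets. The sketch below is only an illustration of the logic described in this comment: `missingModels` is a hypothetical helper, not a function in DownloadUtils.swift, and the assumption that the TDT loader needs the shared Preprocessor and Encoder files in addition to its own decoder and joint is inferred from the repository contents rather than confirmed.

```swift
// Hypothetical helper: the required file names that are absent from the cache.
func missingModels(required: Set<String>, cached: Set<String>) -> Set<String> {
    required.subtracting(cached)
}

// Cache state after a CTC-only download.
let cached: Set<String> = [
    "Preprocessor.mlmodelc", "Encoder.mlmodelc", "CtcDecoder.mlmodelc",
]

// What the .parakeetJa case reported before the fix (CTC models only).
let reportedRequired: Set<String> = [
    "Preprocessor.mlmodelc", "Encoder.mlmodelc", "CtcDecoder.mlmodelc",
]

// What the TDT loader needs (assumed: shared files plus decoder and joint).
let tdtRequired: Set<String> = [
    "Preprocessor.mlmodelc", "Encoder.mlmodelc",
    "Decoderv2.mlmodelc", "Jointerv2.mlmodelc",
]

// The cache check sees nothing missing, so the download is skipped...
let reportedMissing = missingModels(required: reportedRequired, cached: cached)

// ...while the files the TDT loader will open are not on disk.
let actuallyMissing = missingModels(required: tdtRequired, cached: cached)
```

The check and the loader disagree about what "required" means, which is why the error only surfaces later as a file-not-found during loading.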
Prompt for agents
In Sources/FluidAudio/ModelNames.swift, the getRequiredModelNames function for .parakeetJa only returns ModelNames.CTCJa.requiredModels but the repo now contains both CTC and TDT models. When TdtJaModels tries to download or load via DownloadUtils.loadModelsOnce, the cache check and download patterns are based on getRequiredModelNames which only knows about CTC models.
The fix should ensure that getRequiredModelNames for .parakeetJa returns the union of both CTCJa.requiredModels and TDTJa.requiredModels, so that both model sets are downloaded and their existence is properly verified. For example:
case .parakeetJa:
return ModelNames.CTCJa.requiredModels.union(ModelNames.TDTJa.requiredModels)
This ensures DownloadUtils.downloadRepo fetches all model files (including Decoderv2.mlmodelc and Jointerv2.mlmodelc) and that the cache-existence check in loadModelsOnce correctly detects when TDT models are missing.
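A minimal sketch of what the suggested union produces, using local sets in place of the real constants. The contents of `ctcJaRequired` match the list given in this comment; the contents of `tdtJaRequired` (in particular whether it repeats the shared Preprocessor and Encoder entries) are an assumption, and both names are hypothetical stand-ins for ModelNames.CTCJa.requiredModels and ModelNames.TDTJa.requiredModels.

```swift
// Hypothetical stand-in for ModelNames.CTCJa.requiredModels.
let ctcJaRequired: Set<String> = [
    "Preprocessor.mlmodelc", "Encoder.mlmodelc", "CtcDecoder.mlmodelc",
]

// Hypothetical stand-in for ModelNames.TDTJa.requiredModels
// (assumed to list the shared files as well).
let tdtJaRequired: Set<String> = [
    "Preprocessor.mlmodelc", "Encoder.mlmodelc",
    "Decoderv2.mlmodelc", "Jointerv2.mlmodelc",
]

// Set.union removes duplicates, so shared files appear once.
let parakeetJaRequired = ctcJaRequired.union(tdtJaRequired)
```

Because the result is a set, the union stays correct whether or not the TDT list repeats the shared files. One trade-off is that a CTC-only user would also download the TDT decoder and joint.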
Sortformer High-Latency Benchmark Results
ES2004a Performance (30.4s latency config)
Sortformer High-Latency • ES2004a • Runtime: 2m 42s • 2026-04-12T04:29:08.261Z
Parakeet EOU Benchmark Results ✅
Status: Benchmark passed
Performance Metrics
Streaming Metrics
Test runtime: 1m23s • 04/12/2026, 12:42 AM EST
RTFx = Real-Time Factor (higher is better) • Processing includes: Model inference, audio preprocessing, state management, and file I/O
ASR Benchmark Results ✅
Status: All benchmarks passed
Parakeet v3 (multilingual)
Parakeet v2 (English-optimized)
Streaming (v3)
Streaming (v2)
Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming
25 files per dataset • Test runtime: 5m36s • 04/12/2026, 01:19 AM EST
RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Expected RTFx Performance on Physical M1 Hardware:
• M1 Mac: ~28x (clean), ~25x (other)
Testing methodology follows HuggingFace Open ASR Leaderboard
Qwen3-ASR int8 Smoke Test ✅
Performance Metrics
Runtime: 5m21s
Note: CI VM lacks physical GPU. CoreML MLState (macOS 15) KV cache produces degraded results on virtualized runners. On Apple Silicon: ~1.3% WER / 2.5x RTFx.
Speaker Diarization Benchmark Results
Speaker Diarization Performance
Evaluating "who spoke when" detection accuracy
Diarization Pipeline Timing Breakdown
Time spent in each stage of speaker diarization
Speaker Diarization Research Comparison
Research baselines typically achieve 18-30% DER on standard datasets
Note: RTFx shown above is from GitHub Actions runner. On Apple Silicon with ANE:
🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 46.7s diarization time • Test runtime: 3m 3s • 04/12/2026, 12:52 AM EST
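The ASR benchmark comment defines RTFx as total audio duration ÷ total processing time. As a worked example, the sketch below applies that definition to the two durations quoted in the summary line above. The function name `realTimeFactor` is hypothetical, and the benchmark's own tables may have computed or rounded the value differently.

```swift
// Hypothetical helper: RTFx = total audio duration / total processing time.
// Returns nil when the processing time is not positive.
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double? {
    guard processingSeconds > 0 else { return nil }
    return audioSeconds / processingSeconds
}

// Durations taken from the diarization summary line.
let diarizationRTFx = realTimeFactor(audioSeconds: 1049.0, processingSeconds: 46.7)
```

Under that definition the run works out to roughly 22.5x real time on the CI runner.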
Kokoro TTS Smoke Test ✅
Runtime: 0m57s
Note: Kokoro TTS uses CoreML flow matching + Vocos vocoder. CI VM lacks physical ANE, so performance may differ from Apple Silicon.
PocketTTS Smoke Test ✅
Runtime: 0m34s
Note: PocketTTS uses CoreML MLState (macOS 15) KV cache + Mimi streaming state. CI VM lacks physical GPU, so audio quality and performance may differ from Apple Silicon.
✅ Japanese ASR Benchmark Results (CTC)
Status: Passed
✅ Benchmark completed successfully. The TDT Japanese hybrid model (CTC preprocessor/encoder + TDT decoder/joint) is working correctly.
Offline VBx Pipeline Results
Speaker Diarization Performance (VBx Batch Mode)
Optimal clustering with Hungarian algorithm for maximum accuracy
Offline VBx Pipeline Timing Breakdown
Time spent in each stage of batch diarization
Speaker Diarization Research Comparison
Offline VBx achieves competitive accuracy with batch processing
Pipeline Details:
🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 219.2s processing • Test runtime: 3m 42s • 04/12/2026, 01:13 AM EST
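✅ Fixed: TDT Japanese Model Downloads Now Work (Issue #517)
@Josscii found that TDT Japanese models weren't downloading correctly because getRequiredModelNames for .parakeetJa returned only the CTC model set.
The Problem: the .parakeetJa case returned only ModelNames.CTCJa.requiredModels (Preprocessor.mlmodelc, Encoder.mlmodelc, CtcDecoder.mlmodelc).
Missing the TDT-specific files: Decoderv2.mlmodelc and Jointerv2.mlmodelc.
The Fix (Commit 3):
    case .parakeetJa:
        // Repo contains BOTH CTC and TDT models - return union of both sets
        return ModelNames.CTCJa.requiredModels.union(ModelNames.TDTJa.requiredModels)
Now downloads all 5 models: Preprocessor.mlmodelc, Encoder.mlmodelc, CtcDecoder.mlmodelc, Decoderv2.mlmodelc, Jointerv2.mlmodelc.
Fixes #517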
Problem
The enum name Repo.parakeetCtcJa is misleading because it implies the repository only contains CTC models, but it actually contains both CTC and TDT models.
Verified Repository Contents
FluidInference/parakeet-ctc-0.6b-ja-coreml contains:
- CtcDecoder.mlmodelc (CTC)
- Decoderv2.mlmodelc + Jointerv2.mlmodelc (TDT)
- Preprocessor.mlmodelc, Encoder.mlmodelc, vocab.json
Solution
Renamed Repo.parakeetCtcJa → Repo.parakeetJa to accurately reflect that it's the Japanese models repository containing both decoder variants.
Changes
- ModelNames.swift: Renamed .parakeetCtcJa to .parakeetJa
- AsrModels.swift: Updated .ctcJa and .tdtJa to use .parakeetJa
Testing
Related