Fix KokoroTtsManager.initialize() hang on iOS#418
Conversation
|
Claude finished @Alex-Wengg's task —— View job PR Review: Fix KokoroTtsManager.initialize() hang on iOSTodo List:
Review Complete ✅I've thoroughly reviewed this PR and submitted a comprehensive code review. Here's my assessment: Summary: This is an excellent fix that properly addresses the iOS ANE compilation hang with a clean timeout implementation. Key Strengths:
Minor Suggestions:
Verdict: The implementation demonstrates solid understanding of Swift concurrency patterns and effectively solves the ANE compilation hang issue without compromising functionality. Ready for merge. The review has been submitted as a non-blocking comment review on the PR. |
aa275ec to
7e0c3dd
Compare
Speaker Diarization Benchmark ResultsSpeaker Diarization PerformanceEvaluating "who spoke when" detection accuracy
Diarization Pipeline Timing BreakdownTime spent in each stage of speaker diarization
Speaker Diarization Research ComparisonResearch baselines typically achieve 18-30% DER on standard datasets
Note: RTFx shown above is from GitHub Actions runner. On Apple Silicon with ANE:
🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 57.0s diarization time • Test runtime: 5m 29s • 03/24/2026, 02:34 PM EST |
There was a problem hiding this comment.
Code Review Summary
This PR effectively addresses the iOS initialization hang issue with a clean, well-architected solution. The timeout implementation is solid and follows Swift best practices.
Strengths ✅
Clean Timeout Implementation
- The
withTimeouthelper usingTaskGroupis the correct approach for Swift async/await - Proper cancellation of both tasks when one completes
- Good use of
@Sendableconstraints for thread safety
Graceful Error Handling
- Timeout doesn't crash the app - continues initialization with warning
- Clear, actionable log message explaining what happened
- First synthesis will still work (just may be slower)
Follows Repository Guidelines
- No
@unchecked Sendableusage ✅ - Uses
AppLoggerfor proper logging ✅ - Maintains thread safety with actors/proper concurrency ✅
Minor Suggestions 💡
Line 179-180: Consider making timeout configurable
private static func warmUpModel(_ model: MLModel, variant: ModelNames.TTS.Variant, timeoutSeconds: TimeInterval = 60) asyncDifferent devices might benefit from different timeout values.
Lines 195-213: The withTimeout implementation is excellent and reusable
Consider moving this to a shared utility module (like Sources/FluidAudio/Shared/AsyncUtils.swift) if timeout functionality might be needed elsewhere in the codebase.
Line 206: The timeout error could include more context:
throw TimeoutError(operation: "model warm-up", timeoutSeconds: seconds)Architecture Assessment 🏗️
Problem-Solution Fit: Perfect match - ANE compilation hangs are exactly what timeouts solve
API Consistency: No breaking changes, maintains existing public interface
Performance: Minimal overhead - timeout only runs during initialization, not per-synthesis
Error Recovery: Excellent - degrades gracefully rather than failing completely
Security & Reliability 🔒
- No security concerns identified
- Proper resource cleanup with
group.cancelAll() - No force unwrapping or unsafe operations
- Memory management looks correct for MLMultiArray usage
Verdict: Approve ✅
This is a well-crafted fix that solves a real problem without introducing technical debt. The implementation demonstrates good understanding of Swift concurrency patterns and follows the repository's engineering standards.
The 60-second timeout is reasonable for model initialization, and the graceful fallback ensures the app remains functional even when ANE compilation is problematic.
Qwen3-ASR int8 Smoke Test ✅
Runtime: 3m17s Note: CI VM lacks physical GPU — CoreML MLState (macOS 15) KV cache produces degraded results on virtualized runners. On Apple Silicon: ~1.3% WER / 2.5x RTFx. |
Parakeet EOU Benchmark Results ✅Status: Benchmark passed Performance Metrics
Streaming Metrics
Test runtime: 0m28s • 03/24/2026, 02:26 PM EST RTFx = Real-Time Factor (higher is better) • Processing includes: Model inference, audio preprocessing, state management, and file I/O |
PocketTTS Smoke Test ✅
Runtime: 0m46s Note: PocketTTS uses CoreML MLState (macOS 15) KV cache + Mimi streaming state. CI VM lacks physical GPU — audio quality may differ from Apple Silicon. |
ASR Benchmark Results ✅Status: All benchmarks passed Parakeet v3 (multilingual)
Parakeet v2 (English-optimized)
Streaming (v3)
Streaming (v2)
Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming 25 files per dataset • Test runtime: 9m35s • 03/24/2026, 02:31 PM EST RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time Expected RTFx Performance on Physical M1 Hardware:• M1 Mac: ~28x (clean), ~25x (other) Testing methodology follows HuggingFace Open ASR Leaderboard |
Sortformer High-Latency Benchmark ResultsES2004a Performance (30.4s latency config)
Sortformer High-Latency • ES2004a • Runtime: 5m 19s • 2026-03-24T18:44:04.600Z |
VAD Benchmark ResultsPerformance Comparison
Dataset Details
✅: Average F1-Score above 70% |
Offline VBx Pipeline ResultsSpeaker Diarization Performance (VBx Batch Mode)Optimal clustering with Hungarian algorithm for maximum accuracy
Offline VBx Pipeline Timing BreakdownTime spent in each stage of batch diarization
Speaker Diarization Research ComparisonOffline VBx achieves competitive accuracy with batch processing
Pipeline Details:
🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 254.3s processing • Test runtime: 4m 26s • 03/24/2026, 02:25 PM EST |
8ec040d to
62c88fb
Compare
Fixes #417 - KokoroTtsManager.initialize() hanging indefinitely on iOS. Root Cause: The hang is caused by v2 models specifically, not warm-up itself. The v2 models (kokoro_21_5s_v2, kokoro_21_15s_v2) were introduced 3 days ago with: - fp16 precision for ANE optimization (commit 2ae0846, Mar 21) - source_noise input requirement (commit 4b03d1f, Mar 22) On iOS, ANE compilation with fp16 models + large source_noise tensor (up to 6.48 MB for 15s model) causes model.prediction() to hang indefinitely during warm-up. The v1 models work fine on iOS. Solution: Use v1 models on iOS, v2 models on macOS: - iOS: kokoro_21_5s.mlmodelc, kokoro_21_15s.mlmodelc (fp32, no source_noise) - macOS: kokoro_21_5s_v2.mlmodelc, kokoro_21_15s_v2.mlmodelc (fp16, 1.67x faster) Platform-specific model selection in ModelNames.swift: #if os(iOS) return "kokoro_21_5s.mlmodelc" // v1 #else return "kokoro_21_5s_v2.mlmodelc" // v2 #endif Conditional source_noise input: - Check model.modelDescription.inputDescriptionsByName["source_noise"] - Only add source_noise if model expects it - v1 models don't have this input, v2 models require it Impact: - iOS: initialize() returns immediately, no hang, warm-up works - macOS: No change, keeps v2 performance benefits - iOS loses v2 speed boost, but gains stability
62c88fb to
1d42321
Compare
Summary
Fixes #417 -
KokoroTtsManager.initialize()hanging indefinitely on iOS.Root Cause
The hang occurs during model warm-up in
TtsModels.download():3826150, Mar 20): Nosource_noiseinput, warm-up works fine2ae0846(Mar 21): Switched to fp16 models for ANE optimization4b03d1f(Mar 22): Addedsource_noiseinput requirementThe warm-up creates a massive source_noise tensor:
[1, 120000, 9]= ~2.16 MB of random Float16 values[1, 360000, 9]= ~6.48 MB of random Float16 valuesOn iOS, ANE compilation with fp16 models + this large random tensor causes
model.prediction()to hang indefinitely.Solution
Skip warm-up entirely on iOS using
#if os(macOS)guards:Changes
Impact
initialize()returns immediately ✅ (no hang)Test Plan