fix: macOS 26 Sortformer compatibility - use V2 models #266
Conversation
Claude finished @Alex-Wengg's task.

PR Review Complete ✅

Reviewed PR #266: fix: macOS 26 Sortformer compatibility - use V2 models

Review Summary

This PR successfully addresses the macOS 26 BNNS compiler error by updating to V2 models with distinct input/output tensor names and adding Float16 support. The changes are well-implemented with proper error handling and backward compatibility.

Key Findings: minor suggestions only (better documentation of the Float16 requirements and more descriptive error messages); these are non-blocking improvements. The review has been submitted as a non-blocking comment to provide feedback while allowing the PR to proceed.
Code Review Summary
This PR successfully addresses the macOS 26 BNNS compiler error by updating to V2 models with distinct input/output tensor names and adding Float16 support. The changes are well-implemented and follow good practices.
Positive Aspects
- Clear Problem Resolution: The fix directly addresses the BNNS compiler error where input/output tensors shared the same name
- Backward Compatibility: Graceful fallback from Float16 to Float32 with proper availability checks
- Good Documentation: Clear comments explaining the reasoning behind the `_out` suffix
- Minimal Surface Area: Only necessary changes were made, keeping the diff focused
Minor Suggestions
- Float16 Availability Comment: Consider adding a comment explaining why macOS 15.0+/iOS 18.0+ is required for Float16 CoreML support at line 253
- Error Message Clarity: The error message "Missing chunk_pre_encoder_embs_out" could be more descriptive about the Float16/Float32 expectation
Technical Assessment
- Thread Safety: ✅ No concurrency issues introduced
- Memory Management: ✅ Proper use of existing memory optimization patterns
- Error Handling: ✅ Appropriate error handling with meaningful messages
- Performance: ✅ No performance regressions, maintains efficient tensor access
- API Consistency: ✅ No breaking changes to public interfaces
Architecture Review
The changes maintain the existing architecture while adapting to new model requirements. The dual Float16/Float32 handling is implemented cleanly without adding complexity to the calling code.
Overall, this is a solid fix that resolves the immediate compatibility issue while maintaining code quality and backward compatibility.
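As an illustration of the dual-dtype read path the review describes, here is a minimal Python/NumPy sketch. The repository's actual implementation is Swift; the function name and the name-to-tensor mapping below are hypothetical, chosen only to mirror the idea of fetching the renamed output and widening Float16 to Float32 so callers always see one dtype.

```python
import numpy as np

def embs_as_float32(outputs: dict) -> np.ndarray:
    """Fetch the renamed pre-encoder output, widening Float16 to Float32.

    Illustration only: the V2 head module may emit Float16 on newer OS
    versions and Float32 elsewhere. `outputs` is a hypothetical
    name-to-tensor mapping standing in for the CoreML prediction result.
    """
    embs = outputs.get("chunk_pre_encoder_embs_out")
    if embs is None:
        raise KeyError(
            "Missing chunk_pre_encoder_embs_out "
            "(expected a Float16 or Float32 tensor)"
        )
    # Widen half precision so downstream code handles a single dtype.
    return embs.astype(np.float32) if embs.dtype == np.float16 else embs
```

On the Swift side the analogous branch would inspect the output MLMultiArray's data type before reading, but the control flow is the same: one rename-aware lookup, one dtype check, one widening conversion.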
VAD Benchmark Results

Performance Comparison
Dataset Details

✅: Average F1-Score above 70%
Parakeet EOU Benchmark Results ✅

Status: Benchmark passed

Performance Metrics
Streaming Metrics

Test runtime: 1m4s • 01/25/2026, 08:34 PM EST
RTFx = Real-Time Factor (higher is better) • Processing includes: model inference, audio preprocessing, state management, and file I/O
Force-pushed from 69d642d to 06b8232
Speaker Diarization Benchmark Results

Speaker Diarization Performance: evaluating "who spoke when" detection accuracy
Diarization Pipeline Timing Breakdown: time spent in each stage of speaker diarization
Speaker Diarization Research Comparison: research baselines typically achieve 18-30% DER on standard datasets

Note: RTFx shown above is from the GitHub Actions runner. On Apple Silicon with ANE:
🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 60.5s diarization time • Test runtime: 2m 15s • 01/25/2026, 08:40 PM EST
Sortformer High-Latency Benchmark Results

ES2004a Performance (30.4s latency config)

Sortformer High-Latency • ES2004a • Runtime: 2m 33s • 2026-01-26T01:37:22.684Z
Offline VBx Pipeline Results

Speaker Diarization Performance (VBx Batch Mode): optimal clustering with the Hungarian algorithm for maximum accuracy
Offline VBx Pipeline Timing Breakdown: time spent in each stage of batch diarization
Speaker Diarization Research Comparison: offline VBx achieves competitive accuracy with batch processing

Pipeline Details:
🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 395.6s processing • Test runtime: 7m 44s • 01/25/2026, 08:48 PM EST
ASR Benchmark Results ✅

Status: All benchmarks passed

Parakeet v3 (multilingual)
Parakeet v2 (English-optimized)
Streaming (v3)
Streaming (v2)

Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming
25 files per dataset • Test runtime: 7m11s • 01/25/2026, 08:40 PM EST
RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Expected RTFx Performance on Physical M1 Hardware: M1 Mac: ~28x (clean), ~25x (other)
Testing methodology follows the HuggingFace Open ASR Leaderboard
Force-pushed from 06b8232 to 15d136c
macOS 26 introduced stricter BNNS compiler validation that rejects CoreML models where input and output tensors share the same name.

Changes:
- SortformerModelInference.swift: Read from renamed outputs (chunk_pre_encoder_embs_out, chunk_pre_encoder_lengths_out)
- SortformerModelInference.swift: Handle Float16 output from head module
- ModelNames.swift: Update to download V2 models from HuggingFace
- SortformerTypes.swift: Remove unused nestEncoderDims property

The V2 models were converted with explicit identity ops to create distinct output tensors, fixing the BNNS compiler error: "Function main has tensor chunk_pre_encoder_embs as both an input and output. Inputs and outputs must be distinct."

Fixes #265
Force-pushed from 15d136c to df8a963
hi, this does not compile when using Mac Catalyst:
hi @Josscii, in that case can you create an issue for Mac Catalyst?
Summary
Problem
macOS 26 introduced stricter validation in the BNNS graph compiler that rejects CoreML models where input and output tensors share the same name:

"Function main has tensor chunk_pre_encoder_embs as both an input and output. Inputs and outputs must be distinct."
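The failure condition reduces to a set intersection between a model's input and output tensor names. A dependency-free sketch of that check (the helper is illustrative, not part of FluidAudio):

```python
def shared_io_names(input_names, output_names):
    """Return tensor names that appear as both a model input and output."""
    return sorted(set(input_names) & set(output_names))

# V1 Sortformer pre-encoder: shared name, rejected by the macOS 26 BNNS compiler
print(shared_io_names(["chunk_pre_encoder_embs"],
                      ["chunk_pre_encoder_embs"]))      # ['chunk_pre_encoder_embs']

# V2 models: the '_out' suffix makes every output name distinct, so this is empty
print(shared_io_names(["chunk_pre_encoder_embs"],
                      ["chunk_pre_encoder_embs_out"]))  # []
```

Any non-empty result from such a check flags a model that macOS 26 will refuse to compile.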
Solution
Code Changes
SortformerModelInference.swift:
- Read from renamed outputs (chunk_pre_encoder_embs_out, chunk_pre_encoder_lengths_out)
- Handle Float16 output from the head module

ModelNames.swift:
- Download the V2 models (SortformerV2, SortformerNvidiaLowV2, SortformerNvidiaHighV2)

Model Changes (separate PR)

V2 models converted with:
- Explicit identity ops (x + 0.0) to create distinct output tensors with the *_out suffix

See: FluidInference/mobius#11
Testing
Tested on macOS 26.1 (Build 25B78), Apple M2:
Fixes #265