v0.3.1
[0.3.1] - 2026-03-11
Added
- M5 / Apple10 device detection: GPU family `Apple10` with architecture generation 17, NAX (Neural Accelerators in GPU) availability flag, and NAX-aware tile size tuning (M5 Max/Ultra get 128×64×32)
- UltraFusion topology detection: `sysctl hw.packages` detects multi-die Ultra chips; `is_ultra_fusion` and `die_count` fields on `DeviceProperties`
- GPU and ANE core count estimation: Per-chip core counts derived from device name and tier, with UltraFusion die multiplication
- Memory bandwidth estimation: Tier + GPU family lookup table for estimated bandwidth (GB/s)
- ANE performance stats API: `evaluate_with_stats()` on `AneModel` uses `_ANEPerformanceStats` with `hwExecutionTime` for nanosecond-precision hardware timing
- TUI device tab enhancements: GPU core counts (with per-die breakdown for Ultra), ANE core counts, memory bandwidth, architecture generation, NAX and UltraFusion feature flags
- `crates/pmetal/README.md`: Crate-level README with feature flags table, quick start examples, hardware support summary, and re-export reference
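
As a rough illustration of the UltraFusion detection and die multiplication described above, the sketch below parses the output of `sysctl -n hw.packages` and scales a per-die core count. It is not the crate's actual code: the `Topology` struct and the function names `topology_from_sysctl`, `query_topology`, and `total_cores` are hypothetical, and only the `die_count` and `is_ultra_fusion` field names come from these notes.

```rust
use std::process::Command;

/// Hypothetical stand-in for the topology fields named in these notes.
/// The real `DeviceProperties` struct carries more fields and may use
/// different types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Topology {
    pub die_count: u32,
    pub is_ultra_fusion: bool,
}

/// Parse the text printed by `sysctl -n hw.packages` (a bare integer).
/// Empty, unparsable, or zero values fall back to a single die.
pub fn topology_from_sysctl(output: &str) -> Topology {
    let die_count = output
        .trim()
        .parse::<u32>()
        .ok()
        .filter(|&n| n > 0)
        .unwrap_or(1);
    Topology {
        die_count,
        is_ultra_fusion: die_count > 1,
    }
}

/// Query the live value. If `sysctl` is missing or fails (for example on a
/// non-macOS host), assume a single die.
pub fn query_topology() -> Topology {
    let result = Command::new("sysctl").arg("-n").arg("hw.packages").output();
    match result {
        Ok(out) if out.status.success() => {
            topology_from_sysctl(&String::from_utf8_lossy(&out.stdout))
        }
        _ => topology_from_sysctl(""),
    }
}

/// Scale a per-die core count by the number of dies.
pub fn total_cores(per_die: u32, topo: Topology) -> u32 {
    per_die * topo.die_count
}
```

Keeping the parsing separate from the subprocess call lets the fallback logic be exercised without Apple hardware. Any per-die count passed to `total_cores` would come from the crate's own lookup, which is not reproduced here.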
Fixed
- `AppleGPUFamily::Unknown` ordering bug: `Unknown` was declared last in the enum, causing derived `Ord` to rank it above `Apple10`, so unknown GPUs incorrectly got `has_dynamic_caching`, `has_nax`, etc. set to `true`. Fixed by moving `Unknown` to first position
- Future chip name collision: `name.contains("M1")` matched "M10"; replaced with `has_chip_id()` that checks the character after the match isn't a digit
- Dead `sysctl` subprocess in `query_memory_bandwidth`: Spawned `sysctl` whose result was discarded; removed and renamed to `estimate_memory_bandwidth()` using tier-based lookup
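
The first two fixes above can be sketched as follows. This is a minimal reconstruction, not the crate's code. The intermediate variants `Apple7` through `Apple9`, the `>= Apple10` threshold in `has_nax`, and the two-argument signature of `has_chip_id` are all assumptions made for illustration.

```rust
/// Simplified stand-in for the crate's GPU family enum. Derived `Ord` follows
/// declaration order, so `Unknown` must come first to rank below every known
/// family. The intermediate variants listed here are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AppleGPUFamily {
    Unknown,
    Apple7,
    Apple8,
    Apple9,
    Apple10,
}

/// Hypothetical feature gate built on the derived ordering. With `Unknown`
/// declared last, this comparison would wrongly return `true` for it.
pub fn has_nax(family: AppleGPUFamily) -> bool {
    family >= AppleGPUFamily::Apple10
}

/// Return `true` if `name` contains `id` at a position where the next
/// character is not an ASCII digit, so "M1" matches "Apple M1 Max" but not
/// "Apple M10".
pub fn has_chip_id(name: &str, id: &str) -> bool {
    if id.is_empty() {
        return false;
    }
    name.match_indices(id).any(|(start, matched)| {
        let rest = &name[start + matched.len()..];
        // Accept the match when the string ends or the next char is not a digit.
        !rest.chars().next().map_or(false, |c| c.is_ascii_digit())
    })
}
```

Checking the character that follows the match keeps a plain substring search while excluding names such as a future "M10", which `name.contains("M1")` would have accepted.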
Improved
- README updates: Root README now documents hardware support matrix (M1–M5), 9 TUI tabs (was 7), 16 crates (was 15), all fused Metal kernels (GDN, SwiGLU, RMSNorm+LoRA), ANE perf stats and M1–M5 compatibility
- Hardware support docs: Complete M1–M5 chip matrix with arch gen, core counts, bandwidth, ANE TFLOPS measurements; NAX kernel integration roadmap; UltraFusion distributed roadmap
Full Changelog: v0.3.0...v0.3.1