
v0.3.1


github-actions released this on 11 Mar 15:43

[0.3.1] - 2026-03-11

Added

  • M5 / Apple10 device detection: Recognizes GPU family Apple10 (architecture generation 17), adds a NAX (Neural Accelerators in GPU) availability flag, and adds NAX-aware tile size tuning (M5 Max/Ultra use 128×64×32 tiles)
  • UltraFusion topology detection: The hw.packages sysctl is used to detect multi-die Ultra chips; new is_ultra_fusion and die_count fields on DeviceProperties
  • GPU and ANE core count estimation: Per-chip core counts are derived from the device name and tier, then multiplied by the die count on UltraFusion chips
  • Memory bandwidth estimation: Estimated memory bandwidth (GB/s) is looked up from a table keyed by chip tier and GPU family
  • ANE performance stats API: The new evaluate_with_stats() method on AneModel reads hwExecutionTime from _ANEPerformanceStats to report hardware execution time with nanosecond precision
  • TUI device tab enhancements: GPU core counts (with per-die breakdown for Ultra), ANE core counts, memory bandwidth, architecture generation, NAX and UltraFusion feature flags
  • crates/pmetal/README.md: Crate-level README with feature flags table, quick start examples, hardware support summary, and re-export reference
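The UltraFusion detection and die multiplication described above can be sketched as follows. This is a minimal illustration, not the crate's code: it assumes the caller has already captured the output of `sysctl -n hw.packages` as a string, and the trimmed-down DeviceProperties struct, parse_die_count(), topology(), and the per-die core count parameter are all hypothetical.

```rust
/// Hypothetical subset of the DeviceProperties fields named in the changelog.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeviceProperties {
    pub die_count: u32,
    pub is_ultra_fusion: bool,
    pub gpu_cores: u32,
}

/// Parses captured `sysctl -n hw.packages` output (e.g. "2\n").
/// Falls back to a single die if the value is missing, zero, or malformed.
pub fn parse_die_count(sysctl_output: &str) -> u32 {
    sysctl_output
        .trim()
        .parse::<u32>()
        .ok()
        .filter(|&n| n > 0)
        .unwrap_or(1)
}

/// Builds topology fields from the die count and an assumed per-die GPU core count.
/// Multi-die packages are treated as UltraFusion parts.
pub fn topology(sysctl_output: &str, gpu_cores_per_die: u32) -> DeviceProperties {
    let die_count = parse_die_count(sysctl_output);
    DeviceProperties {
        die_count,
        is_ultra_fusion: die_count > 1,
        // Total cores scale linearly with the number of fused dies.
        gpu_cores: gpu_cores_per_die * die_count,
    }
}
```

For example, topology("2\n", 38) yields die_count 2, is_ultra_fusion true, and 76 GPU cores. Defaulting to one die on bad input keeps single-die Macs working even if the sysctl read fails.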
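A tier + GPU family bandwidth lookup like the one described above might look like this sketch. The enums, variant set, and signature are hypothetical; the values are Apple's published peak unified-memory bandwidth figures for a few M1 (Apple7) and M2 (Apple8) configurations, not the crate's actual table, and combinations the sketch does not cover return None so a caller can fall back.

```rust
/// Abridged, hypothetical GPU family enum (M1 = Apple7, M2 = Apple8).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuFamily {
    Apple7,
    Apple8,
}

/// Hypothetical chip tier enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Base,
    Pro,
    Max,
    Ultra,
}

/// Estimated peak memory bandwidth in GB/s from a static (family, tier) table.
/// No subprocess is spawned; this is a pure lookup.
pub fn estimate_memory_bandwidth(family: GpuFamily, tier: Tier) -> Option<u32> {
    use GpuFamily::*;
    use Tier::*;
    match (family, tier) {
        (Apple7, Base) => None, // not covered by this sketch
        (Apple8, Base) => Some(100),
        (Apple7, Pro) | (Apple8, Pro) => Some(200),
        (Apple7, Max) | (Apple8, Max) => Some(400),
        (Apple7, Ultra) | (Apple8, Ultra) => Some(800),
    }
}
```

Because the match is exhaustive over both enums, adding a new family (such as Apple10) forces every tier to be considered at compile time.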
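Once a nanosecond hardware time is available from evaluate_with_stats(), achieved throughput follows from TFLOPS = flops / (ns × 10⁻⁹) / 10¹² = flops / (ns × 10³). The sketch below assumes the hwExecutionTime value has already been copied into a plain u64; the HwTiming struct and its methods are hypothetical and not part of the crate's API.

```rust
/// Hypothetical holder for a hardware timing sample in nanoseconds,
/// e.g. the hwExecutionTime reported by _ANEPerformanceStats.
#[derive(Debug, Clone, Copy)]
pub struct HwTiming {
    pub hw_execution_time_ns: u64,
}

impl HwTiming {
    /// Execution time converted to milliseconds.
    pub fn millis(&self) -> f64 {
        self.hw_execution_time_ns as f64 / 1e6
    }

    /// Achieved TFLOPS for a model performing `flops` operations per evaluation.
    /// Returns None for a zero-duration sample to avoid dividing by zero.
    pub fn tflops(&self, flops: u64) -> Option<f64> {
        if self.hw_execution_time_ns == 0 {
            return None;
        }
        // flops / (ns * 1e-9 s) / 1e12 simplifies to flops / (ns * 1e3).
        Some(flops as f64 / (self.hw_execution_time_ns as f64 * 1e3))
    }
}
```

For instance, 10¹⁰ operations completing in 1,000,000 ns (1 ms) corresponds to 10 TFLOPS.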

Fixed

  • AppleGPUFamily::Unknown ordering bug: Unknown was declared last in the enum, so the derived Ord ranked it above Apple10, and unknown GPUs incorrectly had has_dynamic_caching, has_nax, etc. set to true. Fixed by moving Unknown to the first position
  • Future chip name collision: name.contains("M1") also matched "M10"; replaced with has_chip_id(), which checks that the character after the match is not a digit
  • Dead sysctl subprocess in query_memory_bandwidth: The function spawned a sysctl process whose result was discarded; the call was removed and the function renamed to estimate_memory_bandwidth(), which uses a tier-based lookup
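The Unknown ordering bug comes from how Rust derives Ord on fieldless enums: variants compare by declaration order, so whichever variant is declared last ranks highest. The sketch below uses an abridged, hypothetical version of AppleGPUFamily (only Apple7 through Apple10) with feature gates written as `>=` comparisons; the Apple9 threshold for dynamic caching reflects its introduction with M3, and BuggyOrder reproduces the pre-fix layout.

```rust
/// Abridged, hypothetical family enum. Unknown is declared first so that the
/// derived Ord ranks it below every known family.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AppleGPUFamily {
    Unknown,
    Apple7,
    Apple8,
    Apple9,
    Apple10,
}

impl AppleGPUFamily {
    /// Dynamic caching gate, assumed to start at Apple9 (M3).
    pub fn has_dynamic_caching(self) -> bool {
        self >= AppleGPUFamily::Apple9
    }

    /// NAX gate, available from Apple10 (M5) per this release.
    pub fn has_nax(self) -> bool {
        self >= AppleGPUFamily::Apple10
    }
}

/// Pre-fix layout: Unknown declared last, so it compares greater than Apple10
/// and passes every `>=` feature gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BuggyOrder {
    Apple10,
    Unknown,
}
```

Placing Unknown first makes every `>=` gate fail closed for unrecognized hardware, which is the safer default.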
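A has_chip_id() check matching the description above (the character after the match must not be a digit) could be written as the sketch below. The two-argument signature is an assumption; the crate's actual helper may differ. It scans every occurrence, so a name containing both "M10" and "M1" still matches "M1".

```rust
/// Returns true if `name` contains `chip` (e.g. "M1") at a position not
/// immediately followed by another ASCII digit, so "M1" does not match "M10".
pub fn has_chip_id(name: &str, chip: &str) -> bool {
    name.match_indices(chip).any(|(start, matched)| {
        // Look at the first character after this occurrence, if any.
        name[start + matched.len()..]
            .chars()
            .next()
            .map_or(true, |c| !c.is_ascii_digit())
    })
}
```

For example, has_chip_id("Apple M1 Pro", "M1") is true while has_chip_id("Apple M10", "M1") is false.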

Improved

  • README updates: The root README now documents the hardware support matrix (M1–M5), 9 TUI tabs (previously 7), 16 crates (previously 15), all fused Metal kernels (GDN, SwiGLU, RMSNorm+LoRA), ANE performance stats, and M1–M5 compatibility
  • Hardware support docs: Complete M1–M5 chip matrix with arch gen, core counts, bandwidth, ANE TFLOPS measurements; NAX kernel integration roadmap; UltraFusion distributed roadmap

Full Changelog: v0.3.0...v0.3.1