Benchmark Details
This page describes the workloads used in each benchmark category. All tests are native Swift and run locally.
Single-threaded CPU performance across mixed workloads.
- What: 64-bit integer arithmetic
- How: tight loop with add, multiply, shift, and XOR
- Metric: Mops/s (millions of operations per second)
- What: double-precision math
- How: multiply, sqrt, sin/cos operations
- Metric: Mops/s
- What: vectorized operations using vDSP
- How: vector multiply-add, vector add, dot product
- Metric: GFLOPS
- What: AES-256-GCM encryption + SHA-256 hashing
- How: encrypt a data buffer and hash ciphertext
- Metric: MB/s throughput
- What: LZFSE compression + decompression
- How: compress and decompress a buffer
- Metric: MB/s throughput (combined)
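As an illustration of the integer workload described above (a tight loop with add, multiply, shift, and XOR, reported in Mops/s), here is a minimal sketch. The function names, the constants, and the choice to count four operations per iteration are assumptions made for this example, not the tool's actual kernel or accounting.

```swift
import Dispatch

/// Runs `iterations` rounds of add, multiply, shift, and XOR on a 64-bit value.
/// The final value is returned so the optimizer cannot discard the loop.
func integerKernel(iterations: Int) -> UInt64 {
    var x: UInt64 = 0x9E37_79B9_7F4A_7C15          // arbitrary non-zero seed (assumption)
    for i in 0..<iterations {
        x = x &+ UInt64(truncatingIfNeeded: i)     // wrapping add
        x = x &* 6_364_136_223_846_793_005         // wrapping multiply
        x ^= x >> 29                               // shift, then XOR
    }
    return x
}

/// Times the kernel and converts the result to millions of operations per second.
/// Assumes four operations per iteration (add, multiply, shift, XOR).
func measureIntegerMops(iterations: Int) -> (mops: Double, checksum: UInt64) {
    let start = DispatchTime.now().uptimeNanoseconds
    let checksum = integerKernel(iterations: iterations)
    let elapsedNs = DispatchTime.now().uptimeNanoseconds - start
    let seconds = max(Double(elapsedNs), 1) / 1e9  // avoid dividing by zero
    let mops = Double(iterations) * 4 / seconds / 1e6
    return (mops, checksum)
}
```

Returning the checksum alongside the rate keeps the loop's result observable, which is one common way to stop the compiler from eliminating the work in optimized builds.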
Same tests as single-core, executed in parallel across all available CPU cores.
- Uses Swift TaskGroup for parallel execution
- Totals throughput across tasks
- Score is normalized by core count (see Scoring Methodology)
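The parallel pattern described above (one task per core, with throughput summed across tasks) could look roughly like the following sketch. `totalThroughput` and `availableCoreCount` are hypothetical names; the normalization step is left out because its formula is defined in Scoring Methodology, not here.

```swift
import Foundation

/// Number of cores currently available to the process.
func availableCoreCount() -> Int {
    ProcessInfo.processInfo.activeProcessorCount
}

/// Runs `work` once per task in parallel and sums the throughput each task reports.
func totalThroughput(taskCount: Int,
                     work: @escaping @Sendable () -> Double) async -> Double {
    await withTaskGroup(of: Double.self, returning: Double.self) { group in
        for _ in 0..<taskCount {
            group.addTask { work() }
        }
        var total = 0.0
        for await value in group {   // results arrive in completion order
            total += value
        }
        return total
    }
}
```

A caller would typically pass `availableCoreCount()` as `taskCount` and a closure that runs one single-core workload and returns its measured rate.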
Unified memory subsystem performance.
- Linear read of a 256 MB buffer
- Page-aligned allocation with optional mlock
- Metric: GB/s
- Linear write of a 256 MB buffer
- Page-aligned allocation with optional mlock
- Metric: GB/s
- memcpy between two 256 MB buffers
- Page-aligned allocation with optional mlock
- Metric: GB/s
- Pointer-chase random access
- 32 MB working set, 10M accesses
- Metric: ns (lower is better)
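The pointer-chase latency test above can be sketched as follows. This is an assumption-laden illustration, not the tool's code: it builds a single-cycle random permutation with Sattolo's algorithm so every access depends on the previous one, and the names `makeChaseTable` and `chase` are hypothetical.

```swift
import Dispatch

/// Builds a random permutation consisting of exactly one cycle (Sattolo's algorithm),
/// so following next[i] visits every slot before returning to the start.
func makeChaseTable(count: Int) -> [Int] {
    var next = Array(0..<count)
    var rng = SystemRandomNumberGenerator()
    var i = count - 1
    while i > 0 {
        let j = Int.random(in: 0..<i, using: &rng)  // j < i is what guarantees one cycle
        next.swapAt(i, j)
        i -= 1
    }
    return next
}

/// Follows the chain for `accesses` dependent loads and reports average ns per access.
/// The final index is returned so the loads cannot be optimized away.
func chase(table: [Int], accesses: Int) -> (nsPerAccess: Double, last: Int) {
    precondition(!table.isEmpty && accesses > 0)
    var index = 0
    let start = DispatchTime.now().uptimeNanoseconds
    table.withUnsafeBufferPointer { buffer in
        for _ in 0..<accesses {
            index = buffer[index]                   // each load depends on the previous one
        }
    }
    let elapsedNs = DispatchTime.now().uptimeNanoseconds - start
    return (Double(elapsedNs) / Double(accesses), index)
}
```

With 8-byte `Int` entries, a 32 MB working set corresponds to 4,194,304 slots, and the 10M accesses quoted above would be passed as `accesses: 10_000_000`.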
Storage performance with cache bypass. Patterns are NovaBench-compatible for direct comparison.
We align with NovaBench patterns to enable meaningful cross-tool comparisons:
- Sequential: 4 MB blocks (NovaBench uses up to 8 simultaneous)
- Random: 4 KB blocks, QD1 (one operation at a time)
- All metrics in MB/s for consistency
- Creates a unique temp directory per run
- Uses O_NOFOLLOW to prevent symlink attacks
- Uses F_NOCACHE to bypass the filesystem cache
- Uses F_FULLFSYNC to force sync to physical media
- 4 MB chunks into a 256 MB (quick) or 512 MB (normal) file
- Cache bypass + final sync
- Metric: MB/s
- 4 MB chunks from a 256 MB (quick) or 512 MB (normal) file
- File is synced before reading to ensure cold read
- Metric: MB/s
- 4 KB writes at random offsets in a 256 MB sparse file
- QD1 pattern (one I/O at a time)
- Metric: MB/s (converted from IOPS × 4KB / 1MB)
- 4 KB reads at random offsets in a 256 MB file
- QD1 pattern (one I/O at a time)
- Metric: MB/s (converted from IOPS × 4KB / 1MB)
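The cache-bypass write path described above (`O_NOFOLLOW`, `F_NOCACHE`, and a final `F_FULLFSYNC`) might be sketched like this on macOS. The function name `writeUncached`, the open flags other than `O_NOFOLLOW`, and the file mode are assumptions for illustration; creation of the per-run temp directory is not shown.

```swift
#if canImport(Darwin)
import Darwin

/// Writes `chunkCount` copies of `chunk` to `path` with the filesystem cache bypassed,
/// then forces the data to physical media. Returns false on any failure.
func writeUncached(path: String, chunk: [UInt8], chunkCount: Int) -> Bool {
    // O_NOFOLLOW makes open() fail if the final path component is a symlink.
    let fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_NOFOLLOW, 0o600)
    guard fd >= 0 else { return false }
    defer { close(fd) }

    // F_NOCACHE asks the kernel not to keep this file's data in the buffer cache.
    guard fcntl(fd, F_NOCACHE, 1) != -1 else { return false }

    for _ in 0..<chunkCount {
        let written = chunk.withUnsafeBytes { write(fd, $0.baseAddress, $0.count) }
        guard written == chunk.count else { return false }  // short writes count as failure here
    }

    // F_FULLFSYNC flushes the drive's own write cache, not just the OS buffers.
    return fcntl(fd, F_FULLFSYNC) != -1
}
#endif
```

For the sequential write test, `chunk` would be a 4 MB buffer and `chunkCount` would be 64 (256 MB) or 128 (512 MB), with the timer wrapped around the whole call so the final sync is included.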
| Parameter | Quick Mode | Normal Mode |
|---|---|---|
| Sequential file size | 256 MB | 512 MB |
| Sequential chunk size | 4 MB | 4 MB |
| Random file size | 512 MB | 1 GB |
| Random block size | 4 KB | 4 KB |
| Random operations | 500 | 2,000 |
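To make the random-I/O conversion concrete (IOPS × 4 KB / 1 MB), here is a small worked sketch. It assumes binary units (4 KB = 4,096 bytes, 1 MB = 1,048,576 bytes), which the page does not state explicitly; the function name is hypothetical.

```swift
/// Converts a count of completed random operations into MB/s.
/// Assumes 1 MB = 1,048,576 bytes.
func randomThroughputMBps(operations: Int,
                          blockSizeBytes: Int,
                          elapsedSeconds: Double) -> Double {
    let iops = Double(operations) / elapsedSeconds
    return iops * Double(blockSizeBytes) / 1_048_576.0
}

// Example: 2,000 operations of 4 KB completing in exactly one second.
let exampleMBps = randomThroughputMBps(operations: 2_000,
                                       blockSizeBytes: 4_096,
                                       elapsedSeconds: 1.0)
// exampleMBps is 7.8125
```

Because each QD1 operation moves only 4 KB, even several thousand IOPS translates into a modest MB/s figure, which is worth keeping in mind when reading random results next to sequential ones.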
| Metric | Our Tool (M1) | NovaBench (M1) | Why Different |
|---|---|---|---|
| Seq Read | ~2180 MB/s | 3356 MB/s | We use F_NOCACHE |
| Seq Write | ~700 MB/s | 3279 MB/s | We use F_FULLFSYNC |
| Rand Read | ~43 MB/s | 166 MB/s | 1 GB file exceeds cache |
| Rand Write | ~17 MB/s | 761 MB/s | F_NOCACHE + final sync |
Note: Our tool bypasses the filesystem cache and syncs to physical media, so it measures actual disk performance rather than cache throughput. NovaBench numbers include filesystem cache effects, which makes them higher but less representative of sustained I/O performance.
Integrated GPU compute benchmarks using Metal.
- Dense matrix multiplication
- Size: 1024x1024 (quick) or 2048x2048 (normal)
- Metric: GFLOPS
- N-body style simulation
- Particles: 100,000 (quick) or 1,000,000 (normal)
- Metric: Mparts/s
- 5x5 convolution on an image
- Size: 2048x2048 (quick) or 4096x4096 (normal)
- Metric: MP/s
- Sobel filter on the same image size
- Metric: MP/s
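For the matrix test, a GFLOPS figure is commonly derived by counting 2·N³ floating-point operations for an N×N multiply (N³ multiplies plus N³ adds). The sketch below uses that convention as an assumption, since the page does not say how the tool counts operations, and pairs it with a naive CPU reference that could be used to check GPU output on small sizes. Both function names are hypothetical.

```swift
/// GFLOPS for a dense N×N matrix multiply, assuming 2·N³ floating-point operations.
func matmulGFLOPS(n: Int, elapsedSeconds: Double) -> Double {
    let flops = 2.0 * Double(n) * Double(n) * Double(n)
    return flops / elapsedSeconds / 1e9
}

/// Naive row-major reference multiply, C = A × B, intended only for validating small cases.
func referenceMatmul(_ a: [Float], _ b: [Float], n: Int) -> [Float] {
    precondition(a.count == n * n && b.count == n * n)
    var c = [Float](repeating: 0, count: n * n)
    for i in 0..<n {
        for k in 0..<n {
            let aik = a[i * n + k]
            for j in 0..<n {
                c[i * n + j] += aik * b[k * n + j]
            }
        }
    }
    return c
}
```

Under this convention a 2048×2048 multiply is about 17.2 billion operations, so a run that finishes in 10 ms would report roughly 1,718 GFLOPS.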
| Parameter | Quick Mode | Normal Mode |
|---|---|---|
| Matrix size | 1024×1024 | 2048×2048 |
| Particle count | 100,000 | 1,000,000 |
| Image size (blur/edge) | 2048×2048 | 4096×4096 |
- Metal shaders are compiled at runtime from inline source (SPM compatible)
- Texture data is heap-allocated to avoid stack overflow on large images (v2.1.1+)
- Uses deterministic patterns for reproducibility
- If Metal is unavailable, GPU tests return 0 and are scored as "Failed"
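Compiling a shader at runtime from inline source, with a graceful fallback when Metal is unavailable, can be sketched as below. The kernel `scale`, the constant `kernelSource`, and the function `makePipeline` are made up for this example and are not the tool's shaders.

```swift
#if canImport(Metal)
import Metal

// Inline Metal Shading Language source; no precompiled .metallib is needed.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void scale(device float *data [[buffer(0)]],
                  uint id [[thread_position_in_grid]]) {
    data[id] = data[id] * 2.0f;
}
"""

/// Returns a compute pipeline for the inline kernel, or nil if Metal is
/// unavailable or compilation fails.
func makePipeline() -> MTLComputePipelineState? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }
    do {
        let library = try device.makeLibrary(source: kernelSource, options: nil)
        guard let function = library.makeFunction(name: "scale") else { return nil }
        return try device.makeComputePipelineState(function: function)
    } catch {
        return nil
    }
}
#endif
```

A caller could treat a `nil` pipeline as the "Metal unavailable" case and report a result of 0 for that test.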
Neural Engine and machine learning inference benchmarks using CoreML and Accelerate.
The AI/ML score is not included in the Total Score. This mirrors Geekbench AI's approach:
- AI workloads have different characteristics than traditional benchmarks
- Neural Engine performance varies significantly between chip generations
- Users may care about AI performance independently from general compute
All CoreML tests use the same model (MobileNetV2 image classification) with different compute units:
- What: CoreML inference on the CPU only
- Compute Units: `.cpuOnly`
- Metric: IPS (inferences per second)
- Measures: CPU's ability to run ML models without GPU/ANE
- What: CoreML inference using GPU
- Compute Units: `.cpuAndGPU`
- Metric: IPS (inferences per second)
- Measures: GPU's ability to accelerate ML inference
- What: CoreML inference with Neural Engine
- Compute Units: `.all` (CoreML schedules to ANE when beneficial)
- Metric: IPS (inferences per second)
- Measures: ANE throughput for supported operations
- What: Matrix multiplication using vDSP
- Size: 512×512 (quick) or 1024×1024 (normal)
- Metric: GFLOPS
- Measures: CPU vector math performance for ML workloads
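Since the three CoreML tests above differ only in their compute units, the selection can be expressed as a small mapping onto `MLModelConfiguration`. The `InferenceTarget` enum and `configuration(for:)` function are hypothetical names for this sketch.

```swift
#if canImport(CoreML)
import CoreML

/// The three CoreML variants described above.
enum InferenceTarget: CaseIterable {
    case cpu, gpu, neuralEngine
}

/// Builds a model configuration that restricts CoreML to the requested compute units.
func configuration(for target: InferenceTarget) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    switch target {
    case .cpu:
        config.computeUnits = .cpuOnly
    case .gpu:
        config.computeUnits = .cpuAndGPU
    case .neuralEngine:
        config.computeUnits = .all   // CoreML may still place some layers on the CPU or GPU
    }
    return config
}
#endif
```

The resulting configuration would then be passed to `MLModel(contentsOf:configuration:)` together with the URL of the compiled model.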
The benchmark downloads a CoreML model from Apple's official ML assets on first run:
- Model: MobileNetV2 image classification (~17 MB source)
- Source: ml-assets.apple.com (Apple's official CoreML model repository)
- Compilation: Compiled locally using CoreML framework (no Xcode required)
- Cache: `~/Library/Application Support/osx-bench/models/`
- Offline mode: Use `--offline` to skip if model not cached
- Custom model: Use `--model-path` for local `.mlmodelc` models
| Parameter | Quick Mode | Normal Mode |
|---|---|---|
| Warmup iterations | 5 | 20 |
| Min iterations | 5 | 10 |
| Max iterations | 1,000 | 10,000 |
| BNNS matrix size | 512×512 | 1024×1024 |
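One plausible way the warmup, minimum, and maximum iteration counts in the table could combine is shown below. This is a guess at the control flow, labeled as such: it assumes untimed warmup runs, then timed runs that continue until a target duration has elapsed, but never fewer than the minimum or more than the maximum. `IterationPolicy`, `measureIPS`, and the `targetSeconds` parameter are hypothetical.

```swift
import Dispatch

/// Iteration limits for one inference benchmark (values as in the table above).
struct IterationPolicy {
    let warmup: Int
    let minIterations: Int
    let maxIterations: Int
}

/// Runs untimed warmup iterations, then timed iterations, and returns inferences per second.
/// Assumed stopping rule: stop once `targetSeconds` has elapsed and the minimum is met,
/// or when the maximum iteration count is reached.
func measureIPS(policy: IterationPolicy,
                targetSeconds: Double,
                inference: () -> Void) -> (ips: Double, iterations: Int) {
    for _ in 0..<policy.warmup {
        inference()                                  // warmup results are discarded
    }
    let start = DispatchTime.now().uptimeNanoseconds
    var iterations = 0
    var elapsed = 0.0
    while iterations < policy.maxIterations {
        inference()
        iterations += 1
        elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
        if iterations >= policy.minIterations && elapsed >= targetSeconds {
            break
        }
    }
    return (Double(iterations) / max(elapsed, 1e-9), iterations)
}
```

With the quick-mode values this would be called as `measureIPS(policy: IterationPolicy(warmup: 5, minIterations: 5, maxIterations: 1_000), targetSeconds: 3) { ... }`, where the closure performs one prediction.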
- Neural Engine availability depends on macOS version and chip
- CoreML decides layer scheduling; not all operations run on the ANE even with `.all`
- Input is synthetic (deterministic pattern for reproducibility)
- The methodology is comparable to Geekbench AI's (see the Geekbench AI Workloads paper)
Thermal state is tracked during a run using the macOS public API:
- Nominal
- Fair
- Serious
- Critical
If throttling is detected, scores may be lower than peak performance.
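The four states listed above match the cases of Foundation's `ProcessInfo.ThermalState`, so reading the current state might look like the sketch below. That this is the API the tool uses is an assumption, and `thermalLabel` is a hypothetical helper.

```swift
import Foundation

/// Maps a thermal state to the label used in the list above.
func thermalLabel(_ state: ProcessInfo.ThermalState) -> String {
    switch state {
    case .nominal:  return "Nominal"
    case .fair:     return "Fair"
    case .serious:  return "Serious"
    case .critical: return "Critical"
    @unknown default: return "Unknown"
    }
}

// Sample the state at any point during a run.
let currentThermalLabel = thermalLabel(ProcessInfo.processInfo.thermalState)
```

Sampling before and after each category makes it possible to note when a score was recorded under a state other than Nominal.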
| Mode | Flag | Duration | Use Case |
|---|---|---|---|
| Quick | --quick | ~3s per category | Fast iteration |
| Normal | default | 10s per category | Standard runs |
| Custom | -d N | N seconds | Tuned runs |
| Stress | --stress | ~60s per category | Sustained performance |