Cross-platform hash-algorithm benchmark suite. The same 32 algorithm sources compiled and timed natively on seven different Nintendo consoles — from the 1.79 MHz 6502 in the NES to the 93.75 MHz 64-bit MIPS VR4300 in the N64 — exposing how CPU architecture, ALU width, cache topology, and toolchain maturity each show up in real throughput numbers.
| Platform | Repo | CPU | Clock | Algos | Buffer | Toolchain |
|---|---|---|---|---|---|---|
| NES / Famicom | hash-bench-nes | MOS 6502 | 1.79 MHz | 18 | 64 B | cc65 |
| Game Boy / GBC | hash-bench-gb | Sharp SM83 (Z80-ish) | 4.19 MHz | 18 | 1024 B | GBDK-2020 / SDCC |
| Game Boy Advance | hash-bench-gba | ARM7TDMI | 16.78 MHz | 32 | 1024 B | devkitARM + libtonc |
| Nintendo DS | hash-bench-nds | ARM946E | 33.51 MHz | 32 | 1024 B | devkitARM + libnds |
| Nintendo DSi | hash-bench-dsi | ARM946E (DSi-mode) | 134.06 MHz | 32 | 1024 B | devkitARM + libnds (setCpuClock(true)) |
| Nintendo 3DS | hash-bench-3ds | ARM11 | 268 / 804 MHz | 32 | 1024 B | devkitARM + libctru |
| Nintendo 64 | hash-bench-n64 | NEC VR4300 (MIPS3, 64-bit) | 93.75 MHz | 32 | 1024 B | libdragon |
Experimental optimization branch (mini code golf on N64 perf): hash-bench-n64-optimized.
Three findings I didn't expect going in:
- SHA-512 beats SHA-256 on the N64 (1379 vs 778 KB/s). This is the only benchmarked platform where that holds, because the VR4300 is the only 64-bit-native CPU here. Every 32-bit platform has SHA-256 winning by ~2× because it synthesizes uint64 ops as register pairs. On the VR4300, `daddu` / `dsrlv` / `dxor` are single-cycle native, and SHA-512's 128-byte block amortizes round-constant cost over twice the input.
- cc65 + 6502 is roughly 100-1000× slower than ARM for `uint32_t` work. CRC-8 (one byte, one table lookup) runs at 14 ms/iter on the NES versus ~3 µs/iter on the GBA: same C source, same algorithm, but one CPU has native 32-bit add and the other doesn't.
- DSi-mode `setCpuClock(true)` gives a clean 4× speedup on every single algorithm vs DS-mode, with no source changes. The DSi has the same ARM946E die as the DS, just clocked at 134 MHz instead of 33, and the benchmark exposes it as the cleanest "what does clock speed alone buy you" comparison in the family.
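
The register-pair cost behind the SHA-512 finding can be made concrete with a small sketch. It is not taken from the benchmark sources: the `u64pair` type and the `add64_pair` / `rotr64_pair` / `split` / `join` helpers are hypothetical names, written only to show roughly what a 32-bit compiler has to emit for each `uint64_t` add and rotate in a SHA-512 round.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit value held in two 32-bit registers. */
typedef struct { uint32_t hi, lo; } u64pair;

/* Helpers to move between the native and the split representation. */
static u64pair split(uint64_t v)
{
    u64pair p;
    p.hi = (uint32_t)(v >> 32);
    p.lo = (uint32_t)v;
    return p;
}

static uint64_t join(u64pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add on a 32-bit ALU: two adds plus explicit carry propagation. */
static u64pair add64_pair(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
    return r;
}

/* 64-bit rotate right on a 32-bit ALU; this form is valid for 1 <= n <= 31. */
static u64pair rotr64_pair(u64pair x, unsigned n)
{
    u64pair r;
    r.lo = (x.lo >> n) | (x.hi << (32 - n));
    r.hi = (x.hi >> n) | (x.lo << (32 - n));
    return r;
}

/* The same rotate when the ALU is 64 bits wide; valid for 1 <= n <= 63. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}
```

Each pair operation costs several 32-bit instructions where a 64-bit ALU needs one or two, which is consistent with the direction of the SHA-256 vs SHA-512 gap reported above.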
Algorithm roster (full set; SM83 / NES omit the 64-bit-state algos that don't fit their ROM budget):
| Tier | Algorithms |
|---|---|
| Checksums (9) | CRC-8, CRC-16, CRC-32, CRC-64, Adler-32, Fletcher-{16,32,64}, Pearson-8 |
| Non-crypto (11) | DJB2, FNV-1a, Knuth, Jenkins-OAT, PJW/ELF, SDBM, Murmur3-32, Murmur3-128, xxHash32, xxHash64, SipHash-2-4 |
| Cryptographic (12) | MD4, MD5, SHA-1, RIPEMD-160, SHA-256, SHA-512, SHA-3-256, SHA-3-512, BLAKE2s, HMAC-SHA256, PBKDF2-HMAC-SHA256, AES-CBC-MAC |
Workload buffer: `buf[i] = (i * 31 + 7) & 0xFF` for `i` in `[0, buf_len)`. The pattern is identical on every platform, so digests cross-check.
HMAC key: ASCII `hash-bench-nds` + two zero bytes. PBKDF2 iterations: 1000. SipHash key: `0x00..0x0F`. xxHash seed: 0.
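
A minimal sketch of that workload setup follows. The function names (`fill_workload`, `make_hmac_key`, `make_siphash_key`) are hypothetical and the per-platform `main.c` files may build these inputs differently; only the byte patterns come from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill the workload buffer: buf[i] = (i * 31 + 7) & 0xFF. */
static void fill_workload(uint8_t *buf, size_t buf_len)
{
    size_t i;
    for (i = 0; i < buf_len; i++)
        buf[i] = (uint8_t)((i * 31u + 7u) & 0xFFu);
}

/* HMAC key: the 14 ASCII bytes of "hash-bench-nds" followed by two zero
 * bytes, 16 bytes in total. */
static void make_hmac_key(uint8_t key[16])
{
    memset(key, 0, 16);
    memcpy(key, "hash-bench-nds", 14);
}

/* SipHash key: the bytes 0x00, 0x01, ..., 0x0F. */
static void make_siphash_key(uint8_t key[16])
{
    unsigned i;
    for (i = 0; i < 16; i++)
        key[i] = (uint8_t)i;
}
```

Because 31 × 256 is a multiple of 256, the buffer pattern repeats every 256 bytes, so a 1024-byte buffer holds four copies of the same 256-byte sequence.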
Timing methodology varies per platform but converges on microsecond-resolution wall time:
| Platform | Timer source | Resolution |
|---|---|---|
| NES | cc65 `clock()` (NMI-driven) | ~20 ms (`CLOCKS_PER_SEC=50`) |
| GB / GBC | DIV register + LCD vblank counter | ~1 ms |
| GBA | TM0+TM1 cascade at SYSCLK/64 | ~3.8 µs |
| NDS / DSi | `cpuStartTiming(0)` at `BUS_CLOCK` | ~30 ns |
| 3DS | `svcGetSystemTick()` at `SYSCLOCK_ARM11` | ~3.7 ns |
| N64 | libdragon `get_ticks_us()` at COP0 count | 1 µs |
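
As a rough illustration of where the GBA row's ~3.8 µs figure comes from, the sketch below converts cascaded-timer ticks to microseconds. It assumes the GBA system clock is 16,777,216 Hz (2^24, the 16.78 MHz listed in the platform table) and that TM0+TM1 together are read as one 32-bit count. `gba_ticks_to_us` is a hypothetical helper name, not something taken from hash-bench-gba.

```c
#include <stdint.h>

#define GBA_SYSCLK_HZ 16777216u              /* assumed: 2^24 Hz */
#define GBA_TIMER_HZ  (GBA_SYSCLK_HZ / 64u)  /* SYSCLK/64 = 262144 Hz */

/* Convert a 32-bit tick count from the cascaded timers to microseconds.
 * The intermediate product is widened to 64 bits so it cannot overflow. */
static uint64_t gba_ticks_to_us(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000u) / GBA_TIMER_HZ;
}
```

One tick is 1,000,000 / 262,144 ≈ 3.81 µs, which the integer division rounds down to 3 for a single tick; over the thousands of ticks in a typical run the truncation error becomes negligible.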
Each platform reports iters / µs-per-iter / KB-per-second plus the
first 1-4 bytes of the last digest produced (digest IDs cross-check
the algorithm semantics across builds).
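
The two derived figures can be computed from the raw iteration count and elapsed time roughly as follows. This is a sketch under stated assumptions: `bench_result` and `summarize` are hypothetical names, and "KB" is taken to mean 1024 bytes, which the text does not specify.

```c
#include <stdint.h>

/* Hypothetical result record: the two derived figures each platform prints. */
typedef struct {
    uint32_t us_per_iter;
    uint32_t kb_per_sec;
} bench_result;

/* Derive µs-per-iter and KB-per-second from one timed run.
 * Assumes 1 KB = 1024 bytes; returns zeros if the run is empty. */
static bench_result summarize(uint32_t iters, uint32_t buf_len, uint64_t elapsed_us)
{
    bench_result r = { 0, 0 };
    if (iters == 0 || elapsed_us == 0)
        return r;
    r.us_per_iter = (uint32_t)(elapsed_us / iters);
    /* bytes/s = iters * buf_len * 1e6 / elapsed_us, then scale to KB. */
    r.kb_per_sec = (uint32_t)(((uint64_t)iters * buf_len * 1000000u)
                              / elapsed_us / 1024u);
    return r;
}
```

For example, 1000 iterations over a 1024-byte buffer in exactly one second works out to 1000 µs per iteration and 1000 KB/s under these assumptions.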
Crypto tier @ 1024-byte buffer, KB/s (higher = faster):
| Algorithm | NES* | GB | GBA | NDS | DSi | 3DS | N64 |
|---|---|---|---|---|---|---|---|
| MD4 | 3 | ~12 | ~440 | ~870 | ~3500 | ~12000 | 6372 |
| MD5 | 2 | ~8 | ~280 | ~560 | ~2240 | ~7800 | 1877 |
| SHA-1 | 1 | ~6 | ~220 | ~430 | ~1720 | ~5500 | 1185 |
| RIPEMD-160 | — | — | ~180 | ~360 | ~1440 | ~4400 | 764 |
| SHA-256 | — | — | ~140 | ~280 | ~1120 | ~3900 | 791 |
| SHA-512 | — | — | ~50 | ~110 | ~440 | ~1900 | 1379 ⚡ |
| SHA-3-256 | — | — | ~50 | ~100 | ~400 | ~1500 | 412 |
| BLAKE2s | — | — | ~190 | ~380 | ~1520 | ~5200 | 1692 |
| HMAC-S256 | — | — | ~70 | ~140 | ~560 | ~2000 | 622 |
| PBKDF2 | — | — | ~0.07 | ~0.14 | ~0.55 | ~2.0 | 27 |
| AES-CBC-MAC | — | — | ~40 | ~80 | ~320 | ~1100 | 217 |
* NES is 64-byte buffer, not 1024 — see hash-bench-nes README.
Numbers for GBA / NDS / DSi / 3DS are approximate (the child
projects' READMEs have firm measurements). N64 numbers are the
measured Ares baseline at commit 38322dd.
The ⚡ cell marks the SHA-512-wins-on-N64 anomaly. Every column to the left of N64 has SHA-256 ≥ SHA-512; only the 64-bit-native CPU flips that ordering.
Every child repo ships:
- A prebuilt ROM at the repo root and in `artifacts/` (most projects)
- A `build.bat` (Windows) and `Makefile` (Linux/macOS) that rebuild from source using only freely-distributed toolchains
- A `tests/verify.c` host-gcc cross-check (most projects) that compiles the algorithm `.c` files under gcc and dumps reference digests for the standard workload buffer
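
A host-side cross-check in the spirit of `tests/verify.c` might look like the sketch below. It is not the repository's file: the bitwise `crc32_ref` is a generic reflected CRC-32 (polynomial `0xEDB88320`) standing in for whichever algorithm sources the real test compiles, and `workload_crc32` and `dump_reference` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Generic bitwise CRC-32 (reflected, polynomial 0xEDB88320); a stand-in
 * for the project's own crc32 source, which is not reproduced here. */
static uint32_t crc32_ref(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* CRC-32 of the standard workload pattern; buf_len is clamped to 1024. */
static uint32_t workload_crc32(size_t buf_len)
{
    uint8_t buf[1024];
    size_t i;
    if (buf_len > sizeof buf)
        buf_len = sizeof buf;
    for (i = 0; i < buf_len; i++)
        buf[i] = (uint8_t)((i * 31u + 7u) & 0xFFu);
    return crc32_ref(buf, buf_len);
}

/* Print one reference line to compare against the ROM's on-screen digest. */
static void dump_reference(const char *name, uint32_t digest)
{
    printf("%-10s %08lX\n", name, (unsigned long)digest);
}
```

Calling `dump_reference("CRC-32", workload_crc32(1024))` from a host `main` would print a value to compare by eye against what the ROM displays. The real test presumably does this for every algorithm in the roster.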
The algorithm .c files are byte-identical across platforms wherever
both targets support the required word size. Specifically the 27 algos
in the full roster are shared verbatim between hash-bench-gba /
hash-bench-nds / hash-bench-dsi / hash-bench-3ds / hash-bench-n64;
the 14 that fit in SDCC SM83's runtime are also byte-identical with
hash-bench-gb; the 18 that don't need uint64_t are byte-identical with
hash-bench-nes.
Every screenshot in the per-platform READMEs was taken from a clean boot of an accurate emulator (Mesen2 for NES, mGBA for GB/GBC/GBA, melonDS 1.1 for NDS/DSi, Citra for 3DS, Ares for N64). Each row's iteration counts represent one ~200-500 ms run — the timer is quantised by the platform's native counter (see the "Timing methodology" table above), so individual runs vary by ~1-2% on every platform. Patterns that hold across multiple reruns and across multiple emulator versions get called out as "real"; single-run deltas at noise-floor magnitude get called out as "noise".
This matters because, as documented in the reverted optimization
experiment on hash-bench-n64, confidently-reasoned perf patches can
fail in non-obvious ways (e.g., `__attribute__((hot, flatten))`
reorganized the `.text` section enough to make SHA-3 self-conflict on
the VR4300's direct-mapped 16 KB I-cache, dropping it 47% with zero
source-level changes to `sha3.c`). The discipline is: measure first,
then write the patch to fit what the data says is actually the
bottleneck.
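
The self-conflict mechanism can be sketched with a few lines of address arithmetic. In a direct-mapped cache every address has exactly one slot, so two pieces of code whose addresses differ by a multiple of the cache size evict each other. The 16 KB size comes from the text above; the 32-byte line size and the helper names are assumptions made for illustration.

```c
#include <stdint.h>

#define ICACHE_BYTES (16u * 1024u)                 /* direct-mapped 16 KB */
#define ICACHE_LINE  32u                           /* assumed line size */
#define ICACHE_LINES (ICACHE_BYTES / ICACHE_LINE)  /* 512 slots */

/* Slot that the instruction at addr occupies in a direct-mapped cache. */
static uint32_t icache_index(uint32_t addr)
{
    return (addr / ICACHE_LINE) % ICACHE_LINES;
}

/* Two different lines conflict when they map to the same slot. */
static int icache_conflict(uint32_t a, uint32_t b)
{
    return (a / ICACHE_LINE) != (b / ICACHE_LINE)
        && icache_index(a) == icache_index(b);
}
```

Under these assumptions, a hot loop at `0x80001000` and a helper it calls at `0x80005000` (exactly 16 KB apart) share a slot and evict each other on every call. This is the kind of layout change a linker can introduce without any edit to the source file.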
Three reasons in increasing order of importance:
- It's a fun cross-architecture survey. Twenty-five years of Nintendo handheld + console CPUs side-by-side on one workload: a sort of museum exhibit in benchmark form.
- It's a real-world test bed for portable C. All ~30 algorithm files use only `<stdint.h>` and the local `hashes.h`; they compile cleanly under SDCC, GBDK, devkitARM (libgba / libnds / libctru), libdragon, cc65, and host gcc with no platform `#ifdef` chains inside the algorithm code itself. The `main.c` in each project is the only file that talks to its platform's BSP.
- It's the algorithmic foundation for the totp-gb / totp-gba projects: real-world TOTP authenticators (HMAC-SHA1 ÷ Unix-time on a Game Boy) that needed careful per-platform tuning of the same hash functions. The hash-bench projects are where each new algorithm gets stress-tested against a reference implementation before it's trusted in a security context.
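
To illustrate the portability constraint in the second reason, here is a minimal sketch of what an algorithm file restricted to `<stdint.h>` can look like. It uses the standard 32-bit FNV-1a offset basis and prime; the function name and signature are assumptions, since the actual `hashes.h` interface is not shown here.

```c
#include <stdint.h>

/* 32-bit FNV-1a using only fixed-width types, so the same source behaves
 * identically on 8-bit, 32-bit, and 64-bit targets. The uint16_t length is
 * an illustrative choice that keeps the loop counter cheap on 8-bit CPUs. */
uint32_t fnv1a32(const uint8_t *data, uint16_t len)
{
    uint32_t h = 0x811C9DC5u;  /* FNV offset basis */
    uint16_t i;
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x01000193u;      /* FNV prime */
    }
    return h;
}
```

Nothing in the function depends on `int` width, endianness, or a platform header, which is what lets a file like this compile unchanged under every toolchain in the list.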
- N64 perf golf: see hash-bench-n64-optimized. Targeted single-file unrolling on the algos that benefit (MD5, SHA-256, SHA-512, HMAC, PBKDF2). The reverted broad-sweep attempt on `hash-bench-n64@e675698` taught us why one-file-at-a-time is required on direct-mapped-cache architectures.
- PSP: first portable Nintendo-or-otherwise console with native WiFi (2004) and 32 MB RAM; it would close the "missing Sony branch" gap in this collection. Likely future addition.
- Saturn / Genesis / 32X / Sega CD: interesting because the SH-2 dual-CPU Saturn vs the 68000 + Z80 Genesis vs the 32X's added SH-2 pair would test what asymmetric coprocessors buy you on a per-algorithm basis. Backburner.
- Bitcoin miner / cold-wallet on a handheld: see the design doc at nds-wallet for a cold-wallet-on-DS proposal; the PSP is probably the better target for an actual miner (WiFi + 32 MB RAM + 333 MHz MIPS R4000).
`README.md`: this file (landing page + cross-platform story)
That's it. All the code, ROMs, and per-platform documentation live in the seven child repos linked at the top. This repo exists to be the discovery point: the thing that turns up in a web search for "benchmark sha256 nintendo ds n64" and routes the reader to the specific repo they want.
MIT, matching the child repos. Algorithm implementations are reference
public-domain (FIPS / RFC text) re-typed for `<stdint.h>` portability;
platform `main.c` files are MIT-licensed.