dmang-dev/hash-bench

hash-bench

Cross-platform hash-algorithm benchmark suite. The same 32 algorithm sources compiled and timed natively on seven different Nintendo consoles — from the 1.79 MHz 6502 in the NES to the 93.75 MHz 64-bit MIPS VR4300 in the N64 — exposing how CPU architecture, ALU width, cache topology, and toolchain maturity each show up in real throughput numbers.

| Platform | Repo | CPU | Clock | Algos | Buffer | Toolchain |
| --- | --- | --- | --- | --- | --- | --- |
| NES / Famicom | hash-bench-nes | MOS 6502 | 1.79 MHz | 18 | 64 B | cc65 |
| Game Boy / GBC | hash-bench-gb | Sharp SM83 (Z80-ish) | 4.19 MHz | 18 | 1024 B | GBDK-2020 / SDCC |
| Game Boy Advance | hash-bench-gba | ARM7TDMI | 16.78 MHz | 32 | 1024 B | devkitARM + libtonc |
| Nintendo DS | hash-bench-nds | ARM946E | 33.51 MHz | 32 | 1024 B | devkitARM + libnds |
| Nintendo DSi | hash-bench-dsi | ARM946E (DSi-mode) | 134.06 MHz | 32 | 1024 B | devkitARM + libnds (setCpuClock(true)) |
| Nintendo 3DS | hash-bench-3ds | ARM11 | 268 / 804 MHz | 32 | 1024 B | devkitARM + libctru |
| Nintendo 64 | hash-bench-n64 | NEC VR4300 (MIPS3, 64-bit) | 93.75 MHz | 32 | 1024 B | libdragon |

Experimental optimization branch (mini code golf on N64 perf): hash-bench-n64-optimized.


TL;DR

Three findings I didn't expect going in:

  1. SHA-512 beats SHA-256 on the N64 (1379 vs 778 KB/s). It is the only benchmarked platform where this is true, because the VR4300 is the only 64-bit-native CPU here. Every 32-bit platform has SHA-256 winning by roughly 2-3×, because the compiler has to synthesize each uint64_t operation from a pair of 32-bit registers. On the VR4300, daddu and dsrlv are native single-cycle doubleword instructions and the plain xor already operates on the full 64-bit register, and SHA-512's 128-byte block amortizes round-constant cost over twice the input.

  2. cc65 + 6502 is roughly 100-1000× slower than ARM for uint32_t work. CRC-8 (one byte, one table lookup) runs at 14 ms/iter on the NES versus ~3 µs/iter on the GBA — same C source, same algorithm, one CPU has native 32-bit add and the other doesn't.

  3. DSi-mode setCpuClock(true) gives a clean 4× speedup on every single algorithm vs DS-mode, no source changes. The DSi has the same ARM946E die as the DS — just clocked at 134 MHz instead of 33 — and the benchmark exposes it as the cleanest "what does clock speed alone buy you" comparison in the family.
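The register-pair cost behind finding 1 can be illustrated with a 64-bit rotate, the core operation of SHA-512's sigma functions. This is a minimal sketch, not code from the child repos: `rotr64` and `rotr64_pair` are hypothetical names, and the second function only approximates what a 32-bit compiler has to emit when it lowers the first.

```c
#include <stdint.h>

/* Native form: a 64-bit CPU needs two doubleword shifts and an OR.
 * n must be in 1..63. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64u - n));
}

/* Illustrative 32-bit lowering: the value lives in two 32-bit halves
 * and every rotate has to touch both. n must be in 1..63. */
static uint64_t rotr64_pair(uint32_t hi, uint32_t lo, unsigned n)
{
    uint32_t rhi, rlo;

    if (n >= 32u) {            /* rotating by 32 or more swaps the halves first */
        uint32_t t = hi;
        hi = lo;
        lo = t;
        n -= 32u;
    }
    if (n == 0u) {             /* avoid an undefined 32-bit shift by 32 */
        rhi = hi;
        rlo = lo;
    } else {
        rhi = (hi >> n) | (lo << (32u - n));
        rlo = (lo >> n) | (hi << (32u - n));
    }
    return ((uint64_t)rhi << 32) | rlo;
}
```

Both functions return the same value for every rotate amount in 1..63; the second simply makes visible the extra shifts, ORs, and the branch that a 32-bit target pays on each 64-bit rotate.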


What's measured

Algorithm roster (full set; SM83 / NES omit the 64-bit-state algos that don't fit their ROM budget):

| Tier | Algorithms |
| --- | --- |
| Checksums (9) | CRC-8, CRC-16, CRC-32, CRC-64, Adler-32, Fletcher-{16,32,64}, Pearson-8 |
| Non-crypto (11) | DJB2, FNV-1a, Knuth, Jenkins-OAT, PJW/ELF, SDBM, Murmur3-32, Murmur3-128, xxHash32, xxHash64, SipHash-2-4 |
| Cryptographic (12) | MD4, MD5, SHA-1, RIPEMD-160, SHA-256, SHA-512, SHA-3-256, SHA-3-512, BLAKE2s, HMAC-SHA256, PBKDF2-HMAC-SHA256, AES-CBC-MAC |

Workload buffer: buf[i] = (i * 31 + 7) & 0xFF for i in [0, buf_len). Identical pattern on every platform so digests cross-check. HMAC key: ASCII hash-bench-nds + two zero bytes. PBKDF2 iterations: 1000. SipHash key: 0x00..0x0F. xxHash seed: 0.
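As a minimal sketch of the workload setup described above (the names `fill_workload`, `hmac_key`, and `fill_siphash_key` are hypothetical and may not match the child repos' main.c files):

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf with the benchmark pattern: buf[i] = (i * 31 + 7) & 0xFF. */
static void fill_workload(uint8_t *buf, size_t buf_len)
{
    size_t i;
    for (i = 0; i < buf_len; i++) {
        buf[i] = (uint8_t)((i * 31u + 7u) & 0xFFu);
    }
}

/* HMAC key: ASCII "hash-bench-nds" (14 bytes) plus two zero bytes = 16 bytes. */
static const uint8_t hmac_key[16] = {
    'h', 'a', 's', 'h', '-', 'b', 'e', 'n', 'c', 'h', '-', 'n', 'd', 's', 0, 0
};

/* SipHash key: the bytes 0x00..0x0F in order. */
static void fill_siphash_key(uint8_t key[16])
{
    unsigned i;
    for (i = 0; i < 16u; i++) {
        key[i] = (uint8_t)i;
    }
}
```

Because 31 is odd, the pattern repeats every 256 bytes and each 256-byte run contains every byte value exactly once, so a 1024-byte buffer is four copies of the same permutation.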

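Timing methodology varies per platform. Every build reports wall time in microseconds, but the resolution of the underlying timer ranges from ~20 ms on the NES down to a few nanoseconds on the 3DS: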

| Platform | Timer source | Resolution |
| --- | --- | --- |
| NES | cc65 clock() (NMI-driven) | ~20 ms (CLOCKS_PER_SEC=50) |
| GB / GBC | DIV register + LCD vblank counter | ~1 ms |
| GBA | TM0+TM1 cascade at SYSCLK/64 | ~3.8 µs |
| NDS / DSi | cpuStartTiming(0) at BUS_CLOCK | ~30 ns |
| 3DS | svcGetSystemTick() at SYSCLOCK_ARM11 | ~3.7 ns |
| N64 | libdragon get_ticks_us() at COP0 count | 1 µs |
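The resolution column follows directly from each timer's tick rate. A small host-side sketch of that arithmetic, using the rounded clock figures from the platform table (the helper name `tick_period_ns` is hypothetical):

```c
/* Period of one timer tick in nanoseconds, given the clock that feeds
 * the timer (in Hz) and the prescaler the timer divides it by. */
static double tick_period_ns(double clock_hz, double prescaler)
{
    return 1e9 * prescaler / clock_hz;
}
```

For example, `tick_period_ns(16.78e6, 64.0)` gives roughly 3814 ns for the GBA's SYSCLK/64 cascade, `tick_period_ns(33.51e6, 1.0)` gives roughly 29.8 ns for the NDS bus clock, and `tick_period_ns(50.0, 1.0)` gives 20 ms for the NES at CLOCKS_PER_SEC=50.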

Each platform reports iters / µs-per-iter / KB-per-second plus the first 1-4 bytes of the last digest produced (digest IDs cross-check the algorithm semantics across builds).
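A sketch of how the reported figures relate to the raw measurements, assuming 1 KB = 1024 bytes and a target with a 64-bit integer type (so not the cc65 build). The struct and function names are hypothetical, not taken from the child repos:

```c
#include <stdint.h>

typedef struct {
    uint32_t us_per_iter;   /* microseconds per hash call */
    uint32_t kb_per_sec;    /* throughput, assuming 1 KB = 1024 bytes */
} bench_result;

/* Derive the reported metrics from an iteration count, the elapsed
 * wall time in microseconds, and the workload buffer length in bytes. */
static bench_result bench_metrics(uint32_t iters, uint32_t elapsed_us,
                                  uint32_t buf_len)
{
    bench_result r = {0, 0};

    if (iters == 0u || elapsed_us == 0u) {
        return r;            /* nothing measured; avoid dividing by zero */
    }
    r.us_per_iter = elapsed_us / iters;
    /* bytes per second = iters * buf_len * 1e6 / elapsed_us;
     * widen to 64 bits so the product cannot overflow. */
    r.kb_per_sec = (uint32_t)(((uint64_t)iters * buf_len * 1000000u)
                              / elapsed_us / 1024u);
    return r;
}
```

With 100 iterations over a 1024-byte buffer in 250000 µs, this yields 2500 µs per iteration and 400 KB/s.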


Selected cross-platform data

Crypto tier @ 1024-byte buffer, KB/s (higher = faster):

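| Algorithm | NES* | GB | GBA | NDS | DSi | 3DS | N64 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MD4 | 3 | ~12 | ~440 | ~870 | ~3500 | ~12000 | 6372 |
| MD5 | 2 | ~8 | ~280 | ~560 | ~2240 | ~7800 | 1877 |
| SHA-1 | 1 | ~6 | ~220 | ~430 | ~1720 | ~5500 | 1185 |
| RIPEMD-160 | | | ~180 | ~360 | ~1440 | ~4400 | 764 |
| SHA-256 | | | ~140 | ~280 | ~1120 | ~3900 | 791 |
| SHA-512 | | | ~50 | ~110 | ~440 | ~1900 | 1379 |
| SHA-3-256 | | | ~50 | ~100 | ~400 | ~1500 | 412 |
| BLAKE2s | | | ~190 | ~380 | ~1520 | ~5200 | 1692 |
| HMAC-S256 | | | ~70 | ~140 | ~560 | ~2000 | 622 |
| PBKDF2 | | | ~0.07 | ~0.14 | ~0.55 | ~2.0 | 27 |
| AES-CBC-MAC | | | ~40 | ~80 | ~320 | ~1100 | 217 |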

* NES uses a 64-byte buffer, not 1024; see the hash-bench-nes README. Numbers prefixed with ~ (GB / GBA / NDS / DSi / 3DS) are approximate (the child repos' READMEs have firm measurements). N64 numbers are the measured Ares baseline at commit 38322dd.

The SHA-256 and SHA-512 rows carry the SHA-512-wins-on-N64 anomaly: the N64 column reads 791 vs 1379 KB/s. Every column to the left of N64 has SHA-256 ≥ SHA-512; only the 64-bit-native CPU flips that ordering.
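To make the flip concrete, the SHA-512 / SHA-256 throughput ratio can be computed per platform from the table's KB/s figures. This is only a sketch of the arithmetic; the `sha_row` type and helper name are hypothetical, and the GBA to 3DS inputs are the approximate values from the table:

```c
/* KB/s figures copied from the crypto-tier table (GBA..3DS approximate). */
typedef struct {
    const char *platform;
    double sha256_kbs;
    double sha512_kbs;
} sha_row;

static const sha_row sha_rows[] = {
    { "GBA",  140.0,   50.0 },
    { "NDS",  280.0,  110.0 },
    { "DSi", 1120.0,  440.0 },
    { "3DS", 3900.0, 1900.0 },
    { "N64",  791.0, 1379.0 }
};

/* Ratio above 1.0 means SHA-512 is the faster of the two. */
static double sha512_over_sha256(const sha_row *r)
{
    return r->sha512_kbs / r->sha256_kbs;
}
```

The four 32-bit platforms land between roughly 0.36 and 0.49, while the N64 comes out at about 1.74.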


How to reproduce

Every child repo ships:

  • A prebuilt ROM at the repo root and in artifacts/ (most projects)
  • A build.bat (Windows) and Makefile (Linux/macOS) that rebuild from source using only freely-distributed toolchains
  • A tests/verify.c host-gcc cross-check (most projects) that compiles the algorithm .c files under gcc and dumps reference digests for the standard workload buffer

The algorithm .c files are byte-identical across platforms wherever both targets support the required word size. Specifically the 27 algos in the full roster are shared verbatim between hash-bench-gba / hash-bench-nds / hash-bench-dsi / hash-bench-3ds / hash-bench-n64; the 14 that fit in SDCC SM83's runtime are also byte-identical with hash-bench-gb; the 18 that don't need uint64_t are byte-identical with hash-bench-nes.
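The host cross-check idea can be sketched with one of the simpler roster entries. The code below is not the repos' tests/verify.c; it is a standalone FNV-1a (32-bit, standard offset basis and prime) plus a hypothetical helper that formats the first few digest bytes, assuming they are printed most significant byte first:

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit, with the standard offset basis and prime. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Write the first n (at most 4) digest bytes as lowercase hex, most
 * significant byte first (assumed order). out must hold 2*n + 1 chars. */
static void digest_prefix_hex(uint32_t digest, unsigned n, char *out)
{
    static const char hex[] = "0123456789abcdef";
    unsigned i;

    if (n > 4u) {
        n = 4u;
    }
    for (i = 0; i < n; i++) {
        uint8_t b = (uint8_t)(digest >> (24u - 8u * i));
        out[2u * i]      = hex[b >> 4];
        out[2u * i + 1u] = hex[b & 0x0Fu];
    }
    out[2u * n] = '\0';
}
```

Running the same function over the standard workload buffer on the host and comparing the printed prefix with the one shown on the console or emulator screen is the kind of check the digest IDs are meant to support.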


Reproducible-results promise

Every screenshot in the per-platform READMEs was taken from a clean boot of an accurate emulator (Mesen2 for NES, mGBA for GB/GBC/GBA, melonDS 1.1 for NDS/DSi, Citra for 3DS, Ares for N64). Each row's iteration counts represent one ~200-500 ms run — the timer is quantised by the platform's native counter (see the "Timing methodology" table above), so individual runs vary by ~1-2% on every platform. Patterns that hold across multiple reruns and across multiple emulator versions get called out as "real"; single-run deltas at noise-floor magnitude get called out as "noise".

This matters because — as documented in the reverted optimization experiment on hash-bench-n64 — confidently-reasoned perf patches can fail in non-obvious ways (e.g., __attribute__((hot, flatten)) reorganized the .text section enough to make SHA-3 self-conflict on the VR4300's direct-mapped 16 KB I-cache, dropping it 47% with zero source-level changes to sha3.c). The discipline is: measure first, write the patch to fit what the data says is actually the bottleneck.
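Why a layout change alone can hurt on a direct-mapped cache is easy to see from the index arithmetic. The sketch below uses the 16 KB size stated above and assumes a 32-byte line; the macro and function names are hypothetical, and the addresses in the usage note are made up for illustration:

```c
#include <stdint.h>

#define ICACHE_BYTES 16384u  /* 16 KB, direct-mapped (from the text) */
#define ICACHE_LINE  32u     /* assumed line size in bytes */

/* Index of the single cache line an instruction address can occupy. */
static uint32_t icache_index(uint32_t addr)
{
    return (addr % ICACHE_BYTES) / ICACHE_LINE;
}

/* Two addresses evict each other when they map to the same index
 * but belong to different memory lines. */
static int icache_conflict(uint32_t a, uint32_t b)
{
    return icache_index(a) == icache_index(b)
        && (a / ICACHE_LINE) != (b / ICACHE_LINE);
}
```

Under these assumptions, code at 0x80001000 and code at 0x80005000 (exactly 16 KB apart) share index 128 and evict each other on every alternation, while code at 0x80002000 does not collide with either. If the linker moves a hot helper so that it lands 16 KB away from the loop that calls it, throughput can drop without any source change.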


Why this exists

Three reasons in increasing order of importance:

  1. It's a fun cross-architecture survey. Twenty-five years of Nintendo handheld + console CPUs side-by-side on one workload — a sort of museum exhibit in benchmark form.

  2. It's a real-world test bed for portable C. All ~30 algorithm files use only <stdint.h> and the local hashes.h; they compile cleanly under SDCC, GBDK, devkitARM (libgba / libnds / libctru), libdragon, cc65, and host gcc with no platform #ifdef chains inside the algorithm code itself. The main.c in each project is the only file that talks to its platform's BSP.

  3. It's the algorithmic foundation for the totp-gb / totp-gba projects: real-world TOTP authenticators (HMAC-SHA1 over a counter derived from Unix time, running on a Game Boy) that needed careful per-platform tuning of the same hash functions. The hash-bench projects are where each new algorithm gets stress-tested against a reference implementation before it's trusted in a security context.
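For readers unfamiliar with TOTP, the two integer steps that surround the HMAC-SHA1 call are sketched below, following RFC 6238 (time-step counter) and RFC 4226 (dynamic truncation). The 30-second step and 6-digit output are the RFC defaults and are assumptions here, as are the function names; none of this is taken from the totp-gb / totp-gba sources.

```c
#include <stdint.h>

/* RFC 6238 time-step counter with T0 = 0: T = floor(unix_time / step).
 * The RFC uses a 64-bit counter; 32 bits are enough for this sketch. */
static uint32_t totp_counter(uint32_t unix_time, uint32_t step_seconds)
{
    return unix_time / step_seconds;
}

/* RFC 4226 dynamic truncation of a 20-byte HMAC-SHA1 value to a
 * 6-digit code: the low nibble of the last byte selects a 4-byte
 * window, whose top bit is masked off. */
static uint32_t hotp_truncate6(const uint8_t mac[20])
{
    unsigned off = mac[19] & 0x0Fu;
    uint32_t bin = ((uint32_t)(mac[off] & 0x7Fu) << 24)
                 | ((uint32_t)mac[off + 1u] << 16)
                 | ((uint32_t)mac[off + 2u] << 8)
                 |  (uint32_t)mac[off + 3u];

    return bin % 1000000u;
}
```

The HMAC-SHA1 computation itself sits between these two steps, taking the counter as an 8-byte big-endian message.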


Open work / ideas

  • N64 perf golf — see hash-bench-n64-optimized. Targeted single-file unrolling on the algos that benefit (MD5, SHA-256, SHA-512, HMAC, PBKDF2). The reverted broad-sweep attempt on hash-bench-n64@e675698 taught us why one-file-at-a-time is required on direct-mapped-cache architectures.
  • PSP: a 2004 portable with native WiFi and 32 MB RAM; adding it would close the "missing Sony branch" gap in this collection. Likely future addition.
  • Saturn / Genesis / 32X / Sega CD — interesting because the SH-2 dual-CPU Saturn vs the 68000 + Z80 Genesis vs 32X's added SH-2 pair would test what asymmetric coprocessors buy you on a per-algorithm basis. Backburner.
  • Bitcoin miner / cold-wallet on a handheld — see the design doc at nds-wallet for a cold-wallet-on-DS proposal; the PSP is probably the better target for an actual miner (WiFi + 32 MB RAM + 333 MHz MIPS R4000).

Layout of this repo

README.md              this file — landing page + cross-platform story

That's it. All the code, ROMs, and per-platform documentation lives in the seven child repos linked at the top. This repo exists to be the discovery point — the thing that turns up in a web search for "benchmark sha256 nintendo ds n64" and routes the reader to the specific repo they want.


License

MIT, matching the child repos. Algorithm implementations are reference public-domain (FIPS / RFC text) re-typed for <stdint.h> portability; platform main.c files are MIT-licensed.
