Below are some collected results. Please keep in mind that these are NOT hardware performance numbers. They depend on software and settings (see, for example, the difference the kernel version makes for the RockPro64). The purpose of `sbc-bench` is to generate insights, not colorful graphs representing numbers without meaning. It's perfectly fine for the same hardware to appear multiple times with different numbers, since those differ for a reason (software/settings).
openssl numbers in particular should be taken with a huge grain of salt, since they depend on kernel features, and performance with other use cases (e.g. disk/filesystem encryption) might look different.
So do not rely on the collected numbers unless you carefully read through all the explanations and insights below, and be prepared to conduct your own benchmarks if you really want to choose appropriate hardware for your use case.
| Board | Clockspeed | Kernel | Distro | 7-zip (MIPS) | AES-128 (16 byte) | AES-256 (16 KB) | memcpy (MB/s) | memset (MB/s) | kH/s | URL |
|---|---|---|---|---|---|---|---|---|---|---|
| BPi R2 | 1300 MHz | 4.4 | Xenial armhf | 2600 | 27550 | 25350 | 1500 | 3800 | - | http://ix.io/1iGV |
| Clearfog Pro | 1600 MHz | 4.14 | Stretch armhf | 2185 | 44500 | 43900 | 935 | 4940 | - | http://ix.io/1iFa |
| Helios4 | 1600 MHz | 4.14 | Stretch armhf | 2210 | 44785 (\*1280) | 42500 (\*98560) | 910 | 4840 | - | http://ix.io/1jCy |
| Edge/Captain | 2000/1500 MHz | 4.4 | Bionic arm64 | 6550 | 402150 | 1130400 | 2810 | 4860 | 10.50 | http://ix.io/1rYm |
| EspressoBin | 800 MHz | 4.17 | Stretch arm64 | 1138 | 54290 | 368330 | 1040 | 2490 | 1.23 | http://ix.io/1kt2 |
| EspressoBin | 1200 MHz | 4.18 | Stretch arm64 | 1630 | 81900 | 555840 | 1000 | 2400 | 1.82 | http://ix.io/1lCe |
| Le Potato | 1410 MHz | 4.18 | Stretch arm64 | 3780 | 96680 | 657200 | 1810 | 5730 | 3.92 | http://ix.io/1iSQ |
| Lime A10 | 910 MHz | 4.14 | Stretch armhf | 550 | 25200 | 28250 | 440 | 1300 | - | http://ix.io/1j1L |
| NanoPC T3+ | 1400 MHz | 4.4 | Xenial armhf | 6400 | 143800 | 651000 | 1650 | 3700 | - | http://ix.io/1iyp |
| NanoPC T3+ | 1400 MHz | 4.14 | Bionic arm64 | 7480 | 126000 | 652600 | 1440 | 4540 | 10.99 | http://ix.io/1iRJ |
| NanoPC T4 | 1800/1400 MHz | 4.17 | Stretch arm64 | 6250 | 307200 | 1022500 | 4100 | 9000 | 8.24 | http://ix.io/1iFz |
| NanoPC T4 | 1800/1400 MHz | 4.17 | Stretch arm64 | 6380 | 230280 | 1022600 | 4160 | 9000 | 9.36 | http://ix.io/1iZq |
| NanoPC T4 | 1800/1400 MHz | 4.17 | Stretch arm64 | 6230 | 299600 | 1023600 | 4100 | 9060 | 10.30 | http://ix.io/1iWU |
| NanoPC T4 | 2000/1500 MHz | 4.4 | Stretch arm64 | 5870 | 308370 | 1124040 | 2810 | 4890 | 8.70 | http://ix.io/1lkG |
| NanoPi Fire3 | 1380 MHz | 4.14 | Bionic arm64 | 7440 | 126050 | 653000 | 1560 | 4600 | 10.96 | http://ix.io/1jjm |
| NanoPi Fire3 | 1380 MHz | 4.14 | Stretch arm64 | 7420 | 95700 | 645400 | 1520 | 4570 | 8.53 | http://ix.io/1jiU |
| NanoPi K1 Plus | 1152 MHz | 4.14 | Stretch arm64 | 3030 | 78740 | 533380 | 1040 | 3070 | 3.32 | http://ix.io/1m3x |
| NanoPi K2 | 1480 MHz | 4.14 | Stretch arm64 | 3850 | 43020 | 50370 | 1660 | 3870 | 4.61 | http://ix.io/1iT1 |
| NanoPi M4 | 2000/1500 MHz | 4.19 | Stretch arm64 | 6400 | 334650 | 1128330 | 4080 | 8270 | 8.86 | http://ix.io/1lzP |
| NanoPi NEO4 | 2000/1500 MHz | 4.4 | Stretch arm64 | 6510 | 320600 | 1128860 | 2260 | 4770 | 8.71 | http://ix.io/1oho |
| NanoPi NEO4 | 2000/1500 MHz | 4.4 | Stretch arm64 | 6030 | 342620 | 1121380 | 2230 | 4770 | 8.57 | http://ix.io/1oib |
| NanoPi NEO4 | 2000/1500 MHz | 4.4 | Stretch arm64 | 6520 | 268720 | 1123190 | 2280 | 4770 | 8.83 | http://ix.io/1oim |
| NanoPi NEO4 | 2000/1500 MHz | 4.19 | Stretch arm64 | 6750 | 278200 | 1139850 | 2370 | 6110 | 8.84 | http://ix.io/1p3T |
| ODROID-C2 | 1750 MHz | 3.14 | Xenial arm64 | 4070 | 50500 | 48500 | 1750 | 3100 | - | http://ix.io/1ixI |
| ODROID-C2 | 1530 MHz | 4.17 | Stretch arm64 | 3870 | 43800 | 51280 | 1420 | 2600 | 4.63 | http://ix.io/1iSh |
| ODROID-XU4 | 1900/1400 MHz | 3.10 | Jessie armhf | 6750 | 74100 | 68200 | 2200 | 4800 | - | http://ix.io/1ixL |
| ODROID-XU4 | 2000/1400 MHz | 4.9 | Stretch armhf | 6400 | 73350 | 72075 | 2230 | 4850 | - | http://ix.io/1iWL |
| ODROID-XU4 | 2000/1500 MHz | 4.14 | Bionic armhf | 7100 | 74700 | 71500 | 2240 | 4880 | - | http://ix.io/1iLy |
| Orange Pi PC Plus | 1300 MHz | 4.14 | Stretch armhf | 2880 | 20890 | 25270 | 900 | 3280 | - | http://ix.io/1j1d |
| Orange Pi Plus 2 | 1300 MHz | 4.14 | Stretch armhf | 2890 | 21480 | 25250 | 830 | 3240 | - | http://ix.io/1iX4 |
| PineH64 | 900 (!) MHz | 4.17 | Stretch arm64 | 2550 | 62200 | 421000 | 1600 | 4840 | 2.84 | http://ix.io/1iFT |
| PineH64 | 1800 MHz | 4.18 | Stretch arm64 | 4650 | 123400 | 836900 | 1380 | 5530 | 5.62 | http://ix.io/1jEr |
| Renegade | 1400 MHz | 4.4 | Stretch arm64 | 3710 | 95030 | 644200 | 1565 | 7435 | 3.92 | http://ix.io/1iFx |
| Raspberry Pi 2 B+ | 900 MHz | 4.14 | Debian Stretch | 2070 | 14350 | 17450 | 615 | 1175 | - | http://ix.io/1iFf |
| Raspberry Pi 2 B+ | 900 MHz | 4.14 | Raspbian Stretch | 2130 | 14000 | 16300 | 1010 | 1170 | - | http://ix.io/1ivw |
| Raspberry Pi 3 B+ | original | 4.9 | Raspbian Stretch | 3600 | 35500 | 42700 | 1230 | 1640 | - | http://ix.io/1iI5 |
| Raspberry Pi 3 B+ | normal | 4.14 | Raspbian Stretch | 3240 | 30500 | 36600 | 1130 | 1530 | - | http://ix.io/1ism |
| Raspberry Pi 3 B+ | normal | 4.14 | Raspbian Stretch | 3040 | 29500 | 36600 | 1050 | 1500 | - | http://ix.io/1iGM |
| Raspberry Pi 3 B+ | UV/normal | 4.14 | Raspbian Stretch | 2100 | 29500 | 36400 | 1040 | 1460 | - | http://ix.io/1iH0 |
| Raspberry Pi 3 B+ | OC/normal | 4.14 | Raspbian Stretch | 3130 | 30500 | 36620 | 1230 | 1780 | - | http://ix.io/1iGz |
| Raspberry Pi 3 B+ | with fan | 4.14 | Raspbian Stretch | 3670 | 35800 | 42600 | 1120 | 1600 | - | http://ix.io/1isD |
| Raspberry Pi Zero | 1000 MHz | 4.14 | Raspbian Stretch | 450 | 13400 | 16820 | 400 | 1590 | - | http://ix.io/1niO |
| Rock64 | 1300 MHz | 4.4 | Bionic arm64 | 3410 | 89060 | 601200 | 1310 | 5680 | 4.46 | http://ix.io/1iGW |
| Rock64 | 1300 MHz | 4.18 | Bionic arm64 | 3530 | 116100 | 605250 | 1340 | 5770 | 4.65 | http://ix.io/1iH4 |
| Rock64 | 1300 MHz | 4.4 | Stretch arm64 | 3430 | 88600 | 601000 | 1350 | 5680 | 3.64 | http://ix.io/1iHo |
| Rock64 | 1300 MHz | 4.18 | Stretch arm64 | 3560 | 89070 | 603800 | 1340 | 5770 | 3.80 | http://ix.io/1iHB |
| Rock64 | 1400 MHz | 4.4 | Stretch arm64 | 3610 | 95000 | 644250 | 1330 | 5700 | 3.85 | http://ix.io/1iFm |
| Rock64 | 1400 MHz | 4.4 | Stretch arm64 | 3590 | 95000 | 643700 | 1320 | 5640 | 4.40 | http://ix.io/1iZj |
| Rock64 | 1400 MHz | 4.4 | Stretch arm64 | 3580 | 94800 | 644380 | 1330 | 5680 | 4.63 | http://ix.io/1iYK |
| Rock64 | 1400 MHz | 4.4 | Stretch armhf | 3620 | 99400 | 624000 | 1430 | 3620 | - | http://ix.io/1iwz |
| Rock Pi 4B | 1800/1400 MHz | 4.4 | Stretch armhf | ~6250 | 261960 | 1007500 | 1900 | 4850 | - | http://ix.io/1pJi |
| Rock Pi 4B | 2000/1500 MHz | 4.4 | Stretch armhf | ~6450 | 301470 | 1113900 | 1870 | 4860 | - | http://ix.io/1rrO |
| RockPro64 | 1800/1400 MHz | 4.4 | Stretch arm64 | 6140 | 298800 | 1015600 | 2770 | 4850 | 8.14 | http://ix.io/1lBC |
| RockPro64 | 1800/1400 MHz | 4.4 | Stretch armhf | 6250 | 275000 | 1000150 | 2000 | 4835 | - | http://ix.io/1iFZ |
| RockPro64 | 1800/1400 MHz | 4.18 | Stretch arm64 | 6300 | 237700 | 1021500 | 3650 | 8450 | 8.20 | http://ix.io/1iFp |
| Tinkerboard | 1730 MHz | 4.14 | Stretch armhf | 5350 | 63150 | 66600 | 1480 | 3900 | - | http://ix.io/1iSX |
| Vim2 | 1400/1000 MHz | 4.9 | Xenial arm64 | 4800 | 177600 | 659000 | 1690 | 5610 | - | http://ix.io/1ixi |
| Vim2 | 1400/1000 MHz | 4.17 | Bionic arm64 | 5450 | 126770 | 659600 | 1920 | 5920 | 8.59 | http://ix.io/1iJ7 |
| x5-Z8300 | 1420 MHz | 4.9 | Stretch amd64 | 3900 | 101580 | 178010 | 2380 | 2380 | 7.81 | http://ix.io/1lgD |
| x5-Z8350 | 1920/1420 MHz | 4.17 | Manjaro amd64 | 4540 | 137900 | 237130 | 1970 | 1670 | 9.32 | http://ix.io/1lBy |
| Celeron J3455 | 2300/1500 MHz | 4.17 | Stretch amd64 | 7000 | 316480 | 429660 | 4090 | 4050 | 17.26 | http://ix.io/1m5p |
| Pentium N4200 | 2560/1100 MHz | 4.14 | Bionic amd64 | 7469 | 354328 | 468008 | 4682 | 4997 | 18.75 | http://ix.io/1ngq |
| Pentium J4205 | 2560/1500 MHz | 4.17 | Stretch amd64 | 7570 | 355540 | 480640 | 5070 | 5170 | 18.82 | http://ix.io/1m5t |
| Celeron J4105 | 2400/1500 MHz | 4.15 | Bionic amd64 | 9020 | 458670 | 697100 | 5500 | 7410 | 19.07 | http://ix.io/1qal |
| Celeron J4105 | 2400/1500 MHz | 4.15 | Bionic amd64 | 8960 | 453860 | 697080 | 5620 | 7650 | 19.13 | http://ix.io/1qb0 |
\* Numbers marked with an asterisk were obtained with cryptodev (Marvell's CESA).
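The cryptodev footnote and the Clearfog Pro/Helios4 note describe a trade-off that is easier to see as ratios. A minimal Python sketch using only the two Helios4 AES cells of the table (the dictionary layout and the function name are illustrative, not part of sbc-bench):

```python
# Helios4 OpenSSL results from the table (openssl reports 1000s of bytes per second).
# "software" = ARMv7 cores only, "cesa" = Marvell CESA via cryptodev (marked with *).
HELIOS4 = {
    "AES-128 (16 byte)": {"software": 44785, "cesa": 1280},
    "AES-256 (16 KB)": {"software": 42500, "cesa": 98560},
}


def cesa_speedup(results: dict[str, dict[str, int]]) -> dict[str, float]:
    """Return cryptodev throughput divided by software throughput for each test."""
    return {test: r["cesa"] / r["software"] for test, r in results.items()}


for test, factor in cesa_speedup(HELIOS4).items():
    print(f"{test}: CESA delivers {factor:.2f}x the software throughput")
```

With these numbers the CESA path reaches only about 3% of the software throughput with 16 byte blocks but roughly 2.3 times the software throughput with 16 KB blocks: initialization overhead on tiny blocks versus a bulk speedup on large ones.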
- The 7-zip number is an averaged multi-threaded score from 3 consecutive `7z b` runs. It is only relevant for server workloads where things happen in parallel. Check the links for single-threaded results (on big.LITTLE SoCs for each cluster individually) to get an idea how most typical (single-threaded) workloads perform.
- AES-128 (16 byte) is a single-threaded encryption score with very small chunks of data (useful to get an idea how initialization overhead influences crypto performance with small packets). Like all openssl numbers it is reported in 1000s of bytes per second. On big.LITTLE SoCs the numbers show big core performance.
- AES-256 (16 KB) is a single-threaded encryption score with rather large chunks of data. On big.LITTLE SoCs the numbers show big core performance.
- memcpy and memset are tinymembench measurements of memory bandwidth. On big.LITTLE SoCs the numbers show big core performance.
- kH/s is a multi-threaded cpuminer score showing the board's performance when executing NEON-optimized code. To get the performance difference between big and little cores, check the links in the right column.
- Clearfog Pro and Helios4 use exactly the same SoC (Armada 385), kernel and clockspeeds. The only reason the OpenSSL numbers differ is that the Helios4 numbers were made using Marvell's CESA crypto accelerator via cryptodev, which provides nice speed improvements with larger block sizes but also some initialization overhead with tiny block sizes. CPU utilization is also much lower, so the SoC is free for other tasks while performing better at the same time.
- EspressoBin's boot BLOB claims to run at up to 1 GHz while the real clockspeed is lower, maxing out at 790 MHz with this setting (obviously a kernel bug, see details).
- NanoPi K1 Plus numbers are preliminary. Currently Armbian's highest cpufreq OPP is 1152 MHz and the throttling thresholds are way too low. Once this is unlocked (the SoC is capable of almost 1.4 GHz), numbers will improve further.
- NanoPi NEO4 numbers: the 1st result is from my NEO4 N°1 running a NanoPi M4 image. This NEO4 uses the vendor-supplied thermal pad between SoC and heatsink. The 2nd number is from my 2nd NEO4, this time using NEO4 settings (`rk3399-nanopi4-rev04.dtb` loaded) with a copper shim between heatsink and SoC, which as usual improves 'thermal performance' a lot. Since memory bandwidth and especially latency were too poor, another test was needed with my NEO4 N°2, this time again with M4 settings (`rk3399-nanopi4-rev01.dtb` loaded) and an additional fan. Memory performance was restored, with slightly better performance due to the colder SoC. The 4th result was made with 4.19.0-rc4. Please be aware that RK3399 memory performance numbers differ a lot between 4.4 and mainline kernel for as yet unknown reasons!
- PineH64 numbers are both preliminary: with the mainline kernel, cpufreq scaling was not working at first. Comparing the 900 MHz and 1800 MHz numbers allows estimating the influence of CPU clockspeed on DRAM bandwidth measurements and so on.
- RPi 3 B+ performance shown as 'original' was measured with an older ThreadX release (6e08617e7767b09ef97b3d6cee8b75eba6d7ee0b from Mar 13 2018). Back then the 3B+ was faster than the 3B. This changed with a newer ThreadX release (4800f08a139d6ca1c5ecbee345ea6682e2160881 from Jun 7 2018), since RPi Trading people decided to trash performance on every RPi 3 B+ to mask instability issues on a fraction of boards (details).
- RPi 3 B+ performance numbers shown as 'normal' were made with no cooling or just a heatsink (in contrast to 'with fan').
- RPi 3 B+ marked as 'UV/normal' means: normal settings and an average Micro USB cable resulting in UV (undervoltage). Once the demanding 7-zip benchmark started, voltage dropped below 4.63V and 'frequency capping' (downclocking to 600 MHz) happened, destroying performance. See the detailed log: the kernel reports 1400 MHz while it's 600 MHz in reality. Is this just highly misleading or already cheating?
- RPi 3 B+ marked as 'OC/normal' means: OC (overclocked) settings, stable voltage but no fan. Once the SoC temperature exceeds 60°C, the 'firmware' starts to cheat and downclocks to 1200 MHz while the kernel reports 1570 MHz. At least memory overclocking is somewhat effective.
- Rock Pi 4B numbers are preliminary. The board was first tested without a heatsink, so throttling occurred as expected. For the second run with higher cpufreq OPPs, just a fan was added (fan without heatsink == pretty inefficient). Memory performance seems rather low, but that's due to testing with the vendor's armhf Linaro images. See other RK3399 devices running the same software stack, e.g. the RockPro64 numbers above with kernel 4.4, armhf and also limited to 1.8/1.4 GHz.
- Vim2 is somewhat special: not a real big.LITTLE design but two A53 clusters controlled by a firmware BLOB that allows cluster 0 to clock up to 1414 MHz (falsely reported as 1512 MHz) and cluster 1 to reach 1 GHz (details).
- x86 numbers are meant for comparison. x5-Z8300 numbers were made with UP Board, x5-Z8350 with Alfawise X5 Mini, Celeron J3455 with an ASRock J3455-ITX mainboard, Pentium N4200 on UP2 Board, Pentium J4205 on an ASRock J4205-ITX and Celeron J4105 on two ODROID-H2 boards with different DDR4-PC19200 (2400 MT/s) SO-DIMMs (remotely accessed via maze.odroid.com).
- Benchmarking the Raspberry Pi is useless without taking into account that there is always a primary operating system running on the primary CPU (VideoCore) that fully controls the hardware. The ARM cores are just guests here. That's why `sbc-bench` (starting with v0.2) also logs the ThreadX version and configuration (`/boot/config.txt`).
- Looking at the RPi 2 B+ numbers, this is the same hardware twice, once running the latest Raspbian Stretch Lite and once OMV/Armbian. The userland is Debian Stretch both times, but Raspbian packages are built for ARMv6 while upstream Debian builds for ARMv7 (though with less effective compiler switches). Overall performance looks more or less the same except for a very low `memcpy` bandwidth value with OMV. What's the reason, since the same distro and kernel are used and the same GCC to compile `tinymembench`? Is it firmware 'af8084725947aa2c7314172068f79dad9be1c8b4 from Apr 16 2018' vs. '47b05c853342eb6e4ea5b017d981e0ef247fb8be from Jul 3 2018'?
- Looking at the RPi 3 B+ numbers, it's obvious that the 'firmware' version is the most important factor. With the original firmware (6e08617e7767b09ef97b3d6cee8b75eba6d7ee0b from Mar 13 2018) performance is OK, only to get trashed after applying firmware 4800f08a139d6ca1c5ecbee345ea6682e2160881 from Jun 7 2018, which totally changes throttling behaviour. From then on you either need a fan for good performance or add a `temp_soft_limit=` entry to the firmware config file (we can't check what all those partially undocumented settings really do, since RPi's main operating system is closed source).
- `tinymembench` executed on an A53 in an armhf userland seems to generate lower `memset` numbers than in an arm64 userland: 78% on RK3399 (see RockPro64 arm64 vs. RockPro64 armhf) and 64% on RK3328 (see Rock64 arm64 vs. Rock64 armhf). Status: needs further investigation and confirmation.
- Bionic vs. Stretch doesn't seem to make a difference with `7-zip` scores. This applies to both armhf and arm64 (see Rock64 numbers above).
- `7-zip` scores benefit slightly from memory performance. See the RK3328-equipped Renegade at 1.4 GHz with 4.4 kernel and the Rock64 with the same setup.
- `openssl` numbers are not affected by memory performance and are the same with the same CPU cores and clockspeeds. At least with Cortex-A53 running at 1.4 GHz with a Debian Stretch arm64 binary: Le Potato, NanoPi Fire3, Renegade, Rock64 and RockPro64 (with openssl pinned to an A53 core) all show ~96000k with AES-128/16 byte and ~650000k with AES-256/16 KB.
- It seems the combination of arm64 Bionic with a very recent kernel improves AES encryption results with small data chunks (less than 1 KB; see Rock64 with 4.18 at 1.3 GHz and Vim2 with 4.17 at 1.4 GHz vs. Rock64 with 4.4 at 1.3 GHz). Status: needs further investigation (most probably related to GCC version).
- It seems running an armhf userland on 64-bit SoCs also improves AES encryption results with small data chunks (see armhf entries for NanoPC T3+, Rock64, RockPro64 and Vim2). Status: very interesting, needs further investigation.
- It seems running Xenial binaries improves AES/SSL performance even further when ARMv8 Crypto Extensions are available. Status: interesting but irrelevant; we should get rid of the Xenial and Jessie numbers.
- It makes a huge difference whether ARMv8 Crypto Extensions can be used or not. See the many 64-bit SBC results above and compare with 32-bit SoCs or RPi 3 B+, ODROID-C2 and NanoPi K2 (the latter three are based on 64-bit ARMv8 SoCs without the Crypto Extensions licensed/available).
- Bionic vs. Stretch makes a big difference with `cpuminer`. Libs and GCC versions obviously matter (GCC 7.3 on Bionic vs. 6.3 on Stretch; some benchmarks heavily depend on compiler versions). Stretch with GCC 7.3 provides a 15% performance increase with cpuminer on RK3328 and RK3399 (see the Rock64 and NanoPC T4 numbers above and the logs there to compare performance of big and little cores). With GCC 8.2 and Stretch it's 20% with RK3328 and even 25% with RK3399 (the A72 performance increasing more than that of the A53 cores; check the individual kH/s numbers in the logs).
- (more to come soon)
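The PineH64 note suggests comparing the 900 MHz and 1800 MHz rows to see how strongly each benchmark depends on CPU clockspeed. A small sketch of that comparison, assuming that a value of 1.0 means a result doubled along with the clock. The two rows also differ in kernel version (4.17 vs. 4.18), so not every difference is necessarily clock-related:

```python
# PineH64 rows from the table: 900 MHz (kernel 4.17) vs. 1800 MHz (kernel 4.18).
PINEH64_900 = {"7-zip": 2550, "AES-128 (16 byte)": 62200, "AES-256 (16 KB)": 421000,
               "memcpy": 1600, "memset": 4840, "kH/s": 2.84}
PINEH64_1800 = {"7-zip": 4650, "AES-128 (16 byte)": 123400, "AES-256 (16 KB)": 836900,
                "memcpy": 1380, "memset": 5530, "kH/s": 5.62}
CLOCK_RATIO = 1800 / 900


def scaling(low: dict[str, float], high: dict[str, float], clock_ratio: float) -> dict[str, float]:
    """Share of the clock increase that shows up in each benchmark (1.0 = perfectly linear)."""
    return {name: (high[name] / low[name]) / clock_ratio for name in low}


for name, efficiency in scaling(PINEH64_900, PINEH64_1800, CLOCK_RATIO).items():
    print(f"{name:18s} {efficiency:5.2f}")
```

The CPU-bound tests (7-zip, both AES tests, cpuminer) land between roughly 0.9 and 1.0, while memcpy even drops and memset gains only about 14% for a doubled clock.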
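The 'UV/normal' and 'OC/normal' notes show that the kernel's cpufreq value can differ from what the VideoCore firmware actually applies, so it can help to ask the firmware directly. A sketch assuming the `vcgencmd` tool from the Raspberry Pi userland is installed and assuming the `get_throttled` bit layout from the official Raspberry Pi documentation; all helper names are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path

# Assumed bit layout of `vcgencmd get_throttled` (per the Raspberry Pi documentation).
THROTTLED_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line like 'throttled=0x50005' into the list of flags that are set."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLED_BITS.items()) if value & (1 << bit)]


def parse_measure_clock(output: str) -> int:
    """Turn 'frequency(48)=600000000' (from `vcgencmd measure_clock arm`) into Hz."""
    return int(output.strip().split("=", 1)[1])


def kernel_reported_hz(cpu: int = 0) -> int:
    """The clock cpufreq believes in; sysfs reports it in kHz."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    return int(path.read_text()) * 1000


def vcgencmd(*args: str) -> str:
    """Run vcgencmd and return its stdout (Raspberry Pi only)."""
    return subprocess.run(["vcgencmd", *args], capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    if shutil.which("vcgencmd"):
        firmware_hz = parse_measure_clock(vcgencmd("measure_clock", "arm"))
        print(f"kernel: {kernel_reported_hz() / 1e6:.0f} MHz, firmware: {firmware_hz / 1e6:.0f} MHz")
        print(decode_throttled(vcgencmd("get_throttled")))
    else:
        # Not on a Raspberry Pi: decode an example value instead.
        print(decode_throttled("throttled=0x50005"))
```

Running this while 7-zip is busy is one way to spot a discrepancy like the 1400 MHz (kernel) vs. 600 MHz (reality) case described for the undervolted board.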
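The openssl note implies that single-threaded AES throughput on Cortex-A53 scales linearly with clockspeed. A quick check of that assumption, taking the ~96000k/~650000k values at 1.4 GHz as reference and comparing them against two other A53 rows of the table (EspressoBin at 1200 MHz and PineH64 at 1800 MHz, both Stretch arm64). The linear model is an assumption for illustration:

```python
# Reference point from the openssl note: Cortex-A53 @ 1400 MHz, Debian Stretch arm64 binary.
REF_MHZ = 1400
REF_AES128_16B = 96000    # 1000s of bytes per second
REF_AES256_16KB = 650000

# Other Cortex-A53 rows from the table: (MHz, AES-128 16 byte, AES-256 16 KB).
MEASURED = {
    "EspressoBin @ 1200 MHz": (1200, 81900, 555840),
    "PineH64 @ 1800 MHz": (1800, 123400, 836900),
}


def predict(mhz: float, ref_value: float, ref_mhz: float = REF_MHZ) -> float:
    """Assume single-threaded AES throughput scales linearly with A53 clockspeed."""
    return ref_value * mhz / ref_mhz


for board, (mhz, aes128, aes256) in MEASURED.items():
    r128 = aes128 / predict(mhz, REF_AES128_16B)
    r256 = aes256 / predict(mhz, REF_AES256_16KB)
    print(f"{board}: AES-128 at {r128:.1%} of prediction, AES-256 at {r256:.1%}")
```

Both boards come within about half a percent of the linear prediction, which fits the observation that these numbers depend on core type and clockspeed rather than on memory performance.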
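To put the Bionic vs. Stretch cpuminer observation into numbers, here is a sketch computing the relative gain for table rows that differ only in distro (same board, kernel, clockspeed and arm64 userland); the names are illustrative:

```python
# cpuminer kH/s from the table: (Bionic, Stretch) for otherwise identical setups.
PAIRS = {
    "Rock64 @ 1300 MHz, kernel 4.4": (4.46, 3.64),
    "Rock64 @ 1300 MHz, kernel 4.18": (4.65, 3.80),
    "NanoPi Fire3 @ 1380 MHz, kernel 4.14": (10.96, 8.53),
}


def gain_percent(bionic: float, stretch: float) -> float:
    """Relative gain of the Bionic run over the Stretch run, in percent."""
    return (bionic / stretch - 1.0) * 100.0


for setup, (bionic, stretch) in PAIRS.items():
    print(f"{setup}: Bionic {gain_percent(bionic, stretch):+.1f}% vs. Stretch")
```

For these pairs the Bionic runs are roughly 22% faster on the Rock64 and about 28% faster on the NanoPi Fire3.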
The bigger picture
- To compare different hardware, exactly the same software environment (apps, libs, compiler, kernel) is needed. Ignoring this will produce numbers without meaning.
- ARM's big cores (A15, A17, A72) perform a lot better than the little cores (A7, A53). Everything that needs high single-threaded performance will benefit hugely from running on such a core. This puts SoCs like RK3288 (Tinkerboard), Exynos 5422 (ODROID XU4) or RK3399 in a better position. For big.LITTLE designs a working HMP scheduler is mandatory, since otherwise performance-hungry tasks end up on a slow core. This is even true for pseudo big.LITTLE designs like the Vim2/S912.
- 7-zip's benchmark still looks like a nice indicator for a 'server workload' performance index (multi-threaded tasks that do not rely on floating-point arithmetic but partially on memory performance). However, these scores are totally irrelevant for SBC use cases that focus on something different (e.g. a 'Desktop Linux' needing high single-threaded CPU performance, HW-accelerated GPU and VPU, and also fast random IO on the rootfs).
- We see a huge variation in tinymembench numbers, with some boards outperforming others many times over, while the real-world effect on CPU-bound workloads is rather minimal. High memory bandwidth is, however, a requirement for certain other tasks (e.g. playing 4K video). At least the numbers are there to generate further insights.
- Identical SoCs perform more or less identically if 'environmental conditions' (clockspeeds) are the same: see Renegade vs. Rock64, NanoPC T4 vs. RockPro64, or ODROID-C2 vs. NanoPi K2.
- The same can be said for different SoCs using the same Cortex-A core. One A53 performs like another as long as both run at the same clockspeed (with some exceptions, most probably due to internal cache sizes; see the cpuminer numbers for Amlogic S905 vs. S905X/RK3328). With the same number of cores you get similar performance (if the task(s) in question benefit from parallel execution).
- Cortex-A53 running at the same clockspeed as A7 shows almost 30% better performance (~3500 7-zip MIPS vs. ~2700). This is even true when running ARMv7 code (see RPi 3 B+ numbers). In general it seems irrelevant whether the A53 cores run an armhf or arm64 userland; some numbers are even higher when running armhf code. This is very interesting, since there are scenarios where running an armhf userland needs way less physical memory for the same task while performing identically. Please note: this is about the userland (32-bit vs. 64-bit) and not the kernel (64-bit of course).
- On SoCs that contain their own crypto engine, the openssl numbers above don't tell the whole truth (userspace vs. in-kernel crypto). Additional benchmarks are needed to get an idea how CESA (Clearfog/Helios4 with Armada 38x), sun4i-ss (Allwinner SoCs), Samsung's Slim SSS (ODROID XU4/HC1/HC2) or MediaTek's crypto accelerator (BPi R2 / MT7623) perform with real-world workloads like disk encryption.
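The first point of this list can be turned into a simple rule when picking rows from the results table: only compare entries whose clockspeed, kernel and distro/userland match. A hypothetical sketch (the `Result` class and the helper names are illustrative). Matching distro names do not guarantee identical package or compiler versions, so this is a necessary rather than a sufficient condition:

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Result:
    board: str
    clockspeed: str
    kernel: str
    distro: str      # includes the userland architecture, e.g. "Stretch arm64"
    sevenzip: int    # 7-zip MIPS


ROWS = [
    Result("Renegade", "1400 MHz", "4.4", "Stretch arm64", 3710),
    Result("Rock64", "1400 MHz", "4.4", "Stretch arm64", 3610),
    Result("Rock64", "1400 MHz", "4.4", "Stretch armhf", 3620),
    Result("Le Potato", "1410 MHz", "4.18", "Stretch arm64", 3780),
    Result("Rock64", "1300 MHz", "4.18", "Stretch arm64", 3560),
]


def same_environment(a: Result, b: Result) -> bool:
    """True only if clockspeed, kernel and distro/userland are identical."""
    return (a.clockspeed, a.kernel, a.distro) == (b.clockspeed, b.kernel, b.distro)


def comparable_pairs(rows: list[Result]) -> list[tuple[Result, Result]]:
    """All pairs of different boards that were measured in the same environment."""
    return [(a, b) for a, b in combinations(rows, 2)
            if a.board != b.board and same_environment(a, b)]


for a, b in comparable_pairs(ROWS):
    print(f"{a.board} {a.sevenzip} vs. {b.board} {b.sevenzip} ({a.distro}, {a.kernel}, {a.clockspeed})")
```

Of these five rows only Renegade and Rock64 at 1400 MHz with kernel 4.4 and Stretch arm64 survive, which is the pair used above to show that identical SoCs perform more or less identically.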
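The claim that one A53 performs like another at the same clockspeed can be checked by normalizing the multi-threaded 7-zip scores to MIPS per core per MHz. A sketch under two stated assumptions: the core counts come from the SoC specifications (S905X and RK3328 have four A53 cores, S5P6818 has eight) rather than from the table, and the rows mix kernels and Bionic/Stretch userlands, which the notes above suggest hardly affects 7-zip:

```python
# (board, SoC, A53 cores, MHz, 7-zip MIPS). Core counts are taken from SoC specs, not the table.
A53_ROWS = [
    ("Le Potato", "S905X", 4, 1410, 3780),
    ("Rock64", "RK3328", 4, 1400, 3610),
    ("Renegade", "RK3328", 4, 1400, 3710),
    ("NanoPC T3+", "S5P6818", 8, 1400, 7480),
    ("NanoPi Fire3", "S5P6818", 8, 1380, 7440),
]


def mips_per_core_mhz(cores: int, mhz: int, mips: int) -> float:
    """Normalize a multi-threaded 7-zip score by core count and clockspeed."""
    return mips / (cores * mhz)


scores = {board: mips_per_core_mhz(cores, mhz, mips) for board, _, cores, mhz, mips in A53_ROWS}
for board, score in scores.items():
    print(f"{board:13s} {score:.3f} MIPS per core per MHz")
print(f"spread: {max(scores.values()) / min(scores.values()):.2f}x")
```

All five rows land between about 0.64 and 0.67 MIPS per core per MHz, a spread of roughly 5%.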