Parallel Memory Bandwidth Measurement / Benchmark Tool

pmbw - Parallel Memory Bandwidth Measurement / Benchmark

The tool pmbw is a set of assembler routines that measure the parallel memory (cache and RAM) bandwidth of modern multi-core machines. Memory bandwidth is one of the key performance factors of any computer system, and today measuring memory performance often gives a more realistic view of a machine's overall speed than pure arithmetic or floating-point benchmarks. The reason is that the speed of computation units in modern CPUs is growing faster than memory bandwidth, yet that bandwidth is what delivers data to the CPU. The larger the amount of processed data, the more important memory bandwidth becomes.

The pmbw tool contains a set of very basic functions, all hand-coded in assembler so that no compiler optimization can alter them. These functions are modeled after the basic inner loops found in data processing: sequential scanning and pure random access. The memory access pattern of any application lies somewhere between these two extremes.
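To make the two extremes concrete, here is a minimal C++ sketch of a sequential scan and a dependent random-access loop. It is an illustration only, not pmbw's code: pmbw implements its loops in assembler, and the names `scan_sum`, `make_cycle` and `chase` are hypothetical. Building the random access pattern as a single-cycle permutation (Sattolo's algorithm) is an assumption of this sketch, chosen so that every element is visited exactly once per round.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Sequential scanning: read every 64-bit word in address order.
// The sum is returned so the loads have an observable effect.
std::uint64_t scan_sum(const std::vector<std::uint64_t>& data) {
    std::uint64_t sum = 0;
    for (std::uint64_t v : data) sum += v;
    return sum;
}

// Build a random permutation that forms one single cycle over all n
// indices (Sattolo's algorithm). next[i] is the index visited after i.
// Hypothetical helper, assumed for this sketch.
std::vector<std::size_t> make_cycle(std::size_t n, std::uint32_t seed) {
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937 rng(seed);
    for (std::size_t i = n; i > 1; --i) {
        // Swap position i-1 with a strictly smaller position.
        std::uniform_int_distribution<std::size_t> pick(0, i - 2);
        std::swap(next[i - 1], next[pick(rng)]);
    }
    return next;
}

// Pure random access: each load address depends on the previous load,
// so the hardware cannot prefetch or overlap the accesses.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    assert(!next.empty());
    std::size_t pos = 0;
    for (std::size_t s = 0; s < steps; ++s) pos = next[pos];
    return pos;
}
```

Because the permutation is a single cycle, `chase(next, next.size())` ends where it started after touching every element once. Note that a C++ compiler is free to vectorize or otherwise transform `scan_sum`, which is exactly the uncertainty that hand-written assembler removes.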

Besides these two access patterns, the basic functions benchmark different modes of memory access. Depending on the architecture, 16-, 32-, 64-, 128- or 256-bit memory transfers are tested using different machine instructions, such as MMX, SSE or AVX. Furthermore, iterating by pointers is compared against access via array index. The current version of pmbw supports benchmarking 32-bit x86, 64-bit x86 and ARMv6 systems.
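The difference between pointer iteration and array-index access, and between transfer widths, can be sketched in portable C++ as follows. This is an assumed illustration, not pmbw's implementation: the function names are hypothetical, the word type stands in for the transfer width (for example `std::uint16_t`, `std::uint32_t` or `std::uint64_t`), and the 128- and 256-bit cases that need SSE or AVX instructions are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pointer iteration: the loop variable is the address itself and is
// compared against an end pointer.
template <typename Word>
Word sum_by_pointer(const Word* begin, const Word* end) {
    assert(begin <= end);
    Word sum = 0;
    for (const Word* p = begin; p != end; ++p) sum += *p;
    return sum;
}

// Array-index access: the loop variable is a counter, and every access
// computes its address as base + i * sizeof(Word).
template <typename Word>
Word sum_by_index(const Word* base, std::size_t n) {
    assert(base != nullptr || n == 0);
    Word sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += base[i];
    return sum;
}
```

Both functions compute the same result; they differ only in how the address of each load is formed. An optimizing compiler may well emit identical or vectorized machine code for the two, so the instructions actually executed are not guaranteed by this source.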

The most important feature of this benchmark is that it performs the tests in parallel with a growing number of threads. The results of these scalability tests highlight the basic problem that parallel multi-core algorithms must cope with: scanning memory bandwidth does not scale with the number of cores in current systems. The ratio of cache bandwidth to RAM bandwidth determines how much local cache-based processing must be done between RAM accesses for an algorithm to scale well.
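A rough sketch of such a scalability test, assuming C++11 `std::thread` and a simple equal split of the array among threads, might look like the following. The `ScanResult` type and the `parallel_scan` function are hypothetical names introduced for illustration, and the measured time here includes thread start-up, which a careful benchmark would exclude.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct ScanResult {
    std::uint64_t checksum;   // sum of all words read, keeps the loads observable
    double bytes_per_second;  // aggregate bandwidth over all threads
};

// Scan `data` once with `threads` threads, each reading its own
// contiguous slice, and report the aggregate bandwidth.
ScanResult parallel_scan(const std::vector<std::uint64_t>& data, unsigned threads) {
    assert(threads > 0);
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / threads;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t lo = t * chunk;
        // The last thread also takes the remainder of the array.
        const std::size_t hi = (t + 1 == threads) ? data.size() : lo + chunk;
        pool.emplace_back([&data, &partial, t, lo, hi] {
            std::uint64_t sum = 0;
            for (std::size_t i = lo; i < hi; ++i) sum += data[i];
            partial[t] = sum;
        });
    }
    for (std::thread& th : pool) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    ScanResult result{0, 0.0};
    for (std::uint64_t s : partial) result.checksum += s;
    const double bytes = static_cast<double>(data.size() * sizeof(std::uint64_t));
    result.bytes_per_second = elapsed.count() > 0.0 ? bytes / elapsed.count() : 0.0;
    return result;
}
```

Calling `parallel_scan` with 1, 2, 4, ... threads on an array much larger than the caches and comparing `bytes_per_second` gives a first impression of how scanning bandwidth scales with the thread count. Repeating the runs with an array small enough to fit in cache gives the other side of the cache-to-RAM ratio discussed above.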

Website and License

The current source package and some binaries can be downloaded from

We also collect results from various multi-core systems on the page above.

The program and code are published under the GNU General Public License v3 (GPL), which can also be found in the file COPYING.


The basic idea of measuring memory bandwidth is not new; however, none of the existing benchmarks targets multi-core parallelism, growing array sizes and simple program loops. The STREAM benchmark allows tuning for specific hardware and is not written in assembler. Zack Smith's bandwidth benchmark is limited to sequential bandwidth; it was the starting point for designing pmbw.

Written 2013-07-08 by Timo Bingmann