MPI_STREAM

This program measures memory transfer rates in MB/s for simple computational kernels coded in C.
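
For reference, the kernels are the four classic STREAM operations (Copy, Scale, Add and Triad). A minimal sketch of their loops, following McCalpin's reference code rather than this repository's exact source:

static void stream_kernels(double *a, double *b, double *c, long n, double scalar)
{
    long j;
    for (j = 0; j < n; j++) c[j] = a[j];                  /* Copy  */
    for (j = 0; j < n; j++) b[j] = scalar * c[j];         /* Scale */
    for (j = 0; j < n; j++) c[j] = a[j] + b[j];           /* Add   */
    for (j = 0; j < n; j++) a[j] = b[j] + scalar * c[j];  /* Triad */
}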

Motivations

Since 2007, the STREAM benchmark has been used to test and check nodes on clusters managed by CEA/DAM during system updates and maintenance operations. The benchmark is useful for detecting:

  • Memory module failures
  • Missing memory on nodes
  • Memory module performance issues
  • OS regressions
  • Compiler regressions (OpenMP)

Over the years, CEA has added features to detect these problems more efficiently:

  • The MPI version can test an entire cluster (more than 8,000 nodes) in a single run.
  • An option can be used to specify the amount of memory to use instead of a vector size; the array size is then computed to fit this memory budget (see the sketch after this list).
  • The output was extended to list nodes sorted by their measured bandwidth.
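
A memory budget given with -m is split across the benchmark's three double-precision arrays, so the per-array size works out roughly as in the sketch below (illustrative only; the exact rounding is the code's choice):

#include <stdio.h>

int main(void)
{
    /* Illustrative sketch: derive the per-array size from a memory budget in kB,
       assuming three arrays of 8-byte doubles (consistent with the examples below). */
    long mem_kb = 1048576;                            /* -m 1048576, i.e. 1 GiB */
    long n = mem_kb * 1024L / (3L * sizeof(double));  /* -> 44739242 elements   */
    printf("Array size = %ld\n", n);
    return 0;
}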

To share these new features with the HPC community, CEA published this modified version on GitHub in 2022.

Legacy code

  • McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.

STREAM benchmark

STREAM benchmark paper

Running

$ ./stream.exe -h
MPI_STREAM CEA MPI/OpenMP version $Revision: X.Y $
Usage: ./stream.exe [-h] [-n N] [-m mem] [-t ntimes] [-o offset]

Options:
   -n N         Size of a vector
   -m mem       Memory (kB) used per process
   -t ntimes    Number of times the computation will run
   -o offset    Offset
   -h           Print this help

Examples

You can launch the sequential mode with 1 GiB of memory:

$ ./stream.exe -m 1048576
-------------------------------------------------------------
MPI_STREAM CEA MPI/OpenMP version $Revision: X.Y $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 44739242, Offset = 0
Total memory required = 1048575.98 KB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 20594 microseconds.
   (= 20594 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       42299.2351       0.0171       0.0169       0.0172
Scale:      42512.4562       0.0171       0.0168       0.0171
Add:        42376.8484       0.0255       0.0253       0.0256
Triad:      42440.3442       0.0255       0.0253       0.0257
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
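
As a quick consistency check on the report above: 44739242 elements × 3 arrays × 8 bytes ≈ 1073741808 bytes ≈ 1048575.98 kB, matching the requested 1 GiB (-m 1048576).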

To test 4 nodes, each with 128 cores and 256 GB of memory, you can launch the MPI/OpenMP mode with 220 GB per node:

$ OMP_NUM_THREADS=128 mpirun -n 4 -cpus-per-rank 128 ./stream.exe -m 230686720
-------------------------------------------------------------
MPI_STREAM CEA MPI/OpenMP version $Revision: X.Y $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9842633386, Offset = 0
Total memory required = 230686719.98 KB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 128
-------------------------------------------------------------
Number of MPI Processes = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 487067 microseconds.
   (= 487067 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Triad Rate (MB/s):
node6201		336870.222291
node6202		336899.087888
node6203		336828.994303
node6204		336809.300049


============SUMMARY============

TRIAD_MAX          = 336899.087888		 MB/s on node6202
TRIAD_MIN          = 336809.300049		 MB/s on node6204
TRIAD_AVG          = 336851.901133		 MB/s
TRIAD_AVG_per_proc = 2631.655478		 MB/s
TRIAD_STDD         = 40.422064		 MB/s

==========END SUMMARY==========
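
A couple of remarks on this run: the -m value of 230686720 kB is 220 × 1024 × 1024 kB, i.e. 220 GiB per MPI process (one process per node here), and TRIAD_AVG_per_proc appears to be TRIAD_AVG divided by the number of cores per node (336851.901133 / 128 ≈ 2631.66 MB/s), giving an average per-core bandwidth.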

Contributing

Authors

See the list of AUTHORS who participated in this project.

Contact

Laurent Nguyen - laurent.nguyen@cea.fr

Website

CEA-HPC

License

Copyright 2007-2023 CEA/DAM/DIF

MPI_STREAM is distributed under the original license of the STREAM benchmark.
See the included file LICENSE.txt (English version).

Acknowledgments