mega-stream and mega-sweep

mega-stream

The mega-stream mini-app initially aimed to test the theory that streaming many arrays (with different sizes) prevents memory bandwidth limits from being reached, so that latency becomes a dominant factor. It runs a kernel with a similar form to STREAM Triad, but with more than 3 input arrays. Additionally, it runs a small reduction, which requires the results of the Triad-style computation.

This was then extended to also include a finite-difference-style update on the medium-sized arrays.

The kernel

The main kernel consists of 8 multi-dimensional arrays with the following properties:

r and q are large
x, y and z are medium
a, b and c are small

The computational kernel is found inside a triple-nested loop, and can be expressed as

r(i,j,k,l,m) = q(i,j,k,l,m) + a(i)*x(i,j,k,m) + b(i)*y(i,j,l,m) + c(i)*z(i,k,l,m)
x(i,j,k,m) = 0.2*r(i,j,k,l,m) - x(i,j,k,m)
y(i,j,l,m) = 0.2*r(i,j,k,l,m) - y(i,j,l,m)
z(i,k,l,m) = 0.2*r(i,j,k,l,m) - z(i,k,l,m)
sum(j,k,l,m) = SUM(r(:,j,k,l,m))
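
As a rough illustration, the expressions above can be written as a plain C function. This is a minimal sketch and not the code kept in this repository: the function name mega_stream_kernel, the size parameters ni, nj, nk, nl and nm, the flattened indexing with i varying fastest, and the choice of one loop per index ordered m, l, k, j, i are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sketch of the mega-stream kernel expressions.
   All arrays are flattened with i as the fastest-varying index;
   sizes and loop order are assumptions for illustration only. */
void mega_stream_kernel(size_t ni, size_t nj, size_t nk, size_t nl, size_t nm,
                        double *r, const double *q,
                        double *x, double *y, double *z,
                        const double *a, const double *b, const double *c,
                        double *sum)
{
  for (size_t m = 0; m < nm; m++) {
    for (size_t l = 0; l < nl; l++) {
      for (size_t k = 0; k < nk; k++) {
        for (size_t j = 0; j < nj; j++) {
          double total = 0.0;
          for (size_t i = 0; i < ni; i++) {
            /* Flattened offsets for r/q (5D) and x, y, z (4D each) */
            size_t ir = i + ni * (j + nj * (k + nk * (l + nl * m)));
            size_t ix = i + ni * (j + nj * (k + nk * m));
            size_t iy = i + ni * (j + nj * (l + nl * m));
            size_t iz = i + ni * (k + nk * (l + nl * m));

            /* Triad-style update using the large, medium and small arrays */
            double rv = q[ir] + a[i] * x[ix] + b[i] * y[iy] + c[i] * z[iz];
            r[ir] = rv;

            /* Update of the medium-sized arrays from the new value of r */
            x[ix] = 0.2 * rv - x[ix];
            y[iy] = 0.2 * rv - y[iy];
            z[iz] = 0.2 * rv - z[iz];

            /* Small reduction over the i dimension */
            total += rv;
          }
          sum[j + nj * (k + nk * (l + nl * m))] = total;
        }
      }
    }
  }
}
```

Note that x, y and z are each indexed by only four of the five loop indices, so every element of these arrays is read and updated several times during one pass over r and q.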

Building

The benchmark should build with make, and by default uses the Intel compiler. This can be changed by specifying CC, for example make CC=cc. Additional options can be passed to the Makefile as make OPTIONS=.

Notes

The Fortran version does not have any command line argument checking. A baseline and an optimised version are kept in this repository.

mega-sweep

The mega-sweep mini-app aims to take the mega-stream kernel and fit it within a KBA-style sweep iteration structure. The compute kernel is similar to mega-stream, but additional looping along with MPI communication is included.

This mini-app is Fortran only. Versions with 2 and 3 spatial dimensions are available. The 3D version decomposes in the YZ dimensions.

Sweep timings are printed out for each sweep direction. Note that these are from MPI rank 0, which means that some sweeps may appear to take longer than others. This is due to the start-up costs associated with the KBA sweep.
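
The effect of the start-up cost on rank 0 can be illustrated with a simplified model of a KBA sweep over a 2D grid of ranks. The sketch below is not taken from this repository: the function kba_start_stage, its parameters, and the assumption that a rank must wait one pipeline stage per upstream rank in each decomposed dimension are all hypothetical, and any chunking along the undecomposed dimension is ignored.

```c
/* Hypothetical model: number of pipeline stages a rank at grid position
   (py, pz) waits before upstream data first arrives, for a sweep moving
   in direction (sy, sz), where each of sy and sz is +1 or -1.
   npy and npz are the number of ranks in each decomposed dimension. */
int kba_start_stage(int py, int pz, int npy, int npz, int sy, int sz)
{
  /* Distance from the corner where the sweep starts, in each dimension */
  int dy = (sy > 0) ? py : (npy - 1 - py);
  int dz = (sz > 0) ? pz : (npz - 1 - pz);
  return dy + dz;
}
```

In this model, a rank at position (0, 0) of a 4 x 4 grid starts immediately for a sweep in direction (+1, +1), but waits 6 stages for a sweep in direction (-1, -1). If rank 0 sits at such a corner, this would account for some directions appearing slower in its timings.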

Building

The benchmark should build with make, and uses the default MPI compiler unless another is chosen. A compiler toolchain should be specified with COMPILER, for example make COMPILER=INTEL. Additional options can be passed to the Makefile as make OPTIONS=.

Publications

Tom Deakin, John Pennycook, Andrew Mallinson, Wayne Gaudin and Simon McIntosh-Smith. The MEGA-STREAM Benchmark on Intel Xeon Phi Processors (Knights Landing). The Intel Xeon Phi Users Group Spring Meeting, 2017.
Tom Deakin, Simon McIntosh-Smith and Wayne Gaudin. On the Mitigation of Cache Hostile Memory Access Patterns on Many-core CPU Architectures. The Intel Xeon Phi Users Group Workshop at International Conference on High Performance Computing, 2017. DOI:10.1007/978-3-319-67630-2_26
Tom Deakin, Simon McIntosh-Smith, Justin Lovegrove, Richard Smedley-Stevenson and Andrew Hagues. Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic SN Transport Sweeps on Many-Core Architectures. Journal of Computational and Theoretical Transport. Volume 49, Issue 3, 2020. DOI:10.1080/23324309.2020.1775096
