Skip to content

UK-MAC/CloverLeaf_Serial

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CloverLeaf_Serial

This is the serial version of CloverLeaf version 1.3. With all OpenMP and MPI removed.

Release Notes

Version 1.3

CloverLeaf 1.3 contains a number of optimisations over previous releases. These include a number of loop fusion optimisations and the use of scalar variables over work arrays. Overall this improves cache efficiency.

This version also contains some support for explicit tiling. This is activated through the two input deck parameters:

  • tiles_per_chunk To specify how many tiles per MPI ranks there are.
  • tiles_per_problem To specify how many global tiles there are, this is rounded down to be an even number per MPI rank.

For compilation we now use the native Fortran and C compilers, not the MPI wrappers as before. However specific compilation can be obtained with their use:

make COMPILER=GNU MPI_COMPILER=gfortran C_MPI_COMPILER=gcc

Performance

Expected performance is give below.

If you do not see this performance, or you see variability, then is it recommended that you check MPI task placement and OpenMP thread affinities, because it is essential these are pinned and placed optimally to obtain best performance.

Note that performance can depend on compiler (brand and release), memory speed, system settings (e.g. turbo, huge pages), system load etc.

Performance Table

Test Problem Time Time Time
Hardware E5-2670 0 @ 2.60GHz Core E5-2670 0 @ 2.60GHz Node E5-2698 v3 @ 2.30GHz Node
Options make COMPILER=INTEL make COMPILER=INTEL make COMPILER=CRAY
Options mpirun -np 1 mpirun -np 16 aprun -n4 -N4 -d8
2 20.0 2.5 0.9
3 960.0 100.0
4 460.0 40.0 23.44
5 13000.0 1700.0

Weak Scaling - Test 4

Node Count Time
1 40.0
2
4
8
16

Strong Scaling - Test 5

Node Count Time Speed Up
1 1700 1.0
2
4
8
16