# Application benchmarks: single node performance comparison

A comparison of the single-node performance of three application benchmarks (GROMACS, OpenSBLI and CASTEP) across a variety of architectures.

## Import required modules for results analysis

In [19]:
import matplotlib as mpl
from matplotlib import pyplot as plt
%matplotlib inline
mpl.rcParams['figure.figsize'] = (12,6)
import seaborn as sns
sns.set_style("white", {"font.family": "serif"})

In [20]:
import sys
sys.path.append('../python-modules')

## GROMACS: 1400k Atom Benchmark

Details of the 1400k benchmark can be found in this repository at:  https://github.com/hpc-uk/archer-benchmarks/tree/master/apps/GROMACS

Performance is measured in 'ns/day'. This is calculated by the GROMACS software itself and is read directly from the GROMACS output.
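
As a rough illustration of what reading the value "directly from the GROMACS output" involves, the sketch below pulls the ns/day figure from the `Performance:` line that GROMACS writes near the end of its log. The helper name `parse_gromacs_performance` and the sample log text are hypothetical and are not the repository's `gromacs.getperf` implementation, which may differ.

```python
import re


def parse_gromacs_performance(log_text):
    """Return the ns/day value from the 'Performance:' line of a GROMACS log.

    Hypothetical helper for illustration only; it is not the
    gromacs.getperf function used in this notebook.
    """
    for line in log_text.splitlines():
        # The first number on the line is ns/day, the second is hour/ns.
        match = re.match(r"\s*Performance:\s+([0-9.]+)", line)
        if match:
            return float(match.group(1))
    raise ValueError("no 'Performance:' line found in log text")


# Made-up log fragment in the layout GROMACS uses for its summary.
sample_log = """\
                 (ns/day)    (hour/ns)
Performance:        1.216       19.737
"""

ns_per_day = parse_gromacs_performance(sample_log)
print(ns_per_day)
```

Raising an error when the line is missing, rather than returning zero, makes a failed or truncated run obvious before it reaches the comparison tables.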

In [21]:
from appanalysis import gromacs

In [22]:
systems = ['ARCHER','Peta4-Skylake','Cirrus','Isambard','Tesseract','Wilkes2-GPU','JADE']

### Best performance per platform comparison

This section compares the best-performing configuration on a single node of each platform.

In [23]:
perf = {}
notes = {}
perf['ARCHER'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/ARCHER/benchmark_1nodes24tasks2threads_201810051612.log')
notes['ARCHER'] = '(24 tasks, 2 threads)'
perf['Peta4-Skylake'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/CSD3-Skylake/benchmark_1nodes32tasks1threads_201810030927.log')
notes['Peta4-Skylake'] = '(32 tasks, 1 thread)'
perf['Cirrus'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/Cirrus/benchmark_1nodes36tasks2threads_201810022015.log')
notes['Cirrus'] = '(36 tasks, 2 threads)'
perf['Isambard'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/Isambard/benchmark_1nodes128tasks2threads_201808201249.log')
notes['Isambard'] = '(128 tasks, 2 threads)'
perf['Tesseract'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/Tesseract/benchmark_1nodes2threads_201810080945.log')
notes['Tesseract'] = '(24 tasks, 2 threads)'
perf['Wilkes2-GPU'] = gromacs.getperf('../apps/GROMACS/1400k-atoms/results/CSD3-GPU/benchmark_1nodes4rankspn4gpus_201808240945.log')
notes['Wilkes2-GPU'] = '(4 MPI tasks, 3 OMP per task, 4 GPU)'
# JADE performance result is taken from the runs by HEC BioSim
perf['JADE'] = 1.647
notes['JADE'] = '(5 core, 1 GPU), http://www.hecbiosim.ac.uk/jade-benchmarks'

In [24]:
formath = "{:>15s} {:>15s} {:>15s}"
formatp = "{:>15s} {:>15.3f} {:>15.3f} {:s}"
print("Performance improvement relative to ARCHER:\n")
print(formath.format('System', 'Perf. (ns/day)', 'Improvement'))
print(formath.format('======', '==============', '==========='))
for system in systems:
    tperf = perf.get(system,0.0)
    print(formatp.format(system, tperf, tperf/perf['ARCHER'], notes.get(system, '')))

Performance improvement relative to ARCHER:

         System  Perf. (ns/day)     Improvement
         ARCHER           1.216           1.000 (24 tasks, 2 threads)
  Peta4-Skylake           2.082           1.712 (32 tasks, 1 thread)
         Cirrus           1.699           1.397 (36 tasks, 2 threads)
       Isambard           1.471           1.210 (128 tasks, 2 threads)
      Tesseract           1.323           1.088 (24 tasks, 2 threads)
    Wilkes2-GPU           2.744           2.257 (4 MPI tasks, 3 OMP per task, 4 GPU)
           JADE           1.647           1.354 (5 core, 1 GPU), http://www.hecbiosim.ac.uk/jade-benchmarks


### Performance comparison matrix

In [25]:
print("{:13s}".format(''),end='')
for jsystem in systems:
    print("{:>14s}".format(jsystem), end='')
print()
for isystem in systems:
    print("{:13s}".format(isystem), end='')
    for jsystem in systems:
        print("{:14.3f}".format(perf[isystem]/perf[jsystem]), end='')
    print()

                     ARCHER Peta4-Skylake        Cirrus      Isambard     Tesseract   Wilkes2-GPU          JADE
ARCHER                1.000         0.584         0.716         0.827         0.919         0.443         0.738
Peta4-Skylake         1.712         1.000         1.225         1.415         1.574         0.759         1.264
Cirrus                1.397         0.816         1.000         1.155         1.284         0.619         1.032
Isambard              1.210         0.707         0.866         1.000         1.112         0.536         0.893
Tesseract             1.088         0.635         0.779         0.899         1.000         0.482         0.803
Wilkes2-GPU           2.257         1.318         1.615         1.865         2.074         1.000         1.666
JADE                  1.354         0.791         0.969         1.120         1.245         0.600         1.000
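
Each entry in a matrix like the one above is the performance of the row system divided by the performance of the column system, so a value above 1 means the row system is faster. The sketch below builds the same ratios as a nested list; `ratio_matrix` and `example_perf` are hypothetical names, and the three values are copied from the GROMACS results table in this notebook.

```python
def ratio_matrix(perf, systems):
    """Return rows of perf[row] / perf[column] for every pair of systems."""
    return [[perf[row] / perf[col] for col in systems] for row in systems]


# Subset of the GROMACS ns/day values reported earlier in this notebook.
example_perf = {"ARCHER": 1.216, "Peta4-Skylake": 2.082, "Cirrus": 1.699}
example_systems = ["ARCHER", "Peta4-Skylake", "Cirrus"]

matrix = ratio_matrix(example_perf, example_systems)
for name, row in zip(example_systems, matrix):
    print("{:13s}".format(name) + "".join("{:14.3f}".format(v) for v in row))
```

Entries mirrored across the diagonal are reciprocals of each other, which gives a quick sanity check on any row of the printed tables.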


## OpenSBLI: Taylor-Green Vortex 512^3 benchmark

Details of the Taylor-Green Vortex 512^3 benchmark can be found in this repository at: https://github.com/hpc-uk/archer-benchmarks/tree/master/apps/OpenSBLI

Performance is measured in 'iterations/s'. The total runtime and number of iterations are read directly from the OpenSBLI output and used to compute the number of iterations per second.
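
The conversion itself is a single division, sketched below. The function name and the numbers are illustrative assumptions, not values from any OpenSBLI run. The notebook computes `1.0 / osbli.gettiming(...)`, which gives the same result if `gettiming` returns the time per iteration; that return value is presumed here rather than confirmed.

```python
def iterations_per_second(total_runtime_s, n_iterations):
    """Hypothetical helper: convert a runtime and iteration count to iter/s."""
    if total_runtime_s <= 0 or n_iterations <= 0:
        raise ValueError("runtime and iteration count must be positive")
    return n_iterations / total_runtime_s


# Illustrative numbers only: 20 iterations completed in 200 seconds.
example_rate = iterations_per_second(total_runtime_s=200.0, n_iterations=20)

# Equivalent form: reciprocal of the time taken per iteration.
seconds_per_iteration = 200.0 / 20
assert abs(1.0 / seconds_per_iteration - example_rate) < 1e-12

print("{:.3f} iter/s".format(example_rate))
```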

In [26]:
from appanalysis import osbli

In [27]:
osbli_systems = ['ARCHER','Peta4-Skylake','Cirrus','Isambard','Tesseract']

### Best performance per platform comparison

This section compares the best-performing configuration on a single node of each platform.

In [28]:
osbli_perf = {}
osbli_perf['ARCHER'] = 1.0 / osbli.gettiming('../apps/OpenSBLI/TGV512ss/results/ARCHER/output_1nodes_201808020923.txt')
osbli_perf['Peta4-Skylake'] = 1.0 / osbli.gettiming('../apps/OpenSBLI/TGV512ss/results/CSD3-Skylake/output_1nodes_201812131243.txt')
osbli_perf['Cirrus'] = 1.0 / osbli.gettiming('../apps/OpenSBLI/TGV512ss/results/Cirrus/output_1nodes_201812201536.txt')
osbli_perf['Isambard'] = 1.0 / osbli.gettiming('../apps/OpenSBLI/TGV512ss/results/Isambard/output_1nodes_201808020732.txt')
osbli_perf['Tesseract'] = 1.0 / osbli.gettiming('../apps/OpenSBLI/TGV512ss/results/Tesseract/output_1nodes_201812171437.txt')

In [29]:
formath = "{:>15s} {:>15s} {:>15s}"
formatp = "{:>15s} {:>15.3f} {:>15.3f}"
print("Performance improvement relative to ARCHER:\n")
print(formath.format('System', 'Perf. (iter/s)', 'Improvement'))
print(formath.format('======', '==============', '==========='))
aperf = osbli_perf.get('ARCHER',0.0)
for system in osbli_systems:
    tperf = osbli_perf.get(system,0.0)
    print(formatp.format(system, tperf, tperf/aperf))

Performance improvement relative to ARCHER:

         System  Perf. (iter/s)     Improvement
         ARCHER           0.100           1.000
  Peta4-Skylake           0.170           1.700
         Cirrus           0.130           1.302
       Isambard           0.178           1.777
      Tesseract           0.097           0.971


### Performance comparison matrix

In [30]:
print("{:13s}".format(''),end='')
for jsystem in osbli_systems:
    print("{:>14s}".format(jsystem), end='')
print()
for isystem in osbli_systems:
    print("{:13s}".format(isystem), end='')
    iperf = osbli_perf[isystem]
    for jsystem in osbli_systems:
        jperf = osbli_perf[jsystem]
        print("{:14.3f}".format(iperf/jperf), end='')
    print()

                     ARCHER Peta4-Skylake        Cirrus      Isambard     Tesseract
ARCHER                1.000         0.588         0.768         0.563         1.030
Peta4-Skylake         1.700         1.000         1.306         0.957         1.751
Cirrus                1.302         0.766         1.000         0.733         1.341
Isambard              1.777         1.045         1.365         1.000         1.830
Tesseract             0.971         0.571         0.746         0.547         1.000


## CASTEP: Al Slab benchmark

Details of the Al Slab benchmark can be found in this repository at:  https://github.com/hpc-uk/archer-benchmarks/blob/master/apps/CASTEP/

Performance is measured in 'mean SCF cycles per second'. This is calculated from the CASTEP output files by extracting the SCF cycle times, discarding the minimum and maximum values, computing the mean of the remaining values and taking its reciprocal.
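
A minimal sketch of that trimmed mean follows, assuming the per-cycle times have already been extracted from the output file. The function name and the sample times are hypothetical, and `castep.getmeancycle` in the repository may handle parsing and edge cases differently.

```python
def mean_scf_cycle_time(cycle_times):
    """Mean cycle time after dropping one minimum and one maximum value.

    Hypothetical helper for illustration only; it is not the
    castep.getmeancycle function used in this notebook.
    """
    if len(cycle_times) < 3:
        raise ValueError("need at least three cycle times to trim both ends")
    trimmed = sorted(cycle_times)[1:-1]  # discard the smallest and largest
    return sum(trimmed) / len(trimmed)


# Illustrative per-cycle times in seconds, not taken from a real run.
example_times = [250.0, 180.0, 190.0, 200.0, 170.0]

mean_time = mean_scf_cycle_time(example_times)
scf_per_second = 1.0 / mean_time
print("{:.5f} scf/s".format(scf_per_second))
```

Dropping the extremes reduces the influence of unrepresentative cycles, such as a slow first cycle, on the reported rate.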

In [31]:
from appanalysis import castep

In [32]:
castep_systems = ['ARCHER','Peta4-Skylake','Cirrus','Isambard','Tesseract']

### Best performance per platform comparison

This section compares the best-performing configuration on a single node of each platform.

In [33]:
castep_perf = {}
castep_perf['ARCHER'] = 1.0 / castep.getmeancycle('../apps/CASTEP/al3x3/results/ARCHER/al3x3.castep.1nodes')
castep_perf['Peta4-Skylake'] = 1.0 / castep.getmeancycle('../apps/CASTEP/al3x3/results/CSD3-Skylake/al3x3.castep.1nodes')
castep_perf['Cirrus'] = 1.0 / castep.getmeancycle('../apps/CASTEP/al3x3/results/Cirrus/17.21_gcc620_impi17/al3x3.castep.1nodes')
castep_perf['Isambard'] = 1.0 / castep.getmeancycle('../apps/CASTEP/al3x3/results/Isambard/al3x3.castep.1nodes_201806130634')
castep_perf['Tesseract'] = 1.0 / castep.getmeancycle('../apps/CASTEP/al3x3/results/Tesseract/al3x3_1nodes_201808071417.castep')

In [34]:
formath = "{:>15s} {:>15s} {:>15s}"
formatp = "{:>15s} {:>15.5f} {:>15.3f}"
print("Performance improvement relative to ARCHER:\n")
print(formath.format('System', 'Perf. (scf/s)', 'Improvement'))
print(formath.format('======', '==============', '==========='))
aperf = castep_perf.get('ARCHER',0.0)
for system in castep_systems:
    tperf = castep_perf.get(system,0.0)
    print(formatp.format(system, tperf, tperf/aperf))

Performance improvement relative to ARCHER:

         System   Perf. (scf/s)     Improvement
         ARCHER         0.00543           1.000
  Peta4-Skylake         0.01641           3.023
         Cirrus         0.01109           2.043
       Isambard         0.00691           1.273
      Tesseract         0.00728           1.341


### Performance comparison matrix

In [35]:
print("{:13s}".format(''),end='')
for jsystem in castep_systems:
    print("{:>14s}".format(jsystem), end='')
print()
for isystem in castep_systems:
    print("{:13s}".format(isystem), end='')
    iperf = castep_perf[isystem]
    for jsystem in castep_systems:
        jperf = castep_perf[jsystem]
        print("{:14.3f}".format(iperf/jperf), end='')
    print()

                     ARCHER Peta4-Skylake        Cirrus      Isambard     Tesseract
ARCHER                1.000         0.331         0.489         0.786         0.746
Peta4-Skylake         3.023         1.000         1.479         2.375         2.254
Cirrus                2.043         0.676         1.000         1.605         1.524
Isambard              1.273         0.421         0.623         1.000         0.949
Tesseract             1.341         0.444         0.656         1.054         1.000
