Skip to content

Latest commit

 

History

History
105 lines (84 loc) · 3.95 KB

README.md

File metadata and controls

105 lines (84 loc) · 3.95 KB

Artifact Description: Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

A.1 Abstract

The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. We also identify architectural peculiarities and derive optimization hints. Applying the ECM model to sparse matrix-vector multiplication (SpMV), we motivate why the CRS matrix storage format is inappropriate and how the SELL-C-σ format with suitable code optimizations can achieve bandwidth saturation for SpMV.

A.2 Description

A.2.1 Check-list (artifact meta information)

  • Compilation: gcc
  • Binary: aarch64
  • Hardware: A64FX (FX700)
  • Publicly available?: yes

A.2.2 How software can be obtained (if available)

Check out this repository.

git clone https://github.com/RRZE-HPC/pmbs2020-paper-artifact

A.2.3 Hardware dependencies

We ran on a FX700 system (A64FX architecture) at 1.8 GHz with 48 cores per CPU. The results should be reproducible on any FX700 system.
You can find a detailed state description of the system we used for running the experiments in machine_state.txt.

A.2.4 Software dependencies

  • GCC 10.1.1

You can find a detailed state description of the system we used for running the experiments in machine_state.txt.

A.2.5 Datasets

For SpMV we use matrices from SuiteSparse Matrix Collection (https://suitesparse-collection-website.herokuapp.com).

A.3 Installation

None necessary.

A.4 Experiment workflow

To validate our results use the following commands.

Download script and benchmark codes:

git clone https://github.com/RRZE-HPC/pmbs2020-paper-artifact
cd pmbs20-paper-appendix/

The description for compiling and running can be found in the corresponding README.md files of each benchmark.

A.5 Evaluation and expected result

In-core instructions

Navigate to the in-core_instructions directory:

cd single_core/in-core_instructions/

Run the validation accordingly to the README file and compare the results to Table II in the paper.

Single-core STREAM and 2d5pt

We provide code to reproduce the single-core streaming kernel and 2d five-point benchmarks for shown in this publication.
Navigate to either one of the directories:

cd single_core/[stream|stencil]/

Run the validation accordingly to the README file and compare the results to Table III in the paper.

Multi-core STREAM and 2d5pt

We provide code to reproduce the multi-core streaming kernel and 2d five-point benchmarks shown in this publication.
Navigate to either one of the directories:

cd multi_core/[stream|stencil]/

Run the validation accordingly to the README file and compare the results to Figure 4 in the paper.

Sparse Matrix-Vector Multiplication (SpMV)

To reproduce the results of the SpMV kernels in this work, navigate to the SpMV directory:

cd spmv/

Run the benchmarks according to the README file and compare the results to Figure 5 in the paper.

A.6 Experiment customization

None

A.7 Notes

None