UBT-AI2/rk

Contents

This collection contains a set of sequential and parallel implementations of (embedded) Runge-Kutta solvers. These implementations are related to the ones discussed in

  1. Matthias Korch and Thomas Rauber. Optimizing locality and scalability of embedded Runge-Kutta solvers using block-based pipelining. Journal of Parallel and Distributed Computing, 66(3):444–468, 2006.
    DOI: 10.1016/j.jpdc.2005.09.003

  2. Matthias Korch and Thomas Rauber. Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance. International Journal of High Performance Computing Applications, 25(2):236–255, 2011.
    DOI: 10.1177/1094342010384418

  3. Matthias Korch and Thomas Rauber. Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance. Bayreuth Reports on Parallel and Distributed Systems, No. 1, University of Bayreuth, July 2010.
    URN: urn:nbn:de:bvb:703-opus-7136

Since the publication of those articles, new implementations and performance improvements have been added, as well as additional method tables (including fixed step size) and other features.

The following implementations are part of this collection (a short sketch of the basic loop structures follows the list of sequential implementations):

Sequential implementations

  • src/impl/seq/A.c

    • suitable for general ODE systems
    • loop structure exploits spatial locality
  • src/impl/seq/D.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads (fused right hand side)
  • src/impl/seq/E.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes
  • src/impl/seq/F.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes (fused right hand side)
  • src/impl/seq/Dblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads and spatial locality
  • src/impl/seq/AEblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality
  • src/impl/seq/Fblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality (fused right hand side)
  • src/impl/seq/PipeD.c

    • pipelining scheme based on implementation D
    • only suitable for ODE systems with limited access distance
  • src/impl/seq/PipeDls.c

    • low-storage pipelining scheme based on implementation D
    • only suitable for ODE systems with limited access distance
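
The basic difference between these loop structures can be illustrated with a small, self-contained sketch. The code below is not taken from the repository: the system size, the number of stages, the coefficients and the right-hand side f_j are made up, and the two step functions only mirror the spirit of the A-type (spatial locality) and D-type (temporal locality of reads) loop orders.

```c
/*
 * Minimal sketch (NOT code from this repository) contrasting two loop
 * structures for one explicit s-stage Runge-Kutta step.  System size,
 * stage count, coefficients and the right-hand side are made up.
 */
#include <stdio.h>

#define N 8                     /* system size (illustrative) */
#define S 3                     /* number of stages (illustrative) */

static const double a[S][S] = { { 0.0 }, { 0.5, 0.0 }, { 0.0, 0.75, 0.0 } };

/* dummy right-hand side: component j depends only on w[j] */
static double f_j(int j, const double w[N]) { return -w[j]; }

/* A-like variant: stage after stage, each stage performs contiguous
 * sweeps over all components (spatial locality). */
static void step_spatial(double h, const double y[N],
                         double v[S][N], double w[S][N])
{
  for (int l = 0; l < S; l++) {
    for (int j = 0; j < N; j++) {         /* contiguous sweep: build w[l] */
      w[l][j] = y[j];
      for (int i = 0; i < l; i++)
        w[l][j] += h * a[l][i] * v[i][j];
    }
    for (int j = 0; j < N; j++)           /* contiguous sweep: evaluate f */
      v[l][j] = f_j(j, w[l]);
  }
}

/* D-like variant: every freshly computed stage value v[l][j] is used
 * immediately to update all later argument vectors, while it is still
 * in cache (temporal locality of reads). */
static void step_temporal(double h, const double y[N],
                          double v[S][N], double w[S][N])
{
  for (int l = 0; l < S; l++)
    for (int j = 0; j < N; j++)
      w[l][j] = y[j];

  for (int l = 0; l < S; l++)
    for (int j = 0; j < N; j++) {
      v[l][j] = f_j(j, w[l]);
      for (int i = l + 1; i < S; i++)     /* reuse v[l][j] right away */
        w[i][j] += h * a[i][l] * v[l][j];
    }
}

int main(void)
{
  double y[N], v1[S][N], w1[S][N], v2[S][N], w2[S][N];
  for (int j = 0; j < N; j++)
    y[j] = 1.0 + j;
  step_spatial(0.1, y, v1, w1);
  step_temporal(0.1, y, v2, w2);
  printf("%g %g\n", v1[S - 1][0], v2[S - 1][0]);
  return 0;
}
```

For this artificial right-hand side both variants compute identical stage values; they differ only in the order in which memory is touched.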

Parallel implementations for shared address space

  • src/impl/pthreads/A.c

    • suitable for general ODE systems
    • loop structure exploits spatial locality
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/D.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/E.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/F.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes (fused right hand side)
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/Dblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads and spatial locality
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/AEblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/Fblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality (fused right hand side)
    • shared data structures
    • barrier synchronization between the stages
  • src/impl/pthreads/Dbc.c

    • loop structure exploits temporal locality of reads
    • shared data structures
    • block-based synchronization between neighbors using mutex variables
    • only suitable for ODE systems with limited access distance
  • src/impl/pthreads/Dbcblock.c

    • loop structure exploits temporal locality of reads and spatial locality
    • shared data structures
    • block-based synchronization between neighbors using mutex variables
    • only suitable for ODE systems with limited access distance
  • src/impl/pthreads/PipeD.c

    • pipelining scheme based on implementation D
    • shared data structures
    • block-based synchronization between neighbors using mutex variables
    • only suitable for ODE systems with limited access distance
  • src/impl/pthreads/PipeD2.c

    • pipelining scheme based on implementation D
    • alternative finalization strategy
    • shared data structures
    • only one barrier operation after the initialization of the pipelines
    • only suitable for ODE systems with limited access distance
  • src/impl/pthreads/PipeD4.c

    • pipelining scheme based on implementation D
    • alternative computation order
    • distributed data structures
    • block-based synchronization between neighbors using mutex variables
    • only suitable for ODE systems with limited access distance
  • src/impl/pthreads/PipeD4ls.c

    • low-storage pipelining scheme based on implementation D
    • alternative computation order
    • distributed data structures
    • block-based synchronization between neighbors using mutex variables
    • only suitable for ODE systems with limited access distance

These implementations require a POSIX thread library to be installed.
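
The shared-data, barrier-synchronized pattern used by the simpler pthread variants above can be sketched as follows. This is not repository code: the block distribution, the right-hand side and all sizes are invented, and a single barrier per stage stands in for the synchronization between the stages described above.

```c
/*
 * Minimal sketch (NOT code from this repository): shared data structures
 * with one pthread barrier per stage.  Sizes, coefficients and the
 * right-hand side are made up; compile e.g. with  gcc sketch.c -pthread
 */
#include <pthread.h>
#include <stdio.h>

#define N        8              /* system size (illustrative) */
#define S        3              /* number of stages (illustrative) */
#define NTHREADS 2              /* must divide N in this sketch */

static const double a[S][S] = { { 0.0 }, { 0.5, 0.0 }, { 0.0, 0.75, 0.0 } };
static const double h = 0.1;

static double y[N], v[S][N], w[S][N];   /* shared data structures */
static pthread_barrier_t barrier;

/* dummy right-hand side with a neighbor dependence, so that evaluating
 * component j may read a component owned by another thread */
static double f_j(int j, const double warg[N])
{
  return warg[(j + 1) % N] - 2.0 * warg[j];
}

static void *worker(void *arg)
{
  int id    = (int)(long)arg;
  int first = id * (N / NTHREADS);
  int last  = first + N / NTHREADS;

  for (int l = 0; l < S; l++) {
    for (int j = first; j < last; j++) {    /* build own block of w[l] */
      w[l][j] = y[j];
      for (int i = 0; i < l; i++)
        w[l][j] += h * a[l][i] * v[i][j];
    }
    /* the whole argument vector w[l] must be complete before any thread
     * evaluates f on it, hence the barrier */
    pthread_barrier_wait(&barrier);

    for (int j = first; j < last; j++)      /* evaluate own block */
      v[l][j] = f_j(j, w[l]);
  }
  return NULL;
}

int main(void)
{
  pthread_t threads[NTHREADS];

  for (int j = 0; j < N; j++)
    y[j] = 1.0 + j;
  pthread_barrier_init(&barrier, NULL, NTHREADS);

  for (long i = 0; i < NTHREADS; i++)
    pthread_create(&threads[i], NULL, worker, (void *)i);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join(threads[i], NULL);

  pthread_barrier_destroy(&barrier);
  printf("v[%d][0] = %g\n", S - 1, v[S - 1][0]);
  return 0;
}
```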

Parallel implementations for distributed address space

  • src/impl/mpi/A.c

    • suitable for general ODE systems
    • loop structure exploits spatial locality
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/D.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/E.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/F.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes (fused right hand side)
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/Dblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of reads and spatial locality
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/AEblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/Fblock.c

    • suitable for general ODE systems
    • loop structure exploits temporal locality of writes and spatial locality (fused right hand side)
    • multibroadcast operations (MPI_Allgatherv) between the stages
  • src/impl/mpi/Dbc.c

    • loop structure exploits temporal locality of reads
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/Dbcblock.c

    • loop structure exploits temporal locality of reads and spatial locality
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/PipeD.c

    • pipelining scheme based on implementation D
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/PipeD2.c

    • pipelining scheme based on implementation D
    • alternative finalization strategy
    • only one pair of single transfer operations (MPI_Isend/MPI_Irecv), but more data is transferred
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/PipeD4.c

    • pipelining scheme based on implementation D
    • alternative computation order
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/PipeD5.c

    • pipelining scheme based on implementation D
    • alternative computation order
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance
  • src/impl/mpi/PipeD4ls.c

    • low-storage pipelining scheme based on implementation D
    • alternative computation order
    • block-based communication between neighbors using single transfer operations (MPI_Isend/MPI_Irecv)
    • only suitable for ODE systems with limited access distance

These implementations require an MPI library to be installed.
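
The data-parallel pattern behind the multibroadcast-based MPI variants can be sketched as follows. This is not repository code: sizes, coefficients and the right-hand side are invented, and the sketch assumes that the number of processes divides the system size. Each process builds one block of the argument vector, which is then completed on all processes with MPI_Allgatherv before the function evaluation.

```c
/*
 * Minimal sketch (NOT code from this repository): block distribution with
 * one MPI_Allgatherv per stage.  Sizes, coefficients and the right-hand
 * side are made up; run e.g. with  mpicc sketch.c && mpirun -n 2 ./a.out
 */
#include <mpi.h>
#include <stdio.h>

#define N 8                     /* system size (illustrative) */
#define S 3                     /* number of stages (illustrative) */

static const double a[S][S] = { { 0.0 }, { 0.5, 0.0 }, { 0.0, 0.75, 0.0 } };

/* dummy right-hand side with a neighbor dependence */
static double f_j(int j, const double w[N])
{
  return w[(j + 1) % N] - 2.0 * w[j];
}

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int blocksize = N / size;     /* assumes size divides N, size <= N */
  int first = rank * blocksize;
  int counts[N], displs[N];
  for (int p = 0; p < size; p++) {
    counts[p] = blocksize;
    displs[p] = p * blocksize;
  }

  double h = 0.1, y[N], v[S][N], w[N];
  for (int j = 0; j < N; j++)
    y[j] = 1.0 + j;

  for (int l = 0; l < S; l++) {
    /* each process builds only its own block of the argument vector ... */
    for (int j = first; j < first + blocksize; j++) {
      w[j] = y[j];
      for (int i = 0; i < l; i++)
        w[j] += h * a[l][i] * v[i][j];
    }
    /* ... but f may read any component, so the blocks are combined on
     * all processes by a multibroadcast before the evaluation */
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   w, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);

    for (int j = first; j < first + blocksize; j++)
      v[l][j] = f_j(j, w);
  }

  if (rank == 0)
    printf("v[%d][%d] = %g\n", S - 1, first, v[S - 1][first]);

  MPI_Finalize();
  return 0;
}
```

The Dbc and Pipe variants listed above replace this global operation with point-to-point transfers (MPI_Isend/MPI_Irecv) between neighboring processes, which becomes possible when the ODE system has a limited access distance.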

Test problems

Only one test problem is provided, but you can add your own test problems easily.

  • BRUSS2D-MIX
    • 2D Brusselator equation with diffusion (2D PDE with 2 variables)
    • semi-discretized on a spatial N x N grid
    • mixed row-oriented ordering to obtain a limited access distance of d(f)=2N (see the sketch below)
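
How the mixed ordering leads to the stated access distance can be checked with a small stand-alone sketch. The index mapping below (u and v of each grid point stored next to each other, grid points in row-major order) is an illustration only, not the problem code from src/: the right-hand side of a component then reads only the four stencil neighbors and the locally coupled variable, i.e., indices at distance at most 2N.

```c
/*
 * Stand-alone sketch (NOT the problem code from src/) illustrating why a
 * mixed row-oriented ordering of a semi-discretized two-variable 2D PDE
 * yields an access distance of 2N: the u and v values of a grid point
 * are stored next to each other, grid points are stored row by row, and
 * the 5-point stencil plus the local u-v coupling then only touch
 * indices at distance at most 2N.
 */
#include <stdio.h>
#include <stdlib.h>

#define NGRID 6                 /* grid dimension N (illustrative) */

/* mixed ordering: u(i,j) -> 2*(i*NGRID + j), v(i,j) -> 2*(i*NGRID + j) + 1 */
static int idx_u(int i, int j) { return 2 * (i * NGRID + j); }
static int idx_v(int i, int j) { return 2 * (i * NGRID + j) + 1; }

int main(void)
{
  int d_max = 0;

  for (int i = 0; i < NGRID; i++)
    for (int j = 0; j < NGRID; j++) {
      int k = idx_u(i, j);
      /* indices read by the right-hand side of component u(i,j):
       * the four stencil neighbors (where they exist) and v(i,j) */
      int reads[5], n = 0;
      if (i > 0)         reads[n++] = idx_u(i - 1, j);
      if (i < NGRID - 1) reads[n++] = idx_u(i + 1, j);
      if (j > 0)         reads[n++] = idx_u(i, j - 1);
      if (j < NGRID - 1) reads[n++] = idx_u(i, j + 1);
      reads[n++] = idx_v(i, j);
      for (int r = 0; r < n; r++) {
        int d = abs(reads[r] - k);
        if (d > d_max) d_max = d;
      }
    }

  /* the v components read the mirrored set of indices and give the
   * same bound, so they are omitted here */
  printf("maximum access distance = %d (= 2*N = %d)\n", d_max, 2 * NGRID);
  return 0;
}
```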

Configuration of parameters

Some parameters can be changed by editing the file include/config.h. See the comments in that file for details.

You can also override the settings in include/config.h by using the make variable DEFINES, e.g.,

> make DEFINES="PROBLEM=LV METHOD=VERNER65"

But don't forget to run make clean first.

Compiling

A Makefile is provided that automatically builds all combinations of test problems and implementations when you invoke make, provided that gcc and MPI are installed. The binaries are created in the bin/ directory.

If MPI is not installed on your system, you should set HAVE_MPI to no in the Makefile.

When you modify the source code, in particular when you add or remove implementations or #include directives, run make dep so that make learns the new dependencies between the source files.
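
For example, after changing settings or adding new #include directives, a clean rebuild might look like this:

> make clean
> make dep
> make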

Testing and running

After successful compilation, you can run the sequential implementations directly from the bin/ directory, e.g.:

> ./bin/seq_bruss2d-mix_D

If you have built with HAVE_MPI=yes, some MPI libraries may require the use of mpirun or mpiexec, e.g.:

> mpirun -n 1 ./bin/seq_bruss2d-mix_D

The shared-address-space implementations can be started in a similar fashion, but you should specify the number of threads beforehand by setting the environment variable NUM_THREADS:

> NUM_THREADS=4 ./bin/pthreads_bruss2d-mix_Dbc

The distributed-address-space implementations usually require the use of mpirun or mpiexec:

> mpirun -n 10 ./bin/mpi_bruss2d-mix_PipeD4ls

There are two make targets to help you test and evaluate the implementations.

> make check

will run all implementations and print a footprint of the final approximation computed by each implementation, one per line. The output lines are aligned so that differences between the results are easy to spot visually.

> make time

will run all implementations and print the kernel time of each implementation, one per line.

Coding style

All C source files should be formatted with GNU indent using the options in the included .indent.pro file before they are committed. Similarly, Perl source files should be formatted with perltidy using the options in the included perltidy.conf file. You can run make indent to format all source files at once.

License

GPL version 3 or later. Please see COPYING for details.

Contact

Prof. Dr. Matthias Korch
Department of Computer Science
University of Bayreuth
95440 Bayreuth
Germany

E-Mail: korch@uni-bayreuth.de
