Visit our project Homepage for documentations, user-guides, examples and more!
FELTOR (Full-F ELectromagnetic code in TORoidal geometry) is a modular scientific software package used for:
-
Physics: study fluid models for magnetised (fusion) plasmas in one, two and three dimensions
-
Numerics: develop and study numerical methods for these models in particular novel discontinuous Galerkin methods and structured grid generators
-
High performance computing: investigate parallel peformance, binary reproducibility and accuracy of the above algorithms on modern hardware architectures.
FELTOR applications are platform independent and run on a large variety of hardware from laptop CPUs to GPUs to high performance compute clusters.
This guide discusses how to setup, build, test and benchmark FELTOR on a given system. Please read it before you proceed to the user guide to learn how to use the library in your own programs.
The first step is to clone and configure the FELTOR repository from GitHub.
git clone https://www.github.com/feltor-dev/feltor
cd feltor
cmake --preset cpu # or gpu, omp, mpi-cpu, mpi-gpu, mpi-omp
You may need to install the external dependencies libnetcdf-dev
, liblapack-dev
and libboost-dev
(and libglfw3-dev
for OpenGL output and libopenmpi-dev
for MPI support) from your system package manager.
On Windows you can use the built-in git client and cmake integration of Visual Studio. It is easiest to use the built-in vcpkg manager to install the
netcdf-c
,glfw3
,lapack
andboost-headers
dependencies.
There are 6 presets targeting 6 different hardware architectures. All require the C++-17 standard. Each preset X
creates its binary directory build/X
.
CMake Preset | Requirements | Description |
---|---|---|
cpu |
gcc >= 9 or msvc >= 19 or icc >= 19.0 or clang >= 19 |
Single core CPU, no parallelization; support for AVX and FMA instruction set is recommended |
omp |
OpenMP-4 support |
Multi-core CPU parallelisation, set OMP_NUM_THREADS environment variable to set number of OpenMP threads to run. Not currently available on Windows with MSVC nor for clang! |
gpu |
nvcc >= 11.0 |
Parallel computation on a single NVidia GPU |
mpi-cpu |
MPI version 3 |
Distributed memory system for pure MPI parallelisation |
mpi-omp |
MPI version 3 |
Hybrid MPI + OpenMP parallelisation. In this configuration you may want to investigate how OpenMP threads map to CPU cores. Not currently available on Windows with MSVC nor for clang! |
mpi-gpu |
MPI version 3, ideally CUDA aware |
Hybrid MPI + GPU parallelisation. Each MPI thread targets one GPU. |
Our GPU backend uses the Nvidia-CUDA programming environment and in order to compile and run a program for a GPU a user needs the nvcc compiler and a NVidia GPU. However, we explicitly note here that due to the modular design of our software a user does not have to possess a GPU nor the nvcc compiler. The CPU version of the backend is equally valid and provides the same functionality. Analogously, an MPI installation is only required if the user targets a distributed memory system.
The Feltor projects defines a host of CMake targets that can be built after configuration. To build everything run
cmake --build build/cpu -j 4
# Replace "cpu" with the preset of your choice; here and in the following.
# -j 4 activates parallel compilation with 4 threads.
The Feltor CMake targets are organised into three categories: tests, benchmarks and production projects. These can be targeted individually using
cmake --build build/cpu --target dg_tests -j 4
# Compile all tests of the dg library
cmake --build build/cpu --target dg_benchmarks -j 4
# Compile all benchmarks of the dg library
cmake --build build/cpu --target feltor_projects -j 4
# Compile all production projects of feltor
The tests will be built in the build/X/tests
the benchmarks
in build/X/benchmarks
and the projects in build/X/src
, where X
is the preset in use.
The tests can be run using
ctest --test-dir build/cpu
Lastly, one can also target individual programs. All programs in the dg library start with the prefix dg_
followed by the component name backend
, topology
, geometries
, file
or matrix
followed by the program name without suffix. The feltor project targets follow the naming scheme project_target
where project
is the name of the folder in the src
directory and target
is the program name without suffix. The output name in the binary directory follows the original folder structure and program name. For example:
cmake --build build/cpu --target dg_blas_b
./build/cpu/benchmarks/blas_b
# Compile and run the benchmark program feltor/inc/dg/blas_b.cpp
cmake --build build/cpu --target dg_topology_derivatives_t
./build/cpu/tests/topology/derivatives_t
# Compile and run the test program feltor/inc/dg/topology/derivatives_t.cpp
cmake --build build/cpu --target feltor_feltor
./build/cpu/src/feltor/feltor
# Compile and run the 3d feltor code in feltor/src/feltor/feltor.cpp
Again, remember to replace cpu
with the preset of your choice and mind the various options when running parallel programs, e.g.
cmake --preset gpu
cmake --build build/gpu --target dg_blas_b
./build/gpu/benchmarks/blas_b
# Compile and run the benchmark program feltor/inc/dg/blas_b.cpp for GPU
cmake --preset omp
cmake --build build/omp --target dg_blas_b
export OMP_NUM_THREADS=4
./build/gpu/benchmarks/blas_b
# Compile and run the benchmark program feltor/inc/dg/blas_b.cpp for OpenMP
cmake --preset mpi-cpu
cmake --build build/mpi-cpu --target feltor_feltor
mpirun -n 4 ./build/mpi-cpu/src/feltor/feltor
# Compile and run the 3d feltor code in feltor/src/feltor/feltor.cpp for pure MPI using 4 MPI threads
FELTOR contains a library called the dg-library (from discontinuous Galerkin). To integrate FELTOR’s dg library in your own project via cmake currently the only option is to add it as a submodule i.e. either (i) use FetchContent directly or (ii) use the cmake package manager CPM (our recommendation) or (iii) add feltor as a git submodule and use add_subdirectory
in your CMakeLists.txt
. We here show the CPM version. To get started follow the CPM quick start guide to setup the file cmake/CPM.cmake
. It is also highly recommended to set the CPM_SOURCE_CACHE
environment variable.
CMake’s install rules and
find_package
currently does not work well with targets that can be compiled for various languages (see this issue)
The available library targets in cmake are of the format feltor::dg::component
, where component
is one of the following:
feltor::dg::component
component | Corresponding Header | Description |
---|---|---|
|
|
Depends on cccl and vectorclass (loaded via |
|
|
Depends on |
|
|
Depends on |
|
|
Depends on |
|
|
Depends on either |
|
|
Depends on |
As noted before you may need to install the external dependencies
libnetcdf-dev
,liblapack-dev
andlibboost-dev
from your system package manager (or use e.g. the vcpkg manager to installnetcdf-c
,lapack
andboost-headers
). Note that you can set the optionsFELTOR_DG_WITH_MATRIX OFF
andFELTOR_FILE_WITH_NETCDF OFF
to avoid having to install netcdf, lapack or boost.
Furthermore, since feltor’s dg library depends on cccl, we inherit their option CCCL_THRUST_DEVICE_SYSTEM
, which can be either CPP
, OMP
or CUDA
. Since with CUDA a new language must be enabled (which can only be done once in a cmake project) we must add this to the cmake file:
cmake_minimum_required(VERSION 3.26)
project( myProject
VERSION 1.0.0
LANGUAGES CXX
)
# We need to enable CUDA language if the user wants it
if(CCCL_THRUST_DEVICE_SYSTEM STREQUAL "CUDA" OR CCCL_THRUST_DEVICE_SYSTEM STREQUAL "")
enable_language(CUDA)
set_source_files_properties(main.cpp PROPERTIES LANGUAGE CUDA)
endif()
include(cmake/CPM)
CPMAddPackage(
NAME feltor
GITHUB_REPOSITORY "feltor-dev/feltor"
VERSION 8.2
SYSTEM ON
EXCLUDE_FROM_ALL ON
OPTIONS "FELTOR_DG_WITH_MATRIX OFF" "FELTOR_FILE_WITH_NETCDF OFF"
)
add_executable(main main.cpp)
# The base dg library header "dg/algorithm.h"
target_link_libraries( main PRIVATE feltor::dg::dg)
Note that the dg library is header-only, which means that you just have to include the relevant header(s) and you’re good to go. For example in the following program we compute the square L2 norm of a function:
#include <iostream>
//include the basic dg-library
#include "dg/algorithm.h"
double function(double x, double y){return exp(x)*exp(y);}
int main()
{
//create a 2d discretization of [0,2]x[0,2] with 3 polynomial coefficients
dg::CartesianGrid2d g2d( 0, 2, 0, 2, 3, 20, 20);
//discretize a function on this grid
const dg::DVec x = dg::evaluate( function, g2d);
//create the volume element
const dg::DVec vol2d = dg::create::volume( g2d);
//compute the square L2 norm on the device
double norm = dg::blas2::dot( x, vol2d, x);
// norm is now: (exp(4)-exp(0))^2/4
std::cout << norm <<std::endl;
return 0;
}
To compile and run this code for a GPU use
cmake -Bbuild/gpu -DCCCL_THRUST_DEVICE_SYTEM="CUDA" -DCMAKE_CUDA_ARCHITECTURES="native" -DCMAKE_CUDA_FLAGS="-march=native -O3"
cmake --build build/gpu
./build/gpu/main
Or if you want to use OpenMP and gcc instead of CUDA for the device functions you can also use
cmake -Bbuild/omp -DCCCL_THRUST_DEVICE_SYTEM="OMP" -DCMAKE_CXX_FLAGS="-march=native -O3"
cmake --build build/omp
export OMP_NUM_THREADS=4
./build/omp/main
If you do not want any parallelization, you can use a single thread version
cmake -Bbuild/omp -DCCCL_THRUST_DEVICE_SYTEM="CPP" -DCMAKE_CXX_FLAGS="-march=native -O3"
cmake --build build/cpu
./build/cpu/main
If you want to use mpi, just include the MPI header before any other FELTOR header and use our convenient typedefs like so:
#include <iostream>
#ifdef WITH_MPI
//activate MPI in FELTOR
#include "mpi.h"
#endif
#include "dg/algorithm.h"
double function(double x, double y){return exp(x)*exp(y);}
int main(int argc, char* argv[])
{
#ifdef WITH_MPI
//init MPI and create a 2d Cartesian Communicator assuming 4 MPI threads
MPI_Init( &argc, &argv);
int periods[2] = {true, true}, np[2] = {2,2};
MPI_Comm comm;
MPI_Cart_create( MPI_COMM_WORLD, 2, np, periods, true, &comm);
#endif
//create a 2d discretization of [0,2]x[0,2] with 3 polynomial coefficients
dg::CartesianMPIGrid2d g2d( 0, 2, 0, 2, 3, 20, 20
#ifdef WITH_MPI
, comm
#endif
);
//discretize a function on this grid
const dg::x::DVec x = dg::evaluate( function, g2d);
//create the volume element
const dg::x::DVec vol2d = dg::create::volume( g2d);
//compute the square L2 norm
double norm = dg::blas2::dot( x, vol2d, x);
//on every thread norm is now: (exp(4)-exp(0))^2/4
#ifdef WITH_MPI
//be a good MPI citizen and clean up
MPI_Finalize();
#endif
return 0;
}
The CMake file needs to be modified like
option(MAIN_WITH_MPI "Compile main with MPI parallelisation" OFF)
if(MAIN_WITH_MPI)
target_link_libraries(main PRIVATE MPI::MPI_CXX)
target_compile_definitions(main PRIVATE WITH_MPI)
endif()
Compile e.g. for a hybrid MPI + OpenMP hardware platform with
cmake -Bbuild/mpi-omp -DCCCL_THRUST_DEVICE_SYTEM="OMP" -DCMAKE_CXX_FLAGS="-march=native -O3" -DMAIN_WITH_MPI=ON
cmake --build build/mpi-omp
export OMP_NUM_THREADS=2
mpirun -n 4 ./build/mpi-omp/main
This will run 4 MPI threads with 2 OpenMP threads each.
Note the striking similarity to the previous program. Especially the line calling the dot function did not change at all. The compiler chooses the correct implementation for you! This is a first example of platform independent code.
Open a terminal and clone the repository into any folder you like
git clone https://www.github.com/feltor-dev/feltor
You also need to clone cccl distributed under the Apache-2.0 license. Also, we need Agner Fog’s vcl library (Apache 2.0). So again in a folder of your choice
git clone https://www.github.com/nvidia/cccl
git clone https://www.github.com/vectorclass/version2 vcl
Our code only depends on external libraries that are themselves openly available. If version2 of the vectorclass library does not work for you, you can also try version1.
In order to compile one of the many test and benchmark codes
inside the FELTOR library you need to tell
the FELTOR configuration where the external libraries are located on
your computer. The default way to do this is to go into your HOME
directory, make an include directory and link the paths in this
directory
cd ~
mkdir include
cd include
ln -s path/to/cccl/thrust/thrust # Yes, thrust is there twice!
ln -s path/to/cccl/cub/cub
ln -s path/to/cccl/libcudacxx/include/cuda
ln -s path/to/cccl/libcudacxx/include/nv
ln -s path/to/vcl
If you do not like this, you can also set the include paths in your own config file as described here.
Now let us compile the first benchmark program.
cd path/to/feltor/inc/dg
make blas_b device=cpu #(for a single thread CPU version)
#or
make blas_b device=omp #(for an OpenMP version)
#or
make blas_b device=gpu #(if you have a GPU and nvcc )
Run the code with
./blas_b
and when prompted for input vector sizes type for example 3 100 100 10
which makes a grid with 3 polynomial coefficients, 100 cells in x, 100
cells in y and 10 in z. If you compiled for OpenMP, you can set the
number of threads with e.g. export OMP_NUM_THREADS=4
.
This is a benchmark program to benchmark various elemental functions the library is built on. Go ahead and vary the input parameters and see how your hardware performs. You can compile and run any other program that ends in
_t.cu
(test programs) or_b.cu
(benchmark programs) infeltor/inc/dg
in this way.
Now, let us test the mpi setup
You can of course skip this if you don’t have mpi installed on your computer. If you intend to use the MPI backend, an implementation library of the mpi standard is required. Per default
mpic++
is used for compilation.
cd path/to/feltor/inc/dg
make blas_mpib device=cpu # (for MPI+CPU)
# or
make blas_mpib device=omp # (for MPI+OpenMP)
# or
make blas_mpib device=gpu # (for MPI+GPU, requires CUDA-aware MPI installation)
Run the code with $ mpirun -n '# of procs' ./blas_mpib
then tell how
many process you want to use in the x-, y- and z- direction, for
example: 2 2 1
(i.e. 2 procs in x, 2 procs in y and 1 in z; total
number of procs is 4) when prompted for input vector sizes type for
example 3 100 100 10
(number of cells divided by number of procs must
be an integer number). If you compiled for MPI+OpenMP, you can set the
number of OpenMP threads with e.g. export OMP_NUM_THREADS=2
.
Now, we want to compile and run a simulation program. To this end, we have to download and install some additional libraries for I/O-operations.
First, we need to install jsoncpp (distributed under the MIT License),
which on linux is available as libjsoncpp-dev
through the package managment system.
For a manual build check the instructions on JsonCpp.
# You may have to manually link the include path
cd ~/include
ln -s /usr/include/jsoncpp/json
For data output we use the
NetCDF-C library under an
MIT - like license (we use the netcdf-4 file format).
The underlying HDF5
library also uses a very permissive license.
Both can be installed easily on Linux through the libnetcdf-dev
and libhdf5-dev
packages.
For a manual build follow the build instructions in the netcdf-documentation.
Note that by default we use the serial netcdf and hdf5 libraries alson in the mpi
versions of applications.
Some desktop applications in FELTOR use the
draw library (developed by us
also under MIT), which depends on
glfw3, an OpenGL development library under a
BSD-like license. There is a libglfw3-dev
package for convenient installation. Again, link path/to/draw
in the include
folder.
If you are on a HPC cluster, you may need to set INCLUDE and LIB variables manually. For details on how FELTOR’s Makefiles are configured please see the config file. There are also examples of some existing Makefiles in the same folder.
We are now ready to compile and run a simulation program
cd path/to/feltor/src/toefl # or any other project in the src folder
make toefl device=gpu # (compile for gpu, cpu or omp)
cp input/default.json inputfile.json # create an inputfile
./toefl inputfile.json # (behold a live simulation with glfw output on screen)
# or
make toefl_hpc device=gpu # (compile for gpu, cpu or omp)
cp input/default_hpc.json inputfile_hpc.json # create an inputfile
./toefl_hpc inputfile_hpc.json outputfile.nc # (a single node simulation with output stored in a file)
# or
make toefl_mpi device=omp # (compile for gpu, cpu or omp)
export OMP_NUM_THREADS=2 # (set OpenMP thread number to 1 for pure MPI)
echo 2 2 | mpirun -n 4 ./toefl_mpi inputfile_hpc.json outputfile.nc
# (a multi node simulation with now in total 8 threads with output stored in a file)
# The mpi program will wait for you to type the number of processes in x and y direction before
# running. That is why the echo is there.
Default input files are located in path/to/feltor/src/toefl/input
. All
three programs solve the same equations. The technical documentation on
what equations are discretized, input/output parameters, etc. can be
generated as a pdf with make doc
in the path/to/feltor/src/toefl
directory.
The
documentation
of the dg library was generated with
Doxygen. You can generate a local
version directly from source code. This depends on the doxygen
,
libjs-mathjax
, graphviz
and doxygen-awesome
packages. Type make doc
in
the folder path/to/feltor/doc
and open index.html
(a symbolic link
to dg/html/modules.html
) with your favorite browser.
Finally, also note the documentations of thrust.
We maintain tex files in every src folder for
technical documentation, which can be compiled using pdflatex with
make doc
in the respective src folder.
FELTOR has been developed by Matthias Wiesenberger and Markus Held. Please see the Acknowledgements section on our homepage for a full list of contributors and funding. Contribution guidelines can be found in the CONTRIBUTING file.
This project is licensed under the MIT license - see LICENSE for details.