You can find this project written as a literate program,
or, if you prefer reading the source code with Doxygen, there is also a Doxygen build available:
If you need to cite this algorithm before the proper paper is released, please contact me.
In the meantime, the code has been used in this publication, which can therefore be cited.
Atrip uses Autotools as its build system.
Autotools works by first generating a configure script from a configure.ac file.
Atrip should be built out of source, which means that
you have to create a build directory other than the root
directory, for instance build/tutorial:
mkdir -p build/tutorial/
cd build/tutorial
First, generate the configure script by running
../../bootstrap.sh
Creating configure script
Now you can configure and build from within build/tutorial by running
../../configure
make extern
make all
You can also list all configure options with
../../configure --help
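The tutorial steps above can be collected into a single function. This is a minimal POSIX sh sketch, assuming it is called from the root of the Atrip source tree; the names build_tutorial and run, and the DRY_RUN switch that prints the commands instead of executing them, are hypothetical and not part of Atrip.

```shell
#!/bin/sh
# Sketch: out-of-source Atrip build in build/tutorial, following the steps
# above. Assumes it is called from the root of the Atrip source tree.
# build_tutorial, run and DRY_RUN are hypothetical names, not part of Atrip.
set -e

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_tutorial [configure options...]
# Extra arguments (e.g. CXX=g++) are forwarded to configure.
# The body runs in a subshell so that cd does not leak to the caller.
build_tutorial() (
  run mkdir -p build/tutorial
  run cd build/tutorial
  run ../../bootstrap.sh
  run ../../configure "$@"
  run make extern
  run make all
)
```

For example, DRY_RUN=1 build_tutorial CXX=g++ only prints the six commands, while build_tutorial CXX=g++ runs them.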
The script tools/configure-benches.sh can be used to create
several configurations for benches:
- default: uses CPU code with dgemm and without computing slices.
- only-dgemm: only runs the part of the computation that involves dgemm.
- cuda-only-dgemm: the naive CUDA implementation, compiling only the dgemm parts of the computation.
- cuda-slices-on-gpu-only-dgemm: tests that slices reside entirely on the GPU and should be used with a CUDA-aware MPI implementation. It also only uses the routines that involve dgemm.
To generate the benches, create a suitable directory for them
mkdir -p build/benches
cd build/benches
../../tools/configure-benches.sh CXX=g++ ...
This produces a Makefile together with several project folders.
You can either configure all projects at once with make all
or go into each folder separately.
Note that you can pass a path to CTF for all of them by running
../../tools/configure-benches.sh --with-ctf=/absolute/path/to/ctf
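The bench setup can be scripted in the same way. This sketch assumes that configure-benches.sh accepts --with-ctf together with other options such as CXX=g++ in a single call, which the text above does not state explicitly; setup_benches, run and DRY_RUN are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: generate and configure all bench configurations in build/benches.
# Assumes configure-benches.sh accepts --with-ctf together with other
# options such as CXX=g++; setup_benches and run are hypothetical helpers.
set -e

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# setup_benches CTF_PATH [options for configure-benches.sh...]
# The body runs in a subshell so that cd does not leak to the caller.
setup_benches() (
  ctf=$1
  # The CTF path must be absolute, as in the example above.
  case $ctf in
    /*) ;;
    *) echo "setup_benches: CTF path must be absolute: $ctf" >&2; exit 1 ;;
  esac
  shift
  run mkdir -p build/benches
  run cd build/benches
  run ../../tools/configure-benches.sh --with-ctf="$ctf" "$@"
  run make all
)
```

Rejecting a relative path early avoids configuring every project against a CTF location that only resolves from one directory.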
The main benchmark is built as bench/atrip
and performs an Atrip run with random tensors.
A typical invocation is the following:
bench/atrip \
--no 100 \
--nv 1000 \
--mod 1 \
--% 0 \
--dist group \
--nocheckpoint \
--max-iterations 1000
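To compare runs of different sizes, one option is to loop over the invocation above. This hypothetical bench_sweep function keeps every option fixed except --nv, assuming --no and --nv set the numbers of occupied and virtual orbitals; it is an illustration, not a tool shipped with Atrip.

```shell
#!/bin/sh
# Sketch: run the example benchmark for several virtual-orbital counts.
# Assumes --no/--nv are the numbers of occupied/virtual orbitals;
# bench_sweep and run are hypothetical helpers, not shipped with Atrip.
set -e

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# bench_sweep NO NV [NV...]
# All other options are kept as in the example invocation.
bench_sweep() (
  no=$1
  shift
  for nv in "$@"; do
    run bench/atrip \
      --no "$no" \
      --nv "$nv" \
      --mod 1 \
      --% 0 \
      --dist group \
      --nocheckpoint \
      --max-iterations 1000
  done
)
```

For example, bench_sweep 100 500 1000 runs the benchmark twice, once with --nv 500 and once with --nv 1000, one after the other.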