IMPORTANT: IF YOU CLONE THIS REPOSITORY, USE --recursive, OR INITIALISE GIT SUBMODULES MANUALLY AFTERWARDS.
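If the repository was already cloned without --recursive, the submodules can be fetched afterwards. The following is a minimal sketch, assuming it is run from somewhere inside the clone; the helper name init_submodules is purely illustrative and not part of this repository.

```shell
# Fetch all submodules of the current clone (hypothetical helper name).
init_submodules() {
    # Refuse to run outside a Git working tree.
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "error: not inside a Git working tree" >&2
        return 1
    fi
    # Initialise and check out every submodule, including nested ones.
    git submodule update --init --recursive
}
```

Calling init_submodules from the repository root should leave the submodule directories populated.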
Experimental infrastructure for the PPOPP'19 paper Incremental Flattening for Nested Data Parallelism
We provide a Docker image with the necessary programs and libraries for Futhark to build and run. This currently only works on Linux machines with NVIDIA GPUs. To start a container from the image, run:
docker run -it --runtime=nvidia --storage-opt size=20G futhark/ppopp19
You may need to use sudo or similar for this. The --storage-opt size=20G part is needed because the default Docker disk quota of 10GiB tends to be too small.
You should now have a shell open inside a directory with the contents of this repository. Next step: Usage.
(Alternatively you can build the Docker image yourself by using the Dockerfile included in this repository.)
Manual installation of dependencies
In case you don't use the Docker image:
This infrastructure depends on some fairly common tools and libraries being installed on the local system. Specifically, we require:
The OpenTuner Python libraries must be installed, which can be done with pip install --user opentuner.
OpenTuner depends on SQLite, which must also be installed. SQLite can be found in the package system of virtually any Linux distribution.
Generating the graphs requires Numpy and Matplotlib version 2 for Python, as well as a working LaTeX setup.
bc is needed for some of the data generation scripts. This should be preinstalled (or easily installable) on just about any Unix system.
The locale must be UTF-8-enabled. On a Unix system, this can typically be accomplished by setting an environment variable
As a guideline, the Dockerfile contains commands showing how to install the necessary components on a Debian/Ubuntu machine.
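Before starting a long run, it may help to confirm that the command-line tools named above are present. The sketch below is only an illustration: the helper names missing_tools and locale_is_utf8 are hypothetical, and the tool list in the usage comment is an assumption based on the requirements listed in this section.

```shell
# Print the tools from the argument list that cannot be found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only if the tool can be located.
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space and print the result on one line.
    echo "${missing# }"
    [ -z "$missing" ]
}

# Succeeds if the active locale reports a UTF-8 character map.
locale_is_utf8() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# Usage (tool names are assumptions drawn from the list above):
#   missing_tools python pip bc latex || echo "some tools are missing"
#   locale_is_utf8 || echo "locale is not UTF-8"
```

This only checks for executables; Python libraries such as OpenTuner, Numpy, and Matplotlib still need to be verified separately.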
Usage
Run make and everything will happen. Note that this may
well take hours. All external resources are automatically
downloaded with wget if necessary. Results will be
located in machine-readable form in the results/ directory,
auto-tuned parameters in tunings/, and graphs will be produced in
PDF format in the root directory.
The config.mk file contains commented configuration parameters that
may need customisation based on the machine being used. In
particular, you may need to increase the time allotted to auto-tuning
in order to reach the results from the paper, depending on the speed
of the machine (and how lucky you are).
The main requirement is a working OpenCL installation; specifically one that can compile without passing many weird flags or options to the compiler. We expect that on Linux, an OpenCL program can be compiled with
gcc foo.c -lOpenCL
and on macOS with
gcc foo.c -framework OpenCL
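The two compile commands above differ only in their linker flags, so a small wrapper can pick the right one per platform. This is a sketch under stated assumptions: the helper names opencl_link_flags and compile_opencl are hypothetical, and any system other than macOS is treated like Linux.

```shell
# Print the OpenCL linker flags for an OS name as reported by uname -s.
# $1 is optional and defaults to the current system.
opencl_link_flags() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "-framework OpenCL" ;;
        *)      echo "-lOpenCL" ;;
    esac
}

# Compile a C source file (such as the foo.c above) against OpenCL.
compile_opencl() {
    flags=$(opencl_link_flags)
    # Echo the command so a failure is easy to reproduce by hand.
    echo "gcc $1 $flags"
    # $flags is left unquoted so "-framework OpenCL" splits into two arguments.
    gcc "$1" $flags
}
```

If compile_opencl foo.c fails on a system with OpenCL installed, the compiler probably needs extra include or library paths.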
A quick way to determine whether the system is sound is to run
On some systems, depending on the OpenCL vendor, it may be necessary
to set some combination of the environment variables
LD_LIBRARY_PATH for this to work.
You will need a relatively beefy GPU, as in particular LocVolCalib is memory-hungry when being auto-tuned. 4GiB should be enough.
While Futhark-generated code can handle multiple available OpenCL
platforms, most of the third party benchmark implementations look only
at the first platform. These may fail if the device name indicated in
config.mk is not part of the first platform. On most systems, this
will not be an issue, but it may be worth looking at the contents of
/etc/OpenCL/vendors, or using the clinfo tool, which is available
in many package systems.
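To see which OpenCL platforms are registered, the ICD files in the vendors directory can be listed together with the library each one points to. The helper below is a hypothetical sketch; note that the order printed here is not guaranteed to match the order in which an OpenCL program enumerates platforms.

```shell
# List the OpenCL ICD files in a vendors directory with their contents.
# $1 is optional and defaults to /etc/OpenCL/vendors.
list_opencl_vendors() {
    dir="${1:-/etc/OpenCL/vendors}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for icd in "$dir"/*.icd; do
        # Skip the unexpanded pattern when no .icd files exist.
        [ -e "$icd" ] || continue
        # Each ICD file names the vendor's OpenCL library.
        echo "$(basename "$icd"): $(cat "$icd")"
    done
}
```

Where clinfo is installed, running it gives a more detailed view, including the device names to compare against config.mk.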
While Futhark code can compile and run correctly on macOS, it is our experience that many Rodinia benchmarks cannot. Furthermore, most macOS systems do not have GPUs with enough memory available to run the larger benchmarks.
The python executable on the default PATH must be Python 2, because
OpenTuner does not support Python 3. For generating the graphs, you
will need Python with Matplotlib 2, Numpy, as well as a standard
working LaTeX setup.
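The Python 2 requirement can be checked up front rather than discovered midway through auto-tuning. A minimal sketch, with hypothetical helper names python_major and require_python2:

```shell
# Print the major version of an interpreter; $1 defaults to python.
# The one-liner is valid in both Python 2 and Python 3.
python_major() {
    "${1:-python}" -c 'import sys; print(sys.version_info[0])' 2>/dev/null
}

# Succeed only if the interpreter is Python 2.
# A missing interpreter produces no output and therefore also fails.
require_python2() {
    [ "$(python_major "${1:-python}")" = "2" ]
}

# Usage:
#   require_python2 || echo "python on PATH is not Python 2"
```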
For building the Futhark compiler you will need stack, a build
tool for Haskell. Alternatively, you may be able to install a binary
release of Futhark and modify config.mk to use that instead. However,
a future (or past) release of Futhark may not match the one this
benchmark suite was designed for.
You will also need OpenTuner installed. This is usually accomplished simply by running
pip install --user opentuner
The Makefile will automatically detect whether the matrix
multiplication experiment should use cuBLAS as the reference, or a
portable OpenCL implementation. This is done by checking whether
nvcc is in scope. If this heuristic goes wrong, or cuBLAS is for some
reason not available, you can modify config.mk to set USE_CUBLAS=0.
On machines without an NVIDIA GPU, the absence of nvcc means that
the OpenCL implementation gets picked.
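To predict which reference implementation will be chosen, the same kind of check can be run by hand. This sketch only mirrors the heuristic as described above; the Makefile's actual test may differ, and the helper name matmul_reference is hypothetical.

```shell
# Print which matrix multiplication reference the heuristic would suggest:
# "cuBLAS" if nvcc can be found on PATH, "OpenCL" otherwise.
matmul_reference() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "cuBLAS"
    else
        echo "OpenCL"
    fi
}
```

If this prints cuBLAS on a machine where cuBLAS is not actually usable, setting USE_CUBLAS=0 in config.mk is the documented override.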
This repository does not contain the Futhark compiler source code. Rather, it uses a Git submodule that pins a specific commit of the Futhark compiler. A similar submodule is used for the FinPar benchmark suite. Rodinia does not have public source control, and therefore the Makefile simply downloads a tarball from a known location.
If the entire suite cannot run on a given system, there are some useful sub-targets that can be used to run just parts of it. In particular, these can be used to obtain results on GPUs that do not have enough memory to run the larger benchmarks.
make matmul-runtimes-small.pdf: run the matrix multiplication benchmark and plot the results.
make LocVolCalib-runtimes.pdf: run the LocVolCalib benchmark and plot the results.
make bulk-impact-speedup.pdf: run both Futhark and reference versions of the various Rodinia and FinPar benchmarks and plot the results. This is perhaps the one most likely to fail, as it involves a significant amount of third party code, not all of which was designed with benchmarking and portability in mind.
Results without plots
This can be useful if you are on a system that does not have an
appropriate version of Matplotlib installed. The runtimes will be
printed on the screen, and the raw data is also available in JSON format
in the results directory. Runtimes are not computed; only the raw
make matmul-runtimes-small: run the matrix multiplication benchmark.
make LocVolCalib-runtimes: run the LocVolCalib benchmark.
make bulk-impact-speedup: run both Futhark and reference versions of the various Rodinia and FinPar benchmarks.
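Since any one of these targets may fail on a given machine, it can be convenient to run them in sequence and collect the failures instead of stopping at the first one. A minimal sketch, assuming the hypothetical helper name run_targets and an optional MAKE variable for choosing the make program:

```shell
# Run each given make target in turn, continuing past failures.
# MAKE may be set to another make program; it defaults to make.
run_targets() {
    failed=""
    for target in "$@"; do
        echo "== make $target"
        if ! ${MAKE:-make} "$target"; then
            failed="$failed $target"
        fi
    done
    # Report the failed targets, if any, and signal failure to the caller.
    if [ -n "$failed" ]; then
        echo "failed targets:$failed" >&2
        return 1
    fi
}

# Usage (targets taken from the list above):
#   run_targets matmul-runtimes-small LocVolCalib-runtimes bulk-impact-speedup
```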
Very fine-grained targets
make results/benchmark-moderate.json: produce a JSON file with runtime results for benchmark compiled with moderate flattening, where benchmark must be one of
make results/benchmark-incremental.json: as above, but with incremental flattening.
make results/benchmark-incremental-tuned.json: as above, but with incremental flattening and with auto-tuning.
make results/benchmark-rodinia.json: as above, but use the Rodinia implementation (benchmark may not be
make results/benchmark-finpar.json: as above, but use the FinPar implementation (benchmark must be
make veryclean: like make clean, but removes even the parts that have been
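The fine-grained result targets listed above all follow the pattern results/benchmark-variant.json, so the target name can be assembled from its two parts. The helper result_target below is hypothetical, and the usage line assumes that LocVolCalib is among the accepted benchmark names, which this document does not confirm.

```shell
# Build the name of a fine-grained result target.
# $1: benchmark name
# $2: variant, one of moderate, incremental, incremental-tuned, rodinia, finpar
result_target() {
    case "$2" in
        moderate|incremental|incremental-tuned|rodinia|finpar)
            echo "results/$1-$2.json" ;;
        *)
            echo "unknown variant: $2" >&2
            return 1 ;;
    esac
}

# Usage (benchmark name is an assumption):
#   make "$(result_target LocVolCalib incremental-tuned)"
```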