bravegag/eigen-magma-benchmark

Eigen Magma benchmark

This project provides a simple benchmarking facility for Eigen. It was developed mainly for benchmarking the Eigen MAGMA backend implementation. It also serves as a sample CMake project showing how to use Eigen in combination with MAGMA and MKL.

Requirements

You first need to install Intel MKL, Eigen, CUDA and MAGMA. The project has been tested with Intel Parallel Studio 2013, Eigen 3.2.0, CUDA 5.5 and MAGMA 1.4.0.

Modus Operandi

  • Create a Release build with the following command (using Intel compiler):
    rm -rf build; mkdir build; cd build; cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DCMAKE_Fortran_COMPILER=ifort ../src
    
  • Edit the CMakeLists.txt file and enable Eigen only, MKL or MAGMA by commenting or uncommenting the following definitions:
    add_definitions(-DEIGEN_USE_MKL_ALL)
    add_definitions(-DEIGEN_USE_MAGMA_ALL)
    
  • Build the project by executing make. For faster compilation with several parallel jobs, e.g. 5, use make -j5
  • Execute the benchmark ./benchmark or use ./benchmark --help for help.
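
The add_definitions lines above turn into -D compiler flags, so the backend is chosen at preprocessing time. The following is a minimal sketch of how a source file could report which backend it was compiled for. The function name selected_backend is hypothetical, and the precedence it applies when both macros are defined is an assumption of this sketch, not something this project documents.

```cpp
#include <string>

// Hypothetical helper: reports which backend macro was set at compile time.
// EIGEN_USE_MKL_ALL and EIGEN_USE_MAGMA_ALL are the definitions toggled in
// CMakeLists.txt; checking MAGMA first is an arbitrary choice for this sketch.
std::string selected_backend() {
#if defined(EIGEN_USE_MAGMA_ALL)
    return "MAGMA";
#elif defined(EIGEN_USE_MKL_ALL)
    return "MKL";
#else
    return "Eigen only";
#endif
}
```

Printing the result of selected_backend() at the start of a run is one way to confirm that an edit to CMakeLists.txt actually reached the compiler, since a stale build directory can otherwise keep the old definitions.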

Benchmark environment

Out of the box, the nVidia GTX Titan card defaults to a Double-Precision (DP) performance that is only 1/24th of its Single-Precision (SP) performance. The card is capable of reaching a DP performance of up to 1/3 of the SP performance, but this has to be enabled by changing the default nVidia driver settings using the “nvidia-settings” tool installed as part of the nVidia drivers. The CUDA-Double precision box must be checked as shown in the figure below.

Results

The following plots were obtained by executing the currently ported Eigen MAGMA backends:

The benchmarks above were obtained using export MKL_NUM_THREADS=1 and export OMP_NUM_THREADS=1. Increasing MKL_NUM_THREADS may improve the results for both the MKL and the MAGMA versions. Furthermore, unlike the benchmarks shown in the MAGMA testing implementations, the results above account for the memory transfer times between Host and Device. This is the reason why dgemv and dtrsm do not seem to perform better than the CPU versions.
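
To illustrate why counting transfers matters so much for dgemv, here is a rough back-of-the-envelope model, not the timing code used by this benchmark. DeviceModel and its two rates are hypothetical parameters chosen for illustration. The model assumes the whole matrix is copied to the device for every call and that transfer and compute do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model parameters (assumptions, not measurements from this project).
struct DeviceModel {
    double transfer_bytes_per_s;  // sustained Host <-> Device bandwidth
    double compute_flops_per_s;   // sustained DP rate of the device kernel
};

// y = A*x with an n-by-n matrix costs about n*n multiplies plus n*n adds.
double dgemv_flops(std::size_t n) {
    return 2.0 * static_cast<double>(n) * static_cast<double>(n);
}

// Bytes moved: A (n*n doubles) and x (n doubles) to the device, y (n doubles) back.
double dgemv_bytes(std::size_t n) {
    const double dn = static_cast<double>(n);
    return 8.0 * (dn * dn + 2.0 * dn);
}

// Throughput seen by the caller when transfer time is included in the timing.
double effective_flops_per_s(std::size_t n, const DeviceModel& m) {
    assert(n > 0 && m.transfer_bytes_per_s > 0.0 && m.compute_flops_per_s > 0.0);
    const double t_transfer = dgemv_bytes(n) / m.transfer_bytes_per_s;
    const double t_compute = dgemv_flops(n) / m.compute_flops_per_s;
    return dgemv_flops(n) / (t_transfer + t_compute);
}
```

Under this model dgemv performs only about 2 flops per 8 bytes transferred, so the effective rate stays below transfer_bytes_per_s / 4 no matter how fast the device computes. A bus-limited ceiling of that kind is consistent with the dgemv results reported above.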

About

Benchmark corresponding to the eigen-magma project implementation
