CUDA and OpenMP implementations of C2R/R2C in-place transposition
Latest commit adbdc6f by bryancatanzaro (Feb 10, 2015): "Update for Maxwell on CUDA 7.0. Register allocation regression."


Inplace

Inplace provides CUDA and OpenMP implementations of the C2R and R2C in-place transposition algorithms, which are described in our PPoPP paper.

We have also included a specialization for very tall, skinny matrices, which yields good performance for in-place conversions between Arrays of Structures (AoS) and Structures of Arrays (SoA).
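Why AoS/SoA conversion is a transposition: N records of k fields stored interleaved form an N x k row-major matrix. Its k x N transpose stores all first fields, then all second fields, and so on. That is exactly the SoA layout. Because N is usually far larger than k, the matrix is tall and skinny. The sketch below illustrates only this index mapping. It uses the classic cycle-following in-place transpose, which is a swapped-in technique and not the C2R/R2C decomposition from the paper. Unlike the paper's algorithms, it keeps an O(rows * cols) visited bitmap. The name `cycle_transpose` is hypothetical and is not part of this library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Inplace): in-place transpose of a
// rows x cols row-major matrix by following permutation cycles.
// On return, data holds the cols x rows transpose, also row-major.
void cycle_transpose(float* data, std::size_t rows, std::size_t cols) {
    const std::size_t size = rows * cols;
    if (size < 2) return;
    const std::size_t last = size - 1;  // indices 0 and last never move
    std::vector<bool> visited(size, false);
    for (std::size_t start = 1; start < last; ++start) {
        if (visited[start]) continue;
        // Element k = i * cols + j belongs at j * rows + i, which equals
        // (k * rows) mod last for every k in [1, last).
        std::size_t k = start;
        float carried = data[start];
        do {
            const std::size_t dest = (k * rows) % last;
            const float displaced = data[dest];
            data[dest] = carried;  // drop the carried element into place
            carried = displaced;   // pick up the element it displaced
            visited[dest] = true;
            k = dest;
        } while (k != start);
    }
}
```

For example, the interleaved buffer `{x0, y0, z0, x1, y1, z1, ...}` of four (x, y, z) records is a 4 x 3 matrix. Calling `cycle_transpose(buf, 4, 3)` turns it into `{x0..x3, y0..y3, z0..z3}`. Calling `cycle_transpose(buf, 3, 4)` converts it back.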

The OpenMP implementation is declared in <inplace/openmp.h>, and the CUDA implementation is declared in <inplace/transpose.h>. The CUDA implementation has the following signatures:

namespace inplace {

void transpose(bool row_major, float* data, int m, int n);
void transpose(bool row_major, double* data, int m, int n);

}
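When testing calls to `inplace::transpose`, a simple out-of-place reference can help. You could copy the input, call the library, and compare against the reference. The README does not define the exact meaning of `row_major`, `m`, and `n`, so the sketch below assumes the following. `data` holds an m x n matrix, stored row-major if `row_major` is true and column-major otherwise. After the call, it holds the n x m transpose in the same layout. The function `reference_transpose` is hypothetical and is not part of the library.

```cpp
#include <cstddef>

// Hypothetical out-of-place reference (not part of Inplace). ASSUMPTION:
// `in` is an m x n matrix, row-major if row_major is true, column-major
// otherwise; `out` receives the n x m transpose in the same layout.
template <typename T>
void reference_transpose(bool row_major, const T* in, T* out, int m, int n) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            const std::size_t si = static_cast<std::size_t>(i);
            const std::size_t sj = static_cast<std::size_t>(j);
            if (row_major) {
                // A(i, j) lives at i*n + j; A^T(j, i) goes to j*m + i.
                out[sj * m + si] = in[si * n + sj];
            } else {
                // A(i, j) lives at j*m + i; A^T (n rows) puts (j, i) at i*n + j.
                out[si * n + sj] = in[sj * m + si];
            }
        }
    }
}
```

Under this assumption, a column-major m x n transpose moves the data exactly as a row-major n x m transpose does. This holds because a column-major m x n buffer, read row-major, is the n x m matrix A^T.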