Skip to content
GPU functions for registration of Gaussian Mixture Models
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

GPU Kernels for GaussTransform

GaussTransform is used as a cost function to determine the distance between two Gaussian Mixture Models.

Example usage

The code currently consists of a single class which can be instantiated to allocate resources on the GPU. The number you pass to the constructor is the largest possible size of the model you wish to compute on (this is simply used for memmory allocations).

The GPUGaussTransform object performs expensive memory allocations and deallocations and as such it should be reused for as many calls to compute() as possible.

The GPU code currently assumes double precision and 2 dimensional coordinates for the models.

    #include "gausstransform.h"

    //instantiate GPUGaussTransform object
    GPUGaussTransform gpu_gt(m);

    //call the cost function
    double cost = gpu_gt.compute(A, B, m, n, scale, grad);

The compute() function will take care of transferring the data to the GPU, calling the GPU functions, as well as copying the results back from the GPU to host memory. It will only copy the actual size of A and B, so there is no real performance drawback from overestimating the largest possible size when creating GPUGaussTransform.

The destructor of GPUGaussTransform guarantees that GPU resources are freed when the computations on the GPU have finished and the GPUGaussTransform instance is deleted.



The GPU code requires a CUDA-capable GPU as well as the CUDA toolkit to be installed. Please see Nvidia's website for installation fo the CUDA toolkit (

CUB Library

The GPU code currently has one dependency, which is the CUB library. You can download it from: The easiest way to install CUB is to add the directory where you unpack CUB to your $CPATH environment variable.

Python 3

The tests for the GPU code and several of the C functions are written in Python, to run these a Python 3 installation is required. The easiest way to get this is using Miniconda.

On Linux systems one could type the following commands to download and install Python 3 using Miniconda:


All the required Python packages can be installed using the following command, before you run it make sure that you have CUDA installed:

pip install -r requirements.txt

The tests can be run using nose, for example by typing the following in the top-level or test directory:

nosetests -v

Building the code

A Makefile is provided that can be used to produce a dynamic library. Simply type make in the top-level directory and the static library will be produced in the bin/ directory.

Note that, the CUDA path inside the Makefile should be changed to match your local configuration.

The dynamic library can be used to link the code to your application, don't forget to also link the CUDA runtime library. This means adding -L/path/to/cuda/lib64/ -lcudart to the compiler command when linking the application.

You can’t perform that action at this time.