Testing out CUDA on NVIDIA cards
1 Tests of GP-GPU programming

1.1 Running the tests

for test in tangle/*test.py; do
    echo
    echo "----------------------------------------------------------------"
    echo "$test ---- $(date)"
    echo
    time python "$test"
done
----------------------------------------------------------------
tangle/3d-fft-test.py ---- Fri Nov 26 23:43:08 CST 2010

Testing fft/ifft..
Success status:  True
Success status:  True
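On machines without bash, roughly the same loop can be written in Python. This is a sketch, not part of the original repo. The function name `run_tests` is hypothetical, and it assumes the tangled tests live under `tangle/` as above and should run with the current interpreter (`sys.executable`).

```python
import glob
import subprocess
import sys
import time
from datetime import datetime


def run_tests(pattern="tangle/*test.py"):
    """Run each matching test script in turn, printing a banner and the
    elapsed wall-clock time, like the bash loop. Returns {path: exit code}."""
    results = {}
    # bash expands globs in sorted order; glob.glob does not, so sort explicitly.
    for test in sorted(glob.glob(pattern)):
        print()
        print("-" * 64)
        print(test, "----", datetime.now().strftime("%a %b %d %H:%M:%S %Y"))
        print()
        start = time.perf_counter()
        # Run the script with the same Python interpreter that runs this file.
        proc = subprocess.run([sys.executable, test])
        print("real %.3fs" % (time.perf_counter() - start))
        results[test] = proc.returncode
    return results


if __name__ == "__main__":
    run_tests()
```

Returning the exit codes lets a caller fail a CI job if any test script exits non-zero, which the bash loop does not track.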

1.2 FFTs using scikits.cuda

The demo program fft_demo.py in the scikits.cuda distribution only does 1D transforms, but the documentation for scikits.cuda.fft() says it handles transforms of up to three dimensions, so let’s try that first.

And let’s do it all with literate programming via org-babel!

1.2.1 3D FFT test

"""
Demonstrates how to use PyCUDA interface to CUFFT for 3D arrays.
"""

1.2.1.1 Import modules

import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np

import scikits.cuda.fft as cu_fft

1.2.1.2 Set up source array in space domain

print('Testing fft/ifft..')
nx, ny, nz = 128, 128, 128

x = np.random.rand(nx, ny, nz).astype(np.float32)

1.2.1.3 Transform to the Fourier domain and back with standard numpy

# fftn/ifftn transform over all three axes; np.fft.fft alone only does the last axis.
xf = np.fft.fftn(x)
y = np.real(np.fft.ifftn(xf))
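A multidimensional DFT is separable: it equals a 1D DFT applied along each axis in turn, which is why np.fft.fft (last axis only) and a full 3D transform such as np.fft.fftn differ. The pure-Python sketch below checks that property in 2D on a tiny grid. The helper names (`dft`, `dft2_separable`, `dft2_direct`) are hypothetical, and a naive O(N²) DFT stands in for a real FFT purely for illustration.

```python
import cmath


def dft(seq):
    """Naive 1D DFT with numpy's sign convention: X[k] = sum_t x[t] exp(-2*pi*i*k*t/N)."""
    n = len(seq)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
            for k in range(n)]


def dft2_separable(grid):
    """2D DFT computed as 1D DFTs along the last axis, then along the first axis."""
    rows = [dft(row) for row in grid]              # transform each row (last axis)
    cols = [dft(list(col)) for col in zip(*rows)]  # transform each column (first axis)
    return [list(row) for row in zip(*cols)]       # transpose back to row-major


def dft2_direct(grid):
    """2D DFT evaluated straight from the definition, for comparison."""
    m, n = len(grid), len(grid[0])
    return [[sum(grid[a][b] * cmath.exp(-2j * cmath.pi * (p * a / m + q * b / n))
                 for a in range(m) for b in range(n))
             for q in range(n)]
            for p in range(m)]


grid = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
sep, direct = dft2_separable(grid), dft2_direct(grid)
print(all(abs(s - d) < 1e-9 for rs, rd in zip(sep, direct) for s, d in zip(rs, rd)))
# prints True
```

The same argument extends to 3D: one more pass of 1D transforms along the remaining axis.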

1.2.1.4 Do the same but using CUDA

1.2.1.4.1 Forward transform with CUDA
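  • Need to pre-allocate the array to hold the transform. A real-to-complex transform stores only the non-redundant half of the last (fastest-varying) axis, so the output is complex64 with shape (nx, ny, nz//2 + 1).
x_gpu = gpuarray.to_gpu(x)
xf_gpu = gpuarray.empty((nx, ny, nz//2 + 1), np.complex64)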
plan_forward = cu_fft.Plan(x_gpu.shape, np.float32, np.complex64)
cu_fft.fft(x_gpu, xf_gpu, plan_forward)
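Why only half of one axis is needed: for real input the DFT is Hermitian-symmetric, X[N-k] = conj(X[k]), so the entries past N//2 carry no new information, and real-to-complex transforms such as cuFFT's R2C keep only N//2 + 1 of them along the last axis of a C-ordered array. The stdlib sketch below checks the symmetry in 1D and computes the half-spectrum shape; `r2c_shape` is a hypothetical helper, not part of scikits.cuda.

```python
import cmath
import random


def dft(seq):
    """Naive 1D DFT: X[k] = sum_t x[t] exp(-2*pi*i*k*t/N)."""
    n = len(seq)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
            for k in range(n)]


def r2c_shape(shape):
    """Output shape of a real-to-complex transform of a C-ordered array:
    only the last axis shrinks, to shape[-1] // 2 + 1."""
    return tuple(shape[:-1]) + (shape[-1] // 2 + 1,)


x = [random.random() for _ in range(8)]
X = dft(x)
n = len(x)
# Hermitian symmetry: for real x, X[n - k] is the complex conjugate of X[k].
print(all(abs(X[n - k] - X[k].conjugate()) < 1e-9 for k in range(1, n)))  # True
print(r2c_shape((128, 128, 128)))  # (128, 128, 65)
```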
1.2.1.4.2 Backward transform with CUDA
y_gpu = gpuarray.empty_like(x_gpu)
plan_inverse = cu_fft.Plan(x_gpu.shape, np.complex64, np.float32)
cu_fft.ifft(xf_gpu, y_gpu, plan_inverse, True)  # True: scale the result by 1/(nx*ny*nz)
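cuFFT's transforms are unnormalized: a forward transform followed by an inverse one returns the input multiplied by the number of elements. The trailing `True` (the scale flag) asks scikits.cuda to divide that factor back out. A pure-Python 1D sketch of the same bookkeeping, with a hypothetical `dft` helper and a naive DFT standing in for cuFFT:

```python
import cmath


def dft(seq, sign=-1):
    """Unnormalized naive DFT: sign=-1 gives the forward transform, sign=+1 the inverse."""
    n = len(seq)
    return [sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
            for k in range(n)]


x = [0.25, 1.0, -0.5, 2.0]
roundtrip = dft(dft(x), sign=+1)              # unnormalized inverse, as in cuFFT
print([round(v.real, 9) for v in roundtrip])  # [1.0, 4.0, -2.0, 8.0], i.e. 4 * x
scaled = [v / len(x) for v in roundtrip]      # divide by the element count
print([round(v.real, 9) for v in scaled])     # [0.25, 1.0, -0.5, 2.0]
```

For the 3D case the element count is nx*ny*nz, the size of the real output array.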

1.2.1.5 Test that we get the same results from numpy and CUDA

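# Note: x_gpu is just x copied to the GPU, so this first check verifies the
# upload rather than the forward FFT (xf_gpu itself is never compared).
print('Success status: %s' % np.allclose(x, x_gpu.get(), atol=1e-6))
print('Success status: %s' % np.allclose(y, y_gpu.get(), atol=1e-6))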
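Both checks pass atol=1e-6, but np.allclose also applies a default relative tolerance (rtol=1e-5) and tests |a - b| <= atol + rtol * |b| elementwise. The relative term matters if you compare spectra instead of round-tripped data: the zero-frequency coefficient of 128³ uniform random numbers is about 10⁶, so single-precision rounding there is far larger than 1e-6. A stdlib sketch of the same criterion; `allclose` here is a hypothetical stand-in for numpy's function, and the numbers are illustrative, not measured.

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Mimic numpy's elementwise test |a - b| <= atol + rtol * |b|.
    Like numpy it is asymmetric in b; unlike numpy it does no broadcasting."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


# Illustrative values: a large FFT coefficient with a float32-sized error.
dc_reference, dc_gpu = 1048576.0, 1048576.06
print(allclose([dc_gpu], [dc_reference], rtol=0.0, atol=1e-6))  # False: absolute only
print(allclose([dc_gpu], [dc_reference], atol=1e-6))             # True: rtol*|b| is about 10.5
```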