The following code generates a distance matrix, overwrites it with a covariance matrix and Cholesky factors that matrix. :
import numpy as np
import geo_gpu as gg
nbx = 40
blocksize=16
nby = 60
d = 'float32'
x = np.arange(blocksize*nbx,dtype=d)
y = np.arange(blocksize*nby,dtype=d)
D = gg.CudaDistance(gg.euclidean, d, blocksize)
C = gg.CudaRawCovariance(gg.exponential, d, blocksize, amp=2., scale=10.)
D_eval = D(x,x,symm=True)
C_eval = C(D_eval, symm=True)
S = gg.cholesky(C_eval, blocksize)
CudaDistance
and CudaRawCovariance
are subclasses of CudaMatrixFiller
. At initialization, they compile corresponding symmetric and nonsymmetric GPU kernels for the given datatype, blocksize and parameter values. All parameter values are compiled in as constants.
When called directly as in the example above, CudaDistance
and CudaRawCovariance
return numpy arrays. However, they each have a lower-lever method called gpu_call
which returns a reference to an array on the GPU. These methods can optionally take GPU arrays as inputs. We can use these methods to pipe distance functions and covariance functions together.
The function cholesky
compiles and caches kernels for each block-size and dtype it sees behind the scenes. It also makes use of a lower-level function, cholesky_gpu
, which will make it faster to evaluate and factorize covariance matrices when the actual evaluation is not needed.
You can make a new distance like so: :
euclidean = {'preamble': "",
'params':(),
'body': """
{{dtype}} d = 0;
for(int i = 0; i < ndx; i++)
{
{{dtype}} dev = x[nxi+i] - y[nyj+i];
d += dev*dev;
}
return sqrt(d);
"""}
or a new covariance function like so: :
exponential = {'preamble': "",
'params':('amp','scale'),
'body': " d[0]=exp(-abs(d[0])/{{scale}})*{{amp}}*{{amp}};"}
Params
is a list of the name of the special parameters exponential
takes, and body
is a code snippet describing how to fill in a single element. The body code may contain the parameters enclosed in double curly braces, and can also use the template parameter {{dtype}}
.
The body code can also contain 'if' statements. The distance function template uses these to produce separate symmetric and nonsymmetric kernels: :
generic = """
#define BLOCKSIZE {{blocksize}}
{{preamble}}
__device__ {{dtype}} compute_element__({{dtype}} *x, {{dtype}} *y, int nxi, int nyj, int ndx)
{
{{body}}
}
__global__ void compute_matrix__({{dtype}} *cuda_matrix, {{dtype}} *x, {{dtype}} *y, int nx, int ndx)
{
{{ if symm }}
if(blockIdx.x >= blockIdx.y){
{{ endif }}
int nxi = blockIdx.x * blockDim.x + threadIdx.x;
int nyj = blockIdx.y * blockDim.y + threadIdx.y;
{{dtype}} d_xi_yj = compute_element__(x,y,nxi,nyj,ndx);
__syncthreads;
cuda_matrix[nyj*nx + nxi] = d_xi_yj;
{{ if symm }}
cuda_matrix[nxi*nx + nyj] = d_xi_yj;
} {{ endif }}
}
"""
and likewise the covariance function template:
generic = """
#define BLOCKSIZE {{blocksize}}
{{preamble}}
__device__ void compute_element__({{dtype}} *d)
{
{{body}}
}
__global__ void compute_matrix__({{dtype}} *cuda_matrix, int nx, int ny)
{
{{ if symm }}
if(blockIdx.x >= blockIdx.y){
{{ endif }}
int nxi = blockIdx.x * blockDim.x + threadIdx.x;
int nyj = blockIdx.y * blockDim.y + threadIdx.y;
compute_element__(cuda_matrix + nyj*nx + nxi);
__syncthreads;
{{if symm }}
cuda_matrix[nxi*nx + nyj] = cuda_matrix[nyj*nx + nxi];
} {{ endif }}
}"""
If symm
is true, the stuff between the if blocks is kept; otherwise it's thrown out.