Gram matrix using VexCL #93

Closed
ddemidov opened this issue Feb 10, 2014 · 1 comment
@ddemidov
Owner

This is a copy of a stackoverflow question. Keeping it here for future reference.

I have a pretty large dataset (it does not fit into GPU memory) containing many vectors, each several MBs in size. I'd like to compute the Gram matrix with a gaussian kernel, using multiple GPU devices.

In other words, for every pair of vectors x, y I need to compute the norm of x - y. So if I have N vectors, there are (N^2 + N)/2 such pairs. I don't care about saving space or time by exploiting the symmetry; computing the whole N^2 is fine.

How can I do this with VexCL? As far as I know, it is the only library supporting multiple GPUs, and my attempts to do this effectively with plain OpenCL have had no success so far.

Please note that the dataset won't even fit into the machine's RAM; I'm reading blocks of vectors from a memory-mapped file.

Thanks much!!

@ddemidov
Owner Author

You will obviously need to split your vectors into groups of, say, m vectors, load the groups onto your GPUs one by one, and do the computations. Here is a complete program that does the computation (as I understood it) for two currently loaded chunks:

#include <vexcl/vexcl.hpp>

int main() {
    const size_t n = 1024; // Each vector size.
    const size_t m = 4;    // Number of vectors in a chunk.

    vex::Context ctx( vex::Filter::Count(1) );

    // The input vectors...
    vex::vector<double> chunk1(ctx, m * n);
    vex::vector<double> chunk2(ctx, m * n);

    // ... with some data.
    chunk1 = vex::element_index();
    chunk2 = vex::element_index();

    vex::vector<double> gram(ctx, m * m); // The Gram matrix to fill.

    /*
     * chunk1 and chunk2 both have dimensions [m][n].
     * We want to take each of chunk2 m rows, subtract those from each of
     * chunk1 rows, and reduce the result along the dimension n (I assume
     * norm of (x - y) is max(abs(x - y)), but this could be easily generalized
     * for other norms).
     *
     * In order to do this, we create two virtual 3D matrices (x and y below,
     * those are just expressions and are never instantiated) sized [m][m][n],
     * where
     *
     *     x[i][j][k] = chunk1[i][k] for each j, and
     *     y[i][j][k] = chunk2[j][k] for each i.
     *
     * Then what we need to compute is
     *
     *     gram[i][j] = max_k(abs(x[i][j][k] - y[i][j][k]));
     *
     * Here it goes:
     */
    using vex::extents;
    using vex::_;

    auto x = vex::reshape(chunk1, extents[m][m][n], extents[0][2]);
    auto y = vex::reshape(chunk2, extents[m][m][n], extents[1][2]);

    // The single OpenCL kernel is generated and launched here:
    gram = vex::reduce<vex::MAX>(
            // the dimensions of the expression to reduce:
            vex::slicer<3>(extents[m][m][n])[_],
            fabs(x - y),          // The expression to reduce.
            2                     // The dimension to reduce along.
            );

    // Copy the result to host, spread it across your complete gram matrix.
    // I am lazy though, so let's just dump it to std::cout:
    std::cout << gram << std::endl;
}

I suggest you load chunk1 once, then load all chunk2 variants in sequence and do the computations, then load the next chunk1, and so on. Note that slicing, reshaping, and multidimensional reduction operations are only supported for a context with a single compute device in it.

So what is left is how to spread the computations across all of your compute devices. The easiest way to do this is probably to create a single VexCL context that grabs all available GPUs, and then create vectors of command queues out of it:

vex::Context ctx( vex::Filter::Any );

// One single-device queue list per GPU:
std::vector<std::vector<vex::command_queue>> q;
for(size_t d = 0; d < ctx.size(); ++d)
    q.push_back({ctx.queue(d)});

//...
// In a std::thread perhaps, for device d:
vex::vector<double> chunk1(q[d], m * n);
vex::vector<double> chunk2(q[d], m * n);
// ...

I hope this is enough to get you started.
