
New gather scatter infrastructure

mmp edited this page Jul 9, 2012 · 2 revisions

There are a few improvements that could/should be made to ispc's code for handling gathers and scatters.

More efficient handling of structs

The biggest opportunity is in how structs are handled. Currently, if we're gathering from or scattering to a struct type, the compiler descends to the leaf elements of the struct and gathers or scatters each one individually. For a scatter, the generated code looks something like:

// scatter to struct Foo { uniform float a, b, c; };
foreach SIMD lane
    if lane is on
        scatter to lvalue for Foo::a for this lane
foreach SIMD lane
    if lane is on
        scatter to lvalue for Foo::b for this lane
foreach SIMD lane
    if lane is on
        scatter to lvalue for Foo::c for this lane
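As a rough scalar illustration of that per-member strategy, here is a C sketch. The SIMD width `WIDTH`, the structure-of-arrays type `VaryingFoo`, the explicit `mask` array and the function name are all hypothetical stand-ins chosen for this example, not ispc's actual generated code. The point it shows is that the lane mask is tested and the destination address computed once per member rather than once per lane.

```c
#include <stdint.h>

#define WIDTH 4 /* assumed SIMD width (hypothetical) */

/* Hypothetical C stand-in for struct Foo. */
struct Foo { float a, b, c; };

/* Hypothetical varying Foo in SoA form: one WIDTH-wide vector per member. */
struct VaryingFoo { float a[WIDTH], b[WIDTH], c[WIDTH]; };

/* Current strategy: a separate pass over the lanes for each leaf member. */
void scatter_foo_per_member(struct Foo *base, const int32_t index[WIDTH],
                            const int mask[WIDTH], const struct VaryingFoo *v) {
    for (int i = 0; i < WIDTH; ++i)   /* scatter Foo::a */
        if (mask[i]) base[index[i]].a = v->a[i];
    for (int i = 0; i < WIDTH; ++i)   /* scatter Foo::b */
        if (mask[i]) base[index[i]].b = v->b[i];
    for (int i = 0; i < WIDTH; ++i)   /* scatter Foo::c */
        if (mask[i]) base[index[i]].c = v->c[i];
}
```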

It would be much better to generate code like:

foreach SIMD lane
    if lane is on
        scatter to lvalue for Foo for this lane
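A scalar C sketch of this single-pass form follows, under the same kind of hypothetical assumptions (a fixed `WIDTH`, a structure-of-arrays `VaryingFoo`, an explicit `mask`, and an illustrative function name). Each active lane is visited once and writes its whole `Foo`, so the mask test and the address computation are no longer repeated for every member.

```c
#include <stdint.h>

#define WIDTH 4 /* assumed SIMD width (hypothetical) */

/* Hypothetical C stand-in for struct Foo. */
struct Foo { float a, b, c; };

/* Hypothetical varying Foo in SoA form: one WIDTH-wide vector per member. */
struct VaryingFoo { float a[WIDTH], b[WIDTH], c[WIDTH]; };

/* Desired strategy: one pass over the lanes, writing the whole struct. */
void scatter_foo_per_lane(struct Foo *base, const int32_t index[WIDTH],
                          const int mask[WIDTH], const struct VaryingFoo *v) {
    for (int i = 0; i < WIDTH; ++i) {
        if (!mask[i])
            continue;                     /* inactive lane: leave memory untouched */
        struct Foo *dst = &base[index[i]]; /* address computed once per lane */
        dst->a = v->a[i];
        dst->b = v->b[i];
        dst->c = v->c[i];
    }
}
```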

One possible route would be to implement a more general scatter function in the standard library, written in LLVM bitcode, with a signature something like:

scatter(uniform void *basePointer, varying int *offsets, uniform int structEltCount,
        uniform int *structEltOffsets, uniform int *structEltSizes, varying void *rvalue)

(Note the mishmash of ispc and C syntax in the declaration above.) The basic idea is that the ispc compiler would emit a call to this helper with the base pointer of the array being scattered to, per-program-instance offsets to the array element that each instance writes, information about the struct layout (the number of elements, the offset of each element from the start of the struct, and the size of each element's type), and finally the varying rvalue to be scattered.
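To make the proposed helper's behavior concrete, here is a rough C rendering of it. It is only a sketch: the real helper would live in LLVM bitcode, and several details are assumptions made for this example rather than part of the proposal. These include the fixed `WIDTH`, the explicit `mask` parameter (the signature above has none, but only active lanes should store), `offsets` being byte offsets from `basePointer`, and the layout of `rvalue`, where member `e` stores its `WIDTH` lane values contiguously starting at byte `WIDTH * structEltOffsets[e]`.

```c
#include <stdint.h>
#include <string.h>

#define WIDTH 4 /* assumed SIMD width (hypothetical) */

/* Hypothetical C version of the general struct scatter.
   Assumed rvalue layout: member e's WIDTH values begin at byte
   WIDTH * structEltOffsets[e], each structEltSizes[e] bytes wide. */
void scatter_struct(void *basePointer, const int32_t offsets[WIDTH],
                    int structEltCount, const int32_t *structEltOffsets,
                    const int32_t *structEltSizes, const void *rvalue,
                    const int mask[WIDTH] /* assumed extra parameter */) {
    uint8_t *base = (uint8_t *)basePointer;
    const uint8_t *src = (const uint8_t *)rvalue;
    for (int lane = 0; lane < WIDTH; ++lane) {        /* outer loop: SIMD lanes */
        if (!mask[lane])
            continue;
        uint8_t *dst = base + offsets[lane];          /* this lane's struct */
        for (int e = 0; e < structEltCount; ++e) {    /* inner loop: struct members */
            int32_t size = structEltSizes[e];
            memcpy(dst + structEltOffsets[e],
                   src + WIDTH * structEltOffsets[e] + lane * size,
                   (size_t)size);
        }
    }
}
```

For `struct Foo { float a, b, c; }`, a caller would pass element offsets `{0, 4, 8}` and sizes `{4, 4, 4}` (assuming 4-byte floats and no padding), with each lane's offset being its array index times `sizeof(Foo)`.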

The LLVM bitcode implementation of this function would contain the outer loop over the SIMD lanes and, inside it, a loop over the struct elements passed in. Hopefully, once this was inlined and optimized, the loop over struct elements and the indexing into the structEltOffsets/structEltSizes arrays would be turned into straight-line code with explicit constants. (Getting there might require some experimentation to find a form that LLVM optimizes well.)
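As an illustration of the hoped-for result, the following C sketch shows what the element loop could reduce to for `struct Foo { float a, b, c; }` once the layout arrays are known constants (offsets `{0, 4, 8}`, sizes `{4, 4, 4}`, assuming 4-byte floats and no padding). Everything here is hypothetical: the `WIDTH` of 4, the explicit `mask`, byte offsets in `offsets`, and an `rvalue` that stores member `e`'s lane values contiguously from byte `WIDTH * offset_e`. It is not actual LLVM output; whether the optimizer really produces this form is exactly the open question above.

```c
#include <stdint.h>
#include <string.h>

#define WIDTH 4 /* assumed SIMD width (hypothetical) */

/* Hypothetical fully specialized scatter for Foo: the member loop is
   unrolled and every offset and size is a compile-time constant. */
void scatter_foo_straightline(void *basePointer, const int32_t offsets[WIDTH],
                              const void *rvalue, const int mask[WIDTH]) {
    uint8_t *base = (uint8_t *)basePointer;
    const uint8_t *src = (const uint8_t *)rvalue;
    for (int lane = 0; lane < WIDTH; ++lane) {
        if (!mask[lane])
            continue;
        uint8_t *dst = base + offsets[lane];
        memcpy(dst + 0, src + WIDTH * 0 + lane * 4, 4); /* Foo::a */
        memcpy(dst + 4, src + WIDTH * 4 + lane * 4, 4); /* Foo::b */
        memcpy(dst + 8, src + WIDTH * 8 + lane * 4, 4); /* Foo::c */
    }
}
```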