Skip to content

Better in language support for aosoa layout

mmp edited this page Jun 27, 2011 · 5 revisions

The language syntax should be more supportive of accessing data in SoA/AoSoA layout. As an example, consider a structure with the following declaration in C/C++:

struct Point {
    float x, y, z;
};

If code is written to take arrays of these in ispc, then doing an access like:

void func(Point pts[], int npts) {
    for (uniform int i = 0; i < npts; i += programCount) {
        Point p = pts[i + programIndex];
        ...

Will cause the compiler to issue three gathers to initialize 'p', one for each of the x, y, and z components, due to the fact that the array is laid out in memory as xyzxyzxyz...

If instead the struct is defined as (again a C definition):

struct Point4 {
    float x[4], y[4], z[4];
};

Then on a 4-wide target, we can do much better since the data is laid out as xxxxyyyyzzzzxxxx....

void func(Point4 pts[], int npts) {
    for (uniform int i = 0; i < npts; i += programCount) {
        Point p;
        int index = i + programIndex;
        p.x = pts[i >> 2].x[index & 3];
        p.y = pts[i >> 2].y[index & 3];
        p.z = pts[i >> 2].z[index & 3];
        ...

This code should compile down to three vector loads (as long as programCount evenly divides i), but it's syntactically pretty ugly. Furthermore, it bakes the 4-wideness of the AoSoA'd Point4 into the function implementation, which is undesirable. (We'd like people to be able to write functions without having to know the AoSoA-ization factor of the data being passed to them. The following is an outline of a possible approach to address both of these issues.

Table of Contents

Syntax

First, we can add a qualifier that is only legal when declaring arrays of structure types:

soa<4> Point foo[32];

Assuming that Point is declared as in the example above, this declares an array that holds a total of 32 Points, actually laid out as an array of 8 Point4s (as declared above). In other words, the soa qualifier causes each of the leaf elements of the given struct to be widened out by the factor in the brackets. The SOA factor must be a power of two, and it must evenly divide the array length. (Perhaps the latter requirement could be relaxed, but for now the compiler checks for both of these and issues errors if not true.)

(The syntax described here has already been implemented in the compiler, though support for doing anything interesting with the syntax is not there. There is support in the Type implementations in type.cpp for converting to n-times-widened types; see the GetSOAType() method implementations.)

Implementation

The main issue is that for soa-qualified arrays, we need to transform array lookups like pts[i].x to pts[i>>log2(soa_width)].x[i & (soa_width-1)]. (TBD also what needs to be done when grabbing an entire struct, e.g. Point foo = pts[i].) It seems like this might most easily be done as an early rewriting pass over the AST, though this isn't fully clear.

Passing Arrays of Structs to Functions

It's important that we can write separately-compiled functions that take arrays of structs as parameters, without necessarily needing to know the actual SoA factor when compiling the function. (We also don't want to have to compile an exponential number of permutations of functions for different possible SoA factors for parameters that are arrays of structures.)

First, a helpful observation:

Lemma: An array of structures qualified with soa<1> is the same as a regular array of structures without any SoA-iazation of the elements.

Therefore, as long as we can handle general (still powers of two) SoA factors, we can handle regular AoS data (if not full on SoA layouts). We can augment the calling convention for array parameters with a hidden extra parameter that gives the SoA factor for the array (or its log base two). The SoA factor will always be known at the call site of each function, so it's easy for the compiler to pass this along. (Hello, "dope vector".)

The generated code for the function can then use this passed value in rewriting the corresponding indexing calculations. In the case where the function definition is available at compile time, then sometimes it may be inlined and then all of the indexing stuff should compile out to be as optimal as possible. (Otherwise, there may be some additional overhead; this should be measured.)

If the indexing math overhead for the case of not knowing the SoA factor at compile time turns out to be unacceptable, we could also generate specialized versions of the function for e.g. 4, 8, and 16x SoA factors; if all of the arrays of structure parameters at a call side have SoA factors that match one of the specialized functions, we could call that and otherwise call the general one. This obviously needs measurement and further investigation.

Automatically-generated Functions to Reformat to AoSoA

Because the ispc compiler understands struct layouts, it would be easy for it to generate utility functions that take an AoS layout of data with a given struct declaration and then rewrite them to be AoSoA with a given SoA factor (and vice versa). It could either generate the AoSoA result in the original memory of the AoS stuff (if it took some care in how it did the rewrite), or it could do it into newly allocated memory. This would be a nice way to let people lay out the data the easy way (AoS) in the application but then transform it to a format that was friendly to ispc computation, but without a driver model or runtime memory management being imposed.