API Design

Lindsey Kuper edited this page Jan 31, 2015 · 3 revisions

Three-Pillar Approach

The design of River Trail is based on three pillars:

  • a type called ParallelArray that holds data values
  • several operations on ParallelArrays that implement parallel constructs, like map
  • the concept of an elemental function which is passed to the operations and typically returns a single data element

We have chosen a set of operations on ParallelArrays that we feel is minimal and upon which other data-parallel constructs can be built. For example, sum could be implemented using reduce, while prefix-sum could be implemented using scan. We anticipate useful libraries and infra-structure being built upon our constructs. This approach enables a "do few things well" implementation strategy while ensuring the composability needed to build other constructs.

ParallelArray Data Structure

We add ParallelArray, a new data type, to JavaScript. It is a read-only data structure holding the actual data that is created by a call to a constructor or is returned from a call to one of the ParallelArray prototype methods. Input to the constructor is typically a JavaScript Array, a TypedArray, or a function that is used to generate the ParallelArray's values. For example, new ParallelArray([1,2,3]) would create a ParallelArray holding the values 1, 2, and 3.

We also support comprehension, so the same ParallelArray could be created using a function that generates the values. An example is new ParallelArray(3, function() { return this+1; }). The first argument is the length of the freshly minted array, and this inside the elemental function is the value of the index in the result that receives the returned value. All the data values are eagerly realized, so one can't use this mechanism to create infinitely large arrays. The result would be a ParallelArray holding the values 1, 2, and 3.

Elemental Functions

Similar in spirit to the use of kernel functions in OpenCL, our approach makes use of elemental functions which are written in JavaScript. Any JavaScript function with an appropriate signature can be used as an elemental function. It is the responsibility of the River Trail system to determine if the elemental function can be optimized to take advantage of the data-parallel hardware. If optimization is not possible, the function is simply executed sequentially.

It is the responsibility of the software development environment to assist the developer in determining whether the elemental function can be optimized. There are of course a few hints that will serve the programmer well. Assignment of either a global or free variable will defeat optimization. This is because such assignments are inherently non-deterministic, and we want deterministic execution.

Recursion is another example of a typical JavaScript construct that is likely to defeat optimization. Of course, as the system matures, it is possible that tail-recursive kernels can be converted into loops and allowed. Pretty much any ParallelArray that does not hold a TypedArray or that cannot be turned into a TypedArray is unlikely to be optimized. So far we have not seen meaningful use cases where this is a significant restriction.

It is important to note that if something cannot be optimized does not mean that it won't run in all browsers, just that it will run more slowly. We could have chosen to not run the code at all and throw an exception, but we decided that the culture of JavaScript dictated that we make a best-effort attempt at running all code. Of course the development environment would provide feedback to the programmer as to why a kernel could not be optimized.

JavaScript provides the object-oriented concept of a this variable. Within the elemental functions we are careful to define this in appropriate ways. For example, for map, this is the element of the ParallelArray used to calculate the corresponding element in the resulting array. For other elemental functions, this would be the entire ParallelArray.

Prototype Methods

ParallelArray comes with the following five data-parallel methods: map, combine, scan, filter, and scatter. Each of these methods, when combined with elemental functions, creates and returns a new ParallelArray. ParallelArray's sixth core data-parallel method is reduce, which can return a scalar value. With these six methods, one can create a very robust and complete data-parallel library. ParallelArray also provides non-data-parallel methods, such as length, that are similar to their sister methods from the Array class which we will not touch upon here.

Finally we provide a get method that takes an index into a ParallelArray and returns the value located at that index. We provide multi-dimensional arrays and related methods to query things like the number of dimensions. We do not provide a set method or other destructive array methods such as push, pop, shift or unshift. The syntax in our prototype does not include the binary operator [ ], to avoid confusion when it is used to set a location in a ParallelArray.

Finally, we expect developers to use JavaScript prototype features to extend the core River Trail methods in appropriate ways. A complete library could define a reasonable sum method using reduce, for instance.