Create binned_array class #1

jackm97 · 2023-04-01T22:09:27Z

After implementing a 2D/3D point cloud partitioning algorithm, I noticed the need for some sort of binned array type class that benefits from the automatic memory management of an array.

Imagine a set of bins, each of a different size, where each bin contains indices into an array of values associated with that bin. Performance-wise, it is useful to store all these indices in a single array and then have each bin point to its portion of the index array in some manner. The order of operations for this goes something like:

1. First, determine the size of each bin.
2. Use a scan algorithm to determine where each bin points to in the index array.
3. Finally, write to the index array.
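
As a rough illustration of these three steps, here is a minimal single-device sketch in plain Python. It does not use Scalix; the function name `build_bins` and the `offsets`/`indices` layout (similar to a CSR row-pointer array) are assumptions made for this example only.

```python
from itertools import accumulate

def build_bins(bin_of, num_bins):
    """Hypothetical helper: bin_of[i] is the bin index of data element i."""
    # Step 1: determine the size of each bin by counting its members.
    counts = [0] * num_bins
    for b in bin_of:
        counts[b] += 1

    # Step 2: a scan over the counts gives the start of each bin in the
    # index array; offsets[b]..offsets[b + 1] is the slice owned by bin b.
    offsets = [0] + list(accumulate(counts))

    # Step 3: write each data index into the next free slot of its bin.
    cursor = offsets[:-1].copy()
    indices = [0] * len(bin_of)
    for i, b in enumerate(bin_of):
        indices[cursor[b]] = i
        cursor[b] += 1
    return offsets, indices

offsets, indices = build_bins([2, 0, 2, 1, 0], num_bins=3)
print(offsets)  # [0, 2, 3, 5]
print(indices)  # [1, 4, 3, 0, 2]
```

Bin 0 holds data indices 1 and 4, bin 1 holds index 3, and bin 2 holds indices 0 and 2, all packed into one contiguous index array.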

Step 3 is the one that poses a problem for our automatic load and data management algorithms: do we split the work/data according to the bins or the indices? In my implementation, I split the work according to the bins and set an uneven split of data across devices for the index array using the set_primary_devices method. With the load split according to the shape of the bin array, all writes to the index array were then local to the device, as desired. Further, since there was no way to tell the kernel launcher that the index array was a result array, I had to call unset_read_mostly and set_read_mostly manually.
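
To make the "uneven split" concrete, the sketch below computes which slice of the index array each device would own when the bins themselves are divided evenly across devices. This is plain Python, not Scalix code: `split_by_bins` is a hypothetical helper, the even division of bins is an assumption, and the actual arguments expected by set_primary_devices are not shown here.

```python
def split_by_bins(offsets, num_devices):
    """Hypothetical helper: offsets is the scanned bin-start array (len = bins + 1)."""
    num_bins = len(offsets) - 1
    splits = []
    for d in range(num_devices):
        # Assumption: bins are divided as evenly as possible across devices.
        bin_begin = d * num_bins // num_devices
        bin_end = (d + 1) * num_bins // num_devices
        # The matching index-array range follows from the bin offsets, so its
        # length depends on how full those bins are (hence the uneven split).
        splits.append({
            "device": d,
            "bins": (bin_begin, bin_end),
            "indices": (offsets[bin_begin], offsets[bin_end]),
        })
    return splits

for s in split_by_bins([0, 2, 3, 5, 9], num_devices=2):
    print(s)
# {'device': 0, 'bins': (0, 2), 'indices': (0, 3)}
# {'device': 1, 'bins': (2, 4), 'indices': (3, 9)}
```

Each device gets two bins, but device 0 owns three index entries and device 1 owns six. Because a device only writes the index slices of its own bins, every write in step 3 stays local.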

As someone who is very intimate with the inner workings of Scalix, I didn't find this incredibly difficult. But a basic user might struggle to know how to set up the problem to get good scaling performance.

For this reason, I would like to implement a binned_array class that takes a type, the raw data to be binned, and an index generator that maps from a data index to a bin index. It will handle all the nitty-gritty details of actually setting up the binned_array in a distributed fashion. From there, it will provide read-only access to the binned data.
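
A very rough single-process sketch of what such an interface could look like is given below, in plain Python for brevity. Everything here is hypothetical: the class name `BinnedArray`, the explicit `num_bins` argument, and the `indices`/`values` accessors are illustrative choices, and none of the distributed setup is modeled.

```python
from itertools import accumulate

class BinnedArray:
    """Hypothetical read-only view of data grouped into bins."""

    def __init__(self, data, bin_index, num_bins):
        # bin_index is the index generator: data index -> bin index.
        self._data = data
        bins = [bin_index(i) for i in range(len(data))]
        counts = [0] * num_bins
        for b in bins:
            counts[b] += 1
        self._offsets = [0] + list(accumulate(counts))
        cursor = self._offsets[:-1].copy()
        self._indices = [0] * len(data)
        for i, b in enumerate(bins):
            self._indices[cursor[b]] = i
            cursor[b] += 1

    def __len__(self):
        # Number of bins, not number of data elements.
        return len(self._offsets) - 1

    def indices(self, b):
        # Tuples keep the access read-only from the caller's side.
        return tuple(self._indices[self._offsets[b]:self._offsets[b + 1]])

    def values(self, b):
        return tuple(self._data[i] for i in self.indices(b))

points = [0.1, 0.9, 0.4, 0.95, 0.5]
binned = BinnedArray(points, lambda i: int(points[i] * 2), num_bins=2)
print(binned.indices(0))  # (0, 2)
print(binned.values(1))   # (0.9, 0.95, 0.5)
```

The user supplies only the data and the mapping from data index to bin index; the counting, scanning, and writing of the index array are hidden inside the constructor.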

We could also then provide some utilities like:

- Reorganize the raw data so that its order matches the bin order, possibly also setting its device split info to match the bins.
- Get the device_split_info, allowing result data, mapped from binned data, to match the memory split of the bins/binned data, minimizing page faults and data migrations for reads from the binned data.
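
The first utility amounts to a gather through the index array. A minimal sketch in plain Python, with a hypothetical helper name and hand-written example arrays (the device split side is not modeled):

```python
def reorder_by_bins(data, indices):
    """Hypothetical helper: gather data so elements of the same bin are contiguous."""
    return [data[i] for i in indices]

data = ["a", "b", "c", "d", "e"]
offsets = [0, 2, 3, 5]     # start of each bin in the index array
indices = [1, 4, 3, 0, 2]  # data indices grouped by bin

reordered = reorder_by_bins(data, indices)
print(reordered)                          # ['b', 'e', 'd', 'a', 'c']
print(reordered[offsets[2]:offsets[3]])   # ['a', 'c']
```

After reordering, bin b can be read directly as the slice `offsets[b]` to `offsets[b + 1]` of the data, without going through the index array.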


jackm97 added the enhancement New feature or request label Apr 1, 2023

jackm97 changed the title ~~Create an binned_array class~~ Create binned_array class Apr 1, 2023
