Create binned_array class #1

Open
jackm97 opened this issue Apr 1, 2023 · 0 comments
Labels
enhancement New feature or request

jackm97 commented Apr 1, 2023

After implementing a 2D/3D point cloud partitioning algorithm, I noticed the need for a binned array class that benefits from the automatic memory management of an array.

Imagine a set of bins of varying sizes, each holding indices into an array of values associated with that bin. Performance-wise, it is useful to store all of these indices in a single array and have each bin point to its portion of that index array. The order of operations goes something like:

  1. First, determine the size of each bin
  2. Use a scan algorithm to determine where each bin points to in the index array
  3. Finally, write to the index array
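
The three steps above can be sketched on a single device in plain Python. This is only an illustration of the count / scan / write pattern, not Scalix code; `build_bins`, `bin_of`, and `num_bins` are hypothetical names introduced here.

```python
from itertools import accumulate

def build_bins(bin_of, num_bins):
    """Return (offsets, indices) for a flat binned layout."""
    # Step 1: count how many data indices fall into each bin.
    counts = [0] * num_bins
    for b in bin_of:
        counts[b] += 1

    # Step 2: a scan of the counts gives each bin's start offset.
    # Bin b owns the slice indices[offsets[b]:offsets[b + 1]].
    offsets = [0] + list(accumulate(counts))

    # Step 3: write each data index into the next free slot of its bin.
    cursor = offsets[:-1].copy()
    indices = [0] * len(bin_of)
    for i, b in enumerate(bin_of):
        indices[cursor[b]] = i
        cursor[b] += 1
    return offsets, indices

bin_of = [2, 0, 2, 1, 0, 2]  # hypothetical bin index for each data element
offsets, indices = build_bins(bin_of, num_bins=3)
```

Here bin 0 ends up owning `indices[0:2]`, bin 1 owns `indices[2:3]`, and bin 2 owns `indices[3:6]`.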

It is step 3 that poses a problem with our automatic load and data management algorithms. Do we split the work/data according to the bins or the indices? In my implementation, I decided to split the work according to the bins, setting an uneven split of data across devices for the index array using the set_primary_devices method. Because the load was then split according to the shape of the bin array, all writes to the index array were local to the device, as desired. Further, as there was no way to tell the kernel launcher that the index array was a result array, I had to call unset_read_mostly and set_read_mostly manually.
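
To make the uneven split concrete, here is a rough sketch of how the per-device ranges of the index array fall out of an even split of the bins. It assumes the bin offsets from the scan step are available; `index_split_for_bins` is a hypothetical helper, and how the resulting ranges would be handed to set_primary_devices is not shown because its signature is not given in this issue.

```python
def index_split_for_bins(offsets, num_devices):
    """Split bins evenly across devices and return, per device,
    ((first_bin, last_bin), (first_index, last_index)) as half-open ranges."""
    num_bins = len(offsets) - 1
    splits = []
    for d in range(num_devices):
        # Even split of the bin array: the work is divided by bins.
        first_bin = d * num_bins // num_devices
        last_bin = (d + 1) * num_bins // num_devices
        # The matching slice of the index array is generally uneven,
        # because it depends on how full each bin is.
        index_range = (offsets[first_bin], offsets[last_bin])
        splits.append(((first_bin, last_bin), index_range))
    return splits

offsets = [0, 2, 3, 6, 10]  # hypothetical offsets: 4 bins, 10 indices
splits = index_split_for_bins(offsets, num_devices=2)
```

With these offsets each device gets two bins, but device 0 owns three entries of the index array while device 1 owns seven.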

As someone who is intimately familiar with the inner workings of Scalix, I didn't find this incredibly difficult. But a basic user might struggle to know how to set up the problem to get good scaling performance.

For this reason, I would like to implement a binned_array class that takes a type, the raw data to be binned, and an index generator that maps from a data index to a bin index. It will handle all the nitty-gritty details of actually setting up the binned_array in a distributed fashion. From there, it will provide read-only access to the binned data.
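
As a rough picture of the interface described above, here is a single-process Python stand-in. It is only a sketch under assumptions: the class name `BinnedArray`, the `num_bins` argument, and the `bin_indices` / `bin_values` accessors are hypothetical, the element type is implicit in Python, and none of the distributed setup is modeled.

```python
from itertools import accumulate

class BinnedArray:
    """Hypothetical stand-in for the proposed binned_array (no distribution)."""

    def __init__(self, data, bin_of, num_bins):
        # bin_of is the index generator: it maps a data index to a bin index.
        self._data = data
        bins = [bin_of(i) for i in range(len(data))]
        counts = [0] * num_bins
        for b in bins:
            counts[b] += 1
        self._offsets = [0] + list(accumulate(counts))
        cursor = self._offsets[:-1].copy()
        self._indices = [0] * len(data)
        for i, b in enumerate(bins):
            self._indices[cursor[b]] = i
            cursor[b] += 1

    def __len__(self):
        # Number of bins.
        return len(self._offsets) - 1

    def bin_indices(self, b):
        # Returned as a tuple so callers cannot modify the binning.
        return tuple(self._indices[self._offsets[b]:self._offsets[b + 1]])

    def bin_values(self, b):
        return tuple(self._data[i] for i in self.bin_indices(b))

points = [0.1, 2.5, 1.7, 0.4, 2.9]  # hypothetical 1D point cloud
binned = BinnedArray(points, lambda i: int(points[i]), num_bins=3)
```

In this toy example, `binned.bin_values(0)` gives the points that fall in the interval [0, 1).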

We could also then provide some utilities like:

  • reorganize the raw data so that its order matches the bin order, possibly also setting its device split info to match the bins
  • get the device_split_info, allowing result data, mapped from binned data, to match the memory split of the bins/binned data, minimizing page faults and data migrations for reads from the binned data
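
The first utility could look roughly like the following on a single device. This is a sketch only: `reorder_to_bin_order` is a hypothetical name, and updating the device split info is left out.

```python
def reorder_to_bin_order(data, indices):
    """Return a copy of data permuted so that its order matches the bin order."""
    # indices lists the data indices bin by bin, so gathering through it
    # places each bin's values in one contiguous run.
    return [data[i] for i in indices]

data = ["a", "b", "c", "d", "e"]  # hypothetical raw data
offsets = [0, 2, 3, 5]            # hypothetical bin offsets (3 bins)
indices = [0, 3, 2, 1, 4]         # hypothetical index array in bin order
reordered = reorder_to_bin_order(data, indices)

# After reordering, bin b is simply reordered[offsets[b]:offsets[b + 1]].
bin0 = reordered[offsets[0]:offsets[1]]
```

Once the data is in bin order, reads within a bin are contiguous and the index array is no longer needed to look values up.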
@jackm97 jackm97 added the enhancement New feature or request label Apr 1, 2023
@jackm97 jackm97 changed the title Create an binned_array class Create binned_array class Apr 1, 2023