Reduce memory footprint of data indexing #10162

mattpap merged 2 commits into master from mattpap/spatial_index on Jun 18, 2020



@mattpap mattpap commented Jun 13, 2020

Data indexing is currently very inefficient: it doesn't work in fixed memory, doesn't use typed arrays, and creates a lot of intermediate objects. This is despite the underlying spatial indexing library (flatbush) being very efficient.

This PR changes indexing so that:

  • no intermediate objects are created during indexing
  • data is transferred directly from data arrays to the index (typed array to typed array)
  • the index is pre-allocated and doesn't require intermediate reallocations
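
The three points above can be illustrated with a minimal sketch. It is not the code from this PR and does not use flatbush's API: `packPointBoxes` is a hypothetical helper, and the flat `Float64Array` of `[minX, minY, maxX, maxY]` entries only stands in for an index's internal storage. The sketch assumes point data, so each bounding box is degenerate.

```typescript
// Hypothetical helper (not part of Bokeh or flatbush): copies point
// coordinates straight from typed data arrays into one pre-allocated
// typed array of bounding boxes, laid out as [minX, minY, maxX, maxY].
function packPointBoxes(x: Float64Array, y: Float64Array, n: number): Float64Array {
  // Allocated once with the final size, so no reallocation happens later.
  const boxes = new Float64Array(4 * n)
  for (let i = 0; i < n; i++) {
    const j = 4 * i
    // Typed array to typed array: no per-item objects are created.
    boxes[j] = x[i]     // minX
    boxes[j + 1] = y[i] // minY
    boxes[j + 2] = x[i] // maxX (same as minX for a point)
    boxes[j + 3] = y[i] // maxY (same as minY for a point)
  }
  return boxes
}
```

The loop allocates nothing per item, so memory use is fixed by `n` up front.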

The size of the index is defined as the length of the shortest column in a data source (in general all columns should have the same length, but this isn't enforced yet).
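
As a rough sketch of that sizing rule, assuming a data source can be viewed as a plain record of named columns: `indexSize` and the `Column` alias are hypothetical names used for illustration, not identifiers from this PR.

```typescript
// Assumption: a column is any array-like of numbers (plain or typed array).
type Column = ArrayLike<number>

// Hypothetical helper: the index size is the length of the shortest column,
// so every indexed row has a value in every column.
function indexSize(columns: Record<string, Column>): number {
  let size = Infinity
  for (const name of Object.keys(columns)) {
    size = Math.min(size, columns[name].length)
  }
  // A data source with no columns has nothing to index.
  return size === Infinity ? 0 : size
}
```

Using the shortest column means ragged data is truncated rather than read out of bounds, which matters while equal column lengths are not enforced.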

This is based on PR #10161.

@mattpap mattpap force-pushed the mattpap/spatial_index branch from eec9eb0 to 1837dab Compare Jun 16, 2020
@mattpap mattpap force-pushed the mattpap/spatial_index branch from 1837dab to 892ab68 Compare Jun 18, 2020
@mattpap mattpap merged commit 7a03120 into master Jun 18, 2020
@mattpap mattpap deleted the mattpap/spatial_index branch Jun 18, 2020