Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive use of memory when building spatial index #317

Open
bjornharrtell opened this issue Oct 1, 2023 · 5 comments
Open

Excessive use of memory when building spatial index #317

bjornharrtell opened this issue Oct 1, 2023 · 5 comments

Comments

@bjornharrtell
Copy link
Member

Noted at OSGeo/gdal#8490.

I belive the same issue is in the ref C++ implementation (same as GDAL), Java, rust and go.

I also think it's possible to improve this by internally separating the list of nodes for the lowest level and the rest of the tree, to avoid copying when passing in the lowest level (feature/leaf nodes) in construction.

@rouault
Copy link
Contributor

rouault commented Oct 1, 2023

another idea is would be to use float32 instead of float64 during index construction. You would round down the float64 values for xmin, ymin and round up for xmax, ymax. That could be just for the index construction, while the on disk format would remain float64 (even if ultimately float32 would probably be a better choice if backwards compatibility wasn't a concern)

@bjornharrtell
Copy link
Member Author

Yes, in hindsight I realize 32-bits floats would have been the better choice but that ship has sailed.

I suppose it could make sense to have the whole tree sorted and written using some memory mapped construct to avoid the memory requirement entirely.

@rouault
Copy link
Contributor

rouault commented Oct 1, 2023

I suppose it could make sense to have the whole tree sorted and written using some memory mapped construct to avoid the memory requirement entirely.

I believe for that to be efficient would require to use some carefuly chosen algorithms that avoids purely "random" access during the sort, like https://stxxl.org/ (never used it, just came through from a quick search)

@michaelkirk
Copy link
Collaborator

I understand the requirements to be that we can insert the (hilbert_number, feature_buffer_temp_location) in whatever order and then iterate over them in order of hilbert_number so we can write the features in their hilbert ordering.

In my own experience, we're talking about hundreds of MB or a small number of GB to hold a large index, but probably not terabytes for the index.

So if we wanted to do some kind of memory mapped thing, I don't think we necessarily have to support anything too exotic, like parallel writes across disks on terabyte data sets like stxxl supports (although maybe that'd be cool too).

My first guess would be a btree that allocates reasonably WRT the file system. I haven't found an out of the box library that does this in rust yet, without pulling in an entire database system with it.

@michaelkirk
Copy link
Collaborator

Another alternative that might be simpler than a memory mapped b-tree: Since we only need to iterate after everything is inserted, we could write the (hilbert_number, feature_buffer_temp_location) to a file and then use some external sort algorithm (presumably with chunks that are some multiple of page size).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants