-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive use of memory when building spatial index #317
Comments
another idea is would be to use float32 instead of float64 during index construction. You would round down the float64 values for xmin, ymin and round up for xmax, ymax. That could be just for the index construction, while the on disk format would remain float64 (even if ultimately float32 would probably be a better choice if backwards compatibility wasn't a concern) |
Yes, in hindsight I realize 32-bits floats would have been the better choice but that ship has sailed. I suppose it could make sense to have the whole tree sorted and written using some memory mapped construct to avoid the memory requirement entirely. |
I believe for that to be efficient would require to use some carefuly chosen algorithms that avoids purely "random" access during the sort, like https://stxxl.org/ (never used it, just came through from a quick search) |
I understand the requirements to be that we can insert the In my own experience, we're talking about hundreds of MB or a small number of GB to hold a large index, but probably not terabytes for the index. So if we wanted to do some kind of memory mapped thing, I don't think we necessarily have to support anything too exotic, like parallel writes across disks on terabyte data sets like stxxl supports (although maybe that'd be cool too). My first guess would be a btree that allocates reasonably WRT the file system. I haven't found an out of the box library that does this in rust yet, without pulling in an entire database system with it. |
Another alternative that might be simpler than a memory mapped b-tree: Since we only need to iterate after everything is inserted, we could write the |
Noted at OSGeo/gdal#8490.
I belive the same issue is in the ref C++ implementation (same as GDAL), Java, rust and go.
I also think it's possible to improve this by internally separating the list of nodes for the lowest level and the rest of the tree, to avoid copying when passing in the lowest level (feature/leaf nodes) in construction.
The text was updated successfully, but these errors were encountered: