Feature: Add Spatial Index support #7
Comments
Would love to see S2/H3 cell support.
My publication and OSM converter could be useful for H3-based OpenStreetMap data indexing, and as a real-world dataset for the DuckDB Spatial extension.
@Maxxen Excited to see this on the roadmap -- having spatial indices would make my organization seriously consider using DuckDB extensively. Is there a (very) approximate timeline for adding this feature? Much appreciated, thanks.
Hi @colinjamesstewart! Unfortunately not; we might be able to start prototyping non-persistent, in-memory-only indexes before the end of the year. But would you mind telling me a little about the workflows you have that make you think the lack of spatial indexes would be a blocker right now? We recently landed a pretty big optimization for spatial predicate joins, and we can do the same for within-range/k-nearest joins too.
Thanks, @Maxxen. We have a number of spatial databases with millions or even billions of records across Canada and the US, and we often need to do spatial queries (e.g. ST_Contains, ST_Intersection, etc.) on specific regions, such as cities and metro areas. I'll look into how well DuckDB handles queries on these databases, but in my experience, spatial indexing is essential when working with this amount of data.
@colinjamesstewart For sure, I understand that once you reach billions of records, being able to prune out intersection candidates at the storage level becomes more or less mandatory, but at city-region scale we are pretty efficient at brute-forcing it. There's still a lot to figure out when it comes to custom indexes in DuckDB in general, but we are progressing towards that over at the core project, and hopefully we will be able to utilize that work in spatial soon.
@Maxxen I ran a quick spatial-query performance test (using ST_Intersects) to compare DuckDB with Sedona, which we'd previously been using. DuckDB was often 30x faster. Performance starts to degrade when going from a table with millions of records to one with billions, but the performance on tables with <50M records is pretty impressive given that there's no spatial indexing. |
Just another use case here: We would like to do 2D points-in-polygon-filtering on a point cloud (xyz data) with 4B points. Performance decreases roughly linearly with table size. With 700k points a query takes about 0.07s, with 7M points about 0.7s, with 70M points about 7s. With 4B points, analysis is a bit cumbersome. If the data fits in memory, we can do much faster queries with GeoPandas and its spatial index (R-tree afaik). |
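The linear scaling described above is exactly what a brute-force scan looks like: every point is tested against the polygon, so time grows in proportion to row count. A minimal, self-contained sketch (names and numbers invented for illustration, not DuckDB's implementation) of that filter, with a cheap bounding-box prefilter in front of the exact test:

```python
# Hypothetical sketch: brute-force 2D points-in-polygon filtering.
# Without a spatial index, every point is tested, so cost grows
# linearly with point count -- matching the ~10x time per 10x rows
# reported above. A bounding-box prefilter cheapens most rejections
# but does not change the asymptotic behaviour; an R-tree would.
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_points(points, poly):
    # Cheap bounding-box prefilter first; exact test only for candidates.
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    return [
        (x, y) for (x, y) in points
        if minx <= x <= maxx and miny <= y <= maxy
        and point_in_polygon(x, y, poly)
    ]

random.seed(0)
square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
pts = [(random.random(), random.random()) for _ in range(10_000)]
hits = filter_points(pts, square)
print(len(hits))  # roughly a quarter of the points fall inside
```

A spatial index such as GeoPandas' R-tree avoids visiting most points at all, which is why it wins once the data fits in memory.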
Does duckdb-spatial utilize flatgeobuf spatial indices if they're available? Since they're optionally part of the file itself, supporting spatial indexing for flatgeobuf files might be a nice intermediate step on the way to full spatial index support.
Yes, since our flatgeobuf support is based on GDAL, we do support pruning when reading flatgeobuf files; there is an optional "filter_bbox" argument you can pass to st_read to use it.
Do you think the h3 extension, https://github.com/isaacbrodsky/h3-duckdb, might become an official submodule of this extension at some point? I think it could be very valuable, but unless I'm misunderstanding, users presently need to build that module from source together with DuckDB?
Yes and no. We might add some utility functions in spatial to make it easier to integrate with H3, and I've briefly exchanged a few words with Isaac about DuckDB Labs eventually adopting and distributing the H3 extension as a separate extension. But we don't really have a set policy on adopting third-party extensions, taking into account e.g. security, maintenance and support. However, we are actively working on enabling support for some form of custom extension repositories that you can opt into, so that you can do e.g.
Ideally we'd implement an R*-tree, but we should research what the current best bulk-loading algorithm is, since we basically only do bulk loads by indexing one vector at a time.
A first step might be to just use the ART index on a bounding-box expression, or use some clever space-filling-curve encoding like a Hilbert curve or geohash. We could also look into alternative encodings like "quad keys" or "s2/h3 cells". Once we have some spatial predicate functions implemented, we should implement optimizer rules to look for these sorts of indexes and compare how useful they are performance-wise. It's probably not going to beat an R-tree, but it will probably be easier to implement to start with.
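The space-filling-curve idea above can be illustrated with the simplest such encoding, a Z-order (Morton) code: interleave the bits of quantized x and y into a single integer, which a 1-D ordered index like ART can then index directly. This is a hedged sketch with invented names, a simpler cousin of the Hilbert-curve and geohash encodings mentioned, not a proposed implementation:

```python
# Hypothetical sketch: map a 2D location to one integer key by bit
# interleaving (Z-order / Morton code). Nearby cells tend to share key
# prefixes, which is the pruning property geohash-style encodings rely on.

def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y; x takes even bit positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def morton_key(lon, lat, bits=16):
    """Quantize lon/lat onto a 2^bits x 2^bits grid, then interleave."""
    cells = (1 << bits) - 1
    x = round((lon + 180.0) / 360.0 * cells)
    y = round((lat + 90.0) / 180.0 * cells)
    return interleave(x, y, bits)

# Nearby points usually produce numerically closer keys than distant ones
# (Z-order has discontinuities, which is one reason Hilbert curves are
# often preferred).
a = morton_key(11.97, 57.70)   # Gothenburg
b = morton_key(11.98, 57.71)   # about a kilometre away
c = morton_key(-79.38, 43.65)  # Toronto
print(abs(a - b) < abs(a - c))  # True
```

An optimizer rule could then rewrite a bounding-box predicate into one or more Morton-key range scans, which is the comparison against a real R-tree that the issue proposes.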