Reading GeoParquet with an ST_Within() or ST_Contains() predicate produces incorrect results #49

@paleolimbot

I think the bounding-box containment check for ST_Within() and ST_Contains() might be flipped?

import sedonadb

sd = sedonadb.connect()

url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/microsoft-buildings_point_geo.parquet"
sd.read_parquet(url).to_view("buildings")

filter = "POLYGON ((-73.4341 44.0087, -73.4341 43.7981, -73.2531 43.7981, -73.2531 43.8889, -73.1531 43.8889, -73.1531 44.0087, -73.4341 44.0087))"

sd.sql(f"""
SELECT * FROM buildings
  WHERE ST_Intersects(ST_SetSRID(ST_GeomFromText('{filter}'), 4326), geometry)
""").count()
#> 6710 (takes ~30s)

sd.sql(f"""
SELECT * FROM buildings
  WHERE ST_Contains(ST_SetSRID(ST_GeomFromText('{filter}'), 4326), geometry)
""").count()
#> 0 (takes ~2s)

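# Note: ST_Within(a, b) is true when a lies within b, so with the polygon as the
# first argument this asks whether the polygon lies within each point, which is
# never true. ST_Within(geometry, <polygon>) is the mirror of the ST_Contains() query.
sd.sql(f"""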
SELECT * FROM buildings
  WHERE ST_Within(ST_SetSRID(ST_GeomFromText('{filter}'), 4326), geometry)
""").count()
#> 0 (takes ~30s)
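For reference, here is a minimal sketch of the bounding-box pruning rule I would expect for these predicates. It assumes each row group exposes a single bbox that covers every geometry in it. `BBox` and `may_match` are hypothetical helpers for illustration, not SedonaDB's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (hypothetical helper, not a SedonaDB type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Closed boxes: touching edges count as intersecting.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other: "BBox") -> bool:
        # True when `other` lies entirely inside (or on the edge of) `self`.
        return (self.xmin <= other.xmin and other.xmax <= self.xmax
                and self.ymin <= other.ymin and other.ymax <= self.ymax)


def may_match(predicate: str, query: BBox, group: BBox, query_first: bool) -> bool:
    """Return False only when no geometry in a row group can satisfy the predicate.

    `query` is the bbox of the literal geometry, `group` a bbox covering every
    geometry in the row group, and `query_first` says whether the literal is
    the first argument, e.g. ST_Contains(literal, geometry).
    """
    if predicate == "intersects":
        return query.intersects(group)
    if predicate == "within":
        # ST_Within(a, b) == ST_Contains(b, a): swap the argument order.
        predicate, query_first = "contains", not query_first
    if predicate != "contains":
        raise ValueError(f"unsupported predicate: {predicate}")
    if query_first:
        # ST_Contains(literal, g): bbox(g) must fit inside the query bbox and also
        # lies inside the group bbox, so the two boxes must overlap. The group
        # bbox itself does not need to fit inside the query bbox.
        return query.intersects(group)
    # ST_Contains(g, literal): the query bbox must fit inside bbox(g), which
    # lies inside the group bbox.
    return group.contains(query)
```

With the filter polygon's bbox `BBox(-73.4341, 43.7981, -73.1531, 44.0087)` and a hypothetical row-group bbox such as `BBox(-80.0, 40.0, -70.0, 45.0)`, `may_match("contains", query, group, query_first=True)` must return True. A flipped rule such as `query.contains(group)` would return False there and skip the row group, which would be consistent with the fast, empty ST_Contains() result above.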

Also, I think we may not be respecting the value of the Parquet pruning configuration option:

sd.sql("SET datafusion.execution.parquet.pruning = false").show()
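One way to check whether the option is respected is to run the same query with pruning off and on and compare the counts. This is a sketch: `count_with_and_without_pruning` is a hypothetical helper, and it assumes a `SET` statement stays in effect for later `sd.sql()` calls on the same connection.

```python
def count_with_and_without_pruning(sd, query: str) -> dict:
    """Count the rows returned by `query` with Parquet pruning off and on.

    Assumes `sd` behaves like the sedonadb connection above: `sd.sql(...)`
    returns an object with `.count()` and `.show()`, and a SET statement stays
    in effect for later queries on the same connection.
    """
    counts = {}
    # Disable first so the session ends with pruning enabled (DataFusion's default).
    for enabled in (False, True):
        value = "true" if enabled else "false"
        sd.sql(f"SET datafusion.execution.parquet.pruning = {value}").show()
        counts[enabled] = sd.sql(query).count()
    return counts
```

If the option is respected and the zero count comes from pruning, `counts[False]` would be expected to be nonzero for the ST_Contains() query while `counts[True]` stays at 0. Identical counts would suggest either that the option is ignored or that the bug lives outside pruning.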
