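I think the bounding-box containment checks for ST_Within() and ST_Contains() might be flipped. In the example below, ST_Contains() returns 0 rows almost immediately, even though ST_Intersects() finds 6710 points for the same polygon: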
import sedonadb

sd = sedonadb.connect()
url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/microsoft-buildings_point_geo.parquet"
sd.read_parquet(url).to_view("buildings")

# Query polygon as WKT (named filter_wkt to avoid shadowing the built-in filter())
filter_wkt = "POLYGON ((-73.4341 44.0087, -73.4341 43.7981, -73.2531 43.7981, -73.2531 43.8889, -73.1531 43.8889, -73.1531 44.0087, -73.4341 44.0087))"

# Baseline: points that intersect the polygon
sd.sql(f"""
SELECT * FROM buildings
WHERE ST_Intersects(ST_SetSRID(ST_GeomFromText('{filter_wkt}'), 4326), geometry)
""").count()
#> 6710 (takes ~30s)

# Polygon contains point: returns nothing, and much faster than the baseline
sd.sql(f"""
SELECT * FROM buildings
WHERE ST_Contains(ST_SetSRID(ST_GeomFromText('{filter_wkt}'), 4326), geometry)
""").count()
#> 0 (takes ~2s)

# Polygon within point: returns nothing, but takes as long as the baseline
sd.sql(f"""
SELECT * FROM buildings
WHERE ST_Within(ST_SetSRID(ST_GeomFromText('{filter_wkt}'), 4326), geometry)
""").count()
#> 0 (takes ~30s)
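For reference, here is a rough sketch of the bounding-box logic I would expect for pruning. This is not SedonaDB's actual implementation: `BBox`, `intersects`, `covers`, `may_match_contains`, and `may_match_within` are hypothetical names, and the `row_group` bounds are made up for illustration. Only the `literal` bounds come from the polygon above.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def covers(outer: BBox, inner: BBox) -> bool:
    """True if inner lies entirely inside outer."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and outer.xmax >= inner.xmax and outer.ymax >= inner.ymax)


def may_match_contains(literal: BBox, stats: BBox) -> bool:
    # ST_Contains(literal, geometry): a matching row lies inside the literal,
    # so the row group's bounds only need to intersect the literal's bounds.
    return intersects(literal, stats)


def may_match_within(literal: BBox, stats: BBox) -> bool:
    # ST_Within(literal, geometry): a matching row must cover the literal,
    # so the row group's bounds must cover the literal's bounds.
    return covers(stats, literal)


# Bounds of the query polygon above
literal = BBox(-73.4341, 43.7981, -73.1531, 44.0087)
# Hypothetical row group that overlaps the polygon without covering it
row_group = BBox(-73.5, 43.7, -73.3, 43.9)

print(may_match_contains(literal, row_group))  # row group must be scanned
print(may_match_within(literal, row_group))    # row group can be skipped
```

If the two checks were swapped, ST_Contains() would skip almost every row group (fast, 0 rows) and ST_Within() would scan almost everything (slow, 0 rows), which might explain the timings above.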
Also, I think we may not be respecting the pruning configuration option:
sd.sql("SET datafusion.execution.parquet.pruning = false").show()