I have read the release notes, but I still have a couple of questions about the latest version of GeoSpark:
- Are there any performance improvements for spatial joins (ST_CONTAINS) in either Core or SQL?
- As a general question, should we still prefer GeoSpark Core and SpatialRDD over DataFrames and Spark SQL for optimal performance? For spatial joins in particular, issues such as "Use Spatial Indexes in Geospark SQL?" (#217 (comment)) suggest that GeoSpark Core is preferable. The benchmark page at http://datasystemslab.github.io/GeoSpark/tutorial/benchmark/ also suggests using Core over SQL.
I've run into slow GeoSpark SQL performance caused by uneven distribution of work across executors, as described in #249 (comment), and I have followed many of the steps you outlined in previous posts to address it. Unfortunately, the skew persists. Is there simply an inherent performance difference between Core and SQL, and does 1.2 include anything that improves performance compared to 1.1.3?