GeoSpark Core vs GeoSpark SQL Performance in 1.2 #343

@jhamiltonpro

I have read the release notes, but I still have a couple of questions regarding the latest version of GeoSpark:

  1. Are there any performance improvements for spatial joins (ST_CONTAINS) in either Core or SQL?
  2. As a general question, should we still always prefer GeoSpark Core and SpatialRDD over DataFrames and Spark SQL for optimal performance? Especially for spatial joins, there have been issues such as "Use Spatial Indexes in Geospark SQL? #217 (comment)" which suggest that GeoSpark Core is preferable. The benchmark page at http://datasystemslab.github.io/GeoSpark/tutorial/benchmark/ also suggests using Core over SQL.

I've encountered slow GeoSpark SQL performance caused by uneven distribution of work across executors, as outlined here: #249 (comment). I have followed many of the steps you outlined in previous posts to address this. Unfortunately, the skew still persists, and I'm wondering whether there is simply a performance difference between Core and SQL, and whether 1.2 includes anything that helps with performance (compared to 1.1.3).
