GeoSpark Core vs GeoSpark SQL Performance in 1.2 #343

I have read the release notes, but I still have a couple of questions about the latest version of GeoSpark:

1. Are there any performance improvements to spatial joins (ST_CONTAINS) in either GeoSpark Core or GeoSpark SQL?
2. More generally, should we still prefer GeoSpark Core and SpatialRDD over DataFrames and Spark SQL for optimal performance? For spatial joins in particular, issues such as https://github.com/DataSystemsLab/GeoSpark/issues/217#issuecomment-454318741 suggest that GeoSpark Core is preferable, and the benchmark page at http://datasystemslab.github.io/GeoSpark/tutorial/benchmark/ also suggests using Core over SQL.

I've run into slow GeoSpark SQL performance caused by uneven distribution of work across executors, as outlined here: https://github.com/DataSystemsLab/GeoSpark/issues/249#issuecomment-473443663. I have followed many of the steps you outlined in previous posts to address this. Unfortunately, the skew persists, and I'm wondering whether there is simply an inherent performance difference between Core and SQL, and whether 1.2 includes anything that improves performance compared to 1.1.3.
