Df.count takes forever #228

AlexTo opened this issue May 2, 2018 · 3 comments



commented May 2, 2018

Expected behavior

Df.count should finish quickly

Actual behavior

Df.count takes forever

Steps to reproduce the problem

Load the shape data in the link below. The zip file contains 4 files: LGA_2016_AUST.dbf, LGA_2016_AUST.prj, LGA_2016_AUST.shp, LGA_2016_AUST.shx

The exact code is in the screenshot. In my setup, the LGA folder on HDFS contains the 4 files listed above.

Call count() on the data frame: it takes forever. I used the ogr2ogr tool to load the same shapefiles into PostGIS, and the resulting table has only 536 records.

LGA shape files


GeoSpark version = 1.1.2

Apache Spark version = 2.3.0

JRE version = 1.8.162

API type = Scala

[Screenshot of the code: screen shot 2018-05-02 at 10 24 45 am]



commented May 2, 2018

@AlexTo Verified. This is a potential bug; I'm not sure of the cause yet. We will fix it in 1.2.0. For now, use any tool to convert this shapefile to a WKT TSV file, which GeoSparkSQL can load without problems.
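
One possible way to do the conversion suggested above is with ogr2ogr, the tool already mentioned in this thread. The sketch below is an assumption, not the maintainers' method: it only builds the command line, using the GDAL CSV driver options `GEOMETRY=AS_WKT` and `SEPARATOR=TAB`. The function names and the output file name `LGA_2016_AUST.csv` are hypothetical, and the shapefile is assumed to have been copied from HDFS to local disk first.

```python
import shutil
import subprocess


def build_ogr2ogr_command(shp_path, out_csv_path):
    """Build an ogr2ogr command that writes a tab-separated text file
    whose geometry column is encoded as WKT."""
    return [
        "ogr2ogr",
        "-f", "CSV",                # GDAL CSV driver
        out_csv_path,               # destination comes before the source
        shp_path,
        "-lco", "GEOMETRY=AS_WKT",  # write geometry as a WKT column
        "-lco", "SEPARATOR=TAB",    # use tabs instead of commas
    ]


def convert(shp_path, out_csv_path):
    """Run the conversion; requires ogr2ogr (GDAL) on the PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    subprocess.run(build_ogr2ogr_command(shp_path, out_csv_path), check=True)


# Hypothetical file names, for illustration only.
cmd = build_ogr2ogr_command("LGA_2016_AUST.shp", "LGA_2016_AUST.csv")
print(" ".join(cmd))
```

Calling `convert("LGA_2016_AUST.shp", "LGA_2016_AUST.csv")` would run the command. The CSV driver normally writes a header row, so that row may need to be skipped or removed before the file is loaded as a WKT TSV.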



commented Aug 1, 2018

@zongsizhang Please handle this.



commented Sep 8, 2018

@jiayuasu I'm working on this
