Df.count takes forever #228

Closed
AlexTo opened this issue May 2, 2018 · 3 comments

Comments

@AlexTo
commented May 2, 2018

Expected behavior

Df.count should finish quickly

Actual behavior

Df.count takes forever

Steps to reproduce the problem

Load the shape data in the link below. The zip file contains 4 files: LGA_2016_AUST.dbf, LGA_2016_AUST.prj, LGA_2016_AUST.shp, LGA_2016_AUST.shx

The exact code is in the screenshot. In my setup, the LGA folder on HDFS contains the 4 files listed above.

Calling count() on the data frame takes forever. I used the ogr2ogr tool to load the same shapefiles into PostGIS, and the resulting table has only 536 records.

LGA shape files
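
For anyone who wants to cross-check the expected record count without PostGIS, here is a minimal sketch that reads it from the .shx index file. It relies only on the published ESRI shapefile layout (a 100-byte header with the file length stored big-endian in 16-bit words at byte 24, followed by one 8-byte index entry per record). The function name `count_shx_records` is made up for illustration and is not part of GeoSpark.

```python
import struct
from pathlib import Path

SHX_HEADER_BYTES = 100  # fixed-size header shared by .shp and .shx files
SHX_RECORD_BYTES = 8    # each index entry: 4-byte offset + 4-byte content length
SHAPEFILE_CODE = 9994   # magic number stored in the first 4 bytes


def count_shx_records(shx_path):
    """Return the number of records listed in a shapefile's .shx index.

    Hypothetical helper for illustration only.
    """
    data = Path(shx_path).read_bytes()
    if len(data) < SHX_HEADER_BYTES:
        raise ValueError("file is too short to be a .shx index")

    # Bytes 0-3: file code, big-endian.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != SHAPEFILE_CODE:
        raise ValueError("unexpected file code: %d" % file_code)

    # Bytes 24-27: total file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", data[24:28])
    length_bytes = length_words * 2

    # Everything after the header is a sequence of fixed-size index entries.
    return (length_bytes - SHX_HEADER_BYTES) // SHX_RECORD_BYTES
```

Running `count_shx_records("LGA_2016_AUST.shx")` on a local copy of the file gives a number that can be compared with the 536 rows ogr2ogr loaded.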

Settings

GeoSpark version = 1.1.2

Apache Spark version = 2.3.0

JRE version = 1.8.162

API type = Scala

[Screenshot: screen shot 2018-05-02 at 10 24 45 am]

@jiayuasu

Member

commented May 2, 2018

@AlexTo Verified. This is a potential bug, but I'm not sure of the cause yet. Will fix it in 1.2.0. For now, use any tool to convert this shapefile to a WKT TSV file. GeoSparkSQL can load that without problems.
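
To make the "WKT TSV" workaround concrete, here is a rough sketch of what such a file looks like: one record per line, with the geometry as WKT text in one tab-separated column. Tabs are a convenient separator because WKT itself contains commas and spaces. The column order (geometry first, then a name), the two sample polygons, and the helper names are all assumptions for illustration. The thread does not specify a layout, and this is not GeoSpark code.

```python
import csv


def write_wkt_tsv(path, rows):
    """Write (wkt, name) pairs as tab-separated lines, one geometry per line.

    Hypothetical helper; the column layout is an assumption.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows)


def count_wkt_rows(path):
    """Count non-empty records in a WKT TSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for row in csv.reader(f, delimiter="\t") if row)


# Made-up sample data, not taken from the LGA dataset.
SAMPLE_ROWS = [
    ("POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))", "Example area A"),
    ("POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2))", "Example area B"),
]
```

Counting the rows of the converted file this way before loading it gives a quick check that the conversion kept every record.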

@jiayuasu

Member

commented Aug 1, 2018

@zongsizhang Please handle this.

@zongsizhang

Collaborator

commented Sep 8, 2018

@jiayuasu I'm working on this.
