
Issue with coordinates when reading shapefile #185

Closed
tociek opened this issue Feb 2, 2018 · 5 comments

Comments

tociek (Contributor) commented Feb 2, 2018

Hey,

I am new to GeoSpark, so forgive me if the question is stupid... :)
I am using "org.datasyslab" % "geospark" % "1.0.1" and scalaVersion := "2.11.12".

I am reading the NY shapefile available here: https://www1.nyc.gov/site/planning/data-maps/open-data.page:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator

// Register GeoSpark's Kryo registrator so spatial objects serialize correctly.
val sparkSession: SparkSession = SparkSession.builder()
  .config("spark.serializer", classOf[KryoSerializer].getName)
  .config("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
  .master("local[*]").appName("GeoSpark-Analysis").getOrCreate()

Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)

val polygon = ShapefileReader.readToPolygonRDD(sparkSession.sparkContext, "C:\\Users\\my.user\\NY")
print(polygon.rawSpatialRDD.take(1))

[POLYGON ((997277.2344000041 221816.0936000049, 997300.0160000026 221803.44499999285...

I thought that maybe I needed to convert to a different coordinate system by using CRSTransform("epsg:3857", "epsg:4326"), but it didn't return good results either. I tested with London data, but the result values were also different from what I expected (based on lat/long coordinates from Google Maps).

Any guidance on what I am doing wrong will be highly appreciated 😄

jiayuasu (Member) commented Feb 2, 2018

Hi @tociek ,

The GeoSpark shapefile reader is correct. The data in this shapefile is in "EPSG:102718", not "EPSG:4326" or "EPSG:3857". The technical detail is described in "nyzd.shp.xml" within the original NYC taxi zone file. An introduction to EPSG:102718 is here: https://epsg.io/102718

The unit of this CRS is feet.

However, if you want to use GeoSpark ST_Transform (SQL API) or CRSTransform (RDD API) to convert it from EPSG:102718 to EPSG:4326, it will throw an exception as follows:

Caused by: org.opengis.referencing.operation.OperationNotFoundException: Bursa wolf parameters required.

The reason for this issue has been explained by GeoTools:
http://docs.geotools.org/stable/userguide/faq.html#q-bursa-wolf-parameters-required

As suggested by GeoTools, the easiest way is to simply suppress these exceptions (GeoTools' "lenient" transform). But we want to figure out a proper solution for this, which may take time.
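
For reference, here is a rough sketch of that lenient workaround applied to a single geometry with the GeoTools API directly (this is not GeoSpark's internal code path; decoding EPSG:102718 may additionally require the gt-epsg-extension module on the classpath):

import com.vividsolutions.jts.geom.Geometry
import org.geotools.geometry.jts.JTS
import org.geotools.referencing.CRS

// Reproject one geometry from EPSG:102718 (NY Long Island State Plane, feet) to EPSG:4326.
// The third argument (lenient = true) tells GeoTools to proceed even though no
// Bursa-Wolf parameters are defined, which is what the FAQ entry above suggests.
def toWgs84(geom: Geometry): Geometry = {
  val sourceCRS = CRS.decode("EPSG:102718")
  val targetCRS = CRS.decode("EPSG:4326")
  val transform = CRS.findMathTransform(sourceCRS, targetCRS, true)
  JTS.transform(geom, transform)
}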

In any case, to solve your issue, I have two suggestions:

  1. Keep it as it is. EPSG:102718 is a feet-based CRS. This will not affect the correctness of any GeoSpark operation; distances will simply be calculated in feet (see the sketch after this list).

  2. If you don't mind using the old taxi zone data, use this taxi zone shapefile from the NYU Geo website:
    https://geo.nyu.edu/catalog/nyu_2451_36743
    This shapefile is the 2016 NYC taxi zone data and is in EPSG:4326 (WGS84).
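
To illustrate suggestion 1, a rough sketch of a range query on the RDD from your first snippet, with the query window expressed in the same feet-based coordinates (the envelope values are made up for illustration, and the two boolean flags are considerBoundaryIntersection and useIndex; please double-check the argument order against the Javadoc):

import com.vividsolutions.jts.geom.Envelope
import org.datasyslab.geospark.spatialOperator.RangeQuery

// Query window in EPSG:102718 feet, roughly around the coordinates printed above.
val queryWindow = new Envelope(990000, 1000000, 215000, 225000)

// considerBoundaryIntersection = false, useIndex = false
val zonesInWindow = RangeQuery.SpatialRangeQuery(polygon, queryWindow, false, false)
println(zonesInWindow.count())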

tociek (Contributor, Author) commented Feb 3, 2018

Thank you for the detailed explanation. Could you tell me where you found the information that this is EPSG:102718? I assumed that simply the units were different from what I expected, but I couldn't find any information in these files about which "standard" had actually been used.

Actually, NY and London were just examples I used to learn GeoSpark, and if I can simply use feet instead of meters, then that is not a problem for me.

I may be coming back with more questions soon ;)

Thanks

jiayuasu (Member) commented Feb 6, 2018

Hi @tociek ,

As I mentioned in my previous comment, "The technical detail is described in "nyzd.shp.xml" within the original NYC taxi zone file".

I just downloaded the data from the webpage you gave me. The description is in the data.
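
If you prefer to check this programmatically instead of opening the XML, here is a small GeoTools sketch that parses the .prj sidecar file shipped with the shapefile (the file name and path below are placeholders for your local copy):

import java.nio.file.{Files, Paths}
import org.geotools.referencing.CRS

// Read the WKT definition stored in the shapefile's .prj file and parse it with GeoTools.
val wkt = new String(Files.readAllBytes(Paths.get("C:\\Users\\my.user\\NY\\nyzd.prj")))
val crs = CRS.parseWKT(wkt)
println(crs.getName)                                 // projection name (the NY Long Island State Plane, feet, CRS)
println(crs.getCoordinateSystem.getAxis(0).getUnit)  // axis unit, e.g. foot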

tociek (Contributor, Author) commented Feb 7, 2018

Thanks.

I need one more clarification: if I have another dataset with coordinates in EPSG:4326, I must convert it to the same standard before doing further operations (like a join) on these two datasets, correct?

jiayuasu (Member) commented Feb 7, 2018

@tociek

Yes, you need to make sure both datasets involved in a join query have the same CRS.
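
A rough sketch of what that could look like with the RDD API, reprojecting the shapefile RDD so both sides are in EPSG:4326 before the join (otherRDD is a placeholder for your second dataset, keep in mind the CRSTransform limitation with EPSG:102718 discussed above, and please verify the JoinQuery argument order against the Javadoc):

import org.datasyslab.geospark.enums.GridType
import org.datasyslab.geospark.spatialOperator.JoinQuery

// polygon: the shapefile RDD from above (EPSG:102718); otherRDD: already in EPSG:4326.
// Bring both sides to the same CRS first.
polygon.CRSTransform("epsg:102718", "epsg:4326")

// Use the same spatial partitioning on both RDDs, then run the join.
polygon.spatialPartitioning(GridType.RTREE)
otherRDD.spatialPartitioning(polygon.getPartitioner)

// useIndex = false, considerBoundaryIntersection = true
val joined = JoinQuery.SpatialJoinQuery(otherRDD, polygon, false, true)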
