Issue with coordinates when reading shapefile #185
Hi @tociek , the GeoSpark shapefile reader is correct. The data in this shapefile is in EPSG:102718, not EPSG:4326 or EPSG:3857. The technical details are described in "nyzd.shp.xml" within the original NYC taxi zone download, and an introduction to EPSG:102718 is here: https://epsg.io/102718 The unit of this CRS is feet. However, if you use GeoSpark's ST_Transform (SQL API) or CRSTransform (RDD API) to convert it from EPSG:102718 to EPSG:4326, it throws an exception as follows:
The reason behind this issue has been explained by GeoTools: as GeoTools suggests, the easiest workaround is simply to suppress all such exceptions. But we want to figure out a proper solution, and that may take time. For now, to solve your issue, I have two suggestions:
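To see why the coordinate values are consistent with a foot-based CRS, here is a minimal plain-Scala sanity check (no GeoSpark needed). It converts the first vertex printed by the shapefile reader from US survey feet, the unit of EPSG:102718, to meters; the resulting ~304 km easting is plausible for a State Plane system, while a value like 997277 obviously cannot be EPSG:4326 degrees:

```scala
object FeetSanityCheck {
  // 1 US survey foot = 1200/3937 m (the linear unit of EPSG:102718)
  val UsSurveyFootInMeters: Double = 1200.0 / 3937.0

  def feetToMeters(ft: Double): Double = ft * UsSurveyFootInMeters

  def main(args: Array[String]): Unit = {
    // First vertex of the polygon printed by the shapefile reader
    val xFeet = 997277.2344
    val yFeet = 221816.0936
    // ~304 km / ~68 km: plausible State Plane easting/northing values,
    // impossible as longitude/latitude degrees (|lon| <= 180)
    println(f"x = ${feetToMeters(xFeet)}%.1f m, y = ${feetToMeters(yFeet)}%.1f m")
  }
}
```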
Thank you for the detailed explanation. Could you tell me where you found the information that this is EPSG:102718? I assumed that the units were simply different from what I expected, but I couldn't find anything in these files about which "standard" they actually use. NY and London were just examples I used to learn GeoSpark, so if I can simply use feet instead of meters, that is not a problem for me. I may be coming back with more questions soon ;) Thanks
Hi @tociek , as I mentioned in my previous comment, the technical details are described in "nyzd.shp.xml" within the original NYC taxi zone file. I just downloaded the data from the webpage you gave me; the description is in the data itself.
Thanks. I need one more clarification: if I have another dataset with coordinates in EPSG:4326, I must convert it to the same CRS before doing further operations (like a join) on these two datasets, correct?
Yes, you need to make sure both datasets involved in a join query have the same CRS. |
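One way to make that rule explicit in code is a small pre-join guard. Below is a minimal sketch in plain Scala; `requireSameCrs` and `normalize` are hypothetical helpers for illustration, not part of the GeoSpark API:

```scala
object CrsGuard {
  // Normalize codes like "epsg:4326" / "EPSG:4326" before comparing
  def normalize(code: String): String = code.trim.toUpperCase

  // Hypothetical pre-join check: fail fast instead of silently joining
  // datasets whose coordinates live in different reference systems.
  def requireSameCrs(leftCrs: String, rightCrs: String): Unit =
    require(normalize(leftCrs) == normalize(rightCrs),
      s"CRS mismatch: $leftCrs vs $rightCrs; transform one side first " +
        "(e.g. with ST_Transform or CRSTransform)")

  def main(args: Array[String]): Unit = {
    requireSameCrs("epsg:4326", "EPSG:4326") // ok, same CRS
    // requireSameCrs("epsg:102718", "epsg:4326") // would throw
  }
}
```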
Hey,
I am new to geospark so forgive me, if the question is stupid... :)
using "org.datasyslab" % "geospark" % "1.0.1" and scalaVersion := "2.11.12"
I am reading the NY shapefile available here: https://www1.nyc.gov/site/planning/data-maps/open-data.page:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator

val sparkSession: SparkSession = SparkSession.builder()
  .config("spark.serializer", classOf[KryoSerializer].getName)
  .config("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
  .master("local[*]").appName("GeoSpark-Analysis").getOrCreate()
Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)
// Backslashes must be escaped in Scala string literals
val polygon = ShapefileReader.readToPolygonRDD(sparkSession.sparkContext, "C:\\Users\\my.user\\NY")
print(polygon.rawSpatialRDD.take(1))
[POLYGON ((997277.2344000041 221816.0936000049, 997300.0160000026 221803.44499999285...
I thought that maybe I needed to convert to a different coordinate system using CRSTransform("epsg:3857","epsg:4326"), but it didn't return good results either. I tested with London data, but the resulting values were also different from what I expected (based on lat/long coordinates from Google Maps).
Any guidance on what I am doing wrong will be highly appreciated 😄
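As a quick cross-check of the EPSG:3857 guess above: in Web Mercator, New York's longitude of roughly -74° maps to x ≈ -8.24 million meters, nowhere near the ~997 thousand seen in the shapefile, so the data cannot be EPSG:3857 either. A minimal plain-Scala check of that spherical-Mercator arithmetic (no GeoSpark needed):

```scala
object MercatorCheck {
  // WGS84 / Web Mercator sphere radius, in meters
  val EarthRadius = 6378137.0

  // Spherical (Web) Mercator x-coordinate for a longitude in degrees
  def lonToMercatorX(lonDeg: Double): Double =
    math.toRadians(lonDeg) * EarthRadius

  def main(args: Array[String]): Unit = {
    val nycX = lonToMercatorX(-74.0)
    println(f"EPSG:3857 x for lon -74 deg = $nycX%.0f m")
    // The shapefile's x ~ 997277 differs in both magnitude and sign,
    // so the data cannot be Web Mercator meters for New York.
    assert(math.abs(nycX) > 8.0e6)
  }
}
```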