Issues while loading GeoJson file obtained from ESRI ArcMap tool #224
We are using ESRI ArcMap to convert our feature classes to GeoJSON. Our GeoJSON data looks like this:
As you can see, when we tried to load this GeoJSON we got the following error:
So we removed the first line, and since we had two records, those records had to be placed on two separate lines (otherwise the record count showed as 1). After that we were able to load the GeoJSON file.
Since we will be working with huge datasets (shapefiles are not feasible because our geodatabases can reach 22.5 GB, while .shp files have a 2 GB limit), can you help us out? I am not sure that removing the first line and the corresponding braces and brackets with a Python script will solve this issue either.
Can you add support for the first line (up to the features array) in the next fix, or is there another workaround for this problem?
Thanks a lot
GeoSpark version = 1.1.2
Apache Spark version = 2.1
JRE version = 1.8?
API type = Scala
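For reference, the manual workaround described above (dropping the FeatureCollection wrapper and putting one feature per line) can be scripted. Below is a minimal, hedged sketch in plain Scala: it pulls each object out of the `features` array by brace counting rather than real JSON parsing, so it assumes well-formed input; for production-sized files a streaming JSON library would be safer. The object and method names are illustrative, not part of GeoSpark.

```scala
// Sketch: split an ESRI-exported FeatureCollection into newline-delimited
// features (one JSON object per line), the shape GeoSpark's GEOJSON splitter
// expects. Assumes well-formed JSON; brace counting is used for illustration.
object FeatureCollectionSplitter {
  def toFeatureLines(featureCollection: String): Seq[String] = {
    val start = featureCollection.indexOf("\"features\"")
    require(start >= 0, "no \"features\" array found")
    val arrayStart = featureCollection.indexOf('[', start)
    val features = scala.collection.mutable.ListBuffer.empty[String]
    var depth = 0        // current brace nesting depth inside the array
    var objStart = -1    // index where the current feature object begins
    var inString = false // are we inside a JSON string literal?
    var escaped = false  // was the previous character a backslash?
    var i = arrayStart + 1
    var done = false
    while (i < featureCollection.length && !done) {
      val c = featureCollection(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' => if (depth == 0) objStart = i; depth += 1
        case '}' =>
          depth -= 1
          if (depth == 0) {
            features += featureCollection.substring(objStart, i + 1)
            objStart = -1
          }
        case ']' => if (depth == 0) done = true // end of the features array
        case _   =>
      }
      i += 1
    }
    features.toList
  }
}
```

Each returned element can then be written out as its own line to produce a file GeoSpark can load directly.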
@SrinivasRIL We will try to fix this in the next major release which is 1.2.0. But it won't come out soon because it will contain many new functions and API changes. To solve this issue for now, I have several suggestions:
I know this is very annoying; however, the design of GeoJSON really complicates the data parsing. To fix this trivial issue properly, we may have to write a big chunk of code to customize a Spark input reader, like the GeoSpark shapefile reader.
We also tried to create a spatial RDD with FileDataSplitter set to GEOJSON and carryAttributes set to true, then convert the resulting RDD into a DataFrame, but we are getting this error:
error: overloaded method value toDf with alternatives:
@SrinivasRIL If you use the RDD/SQL API to load GeoJSON, the other attributes are stored in the geometry's UserData attribute. Use `myDf.map(...)` or `myRDD.map(...)` to manipulate them.
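To make that concrete, here is a hedged sketch of pulling the carried attributes back out of UserData. GeoSpark stores them as a single tab-separated string; the column names below (`blockId`, `blockName`) are hypothetical, standing in for whatever your source schema holds.

```scala
// Sketch: parse the tab-separated attribute string GeoSpark keeps in each
// geometry's UserData. Column names are made up for illustration.
object UserDataParser {
  val columns = Seq("blockId", "blockName") // hypothetical schema

  def parseUserData(userData: String): Map[String, String] =
    columns.zip(userData.split('\t')).toMap
}

// In a Spark job this would run inside a map over the raw RDD, e.g.:
//   rdd.rawSpatialRDD.rdd
//     .map(g => UserDataParser.parseUserData(g.getUserData.asInstanceOf[String]))
```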
The GeoJSON support in GeoSpark is limited. I will fix this issue in 1.2.0 (it will be out in late May or early June). For now, WKT and WKB are preferred.
```scala
val geoJsonFile = "file:///data/test.geojson"
val pointRDDSplitter = FileDataSplitter.GEOJSON
val carryAttributes = true

val rdd = new PolygonRDD(spark.sparkContext, geoJsonFile, pointRDDSplitter, carryAttributes)
rdd.rawSpatialRDD.take(1).asScala.foreach(println)

val rddWithOtherAttributes = rdd.rawSpatialRDD.rdd
  .map[String](f => f.getUserData.asInstanceOf[String])
rddWithOtherAttributes.take(1).foreach(println)

var df = spark.read
  .format("csv")
  .option("delimiter", "\t")
  .option("header", "false")
  .load(geoJsonFile)
df.show(1, false)
df.createOrReplaceTempView("tabblock")

val converted = spark.sql(
  """
    | SELECT ST_GeomFromGeoJSON(tabblock._c0) AS shape
    | FROM tabblock
  """.stripMargin)
converted.show(1, false)
converted.createOrReplaceTempView("poly_coords")
```
SQL Show() (no _c* columns)
SQL Show() after running
Do we have to map over the entire