A Cluster Computing System for Processing Large-Scale Spatial Data


GeoSpark@Twitter || GeoSpark Discussion Board || Join the chat at https://gitter.im/geospark-datasys/Lobby

GeoSpark is listed as an Infrastructure Project on the official Apache Spark Third Party Projects page.

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark and SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL, which efficiently load, process, and analyze large-scale spatial data across machines.

GeoSpark contains three modules:

| Name | API | Spark compatibility | Dependency |
|------|-----|---------------------|------------|
| GeoSpark-core | RDD | Spark 2.X/1.X | Spark-core |
| GeoSpark-SQL | SQL/DataFrame | SparkSQL 2.1 and later | Spark-core, Spark-SQL, GeoSpark-core |
| GeoSpark-Viz | RDD | Spark 2.X/1.X | Spark-core, GeoSpark-core |

  • Core: GeoSpark SpatialRDDs and Query Operators.
  • SQL: SQL interfaces for GeoSpark core.
  • Viz: Visualization extension of GeoSpark core.
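
As a rough sketch of how the three modules might be pulled into a Maven project, the fragment below declares one dependency per module. The group ID `org.datasyslab`, the artifact IDs, and the `_2.3` suffix on the SQL artifact (taken here to mean a Spark 2.3 build) are assumptions, not details stated in this README. The version is the 1.1.3 release from the news below. Please confirm the exact coordinates against the Maven Coordinate link for your release.

```xml
<!-- Sketch only: group and artifact IDs are assumed, not taken from this README. -->
<dependencies>
  <!-- GeoSpark-core: SpatialRDDs and query operators -->
  <dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>geospark</artifactId>
    <version>1.1.3</version>
  </dependency>
  <!-- GeoSpark-SQL: the suffix is assumed to track the Spark version in use -->
  <dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>geospark-sql_2.3</artifactId>
    <version>1.1.3</version>
  </dependency>
  <!-- GeoSpark-Viz: visualization extension -->
  <dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>geospark-viz</artifactId>
    <version>1.1.3</version>
  </dependency>
</dependencies>
```

Per the table above, GeoSpark-SQL and GeoSpark-Viz both depend on GeoSpark-core, and Spark-core (plus Spark-SQL for the SQL module) must also be available to the project.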

Please visit the GeoSpark website for details and documentation.

News!

  • GeoSpark 1.1.3 is released. This release contains a critical bug fix for the GeoSpark-core RDD API. Release notes || Maven Coordinate.
  • GeoSpark 1.1.2 is released. This release contains several bug fixes. Thanks for the patch from Lucas C.! Release notes || Maven Coordinate.
  • GeoSpark 1.1.0 is released. This release contains new SQL functions, custom Quad-Tree/R-Tree index serializers, and bug fixes. GeoSpark 1.1.0 supports Apache Spark 2.3. Note that the GeoSparkSQL Maven Coordinate has changed. Release notes || Maven Coordinate. (Thanks to Zongsi Zhang for contributing the index serializer patch!)
  • The GeoSpark wiki has moved to the new GeoSpark website! Users are welcome to contribute tutorials and stories by opening a PR!