# Part 2 #
## Getting Started ##
What is ST_?
## Sedona and GeoPandas working together ##
### Read Latitude and Longitude ###
## Turning a Spark DataFrame into a Geospatial DataFrame ##
### With SQL ###
### With Python ###

# Getting Started #

Sedona functions in the API are designated by the prefix "ST_". For example, "ST_Point" is a function that creates a point geometry. We can test our Sedona setup by reading an ST_Point into a dataframe.

Enable the geospatial features first:

In [0]:
spark.conf.set("spark.databricks.geo.st.enabled", "true")

In [0]:
df = spark.sql('SELECT ST_ASTEXT(ST_POINT(0,0)) AS test_point')
df.display()

*By creating or reading a geometry column, you convert a Spark dataframe into a Sedona Spatial Data Frame*

# Reading geometry from files #

While there are newer ways of parsing geometry in Spark, I have found it easiest to use GeoPandas.

The first step is to upload the geometry to a Volume in the catalog.

Then read the file into a GeoPandas GeoDataFrame.

In [0]:
import geopandas as gpd
geojson_path = '/Volumes/moo_ops_workspace/geospatial/geospatial_files_volume/BusinessImprovementDistrict.geojson'
geopandas_dataframe = gpd.read_file(geojson_path)
geopandas_dataframe.info()

### WKT and WKB ###
WKT stands for Well-Known Text and WKB stands for Well-Known Binary. WKT is human-readable but takes up more space. WKB is not human-readable but is more compact. Both are valid formats for storing geospatial information. However, some operations need one format or the other: Carto requires binary, while spatial analysis often requires text. In this notebook, geometry read from GeoJSON files is saved as WKT.

The process of turning well-known text into well-known binary is called *serialization*. The process of converting binary back to text is called *deserialization*.
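
To make the size difference between the two formats concrete, here is a minimal sketch that encodes a single point both ways using only the Python standard library. The function names are hypothetical and the coordinates are made up for illustration; it handles 2D points only, and in practice a library such as Shapely or Sedona would do this work.

```python
import struct


def point_to_wkt(x, y):
    """Encode a 2D point as Well-Known Text."""
    return f"POINT ({x} {y})"


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian Well-Known Binary (21 bytes)."""
    # 1 byte: byte order flag (1 = little-endian)
    # 4 bytes: geometry type (1 = Point)
    # 8 + 8 bytes: x and y as IEEE 754 doubles
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    """Decode a 2D point from WKB and return (x, y)."""
    # The first byte says how the remaining fields are ordered.
    prefix = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    return x, y


# Made-up coordinates (longitude, latitude) for illustration only.
wkt = point_to_wkt(-73.985428, 40.748817)
wkb = point_to_wkb(-73.985428, 40.748817)

print(wkt)                 # readable text
print(wkb.hex())           # opaque bytes, shown here as hex
print(len(wkt), len(wkb))  # characters of text vs. bytes of binary
```

The binary form stores every coordinate in a fixed 8 bytes, whereas the text form grows with the number of digits written out, which is why WKB tends to be smaller for high-precision coordinates.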

In [0]:
from pyspark.sql.functions import col, expr

In [0]:
# Read the GeoPandas GeoDataFrame into a Sedona dataframe.
# First convert the geometry to well-known text (WKT).
geopandas_wkt = geopandas_dataframe.assign(geometry=geopandas_dataframe.geometry.apply(lambda geom: geom.wkt))
geopandas_wkt.info()

# Convert the GeoDataFrame into a vanilla Spark dataframe. Note that the geometry column's type is "object", not geometry.
spark_dataframe = spark.createDataFrame(geopandas_wkt)
spark_dataframe.display()

# Parse the WKT string with ST_GeomFromWKT, then convert it back to text with ST_AsText so the result can be displayed
spatial_dataframe = spark_dataframe.select(col("geometry"), expr("ST_AsText(ST_GeomFromWKT(geometry)) AS geom"))
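
For intuition about what the WKT conversion in the cell above produces, here is a minimal sketch using only the standard library. It parses a tiny, made-up GeoJSON FeatureCollection and builds the WKT string for its polygon by hand. The sample data and the `polygon_to_wkt` helper are hypothetical and cover only the Polygon type; in the notebook itself, GeoPandas does this through `geom.wkt`.

```python
import json

# Hypothetical, minimal GeoJSON document; not the contents of the real file.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example District"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]
      }
    }
  ]
}
"""


def polygon_to_wkt(geometry):
    """Convert a GeoJSON Polygon geometry dict to a WKT string."""
    if geometry["type"] != "Polygon":
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    rings = []
    for ring in geometry["coordinates"]:
        # Each position is [x, y, ...]; only x and y are used here.
        points = ", ".join(f"{pos[0]} {pos[1]}" for pos in ring)
        rings.append(f"({points})")
    return f"POLYGON ({', '.join(rings)})"


collection = json.loads(SAMPLE_GEOJSON)

# One plain row per feature: attributes plus geometry as a WKT string.
rows = [
    {
        "name": feature["properties"]["name"],
        "geometry": polygon_to_wkt(feature["geometry"]),
    }
    for feature in collection["features"]
]

print(rows[0]["geometry"])
```

Rows shaped like this, with the geometry held as an ordinary string, are what a vanilla Spark dataframe can store before the string is parsed with `ST_GeomFromWKT`.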



# Using Sedona with SQL #

In [0]:
# Register the dataframe as a temporary view
spark_dataframe.createOrReplaceTempView("spark_dataframe_temp_view")

# Parse the geometry column as a Sedona geometry type
spatial_dataframe = spark.sql("SELECT ST_GeomFromWKT(geometry) AS geom FROM spark_dataframe_temp_view")
# This will throw an error: a spatial dataframe cannot be displayed directly
spatial_dataframe.show()

Once the geometry column is converted from a string to geometry, it **cannot** be displayed.


In [0]:
# These will all throw errors: a spatial dataframe cannot be queried directly
spatial_dataframe.write.mode("overwrite").saveAsTable("moo_ops_workspace.default.spatial_dataframe_test2")
spatial_dataframe.display()
spatial_dataframe.printSchema()


In [0]:
# To display, list, or save geospatial data, convert it back to a well-known interchange format, e.g. GeoJSON, WKT, or WKB.
# ST_AsText returns the geometry as WKT.
spatial_dataframe_wkt = spatial_dataframe.select(expr("ST_AsText(geom) AS wkt"))
spatial_dataframe_wkt.display()
spatial_dataframe_wkt.printSchema()

# Writing a Spatial DataFrame #

In [0]:
# Save the Sedona spatial dataframe with the geometry stored in a standard interchange format.
# The write only succeeds when the column is WKT, not a geometry type.
spatial_dataframe_wkt.write.mode("overwrite").saveAsTable("moo_ops_workspace.default.spatial_dataframe_test")