## Creating Spatial Dataframes (1)
ENV 859 - Fall 2022  
© John Fay, Duke University

### What is a spatial dataframe
A **spatial dataframe** (aka a **geodataframe**) is much like a typical Pandas dataframe except that it accommodates a new datatype: the **geometry** data type. The geometry, as you might guess, can contain geometric features: points, lines, and polygons, each of which is defined by a single coordinate pair or a series of coordinate pairs. These spatial dataframes are also assigned a **coordinate reference system (crs)**, which links these coordinates to specific places on the Earth and allows us to do geospatial analysis. In other words, these spatial dataframes are much the same as our familiar GIS feature classes!

### Libraries for working with spatial dataframes
To work with spatial dataframes, we need one of two Python libraries, each of which has its own version of the spatial dataframe and its own set of functions and classes. First is **geopandas**, which has been around for a while, and then there's the newcomer, the **ArcGIS API for Python** (which really needs a better name). In this notebook we explore the former, **geopandas**, focusing on how we create GeoDataFrames from existing data in various formats.

The source formats we examine include:
1. A delimited text file (e.g. CSV) containing coordinate columns and a known coordinate reference system
2. An existing feature class in the form of a shapefile or within a geodatabase
3. Other formats: GeoJSON files, KML, and [kind of] GeoDatabases.

## Lesson 1. Creating spatial dataframes from CSV files using GeoPandas
In this example, we examine how to create a point spatial dataframe from a CSV file containing latitude and longitude coordinates. The data we'll use in this exercise is electric vehicle charging locations in North Carolina ([source](https://afdc.energy.gov/data_download)).

The process of importing a CSV file into a GeoPandas geodataframe consists of first importing the data into a Pandas dataframe and then creating a **GeoSeries** (a column of geometry objects) from the coordinate columns. Then we construct the geodataframe using the GeoPandas `GeoDataFrame()` function, supplying the original dataframe, the geoseries object, and the coordinate reference system or `crs`.
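
The three steps described above can be sketched end to end on a tiny in-memory table. This is only an illustration under stated assumptions: the two rows, the `Station Name` column, and the `*_demo` variable names are made up, and the real file is read from disk in the cells that follow.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical stand-in for the charging station CSV (two made-up rows)
csv_text = """Station Name,Latitude,Longitude
Station A,35.9940,-78.8986
Station B,35.7796,-78.6382
"""

# Step 1: read the delimited text into a regular Pandas dataframe
df_demo = pd.read_csv(io.StringIO(csv_text))

# Step 2: build point geometries from the coordinate columns
# (x = longitude, y = latitude)
geom_demo = gpd.points_from_xy(x=df_demo["Longitude"], y=df_demo["Latitude"])

# Step 3: combine the dataframe, the geometries, and the crs (EPSG 4326 = WGS 84)
gdf_demo = gpd.GeoDataFrame(data=df_demo, geometry=geom_demo, crs=4326)

# The result keeps the original columns and gains a geometry column
print(gdf_demo.columns.tolist())
print(gdf_demo.crs.to_epsg())
```

Note that longitude goes to `x` and latitude to `y`; swapping them is a common mistake that puts points in the wrong part of the world.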

In [None]:
#Import libraries: Pandas (as "pd") and geopandas (as "gpd")
import pandas as pd
import geopandas as gpd

In [None]:
#Read the EV Charging station data into a Pandas dataframe
df = pd.read_csv('../data/NC_Charging_Stations.csv')

In [None]:
#Examine the first few rows, noting the data include "Latitude" and "Longitude" columns
df.head()

#### Creating a column of geometric objects (i.e., a GeoSeries) 
To create a geoseries, we use the geopandas `points_from_xy()` function.

In [None]:
#Show info on the command
gpd.points_from_xy?

The essential inputs are a series of x coordinates (our `Longitude` column) and a series of y coordinates (our `Latitude` column).

In [None]:
#Create a geoseries object from the coordinate columns
geometries = gpd.points_from_xy(
    x=df['Longitude'],
    y=df['Latitude']
)
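
To see what `points_from_xy()` produces, here is a minimal sketch using two made-up coordinate pairs (the values and variable names are hypothetical). As far as we know, the function returns an array of point geometries rather than a full GeoSeries; wrapping it in `gpd.GeoSeries()` adds an index and, optionally, a crs.

```python
import geopandas as gpd

# Hypothetical coordinates (decimal degrees) for two points
lons = [-78.94, -80.84]   # x values
lats = [36.00, 35.23]     # y values

# points_from_xy pairs each x with its y and returns an array of point geometries
points = gpd.points_from_xy(x=lons, y=lats)

# Wrapping the array in a GeoSeries gives it an index and (optionally) a crs
point_series = gpd.GeoSeries(points, crs=4326)

# Each element is a point object exposing its x and y coordinates
print(point_series.iloc[0].x, point_series.iloc[0].y)
```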

Next, we use the `GeoDataFrame()` function to construct our geodataframe, attaching our geoseries as its geometry (or "shape") column. We also need to define the geodataframe's coordinate reference system, which is done by specifying the *well-known ID* or **WKID** (really?) of the coordinate system to which our data are referenced.

>#### What is a WKID code?
>All "official" coordinate systems have a unique ID, often defined by the "European Petroleum Survey Group". These IDs, often referred to as "*WKIDs*" or sometimes as "*EPSG codes*", can be found by looking up the name of the coordinate system on either https://spatialreference.org or https://epsg.io/. For example, the WKID for WGS 84 (which is what our data uses) is [4326](https://spatialreference.org/ref/epsg/wgs-84/).

In [None]:
#View the GeoDataFrame() command
gpd.GeoDataFrame?

In [None]:
#Create a geodataframe from our data
gdf_csv = gpd.GeoDataFrame(
    data=df,
    geometry=geometries,
    crs = 4326
)

Now, let's explore our geodataframe using many commands familiar from our exploration of Pandas dataframes. These include:
* `head()` to show the first few records of the dataframe (note the last column)
* `info()` to reveal the structure of the dataframe (note the data type of the last column)
* `crs` to reveal the coordinate reference system the dataset uses
* `plot()` to plot the data

In [None]:
#Show the first few records of the geodataframe
gdf_csv.head()

In [None]:
#Show the structure of the dataframe
gdf_csv.info()

In [None]:
#Show the geodataframe's coordinate reference system
gdf_csv.crs

In [None]:
#Show just the EPSG code of the crs
gdf_csv.crs.to_epsg()
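
The object stored in a geodataframe's `crs` attribute comes from the `pyproj` package, which geopandas depends on. As a small sketch, assuming `pyproj` is importable in your environment, we can build the same kind of object directly from an EPSG code and ask it a few questions:

```python
from pyproj import CRS

# Build a CRS object from the EPSG code used in this notebook
wgs84 = CRS.from_epsg(4326)

print(wgs84.name)           # human-readable name of the coordinate system
print(wgs84.is_geographic)  # True for latitude/longitude (unprojected) systems
print(wgs84.to_epsg())      # back to the integer EPSG code
```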

In [None]:
#Plot the data
gdf_csv.plot()

And that's it! Pretty straightforward. Soon we will explore the various analyses and visualizations we can do with these spatial dataframes, but first, we'll examine a few other types of data we can import into our coding environment as geodataframes.

## Lesson 2: Creating spatial dataframes from existing feature classes
Here we look at the process of getting existing feature classes, e.g. Shapefiles, into spatial dataframes. We'll again look at methods using GeoPandas and then compare that with similar methods using the ArcGIS API for Python. 

The dataset we'll use represents major river basins of North Carolina (source: https://data-ncdenr.opendata.arcgis.com/datasets/ncdenr::major-river-basins), a copy of which has been downloaded into the data folder as `Major_Basins.shp`. 

Importing feature classes using GeoPandas is easy with the `read_file()` command. 

>What's worth noting is that GeoPandas actually uses the Python **Fiona** package to read the shapefiles. Fiona leverages a collection of drivers that provide access to a number of geospatial data formats. Geopandas simplifies the usage of Fiona commands, making import and export of geodataframes easier to use.

In [None]:
#Explore the read_file() command
gpd.read_file?

In [None]:
#Read the shapefile into a GeoPandas geodataframe
gdf_shp = gpd.read_file('../data/Major_Basins.shp')

**Pro tip** -- a shapefile zipped into a single file can also be read in!

In [None]:
#Read a *zipped* shapefile into a GeoPandas geodataframe
gdf_shp = gpd.read_file('../data/Major_River_Basins.zip')

In [None]:
#Explore the data...
gdf_shp.plot()

## Lesson 3. Creating spatial dataframes from other file formats

Now we look at some formats that may be less familiar to you but are becoming more and more common.

### 3.1 Reading GeoJSON files
We have a GeoJSON format of the major river basins in NC saved in our data folder: `../data/12-Major_River_Basins.geojson` ([source](https://data-ncdenr.opendata.arcgis.com/datasets/ncdenr::major-river-basins/)). Let's see how we go about importing that file. 

>##### What is GeoJSON?
>GeoJSON is a text based format that stores spatial features in a long, but universally readable format (i.e. text!). "JSON" stands for JavaScript Object Notation, and if you look at raw JSON files from a Python perspective, it looks like a set of nested dictionary and list objects. We need not get too deep into that, but understand that being text based, JSON and its spatial counterpart GeoJSON, are used widely in web-based services and can be quite useful in certain circumstances.
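
To make the "nested dictionary and list objects" idea concrete, here is a minimal sketch that parses a made-up GeoJSON document with Python's built-in `json` module. The feature, its `name` property, and its coordinates are all hypothetical.

```python
import json

# A minimal, made-up GeoJSON document holding one point feature
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example basin outlet"},
      "geometry": {"type": "Point", "coordinates": [-78.9, 35.9]}
    }
  ]
}
"""

# json.loads turns the text into nested Python dictionaries and lists
collection = json.loads(geojson_text)

# The top level is a dictionary; its "features" entry is a list of dictionaries
first_feature = collection["features"][0]
print(type(collection).__name__, type(collection["features"]).__name__)
print(first_feature["geometry"]["type"], first_feature["geometry"]["coordinates"])
```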

As it happens, that Fiona package we read about, the one GeoPandas uses, can read this format as well. We simply have to indicate what **driver** the `read_file()` function should use to convert the file into a geodataframe.

In [None]:
#read in the file 
gdf_geojson = gpd.read_file(
    filename='../data/Major_River_Basins.geojson',
    driver='GeoJSON')

In [None]:
gdf_geojson.plot()

### 3.2 Reading KML files
We also have a KML format of the major river basins in NC saved in our data folder: `../data/Major_River_Basins.kml` ([source](https://data-ncdenr.opendata.arcgis.com/datasets/ncdenr::major-river-basins/)). Let's see how we go about importing that file.

>##### What is KML?
>KML, short for "Keyhole Markup Language", is yet another text based format developed to store geospatial features. This format was originally designed to work with the Google Earth application (which was originally developed by a company called Keyhole), but others have adopted this format as well because of its simplicity.

And yes, Fiona has a driver to work with KML files, but for some reason this driver is not enabled by default. Let's look at all the drivers Fiona can work with by default and how to enable this one. 

In [None]:
#Import Fiona
import fiona

In [None]:
#Display fiona's active drivers
fiona.supported_drivers

The result is a dictionary where the keys are the drivers and the values are what we can do with them:
* `r` indicates we can read those formats but not write to them
* `rw` indicates we can both read from and write to those formats
* `raw` indicates we can read, write, and append data to existing files in that format
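
The mode strings above can be decoded letter by letter. The following sketch uses a small, made-up dictionary shaped like `fiona.supported_drivers` (the real contents depend on your Fiona and GDAL installation), and `describe_modes` is a hypothetical helper written for this illustration, not part of Fiona.

```python
# A small, made-up subset shaped like fiona.supported_drivers
# (the real dictionary depends on your Fiona/GDAL installation)
example_drivers = {
    "ESRI Shapefile": "raw",
    "GeoJSON": "raw",
    "OpenFileGDB": "r",
    "GPX": "rw",
}

def describe_modes(mode_string):
    """Translate a mode string such as 'raw' into a list of capabilities."""
    labels = {"r": "read", "a": "append", "w": "write"}
    return [labels[letter] for letter in mode_string]

# Print each driver with its capabilities spelled out
for driver_name, modes in example_drivers.items():
    print(f"{driver_name}: {', '.join(describe_modes(modes))}")

# Drivers we could write to are those whose mode string contains "w"
writable = [name for name, modes in example_drivers.items() if "w" in modes]
print(writable)
```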

You'll also notice KML does not appear on that list, but we can add it with the code below (where we add it directly to the list that geopandas can see). What is the complete list of drivers? For some odd reason, that's not easily found, but you can decipher a bit from this page: https://github.com/Toblerity/Fiona/blob/master/fiona/drvsupport.py. (Thanks to this [StackExchange page](https://gis.stackexchange.com/questions/191365/drivers-of-fiona) for revealing this.)

In [None]:
#Enable the KML driver in geopandas as a read-write format
gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'

In [None]:
#Read the KML file
gdf_kml = gpd.read_file('../data/Major_River_Basins.kml',driver='KML')
gdf_kml.plot();

### 3.3 Reading Geodatabase files
The ESRI Geodatabase is a tricky format that sits somewhere in the gray area between proprietary and open source. ESRI does publish enough of how these Geodatabases are structured, programmatically, but that structure evolves quickly -- sometimes faster than coders can update Fiona drivers.

In any event, those drivers are usually labeled as `OpenFileGDB`, and you'd be best off doing a web search for the latest sequence of commands required to read geodatabase feature classes into a spatial dataframe.