## Creating Spatial Dataframes (1)
ENV 859 - Fall 2022  
© John Fay, Duke University

### What is a spatial dataframe
A **spatial dataframe** (aka a **geodataframe**) is much like a typical Pandas dataframe except that it accommodates a new data type: the **geometry** data type. A geometry, as you might guess, can contain geometric features: points, lines, and polygons, each of which is defined by a single coordinate pair or a series of coordinate pairs. These spatial dataframes are also assigned a **coordinate reference system (crs)**, which links these coordinates to specific places on the Earth and allows us to do geospatial analysis. In other words, these spatial dataframes are much the same as our familiar GIS feature classes!
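
As a rough illustration of the geometry data type, the sketch below builds a tiny geodataframe holding one point, one line, and one polygon. The feature names and coordinates are made up for this example, and it assumes GeoPandas and Shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon

# Three made-up features, one of each basic geometry type
features = gpd.GeoDataFrame(
    {"name": ["station", "road", "parcel"]},
    geometry=[
        Point(-78.94, 36.00),                            # a single coordinate pair
        LineString([(-78.95, 36.00), (-78.93, 36.01)]),  # a series of coordinate pairs
        Polygon([(-78.95, 35.99), (-78.93, 35.99), (-78.93, 36.01)]),  # a closed ring
    ],
    crs=4326,  # WGS 84, so the coordinates are longitude/latitude in degrees
)

# Each row reports the type of geometry it holds
geometry_types = features.geom_type.tolist()
print(geometry_types)
```

Real datasets usually hold a single geometry type per table, as feature classes do; the three types are mixed here only to show them side by side.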

### Libraries for working with spatial dataframes
To work with spatial dataframes, we need one of two Python libraries, each of which has its own version of the spatial dataframe and its own set of functions and classes. First is **geopandas**, which has been around for a while, and then there's the newcomer, the **ArcGIS API for Python** (which really needs a better name). There's much overlap between the two, but also some very important distinctions, and it's useful to know the basics of both.

### The lessons
Here in this first lesson, we review several techniques for creating spatial dataframes from various source formats using both GeoPandas and the ArcGIS API for Python. The source formats include:
1. A delimited text file (e.g. CSV) containing coordinate columns and a known coordinate reference system
2. An existing feature class in the form of a shapefile or within a geodatabase
3. Other formats: GeoJSON files, KML, and ArcGIS REST-based web services

## Lesson 1. Creating spatial dataframes from CSV files
In this example, we examine how to create a point spatial dataframe from a CSV file containing latitude and longitude coordinates. The data we'll use in this exercise is electric vehicle charging locations in North Carolina ([source](https://afdc.energy.gov/data_download)).

### Using GeoPandas
The process of importing a CSV file into a GeoPandas geodataframe consists of first importing the data into a Pandas dataframe and then creating a **GeoSeries** - or column of geometry objects - from the coordinate columns. Then we construct the geodataframe using the GeoPandas `GeoDataFrame()` function supplying the original dataframe, the geoseries object, and the coordinate reference system or `crs`. 
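
The three steps described above can be sketched end to end without the course data file. This is a minimal example under stated assumptions: the two rows of CSV text and the `Station Name` column are invented, and only the `Latitude` and `Longitude` column names mirror the real dataset.

```python
import io

import pandas as pd
import geopandas as gpd

# Two made-up rows standing in for the charging-station CSV
csv_text = """Station Name,Latitude,Longitude
Station A,35.9940,-78.8986
Station B,35.7796,-78.6382
"""

# Step 1: read the delimited text into a Pandas dataframe
df_demo = pd.read_csv(io.StringIO(csv_text))

# Step 2: build point geometries from the coordinate columns
# (x is longitude, y is latitude)
points = gpd.points_from_xy(x=df_demo["Longitude"], y=df_demo["Latitude"])

# Step 3: combine the attributes, the geometries, and the CRS (4326 = WGS 84)
gdf_demo = gpd.GeoDataFrame(data=df_demo, geometry=points, crs=4326)

first_point = gdf_demo.geometry.iloc[0]
print(first_point.x, first_point.y)
```

Note the argument order: longitude is the x coordinate and latitude is the y coordinate, which is the reverse of how the pair is often spoken.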

In [None]:
#Import libraries: Pandas (as "pd") and geopandas (as "gpd")
import pandas as pd
import geopandas as gpd

In [None]:
#Read the EV Charging station data into a Pandas dataframe
df = pd.read_csv('../data/NC_Charging_Stations.csv')

In [None]:
#Examine the first few rows, noting the data include "Latitude" and "Longitude" columns
df.head()

#### Creating a column of geometric objects (i.e., a GeoSeries) 
To create a geoseries, we use the geopandas `points_from_xy()` function.

In [None]:
#Show info on the command
gpd.points_from_xy?

The essential inputs are a series of x coordinates (our `Longitude` column) and a series of y coordinates (our `Latitude` column).

In [None]:
#Create a geoseries object from the coordinate columns
geometries = gpd.points_from_xy(
    x=df['Longitude'],
    y=df['Latitude']
)

Next, we use the `GeoDataFrame()` function to construct our geodataframe, attaching our geoseries as its geometry column (the counterpart of a feature class's "Shape" field). We also, however, need to define the geodataframe's coordinate reference system, which is done by specifying the *well-known ID*, or **WKID**, of the coordinate system to which our data is referenced.

>#### What is a WKID code?
>All "official" coordinate systems have a unique ID, often defined by the "European Petroleum Survey Group". These IDs, often referred to as "*WKIDs*" or sometimes as "*EPSG codes*", can be found by looking up the name of the coordinate system on either https://spatialreference.org or https://epsg.io/. For example, the WKID for WGS 84 (which is what our data uses) is [4326](https://spatialreference.org/ref/epsg/wgs-84/).
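
A code can also be checked from Python rather than on a website. The sketch below uses `pyproj`, the library GeoPandas relies on for coordinate reference systems; it is not otherwise used in this lesson, so treat its appearance here as an assumption about your environment.

```python
from pyproj import CRS

# Look up the coordinate reference system registered under EPSG code 4326
wgs84 = CRS.from_epsg(4326)

crs_name = wgs84.name                # human-readable name of the system
is_geographic = wgs84.is_geographic  # True when coordinates are angular (lat/lon)

print(crs_name, is_geographic)
```

A geographic system stores coordinates in degrees, which is why our `Latitude` and `Longitude` columns can be used directly with this code.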

In [None]:
#View the GeoDataFrame() command
gpd.GeoDataFrame?

In [None]:
#Create a geodataframe from our data
gdf = gpd.GeoDataFrame(
    data=df,
    geometry=geometries,
    crs=4326
)

Now, let's explore our geodataframe using many commands familiar from our exploration of Pandas dataframes.
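
The reason the familiar commands carry over is that a GeoPandas geodataframe is itself a kind of Pandas dataframe. The following sketch, built on two made-up points rather than the charging-station data, checks that relationship and shows that an ordinary Pandas filter keeps the result spatial.

```python
import pandas as pd
import geopandas as gpd

# A two-row geodataframe with made-up names and coordinates
demo = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=gpd.points_from_xy(x=[-78.9, -78.6], y=[36.0, 35.8]),
    crs=4326,
)

# A geodataframe passes as a Pandas dataframe...
is_dataframe = isinstance(demo, pd.DataFrame)

# ...so standard Pandas filtering works, and the result is still a geodataframe
subset = demo[demo["name"] == "A"]

print(is_dataframe, type(subset).__name__, demo.shape)
```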

In [None]:
#Show the structure of the geodataframe
gdf.info()

In [None]:
#Show the values for the first record
gdf.iloc[0]

In [None]:
#Show the first few records
gdf.head()

In [None]:
#Show help for the plot command, then plot the points in green
gdf.plot?
gdf.plot(kind='geo', color='green');

And that's it! Pretty straightforward. Soon we'll see what we can do with these dataframes, but first let's examine the same procedure as done with the ArcGIS API for Python...

### Using the ArcGIS API for Python
This process is quite similar. We again start with a Pandas dataframe, but here we don't have to create the geoseries. Instead, we just specify the dataframe and the columns in that dataframe that contain X and Y coordinates -- and the coordinate reference system too, of course.

In [None]:
#Import the library; because the arcgis package is HUGE, we import just the bit we need
from arcgis import GeoAccessor

The function we use here is the *GeoAccessor*'s `from_xy()` function...

In [None]:
#Explore the function 
GeoAccessor.from_xy?

In [None]:
#Convert our dataframe to a spatial dataframe
sdf = GeoAccessor.from_xy(
    df=df,
    x_column='Longitude',
    y_column='Latitude',
    sr=4326
)

The spatial dataframe returned here is different from the GeoPandas geodataframe. It, too, accepts all the commands of a Pandas dataframe. However, to access its spatially enabled features, we append `.spatial` to the object.

In [None]:
#It appears to be just a Pandas dataframe
type(sdf)

In [None]:
#To access its spatial component, we add ".spatial" to the object
type(sdf.spatial)

View the spatial features using the `plot()` function of the GeoAccessor object.

In [None]:
#View the points
sdf.spatial.plot()