# Performing analysis with GeoDataframes
With spatial data stored as a GeoDataFrame, we can run a number of analyses, both tabular, using Pandas operations, and spatial, using GeoPandas operations. This notebook wades gently into the world of GeoPandas and also serves as a review of several Pandas operations.

Specifically we examine the following:
* Reading & writing shapefiles into a GeoPandas dataFrame
* Reprojecting data using GeoPandas
* Exploring the GeoPandas dataFrame
 * Exposing the number of features 
 * Revealing the data types of each column 
 * Exploring the `geometry` data type
* Quick view of plotting in GeoPandas
* Quick view of geoprocessing in GeoPandas

### 1. Import the package and enable inline plots

In [None]:
#import the package
import geopandas as gpd

#enable plots to appear in the notebook
%matplotlib inline

---
<h2><font color='red'>► FIX REQUIRED ◄</font></h2>

*You need to run this code block to fix an issue with the `pyproj` module used by GeoPandas.*

For more on the issue see: 
https://github.com/geopandas/geopandas/issues/830

In [None]:
#Fix issue with pyproj 
import sys, os
pythonPath = sys.executable
pythonFolder = os.path.dirname(pythonPath)
shareFolder = os.path.join(pythonFolder,'Library','share')
os.environ["PROJ_LIB"] = shareFolder

---
### 2. Read a shapefile into a _GeoDataframe_
GeoPandas can read shapefiles directly. Behind the scenes, this operation relies on three packages: `GDAL`, which contains the binaries capable of understanding geospatial data; `fiona`, which allows Python to interact nicely with the `GDAL` libraries; and `shapely`, which has functions for operating on feature geometries in a Pythonic way. GeoPandas coordinate reference systems can use "European Petroleum Survey Group" (EPSG) codes as shorthand for various standard systems.

Complete documentation on the GeoDataframe is here: http://geopandas.org/data_structures.html#geodataframe

In [None]:
#read in the HUC12.shp feature class
gdf = gpd.read_file('./Data/12Digit_HUC_Subwatersheds.shp')

### 3. Explore properties of the GeoDataframe
Here we explore various properties of our GeoDataframe. Note that all the operations that apply to a Pandas dataframe also apply to geodataframes... 

In [None]:
#How many features and attributes in the dataset?
gdf.shape

In [None]:
#show information on each column in the geodataframe
gdf.info()

In [None]:
#Quick summary stats of the dataset
gdf.describe()

In [None]:
#examine the attributes for the first feature
gdf.iloc[0]

In [None]:
#What coordinate reference system is used? Check http://epsg.io for what this code is
gdf.crs

→ If the crs returns an 'epsg' code you can generate a URL to look it up...

In [None]:
#Get the epsg code from the crs
epsg = gdf.crs['init'].split(':')[1]
#Generate and print the URL, which you can click on...
print("http://epsg.io/{}".format(epsg))
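
→ The lookup above assumes `gdf.crs` is a dictionary with an `init` key, which is how older GeoPandas releases report it. As a hedged sketch, the hypothetical helper `epsg_url` below (not part of GeoPandas) also accepts a plain `'epsg:XXXX'` string, or any object that offers a `to_epsg()` method, as pyproj `CRS` objects do.

```python
def epsg_url(crs):
    """Return an epsg.io URL for a CRS given as a dict, a string, or an object with to_epsg()."""
    if hasattr(crs, "to_epsg"):      # e.g. a pyproj CRS object (assumed for newer GeoPandas)
        code = crs.to_epsg()
    elif isinstance(crs, dict):      # e.g. {'init': 'epsg:4326'}
        code = crs["init"].split(":")[1]
    else:                            # e.g. 'epsg:4326' or 'EPSG:4326'
        code = str(crs).split(":")[1]
    return "http://epsg.io/{}".format(code)


print(epsg_url({"init": "epsg:4326"}))
print(epsg_url("EPSG:26917"))
```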

In [None]:
#show the geometry type(s) in this geodataframe
gdf.type.unique()

In [None]:
#Simple plot
gdf.plot(figsize=(10,5));

### 4. Projections in GeoPandas: Reprojecting data
We see from the `crs` above that our native data is unprojected, i.e. it uses a geographic coordinate system with coordinates in decimal degrees. If we want to compute areas or lengths in meaningful units, we'll have to reproject the data to a projected coordinate system. Here we'll reproject our data to NAD83 UTM Zone 17N, which has an EPSG code of `26917`.
* http://geopandas.org/projections.html

In [None]:
#Reproject to UTM: 
#  If this results in "b'no arguments in initialization list'" error
#  see the "Fix" above!
gdfUTM = gdf.to_crs(epsg=26917)

In [None]:
#Simple plot - does it look different than above? 
gdfUTM.plot(figsize=(10,5));
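
→ To see why reprojecting matters for measurement, here is a minimal sketch using a made-up 0.1° x 0.1° box near Durham instead of the HUC data. It assumes a recent GeoPandas release that accepts `crs="EPSG:4326"` as a string. The same `.area` call returns square degrees before the `to_crs` step and square meters after it.

```python
import geopandas as gpd
from shapely.geometry import box

# A 0.1 x 0.1 degree box near Durham, NC, in WGS84 longitude/latitude (EPSG:4326)
demo = gpd.GeoSeries([box(-79.0, 36.0, -78.9, 36.1)], crs="EPSG:4326")

# In a geographic CRS the area is in square degrees, which is not a useful unit
# (recent GeoPandas versions also emit a warning here)
area_deg2 = demo.area.iloc[0]

# After reprojecting to UTM Zone 17N the same call returns square meters
demo_utm = demo.to_crs(epsg=26917)
area_m2 = demo_utm.area.iloc[0]

print("square degrees:", area_deg2)
print("square kilometers:", area_m2 / 1e6)
```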

### 5. Exploring the `geometry` objects in a GeoPandas dataframe
The key to GeoPandas' ability to work with geospatial data is a new data type added to the standard Pandas DataFrame, stored in the `geometry` field. Let's explore this field.

Complete documentation on the geometry object is here: http://geopandas.org/geometric_manipulations.html

In [None]:
#show the first 5 values in the geometry field: this is actually a GeoSeries...
gdfUTM['geometry'][0:5]

In [None]:
#show just a single geometry - it appears as a shape
gdfUTM['geometry'][10]

Now let's save that one geometry object, a polygon in this case, to a variable and examine what GeoPandas allows us to do with it.

In [None]:
#Extract one feature geometry to a variable; what is its datatype?
thePoly = gdfUTM['geometry'][10]
type(thePoly)

In [None]:
#Show thePoly
thePoly

In [None]:
#Show the area and perimeter length of this polygon
theArea = thePoly.area
thePerim = thePoly.length
print("Area (m2):",int(theArea))
print("Perimeter (m):",int(thePerim))
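
→ The units of `.area` and `.length` are simply the units of the coordinates. A quick way to convince yourself is a hand-built square whose answers are known in advance; the sketch below assumes its coordinates are in meters.

```python
from shapely.geometry import Polygon

# A 1000 m x 1000 m square; coordinates are assumed to be in meters
square = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])

print("Area (m2):", square.area)        # side * side
print("Perimeter (m):", square.length)  # 4 * side
```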

In [None]:
#Convert the polygon's boundary to a linestring (i.e. a line feature)
theBoundary = thePoly.boundary
type(theBoundary)

In [None]:
#Show the linestring - we see it as a line feature, as expected
theBoundary

In [None]:
#Create the centroid of the feature
theCentroid = thePoly.centroid
type(theCentroid)

In [None]:
#Display the centroid - it doesn't appear (a point is infinitely small)
theCentroid

In [None]:
#But we can show the point buffered 10 m
theCentroid.buffer(10)

In [None]:
#We can buffer polygons too
thePoly.buffer(100)
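
→ A buffer is itself just a polygon. As a small sketch on made-up shapes (coordinates assumed to be in meters), a buffered point is a many-sided polygon approximating a circle, and a buffered square gains four edge strips plus four rounded corners.

```python
import math
from shapely.geometry import Point, Polygon

# Buffering a point yields a polygon that approximates a circle
circle = Point(0, 0).buffer(10)
print(circle.geom_type, len(circle.exterior.coords))
print("circle area vs pi*r^2:", circle.area, math.pi * 10 ** 2)

# Buffering a square by 100 m adds four edge strips and four rounded corners
square = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
grown = square.buffer(100)
expected = 1000 ** 2 + 4 * 1000 * 100 + math.pi * 100 ** 2
print("buffered area:", grown.area, "expected about:", expected)
```

The computed areas fall slightly short of the exact formulas because the arcs are approximated by straight segments.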

#### To try:
1. Display the polygon's perimeter (e.g. `theBoundary` object) buffered by 250 meters.
2. What happens if you buffer `thePoly` by **-250m**?

In [None]:
#See if you can display the boundary line created above, buffered 250 meters


In [None]:
#Display the polygon buffered negative 250m


→ Geopandas has other [feature transformations](http://geopandas.org/geometric_manipulations.html?highlight=buffer#constructive-methods). Try: `convex_hull`, `envelope`, `simplify(tolerance=100)`...

In [None]:
#What does the `.convex_hull` transformation do?

In [None]:
#What does the `.envelope` transformation do?

In [None]:
#Simplify the polygon using various tolerance values
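
→ If you want something to compare your results against, here is a hedged sketch of the same three transformations on a small, made-up L-shaped polygon (not one of the HUCs). Comparing areas and vertex counts is a handy way to see what each one changes.

```python
from shapely.geometry import Polygon

# An L-shaped polygon with one nearly collinear vertex at (2, 0.01)
shape = Polygon([(0, 0), (2, 0.01), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])

hull = shape.convex_hull      # smallest convex polygon containing the shape
bbox = shape.envelope         # axis-aligned bounding rectangle
simple = shape.simplify(0.1)  # removes detail smaller than the tolerance

print("areas:", shape.area, hull.area, bbox.area)
print("vertex counts:", len(shape.exterior.coords), len(simple.exterior.coords))
```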

### 6. Spatial Analysis among geometries
Let's move on to more sophisticated spatial analysis that we can do with GeoPandas. First we'll look at working with GeoSeries objects, i.e. arrays of geometries. Just as we can perform mathematical operations on sets of numbers stored in a Pandas Series or NumPy array, we can run spatial analyses on entire collections of geometries.

#### Subset features using Pandas `query`
First, we'll subset our data to a more manageable size. For this we use Pandas queries.

In [None]:
#Remind ourselves what columns are in this dataset
gdfUTM.columns

In [None]:
#List unique values in the Basin field
gdfUTM['DWQ_Basin'].unique()

In [None]:
#Create a dataframe of HUCs in the particular basin
gdfNeuse = gdfUTM.query('DWQ_Basin == "Neuse"').copy(deep=True)
gdfNeuse.shape

In [None]:
#Quick plot - adding "column=" allows us to color on unique values in that column
gdfNeuse.plot(column='HUC_8');
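
→ As a reminder of how `query` relates to ordinary boolean indexing, here is a minimal sketch on a plain Pandas dataframe. The table is a hypothetical stand-in for the attribute table, with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table; the values are made up
df = pd.DataFrame({
    "HUC_8": ["03020201", "03020201", "03030002"],
    "DWQ_Basin": ["Neuse", "Neuse", "Cape Fear"],
})

# query() takes the filter as a string expression ...
by_query = df.query('DWQ_Basin == "Neuse"').copy(deep=True)

# ... and selects the same rows as an explicit boolean mask
by_mask = df[df["DWQ_Basin"] == "Neuse"].copy()

print(by_query.equals(by_mask))
print(by_query.shape)
```

Taking a copy, as the notebook does, lets us add columns to the subset later without Pandas warning about modifying a view.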

#### Computing distances between features
How far is each HUC in the Neuse from Durham? To do this we first have to create a point representing Durham. We do this using the Shapely package: https://shapely.readthedocs.io/en/stable/manual.html#points. However, to match the projection of our Neuse dataframe, we need to project our point to UTM Zone 17N. We do this with the `pyproj` package.

In [None]:
#Create a point for Durham (Lat=36.0044;Long=-78.9429)
from shapely.geometry import Point
ptDurham_DD = Point(-78.9429,36.0044)
type(ptDurham_DD)

In [None]:
#Project Durham from WGS84 decimal degrees (EPSG 4326) to UTM Zone 17N (EPSG 26917)
import pyproj
prjWGS84 = pyproj.Proj(init='epsg:4326')
prjUTM17N = pyproj.Proj(init='epsg:26917')
ptDurham_UTM = Point(pyproj.transform(prjWGS84,       #Source projection
                                      prjUTM17N,      #Destination projection
                                      ptDurham_DD.x,  #X coordinate
                                      ptDurham_DD.y)) #Y coordinate
#Show the coordinates
ptDurham_UTM.x,ptDurham_UTM.y
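
→ Newer pyproj releases deprecate `pyproj.Proj(init=...)` and `pyproj.transform`. If the cell above raises warnings or errors for you, a hedged alternative, assuming pyproj 2.2 or later, is the `Transformer` class. The variable name `to_utm17n` is just illustrative.

```python
from pyproj import Transformer
from shapely.geometry import Point

# always_xy=True keeps the (longitude, latitude) -> (easting, northing) axis order
to_utm17n = Transformer.from_crs("EPSG:4326", "EPSG:26917", always_xy=True)

ptDurham_DD = Point(-78.9429, 36.0044)
x, y = to_utm17n.transform(ptDurham_DD.x, ptDurham_DD.y)
ptDurham_UTM = Point(x, y)

# Show the projected coordinates (meters)
print(ptDurham_UTM.x, ptDurham_UTM.y)
```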

In [None]:
#Compute the distance (in km) from each feature to the Durham point
theDistances_km = gdfUTM.distance(ptDurham_UTM)/1000
#This returns a series -- a list of distances to each catchment feature
type(theDistances_km)

In [None]:
theDistances_km[:5]

In [None]:
#Summary stats of all the distances
theDistances_km.describe()

In [None]:
#Plot a histogram of values
theDistances_km.hist(figsize=(10,3));

In [None]:
#We can even join the distances back to the geo dataframe and plot HUCs on distance
gdfNeuse['dist2durham'] = theDistances_km
gdfNeuse.plot(column='dist2durham',
              cmap='YlOrRd',
              legend=True);
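
→ Notice that the distances were computed for *all* HUCs but joined to the Neuse subset only. That works because Pandas aligns on the index when assigning a Series to a column. A minimal sketch with made-up squares (coordinates assumed to be in meters):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three 1 km squares along the x axis; coordinates are assumed to be in meters
squares = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[box(0, 0, 1000, 1000), box(2000, 0, 3000, 1000), box(5000, 0, 6000, 1000)],
)
origin = Point(0, 0)

# distance() is vectorized: one value per feature, indexed like the GeoDataFrame
dist_km = squares.distance(origin) / 1000

# Assigning the full Series to a subset aligns on the index; extra rows are ignored
subset = squares[squares["name"] != "a"].copy()
subset["dist_km"] = dist_km
print(subset[["name", "dist_km"]])
```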

#### Buffering all features
What if we wanted to find the "core area" of each of our Neuse HUCs, i.e. the area that lies more than 1500m inside its border? We can do that easily by buffering our features with a negative value:

In [None]:
#Buffer the HUCs -1500m
gdfNeuseCore = gdfNeuse.buffer(-1500)
gdfNeuseCore.plot();

In [None]:
#Report summary stats of the areas of the returned features
gdfNeuseCore.area.describe()

In [None]:
#Reveal a histogram of the areas
gdfNeuseCore.area.hist();
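
→ One thing to watch for with negative buffers: a feature that is too narrow disappears entirely, leaving an empty geometry with zero area. A small sketch on two made-up squares (coordinates assumed to be in meters):

```python
import geopandas as gpd
from shapely.geometry import box

# Two squares, 2 km and 5 km on a side; coordinates are assumed to be in meters
units = gpd.GeoSeries([box(0, 0, 2000, 2000), box(0, 0, 5000, 5000)])

# A negative buffer shrinks each polygon inward by 1500 m
core = units.buffer(-1500)

print(core.is_empty.tolist())
print(core.area.tolist())
```

Zero-area results like this would show up at the low end of a summary or histogram of core areas.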

#### Clipping features
Now let's see what the area of each HUC is within 5000m of Durham...

In [None]:
#Buffer Durham 5000m
durham_5000m = ptDurham_UTM.buffer(5000)
type(durham_5000m)

In [None]:
#Add that one geometry feature to a new geoseries 
gs_Durham = gpd.GeoSeries(durham_5000m)
type(gs_Durham)

In [None]:
#Create a one-item spatial dataframe and assign its geometry to the geoseries above
gdf_Durham = gpd.GeoDataFrame([{'Location':'Durham'}],
                              geometry=gs_Durham)
type(gdf_Durham)

In [None]:
#Show the dataframe. How's it look?
gdf_Durham

In [None]:
#What is the geodataframe's coordinate reference system?
gdf_Durham.crs

In [None]:
#As it's undefined, we'll define it - using the same crs as the Neuse datasets
gdf_Durham.crs = gdfNeuse.crs

In [None]:
#gs_Neuse = gdfNeuse.geometry
#type(gs_Neuse)

In [None]:
#Plot both dataframes
theAxis = gdf_Durham.plot(color='red')             #Plot the Durham gdf, saving its axis to "theAxis"
gdfNeuse.plot(ax=theAxis,color='blue',alpha=0.5);  #Plot the Neuse gdf, using the same axis as above

In [None]:
#Clip the HUC layer
gdfNeuseClip = gpd.overlay(gdf_Durham,gdfNeuse,how='intersection')
#Show the Clip
gdfNeuseClip.plot()

In [None]:
#Extract each polygon's area to a new field
gdfNeuseClip['Area'] = gdfNeuseClip.area
#Set the dataframe's index to the HUC12 Name
gdfNeuseClip.set_index('HU_12_NAME',inplace=True)
gdfNeuseClip.head()

In [None]:
gdfNeuseClip['Area'].plot(kind='barh');
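
→ To see what `overlay` with `how='intersection'` returns, it helps to try it on shapes small enough to check by hand. The sketch below uses made-up layers (coordinates assumed to be in meters; the `shed` column and its values are hypothetical).

```python
import geopandas as gpd
from shapely.geometry import box

# Toy layers: two adjacent 1 km "watersheds" and one rectangular "zone" straddling them
sheds = gpd.GeoDataFrame(
    {"shed": ["west", "east"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 2000, 1000)],
)
zone = gpd.GeoDataFrame({"Location": ["zone"]}, geometry=[box(750, 250, 1500, 750)])

# how='intersection' keeps only the overlapping pieces, with columns from both inputs
clip = gpd.overlay(zone, sheds, how="intersection")
clip["Area"] = clip.area

print(clip[["Location", "shed", "Area"]])
```

Each output row is one piece of overlap, so a zone that spans two watersheds yields two rows.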

### A more complex analysis
Here we will buffer the centroid of a feature and then intersect that with the feature. 

* We begin by selecting a feature. We'll pick on the Elk Creek HUC...

In [None]:
#Select a feature by an attribute
hucMask = gdfUTM['HU_12_NAME'] == 'Elk Creek'
gdfHUC = gdfUTM[hucMask]
type(gdfHUC)

In [None]:
#Show the results
gdfHUC

►This approach is slightly different from the earlier example (`thePoly = gdfUTM['geometry'][10]`), which returned a single _Shapely geometry_ object from the geodataframe; here our selection returns a _GeoDataFrame_, and its `geometry` column is a _GeoSeries_. However, other than plotting, the behavior is mostly the same.

In [None]:
#Get the shape of the feature
feature_geometry = gdfHUC['geometry'] #->returns a GeoSeries, not a shapely geometry
type(feature_geometry)

In [None]:
#Copy the gdfHUC dataframe so we can modify its geometries
gdfHUC_copy = gdfHUC.copy(deep=True)

In [None]:
#Update geometry to the centroid of each feature buffered 5000m
gdfHUC_copy['geometry'] = gdfHUC_copy['geometry'].centroid.buffer(5000)

In [None]:
#Intersect the buffered centroid with the original shape
theClip = gpd.overlay(gdfHUC_copy,gdfHUC,how='intersection')
#Show the Clip
theClip.plot()
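
→ The same buffer-then-intersect idea can be sketched with Shapely geometries directly, rather than with `gpd.overlay`. This example uses a made-up rectangle (coordinates assumed to be in meters), chosen so that the buffer pokes out beyond the feature.

```python
from shapely.geometry import Polygon

# A long, thin rectangle: 10 km x 2 km; coordinates are assumed to be in meters
feature = Polygon([(0, 0), (10000, 0), (10000, 2000), (0, 2000)])

# Buffer the centroid by 5000 m, then keep only the part inside the feature
disc = feature.centroid.buffer(5000)
clipped = feature.intersection(disc)

print("feature:", feature.area)
print("disc:", disc.area)
print("clipped:", clipped.area)
```

The clipped area is smaller than both inputs, since the result contains only locations that fall in both the feature and the buffer.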

### 7. Geospatial capabilities of the GeoPandas dataFrame object

In [None]:
#Dissolve the HUC12 features into HUC8 units, summing attribute values
dfHUC8 = gdf.dissolve(by='HUC_8',aggfunc='sum')
dfHUC8.dtypes

In [None]:
dfHUC8.plot(column='ACRES',
            scheme='quantiles',        
            figsize=(14,18));
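
→ To check what `dissolve` does to both the shapes and the attributes, here is a hedged sketch on four made-up unit squares. The group codes and acreages are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Four unit squares tagged with a made-up group code and acreage
parts = gpd.GeoDataFrame(
    {"HUC_8": ["A", "A", "B", "B"], "ACRES": [10.0, 20.0, 5.0, 7.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 2, 1, 3), box(1, 2, 2, 3)],
)

# dissolve() merges geometries per group and aggregates the other columns
merged = parts.dissolve(by="HUC_8", aggfunc="sum")

print(merged["ACRES"])
print(merged.area)
```

The `by` column becomes the index of the result, with one row (and one merged geometry) per group.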

## Recap
In this super quick introduction to GeoPandas, we saw that the GeoDataFrame is easy to construct from a shapefile, and once constructed gives us access to the analytic capability of Pandas dataframes (e.g. selecting, summarizing, etc.) as well as plotting and spatial analytic capability. 

I'm hopeful that at the end of this short introduction you're eager to read up on the documentation and learn more about what GeoPandas can do.