Zonal statistics
================

Quite often you want to summarize a raster dataset based on vector geometries, for example to calculate the average elevation of a specific area.

[Rasterstats](https://github.com/perrygeo/python-rasterstats) is a Python module that does exactly that, easily.

- Let's start by reading the data:

In [None]:
import rasterio
from rasterio.plot import show
from rasterstats import zonal_stats
import osmnx as ox
import geopandas as gpd
import os
import matplotlib.pyplot as plt
%matplotlib inline

# File path
dem_fp = os.path.join("data", "la_partial_dem.tiff")

# Read the Digital Elevation Model
dem = rasterio.open(dem_fp)
dem

Good, the elevation raster is now open in read mode.

Next, we want to calculate the elevation of two neighborhoods, San Fernando and Topanga, and find out which one is higher based on the elevation data. We will use a package called [OSMnx](https://github.com/gboeing/osmnx) to fetch the boundaries of those areas from OpenStreetMap.

- Specify place names for `San Fernando` and `Topanga` that [Nominatim](https://nominatim.openstreetmap.org/) can identify, and retrieve their geometries:

In [None]:
# Place-name queries in a format that OSM (Nominatim) can resolve
topanga_q = "Topanga, California, United States of America"
san_fernando_q = "San Fernando, California, United States of America"

# Retrieve the geometries of those areas using osmnx
topanga = ox.geocode_to_gdf(topanga_q)
san_fernando = ox.geocode_to_gdf(san_fernando_q)

# Reproject to the same coordinate system as the DEM
topanga = topanga.to_crs(crs=dem.crs)
san_fernando = san_fernando.to_crs(crs=dem.crs)

type(san_fernando)

As we can see, the data retrieved with OSMnx are stored as GeoDataFrames.

- Let's see how our datasets look by plotting the DEM and the regions on top of it:

In [None]:
# Plot the Polygons on top of the DEM
ax = topanga.plot(facecolor='None', edgecolor='red', linewidth=2)
ax = san_fernando.plot(ax=ax, facecolor='None', edgecolor='blue', linewidth=2)

# Plot DEM
show((dem, 1), ax=ax)

**Which one is higher? We can use zonal statistics to find out!**

- First we need to get the values of the DEM as a NumPy array, along with the affine transform of the raster:

In [None]:
# Read the raster values
array = dem.read(1)

# Get the affine transform (maps pixel positions to map coordinates)
affine = dem.transform
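
For intuition, the affine transform is what ties array indices to map coordinates: it maps a (column, row) position to an (x, y) location in the raster's coordinate system. The sketch below spells out that arithmetic in plain Python. The coefficients (30 m pixels, an arbitrary upper-left corner) are made up for illustration and are not read from the DEM, and `pixel_to_map` is a hypothetical helper, not part of rasterio.

```python
# Six affine coefficients in the order (a, b, c, d, e, f), so that
#   x = a * col + b * row + c
#   y = d * col + e * row + f
# Hypothetical values: 30 m pixels, north-up raster,
# upper-left corner at x=350000, y=3800000.
coeffs = (30.0, 0.0, 350000.0, 0.0, -30.0, 3800000.0)


def pixel_to_map(col, row, coeffs):
    """Return the map (x, y) of the upper-left corner of pixel (col, row)."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)


# Pixel (0, 0) sits at the raster origin
upper_left = pixel_to_map(0, 0, coeffs)

# Moving 10 columns right and 5 rows down shifts x east and y south
shifted = pixel_to_map(10, 5, coeffs)

print(upper_left)
print(shifted)
```

Because `e` is negative, row numbers grow downward while y coordinates decrease, which is the usual layout for a north-up raster.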

- Now we can calculate the zonal statistics by using the function `zonal_stats`.

In [None]:
# Calculate zonal statistics for Topanga
zs_topanga = zonal_stats(topanga, array, affine=affine, stats=['min', 'max', 'mean', 'median', 'majority'])

# Calculate zonal statistics for San Fernando
zs_san_fernando = zonal_stats(san_fernando, array, affine=affine, stats=['min', 'max', 'mean', 'median', 'majority'])
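
Conceptually, a zonal statistic selects the raster cells that fall inside a zone and aggregates only those values. The following is a minimal plain-Python sketch of that idea with a toy 3x3 grid and a hand-written mask. It is not how rasterstats is implemented; the grid, the mask, and the `zonal_summary` helper are all assumptions made for illustration.

```python
import statistics

# Toy "raster": a 3x3 grid of elevation-like values (invented numbers)
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Toy "zone": True marks the cells that fall inside the polygon
mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]


def zonal_summary(grid, mask):
    """Aggregate only the grid cells where the mask is True."""
    values = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside
    ]
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "median": statistics.median(values),
    }


summary = zonal_summary(grid, mask)
print(summary)
```

In the real workflow, the mask comes from rasterizing each polygon onto the raster grid using the affine transform, which is why `zonal_stats` needs the `affine` argument when it is given a plain array.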


Okay. So what do we have now?

In [None]:
print(zs_topanga)
print(zs_san_fernando)
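
`zonal_stats` returns a list with one dictionary of statistics per input geometry, so answering "which one is higher?" comes down to comparing one key across the results. Here is a small sketch of that comparison. The numbers are invented placeholders shaped like the output above, not values computed from the DEM, and `highest_area` is a hypothetical helper.

```python
# Placeholder results shaped like zonal_stats output:
# a list with one dict per polygon. The values are invented.
zs_area_a = [{"min": 100.0, "max": 600.0, "mean": 350.0, "median": 340.0}]
zs_area_b = [{"min": 280.0, "max": 410.0, "mean": 320.0, "median": 318.0}]


def highest_area(results_by_name, stat="mean"):
    """Return the name whose first feature has the largest value of `stat`."""
    return max(results_by_name, key=lambda name: results_by_name[name][0][stat])


results = {"area_a": zs_area_a, "area_b": zs_area_b}

higher_by_mean = highest_area(results, stat="mean")
higher_by_min = highest_area(results, stat="min")

print("Higher on average:", higher_by_mean)
print("Higher lowest point:", higher_by_min)
```

With the real outputs, the same call would take `{"Topanga": zs_topanga, "San Fernando": zs_san_fernando}`. Note that the answer can differ depending on the statistic chosen, as the two placeholder areas show.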