# **Module 3: Raster Data in Python**

#### Data
In this example, we will create some raster data to imitate real-world data. We will use `data-module-3` as a workspace. We have also prepared the following datasets:
- `ndvi_summer.tif` and `ndvi_winter.tif` - Normalized Difference Vegetation Index (NDVI) for a study area in Kansas. NDVI is an indicator of vegetation health. Data was acquired by Landsat 8 and exported from Climate Engine at http://climateengine.org.
- `ag_fields.shp` - selected agricultural fields for a study area in Kansas.
- `friction_mali.tif` - a friction surface that quantifies travel cost for a sample study area in Mali.

#### Software
To execute the code you will need a Python environment with the packages imported below. The default environment does not have all required packages to execute this script. Therefore, run the following command beforehand:
- `pip install xarray xarray-spatial --user`

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import rasterio
from rasterio.transform import Affine
from rasterio.plot import plotting_extent
from rasterio.enums import Resampling
from rasterstats import zonal_stats
import xarray as xr
from xrspatial.convolution import circle_kernel
import xrspatial.zonal, xrspatial.focal
import geopandas as gpd
import skimage.graph as graph

### **Raster Data Review**

#### Create a raster

To generate raster data in Python we rely on `numpy` and `rasterio` packages. We need the following three components:
- An array of data and the xy coordinates;
- A Coordinate Reference System;
- A transform that defines the coordinates of the upper-left corner of the array and the pixel size.

In [None]:
ras_0 = np.zeros([6, 6])

ras_a = np.arange(1, 37).reshape(6, 6)

np.random.seed(0)
ras_b = np.random.randint(100, size=(6,6))

In [None]:
fig, axs = plt.subplots(1,3, figsize=(15,4), tight_layout=True)

plot0 = axs[0].imshow(ras_0, cmap="YlGnBu")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("'Zeros' array")

plot1 = axs[1].imshow(ras_a, cmap="YlGnBu")
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("ras_a (Consecutive Integers)")

plot2 = axs[2].imshow(ras_b, cmap="YlGnBu")
fig.colorbar(plot2, ax=axs[2])
axs[2].set_title("ras_b (Random Integers)")

plt.show()

In [None]:
x = np.linspace(-1.25, 1.25, 6)
y = np.linspace(-1.25, 1.25, 6)
X, Y = np.meshgrid(x, y)

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4),  tight_layout=True)

plot0 = axs[0].imshow(X, cmap="magma_r", origin="lower")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("longitude")

plot1 = axs[1].imshow(Y, cmap="magma_r",origin="lower")
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("latitude")

plt.show()

In [None]:
res = 0.5
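# Upper-left corner of the grid: half a pixel west of the first x and half a pixel north of the last y
transform = Affine.translation(x[0] - res / 2, y[-1] + res / 2) * Affine.scale(res, -res)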

with rasterio.open("./data-module-3/ras_a.tif",
                   "w",
                   height=ras_a.shape[0],
                   width=ras_a.shape[1],
                   count=1,
                   dtype=np.int16,
                   crs="epsg:4326",
                   transform=transform,
                   nodata=-999
                  ) as dst:
    # Cast explicitly so the array dtype matches the dtype declared for the file
    dst.write(ras_a.astype(np.int16), 1)

#### Read raster data from a file

In [None]:
ndvi_winter = rasterio.open("./data-module-3/ndvi_winter.tif")
ndvi_summer = rasterio.open("./data-module-3/ndvi_summer.tif")

print (f"Dataset CRS is {ndvi_summer.crs}")
print (f"Dataset extent is {ndvi_summer.bounds}")
print (f"Dataset resolution is {ndvi_summer.res}")
print (f"Dataset NoData is {ndvi_summer.nodata}")
print("Dataset transform is below")
ndvi_summer.transform

**How to read the Affine matrix (a, b, c, d, e, f)?**
- a = width of a pixel
- b = row rotation (typically zero)
- c = x-coordinate of the upper-left corner of the upper-left pixel
- d = column rotation (typically zero)
- e = height of a pixel (typically negative)
- f = y-coordinate of the upper-left corner of the upper-left pixel.
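
As a minimal sketch of how these six coefficients are used, the snippet below applies them by hand to convert a (column, row) pixel index into map coordinates. The function name `pixel_to_map` is hypothetical, and the coefficient values are assumed from the `ras_a.tif` example above (0.5-degree pixels, upper-left corner at x = -1.5, y = 1.5).

```python
def pixel_to_map(coeffs, col, row):
    """Apply affine coefficients (a, b, c, d, e, f) to a pixel position."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Assumed values: 0.5-unit pixels, no rotation, upper-left corner at (-1.5, 1.5).
coeffs = (0.5, 0.0, -1.5, 0.0, -0.5, 1.5)

upper_left_corner = pixel_to_map(coeffs, 0, 0)       # corner of the first pixel
first_pixel_center = pixel_to_map(coeffs, 0.5, 0.5)  # add 0.5 to reach a pixel center
last_pixel_center = pixel_to_map(coeffs, 5.5, 5.5)   # center of the last pixel in a 6x6 grid

print(upper_left_corner, first_pixel_center, last_pixel_center)
```

Because `e` is negative, y decreases as the row index grows, which is why row 0 is the northernmost row of the raster.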

**Note that when you read a raster dataset with `rasterio` the interpretation of the 3 axes is `(bands, rows, columns)`.**

In [None]:
ndvi_winter_array = ndvi_winter.read(1)
ndvi_winter_array[ndvi_winter_array==ndvi_winter.nodata] = np.nan

ndvi_summer_array = ndvi_summer.read(1)
ndvi_summer_array[ndvi_summer_array==ndvi_summer.nodata] = np.nan

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot0 = axs[0].imshow(ndvi_winter_array, cmap="YlGn", clim=(0,1))
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("NDVI winter")

plot1 = axs[1].imshow(ndvi_summer_array, cmap="YlGn",  clim=(0,1))
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("NDVI summer")

plt.show()

#### Summarize raster data
Below we present some methods to summarize and describe array data with common statistics.

In [None]:
fig, ax = plt.subplots(figsize=(5,3), tight_layout=True)
ax.hist(ras_b.flatten(), facecolor="grey", alpha=0.75)
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(5,3), tight_layout=True)
ax.hist(ndvi_summer_array.flatten(), facecolor="seagreen", alpha=0.75)
ax.hist(ndvi_winter_array.flatten(), facecolor="lightskyblue", alpha=0.75)
plt.show()

In [None]:
print (f"Mean: {np.mean(ras_a)}")
print (f"Median: {np.median(ras_a)}")
print (f"Maximum: {np.max(ras_a)}")
print (f"Minimum: {np.min(ras_a)}")
print (f"Standard Deviation: {np.std(ras_a)}")
print (f"70th percentile: {np.percentile(ras_a, 70)}")

In [None]:
print (f"Mean: {np.mean(ndvi_summer_array)}")

In [None]:
print (f"Mean: {np.nanmean(ndvi_summer_array)}")
print (f"Median: {np.nanmedian(ndvi_summer_array)}")
print (f"Maximum: {np.nanmax(ndvi_summer_array)}")
print (f"Minimum: {np.nanmin(ndvi_summer_array)}")
print (f"Standard Deviation: {np.nanstd(ndvi_summer_array)}")
print (f"70th percentile: {np.nanpercentile(ndvi_summer_array, 70)}")
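
The two cells above differ only in how they treat NoData cells stored as `NaN`. The sketch below reproduces the effect on a small hypothetical array (`sample` is made up for illustration and is not one of the workshop datasets): the plain function propagates `NaN`, while the `nan`-prefixed function skips it.

```python
import numpy as np

# Hypothetical 2x3 array standing in for an NDVI band with one NoData cell.
sample = np.array([[0.2, 0.4, np.nan],
                   [0.6, 0.8, 1.0]])

plain_mean = np.mean(sample)          # NaN propagates, so the result is NaN
nan_aware_mean = np.nanmean(sample)   # NaN cells are ignored
valid_count = np.count_nonzero(~np.isnan(sample))  # number of cells with data

print(plain_mean, nan_aware_mean, valid_count)
```

Reporting the count of valid cells next to a statistic is a cheap way to see how much of the raster actually contributed to it.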

### **Local operations**

#### Map algebra 
Below we demonstrate how to execute a variety of map algebra expressions on one or more arrays.

In [None]:
X2 =  ras_a*2
Sq = ras_a**2
ratio = ras_b/ras_a
mean = (X2+Sq)/2
ras_a_cap = np.where(ras_a > 25, 25, ras_a)
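
The ratio above is safe because `ras_a` contains no zeros. When the denominator can be zero, one possible approach is sketched below with `np.divide` and its `where` argument; the arrays `numerator` and `denominator` are hypothetical stand-ins, and cells with a zero denominator are left as `NaN` by assumption rather than by any rule stated in this module.

```python
import numpy as np

# Hypothetical inputs: the denominator contains zeros.
numerator = np.array([[10.0, 20.0], [30.0, 40.0]])
denominator = np.array([[2.0, 0.0], [5.0, 0.0]])

# Start from an all-NaN output and divide only where the denominator is non-zero.
safe_ratio = np.full(numerator.shape, np.nan)
np.divide(numerator, denominator, out=safe_ratio, where=denominator != 0)

print(safe_ratio)
```

This avoids the runtime warning and the `inf` values that a plain `numerator / denominator` would produce.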

In [None]:
fig, axs = plt.subplots(1,5, figsize=(16,3), tight_layout=True)

plot0 = axs[0].imshow(X2, cmap="PiYG")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("ras_a doubled")

plot1 = axs[1].imshow(Sq, cmap="PiYG")
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("ras_a squared")

plot2 = axs[2].imshow(ratio, cmap="PiYG")
fig.colorbar(plot2, ax=axs[2])
axs[2].set_title("Ratio of ras_b to ras_a")

plot3 = axs[3].imshow(mean, cmap="PiYG")
fig.colorbar(plot3, ax=axs[3])
axs[3].set_title("Mean of doubled/squared")

plot4 = axs[4].imshow(ras_a_cap, cmap="PiYG")
fig.colorbar(plot4, ax=axs[4])
axs[4].set_title("ras_a capped at 25")

plt.show()

In [None]:
ndvi_diff_array = ndvi_summer_array - ndvi_winter_array

In [None]:
fig, axs = plt.subplots(1,3, figsize=(15,4), tight_layout=True)

plot0 = axs[0].imshow(ndvi_winter_array, cmap="YlGn", clim=(0,1))
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("NDVI winter")

plot1 = axs[1].imshow(ndvi_summer_array, cmap="YlGn",  clim=(0,1))
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("NDVI summer")

plot2 = axs[2].imshow(ndvi_diff_array, cmap="coolwarm")
fig.colorbar(plot2, ax=axs[2])
axs[2].set_title("NDVI difference")

plt.show()

#### Reclassify array data
Reclassification reassigns one or more values in a raster dataset to new output values.

In [None]:
reclassified = ras_a.copy()

reclassified[(reclassified > 0) & (reclassified <= 12)] = 1
reclassified[(reclassified > 12) & (reclassified <= 24)] = 2
reclassified[(reclassified > 24) & (reclassified <= 37)] = 3
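
The in-place assignments above work because each new class value falls in a range that has already been processed. An alternative sketch that does not depend on assignment order uses `np.digitize`; the names `values`, `upper_breaks`, and `classes` are chosen here for illustration, and the breaks are assumed to match the ones above.

```python
import numpy as np

values = np.arange(1, 37).reshape(6, 6)  # same contents as ras_a

# Upper bounds of the first two classes; anything above 24 falls in the third.
upper_breaks = [12, 24]

# right=True makes each interval closed on the right: (..., 12], (12, 24], (24, ...)
classes = np.digitize(values, upper_breaks, right=True) + 1

print(classes)
```

Since `np.digitize` reads from `values` and writes to a new array, a class label can never be mistaken for an input value.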

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot0 = axs[0].imshow(ras_a, cmap="YlGnBu")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("ras_a")

plot1 = axs[1].imshow(reclassified, cmap="YlGnBu")
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("ras_a reclassified")

plt.show()

### **Focal operations**

#### Resample to a coarser resolution

In [None]:
scale_factor = 1/2
dataset = rasterio.open("./data-module-3/ras_a.tif")
aggregated = dataset.read(1, 
                          out_shape=(int(dataset.height * scale_factor), int(dataset.width * scale_factor)),
                          resampling=Resampling.nearest)
# Scale the transform to the new pixel size: width uses the column count, height the row count
transform = dataset.transform * dataset.transform.scale(
    (dataset.width / aggregated.shape[1]),
    (dataset.height / aggregated.shape[0]))
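
The cell above uses `Resampling.nearest`, which keeps one source value per output cell. As a rough illustration of what an averaging aggregation does instead, the sketch below computes block means with plain NumPy. The helper `block_mean` is hypothetical and is not how rasterio or GDAL implement resampling; in rasterio the analogous option is `Resampling.average`.

```python
import numpy as np


def block_mean(array, factor):
    """Average non-overlapping factor x factor blocks of a 2D array."""
    rows, cols = array.shape
    if rows % factor or cols % factor:
        raise ValueError("array dimensions must be divisible by factor")
    # Split each axis into (number of blocks, cells per block), then average within blocks.
    blocks = array.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


grid = np.arange(1, 37).reshape(6, 6)  # same contents as ras_a
coarse = block_mean(grid, 2)

print(coarse)
```

Averaging is usually the better choice for continuous variables, while nearest-neighbor is the safer choice for categorical rasters, where an average of class codes is meaningless.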

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot0 = axs[0].imshow(ras_a, cmap="coolwarm")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("ras_a original")

plot1 = axs[1].imshow(aggregated, cmap="coolwarm")
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("ras_a aggregated")

plt.show()

In [None]:
scale_factor = 1/10
dataset = rasterio.open("./data-module-3/ndvi_summer.tif")
aggregated = dataset.read(1, 
                          out_shape=(int(dataset.height * scale_factor), int(dataset.width * scale_factor)),
                          resampling=Resampling.nearest)
aggregated[aggregated==dataset.nodata] = np.nan

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot0 = axs[0].imshow(ndvi_summer_array,  cmap="YlGn", clim=(0,1))
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("NDVI original")

plot1 = axs[1].imshow(aggregated,  cmap="YlGn", clim=(0,1))
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("NDVI aggregated")

plt.show()

#### Resample to a higher resolution

In [None]:
scale_factor = 2
dataset = rasterio.open("./data-module-3/ras_a.tif")

resampled_nearest = dataset.read(1, 
                          out_shape=(int(dataset.height * scale_factor), int(dataset.width * scale_factor)),
                          resampling=Resampling.nearest)
resampled_bilnear = dataset.read(1, 
                          out_shape=(int(dataset.height * scale_factor), int(dataset.width * scale_factor)),
                          resampling=Resampling.bilinear)
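
For an integer scale factor, nearest-neighbor upsampling amounts to repeating each cell along both axes. The sketch below shows this with `np.repeat` on a small hypothetical array (`coarse`); it is meant as a conceptual illustration only, not as a description of rasterio's implementation.

```python
import numpy as np

coarse = np.array([[1, 2],
                   [3, 4]])
factor = 2

# Repeat every row, then every column, `factor` times.
fine = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

print(fine)
```

No new values are created, which is why nearest-neighbor output looks blocky next to the smooth gradients produced by bilinear interpolation.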

In [None]:
fig, axs = plt.subplots(1,3, figsize=(15,4), tight_layout=True)

plot0 = axs[0].imshow(ras_a, cmap="coolwarm")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("ras_a original")

plot0 = axs[1].imshow(resampled_nearest, cmap="coolwarm")
fig.colorbar(plot0, ax=axs[1])
axs[1].set_title("ras_a: nearest resampling")

plot1 = axs[2].imshow(resampled_bilnear, cmap="coolwarm")
fig.colorbar(plot1, ax=axs[2])
axs[2].set_title("ras_a: bilinear resampling")

plt.show()

#### Apply focal statistics
Focal statistics compute a statistic for each cell from the values in its neighborhood, a moving window whose shape is defined by a `kernel`; the windows of adjacent cells overlap. We use the `xarray` and `xarray-spatial` packages to perform these operations.
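
To make the moving-window idea concrete, here is a minimal NumPy-only sketch of a focal mean with a square window. The helper `focal_mean` is hypothetical, and its edge handling (padding with `NaN` and averaging only the cells that exist) is an assumption made for this example; `xarray-spatial` may treat edges differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def focal_mean(array, size=3):
    """Mean of a size x size window centered on each cell (size must be odd)."""
    pad = size // 2
    # Pad with NaN so that edge cells also get a full-size window.
    padded = np.pad(array.astype(float), pad, mode="constant", constant_values=np.nan)
    # Resulting shape: (rows, cols, size, size), one window per input cell.
    windows = sliding_window_view(padded, (size, size))
    return np.nanmean(windows, axis=(2, 3))


grid = np.arange(1, 37).reshape(6, 6)  # same contents as ras_a
smoothed = focal_mean(grid, size=3)

print(smoothed)
```

The output has the same shape as the input, which is what distinguishes a focal operation from the block aggregation used for resampling.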

In [None]:
ras_b_xr =  xr.DataArray(ras_b, dims=["y", "x"], name="raster b")
ras_b_xr

In [None]:
ds = xr.Dataset(data_vars=dict(a = (["y", "x"], ras_a), b = (["y", "x"], ras_b)), 
                attrs=dict(description="Data we generated"))
ds

In [None]:
kernel1 = circle_kernel(1,1,1)
focal1 = xrspatial.focal.focal_stats(ras_b_xr, kernel1, stats_funcs=["min"])[0]

kernel2 = np.ones([3,3])
focal2 = xrspatial.focal.focal_stats(ras_b_xr, kernel2, stats_funcs=["min"])[0]

In [None]:
fig, axs = plt.subplots(1,3, figsize=(15,4), tight_layout=True)

plot0 = axs[0].imshow(ras_b, cmap="coolwarm")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("ras_b original")

plot1 = axs[1].imshow(focal1, cmap="coolwarm") 
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("min focal with circle kernel")

plot2 = axs[2].imshow(focal2, cmap="coolwarm") 
fig.colorbar(plot2, ax=axs[2])
axs[2].set_title("min focal with custom kernel")

plt.show()

In [None]:
ndvi_summer_xr =  xr.DataArray(ndvi_summer_array, dims=["y", "x"], name="ndvi_summer")
kernel = np.ones([15,15])
ndvi_summer_xr_focal = xrspatial.focal.focal_stats(ndvi_summer_xr, kernel, stats_funcs=["mean"])[0]
difference  = ndvi_summer_xr - ndvi_summer_xr_focal

In [None]:
fig, axs = plt.subplots(1,3, figsize=(15,4), tight_layout=True)

plot0 = axs[0].imshow(ndvi_summer_xr, cmap="YlGn", clim=(0,1))
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("NDVI summer original")

plot1 = axs[1].imshow(ndvi_summer_xr_focal, cmap="YlGn", clim=(0,1)) 
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("NDVI summer Mean focal")

plot2 = axs[2].imshow(difference, cmap="magma") 
fig.colorbar(plot2, ax=axs[2])
axs[2].set_title("Difference between original and focal")

plt.show()

### **Zonal operations**

#### Summarize array by another array
Here we show how to calculate summary statistics of a `values` array for each zone defined by a `zones` array.
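
Conceptually, a zonal statistic is a group-by: cells of the values array are grouped by the label found at the same position in the zones array. The sketch below shows the idea with boolean masks on two small hypothetical arrays; it illustrates the concept and is not the `xarray-spatial` implementation.

```python
import numpy as np

# Hypothetical inputs with matching shapes.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
zones = np.array([[1, 1, 2],
                  [2, 3, 3]])

# For each zone label, select the matching cells and average them.
zonal_means = {
    int(zone): float(values[zones == zone].mean())
    for zone in np.unique(zones)
}

print(zonal_means)
```

Both arrays must have the same shape and alignment, otherwise the mask selects the wrong cells.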

In [None]:
np.random.seed(0)
categorical = np.random.randint(1,4, size=(6,6))

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot0 = axs[0].imshow(ras_a, cmap="YlGnBu")
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("values (ras_a)")

plot1 = axs[1].imshow(categorical, cmap="coolwarm")
fig.colorbar(plot1, ax=axs[1], ticks=[1,2,3])
axs[1].set_title("zones")

plt.show()

In [None]:
values = xr.DataArray(ras_a)
zones = xr.DataArray(categorical)
stats_df = xrspatial.zonal.stats(zones=zones, values=values)
stats_df

#### Summarize array by vector geometries

In [None]:
fields_gdf = gpd.read_file("./data-module-3/ag_fields.shp")
fields_gdf

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

plot_extent = plotting_extent(ndvi_winter_array, ndvi_winter.transform)

fields_gdf.plot(ax=axs[0], facecolor="none", edgecolor="blue", linewidth=2)

plot0 = axs[0].imshow(ndvi_winter_array, cmap="YlGn", clim=(0,1), extent=plot_extent)
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("NDVI winter")

fields_gdf.plot(ax=axs[1], facecolor="none", edgecolor="blue", linewidth=2)
plot1 = axs[1].imshow(ndvi_summer_array, cmap="YlGn",  clim=(0,1), extent=plot_extent)
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("NDVI summer")

plt.show()

In [None]:
zs = zonal_stats("./data-module-3/ag_fields.shp", "./data-module-3/ndvi_summer.tif", 
                 stats = ["mean", "max"])
zs

In [None]:
fields_gdf["ndvi_winter"] = fields_gdf.apply(lambda x: zonal_stats(x.geometry, ndvi_winter_array, affine=ndvi_winter.transform, 
                                                         nodata=ndvi_winter.nodata, stats =["mean"])[0]["mean"], axis=1)
fields_gdf["ndvi_summer"] = fields_gdf.apply(lambda x: zonal_stats(x.geometry, ndvi_summer_array, affine=ndvi_summer.transform, 
                                                         nodata=ndvi_summer.nodata, stats =["mean"])[0]["mean"], axis=1)
fields_gdf

### **Global operations**

#### Compute travel time to a destination

In [None]:
dataset =  rasterio.open("./data-module-3/friction_mali.tif")
friction = dataset.read(1)

In [None]:
destinations = [[600, 200], [100,500]]
# sampling follows array axis order (rows, columns); rasterio's res is (pixel width, pixel height)
mcp = graph.MCP_Geometric(friction, fully_connected=True, sampling=(dataset.res[1], dataset.res[0]))
cumulative_costs, traceback = mcp.find_costs(destinations)
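
To show what a cumulative cost surface represents, the sketch below accumulates travel cost over a tiny hypothetical friction grid with Dijkstra's algorithm. It is a simplified stand-in and not `MCP_Geometric` itself: the function `accumulate_cost` is hypothetical, moves are limited to the four direct neighbors, and each step is assumed to cost the mean friction of the two cells times the cell size.

```python
import heapq


def accumulate_cost(friction, starts, cell_size=1.0):
    """Least cumulative cost from any start cell to every cell (4-connected)."""
    n_rows, n_cols = len(friction), len(friction[0])
    costs = [[float("inf")] * n_cols for _ in range(n_rows)]
    queue = []
    for row, col in starts:
        costs[row][col] = 0.0
        heapq.heappush(queue, (0.0, row, col))

    while queue:
        cost, row, col = heapq.heappop(queue)
        if cost > costs[row][col]:
            continue  # stale queue entry; a cheaper route was already found
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            if 0 <= r < n_rows and 0 <= c < n_cols:
                # Assumed step cost: mean friction of both cells times distance.
                step = cell_size * (friction[row][col] + friction[r][c]) / 2
                if cost + step < costs[r][c]:
                    costs[r][c] = cost + step
                    heapq.heappush(queue, (cost + step, r, c))
    return costs


# Hypothetical friction grid with one expensive cell in the middle.
friction_grid = [[1.0, 1.0, 1.0],
                 [1.0, 9.0, 1.0],
                 [1.0, 1.0, 1.0]]
travel = accumulate_cost(friction_grid, starts=[(0, 0)])

print(travel)
```

The cheapest route to the opposite corner goes around the expensive center cell, which is the behavior the travel-time map visualizes at full scale.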

In [None]:
fig, axs = plt.subplots(1,2, figsize=(10,4), tight_layout=True)

axs[0].plot(destinations[0][1], destinations[0][0],'*', color="orange", markersize=20)
plot0 = axs[0].imshow(friction, cmap="coolwarm")
axs[0].plot(destinations[1][1], destinations[1][0],'*', color="orange", markersize=20) 
fig.colorbar(plot0, ax=axs[0])
axs[0].set_title("Friction (minutes/metre)")

plot1 = axs[1].imshow(cumulative_costs, cmap="cubehelix_r")
axs[1].plot(destinations[0][1], destinations[0][0],'*', color="orange", markersize=20)
axs[1].plot(destinations[1][1], destinations[1][0],'*', color="orange", markersize=20) 
fig.colorbar(plot1, ax=axs[1])
axs[1].set_title("Travel time (minutes)")

plt.show()

### **Exercises**
Your exercises will draw on datasets from the Spatial Production Allocation Model (SPAM) and Minnesota Geospatial Commons, which have been downloaded, cleaned, transformed, and saved to the directory `./data-module-3/` for this workshop.
#### Data
- `spam_H_MAIZ_A_mn.tif` (crop harvested area), `spam_P_MAIZ_A_mn.tif` (crop production) - agricultural indicators at 10x10km grid-cell resolution from SPAM (Spatial Production Allocation Model) data center  https://www.mapspam.info/data/
- `gw_provinces_extra.shp` - Groundwater Provinces of Minnesota derived from  https://gisdata.mn.gov/dataset/geos-groundwater-provinces-mn

**Question 1. Open raster files `spam_H_MAIZ_A_mn.tif` and `spam_P_MAIZ_A_mn.tif` and check their properties: Coordinate Reference System, extent, resolution, NoData, and transform.** 

**Question 2. Load arrays from the datasets opened in the previous question, reset NoData values, and plot them.**

**Question 3. Calculate the ratio of the Production raster to the Harvested Area raster to create a Yield array. Report the maximum Yield and its standard deviation.**

**Question 4. Reclassify Yield array by using 3 categories (make your own breaks). Plot both for comparison.**

**Question 5. Resample Production raster to a coarser resolution. Plot both for comparison.**

**Question 6. Use a focal statistics function on the Yield array to create an array that shows the `mean` Yield within a 3x3 cell neighborhood. Then create a raster map that displays the difference between each grid cell's Yield and the `mean` Yield of its neighborhood. Plot the original raster, the raster with focal statistics applied, and their difference raster as 3 subplots on the same figure.**

**Question 7. Open `gw_provinces_extra.shp`. Change the CRS of this `GeoDataFrame` to the CRS of the Production raster dataset.** 

**Question 8. Calculate zonal statistics of Production raster by groundwater provinces as `sum`. Append results to the groundwater provinces `GeoDataFrame`.**