# Masking and Processing
This notebook uses content from the [Geospatial Carpentries](https://carpentries-incubator.github.io/geospatial-python/)

## ❓ Questions
- How can I crop my raster data to the area of interest?


## ❗ Objectives
- Crop raster data with a bounding box.
- Crop raster data with a polygon.
- Match two raster datasets in different CRS.





# Initial setup
Some parameters we'll need throughout the lesson.  
> **GOOGLE COLAB**  
> If you're on Google Colab, the variable in this cell will already have been set in the top cells, so just comment it out below with a `#` like so:
> ```py
> # storage_location = '../workshop_data'
> ```
> Alternatively, delete the cell, or just don't run it!

If you're not on Colab, run the cell by holding `Shift` and pressing `Enter`!

In [None]:
# If you're on Google Colab, DON'T RUN THIS CELL!
storage_location = '../workshop_data'

# Visualisation

## Finding the files
Before we can load the data files into Python, we first need to locate them on our computer. 

Sentinel 2 files come in the "SAFE" standard.

In [None]:
# Get the directory again
import os

base_product_dir = os.path.join(storage_location, 'S2')

In [None]:
os.listdir(base_product_dir)

# Sentinel 2 Data

![Common bands](../notebook_pictures/dmidS2LS7Comparison.png)  
*Comparison of Landsat 7 and 8 bands with Sentinel-2 (USGS Public Domain Image)*

## What's in a raster?
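
The cell below keys each raster by the token that sits just before the file extension. As a minimal sketch of that naming logic on its own, assuming file names of the form `<scene>.<band>.<ext>` (the helper `band_from_filename` and the example names are hypothetical and not part of the workshop data):

```python
def band_from_filename(fname):
    """Return the token before the extension, or None for .txt files."""
    parts = fname.split('.')
    # Skip text files and names without any extension
    if len(parts) < 2 or parts[-1].lower() == 'txt':
        return None
    return parts[-2]

# Hypothetical file names, for illustration only
examples = ['scene.visual.tif', 'scene.B04.tif', 'readme.txt']
bands = {fname: band_from_filename(fname) for fname in examples}
print(bands)
```

Names that do not follow this pattern would need their own parsing rule.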

In [None]:
# Get a dictionary of images
import rioxarray
from os.path import join

images = {}
for fname in os.listdir(base_product_dir):
    # Skip text files; only the rasters are opened
    if fname.endswith('txt'):
        continue
    fpath = join(base_product_dir, fname)
    # The band name is the token just before the file extension
    band = fname.split('.')[-2]
    images[band] = rioxarray.open_rasterio(fpath)

images

In [None]:
raster_s2_tci = images['visual']

In [None]:
# Check out some attributes
raster_s2_tci

In [None]:
# Take a subset of the image (all bands, rows 100-499, columns 100-499)
subset_tci = raster_s2_tci[:, 100:500, 100:500]
subset_tci

In [None]:
# Write to disk and check it out!
output_filename = os.path.join(storage_location, 'test.tif')
subset_tci.rio.to_raster(output_filename)

# Loading in vector data

In [None]:
# Load in AOI
import geopandas as gpd

# Should be the same on self-install and colab
john_forrest_poly_fname = os.path.join("..", "shp_data", "john_forrest_rough.shp")
AOI_4326 = gpd.read_file(john_forrest_poly_fname)

In [None]:
# Check out parameters of the polygon(s) we loaded
AOI_4326.crs

In [None]:
# Make sure they're both in the same CRS
# Only run this cell once!
AOI = AOI_4326.to_crs(raster_s2_tci.rio.crs)

# Crop raster data with a bounding box
The `clip_box` method crops a raster to the minimum and maximum x and y coordinates of a bounding box.   
Note that we are now cropping the original image raster (`raster_s2_tci`), not the rough subset `subset_tci`.
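
To make the idea concrete, here is a rough pure-Python sketch of bounding-box cropping on a toy grid. It is not how rioxarray implements `clip_box`; the helper `clip_box_indices`, the coordinates, and the pixel values are all made up for illustration, and the sketch selects pixels by their centre coordinates only.

```python
def clip_box_indices(coords, lower, upper):
    """Return (start, stop) slice bounds of the coords lying in [lower, upper].

    Works for ascending or descending coordinate lists (the y coordinates
    of a north-up raster usually decrease with row index).
    """
    inside = [i for i, c in enumerate(coords) if lower <= c <= upper]
    if not inside:
        raise ValueError("bounding box does not overlap the raster")
    return inside[0], inside[-1] + 1

# Toy 4x5 raster: pixel-centre coordinates and values (all hypothetical)
xs = [0.5, 1.5, 2.5, 3.5, 4.5]
ys = [3.5, 2.5, 1.5, 0.5]  # descending, like a north-up image
data = [[r * 10 + c for c in range(5)] for r in range(4)]

# Same order as GeoDataFrame.total_bounds: minx, miny, maxx, maxy
minx, miny, maxx, maxy = 1.0, 1.0, 3.0, 3.0
c0, c1 = clip_box_indices(xs, minx, maxx)
r0, r1 = clip_box_indices(ys, miny, maxy)
cropped = [row[c0:c1] for row in data[r0:r1]]
print(cropped)
```

The result is always a rectangle, which is why a bounding-box crop is cheap: it only slices rows and columns and never inspects individual pixels.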

In [None]:
raster_clip_box = raster_s2_tci.rio.clip_box(*AOI.total_bounds)

In [None]:
raster_clip_box.plot.imshow(figsize=(8,8))

# Precise raster data cropping with polygons
We now have an image cropped to the bounding box around the polygon. For further analysis, we may want to crop the image to the exact polygon boundaries.   
This can be done with the `clip` method!
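
Conceptually, polygon clipping keeps the pixels that fall inside the polygon and sets the rest to a nodata value. The sketch below illustrates this with a ray-casting point-in-polygon test on pixel centres. It is an illustration only, not rioxarray's implementation; the functions `point_in_polygon` and `clip_to_polygon`, the triangle, and the toy grid are hypothetical, and the assumption that pixel centres decide membership mirrors what `clip` is understood to do by default.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clip_to_polygon(data, xs, ys, polygon, nodata=0):
    """Keep values whose pixel centre is inside the polygon; others become nodata."""
    return [
        [value if point_in_polygon(x, y, polygon) else nodata
         for x, value in zip(xs, row)]
        for y, row in zip(ys, data)
    ]

# Toy 4x4 raster of ones and a triangular area of interest (all hypothetical)
xs = [0.5, 1.5, 2.5, 3.5]
ys = [3.5, 2.5, 1.5, 0.5]
data = [[1] * 4 for _ in range(4)]
triangle = [(0.0, 0.0), (4.5, 0.0), (0.0, 4.5)]

masked = clip_to_polygon(data, xs, ys, triangle)
for row in masked:
    print(row)
```

Every pixel needs its own inside/outside test here, which is why it pays to shrink the raster with a bounding box first.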

In [None]:
raster_clip_JF = raster_clip_box.rio.clip(AOI['geometry'])

In [None]:
raster_clip_JF.plot.imshow(figsize=(8,8))

## Cropping raster data using the `reproject_match()` method
So far we have learned how to crop raster images with vector data. We can also crop a raster using another raster.  
To do this, we will use the `reproject_match` method. As its name indicates, it performs reprojection and clipping in one go.  

`reproject_match` is an incredibly useful method.   
In addition to clipping and reprojecting, it ensures that the pixels in each image line up with each other, allowing easy comparison between even disparate datasets.   
For example, it allows reprojecting Landsat 8 imagery to match Sentinel 2 for comparison of metrics (reflectance, NDVI, etc.).

In [None]:
# First, artificially change the data to show the utility of reproject_match
fake_raster = raster_s2_tci.rio.reproject("EPSG:4326")
fake_raster.plot.imshow()

In [None]:
# Now let's reproject match
fake_raster_clip = fake_raster.rio.reproject_match(raster_clip_JF)
fake_raster_clip.plot.imshow(figsize=(8,8))

We can also use this method to expand an image.

In [None]:
# Now let's reproject match in the other direction

# Set nodata properly for uint8
raster_clip_JF.data[raster_clip_JF.data == 0] = 255 

fake_reproject_match = raster_clip_JF.rio.reproject_match(fake_raster)
fake_reproject_match.plot.imshow(figsize=(8,8))

In one line `reproject_match` does a lot of helpful things:

1. It reprojects (both changing the CRS and aligning offset pixels).
2. It matches the extent using nodata values or by clipping the data.
3. It sets nodata values. This means we can run calculations on those two images.
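
The extent matching and nodata filling in the list above can be sketched in pure Python as a nearest-neighbour resample onto a target grid. This is a simplified illustration and not rioxarray's implementation: the function `match_grid` and the grid tuples are hypothetical, and both grids are assumed to share one CRS already, so the reprojection step itself is left out.

```python
def match_grid(src, src_grid, dst_grid, nodata=255):
    """Nearest-neighbour resample of src onto dst_grid.

    Each grid is (x_min, y_max, pixel_size, n_cols, n_rows). Both grids are
    assumed to be in the same CRS, so no coordinate transformation is done.
    """
    sx, sy, sres, scols, srows = src_grid
    dx, dy, dres, dcols, drows = dst_grid
    out = []
    for r in range(drows):
        y = dy - (r + 0.5) * dres            # centre of the target row
        src_r = int((sy - y) // sres)        # source row containing that centre
        row = []
        for c in range(dcols):
            x = dx + (c + 0.5) * dres        # centre of the target column
            src_c = int((x - sx) // sres)
            if 0 <= src_r < srows and 0 <= src_c < scols:
                row.append(src[src_r][src_c])
            else:
                row.append(nodata)           # outside the source extent
        out.append(row)
    return out

# Toy 2x2 source raster covering x in [0, 2], y in [0, 2] (hypothetical)
src = [[1, 2], [3, 4]]
src_grid = (0.0, 2.0, 1.0, 2, 2)

# A larger target pads with nodata; a smaller target clips
expanded = match_grid(src, src_grid, (0.0, 3.0, 1.0, 3, 3))
clipped = match_grid(src, src_grid, (1.0, 2.0, 1.0, 1, 2))
print(expanded)
print(clipped)
```

Because the output always has the target's shape and pixel positions, the two rasters can then be combined element by element.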


# 📢 Key Points

- Use `clip_box` to crop a raster with a bounding box.
- Use `clip` to crop a raster with a given polygon.
- For efficiency, it's usually recommended to `clip_box` then `clip`.
- Use `reproject_match` to match two raster datasets (e.g. for comparison).