# Masking and Processing

## ❓ Questions
- How can I crop my raster data to the area of interest?


## ❗ Objectives
- Crop raster data with a bounding box.
- Crop raster data with a polygon.
- Match two raster datasets in different CRS.





# Initial setup for Google Drive
Some parameters we'll need throughout the lesson. Please run these cells!

In [None]:
import os
from os.path import join

from google.colab import drive
google_dir = '/content/drive'
drive.mount(google_dir)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
os.listdir(join(google_dir, 'MyDrive'))

['Corporate-Powerpoint-Template.pptx',
 'Show and Tell.pptx',
 'CIC drop in session - Keegan.gdoc',
 'Quick_meeting_slides_2021-05-09.pptx',
 'asdaf',
 'pipe3_20k.mp4',
 'chl_7_bin_1_stride_4fps.mkv',
 'WAT_Waste_City_of_canning_Project_Lessons_Learnt.xlsx',
 'slides',
 'WCCC_yolov4-unseen.mp4',
 'CIC Carpentries Collaborative Google Doc.gdoc',
 'RezBaz 22.gslides',
 'S&T',
 'ASDAF_BMT_UHI_JIRA_New_Project_Questionnaire.xlsx',
 'CHL_Weekly_2018_1440p.mp4',
 'Chlor_a_Weekly_2018_1440p.mp4',
 'UChl_abs_Weekly_2018_1440p.mp4',
 'solo work',
 'Untitled form.gform',
 'CIDS Computational Resources 2024-03-22.gslides',
 'Colab Notebooks',
 'CIC_Carpentries_Python-master',
 'workshop_google',
 '202404_Intro_Rrs.gslides']

In [None]:
project_dir = join(google_dir, 'MyDrive', "workshop_google")
storage_location = join(project_dir, "workshop_data")

os.makedirs(storage_location, exist_ok=True)

In [None]:
!ls {project_dir}

data		 google_requirements.txt  notebook_pictures  notebooks_colab  workshop_data
environment.yml  LICENSE		  notebooks	     README.md


In [None]:
!pip install -r {project_dir}/google_requirements.txt

Collecting rioxarray (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 2))
  Downloading rioxarray-0.15.5-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.5/60.5 kB 2.0 MB/s eta 0:00:00
Collecting earthpy (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 4))
  Downloading earthpy-0.9.4-py3-none-any.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 28.8 MB/s eta 0:00:00
Collecting pystac-client (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 11))
  Downloading pystac_client-0.8.2-py3-none-any.whl (33 kB)
Collecting rasterio>=1.3 (from rioxarray->-r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 2))
  Downloading rasterio-1.3.10-cp310-cp310-manylinux2014_x86_64.whl (21.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.5/21.5 MB

# Visualisation

## Finding the files
Before we can load the data files into Python, we first need to navigate to them on our computer. 

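Sentinel-2 products are distributed in the SAFE (Standard Archive Format for Europe) format: a folder whose `GRANULE` subfolder holds the image files, grouped by resolution under `IMG_DATA` (`R10m`, `R20m` and `R60m` for Level-2A products).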

In [None]:
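# Read back the product directory saved earlier
import os

product_dir_textfile = "product_dir.txt"

with open(product_dir_textfile, 'r') as f:
    # strip() removes the trailing newline so the path can be joined safely
    base_product_dir = f.readline().strip()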

base_product_dir

In [None]:
from os.path import join

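# Navigate the SAFE structure: GRANULE/<granule folder>/IMG_DATA/R60m
product_dir = join(base_product_dir, 'GRANULE')
L2_dirname = os.listdir(product_dir)[0]  # assumes a single granule folder
product_dir = join(product_dir, L2_dirname)
product_dir = join(product_dir, 'IMG_DATA', 'R60m')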
os.listdir(product_dir)
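The cell above assumes the `GRANULE` folder contains exactly one granule. A minimal sketch of the same navigation with `glob`, which fails loudly if that assumption does not hold; the helper name `find_r60m_dir` and the folder names in the usage example are hypothetical:

```python
import os
import tempfile
from glob import glob
from os.path import join


def find_r60m_dir(base_product_dir):
    """Return the IMG_DATA/R60m folder of a Sentinel-2 L2A SAFE product.

    Hypothetical helper: assumes GRANULE/ contains exactly one granule folder.
    """
    matches = glob(join(base_product_dir, 'GRANULE', '*', 'IMG_DATA', 'R60m'))
    if len(matches) != 1:
        raise FileNotFoundError(f"Expected 1 R60m folder, found {len(matches)}")
    return matches[0]


# Usage example on a throwaway folder tree (all folder names are made up)
with tempfile.TemporaryDirectory() as tmp:
    safe_dir = join(tmp, 'EXAMPLE.SAFE')
    os.makedirs(join(safe_dir, 'GRANULE', 'L2A_EXAMPLE', 'IMG_DATA', 'R60m'))
    rel = os.path.relpath(find_r60m_dir(safe_dir), safe_dir)
    print(rel)  # GRANULE/L2A_EXAMPLE/IMG_DATA/R60m on Linux/macOS
```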

In [None]:
# Map each band name (e.g. 'B04', 'TCI') to its file path
image_paths = {}
for fname in os.listdir(product_dir):
    fpath = join(product_dir, fname)
    file_band = fname.split('_')[2]
    image_paths[file_band] = fpath

image_paths
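The dictionary relies on the band name being the third underscore-separated token of each filename. A small sketch of that parsing, assuming Level-2A names of the form `<tile>_<datetime>_<band>_<resolution>.jp2`; the filenames below are made-up examples, not files from this product:

```python
def band_from_filename(fname):
    """Return the band token of a Sentinel-2 L2A image filename.

    Assumes the pattern <tile>_<datetime>_<band>_<resolution>.jp2.
    """
    return fname.split('_')[2]


# Hypothetical filenames following that pattern
example_files = [
    'T50HMK_20230101T021349_B04_60m.jp2',
    'T50HMK_20230101T021349_TCI_60m.jp2',
    'T50HMK_20230101T021349_SCL_60m.jp2',
]
image_paths_example = {band_from_filename(f): f for f in example_files}
print(sorted(image_paths_example))  # ['B04', 'SCL', 'TCI']
```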

# Sentinel 2 Data

![Common bands](../notebook_pictures/dmidS2LS7Comparison.png)  
*Comparison of Landsat 7 and 8 bands with Sentinel-2 (USGS Public Domain Image)*

## What's in a raster?

In [None]:
image_paths

In [None]:
import rioxarray


raster_s2_tci = rioxarray.open_rasterio(image_paths['TCI'])

In [None]:
# Check out some attributes
raster_s2_tci

In [None]:
# Select a portion of the image: all bands, rows 100-499, columns 100-499
subset_tci = raster_s2_tci[:, 100:500, 100:500]
subset_tci
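Positional indexing on the raster follows its dimension order, `(band, y, x)`, so the slice keeps all bands and takes rows 100 to 499 and columns 100 to 499. A NumPy stand-in with the same layout shows the resulting shape (the array is all zeros, not real data); with xarray, `raster_s2_tci.isel(y=slice(100, 500), x=slice(100, 500))` selects the same pixels by dimension name:

```python
import numpy as np

# Stand-in for the TCI raster: 3 bands (R, G, B) x 1830 rows (y) x 1830 columns (x),
# the size of a Sentinel-2 tile at 60 m resolution; the values are just zeros
tci_like = np.zeros((3, 1830, 1830), dtype=np.uint8)

# Same slice as in the cell above
subset_like = tci_like[:, 100:500, 100:500]
print(subset_like.shape)  # (3, 400, 400)
```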

In [None]:
# Write to disk and check it out!
subset_tci.rio.to_raster('test.tif')

# Loading in vector data

In [None]:
# Load in AOI
import geopandas as gpd

john_forrest_poly_fname = join(project_dir, "data", "john_forrest_rough.shp")
AOI_4326 = gpd.read_file(john_forrest_poly_fname)

In [None]:
# Check the CRS of the polygon(s) we loaded
AOI_4326.crs

In [None]:
# Reproject the AOI into the raster's CRS so both datasets line up
# (to_crs returns a new GeoDataFrame, so AOI_4326 itself is unchanged)
AOI = AOI_4326.to_crs(raster_s2_tci.rio.crs)

# Crop raster data with a bounding box
The `clip_box` function crops a raster to a bounding box given by the minimum and maximum x and y coordinates.  
Note that we are now cropping the original image (`raster_s2_tci`), not the rough subset `subset_tci`.

In [None]:
raster_clip_box = raster_s2_tci.rio.clip_box(*AOI.total_bounds)
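`AOI.total_bounds` returns `[minx, miny, maxx, maxy]`, and the `*` unpacks it into the four positional arguments of `clip_box`. A simplified sketch of the underlying idea, converting those bounds into a window of row and column indices; it assumes a north-up raster with square pixels, uses a made-up grid origin and coordinates, and is not rioxarray's implementation:

```python
import math

import numpy as np


def bounds_to_window(minx, miny, maxx, maxy, x0, y0, pixel_size):
    """Convert a bounding box to (row_start, row_stop, col_start, col_stop).

    Assumes a north-up raster whose top-left corner is at (x0, y0) and whose
    square pixels have side `pixel_size` (same units as the bounds).
    """
    col_start = math.floor((minx - x0) / pixel_size)
    col_stop = math.ceil((maxx - x0) / pixel_size)
    # In a north-up raster, y decreases as the row index increases
    row_start = math.floor((y0 - maxy) / pixel_size)
    row_stop = math.ceil((y0 - miny) / pixel_size)
    return row_start, row_stop, col_start, col_stop


# Hypothetical 60 m grid with its top-left corner at (400000, 6500000)
image = np.arange(3 * 100 * 100).reshape(3, 100, 100)
bounds = [401200.0, 6497000.0, 402400.0, 6498800.0]  # minx, miny, maxx, maxy
r0, r1, c0, c1 = bounds_to_window(*bounds, x0=400000.0, y0=6500000.0, pixel_size=60.0)
cropped = image[:, r0:r1, c0:c1]
print((r0, r1, c0, c1), cropped.shape)  # (20, 50, 20, 40) (3, 30, 20)
```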

In [None]:
raster_clip_box.plot.imshow(figsize=(8,8))

# Precise raster data cropping with polygons
We now have an image cropped to the polygon's bounding box. For further analysis, we may want to crop the image to the exact polygon boundary, setting the pixels outside the polygon to nodata.  
This can be done with the `clip` function!

In [None]:
raster_clip_JF = raster_clip_box.rio.clip(AOI['geometry'])
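Conceptually, `clip` turns the polygon into a pixel mask and sets everything outside it to nodata; by default a pixel is kept when its centre lies inside the polygon (the `all_touched=True` option also keeps pixels the boundary touches). A toy version using a ray-casting point-in-polygon test in pixel coordinates, for illustration only; the helper names are hypothetical:

```python
import numpy as np


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_to_polygon(image, vertices, nodata):
    """Set pixels whose centres fall outside the polygon to `nodata`.

    In this toy pixel coordinate system, pixel (row, col) has its centre
    at (col + 0.5, row + 0.5).
    """
    rows, cols = image.shape[-2:]
    keep = np.array([[point_in_polygon(c + 0.5, r + 0.5, vertices)
                      for c in range(cols)] for r in range(rows)])
    # The (rows, cols) mask broadcasts across the band dimension
    return np.where(keep, image, nodata)


# A 3-band 4 x 4 image of ones, masked to a triangle
image = np.ones((3, 4, 4), dtype=np.uint8)
triangle = [(0, 0), (4.5, 0), (0, 4.5)]
masked = mask_to_polygon(image, triangle, nodata=0)
print(masked[0])
# [[1 1 1 1]
#  [1 1 1 0]
#  [1 1 0 0]
#  [1 0 0 0]]
```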

In [None]:
raster_clip_JF.plot.imshow(figsize=(8,8))

## Cropping raster data using the `reproject_match()` method
So far we have learned how to crop raster images with vector data. We can also crop a raster with another raster.  
To do this, we will use the `reproject_match` function. As its name suggests, it reprojects a raster to match another one and clips it to the same extent in one go.  

`reproject_match` is an incredibly useful function.  
In addition to clipping and reprojecting, it ensures that the pixels in each image line up with each other, making it easy to compare even disparate datasets.  
For example, it allows reprojecting Landsat 8 imagery onto a Sentinel-2 grid to compare metrics (reflectance, NDVI, etc.).

In [None]:
# First, create an artificial copy of the image in a different CRS (EPSG:4326, latitude/longitude)
# to show what reproject_match can do
fake_raster = raster_s2_tci.rio.reproject("EPSG:4326")
fake_raster.plot.imshow()

In [None]:
# Match fake_raster to the CRS, pixel grid and extent of the clipped raster
fake_raster_clip = fake_raster.rio.reproject_match(raster_clip_JF)
fake_raster_clip.plot.imshow(figsize=(8,8))
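Besides transforming coordinates between CRSs, `reproject_match` resamples the data onto the target pixel grid, using nearest-neighbour resampling by default (the `resampling` argument changes this). A sketch of the resampling part alone, assuming both grids are already in the same CRS and north-up; the helper `regrid_nearest` and the grids are hypothetical. Target pixels outside the source extent are filled with nodata, which is also what happens when an image is expanded:

```python
import numpy as np


def regrid_nearest(src, src_x0, src_y0, src_res, dst_x, dst_y, nodata):
    """Sample `src` onto target pixel centres (dst_x, dst_y) by nearest neighbour.

    Assumes both grids are north-up and in the same CRS; `src` has shape
    (rows, cols), its top-left corner at (src_x0, src_y0) and square pixels of
    size src_res. Target pixels outside the source extent get `nodata`.
    """
    rows, cols = src.shape
    # Index of the source pixel that contains each target pixel centre
    col_idx = np.floor((np.asarray(dst_x, dtype=float) - src_x0) / src_res).astype(int)
    row_idx = np.floor((src_y0 - np.asarray(dst_y, dtype=float)) / src_res).astype(int)
    col_ok = (col_idx >= 0) & (col_idx < cols)
    row_ok = (row_idx >= 0) & (row_idx < rows)

    out = np.full((len(dst_y), len(dst_x)), nodata, dtype=src.dtype)
    out[np.ix_(row_ok, col_ok)] = src[np.ix_(row_idx[row_ok], col_idx[col_ok])]
    return out


# Source: 2 x 2 pixels of 20 m, top-left corner at (0, 40)
src = np.array([[1, 2], [3, 4]], dtype=np.uint8)
# Target: 10 m pixel centres; the last column (x = 45) lies outside the source
dst_x = [5, 15, 25, 35, 45]
dst_y = [35, 25, 15, 5]
matched = regrid_nearest(src, 0, 40, 20, dst_x, dst_y, nodata=0)
print(matched)
# [[1 1 2 2 0]
#  [1 1 2 2 0]
#  [3 3 4 4 0]
#  [3 3 4 4 0]]
```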

We can also use `reproject_match` to expand an image: matching the small, clipped raster to the larger `fake_raster` pads it out to the full extent.

In [None]:
# Expand the clipped raster to the CRS, pixel grid and extent of fake_raster

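# Replace the 0 values (treated as nodata here) with 255, the maximum of the uint8 TCI data
# Note: this modifies raster_clip_JF in place
raster_clip_JF.data[raster_clip_JF.data == 0] = 255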

fake_reproject_match = raster_clip_JF.rio.reproject_match(fake_raster)
fake_reproject_match.plot.imshow(figsize=(8,8))

In one line, `reproject_match` does a lot of helpful things:

1. It reprojects the data to the target CRS and aligns the pixel grid (resolution and pixel offsets).
2. It matches the extent, either clipping the data or padding it with nodata values.
3. It sets nodata values for the padded pixels. Because both images now share the same grid, we can run pixel-by-pixel calculations on them.
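Because the matched rasters share a pixel grid, they can be combined element by element. A sketch of such a calculation, an NDVI-style ratio that first masks out nodata pixels; the arrays, band names and the nodata value of 0 are made up for illustration:

```python
import numpy as np

NODATA = 0  # assumed nodata value for this example

# Two made-up rasters that already share the same grid (e.g. after reproject_match)
red = np.array([[0, 40, 50], [60, 0, 80]], dtype=np.uint8)
nir = np.array([[0, 120, 150], [90, 30, 0]], dtype=np.uint8)

# Only compute where both inputs hold real data
valid = (red != NODATA) & (nir != NODATA)

# Convert to float first so uint8 subtraction cannot wrap around
red_f = red.astype(float)
nir_f = nir.astype(float)
ndvi = np.full(red.shape, np.nan)
ndvi[valid] = (nir_f[valid] - red_f[valid]) / (nir_f[valid] + red_f[valid])
print(np.round(ndvi, 2))
# [[nan 0.5 0.5]
#  [0.2 nan nan]]
```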


# 📢 Key Points

- Use `clip_box` to crop a raster with a bounding box.
- Use `clip` to crop a raster with a given polygon.
- For efficiency, it's usually better to `clip_box` first and then `clip`, so the polygon mask only has to be computed over the smaller, boxed image.
- Use `reproject_match` to match two raster datasets (e.g. for comparison).