# Tutorial 7: Introduction to Rioxarray
[`rioxarray`](https://corteva.github.io/rioxarray/) is an extension of the powerful Python library Xarray that focuses on geospatial raster data. It provides easy access to georeferencing information and geospatial transforms using Xarray’s labeled, multi-dimensional data structures, which makes it an ideal tool for working with geospatial data like satellite imagery or climate data.

The key feature of `rioxarray` is its seamless integration of rasterio’s geospatial data handling capabilities (such as CRS and affine transforms) with Xarray’s efficient multi-dimensional array handling. This allows you to manipulate, analyze, and visualize raster data with ease.

<div class="alert alert-block alert-info">
<b>Note:</b> This tutorial is heavily based upon the work of <a href="https://geog-312.gishub.org/index.html">others</a>
</div>

## Important before we start
<hr>
Make sure that you save this file before you continue; otherwise you will lose your work. To do so, go to Bestand/File and click on Een kopie opslaan in Drive/Save a Copy in Drive!

Now, rename the file to TAA1_Tutorial7.ipynb. You can do so by clicking on the name at the top of this screen.

## Learning Objectives
<hr>

- Understand how `rioxarray` extends Xarray for geospatial data handling.
- Load and inspect georeferenced raster datasets using `rioxarray`.
- Perform basic geospatial operations, such as clipping, reprojection, and masking, using `rioxarray`.
- Use `rioxarray` to manage CRS and spatial dimensions in raster datasets.
- Export and visualize geospatial raster datasets.

<h2>Tutorial outline<span class="tocSkip"></span></h2>
<hr>
<div class="toc"><ul class="toc-item">
    <li><span><a href="#installing-and-importing-rioxarray" data-toc-modified-id="1.-Installing-and-Importing-Rioxarray-1">1. Installing and Importing Rioxarray</a></span></li>
    <li><span><a href="#loading-georeferenced-raster-data" data-toc-modified-id="2.-Loading-Georeferenced-Raster-Data-2">2. Loading Georeferenced Raster Data</a></span></li>
    <li><span><a href="#basic-geospatial-operations" data-toc-modified-id="3.-Basic-Geospatial-Operations-3">3. Basic Geospatial Operations</a></span></li>
    <li><span><a href="#visualization-of-georeferenced-data" data-toc-modified-id="4.-Visualization-of-Georeferenced-Data-4">4. Visualization of Georeferenced Data</a></span></li>
    <li><span><a href="#saving-data" data-toc-modified-id="5.-Saving-Data-5">5. Saving Data</a></span></li>
    <li><span><a href="#handling-nodata-values" data-toc-modified-id="6.-Handling-NoData-Values-6">6. Handling NoData Values</a></span></li>
    <li><span><a href="#reproject-to-multiple-crs" data-toc-modified-id="7.-Reproject-to-Multiple-CRS-7">7. Reproject to Multiple CRS</a></span></li>
    <li><span><a href="#basic-band-math-ndvi-calculation" data-toc-modified-id="8.-Basic-Band-Math-NDVI-Calculation-8">8. Basic Band Math (NDVI Calculation)</a></span></li>
    <li><span><a href="#exercises" data-toc-modified-id="9.-Exercises-9">9. Exercises</a></span></li>
</ul></div>

## 1. Installing and Importing Rioxarray
<hr>
To use `rioxarray`, you'll need to install it along with `rasterio` and its dependencies. You can install it via pip by uncommenting the following line:

In [None]:
# %pip install rioxarray rasterio

### Importing rioxarray

You can start by importing `rioxarray` and other necessary libraries:

In [None]:
import rioxarray
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

xr.set_options(keep_attrs=True, display_expand_data=False)

## 2. Loading Georeferenced Raster Data

One of the core functionalities of `rioxarray` is the ability to load georeferenced raster data, including its CRS and geospatial transformations. You can load a raster file (e.g., a GeoTIFF file) directly using `rioxarray`:

In [None]:
# Load a raster dataset using rioxarray
url = "https://github.com/opengeos/datasets/releases/download/raster/LC09_039035_20240708_90m.tif"
data = rioxarray.open_rasterio(url)
data

Here, `rioxarray.open_rasterio` loads the raster data into an Xarray `DataArray` and automatically attaches the geospatial metadata, including CRS, affine transformations, and spatial coordinates.

### Inspecting the Dataset

You can easily inspect the loaded dataset, including its dimensions, coordinates, and attributes:

In [None]:
# View the structure of the DataArray
data.dims  # Dimensions (e.g., band, y, x)

In [None]:
data.coords  # Coordinates (e.g., y, x in geographic or projected CRS)

In [None]:
print(data.attrs)  # Attributes (e.g., scale factor and offset); the CRS is read via data.rio.crs

### Checking CRS and Transform Information

`rioxarray` integrates CRS and affine transform metadata into the Xarray object:

In [None]:
# Check the CRS of the dataset
data.rio.crs

In [None]:
# Check the affine transformation (mapping pixel coordinates to geographic coordinates)
data.rio.transform()
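To make the six numbers returned by `rio.transform()` more concrete, the sketch below applies an affine transform by hand. The coefficient values are made up for illustration (90 m north-up pixels with an arbitrary origin) and are not read from the Landsat file; `pixel_to_map` is a hypothetical helper name.

```python
# Affine coefficients in the order (a, b, c, d, e, f).
# Illustrative values only: 90 m pixels, north-up, arbitrary origin.
a, b, c = 90.0, 0.0, 600000.0      # x = a*col + b*row + c
d, e, f = 0.0, -90.0, 4000000.0    # y = d*col + e*row + f


def pixel_to_map(col, row):
    """Convert a (col, row) pixel position to map coordinates (x, y)."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# (0, 0) is the upper-left corner of the upper-left pixel
print(pixel_to_map(0, 0))      # (600000.0, 4000000.0)
# Adding 0.5 to both indices gives the centre of that pixel
print(pixel_to_map(0.5, 0.5))  # (600045.0, 3999955.0)
```

The negative `e` coefficient is the reason the y coordinates decrease from the top row of the raster to the bottom row.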

Sometimes, raster data may not have a CRS, or the CRS could be incorrect. You can assign a CRS manually if necessary:

In [None]:
# If the CRS is missing or incorrect, assign a CRS
data = data.rio.write_crs("EPSG:32611", inplace=True)

## 3. Basic Geospatial Operations

### Reprojecting a Dataset

Reprojecting raster data to another CRS is common in geospatial analysis. For example, you may want to reproject the dataset from its native projection to the WGS84 geographic coordinate system (EPSG:4326):

In [None]:
# Reproject the dataset to WGS84 (EPSG:4326)
data_reprojected = data.rio.reproject("EPSG:4326")
print(data_reprojected.rio.crs)

### Clipping a Raster

Clipping a raster dataset is useful when you only want to focus on a specific geographic area. You can clip a dataset using a bounding box in the same CRS as the data:

In [None]:
# Define a bounding box (in the same CRS as the dataset)
bbox = [-115.391, 35.982, -114.988, 36.425]

# Clip the raster to the bounding box
clipped_data = data_reprojected.rio.clip_box(*bbox)

In [None]:
clipped_data.shape

Alternatively, you can clip the raster using a vector dataset containing polygon geometries:

In [None]:
import geopandas as gpd

# Load a geojson with regions of interest
geojson_path = "https://github.com/opengeos/datasets/releases/download/places/las_vegas_bounds_utm.geojson"
bounds = gpd.read_file(geojson_path)

# Clip the raster to the shape
clipped_data2 = data.rio.clip(bounds.geometry, bounds.crs)

In [None]:
clipped_data2.shape

## Working with Spatial Dimensions

`rioxarray` supports operations on spatial dimensions (latitude/longitude or x/y coordinates) like resampling, reducing, or slicing.

### Resampling

To resample the raster dataset to a different resolution (e.g., 1 km), use the `rio.reproject` method with the `resolution` argument while keeping the CRS unchanged:

In [None]:
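# Resample to 1 km resolution (the CRS units are metres).
# Note: no resampling method is passed, so the default (nearest neighbour) is used;
# pass resampling=rasterio.enums.Resampling.average to average instead.
resampled_data = data.rio.reproject(data.rio.crs, resolution=(1000, 1000))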

In [None]:
resampled_data.shape
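Conceptually, resampling to a coarser grid with an averaging method replaces each block of fine pixels with their mean. The sketch below illustrates the idea with plain xarray `coarsen` on a tiny made-up array; this is a stand-in for illustration and not how `rio.reproject` performs resampling internally.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 4 raster with 1-unit pixels (values 0..15)
fine = xr.DataArray(
    np.arange(16, dtype="float64").reshape(4, 4),
    dims=("y", "x"),
    coords={"y": [3.5, 2.5, 1.5, 0.5], "x": [0.5, 1.5, 2.5, 3.5]},
)

# Average every 2 x 2 block, which halves the resolution
coarse = fine.coarsen(y=2, x=2).mean()

print(coarse.values)
# [[ 2.5  4.5]
#  [10.5 12.5]]
print(coarse.x.values, coarse.y.values)  # new pixel centres: [1. 3.] [3. 1.]
```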

### Extracting Spatial Subsets

You can extract spatial subsets of the dataset by selecting specific coordinate ranges:

In [None]:
# Select a subset of the data within a lat/lon range
min_x, max_x = -115.391, -114.988
min_y, max_y = 35.982, 36.425
subset = data_reprojected.sel(
    x=slice(min_x, max_x), y=slice(max_y, min_y)
)  # Slice y in reverse order
subset.shape
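The reason for slicing `y` from the larger to the smaller value is that raster y coordinates are usually stored in descending order, and label-based slices follow the order of the index. A minimal sketch on a made-up coordinate grid (the arrays `lon`, `lat`, and `grid` are assumptions for illustration):

```python
import numpy as np
import xarray as xr

lon = np.linspace(-115.5, -114.9, 7)  # ascending x coordinates
lat = np.linspace(36.5, 35.9, 7)      # descending y coordinates, as in most rasters
grid = xr.DataArray(np.zeros((7, 7)), dims=("y", "x"), coords={"y": lat, "x": lon})

# Slice bounds follow the order of the index: high-to-low for y
good = grid.sel(x=slice(-115.391, -114.988), y=slice(36.425, 35.982))
# Low-to-high bounds on a descending index select nothing
bad = grid.sel(x=slice(-115.391, -114.988), y=slice(35.982, 36.425))

print(good.shape)  # (5, 4)
print(bad.shape)   # (0, 4)
```

An empty result after `.sel` is therefore often a sign that the slice bounds are in the wrong order for that dimension.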

## 4. Visualization of Georeferenced Data

Once you have performed operations on the data, you can visualize it using matplotlib. For example, to plot a true-color composite from bands 4, 3, and 2 (red, green, and blue for Landsat 9):

In [None]:
# Plot the raster data
plt.figure(figsize=(8, 8))
data_reprojected.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("Landsat Image covering Las Vegas")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

You can also visualize clipped or masked data in the same way:

In [None]:
# Plot the raster data
plt.figure(figsize=(8, 8))
clipped_data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("Landsat Image covering Las Vegas")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

For more advanced plots, such as overlaying a vector dataset on the raster data, you can combine `rioxarray` with `geopandas` and `matplotlib`:

In [None]:
# Plot raster with GeoJSON overlay
fig, ax = plt.subplots(figsize=(8, 8))
data.attrs["long_name"] = "Surface Reflectance"  # Update the long_name attribute
data.sel(band=4).plot.imshow(ax=ax, vmin=0, vmax=0.4, cmap="gray")
bounds.boundary.plot(ax=ax, color="red")
plt.title("Raster with Vector Overlay")
plt.show()

## 5. Saving Data

Just like loading data, you can export `rioxarray` datasets to disk. For example, you can save the modified or processed raster data as a GeoTIFF file:

In [None]:
# Save the DataArray as a GeoTIFF file
data.rio.to_raster("output_raster.tif")

## 6. Handling NoData Values

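If your dataset contains NoData values, you can declare which value represents them using the following methods. Note that these calls record the NoData value; they do not change or mask any pixel values:

In [None]:
# Set the NoData value on the .rio accessor (the pixel values are unchanged)
data2 = data.rio.set_nodata(-9999)

# Write the NoData value into the array's metadata so it is stored on export
data_clean = data2.rio.write_nodata(-9999, inplace=True)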

## 7. Reproject to Multiple CRS

You can reproject the dataset to multiple CRS and compare them. For instance:

In [None]:
# Reproject to WGS 84 (EPSG:4326)
data = data.rio.reproject("EPSG:4326")
print(data.rio.crs)

In [None]:
# Reproject to EPSG:3857 (Web Mercator)
mercator_data = data.rio.reproject("EPSG:3857")
print(mercator_data.rio.crs)

In [None]:
# Plot the raster data in WGS84
plt.figure(figsize=(6, 6))
data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("EPSG:4326")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

In [None]:
# Plot the raster data in Web Mercator
plt.figure(figsize=(6, 6))
mercator_data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("EPSG:3857")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

## 8. Basic Band Math (NDVI Calculation)

Band math enables us to perform computations across different bands. A common application is calculating the Normalized Difference Vegetation Index (NDVI), which is an indicator of vegetation health.

NDVI is calculated as:

NDVI = (NIR - Red) / (NIR + Red)

We can compute and plot the NDVI as follows:

In [None]:
# Select the red (band 4) and NIR (band 5) bands
red_band = data.sel(band=4)
nir_band = data.sel(band=5)

# Calculate NDVI
ndvi = (nir_band - red_band) / (nir_band + red_band)
ndvi = ndvi.clip(min=-1, max=1)  # Clip values to the range [-1, 1]
ndvi.attrs["long_name"] = "NDVI"
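One detail the formula hides is that the denominator can be zero (for example in fill pixels where both bands are 0). The sketch below shows one possible guard using plain NumPy on made-up reflectance values; the array names are illustrative and this is not the only way to handle the case.

```python
import numpy as np

# Hypothetical reflectance values for four pixels
red_demo = np.array([0.05, 0.10, 0.0, 0.30])
nir_demo = np.array([0.45, 0.30, 0.0, 0.30])

denominator = nir_demo + red_demo

# Start from NaN and only divide where the denominator is non-zero
ndvi_demo = np.full_like(denominator, np.nan)
np.divide(nir_demo - red_demo, denominator, out=ndvi_demo, where=denominator != 0)

print(ndvi_demo)  # approximately [0.8, 0.5, nan, 0.0]
```

The third pixel stays NaN instead of triggering a divide-by-zero warning, and it is ignored automatically by NaN-aware plotting and statistics.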

To visualize the NDVI, we can plot it using matplotlib:

In [None]:
# Plot the NDVI values
ndvi.plot(cmap="RdYlGn", vmin=-1, vmax=1)
plt.title("NDVI of the Landsat Image")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

You can also mask out non-vegetated areas or areas with invalid NDVI values (such as water or urban regions) by applying a threshold:

In [None]:
# Mask out non-vegetated areas (NDVI < 0.2)
ndvi_clean = ndvi.where(ndvi > 0.2)
ndvi_clean.plot(cmap="Greens", vmin=0.2, vmax=0.5)
plt.title("Cleaned NDVI (non-vegetated areas masked)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

## 9. Exercises

### Sample Dataset

For the exercises, we will use a sample GeoTIFF raster dataset of a flood map for the region of Kampen, which is available at the following URL:
https://github.com/VU-IVM/DamageScanner/raw/refs/heads/main/data/inundation/inundation_map.tif

### Question 8: Load and Inspect a Raster Dataset

1. Use `rioxarray` to load the GeoTIFF raster file.
2. Check and print the CRS and affine transformation of the dataset.

**Report the EPSG code of the coordinate system and the affine transformation.**

In [None]:
url = ""
raster = xr.open_dataset(url,engine='rasterio')

In [None]:
raster.rio.XXX

In [None]:
raster.rio.XXX

### Question 9: Reproject the Raster to a New CRS

Reproject the loaded raster dataset from its original CRS to EPSG:4326 (WGS84).

**What are the new dimensions? Can you reflect on how this approach differs from the rasterio approach?**


In [None]:
reprojected_raster = raster.rio.reproject(XXXX)

In [None]:
reprojected_raster.rio.XXX

### Question 10: Clip the Reprojected Raster Using a Bounding Box

1. Define a bounding box (e.g., `xmin`, `ymin`, `xmax`, `ymax`) that covers only the city centre of Kampen.
2. Clip the raster dataset using this bounding box.
3. Plot the clipped data to visualize the result.

In [None]:
# 1. Define a bounding box for the Kampen city centre
xmin, ymin, xmax, ymax = 5.895, 52.545, 5.920, 52.565

In [None]:
# 2. Clip the raster using the bounding box
clipped = reprojected_raster.rio.clip_box(minx=, miny=, maxx=, maxy=)

In [None]:
# 3. Plot the clipped data
plt.figure(figsize=(10, 8))
clipped.plot(cmap='Blues')
plt.title('')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

### Question 11: Resample the Raster to a Different Resolution

1. Resample the original raster dataset to a 100 m resolution, using an average resampling method.

**What are the new x and y dimensions after resampling?**

In [None]:
from rasterio.enums import Resampling

raster.rio.reproject(
    raster.rio.crs,  # Keep the same CRS
    resolution=XXXX,    # 100m resolution
    resampling=Resampling.XXXX    # average resampling
)