# Zonal Statistics

Author: [Lukas Valentin Graf](https://github.com/lukasValentin/lukasValentin) (2022-2023)

## Learning Objectives

In this notebook you will learn how to

* calculate zonal statistics for agricultural field parcels
* see how these statistics change over time

## Tutorial Content

This tutorial is at an introductory level.

Basic knowledge of [GeoPandas](https://geopandas.org/en/stable/) and EOdal is helpful. If you are unfamiliar with EOdal, you may want to check out [these notebooks](../General) first.

If you don't know about zonal statistics you might find reading [this blog](https://up42.com/blog/an-introduction-to-zonal-statistics) helpful.

Running this notebook requires no additional software setup. It does, however, connect to [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/). No authentication is required, but a stable internet connection is recommended.

The data required to run this notebook can be found [here](./../data).

### Preparing the environment

In [3]:
%pip install scipy eodal


In [4]:
import geopandas as gpd
import numpy as np

from datetime import datetime
from eodal.config import get_settings
from eodal.core.scene import SceneCollection
from eodal.core.sensors.sentinel2 import Sentinel2
from eodal.mapper.feature import Feature
from eodal.mapper.filter import Filter
from eodal.mapper.mapper import Mapper, MapperConfigs
from scipy.stats import median_abs_deviation

from pathlib import Path
from typing import List

Settings = get_settings()
# set to False to use a local data archive
Settings.USE_STAC = True

In [6]:
# check EOdal version
import eodal
print(f'The EOdal version is {eodal.__version__}')

The EOdal version is 0.2.4


### Defining custom metrics for zonal statistics

EOdal makes use of the [rasterstats](https://pythonhosted.org/rasterstats/) package to calculate zonal statistics. The cool thing about `rasterstats` is that it accepts custom functions for calculating user-defined statistical metrics. This gives users a lot of freedom in choosing the metrics best suited to their needs.

Below you find two examples of such custom functions. There are two important things to consider when writing user-defined functions:

1. the *function name* should be unique and must not clash with the name of any existing numpy function. Using a prefix such as `my_` helps to avoid this.
2. the *function should return a single scalar*, because this is what zonal statistics do: the *N* raster cells overlapping a geometry are aggregated by the function into a single value.

The two functions below calculate the median absolute deviation and the sum of squares of the raster cell values. These metrics carry no deeper meaning; they merely show how such custom functions should be designed.

In [5]:
def my_median_abs_deviation(x: np.ma.MaskedArray) -> float:
    """
    Custom function to calculate the median absolute
    deviation in `SceneCollection.get_feature_timeseries()`.

    :param x:
        array with raster values
    :returns:
        median absolute deviation value
    """
    x = x.filled(np.nan)
    return median_abs_deviation(x, nan_policy='omit', axis=None)

In [7]:
def my_square_sum(x: np.ma.MaskedArray) -> float:
    """
    Custom function returning sum(x**2)

    :param x:
        array with raster values
    :returns:
        sum of the squared raster values
    """
    x = x.filled(np.nan)
    # important: np.nansum returns 0 for an all-NaN input, so return NaN explicitly
    if np.isnan(x).all():
        return np.nan
    return np.nansum(x*x)
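
Before handing a custom metric to EOdal, it can help to call it on a small masked array and check both rules by hand. The sketch below uses a hypothetical metric, `my_value_range`, which is not part of EOdal or of this notebook, together with toy values chosen for illustration only.

```python
import numpy as np


def my_value_range(x: np.ma.MaskedArray) -> float:
    """Hypothetical custom metric: max - min of the valid raster cells."""
    x = x.filled(np.nan)
    # return NaN if no valid cell overlaps the geometry
    if np.isnan(x).all():
        return np.nan
    return float(np.nanmax(x) - np.nanmin(x))


# toy raster cells; the masked cell (0.9) must be ignored
cells = np.ma.MaskedArray(
    data=[[0.2, 0.4], [0.9, 0.6]],
    mask=[[False, False], [True, False]],
)
value_range = my_value_range(cells)

# a geometry covered entirely by masked cells
all_masked = np.ma.MaskedArray(data=[1.0, 2.0], mask=[True, True])
empty_range = my_value_range(all_masked)

# the pitfall guarded against in my_square_sum
nansum_of_nans = np.nansum(np.array([np.nan, np.nan]))
```

The last line shows why the explicit all-NaN check is needed: `np.nansum` of an all-NaN array is `0.0`, which would otherwise look like a valid result.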

### Getting the Sentinel-2 Data

This part of the notebook is essentially the same as in the [EOdal mapper notebook](../General/EOdal_Mapper.ipynb). We therefore do not repeat the explanations here and recommend having a look at that notebook for details.

In [8]:
def preprocess_sentinel2_scenes(
    ds: Sentinel2,
    target_resolution: int,
) -> Sentinel2:
    """
    Resample Sentinel-2 scenes and mask clouds, shadows, and snow
    based on the Scene Classification Layer (SCL).

    NOTE:
        Depending on your needs, the pre-processing function can be
        fully customized using the full power of EOdal and its
        interfacing libraries!

    :param ds:
        Sentinel-2 scene to pre-process.
    :param target_resolution:
        spatial target resolution to resample all bands to.
    :returns:
        resampled, cloud-masked Sentinel-2 scene.
    """
    # resample scene
    ds.resample(inplace=True, target_resolution=target_resolution)
    # mask clouds, shadows, and snow
    ds.mask_clouds_and_shadows(inplace=True)
    return ds
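
Conceptually, SCL-based masking flags every pixel whose scene classification falls into an unwanted class. The NumPy sketch below illustrates the idea on toy arrays. The class selection (3 = cloud shadow, 8 and 9 = medium and high cloud probability, 10 = thin cirrus, 11 = snow or ice) is an assumption for illustration and may differ from the defaults of `mask_clouds_and_shadows()`.

```python
import numpy as np

# toy Scene Classification Layer (4 = vegetation, 5 = not vegetated)
scl = np.array([
    [4, 4, 9],
    [3, 5, 4],
    [11, 4, 8],
])
# toy red-band reflectance on the same grid
red = np.array([
    [0.05, 0.06, 0.40],
    [0.02, 0.12, 0.05],
    [0.60, 0.04, 0.35],
])

# SCL classes to mask (assumed selection, see the text above)
masked_classes = [3, 8, 9, 10, 11]
invalid = np.isin(scl, masked_classes)

# masked cells are ignored by downstream statistics
red_masked = np.ma.MaskedArray(red, mask=invalid)
n_valid = int(red_masked.count())
mean_valid = float(red_masked.mean())
```

Here four of the nine cells are masked, so the mean is computed from the five remaining cells only.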

In [12]:
#%% user-inputs
# -------------------------- Collection -------------------------------
collection: str = 'sentinel2-msi'

# ------------------------- Time Range ---------------------------------
time_start: datetime = datetime(2022, 3, 1)  # year, month, day (incl.)
time_end: datetime = datetime(2022, 5, 1)    # year, month, day (incl.)

# ---------------------- Spatial Feature  ------------------------------
geom: Path = Path('data/sample_polygons/ZH_Polygons_2020_ESCH_EPSG32632.shp')

# ------------------------- Metadata Filters ---------------------------
metadata_filters: List[Filter] = [
    Filter('cloudy_pixel_percentage','<', 80),
    Filter('processing_level', '==', 'Level-2A')
]

In [11]:
#%% query the scenes available (no I/O of scenes, this only fetches metadata)
feature = Feature.from_geoseries(gpd.read_file(geom).geometry)
mapper_configs = MapperConfigs(
    collection=collection,
    time_start=time_start,
    time_end=time_end,
    feature=feature,
    metadata_filters=metadata_filters
)
mapper_configs


In [None]:
# now, a new Mapper instance is created
mapper = Mapper(mapper_configs)
mapper.query_scenes()
# the metadata is loaded into a GeoPandas GeoDataFrame
mapper.metadata

In [None]:
#%% load the scenes available from STAC
scene_kwargs = {
    'scene_constructor': Sentinel2.from_safe,
    'scene_constructor_kwargs': {'band_selection': ['B02', 'B03', 'B04', 'B08']},
    'scene_modifier': preprocess_sentinel2_scenes,
    'scene_modifier_kwargs': {'target_resolution': 10}
}
mapper.load_scenes(scene_kwargs=scene_kwargs)
scoll = mapper.data
scoll

### Calculating zonal statistics

With the cells above we have loaded multiple scenes at once into a `SceneCollection`. We can now either calculate the zonal statistics per scene (which would be done using `RasterCollection.band_summaries()`) or make use of the `SceneCollection.get_feature_timeseries()` method that calls `RasterCollection.band_summaries()` for each scene and stacks the results into a single `GeoDataFrame`.
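
The idea behind this can be pictured with plain NumPy: for each scene, the cells belonging to a parcel are selected and reduced to one value per metric, and the rows of all scenes are stacked. This is a conceptual sketch only, with made-up NDVI values and a hypothetical `zones` array of rasterized parcel IDs; it is not how EOdal or `rasterstats` are implemented.

```python
import numpy as np
from datetime import date

# made-up 3x3 NDVI rasters for two acquisition dates
scenes = {
    date(2022, 3, 5): np.array([
        [0.2, 0.3, 0.8],
        [0.2, 0.4, 0.9],
        [0.1, 0.5, 0.7],
    ]),
    date(2022, 4, 9): np.array([
        [0.4, 0.5, 0.6],
        [0.4, 0.6, 0.7],
        [0.3, 0.7, 0.5],
    ]),
}

# hypothetical rasterized parcels: each cell holds a parcel ID
zones = np.array([
    [1, 1, 2],
    [1, 1, 2],
    [1, 1, 2],
])

# one row per (scene, parcel) combination
rows = []
for acquisition_date, ndvi in scenes.items():
    for zone_id in np.unique(zones):
        cells = ndvi[zones == zone_id]
        rows.append({
            'acquisition_time': acquisition_date,
            'zone': int(zone_id),
            'median': float(np.median(cells)),
            'mean': float(np.mean(cells)),
        })
```

Two scenes and two parcels yield four rows, which mirrors the long-format table returned by `get_feature_timeseries()`.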

In the example below, we calculate two vegetation indices, namely the [NDVI](https://gisgeography.com/ndvi-normalized-difference-vegetation-index/) and the [MSAVI](https://eos.com/industries/agriculture/msavi/), and calculate zonal statistics per single field parcel and Sentinel-2 scene.

In [None]:
# calculate spectral indices
for _, scene in scoll:
    scene.calc_si('NDVI', inplace=True)
    scene.calc_si('MSAVI', inplace=True)
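
For reference, both indices are computed from the red (B04) and near-infrared (B08) bands. The sketch below uses the commonly cited formulas, NDVI = (NIR - RED) / (NIR + RED) and MSAVI = (2 NIR + 1 - sqrt((2 NIR + 1)^2 - 8 (NIR - RED))) / 2, on toy reflectance values. That `calc_si` matches these formulas exactly is an assumption.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def msavi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Modified Soil-Adjusted Vegetation Index (commonly cited form)."""
    return 0.5 * (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red)))


# toy surface reflectance values (dense and sparse vegetation)
nir = np.array([0.5, 0.3])
red = np.array([0.1, 0.2])

ndvi_values = ndvi(nir, red)
msavi_values = msavi(nir, red)
```

Both indices increase with the contrast between near-infrared and red reflectance, so the first (densely vegetated) pixel scores higher than the second in both cases.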

In [None]:
# get time series of the field parcel using a bunch of statistical metrics
ts = scoll.get_feature_timeseries(
    vector_features=geom,
    method=['percentile_10', 'percentile_50', 'median', 'mean', 'percentile_90', my_square_sum, my_median_abs_deviation],
    band_selection=['ndvi', 'msavi']
)
ts

### Plotting Time Series

We can now plot time series per field parcel using, e.g., the median NDVI and MSAVI values per parcel and Sentinel-2 acquisition date.

In [None]:
%pip install seaborn

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('tableau-colorblind10')

# convert timestamps to get a nicely formatted x axis
ts.acquisition_time = pd.to_datetime(ts.acquisition_time)
f, ax = plt.subplots(ncols=2, figsize=(20,10))
ts_sis = ts.groupby(ts.band_name)

# one subplot per spectral index
for idx, (si, ts_si) in enumerate(ts_sis):
    sns.lineplot(x='acquisition_time', y='median', hue='GIS_ID', data=ts_si, ax=ax[idx])
    ax[idx].set_title(si.upper())

Since `ts` is a `GeoDataFrame`, all results can also be visualized on a map or saved as, e.g., a GeoPackage file for further analysis.

In [None]:
ts.plot(column='NUTZUNGSCO')