# ISPRS Part 3

This part combines the two previous notebooks and adds data processing. Our goal is to analyse real-world situations. We will also go deeper into plotting to create figures that could be part of your next paper.

As before, we start by loading the required libraries and the dataset.

In [None]:
import xcube
from xcube.core.store import new_data_store
from xcube.webapi.viewer import Viewer

import matplotlib.pyplot as plt

In [None]:
store = new_data_store("s3", 
                       root="deep-esdl-public", 
                       storage_options=dict(anon=True))
dataset = store.open_data('esdc-8d-0.25deg-256x128x128-3.0.1.zarr')

## Processing data

The `xarray` library can also process data, and the DeepESDL platform hardware is powerful enough to handle even demanding computations.

The simplest computations that reduce dimensions are `min`, `max`, `mean`, `median`, `count`, `std` and `var`.

Use the `dim=` argument to specify along which dimension(s) the reduction should happen; without it, the reduction runs over all dimensions. Here are some examples:

In [None]:
# precipitation in the coastal area of southwestern Australia
perth_data = dataset['precipitation_era5'].sel(
    lat = slice(-36.0, -28.0),
    lon = slice(114.5, 121.0),
    time = slice('2019-01-01', '2022-12-31')
)

# average over lat and lon to get one mean value per time step
perth_mean = perth_data.mean(dim = ['lat','lon'])
# plotting
perth_mean.plot.line()
plt.title("Precipitation in the southwestern Australian coastal region 2019-2022")
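How `dim=` changes the shape of the result is easiest to see on a tiny toy array. The sketch below uses made-up values on a hypothetical 3 x 2 x 2 (`time`, `lat`, `lon`) grid rather than ESDC data: reducing over `lat` and `lon` leaves a time series, reducing over `time` leaves a map, and omitting `dim` collapses everything to a single value.

```python
import numpy as np
import xarray as xr

# Hypothetical toy cube: 3 time steps on a 2 x 2 lat/lon grid (values 0..11, made up)
values = np.arange(12, dtype=float).reshape(3, 2, 2)
cube = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": [1, 2, 3], "lat": [10.0, 20.0], "lon": [30.0, 40.0]},
    name="toy_variable",
)

# Reduce over space: one value per time step, i.e. a time series
series = cube.mean(dim=["lat", "lon"])   # dims ("time",), values 1.5, 5.5, 9.5

# Reduce over time: one value per grid cell, i.e. a map
time_max = cube.max(dim="time")          # dims ("lat", "lon"), values [[8, 9], [10, 11]]

# Without `dim`, the reduction runs over all dimensions and returns a 0-d array
overall_std = cube.std()
```

The same pattern applies to the ESDC variables above: `perth_data.mean(dim = ['lat','lon'])` keeps only the `time` dimension.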

Another example is computing the Normalized Difference Vegetation Index (NDVI), a widely used indicator of vegetation. Let's assume we don't have access to precomputed NDVI values (in fact, they are part of the ESDC).

Remember the formula:

$$\text{NDVI} = \frac{\text{NIR} - \text{RED}}{\text{NIR} + \text{RED}}$$
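To make the formula concrete, here is a worked example for single pixels. The reflectance values are made up for illustration (not ESDC data), and `ndvi_pixel` is a hypothetical helper. For non-negative reflectances the NDVI always lies between -1 and 1; a pixel that reflects much more in the near infrared than in the red band gets a value close to 1, while similar reflectances in both bands give a value near 0.

```python
def ndvi_pixel(nir: float, red: float) -> float:
    """Hypothetical helper: NDVI = (NIR - RED) / (NIR + RED) for one pixel."""
    return (nir - red) / (nir + red)


# Made-up reflectances: strong NIR, weak RED -> (0.5 - 0.1) / (0.5 + 0.1) = 0.4 / 0.6
print(round(ndvi_pixel(0.5, 0.1), 3))   # 0.667

# Made-up reflectances: NIR and RED similar -> 0.05 / 0.55, close to zero
print(round(ndvi_pixel(0.3, 0.25), 3))  # 0.091
```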

`xarray` supports element-wise arithmetic between data arrays, so the formula translates directly into code. Consider the following example:


In [None]:
# First subset the data
perth_data = dataset.sel(
    lat = slice(-36.0, -28.0),
    lon = slice(114.5, 121.0),
    time = slice('2019-01-01', '2022-12-31')
)

# Name the two bands needed for the NDVI
NIR = perth_data['nbar_nir']
RED = perth_data['nbar_red']

# xarray applies the arithmetic element-wise to the data arrays
NDVI = (NIR - RED) / (NIR + RED)

# Where NIR + RED is zero, the division gives inf or NaN.
# `where` masks these pixels (sets them to NaN), so only valid NDVI values remain:
NDVI = NDVI.where((NIR + RED) != 0)

Check the result with a quick plot:

In [None]:
NDVI.sel(time = '2021-03-16', method='nearest').plot()
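The masking step can be checked on three toy pixels with made-up reflectances, the last of which has `NIR + RED == 0`. As a design variant (an alternative to the notebook's order, not a change to it), this sketch masks the denominator *before* dividing; the values are the same as masking afterwards, but NumPy never sees a 0/0 division and does not emit a runtime warning.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D "bands" with made-up reflectances; the last pixel has NIR + RED == 0
nir = xr.DataArray([0.5, 0.3, 0.0], dims="pixel")
red = xr.DataArray([0.1, 0.25, 0.0], dims="pixel")

# Replace zero denominators with NaN first, so the division never divides by zero
denominator = (nir + red).where((nir + red) != 0)
ndvi_toy = (nir - red) / denominator

# Valid pixels: 0.4 / 0.6 and 0.05 / 0.55; the last pixel becomes NaN
print(ndvi_toy.values)
```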

Another small example task is to compare data products that describe similar quantities. The ESDC contains two solar-induced chlorophyll fluorescence (SIF) products: `sif_rtsif` and `sif_gosif`. We want to find out whether they differ and by how much. To limit the computations, we restrict the time range to the year 2020.

In [None]:
# create a subset for the year 2020
subset = dataset.sel(
    time = slice('2020-01-01', '2020-12-31')
)

# create variables for each product
rtsif = subset['sif_rtsif']
gosif = subset['sif_gosif']

# calculate the absolute difference
sif_difference = abs(rtsif - gosif)

# `.values` triggers the computation and returns the result as a NumPy array
print(f"The mean absolute difference between rtsif and gosif is {sif_difference.mean().values:.3g}")

The mean difference is small, but a global mean can hide large local deviations. To see how the differences are distributed around the world, we map the maximum absolute difference of each grid cell over the year.

In [None]:
sif_difference.max(dim = 'time').plot(aspect=2, size=8)
plt.title("Maximum absolute difference between sif_rtsif and sif_gosif in 2020")
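Why the map of maxima is worth drawing even when the mean is small can be shown on a toy example: two hypothetical products (made-up values on 4 time steps x 3 cells) agree everywhere except in one cell at one time step. The mean absolute difference stays small, while `max(dim="time")` points straight at the affected cell.

```python
import numpy as np
import xarray as xr

# Two hypothetical products: 4 time steps x 3 grid cells, all values 1.0 (made up)
product_a = xr.DataArray(
    np.ones((4, 3)), dims=("time", "cell"), coords={"time": np.arange(4)}
)
product_b = product_a.copy()
product_b[2, 1] = 3.0  # one large disagreement in cell 1 at time step 2

abs_diff = abs(product_a - product_b)

# Global mean: 2 / 12, about 0.167, which looks harmless
print(float(abs_diff.mean()))

# Per-cell maximum over time reveals where the disagreement is
print(abs_diff.max(dim="time").values)  # [0. 2. 0.]
```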

## Grouping mechanisms

Grouping is another powerful tool, for example for:

1. *Temporal grouping*: you have a time series spanning multiple years and want to compute monthly or yearly averages.
1. *Spatial grouping*: your data has categorical land-use types, and you want to compute the mean value of a variable for each type.

In our examples we concentrate on temporal grouping, because we have not introduced categorical variables.
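Both kinds of grouping can be sketched on toy data. Everything here is hypothetical: a daily series whose values are simply the day index, and a 2 x 2 field with an invented `landcover` label per cell. Temporal grouping uses the `time.year` / `time.month` accessors; spatial grouping passes a named label array with the same dimensions as the data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Temporal grouping: hypothetical daily series for 2019-2020 (values = day index)
times = pd.date_range("2019-01-01", "2020-12-31", freq="D")
daily = xr.DataArray(
    np.arange(times.size, dtype=float), dims="time", coords={"time": times}
)

yearly_mean = daily.groupby("time.year").mean()    # one value per year (2019, 2020)
monthly_clim = daily.groupby("time.month").mean()  # 12 values, each over both years

# Spatial grouping: hypothetical land-cover class for each cell of a 2 x 2 grid
field = xr.DataArray([[1.0, 2.0], [3.0, 4.0]], dims=("lat", "lon"))
landcover = xr.DataArray([[0, 0], [1, 1]], dims=("lat", "lon"), name="landcover")
per_class_mean = field.groupby(landcover).mean()   # one value per class: 1.5, 3.5
```

The label array needs a `name`, which becomes the new dimension of the result.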

Next, we plot the yearly mean 2 m air temperature in northern Australia.

In [None]:
# Restrict to a spatial subset (northern Australia)
coastal_data = dataset['air_temperature_2m'].sel(
    lat = slice(-20.0, -10.0),
    lon = slice(117, 155),
)

# group by calendar year and average over time: one map per year
data_yearly = coastal_data.groupby('time.year').mean(dim = 'time')
# average each yearly map over space and plot the resulting time series
data_yearly.mean(dim = ['lat','lon']).plot()
plt.title('Yearly mean 2m air temperature in northern Australia')

You might wonder why we didn't combine the two `mean` calls into `groupby('time.year').mean(dim = ['time', 'lat', 'lon'])`. The reason is that the intermediate result contains one map per year, which we can reuse as a new yearly dataset.
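On gap-free data both routes give the same yearly values, so the detour via yearly maps costs nothing. The sketch below checks this on random toy data (hypothetical monthly values on a 2 x 3 grid, not ESDC data). With missing values (NaN) the two routes can differ, because each cell's yearly mean is then based on a different number of valid time steps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly temperatures for 3 years on a 2 x 3 grid (random, no gaps)
rng = np.random.default_rng(0)
times = pd.date_range("2000-01-01", periods=36, freq="MS")
temp = xr.DataArray(
    rng.normal(25.0, 2.0, size=(36, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": times},
)

# Two steps: keep one map per year, then average over space
yearly_maps = temp.groupby("time.year").mean(dim="time")  # dims: year, lat, lon
two_step = yearly_maps.mean(dim=["lat", "lon"])

# One step: collapse time, lat and lon at once (the yearly maps are not kept)
one_step = temp.groupby("time.year").mean(dim=["time", "lat", "lon"])

print(np.allclose(two_step.values, one_step.values))  # True
```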

Grouping also creates a new coordinate, `year`. Have a look at the coordinates of the result:

In [None]:
data_yearly

We can use this coordinate to select years, either with a list of years or with a slice:

In [None]:
# select every third year from 1979 to 2018 (range excludes its end value 2021)
data = data_yearly.sel(
    year = [*range(1979, 2021, 3)]
)

In [None]:
# or select a contiguous block of years; label-based slices include both end points
data = data_yearly.sel(
    year = slice(2010, 2021)
)
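The two selection styles behave differently at the end point, which the toy sketch below illustrates on a hypothetical series with one made-up value per year from 2008 to 2015: Python's `range` stops before its end value, whereas a label-based `slice` in `.sel` includes both end labels.

```python
import numpy as np
import xarray as xr

# Hypothetical yearly series: one made-up value per year, 2008..2015
years = np.arange(2008, 2016)
yearly = xr.DataArray(years * 0.1, dims="year", coords={"year": years})

# List selection picks exactly the listed labels; range(2008, 2016, 3) excludes 2016
every_third = yearly.sel(year=[*range(2008, 2016, 3)])  # 2008, 2011, 2014

# Label-based slices include BOTH end points
contiguous = yearly.sel(year=slice(2010, 2012))         # 2010, 2011, 2012

print(every_third["year"].values, contiguous["year"].values)
```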

With this new coordinate we can finally introduce faceting, another plotting option: `col='year'` draws one panel per year, and `col_wrap=4` wraps the panels into rows of four.

In [None]:
data.plot(
    col='year',
    col_wrap=4,
    cmap = 'inferno',
    aspect = 4
)

This gives a quick overview of the data. Compare, for example, the years 2011 and 2019.
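Faceting can also be tried without the cube on a small synthetic stack of yearly maps (random, made-up values on a grid spanning the same lat/lon box as above). The non-interactive `Agg` backend is an assumption so the sketch runs without a display; in a notebook you would leave it out. With six years and `col_wrap=4`, the panels are laid out in a 2 x 4 grid, and the returned object is an xarray `FacetGrid`.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display

import numpy as np
import xarray as xr

# Hypothetical yearly maps: 6 years on a 5 x 20 lat/lon grid (random values)
years = np.arange(2010, 2016)
maps = xr.DataArray(
    np.random.default_rng(1).normal(27.0, 1.0, size=(6, 5, 20)),
    dims=("year", "lat", "lon"),
    coords={
        "year": years,
        "lat": np.linspace(-20.0, -10.0, 5),
        "lon": np.linspace(117.0, 155.0, 20),
    },
)

# One panel per year, four panels per row, shared colour scale
grid = maps.plot(col="year", col_wrap=4, cmap="inferno", aspect=4)
```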