In [1]:
import earthnet as en
import xarray as xr
import numpy as np
import pandas as pd
from pathlib import Path

# Downloading EarthNet2021x

We start by downloading EarthNet2021x. For demonstration purposes, we download only 2 samples per split.

In [2]:
# If you are behind a proxy, additionally pass: proxy = "your proxy path"
en.download(dataset = "earthnet2021x", split = "all", save_directory = "data/", limit = 2)

Finding files of earthnet2021x, split train to download.
Downloading files of earthnet2021x, split train
100%|██████████| 2/2 [00:00<00:00, 14.69it/s]
Downloaded earthnet2021x, split train.
Finding files of earthnet2021x, split iid to download.
Downloading files of earthnet2021x, split iid
100%|██████████| 2/2 [00:00<00:00, 23.05it/s]
Downloaded earthnet2021x, split iid.
Finding files of earthnet2021x, split ood to download.
Downloading files of earthnet2021x, split ood
100%|██████████| 2/2 [00:00<00:00, 15.27it/s]
Downloaded earthnet2021x, split ood.
Finding files of earthnet2021x, split extreme to download.
Downloading files of earthnet2021x, split extreme
100%|██████████| 2/2 [00:00<00:00, 13.01it/s]
Downloaded earthnet2021x, split extreme.
Finding files of earthnet2021x, split seasonal to download.
Downloading files of earthnet2021x, split seasonal
100%|██████████| 2/2 [00:00<00:00,  5.44it/s]
Downloaded earthnet2021x, split seasonal.

# Loading one minicube

Minicubes in EarthNet2021x are saved as NetCDF files. We can open them with `xarray` (https://docs.xarray.dev/en/stable/index.html).

In [3]:
train_path = Path("data/earthnet2021x/train/")
trainfiles = list(train_path.glob("**/*.nc"))
print(trainfiles)

[PosixPath('data/earthnet2021x/train/29SND/29SND_2017-06-20_2017-11-16_1209_1337_5049_5177_18_98_78_158.nc'), PosixPath('data/earthnet2021x/train/29SND/29SND_2017-06-10_2017-11-06_2105_2233_2873_3001_32_112_44_124.nc')]


In [4]:
minicube = xr.open_dataset(trainfiles[0])

In [5]:
minicube

# Saving your predictions

We simply predict the mean NDVI of the context period and save it as a NetCDF file with the variable name `"ndvi_pred"`.
You can save your own predictions in the same way.

In [6]:
test_path = Path("data/earthnet2021x/iid/")
testfiles = list(test_path.glob("**/*.nc"))
print(testfiles)

[PosixPath('data/earthnet2021x/iid/29SND/29SND_2017-06-20_2017-11-16_2617_2745_1465_1593_40_120_22_102.nc'), PosixPath('data/earthnet2021x/iid/29SND/29SND_2017-06-20_2017-11-16_1977_2105_1721_1849_30_110_26_106.nc')]


In [7]:
preddir = Path('preds/')/testfiles[0].parent.stem
preddir.mkdir(parents = True, exist_ok = True)

In [8]:
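for testfile in testfiles:
    targ = xr.open_dataset(testfile)
    # NDVI from the near-infrared (B8A) and red (B04) bands; pixels where s2_mask is non-zero become NaN. Keep every 5th time step.
    ndvi = ((targ.s2_B8A - targ.s2_B04) / (targ.s2_B8A + targ.s2_B04 + 1e-8)).where(targ.s2_mask == 0, np.nan).isel(time = slice(4,None,5))
    # Time steps from index 10 onwards form the target period.
    pred = ndvi.isel(time = slice(10, None))
    # Fill every target step with the per-pixel mean NDVI of the first 10 (context) steps, ignoring NaNs.
    pred.loc[:] = np.nanmean(ndvi.isel(time = slice(10)).values, axis = 0, keepdims = True).repeat(pred.sizes["time"], axis = 0)
    pred = pred.to_dataset(name = "ndvi_pred")
    # Save under the same file name as the target minicube.
    predpath = preddir/testfile.name
    pred.to_netcdf(predpath)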
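
The slicing in the loop above can be checked on synthetic data without opening any minicube. The sketch below is an illustration only: it assumes a minicube with 150 daily time steps and uses a made-up 4x4 grid with random values in place of real NDVI. It shows how sub-sampling every 5th step yields 30 steps, of which the first 10 are used as context and the remaining 20 are filled with the context mean.

```python
import numpy as np

# Synthetic stand-in for one minicube's daily NDVI.
# The 150 time steps and the 4x4 grid are assumptions for illustration.
rng = np.random.default_rng(0)
daily_ndvi = rng.uniform(0.0, 1.0, size=(150, 4, 4))

# Keep every 5th step starting at index 4, like isel(time=slice(4, None, 5)).
ndvi = daily_ndvi[4::5].copy()

# Pretend one observation is masked, like .where(s2_mask == 0, np.nan).
ndvi[2, 0, 0] = np.nan

context = ndvi[:10]               # first 10 steps: context period
n_target = ndvi.shape[0] - 10     # remaining steps: target period

# Per-pixel mean over the context period, ignoring NaNs,
# repeated once for every target time step.
context_mean = np.nanmean(context, axis=0, keepdims=True)
pred = np.repeat(context_mean, n_target, axis=0)

print(ndvi.shape, pred.shape)     # (30, 4, 4) (20, 4, 4)
```

Because `np.nanmean` skips masked values, a pixel only ends up as NaN in the prediction if it is masked in every context step.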

In [9]:
predpath

PosixPath('preds/29SND/29SND_2017-06-20_2017-11-16_1977_2105_1721_1849_30_110_26_106.nc')

In [10]:
pred = xr.open_dataset(predpath)

In [11]:
pred

# Scoring your predictions

Next, we score the predictions we just made.

In [12]:
scores = en.score_over_dataset(str(test_path), str(preddir.parent))

scoring data/earthnet2021x/iid against preds
100%|██████████| 2/2 [00:00<00:00,  3.08it/s]
Done!

In [13]:
scores

{'veg_score': 0.34228313,
 'tree_score': 0.34231284,
 'shrub_score': 0.37179443,
 'grass_score': 0.3419284,
 'crop_score': 0.27430448,
 'swamp_score': nan,
 'mangroves_score': nan,
 'moss_score': nan,
 'all_scores':             lon        lat  longitude_eobs  latitude_eobs      NNSE  \
 0     -8.660640  39.278160            -9.0           39.0  0.380877   
 1     -8.660640  39.277978            -9.0           39.0  0.361314   
 2     -8.660640  39.277796            -9.0           39.0  0.316905   
 3     -8.660640  39.277613            -9.0           39.0  0.346263   
 4     -8.660640  39.277431            -9.0           39.0  0.357243   
 ...         ...        ...             ...            ...       ...   
 32763 -8.570785  39.370865            -9.0           39.0  0.330092   
 32764 -8.570785  39.370683            -9.0           39.0  0.344331   
 32765 -8.570785  39.370500            -9.0           39.0  0.368758   
 32766 -8.570785  39.370318            -9.0           39.0  0.400

We will use the `veg_score` to benchmark models.

This score is computed by taking the mean score across the three landcover classes `Tree cover`, `Shrubland` and `Grassland`.

We score each class using the normalized Nash-Sutcliffe model efficiency (https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient), which compares the predicted and the observed NDVI over cloud-free observations.

The score ranges from $0$ to $1$, where $1$ corresponds to a perfect prediction. It is $0.5$ if the predictions are only as good as predicting the mean of the observations over the target period.
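
To see where the value $0.5$ comes from, recall the textbook definitions: the Nash-Sutcliffe efficiency is $\mathrm{NSE} = 1 - \sum_t (o_t - p_t)^2 / \sum_t (o_t - \bar{o})^2$ for observations $o_t$ and predictions $p_t$, and its normalized form is $\mathrm{NNSE} = 1 / (2 - \mathrm{NSE})$. Predicting the observation mean gives $\mathrm{NSE} = 0$ and therefore $\mathrm{NNSE} = 0.5$. The sketch below implements these definitions for a single NDVI time series. The helper `nnse` and the toy values are hypothetical; this is not the code used inside `en.score_over_dataset`, which may differ in details such as masking and aggregation.

```python
import numpy as np

def nnse(pred, obs):
    """Normalized Nash-Sutcliffe efficiency of one time series (illustrative helper).

    NaN entries (for example cloudy observations) are ignored.
    Assumes the valid observations are not all identical.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(pred) & ~np.isnan(obs)
    pred, obs = pred[valid], obs[valid]
    residual = np.sum((obs - pred) ** 2)        # squared prediction error
    variance = np.sum((obs - obs.mean()) ** 2)  # squared error of the mean predictor
    nse = 1.0 - residual / variance
    return 1.0 / (2.0 - nse)

# Toy NDVI series for one pixel; NaN marks a cloudy observation.
obs = np.array([0.2, 0.4, np.nan, 0.6, 0.8])

perfect = nnse(obs, obs)                                    # prediction equals observation
mean_pred = nnse(np.full_like(obs, np.nanmean(obs)), obs)   # prediction equals observation mean
print(perfect, mean_pred)
```

The first value is the maximum score of 1, and the second is 0.5 up to floating-point rounding. Predictions worse than the observation mean fall below 0.5.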
