# openEO Basics: How to load a data cube from a data collection?

This notebook provides a detailed guide on how to load a `DataCube` from a data collection.
Additionally, it will cover how to authenticate in order to process and download data.

## Setup

Import the `openeo` package and connect to the Copernicus Data Space Ecosystem openEO back-end.

In [14]:
import openeo
import xarray
import matplotlib.pyplot as plt
import rioxarray

from openeo.processes import ProcessBuilder
import pandas as pd
import pygc
import numpy as np
from tqdm import tqdm
from shapely.geometry import Polygon, Point
import geopandas as gpd
from geojson import Feature, FeatureCollection, dump

In [15]:
# SENTINEL1_GRD https://openeo.dataspace.copernicus.eu/openeo/1.1
# SENTINEL2_L2A https://openeo.dataspace.copernicus.eu/openeo/1.1

In [16]:
connection = openeo.connect(url="https://openeo.dataspace.copernicus.eu/openeo/1.1")
connection

<Connection to 'https://openeo.dataspace.copernicus.eu/openeo/1.1' with NullAuth>

Note the `NullAuth` in the representation of the connection, which indicates that we are not logged in yet.

The canonical way to log in is using the `authenticate_oidc()` method.
This might, depending on your situation, trigger an authentication procedure. 
Follow the instructions, if any.

In [17]:
connection.authenticate_oidc()
# connection.authenticate_basic(username="<username>", password="<password>")

Authenticated using refresh token.


<Connection to 'https://openeo.dataspace.copernicus.eu/openeo/1.1' with OidcBearerAuth>

Note that the connection is now authenticated through `OidcBearerAuth`.

In [18]:
#lengths in m
ew_width = 2000
ns_height = 2000
size = int(ew_width/1000)

t = ("2020-01-01", "2020-12-31")

main_datafile_path = "estingAustralia.csv"

### Load Area of Interest
Each coordinate from the CSV file is used to create a bounding box.

In [19]:
treecoords = pd.read_csv(main_datafile_path)
treecoords

Unnamed: 0.1,Unnamed: 0,project,site,lat,long,Granule Number
0,0,SouthWestForests-DON019FireInv,k_1,-34.7310,116.2081,2
1,1,SouthWestForests-DON019FireInv,k_2,-34.7265,116.2081,2
2,2,SouthWestForests-DON019FireInv,k_3,-34.6949,116.2085,2
3,3,SouthWestForests-DON019FireInv,k_4,-34.7265,116.2136,1
4,4,SouthWestForests-DON019FireInv,k_5,-34.7221,116.2136,1
...,...,...,...,...,...,...
241,318,LIRE,k_242,-41.3530,147.5222,1
242,319,Ausplot Forest Monitoring Network,k_243,-41.3671,147.6032,3
243,320,LIPL,k_244,-42.4391,147.7789,3
244,321,LIPL,k_245,-42.7232,147.8451,5


In [26]:
batch = treecoords.iloc[67:]
batch

Unnamed: 0.1,Unnamed: 0,project,site,lat,long,Granule Number
67,99,NFSI_RG,k_68,-34.2120,144.0858,1
68,103,NFSI_RG,k_69,-35.6096,144.1773,1
69,104,NFSI_RG,k_70,-35.5886,144.1780,1
70,105,NFSI_RG,k_71,-35.6623,144.1883,1
71,106,NFSI_RG,k_72,-35.5994,144.1905,1
...,...,...,...,...,...,...
241,318,LIRE,k_242,-41.3530,147.5222,1
242,319,Ausplot Forest Monitoring Network,k_243,-41.3671,147.6032,3
243,320,LIPL,k_244,-42.4391,147.7789,3
244,321,LIPL,k_245,-42.7232,147.8451,5


In [27]:
# sitelist = treecoords['site']
# lat = list(treecoords['lat'])
# lon = list(treecoords['long'])
# projectlist = list(treecoords["project"])

sitelist = batch['site']
lat = list(batch['lat'])
lon = list(batch['long'])
projectlist = list(batch["project"])

In [28]:
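def latLonBoxByWandH(lat,lon,ew_width,ns_height):
    """Return the corner latitudes/longitudes of a box centred on (lat, lon).

    ew_width and ns_height are in metres; corners are listed SE, SW, NW, NE.
    """
    lats, lons = [], []
    #distance in m, az (in deg), lat (in deg), long (in deg)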

    res = pygc.great_circle(distance=ew_width/2, azimuth=90, latitude=lat, longitude=lon)
    lat, lon = res['latitude'], res['longitude']

    res = pygc.great_circle(distance=ns_height/2, azimuth=180, latitude=lat, longitude=lon)
    lat, lon = res['latitude'], res['longitude']
    lats.append(lat), lons.append(lon)

    res = pygc.great_circle(distance=ew_width, azimuth=270, latitude=lat, longitude=lon)
    lat, lon = res['latitude'], res['longitude']
    lats.append(lat), lons.append(lon)

    res = pygc.great_circle(distance=ns_height, azimuth=0, latitude=lat, longitude=lon)
    lat, lon = res['latitude'], res['longitude']
    lats.append(lat), lons.append(lon)

    res = pygc.great_circle(distance=ew_width, azimuth=90, latitude=lat, longitude=lon)
    lat, lon = res['latitude'], res['longitude']
    lats.append(lat), lons.append(lon)
    
    return {'lats':lats,'lons':lons}
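
The box construction above can be cross-checked without `pygc`. The following is only a rough sketch, not the method used in this notebook: it assumes a spherical Earth with a mean radius of 6,371 km (`pygc` works on an ellipsoid), and the name `approx_bbox` is hypothetical. It converts the half-width and half-height from metres to degrees and returns the extent in the `west`/`south`/`east`/`north` form used later for `load_collection`.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean Earth radius, spherical model


def approx_bbox(lat, lon, ew_width, ns_height):
    """Approximate a box of ew_width x ns_height metres centred on (lat, lon)."""
    # Half the north-south size, converted from metres to degrees of latitude
    half_dlat = math.degrees((ns_height / 2) / EARTH_RADIUS_M)
    # Half the east-west size; a degree of longitude shrinks with cos(latitude)
    half_dlon = math.degrees(
        (ew_width / 2) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    )
    return {
        "west": lon - half_dlon,
        "south": lat - half_dlat,
        "east": lon + half_dlon,
        "north": lat + half_dlat,
        "crs": "EPSG:4326",
    }


# First site of the CSV shown above (k_1), with a 2000 m x 2000 m box
print(approx_bbox(-34.7310, 116.2081, ew_width=2000, ns_height=2000))
```

For boxes of a few kilometres the result should differ from the great-circle version only slightly, which makes it a convenient sanity check on the corner coordinates.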

In [29]:
# loc_lat, loc_lon as float arrays (np.asfarray was removed in NumPy 2.0)
loc_lat = np.asarray(lat, dtype=float)
loc_lon = np.asarray(lon, dtype=float)
len(loc_lat),len(loc_lon)

(179, 179)

In [30]:
# define child process, use ProcessBuilder
def scale_function(x: ProcessBuilder):
    return x.linear_scale_range(0, 6000, 0, 255)
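
To make the effect of `scale_function` concrete, here is a plain-Python sketch of what the openEO `linear_scale_range` process is specified to do for a single value: clip the input to the input range, then map it linearly onto the output range. This is an illustration of the formula, not the back-end's implementation, and it shadows the process name only for readability.

```python
def linear_scale_range(x, input_min, input_max, output_min=0.0, output_max=1.0):
    """Clip x to [input_min, input_max], then rescale it linearly."""
    clipped = min(max(x, input_min), input_max)
    fraction = (clipped - input_min) / (input_max - input_min)
    return fraction * (output_max - output_min) + output_min


# Same ranges as scale_function: 0..6000 mapped onto 0..255
for value in (0, 3000, 6000, 9000):
    print(value, linear_scale_range(value, 0, 6000, 0, 255))
```

Values above 6000 are clipped, so 9000 maps to the same output as 6000.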

### Sentinel 1 Data
This set of images includes the VV and VH bands.

In [32]:
for lat, lon, code in tqdm(zip(loc_lat,loc_lon, sitelist), total = len(loc_lat)):
    box = latLonBoxByWandH(lat,lon,ew_width,ns_height)
    # print(box)
    
    spatial_extent = {
        "west": min(box["lons"]),
        "south": min(box["lats"]),
        "east": max(box["lons"]),
        "north": max(box["lats"]),
        "crs": "EPSG:4326",
    }
    # # print(spatial_extent)
    
    s1_cube = connection.load_collection(
        "SENTINEL1_GRD",
        temporal_extent= t,
        spatial_extent= spatial_extent,
        bands=["VV", "VH"],
    )

    # polygon_geom = Polygon(zip( box['lons'], box['lats']))
    # s1_clipped_cube = s1_cube.filter_spatial(polygon_geom)
    
    s1_cube = s1_cube.sar_backscatter(coefficient='sigma0-ellipsoid')

    s1_cube.download(f"sent1_2bands(2020)/S_{code}_2017.tif")
    
    

  1%|          | 1/179 [00:31<1:33:28, 31.51s/it]
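
With roughly half a minute per site, the loop above runs for a long time, and an interruption means starting over. One possible way to make it resumable is sketched below. It is an assumption-laden helper, not part of the original notebook: the names `output_path` and `pending_sites` are hypothetical, and the `year` parameter is introduced only for illustration. The helper creates the target folder if needed and lists the sites that have no output file yet.

```python
import tempfile
from pathlib import Path


def output_path(directory, code, year):
    """Build the GeoTIFF path for one site and make sure its folder exists."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"S_{code}_{year}.tif"


def pending_sites(codes, directory, year):
    """Return the site codes whose output file is not on disk yet."""
    return [c for c in codes if not output_path(directory, c, year).exists()]


# Demo in a throwaway folder so nothing is written to the working directory
with tempfile.TemporaryDirectory() as tmp:
    first = output_path(tmp, "k_68", 2020)
    first.write_bytes(b"")  # pretend this site was already downloaded
    print(pending_sites(["k_68", "k_69"], tmp, 2020))
```

In the download loop, iterating only over the pending sites and passing `output_path(...)` to `download()` would let a rerun skip files that already exist.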

### Sentinel 2 Data
This set of images includes bands B02, B03, B04, B05, B06, B07, B08, B8A, B11, B12 and SCL. The SCL (scene classification) band is used to mask cloud shadows and clouds in the dataset.

In [43]:
for lat, lon, code in tqdm(zip(loc_lat,loc_lon, sitelist), total = len(loc_lat)):
    box = latLonBoxByWandH(lat,lon,ew_width,ns_height)
    # print(box)
    
    spatial_extent = {
        "west": min(box["lons"]),
        "south": min(box["lats"]),
        "east": max(box["lons"]),
        "north": max(box["lats"]),
        "crs": "EPSG:4326",
    }
    # # print(spatial_extent) 
    
    s2_cube = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=t,
        spatial_extent=spatial_extent,
        bands=["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12", "SCL"],
        max_cloud_cover=60,
    )
        
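    # SCL classes 3 (cloud shadow), 8 and 9 (cloud, medium/high probability)
    scl_band = s2_cube.band("SCL")
    cloud_mask = (scl_band == 3) | (scl_band == 8) | (scl_band == 9)

    # Align the mask with the cube's grid, then blank out the flagged pixels
    cloud_mask = cloud_mask.resample_cube_spatial(s2_cube)
    cube_masked = s2_cube.mask(cloud_mask)

    # Average over time and rescale to the 0-255 range
    s2_cube = cube_masked.mean_time()
    s2_cube = s2_cube.apply(scale_function)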
    
    s2_cube.download(f"sent2_8bands(2020)/S_{code}_2017.tif")

  0%|          | 0/70 [00:00<?, ?it/s]Preflight process graph validation raised:
[MissingProduct] Tile 'S2A_MSIL2A_20200103T001101_N0500_R073_T55GDP_20230426T062103' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2A_MSIL2A_20200103T001101_N0213_R073_T55GDP_20200103T021705' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2B_MSIL2A_20200105T000239_N0500_R030_T55GDP_20230421T144219' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2B_MSIL2A_20200105T000239_N9999_R030_T55GDP_20230814T205832' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2B_MSIL2A_20200105T000239_N0213_R030_T55GDP_20200105T014904' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2B_MSIL2A_20200108T001109_N0213_R073_T55GDP_20200108T014752' in collection 'SENTINEL2_L2A' is not available.
[MissingProduct] Tile 'S2B_MSIL2A_20200108T001109_N9999_R073_T55GDP_20230817T024822' in collection 'SENTINEL2_L2A' is not
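
The masking and temporal averaging in the Sentinel-2 loop can be pictured for a single pixel as follows. This is a plain-Python sketch of the idea only, under the assumption that masked observations are simply left out of the mean; the function name `masked_temporal_mean` and the sample values are made up for illustration.

```python
# SCL classes treated as unusable: 3 = cloud shadow,
# 8 = cloud (medium probability), 9 = cloud (high probability)
CLOUD_CLASSES = {3, 8, 9}


def masked_temporal_mean(values, scl):
    """Mean of one pixel's time series, skipping dates flagged by SCL."""
    kept = [v for v, c in zip(values, scl) if c not in CLOUD_CLASSES]
    # No clear observation at all: return None instead of dividing by zero
    return sum(kept) / len(kept) if kept else None


# Four acquisition dates for one pixel; dates 2 and 4 are cloud or shadow
reflectance = [1200, 5400, 1300, 1250]
scl_classes = [4, 9, 4, 3]
print(masked_temporal_mean(reflectance, scl_classes))  # 1250.0
```

Only the two clear observations (1200 and 1300) contribute, so the bright cloudy value of 5400 does not inflate the average.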

In [45]:
ds = xarray.load_dataset("../../../S30E170_ESACCI-BIOMASS-L4-AGB_SD-MERGED-100m-2020-fv4.0.tiff")
ds