# Loading ERA5

This notebook serves to subset and load ERA5 data from Google's public ERA5 analysis-ready, cloud-optimised (ARCO) mirror. Here, ERA5 data for the 1940-01-01 to 2025-12-31 period (continually, if irregularly, updated) at hourly frequency is stored in Zarr format. Beyond format, the sole difference between data available therein and that through the Copernicus Climate Data Store (CDS) is variable naming: longnames are used in the former and shortnames in the latter.

By default, subset data is written to the default blob storage container for the workspace, "workspaceblobstore".

This notebook should be copied / uploaded to the Notebooks area of an Azure Machine Learning and run therein. It does not require GPU capable compute and, due to lazy loading of extracted data and streamed write of the selected subset, is less memory intensive.

In [3]:
%pip install xarray fsspec gcsfs adlfs zarr dask azure-ai-ml azure-identity microsoft-aurora

Collecting microsoft-aurora
  Downloading microsoft_aurora-1.8.0-py3-none-any.whl.metadata (11 kB)
Collecting einops (from microsoft-aurora)
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting huggingface-hub (from microsoft-aurora)
  Downloading huggingface_hub-1.3.1-py3-none-any.whl.metadata (13 kB)
Collecting netcdf4 (from microsoft-aurora)
  Downloading netcdf4-1.7.4-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Collecting timm (from microsoft-aurora)
  Downloading timm-1.0.24-py3-none-any.whl.metadata (38 kB)
Collecting torch (from microsoft-aurora)
  Downloading torch-2.9.1-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting hf-xet<2.0.0,>=1.2.0 (from huggingface-hub->microsoft-aurora)
  Downloading hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting httpx<1,>=0.23.0 (from huggingface-hub->microsoft-aurora)
  Downloading httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)


In [10]:
import sys
from datetime import timezone, datetime
from pathlib import Path
from uuid import uuid4

import numpy as np
import xarray as xr  # also requires zarr, gcsfs, dask
from adlfs import AzureBlobFileSystem
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data, Datastore
from azure.core.exceptions import ResourceNotFoundError
from azure.identity import DefaultAzureCredential
from fsspec import FSMap

# insert parent directory to path for proper absolute local imports
sys.path.insert(0, str(Path.cwd().parent.parent.resolve()))
from setup.common.utils import get_aml_ci_env_vars
from setup.components.common.constants import (
    ATMOS_LEVELS,
    ATMOS_VAR_MAP,
    SFC_VAR_MAP,
    STATIC_VAR_MAP,
)

Define the GCP ERA5 dataset from which to extract a subset.

NOTE: See the [GCP ERA5 ARCO bucket](https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5) for other datasets including alternatively gridded Zarr and raw source NetCDF files. Not all datasets contain every variable or the same time range / frequency.

In [3]:
GCP_ERA5_PATH = "gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3"

Define parameters dictating timestamps to load, namely start and end date and timestep (frequency).

NOTE: At time of writing, data up to 2050-12-31 appears to have been pre-allocated, producing nan-filled arrays when queried. Ensure the time period requested contains valid data.

In [7]:
START_DATE = datetime(2025, 1, 1, 0, tzinfo=timezone.utc)
END_DATE = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
FREQUENCY = 6

Define the surface and pressure level variables and pressure levels to load. Variable longnames are mapped to shortnames for convenience (particularly when reading data into Aurora `Batch` objects) and are non-functional.

To ingest new variables and levels, add the former by longname to the appropriate dictionary and the latter by integer pressure level (hPa) to the given list.

NOTE: The two variable mappings are not strict. That is, single-level variables can be added to the pressure level variable mapping without error, they simply afford separation and readability. For the minimum set of variables required for Aurora 0.25 pre-trained, see `setup/components/common/constants.py`.

In [8]:
EXTRA_SFC_VARS = {
    "volumetric_soil_water_layer_1": "swvl1",
    "volumetric_soil_water_layer_2": "swvl2",
}
EXTRA_ATMOS_VARS = {}
EXTRA_LEVELS = []

Lazy load and subset data by variables, levels, time range, and timestep.

NOTE: This will take at least 1 minute regardless of subset size due to the need to load all remote metadata which, for an archive of this volume, comprises several GB.

In [13]:
ds = xr.open_zarr(GCP_ERA5_PATH, chunks={})

# separately extract static and dynamic variable subsets, update attrs to reflect subset
static_subset_ds = ds[STATIC_VAR_MAP.keys()].sel(time=np.datetime64(START_DATE))
ds.attrs.update(
    valid_time_start=START_DATE.isoformat(),
    valid_time_stop=START_DATE.isoformat(),
)

dynamic_vars = [
    *SFC_VAR_MAP.keys(),
    *EXTRA_SFC_VARS.keys(),
    *ATMOS_VAR_MAP.keys(),
    *EXTRA_ATMOS_VARS.keys(),
]
dynamic_subset_ds = ds[dynamic_vars].sel(
    time=slice(np.datetime64(START_DATE), np.datetime64(END_DATE), FREQUENCY),
    level=ATMOS_LEVELS + EXTRA_LEVELS,
)
dynamic_subset_ds.attrs.update(
    valid_time_start=START_DATE.isoformat(),
    valid_time_stop=END_DATE.isoformat(),
)
dynamic_subset_ds

  static_subset_ds = ds[STATIC_VAR_MAP.keys()].sel(time=np.datetime64(START_DATE))
  time=slice(np.datetime64(START_DATE), np.datetime64(END_DATE), FREQUENCY),


Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 3 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 3 graph layers,3 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


Obtain and create necessary environment parameters and Azure interface objects.

In [14]:
az_cred = DefaultAzureCredential()
sub_id, rg_name, ws_name = get_aml_ci_env_vars()
ml_client = MLClient(
    credential=az_cred,
    subscription_id=sub_id,
    resource_group_name=rg_name,
    workspace_name=ws_name,
)

Define location and write subset data using the default workspace blob storage container ("workspaceblobstore"), the corresponding storage account, and a UUID v4 store name to avoid inadvertent naming collisions.

NOTE: The filesystem object and mapper can be avoided by using "abfs://" protocol paths and the `storage_options` parameter of `.to_zarr()`, though doing so can result in bugs from event loops created and managed by `xarray` / `zarr` and `fsspec` / `adlfs`. For example:
```python
subset_ds.to_zarr(
    f"abfs://{dst_datastore.container_name}/{uuid4()}.zarr",
    mode="w",
    compute=True,
    consolidated=True,
    zarr_format=2,
    storage_options={
        "credential": DefaultAzureCredential(),
        "account_name": dst_datastore.account_name,
    },
)
```

In [15]:
def write_data(
    ds: xr.Dataset,
    path: str,
    datastore: Datastore,
    fs: AzureBlobFileSystem,
) -> FSMap:
    """Write an xarray Dataset to Zarr in a given datastore and path.

    Parameters
    ----------
    ds : xarray.Dataset
        Dataset to write.
    path : str
        Path within the datastore to write the dataset to (i.e. below container level).
    datastore : Datastore
        Datastore to write the dataset to.
    fs : adlfs.AzureBlobFileSystem
        Filesystem object for the datastore.

    Returns
    -------
    mapper : fsspec.FSMap
        Filesystem mapper pointing to the written Zarr dataset.

    """
    mapper = fs.get_mapper(f"{datastore.container_name}/{path}")
    ds.to_zarr(
        mapper,
        mode="w",
        compute=True,
        consolidated=True,
        zarr_format=2,
    )
    print(
        f"Wrote to: account={datastore.account_name}, "
        f"container={datastore.container_name}, store={path}",
    )
    return mapper

In [16]:
dst_datastore = ml_client.datastores.get("workspaceblobstore")
path_template = f"aurora-workshop/input/{uuid4()}/workshop_era5_{{data_type}}.zarr"
static_path = path_template.format(data_type="static")
dynamic_path = path_template.format(data_type="dynamic")

fs = AzureBlobFileSystem(dst_datastore.account_name, credential=az_cred)
write_data(static_subset_ds, static_path, dst_datastore, fs)
dynamic_store = write_data(dynamic_subset_ds, dynamic_path, dst_datastore, fs)

Wrote to: account=amldataplatfor2828763095, container=azureml-blobstore-4703d702-8507-4a39-b60d-b522261ab74a, store=aurora-workshop/input/1ef5ceb0-95cc-4490-875d-23eb3f63c71d/workshop_era5_static.zarr
Wrote to: account=amldataplatfor2828763095, container=azureml-blobstore-4703d702-8507-4a39-b60d-b522261ab74a, store=aurora-workshop/input/1ef5ceb0-95cc-4490-875d-23eb3f63c71d/workshop_era5_dynamic.zarr


Confirm persisted data is available and valid.

NOTE: An equality check with the original `subset_ds` (e.g. `new_ds.equals(subset_ds)`) can be used for the avoidance of doubt but requires loading data into memory, which may take time and result in an OOM error, subset size dependent.

In [17]:
xr.open_dataset(dynamic_store, engine="zarr", chunks={})

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 154.46 MiB 51.49 MiB Shape (3, 13, 721, 1440) (1, 13, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",3  1  1440  721  13,

Unnamed: 0,Array,Chunk
Bytes,154.46 MiB,51.49 MiB
Shape,"(3, 13, 721, 1440)","(1, 13, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 11.88 MiB 3.96 MiB Shape (3, 721, 1440) (1, 721, 1440) Dask graph 3 chunks in 2 graph layers Data type float32 numpy.ndarray",1440  721  3,

Unnamed: 0,Array,Chunk
Bytes,11.88 MiB,3.96 MiB
Shape,"(3, 721, 1440)","(1, 721, 1440)"
Dask graph,3 chunks in 2 graph layers,3 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


Define and create / update the Azure Machine Learning data asset entity for persisted data.

In [18]:
def generate_asset(name: str, description: str, path: str) -> None:
    """Create or update an AML data asset.

    Parameters
    ----------
    name : str
        Name of the data asset.
    description : str
        Description of the data asset.
    path : str
        Path to data within the datastore (i.e. below container level).

    """
    try:
        new = int(next(ml_client.data.list(name=name)).version) + 1
    except ResourceNotFoundError:
        new = 1
    asset = Data(
        name=name,
        version=str(new),
        description=description,
        path=f"azureml://subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}/datastores/{dst_datastore.name}/paths/{path}",
    )
    ml_client.data.create_or_update(asset)
    print(
        f"Created or updated asset: name={asset.name}, version={asset.version}, "
        f"path={asset.path}",
    )

In [None]:

for asset_type in ["static", "dynamic"]:
    asset_name = f"gcp-era5-{asset_type}"
    asset_description = "Subset of static ERA5 variables from the GCP ERA5 ARCO dataset."
    generate_asset(asset_name, asset_description, static_path)

Created or updated asset: name=workshop-gcp-era5-static, version=1, path=azureml://subscriptions/62118f5c-be37-400f-9f20-a8b77a2a7877/resourcegroups/aml-data-platform-poc-rg/workspaces/aml-data-platform-poc-hub-ws/datastores/workspaceblobstore/paths/aurora-workshop/input/1ef5ceb0-95cc-4490-875d-23eb3f63c71d/workshop_era5_static.zarr
Created or updated asset: name=workshop-gcp-era5-dynamic, version=1, path=azureml://subscriptions/62118f5c-be37-400f-9f20-a8b77a2a7877/resourcegroups/aml-data-platform-poc-rg/workspaces/aml-data-platform-poc-hub-ws/datastores/workspaceblobstore/paths/aurora-workshop/input/1ef5ceb0-95cc-4490-875d-23eb3f63c71d/workshop_era5_static.zarr
