## Imports

In [1]:
import pyearthtools.data
import tempfile

## Variables

In [2]:
var = '2t'                 # ERA5 short name for 2 metre temperature
doi = '2021-01-01T0100'    # timestamp to retrieve

## Catalog
For easy repeatability and sharing of DataIndex configurations, it is possible to create a Catalog of DataIndexes. 

This can then be saved and reloaded for use.

Also included is a `Default_Catalog`, which lives at the top level of the package and therefore keeps its state for as long as `pyearthtools.data` is loaded.

In [3]:
pyearthtools.data.Default_Catalog

### Creating a New Entry
Ultimately, a catalog entry can be any function or class. The CatalogEntry simply stores its path and the parameters to be passed to it upon initialisation or execution. **For reuse, however, these parameters cannot be objects.**
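To make the restriction on parameters concrete, here is a minimal sketch of the idea using only the standard library. `ToyEntry` is a hypothetical stand-in, not the pyearthtools `CatalogEntry`, and it assumes that entries are written out as JSON: it stores a dotted import path plus plain parameters, resolves the path when called, and fails to serialise once an arbitrary object is passed in.

```python
import importlib
import json


class ToyEntry:
    """Hypothetical stand-in for a catalog entry; not the pyearthtools class."""

    def __init__(self, path, *args, **kwargs):
        self.path = path          # dotted import path, e.g. "fractions.Fraction"
        self.args = list(args)
        self.kwargs = kwargs

    def to_json(self):
        # Only plain values (str, int, list, dict, ...) survive this step.
        return json.dumps(
            {"path": self.path, "args": self.args, "kwargs": self.kwargs}
        )

    def __call__(self):
        # Resolve the dotted path, then call the target with the stored parameters.
        module_name, _, attr = self.path.rpartition(".")
        target = getattr(importlib.import_module(module_name), attr)
        return target(*self.args, **self.kwargs)


entry = ToyEntry("fractions.Fraction", 3, 4)
value = entry()           # same as calling fractions.Fraction(3, 4)
text = entry.to_json()    # plain parameters serialise without trouble

try:
    ToyEntry("fractions.Fraction", object()).to_json()
    serialisable = True
except TypeError:
    serialisable = False  # an arbitrary object cannot be written to JSON
```

Whatever the real implementation looks like, the same constraint applies: parameters that cannot be written to a file cannot be restored when the catalog is reloaded.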

We shall show use cases using the DataIndexes here.

First, let's set up a basic ERA5 index as a catalog entry and show how to use it.

In [4]:
CatEntry = pyearthtools.data.CatalogEntry(pyearthtools.data.archive.ERA5,'ERA5', var, level = 'single')
CatEntry

A Catalog Entry can be called just like any other DataIndex

In [5]:
CatEntry(doi)

| | Array | Chunk |
|---|---|---|
| Bytes | 3.96 MiB | 500.07 kiB |
| Shape | (1, 721, 1440) | (1, 253, 506) |
| Dask graph | 9 chunks in 3 graph layers | 9 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


### Adding to a Catalog
A catalog behaves like a combination of a dict and a list, so new entries can be added to it and later retrieved by name.

A Catalog and a CatalogEntry can be added together with `+`.

In [6]:
new_catalog = pyearthtools.data.Default_Catalog + CatEntry
new_catalog

To then retrieve an entry from the Catalog, simply use its key/name.

In [7]:
new_catalog.ERA5
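The behaviour used in this section (adding with `+`, then retrieving by name) can be sketched with a small dict-backed container. `ToyCatalog` is hypothetical and is not the pyearthtools `Catalog`. In particular, the choice to return a new catalog from `+` rather than modify the original is an assumption of this sketch, as is passing entries as `(name, value)` pairs.

```python
class ToyCatalog:
    """Hypothetical dict-backed container; not the pyearthtools Catalog."""

    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def __add__(self, item):
        # item is a (name, value) pair; build a new catalog and leave self untouched
        name, value = item
        return ToyCatalog({**self._entries, name: value})

    def __getitem__(self, name):
        # Key-style access: catalog["ERA5"]
        return self._entries[name]

    def __getattr__(self, name):
        # Attribute-style access: catalog.ERA5
        # Only reached when normal attribute lookup fails.
        try:
            return self.__dict__["_entries"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


base = ToyCatalog()
extended = base + ("ERA5", "entry-placeholder")

by_attribute = extended.ERA5
by_key = extended["ERA5"]
names = list(extended)
```

Both access styles return the same stored object, which is why a name can double as an attribute and as a key.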

### DataIndex Catalogs
All inbuilt DataIndexes provide their own CatalogEntry, accessible from `.catalog`. They can also be appended directly to a catalog, with the option to override the name.

In [8]:
era5 = pyearthtools.data.archive.ERA5(var, level = 'single')
era5.catalog

In [9]:
new_catalog.append(era5, name = 'ERA5_DataIndex')
new_catalog

### Saving & Loading Catalogs
To provide the reusability mentioned above, catalogs can be saved to a file and reloaded later.

In [10]:
with tempfile.TemporaryDirectory() as tempdir:
    tfile = f"{tempdir}/test_cat.catalog"
    new_catalog.save(tfile)
    !cat "{tfile}"

{
    "ERA5": {
        "data_index": "pyearthtools.data.archive.ERA5.ERA5",
        "name": "ERA5",
        "args": [
            "2t"
        ],
        "kwargs": {
            "level": "single"
        }
    },
    "ERA5_DataIndex": {
        "data_index": "pyearthtools.data.archive.ERA5.ERA5",
        "name": "ERA5_DataIndex",
        "args": [],
        "kwargs": {
            "variables": [
                "2t"
            ],
            "level": "single"
        }
    }
}
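Judging by the output above, the saved file is plain JSON, so it can be inspected without pyearthtools. The sketch below assumes that format and parses a copy of the text shown above with the standard `json` module. It is meant for inspection only. `pyearthtools.data.Catalog.load` remains the way to rebuild working entries.

```python
import json

# Catalog file content copied from the output above.
saved = """
{
    "ERA5": {
        "data_index": "pyearthtools.data.archive.ERA5.ERA5",
        "name": "ERA5",
        "args": ["2t"],
        "kwargs": {"level": "single"}
    },
    "ERA5_DataIndex": {
        "data_index": "pyearthtools.data.archive.ERA5.ERA5",
        "name": "ERA5_DataIndex",
        "args": [],
        "kwargs": {"variables": ["2t"], "level": "single"}
    }
}
"""

catalog = json.loads(saved)

# Each entry records the import path of its target and the parameters to pass to it.
for name, entry in catalog.items():
    print(f"{name}: {entry['data_index']} args={entry['args']} kwargs={entry['kwargs']}")
```

Note that the two entries record the same variable differently: the manually created entry keeps `"2t"` as a positional argument, while the entry taken from the DataIndex stores it under the `variables` keyword.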

In [11]:
with tempfile.TemporaryDirectory() as tempdir:
    tfile = f"{tempdir}/test_cat.catalog"
    new_catalog.save(tfile)
    
    reloaded_catalog = pyearthtools.data.Catalog.load(tfile)
reloaded_catalog

In [12]:
reloaded_catalog['ERA5_DataIndex'](doi)

| | Array | Chunk |
|---|---|---|
| Bytes | 3.96 MiB | 500.07 kiB |
| Shape | (1, 721, 1440) | (1, 253, 506) |
| Dask graph | 9 chunks in 3 graph layers | 9 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
