In [None]:
!pip install --quiet climetlab

# Creating a shared dataset of GRIBs

In [1]:
import climetlab as cml

## Download data to the climetlab cache

In [None]:
for month in range(1, 13):  # This takes a few minutes.
    cml.load_source(
        "mars",
        param=["2t"],
        levtype="sfc",
        area=[50, -50, 20, 50],
        grid=[1, 1],
        date=f"2012-{month}",
    )
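As a rough sanity check on the size of each field, the sketch below works out the grid implied by this request. It assumes that `area` follows the MARS North/West/South/East order and that the regular lat-lon grid includes both boundaries. `grid_shape` is a hypothetical helper for illustration only; it is not part of climetlab.

```python
def grid_shape(area, grid):
    """Return (n_lat, n_lon) for a regular grid cropped to a MARS-style area.

    Hypothetical helper: assumes area = [north, west, south, east] and that
    both edges of the box are included in the grid.
    """
    north, west, south, east = area
    dlat, dlon = grid
    n_lat = int(round((north - south) / dlat)) + 1
    n_lon = int(round((east - west) / dlon)) + 1
    return n_lat, n_lon


shape = grid_shape([50, -50, 20, 50], [1, 1])
print(shape, shape[0] * shape[1])  # (31, 101) 3131
```

Under these assumptions, every `2t` or `msl` field in this request holds about 3131 values, so the download size is driven mainly by how many dates and times MARS returns.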

In [None]:
cml.load_source(
    "mars",
    param="msl",
    levtype="sfc",
    area=[50, -50, 20, 50],
    grid=[1, 1],
    date="2012-12-01",
);

## Export the data to a shared directory

This step is optional: you could keep working on the data from the cache if you are its only user and you do not mind re-downloading it later.
However, other people should not use your cache:
- When using climetlab, the cache will eventually fill up and the data may be deleted automatically.
- You will need to deal with permission issues.
- It will make it difficult to share the data with other people.

Let us export the data to the shared directory `shared-data/temperature-for-analysis`.

In [4]:
# Some housekeeping
!rm -rf shared-data/temperature-for-analysis
!mkdir -p shared-data/temperature-for-analysis
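If you prefer to do this housekeeping from Python instead of shell commands, the following is a minimal stdlib sketch. `reset_directory` is a hypothetical helper name, and it only approximates `rm -rf` followed by `mkdir -p`.

```python
import os
import shutil


def reset_directory(path):
    """Delete `path` and everything under it, then recreate it empty.

    Rough equivalent of `rm -rf path && mkdir -p path`.
    """
    shutil.rmtree(path, ignore_errors=True)  # like rm -rf: no error if missing
    os.makedirs(path, exist_ok=True)         # like mkdir -p: creates parents too
    return path


if __name__ == "__main__":
    reset_directory("shared-data/temperature-for-analysis")
```

Note that `ignore_errors=True` also hides genuine failures such as permission errors, just as `rm -rf` would not stop the next command. On a shared directory you may want to drop it so those problems surface.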

In [5]:
# Export all entries from my cache that come from MARS and are newer than 1 hour
!climetlab export_cache shared-data/temperature-for-analysis --newer 1h --match mars

Copying cache entries matching 'mars' and newer than '2023-03-11 13:29:29' to shared-data/temperature-for-analysis.
100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 367.98it/s]
Copied 13 cache entries to shared-data/temperature-for-analysis.
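To make the `--newer 1h --match mars` filter concrete, the sketch below models a cache listing as a list of `(name, timestamp)` pairs. It keeps the entries whose name contains the match string and whose timestamp is after "now minus one hour". This is a simplified, hypothetical model for illustration. It is not climetlab's implementation, and the entry names are invented.

```python
from datetime import datetime, timedelta


def select_entries(entries, newer_than, match, now=None):
    """Keep names containing `match` whose timestamp is after now - newer_than.

    Hypothetical model of a cache listing: `entries` is a list of
    (name, datetime) pairs, not climetlab's internal format.
    """
    if now is None:
        now = datetime.now()
    cutoff = now - newer_than  # e.g. --newer 1h -> one hour before now
    return [name for name, ts in entries if match in name and ts > cutoff]


now = datetime(2023, 3, 11, 14, 29, 29)
entries = [
    ("mars-2t-2012-01", now - timedelta(minutes=10)),
    ("mars-msl-2012-12-01", now - timedelta(minutes=5)),
    ("mars-old-request", now - timedelta(days=2)),   # too old
    ("url-download", now - timedelta(minutes=1)),    # does not match 'mars'
]
print(select_entries(entries, timedelta(hours=1), "mars", now=now))
# ['mars-2t-2012-01', 'mars-msl-2012-12-01']
```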


## Create indexes to speed up data access (optional)

In [None]:
!climetlab index_directory shared-data/temperature-for-analysis
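Conceptually, an index lets a query such as `param="msl"` jump straight to the matching GRIB messages instead of scanning every file. The sketch below illustrates the idea with an in-memory dictionary keyed on `(param, date)` that points to `(path, offset, length)` locations. The file names, offsets, and lengths are made-up values. climetlab's own on-disk index format is not reproduced here.

```python
from collections import defaultdict


def build_index(records):
    """Map (param, date) -> list of (path, offset, length) locations.

    Hypothetical structure for illustration; records are
    (param, date, path, offset, length) tuples.
    """
    index = defaultdict(list)
    for param, date, path, offset, length in records:
        index[(param, date)].append((path, offset, length))
    return index


def lookup(index, param, date):
    """Return the locations for one (param, date), or [] if absent."""
    return index.get((param, date), [])


records = [
    ("2t", "2012-01-01", "a.grib", 0, 100),
    ("2t", "2012-01-02", "a.grib", 100, 100),
    ("msl", "2012-12-01", "b.grib", 0, 100),
]
idx = build_index(records)
print(lookup(idx, "msl", "2012-12-01"))  # [('b.grib', 0, 100)]
```

The cost of building the index is paid once, up front. After that, each selection is a dictionary lookup followed by reading only the bytes it points to.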

In [None]:
!climetlab availability shared-data/temperature-for-analysis

## Using the data


In [18]:
DATA = "shared-data/temperature-for-analysis"

In [19]:
source = cml.load_source("indexed-directory", DATA)

In [20]:
source.availability

This is a good time to check the data: is all of it here? Are there missing dates? Missing parameters?
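The missing-dates part of that check is easy to automate. The sketch below assumes you already have the available days for one parameter (for example `2t`) as a set of `datetime.date` objects. Extracting that set from `source.availability` is not shown, and `missing_dates` is a hypothetical helper.

```python
from datetime import date, timedelta


def missing_dates(available, start, end):
    """Return the days in [start, end] (inclusive) absent from `available`."""
    missing = []
    day = start
    while day <= end:
        if day not in available:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Example: every day of 2012 (a leap year) except 29 February.
available = {date(2012, 1, 1) + timedelta(days=i) for i in range(366)}
available.discard(date(2012, 2, 29))
print(missing_dates(available, date(2012, 1, 1), date(2012, 12, 31)))
# [datetime.date(2012, 2, 29)]
```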

The data is now ready to be used as a NumPy array, a TensorFlow dataset or an xarray object.

In [11]:
source.sel(param="msl").to_numpy().mean()

101725.47522756307

In [22]:
cml.load_source("indexed-directory", DATA, param="msl").to_numpy().mean()

101725.47522756307

In [23]:
temp = source.sel(param="2t").order_by("date")
temp.to_tfdataset()

<PrefetchDataset element_spec=TensorSpec(shape=<unknown>, dtype=tf.float32, name=None)>

In [24]:
temp.to_xarray()

In [25]:
# Note: the availability shown here is wrong (not implemented yet)
temp.availability