In [None]:
!pip install --quiet climetlab

# Creating a shared dataset of GRIBs (Optional notebook)

This notebook illustrates the process of creating a dataset directly from CDS/MARS data.

This notebook is optional because the CliMetLab feature it uses is experimental: the final API may change slightly, and the data **may** need to be reindexed when changing CliMetLab versions. For now, use it for research, not for operations.

In [None]:
import climetlab as cml

## Download data to the climetlab cache

In [None]:
for month in range(1, 13): # This takes a few minutes.
    cml.load_source(
        "mars",
        param=["2t"],
        levtype="sfc",
        area=[50, -50, 20, 50],
        grid=[1, 1],
        date=f"2012-{month}",
    )
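If a request needs explicit first and last days for each month instead of a `YYYY-M` string, the date ranges can be built with the standard library. This is only a sketch: the helper name `month_date_range` is hypothetical, and it assumes the MARS `first/to/last` date-range syntax is accepted for the `date` argument.

```python
import calendar
from datetime import date


def month_date_range(year: int, month: int) -> str:
    """Return 'YYYY-MM-DD/to/YYYY-MM-DD' covering one calendar month.

    Assumption: the target service accepts 'first/to/last' date ranges.
    """
    # monthrange returns (weekday of the first day, number of days in the month)
    _, last_day = calendar.monthrange(year, month)
    first = date(year, month, 1)
    last = date(year, month, last_day)
    return f"{first.isoformat()}/to/{last.isoformat()}"


# One range per month of 2012, matching the loop above.
ranges = [month_date_range(2012, m) for m in range(1, 13)]
print(ranges[1])  # 2012-02-01/to/2012-02-29 (2012 is a leap year)
```

Each string in `ranges` could then be passed as `date=...` in place of `f"2012-{month}"`, if the request turns out to need it.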


In [None]:

cml.load_source(
    "mars",
    param="msl",
    levtype="sfc",
    area=[50, -50, 20, 50],
    grid=[1, 1],
    date="2012-12-01",
);

## Export the data to a shared directory

This step is optional: you can keep working on the data from the cache if you are the only user of the data and you do not mind redownloading it later.
Other people should not use your cache:
- When using climetlab, the cache will eventually fill up and the data may be deleted automatically.
- You will need to deal with permission issues.
- It will make it difficult to share the data with other people.

Let us export the data to a shared directory, `shared-data/temperature-for-analysis`:

In [None]:
# Some housekeeping
!rm -rf shared-data/temperature-for-analysis
!mkdir -p shared-data/temperature-for-analysis
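The same housekeeping can be done in Python where shell commands are not available (for example, outside a notebook). The sketch below mirrors `rm -rf` followed by `mkdir -p`. The helper name `reset_directory` is hypothetical, and the demonstration runs in a temporary directory so that nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def reset_directory(path):
    """Delete `path` if it exists, then recreate it empty."""
    target = Path(path)
    if target.exists():
        shutil.rmtree(target)  # equivalent of `rm -rf`
    target.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    return target


# Demonstration in a throwaway location.
with tempfile.TemporaryDirectory() as base:
    target = reset_directory(Path(base) / "shared-data" / "temperature-for-analysis")
    (target / "stale.grib").touch()  # simulate a leftover file
    target = reset_directory(target)
    leftover = list(target.iterdir())
    print(leftover)  # []
```

For the real directory, the call would be `reset_directory("shared-data/temperature-for-analysis")`. Note that it removes everything under that path.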

In [None]:
# Export all data in my cache that comes from MARS and is not older than 1 hour
!climetlab export_cache shared-data/temperature-for-analysis --newer 1h --match mars

## Create indexes to speed up data access (Optional)

In [None]:
!climetlab index_directory shared-data/temperature-for-analysis

In [None]:
!climetlab availability shared-data/temperature-for-analysis

## Using the data


In [None]:
DATA = "shared-data/temperature-for-analysis"

In [None]:
source = cml.load_source('directory', DATA)

In [None]:
source.availability

This is a good time to check the data. Is all the data there? Are any dates missing? Any parameters?
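One way to check for missing dates is to compare the dates actually present against the full expected range. The sketch below uses only the standard library. The helper name `missing_dates` and the sample `found` list are hypothetical, and how the list of dates is extracted from the source is left out.

```python
from datetime import date, timedelta


def missing_dates(found, start, end):
    """Return the days in [start, end] (inclusive) that do not appear in `found`."""
    present = set(found)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in present]


# Hypothetical example: 3 January and 5 January are absent.
found = [date(2012, 1, 1), date(2012, 1, 2), date(2012, 1, 4)]
gaps = missing_dates(found, date(2012, 1, 1), date(2012, 1, 5))
print(gaps)  # [datetime.date(2012, 1, 3), datetime.date(2012, 1, 5)]
```

An empty result means every day in the range is covered. The same comparison can be applied to parameters by replacing dates with parameter names.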

The data is ready to be used as a NumPy, TensorFlow, or xarray object.

In [None]:
source.sel(param='msl').to_numpy().mean()

In [None]:
cml.load_source('directory', DATA, param='msl').to_numpy().mean()

In [None]:
temp = source.sel(param='2t').order_by('date')
temp.to_tfdataset()

In [None]:
temp.to_xarray()

# Exercise

1 - Download some data from the EFAS seasonal reforecast dataset from the CDS (https://cds.climate.copernicus.eu/cdsapp#!/dataset/efas-seasonal-reforecast?tab=form). 

2 - Export the data to a shared location.

**Hint:**

  You will need to create a CDS account.
  
  You may start with a request such as:

In [None]:
import climetlab as cml
ds = cml.load_source('cds', 'efas-seasonal-reforecast',
    **{
        'variable': 'volumetric_soil_moisture',
        'model_levels': 'soil_levels',
        'soil_level': [ '1', '2', '3', ],
        'hyear': '2019',
        'hmonth': '05',
        'leadtime_hour': [ '24', '48', '72'],
        'format': 'grib',
    })

3 - To go further:

Create a dataset plugin to distribute your dataset. (Please do not upload your test dataset to PyPI; use https://test.pypi.org for testing and learning purposes.)