# Sharing code next to the data with CliMetLab

Relevant CliMetLab documentation is here: https://climetlab.readthedocs.io/en/latest/contributing/overview.html

# Sharing code




**Exercise**:

Create a dataset plugin for the data files `forecast_error.csv` and `soil_temperature.csv` (located next to this notebook).

- Step 1: Create the plugin boilerplate structure using climetlab-plugin-tools.
- Step 2: Add your code to the plugin.

In [None]:
!ls *.csv

#### Step 1: Create the plugin boilerplate structure using climetlab-plugin-tools.

In [None]:
!climetlab help

In [None]:
!pip install climetlab-plugin-tools --quiet

In [None]:
!climetlab help

In [None]:
# From a shell terminal:
# $ climetlab
# (climetlab) create_plugin_dataset
# Answer questions...

Questions:
- What is the convention for a CliMetLab plugin package?
- What is the convention for a dataset name?

We need feedback:
- How easy is creating a plugin?

#### Step 2: Add your code to the plugin.
Here is the file you want to edit.

In [None]:
!ls climetlab-*/climetlab_*/*.py

In [None]:
@normalize("parameter", ["tp", "t2m"])
def __init__(self, year, parameter):
    request = dict(parameter=parameter, url=URL, year=year)
    self.source = cml.load_source("url-pattern", PATTERN, request)

Solution:

In [None]:
def __init__(self, parameter):
    self.source = cml.load_source("file", parameter + '.csv')
    # For a real plugin use "url" or "url-pattern" sources:
    # self.source = cml.load_source("url", URL_PREFIX + parameter + '.csv')
    # self.source = cml.load_source("url-pattern", PATTERN, {"parameter": parameter})

Let us test this:

In [None]:
import climetlab as cml 
ds = cml.load_dataset('my-plugin', parameter = 'soil_temperature')

**Exercise**: What is not working? What is missing?

Solution: 

The plugin's `pip` package needs to be installed (suggested: `pip install -e ./climetlab-my-plugin/.`, then restart the kernel).
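
Installation matters because a plugin is found through the metadata of installed packages, not through files lying next to the notebook. The sketch below lists what is registered in the current environment using only the standard library. The entry-point group name `climetlab.datasets` is an assumption here and should be checked against the `setup.py` generated by climetlab-plugin-tools; `registered_plugins` is a hypothetical helper, not part of CliMetLab.

```python
from importlib.metadata import entry_points


def registered_plugins(group):
    """Return the sorted names of the entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Assumed group name; the list is empty until a plugin package is installed.
print(registered_plugins("climetlab.datasets"))
```

If `my-plugin` does not appear in this list after `pip install -e`, the kernel most likely still needs a restart.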

In [None]:
import climetlab as cml 
cml.load_dataset('my-plugin', parameter = 'soil_temperature').to_pandas()


In [None]:
cml.load_dataset('my-plugin', parameter = 'forecast_error').to_pandas()

# Improving data usability:
Data can be accessed as a pandas DataFrame. Can we do better to help end users handle the data?

What about helping them fix a typo?

In [None]:
import climetlab as cml
cml.load_dataset('my-plugin', parameter = 'soiltemperature')

In [None]:
# Add the climetlab decorator `@normalize`
@normalize("parameter", ['soil_temperature', 'forecast_error'])
def __init__(self, parameter):
    ...

# And retry previous cell.

This also takes care of users typing capitals:

In [None]:
import climetlab as cml
ds = cml.load_dataset('my-plugin', parameter = 'SOIL_TEMPERATURE') # ok
ds = cml.load_dataset('my-plugin', parameter = 'Soil_Temperature') # ok
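
To make the idea concrete, here is a minimal plain-Python sketch of what such a decorator could do: match the value case-insensitively against the allowed list and suggest the closest valid name on failure. This is not CliMetLab's implementation; `normalize_choice` and `filename_for` are hypothetical names used for illustration only.

```python
import difflib
import functools


def normalize_choice(name, choices):
    """Hypothetical stand-in for a choice-normalizing decorator."""
    lookup = {choice.lower(): choice for choice in choices}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # For brevity, only keyword arguments are normalized in this sketch.
            if name in kwargs:
                value = str(kwargs[name])
                key = value.lower()
                if key not in lookup:
                    close = difflib.get_close_matches(key, list(lookup), n=1)
                    hint = f" Did you mean {lookup[close[0]]!r}?" if close else ""
                    raise ValueError(f"Invalid value {value!r} for {name!r}.{hint}")
                kwargs[name] = lookup[key]
            return func(*args, **kwargs)

        return wrapper

    return decorator


@normalize_choice("parameter", ["soil_temperature", "forecast_error"])
def filename_for(parameter):
    return parameter + ".csv"


print(filename_for(parameter="SOIL_TEMPERATURE"))  # soil_temperature.csv
```

Calling `filename_for(parameter="soiltemperature")` raises a `ValueError` whose message suggests `'soil_temperature'`.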

**Exercise**:

Adapt the plugin to use the zipped csv files instead of the csv files.

In [None]:
!ls *.zip

Solution:

Replace "csv" with "zip".

# Date and time parameters
Dates and times are so ubiquitous in the climate and meteorology domains that we have developed specific tools to handle these input arguments.

Similarly to `@normalize("parameter", ['soil_temperature', 'forecast_error'])`, adding `@normalize("argument", "date(%Y-%m-%d)")` converts the input into a string in the requested format.

Relevant CliMetLab documentation: https://climetlab.readthedocs.io/en/latest/contributing/normalize.html
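
The `%Y-%m-%d` part is a standard `strftime` format. As an illustration of the kind of conversion involved, the sketch below turns a few common date inputs into that string form with the standard library. It is not CliMetLab's parser: `to_date_string` is a hypothetical helper, and the set of accepted input formats is an assumption.

```python
import datetime


def to_date_string(value, fmt="%Y-%m-%d"):
    """Convert a date, datetime, integer (YYYYMMDD) or string to a formatted string."""
    if isinstance(value, datetime.date):
        # datetime.datetime is a subclass of datetime.date, so both land here.
        return value.strftime(fmt)
    text = str(value)
    # Assumed input formats, tried in order.
    for candidate in ("%Y-%m-%d", "%Y%m%d", "%Y/%m/%d"):
        try:
            return datetime.datetime.strptime(text, candidate).strftime(fmt)
        except ValueError:
            continue
    raise ValueError(f"Cannot interpret {value!r} as a date")


print(to_date_string(20200102))                   # 2020-01-02
print(to_date_string("2020/01/02"))               # 2020-01-02
print(to_date_string(datetime.date(2020, 1, 2)))  # 2020-01-02
```

With this kind of normalization in place, the plugin code only ever has to deal with a single date representation.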

# CliMetLab dataset plugin tour:

https://github.com/mchantry/climetlab-mltc-surface-observation-postprocessing

- Python pip package structure:
	- setup.py + MANIFEST
	- version file
- README
	- Links to notebook in colab/binder/etc. 
- Examples in notebooks:
	- Used in README links
	- Tested in GitHub Actions.
- Tests in `tests/*`
	- Using pytest.
	- Used in GitHub Actions.
- GitHub Actions: YAML files in `.github/workflows/*.yml`
	- Check code quality
	- Run tests (from `tests/*.py`) on various platforms and Python versions
- Automated release of the pip package from GitHub (needs an account on pypi.org)
	- Make sure the tests pass.
	- Update the `*/version` file
	- Trigger a release: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository
- Legal stuff: LICENCE/AUTHOR/CONTRIBUTORS


Exercises:
- Create an account on github.com
- Create an account on pypi.org
- Create an account on test.pypi.org
- Publish a plugin on test.pypi.org




Exercise (optional):
- Choose your favorite data.
- Create the corresponding plugin.
- Tell us what is missing.

# Automatic merging (not fully implemented yet)

In [None]:
# What about merging the data into one pandas DataFrame?
import climetlab as cml
cml.load_dataset('my-plugin', parameter = ['soil_temperature', 'forecast_error']).to_pandas()

# CliMetLab architecture
CliMetLab uses the modularity of object-oriented programming: dedicated classes handle specific user requests, such as:
- Read a file with a given format
- Download data from a given source type
- Plot given data to a given backend
- …

Each class knows whether it can handle a user request: **“Ask the classes if they can handle the user request”**
- Dataset->source->reader
- Mutate.

![ARCHITECTURE](architecture.png)

In [None]:
s = cml.load_source('file', 'forecast_error.csv')
print(s)
s = cml.load_source('file', 'forecast_error.zip')
print(s)
s = cml.load_source('file', 'a.nc')
print(s)
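
The "ask the classes" idea can be sketched in a few lines of plain Python. The class names and the `can_handle` method below are hypothetical and only illustrate the dispatch pattern; CliMetLab's actual reader classes and their interface may differ.

```python
class Reader:
    """Fallback reader: used when no specialised class claims the file."""

    @classmethod
    def can_handle(cls, path, magic):
        return False


class CSVReader(Reader):
    @classmethod
    def can_handle(cls, path, magic):
        # CSV has no signature bytes, so rely on the file extension.
        return path.lower().endswith(".csv")


class ZipReader(Reader):
    @classmethod
    def can_handle(cls, path, magic):
        # Zip archives start with the bytes "PK".
        return magic.startswith(b"PK")


class NetCDFReader(Reader):
    @classmethod
    def can_handle(cls, path, magic):
        # Classic NetCDF starts with "CDF"; NetCDF4 files are HDF5 containers.
        return magic.startswith(b"CDF") or magic.startswith(b"\x89HDF")


READERS = [ZipReader, NetCDFReader, CSVReader]


def pick_reader(path, magic=b""):
    """Ask each class in turn whether it can handle the file."""
    for reader in READERS:
        if reader.can_handle(path, magic):
            return reader
    return Reader


print(pick_reader("forecast_error.csv").__name__)                # CSVReader
print(pick_reader("forecast_error.zip", b"PK\x03\x04").__name__)  # ZipReader
print(pick_reader("a.nc", b"CDF\x01").__name__)                  # NetCDFReader
```

The benefit of this design is that supporting a new format only requires adding one class to the list; the calling code does not change.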

# Source plugin (advanced)

**Exercise**:

Create a source plugin for a source named 'my-new-source'.

- Step 1: Create the plugin boilerplate structure using climetlab-plugin-tools.
- Step 2: Add your code to the plugin.



In [None]:
# From a shell terminal:
# $ pip install climetlab-plugin-tools
# $ climetlab
# (climetlab) create_plugin_source
# Answer questions...

In [None]:
# Test using source:
cml.load_source('my-new-source', arg='soil_temperature')

Solution:

Compare to https://github.com/ecmwf/climetlab-demo-source/blob/master/climetlab_demo_source/__init__.py

Compare to https://github.com/ecmwf-lab/climetlab-google-drive-source/blob/main/climetlab_google_drive_source/__init__.py

### Multi-sources

Merging several sources is a common pattern. Here is a preview of how it could work.

We need to see more examples of merging sources of data to provide the right tool for the community.

In [None]:
import climetlab as cml

In [None]:
# Data from  https://pangeo-forge.readthedocs.io/en/latest/tutorials/terraclimate.html

aet1 = cml.load_source('url', 'http://thredds.northwestknowledge.net:8080/thredds/fileServer/TERRACLIMATE_ALL/data/TerraClimate_aet_1958.nc')
aet2 = cml.load_source('url', 'http://thredds.northwestknowledge.net:8080/thredds/fileServer/TERRACLIMATE_ALL/data/TerraClimate_aet_1959.nc')
def1 = cml.load_source('url', 'http://thredds.northwestknowledge.net:8080/thredds/fileServer/TERRACLIMATE_ALL/data/TerraClimate_def_1959.nc')

In [None]:
aet1.to_xarray()

In [None]:
aet2.to_xarray()

In [None]:
def1.to_xarray()

In [None]:
s = cml.load_source('multi', [aet1, def1, aet2])
print(s)
s.to_xarray()

In [None]:
s = cml.load_source('multi', [aet1, aet2], merger='concat(concat_dim=time)')
print(s)
s.to_xarray()


Note how the 'url-pattern' source can help here: parallel download and merging.

In [None]:
PATTERN = 'http://thredds.northwestknowledge.net:8080/thredds/fileServer/TERRACLIMATE_ALL/data/TerraClimate_{parameter}_{year}.nc'
s = cml.load_source('url-pattern', PATTERN, dict(year=[1961], parameter=['aet', 'def']))

In [None]:
s.to_xarray()
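
To see which files such a request covers, the pattern expansion can be sketched with the standard library: one URL per combination of the listed values. This only illustrates the expansion step and is not CliMetLab's code; `expand_pattern` is a hypothetical helper, and downloading and merging are left out.

```python
import itertools

PATTERN = (
    "http://thredds.northwestknowledge.net:8080/thredds/fileServer/"
    "TERRACLIMATE_ALL/data/TerraClimate_{parameter}_{year}.nc"
)


def expand_pattern(pattern, values):
    """Return one URL per combination of the values given for each key."""
    keys = list(values)
    # Accept both single values and lists of values.
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [
        pattern.format(**dict(zip(keys, combination)))
        for combination in itertools.product(*choices)
    ]


for url in expand_pattern(PATTERN, dict(year=[1961], parameter=["aet", "def"])):
    print(url)
```

Here the request expands to two URLs, one for `aet` and one for `def`, both for 1961.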

# Mirrors

Draft feature: mirroring remote data on a local storage service.

This needs more development and discussion.

In [None]:
!echo $CLIMETLAB_MIRROR