Dataset plugins

A Dataset <datasets> is a Python class that provides a curated set of data with specific helper functions. CliMetLab has built-in example datasets for demonstration purposes. See usage details in Dataset (User guide) <../guide/datasets> and implementation details in Dataset (Dev guide) <../developer/datasets>. Datasets are added either with pip plugins or with YAML files.

Simple datasets using YAML files

Simple datasets are datasets that rely on an existing built-in data source <data-sources> and cannot be parametrised by users. For example, a simple dataset can be a single file downloaded from a URL, as in this YAML definition:

---
dataset:
  source: url
  args:
    url: http://download.ecmwf.int/test-data/metview/gallery/temp.bufr

  metadata:
    documentation: Sample BUFR file containing TEMP messages
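To illustrate what relying on an existing data source means, the sketch below writes the YAML above as the plain Python dictionary a YAML parser would produce, then resolves it by looking up the named source and calling it with the fixed args. The resolve_yaml_dataset function and the toy "url" source are hypothetical stand-ins, not CliMetLab's actual loader; they assume only the dataset/source/args layout shown above.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of data sources. In CliMetLab the real sources
# (such as "url") come from the library; this stand-in only reports
# what it would fetch.
SOURCES: Dict[str, Callable[..., Any]] = {
    "url": lambda url: f"download {url}",
}

# The YAML definition above, as the dictionary a YAML parser would produce.
DEFINITION = {
    "dataset": {
        "source": "url",
        "args": {
            "url": "http://download.ecmwf.int/test-data/metview/gallery/temp.bufr",
        },
        "metadata": {
            "documentation": "Sample BUFR file containing TEMP messages",
        },
    }
}


def resolve_yaml_dataset(definition: Dict[str, Any]) -> Any:
    """Hypothetical resolver: call the named source with the fixed args.

    No user arguments are accepted, which mirrors why a YAML dataset
    cannot be parametrised.
    """
    spec = definition["dataset"]
    source = SOURCES[spec["source"]]
    return source(**spec.get("args", {}))


print(resolve_yaml_dataset(DEFINITION))
# prints: download http://download.ecmwf.int/test-data/metview/gallery/temp.bufr
```

The only inputs come from the file itself, so every user of such a dataset gets exactly the same data.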

Complex datasets using pip plugins

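See the demo plugin at https://github.com/ecmwf/climetlab-demo-dataset. Its setup() call registers the dataset class under the climetlab.datasets entry point: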

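import setuptools

setuptools.setup(
    name="climetlab-demo-dataset",
    version="0.0.1",
    description="Example climetlab external dataset plugin",
    packages=setuptools.find_packages(),
    # Register the DemoDataset class from the climetlab_demo_dataset
    # package under the name "demo-dataset" in the "climetlab.datasets" group.
    entry_points={
        "climetlab.datasets": [
            "demo-dataset = climetlab_demo_dataset:DemoDataset",
        ],
    },
)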
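The entry point string in setup() follows the standard "name = module:attribute" syntax. As a minimal sketch, assuming Python 3.9 or later (where importlib.metadata.EntryPoint exposes module and attr), the standard library can split it without importing the plugin:

```python
from importlib.metadata import EntryPoint

# The same entry point as declared in setup() above.
ep = EntryPoint(
    name="demo-dataset",
    value="climetlab_demo_dataset:DemoDataset",
    group="climetlab.datasets",
)

print(ep.name)    # demo-dataset
print(ep.module)  # climetlab_demo_dataset
print(ep.attr)    # DemoDataset

# ep.load() would import climetlab_demo_dataset and return DemoDataset,
# so it only succeeds once the plugin package is installed.
```

The name on the left is the key the dataset is registered under; the module and attribute on the right are only resolved when the entry point is loaded.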

See the CliMetLab plugin mechanism <plugins general>.

See an example notebook using an external plugin.

See also the Python documentation on plugins.

Automatic generation of a pip package

To make this easier, a cookiecutter template is available for creating a Dataset plugin. For a simple dataset, you can instead use a YAML file and rely only on the code provided by CliMetLab or other plugins.

pip install cookiecutter
cookiecutter https://github.com/ecmwf-lab/climetlab-cookiecutter/dataset