This repository describes datasets (also known as data packages) and their tabular data files using the Frictionless Data schemas.
Table schema: https://specs.frictionlessdata.io/table-schema/
Data package schema: https://specs.frictionlessdata.io/data-package/
Each directory within this repository refers to a published dataset that is named by its Digital Object Identifier (DOI). Note that the "/" from the DOI has been replaced by "_", so the directory 10.5281_zenodo.3634411 refers to the DOI 10.5281/zenodo.3634411.
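The directory naming rule can be sketched in Python as follows. The function names are hypothetical and not part of this repository. The reverse mapping assumes that the DOI contains a single "/" and that its prefix contains no "_", as is the case for Zenodo DOIs such as the one above.

```python
def doi_to_dirname(doi: str) -> str:
    """Return the directory name for a DOI by replacing "/" with "_"."""
    return doi.replace("/", "_")


def dirname_to_doi(dirname: str) -> str:
    """Return the DOI for a directory name.

    Assumption: only the first "_" stands for the "/" of the DOI.
    """
    return dirname.replace("_", "/", 1)


print(doi_to_dirname("10.5281/zenodo.3634411"))  # 10.5281_zenodo.3634411
print(dirname_to_doi("10.5281_zenodo.3634411"))  # 10.5281/zenodo.3634411
```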
Each dataset is described by a JSON file, datapackage.json. The tabular data files are described by a separate JSON file, tableschema.json.
The dataset files described in each data package can be found through their DOI or by using the URL in the "homepage" field of the datapackage.json file.
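As an illustration, the "homepage" URL can be read from a descriptor with the Python standard library alone. This is a minimal sketch: the function name and the example descriptor are made up for illustration, and only the "homepage" field name comes from the text above.

```python
import json


def homepage_from_descriptor(text: str):
    """Return the "homepage" value of a datapackage.json document, or None."""
    descriptor = json.loads(text)
    return descriptor.get("homepage")


# Hypothetical, heavily shortened descriptor for illustration only.
example = '{"name": "example-dataset", "homepage": "https://example.org/dataset"}'
print(homepage_from_descriptor(example))
```

To read a real descriptor, pass the contents of a datapackage.json file from one of the dataset directories instead of the example string.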
In order for the data to be ingested into EMODNet Physics, a time column is required (seconds since 1970-01-01 in UTC), in accordance with the CF standard names.
A Python script, convert_isodatetime_in_file_to_timesince.py, is provided to convert the date_time field (in ISO 8601 format, YYYY-MM-DDThh:mm:ss+00:00) to time (seconds since 1970-01-01 in UTC) and add a time field to the data file.
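The conversion itself can be sketched with the Python standard library. This is not the code of the provided script, only an illustration of the idea. The function name is hypothetical, and treating timestamps without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone


def iso_to_seconds_since_epoch(date_time: str) -> float:
    """Convert an ISO 8601 timestamp to seconds since 1970-01-01 UTC."""
    parsed = datetime.fromisoformat(date_time)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


print(iso_to_seconds_since_epoch("1970-01-02T00:00:00+00:00"))  # 86400.0
```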
Install the requirements. For example, create a venv, activate it and use pip to install the requirements:

    python3 -m venv venv
    . venv/bin/activate
    pip3 install -r requirements.txt
To validate the tableschema, run the Python script:

    python3 validate_packages_and_resources.py --dois doi1 doi2 doi3

using the directory names for each datapackage used in this repository, e.g.

    python3 validate_packages_and_resources.py --dois 10.5281_zenodo.3843376
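To validate every dataset in one go, the directory names can be collected programmatically and passed to --dois. The helper below is a hypothetical sketch, not part of this repository. It assumes that every dataset directory sits at the top level of the repository and contains a datapackage.json file.

```python
from pathlib import Path


def dataset_dirnames(repo_root: str = ".") -> list:
    """Return sorted names of subdirectories that contain datapackage.json."""
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "datapackage.json").is_file()
    )


if __name__ == "__main__":
    # Print a command line that validates all datasets found.
    names = dataset_dirnames()
    print("python3 validate_packages_and_resources.py --dois " + " ".join(names))
```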
If you need further checks to validate the Frictionless Data schema, go to a dataset directory and validate the datapackage and the tableschema:

    datapackage validate datapackage.json
    tableschema validate tableschema.json
GoodTables can also be used to validate the tableschema, but this does not validate the SPI-specific fields:

    goodtables validate datapackage.json
    tableschema validate tableschema.json
    goodtables data.csv --schema tableschema.json
Each directory within the repository is named using the parent DOI of the dataset. Currently (as of June 2020), only datasets published on Zenodo are covered here. Within Zenodo, each dataset has a parent DOI, which always resolves to the latest version of the dataset.
Within the Frictionless Data Package schema, the specific version DOIs are used to avoid confusion in the following fields:
- within the "x_spi_citation" field, for example:
"x_spi_citation": "Rickli, Jörg, Janssen, David J., Hassler, Christel, Ellwood, Michael, & Jaccard, Samuel L. (2019). Seawater chromium concentrations and isotope compositions in the Southern Ocean during the austral summer of 2016/2017, on board the Antarctic Circumnavigation Expedition (ACE). (Version 1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3250980"
The index.json file can be consulted for the latest version of each dataset; datasets are listed there with their parent DOI and latest version number.