The YAML Observation Data Archive & Exchange (YODA) File Format

YAML Observation Data Archive & Exchange (YODA) Format

Getting Started with YODA

YODA is an observational data encoding format using YAML.

Goals

We developed the YAML Observation Data Archive & Exchange (YODA) File Format to serve as a specification for human-readable, machine-parseable, text-based data files. These files accommodate the full diversity of critical zone science data (for example hydrological time series, soil profile geochemistry, and biodiversity transects) that can be organized with the Observations Data Model v2 (ODM2). Specifically, we designed the YODA File Format to meet the following requirements:

  • Easy for humans to read and use. Anyone opening the file in a text editor or spreadsheet application should be able to intuitively understand the contents of the file's structured metadata header and comma-separated data table.
  • Easy for machines to parse and generate. The file should be easy to parse and validate with the wide variety of software tools used by scientists.
  • Organized the way scientists view their data. Results are grouped into a single data array while still conforming to the metadata requirements of an ODM2 Dataset.
  • Self-describing and archivable. The file should be readily accepted by earth and environmental science data repositories, such as the IEDA EarthChem Library or the Knowledge Network for Biocomplexity (KNB).

Design Vision

A YODA File follows the data serialization and interchange format of YAML ("YAML Ain't Markup Language"), a superset of JSON (JavaScript Object Notation). YAML parsing libraries are available for most modern programming languages.

Two key features distinguish a YODA File from generic YAML. A YODA File:

  1. Organizes data into a comma-separated data array (e.g. a data table or DataFrame) with multiple columns and rows, and
  2. Provides all the metadata of an ODM2 Dataset so that the data array can be parsed by software into an ODM2 database instance.
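
To illustrate the first point, the following is a minimal sketch of how software might read a comma-separated data array once the YAML has been parsed. It uses only the Python standard library and assumes the document has already been loaded into a dictionary by a YAML parser. The key names (`Dataset`, `DataArray`), the column names, and the values are hypothetical and are not taken from the YODA File Specification.

```python
import csv
import io

# Hypothetical structure: roughly what a YAML parser might return for a
# YODA-like file. Key names and values are illustrative assumptions only.
yoda_like = {
    "Dataset": {"Title": "Example stream temperature series"},
    "DataArray": (
        "Timestamp,WaterTemp_C,Stage_m\n"
        "2015-06-01T00:00,14.2,0.31\n"
        "2015-06-01T00:15,14.1,0.32\n"
    ),
}


def read_data_array(document):
    """Parse the comma-separated block into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(document["DataArray"]))
    return list(reader)


rows = read_data_array(yoda_like)
for row in rows:
    # csv yields strings, so numeric columns are converted explicitly.
    print(row["Timestamp"], float(row["WaterTemp_C"]))
```

Each row is keyed by its column header, which is the kind of structure a tool would need in order to map columns onto the results described in the metadata header.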

YODA Profiles have been developed for common dataset types to define expectations for the data array block and to facilitate data/metadata input forms/templates for the end-user.

A YODA File will be structurally validated against required and optional ODM2 fields and controlled vocabularies using JSON Schema, which provides both a means of documenting the YODA File Schema and a set of software tools for validating any JSON file against that schema. This work in progress can be found in the YODA-Tools repository.
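
The kind of check described above can be sketched in a few lines of Python. This is a simplified stand-in written with the standard library only; it is not JSON Schema and not the YODA-Tools implementation. The field names and vocabulary terms below are hypothetical placeholders, not the actual YODA File Schema.

```python
# Hypothetical required fields and controlled vocabulary, for illustration
# only. The real schema lives in the YODA-Tools repository.
REQUIRED_FIELDS = {"Title", "Organization", "SamplingFeatureType"}
SAMPLING_FEATURE_TYPES = {"Site", "Specimen", "Transect"}


def validate_metadata(metadata):
    """Return a list of error messages; an empty list means no problems found."""
    errors = []
    # Report every required field that is absent, in a stable order.
    for field in sorted(REQUIRED_FIELDS - set(metadata)):
        errors.append(f"missing required field: {field}")
    # Check the controlled-vocabulary field only when it is present.
    sf_type = metadata.get("SamplingFeatureType")
    if sf_type is not None and sf_type not in SAMPLING_FEATURE_TYPES:
        errors.append(f"SamplingFeatureType not in vocabulary: {sf_type}")
    return errors


print(validate_metadata({"Title": "Example", "SamplingFeatureType": "Lake"}))
```

A JSON Schema expresses the same two ideas declaratively, with required properties and enumerated values, so the schema itself can serve as documentation.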

We are also developing the YODA Tools library, built upon the ODM2PythonAPI, to create YODA Files from our YODA Excel Templates or from an ODM2 database, and to import YODA Files into an ODM2 database. YODA Files will thus serve as an interchange format between components of the ODM2 Software Ecosystem.

Specification

The draft YODA File Specification and other YODA File documentation provide many design and implementation details, but are presently a work in progress.

History

The YODA File Format grew out of an effort to substantially extend the CZO Display File specification. The original CZO Display File format was developed in 2010-2011 as a means for US Critical Zone Observatories to share data in a form that was both human-readable and machine-parseable. Its header provides structured metadata that allows the comma-separated data to be ingested into an Observations Data Model 1.1 (ODM1.1) database, such as a CUAHSI HydroServer.

Contribute

There are many ways to contribute:

  • Help us develop the YODA File specification document.
  • Help us develop the JSON-schema validation tools.
  • Help us develop examples of valid YODA Files.
  • Help us develop tools for generating valid YODA Files, such as:
    • MS Excel templates that contain some auto-validation features.
    • Python/R/Matlab scripts.

Credits

This work was supported by National Science Foundation Grants EAR-1224638, EAR-1332257, and ACI-1339834. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.