---
jupytext:
  formats: md:myst,ipynb
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.5
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

Welcome to unitpackage's documentation!

Annotating scientific data is central to research data management workflows that aim to store data according to the FAIR principles. A simple CSV file recorded during an experiment usually provides no information on the units of its values, on the system that was investigated, or on who performed the experiment. Such information can be stored in frictionless Data Packages, which consist of a CSV (data) file annotated with a JSON (metadata) file. The unitpackage module provides a Python library to interact with such Data Packages, which have a very specific structure. An example of a collection of datapackages with several entries, used together with the unitpackage Python library, is found on echemdb.org. The website shows a collection of electrochemical data that is stored in the electrochemistry-data repository and follows echemdb's metadata schema.
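
To make the idea of an annotated CSV concrete, the following minimal sketch writes a small CSV file together with a JSON descriptor that records a unit for each column. It uses only the Python standard library. The file names, the field names `t` and `j`, the placeholder metadata, and the exact descriptor layout are assumptions made for illustration; they loosely follow the frictionless idea of resources with a schema of fields and are not necessarily the structure unitpackage expects.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical measurement: time t and current density j.
rows = [(0.0, 1.5), (1.0, 1.7), (2.0, 2.1)]

outdir = Path(tempfile.mkdtemp())
csv_path = outdir / "demo.csv"
json_path = outdir / "demo.json"

# The bare CSV carries the numbers but neither units nor provenance.
with csv_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "j"])
    writer.writerows(rows)

# The descriptor annotates the CSV. The "unit" and "metadata" keys are
# assumptions of this sketch, and the name is a placeholder.
descriptor = {
    "resources": [
        {
            "name": "demo",
            "path": csv_path.name,
            "schema": {
                "fields": [
                    {"name": "t", "type": "number", "unit": "s"},
                    {"name": "j", "type": "number", "unit": "A / m2"},
                ]
            },
            "metadata": {"experimentalist": "Jane Doe"},
        }
    ]
}
json_path.write_text(json.dumps(descriptor, indent=2))

# Reading the descriptor back recovers the unit of each column.
loaded = json.loads(json_path.read_text())
fields = loaded["resources"][0]["schema"]["fields"]
units = {field["name"]: field["unit"] for field in fields}
print(units)
```

Keeping the units next to the data in a separate, machine-readable file leaves the CSV itself untouched, so it stays readable by any tool that understands plain CSV.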

Examples

A collection of entries can be generated from local files or from a remote repository, such as echemdb.org. To illustrate the usage of unitpackage, we load the data shown on echemdb.org from its data repository, which is downloaded by default when the method from_remote() does not receive a URL argument.

For simplicity, we denote the collection as `db` (database), even though it is not a database in the strict sense.

from unitpackage.collection import Collection
db = Collection.from_remote()

A single entry can be retrieved with an identifier available in the database.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']

The metadata of the entry is available from entry.metadata, which supports both dict and attribute-style access.
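
What "dict and attribute-style access" means can be sketched with a small wrapper around a nested dictionary. The class `AttrDict` and the metadata keys below are hypothetical and only illustrate the access pattern; they are not unitpackage's implementation.

```python
class AttrDict:
    """Hypothetical wrapper giving dict- and attribute-style access to nested metadata."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Wrap nested dictionaries so that access can be chained.
        return AttrDict(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Placeholder metadata, not taken from an actual entry.
metadata = AttrDict({"curator": {"name": "Jane Doe"}, "system": {"type": "electrochemical"}})

print(metadata["curator"]["name"])  # dict-style access
print(metadata.curator.name)        # attribute-style access
```

Both lines print the same value; attribute-style access is shorter to type, while dict-style access also works for keys that are not valid Python identifiers.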

The data related to an entry can be returned as a pandas dataframe.

entry.df.head()

The units of the columns can be retrieved.

entry.field_unit('j')

The values in the dataframe can be changed to other compatible units.

rescaled_entry = entry.rescale({'E' : 'mV', 'j' : 'uA / m2'})
rescaled_entry.df.head()
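
Rescaling to a compatible unit amounts to multiplying each column by a constant factor. The sketch below shows that arithmetic in plain Python, assuming for illustration that the original columns are given in V and A / m2. The `FACTORS` table and the `rescale_column` helper are hypothetical and not part of unitpackage.

```python
# Hypothetical conversion factors between compatible units.
FACTORS = {
    ("V", "mV"): 1e3,
    ("A / m2", "uA / m2"): 1e6,
}


def rescale_column(values, from_unit, to_unit):
    """Return the values converted from from_unit to to_unit."""
    if from_unit == to_unit:
        return list(values)
    try:
        factor = FACTORS[(from_unit, to_unit)]
    except KeyError:
        # Incompatible or unknown units cannot be converted.
        raise ValueError(f"no known conversion from {from_unit!r} to {to_unit!r}") from None
    return [value * factor for value in values]


E_in_V = [0.05, 0.25, 0.5]
j_in_A_per_m2 = [-2e-3, 0.0, 4e-3]

E_in_mV = rescale_column(E_in_V, "V", "mV")
j_in_uA_per_m2 = rescale_column(j_in_A_per_m2, "A / m2", "uA / m2")
print(E_in_mV)
print(j_in_uA_per_m2)
```

Requesting a conversion between incompatible units, such as V to uA / m2, raises an error in this sketch instead of silently returning wrong numbers.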

The data can be visualized in a plotly figure.

entry.plot('E', 'j')

Specific Collections

For certain datasets, the unitpackage module can be extended by additional modules. One such extension is the Echemdb class, which loads a collection of entries containing cyclic voltammograms stored according to the echemdb metadata schema. Such data is usually found in the field of energy conversion and storage, as illustrated on echemdb.org.

from unitpackage.database.echemdb import Echemdb
db = Echemdb.from_remote()
db.describe()

Filtering the collection for entries having specific properties, e.g., containing Pt as working electrode (WE) material, returns a new collection.

db_filtered = db.filter(lambda entry: entry.get_electrode('WE').material == 'Pt')
db_filtered.describe()

The filter method is also available on the base class `Collection`.
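
The filtering pattern, a predicate applied to every entry with the matches returned as a new collection, can be sketched independently of unitpackage. `MiniEntry`, `MiniCollection`, and the identifiers below are hypothetical stand-ins used only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniEntry:
    """Hypothetical entry with an identifier and a working electrode material."""
    identifier: str
    we_material: str


class MiniCollection:
    """Hypothetical collection that supports predicate-based filtering."""

    def __init__(self, entries):
        self._entries = list(entries)

    def filter(self, predicate):
        # Return a new collection; the original is left unchanged.
        return MiniCollection(entry for entry in self._entries if predicate(entry))

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


db = MiniCollection([
    MiniEntry("entry_a", "Pt"),
    MiniEntry("entry_b", "Au"),
    MiniEntry("entry_c", "Pt"),
])

pt_only = db.filter(lambda entry: entry.we_material == "Pt")
print([entry.identifier for entry in pt_only])
print(len(db), len(pt_only))
```

Because `filter` returns a new collection, filters can be chained, and the unfiltered collection remains available for comparison.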

Creating Unitpackages

Unitpackages can also be created from scratch using CSV files or pandas DataFrames. Metadata and unit descriptions for the fields can be added to produce self-describing data packages.

from unitpackage.entry import Entry

entry = Entry.from_csv(csvname="files/demo_package.csv")
entry = entry.load_metadata("files/demo_package.csv.yaml")
entry = entry.update_fields(fields=[{'name': 't', 'unit': 's'}, {'name': 'j', 'unit': 'A / cm2'}])
entry.save(outdir="generated/files/csv_entry/")

See Creating Unitpackages for more details on adding metadata from YAML, JSON, or Python dictionaries.

Further Usage

Frictionless Data Packages, and thus unitpackages, are machine-readable, making the underlying data and metadata reusable in many ways.

  • The unitpackage API can be used to filter collections of similar data for certain properties, thus allowing for simple comparison of different data sets. For example, local files recorded in the laboratory can be compared with data published in a repository.
  • The content of datapackages can be included in other applications or the generation of a website. The latter has been demonstrated for electrochemical data on echemdb.org. The datapackages could also be published with the frictionless Livemark data presentation framework.

Installation

This package is available on PyPI and can be installed with pip:

pip install unitpackage

The package is also available on conda-forge and can be installed with conda

conda install -c conda-forge unitpackage

or mamba

mamba install -c conda-forge unitpackage

See the installation instructions for further details.

Citing

You can cite this project as described on our Zenodo page, or use this publication (DSJ, 24 (2025) 13) illustrating our approach.

License

The contents of this repository are licensed under the GNU General Public License v3.0 or, at your option, any later version.

+++

:maxdepth: 2
:caption: "Contents:"
:hidden:
installation.md
usage/unitpackage.md
usage/unitpackage_usage.md
usage/echemdb_usage.md
usage/load_and_save.md
usage/create_unitpackage.md
usage/loaders.md
api.md