# Unitpackage introduction

This is a short introduction to the basic functionality of the `unitpackage` module. More extensive use cases, the API documentation, and a description of the structure of unitpackages can be found in the [unitpackage documentation](https://echemdb.github.io/unitpackage/).

A unitpackage consists of a JSON file that describes a CSV file and can carry additional metadata about that CSV. Such metadata can be stored in a YAML file, as illustrated in the manuscript.
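
To make the relation between the JSON descriptor and the CSV more concrete, the following is a minimal sketch of what such a descriptor might look like. The key layout (`resources`, `schema`, `metadata`) is an assumption modeled on the frictionless Data Package format and is not copied from the `unitpackage` output. The units of `t` and `T` are hypothetical, while the unit of `U` follows the data shown in this introduction.

```python
import json

# Hypothetical descriptor for data.csv. The key layout is an assumption
# modeled on the frictionless Data Package format.
descriptor = {
    "resources": [
        {
            "name": "data",
            "path": "data.csv",
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "t", "unit": "s"},   # unit assumed for illustration
                    {"name": "U", "unit": "mV"},
                    {"name": "T", "unit": "K"},   # unit assumed for illustration
                ]
            },
            # Additional metadata, for example taken from a YAML file
            "metadata": {"experimentalist": "Max Doe"},
        }
    ]
}

# Round trip through JSON text, as it would be stored next to the CSV
text = json.dumps(descriptor, indent=2)
restored = json.loads(text)

# Collect the unit of each column from the schema
fields = restored["resources"][0]["schema"]["fields"]
units = {f["name"]: f["unit"] for f in fields}
print(units)
```

The point of the sketch is only that the units live in the descriptor, next to the column names, rather than in the CSV itself.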

## Basic Usage

A unitpackage can be constructed with the `unitpackage` interface.

In [1]:
from unitpackage.entry import Entry
import yaml

with open("./data/raw/data.csv.meta.yaml", "rb") as f:
    metadata = yaml.load(f, Loader=yaml.SafeLoader)

fields = metadata["figure description"]["fields"]

entry = Entry.from_csv(csvname="./data/raw/data.csv", metadata=metadata, fields=fields)
entry.save(outdir="./data/generated/")

A collection of unitpackages can be loaded from a local source.

In [2]:
from unitpackage.collection import Collection

db = Collection.from_local('./data/generated')
db

[Entry('data')]

Individual entries can be selected by their identifier, which is derived from the name of the original CSV (here, `data.csv` gives `data`).

In [3]:
entry = db['data']
entry.experimentalist

'Max Doe'

The data from the CSV can be accessed as a [`pandas`](https://pandas.pydata.org/docs/index.html) DataFrame.

In [4]:
entry.df.head()


|   | t | U | T |
|---|---|------|-----|
| 0 | 0 | 1.01 | 275 |
| 1 | 1 | 1.02 | 275 |
| 2 | 2 | 1.05 | 275 |
| 3 | 3 | 0.95 | 275 |
| 4 | 4 | 0.99 | 275 |


Simple operations with units are implemented, such as rescaling a column to a different unit.

In [5]:
entry.rescale({'U':'V'}).df.head()

|   | t | U | T |
|---|---|---------|-----|
| 0 | 0 | 0.00101 | 275 |
| 1 | 1 | 0.00102 | 275 |
| 2 | 2 | 0.00105 | 275 |
| 3 | 3 | 0.00095 | 275 |
| 4 | 4 | 0.00099 | 275 |
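
The rescaling from mV to V amounts to multiplying each value by a fixed factor. The following plain-Python sketch illustrates that arithmetic for the first rows of the `U` column. The `TO_VOLT` table and the `rescale` helper are hypothetical and written for illustration; they are not how `unitpackage` performs the conversion internally.

```python
# Conversion factors to volts; a small hand-written table for illustration only.
TO_VOLT = {"V": 1.0, "mV": 1e-3, "uV": 1e-6}


def rescale(values, from_unit, to_unit):
    """Convert voltages between the units listed in TO_VOLT."""
    factor = TO_VOLT[from_unit] / TO_VOLT[to_unit]
    return [v * factor for v in values]


u_mv = [1.01, 1.02, 1.05, 0.95, 0.99]  # first rows of the U column, in mV
u_v = rescale(u_mv, "mV", "V")
print([round(v, 5) for v in u_v])
```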


The descriptors from the YAML can be accessed as properties of the entry.

In [6]:
entry.current

5.0 mA

Values can be calculated from the data in the CSV and the metadata.

In [7]:
import astropy.units as u

# Value from metadata
I = entry.current
# Value from the actual data
U = entry.df['U'].mean() * u.Unit(entry.field_unit('U'))
R = U / I
print(R.to(u.Ohm))

0.2 Ohm
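
As a check on the arithmetic, and assuming the mean voltage over the full data set is close to 1.0 mV (the five rows shown above average to about 1.004 mV), the resistance follows from Ohm's law with the current of 5.0 mA taken from the metadata:

$$
R = \frac{\bar{U}}{I} \approx \frac{1.0\ \mathrm{mV}}{5.0\ \mathrm{mA}} = 0.2\ \Omega
$$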


Unitpackages can also be created directly from `pandas` DataFrames.

In [8]:
import pandas as pd
from unitpackage.entry import Entry

data = {'t': [1,2,3], 'v': [1,3,2]}
df = pd.DataFrame(data)

metadata = {'user': 'Max Doe'}
fields = [{'name': 't', 'unit':'s'},{'name': 'v', 'unit':'m/s', 'description': 'velocity'}]

df_entry = Entry.from_df(df, basename='acceleration', metadata=metadata, fields=fields)
df_entry

Entry('acceleration')

The data can be visualized.

In [9]:
df_entry.plot()

## Loading remote data

Loading entries from a remote location requires a URL pointing to a ZIP archive, for example in a Zenodo repository.

In [10]:
import os

from unitpackage.cv.cv_collection import CVCollection

# Use the URL from the RU_DATA_URL environment variable if it is set,
# otherwise fall back to the archive on Zenodo.
zenodo_data = os.environ.get(
    "RU_DATA_URL",
    "https://zenodo.org/records/10623871/files/DunklesArchipel/ORR_on_Ru0001-0.2.0.zip",
)

# Execution takes a while because the content is downloaded from Zenodo.
zdb = CVCollection.from_remote(data='.', url=zenodo_data)

In [11]:
len(zdb)

53

By default, that is, when no URL is provided, the data from the [electrochemistry-data]() repository is loaded.

In [12]:
from unitpackage.cv.cv_collection import CVCollection

cvdb = CVCollection.from_remote()
cvdb.describe()

{'number of references': 44,
 'number of entries': 205,
 'materials': {'Ag', 'Au', 'Cu', 'Pt', 'Ru'}}

## Filter Collection

A simple filter selects entries by a single condition, here the material of the working electrode (`'WE'`).

In [13]:
filtered_db = cvdb.filter(lambda entry: entry.get_electrode('WE').material == 'Pt')
filtered_db.describe()

{'number of references': 19, 'number of entries': 130, 'materials': {'Pt'}}

More complex custom filters can be defined as functions that take an entry and return `True` or `False`.

In [14]:
def custom_filter(entry):
    if entry.get_electrode('WE').material == 'Pt':
        for component in entry.system.electrolyte.components:
            if 'ClO4' in component.name:
                return True
    return False

filtered_db = cvdb.filter(custom_filter)
filtered_db.describe()

{'number of references': 12, 'number of entries': 94, 'materials': {'Pt'}}
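
The filtering pattern used above, passing a callable that decides per entry whether it is kept, can be sketched without any dependencies. `MiniEntry` and `MiniCollection` are hypothetical stand-ins written for this illustration and do not reflect how `unitpackage` stores entries or implements `filter`.

```python
from dataclasses import dataclass, field


@dataclass
class MiniEntry:
    """Hypothetical stand-in for an entry: a material and electrolyte components."""
    identifier: str
    material: str
    components: list = field(default_factory=list)


class MiniCollection:
    """Hypothetical stand-in for a collection that supports predicate filtering."""

    def __init__(self, entries):
        self.entries = list(entries)

    def filter(self, predicate):
        # Keep only the entries for which the predicate returns a truthy value.
        return MiniCollection(e for e in self.entries if predicate(e))

    def __len__(self):
        return len(self.entries)


db = MiniCollection([
    MiniEntry("a", "Pt", ["HClO4", "H2O"]),
    MiniEntry("b", "Pt", ["H2SO4", "H2O"]),
    MiniEntry("c", "Au", ["HClO4", "H2O"]),
])


def pt_in_perchlorate(entry):
    # Combined condition: Pt electrode and a perchlorate-containing component
    return entry.material == "Pt" and any("ClO4" in name for name in entry.components)


pt_only = db.filter(lambda e: e.material == "Pt")  # simple filter
pt_clo4 = db.filter(pt_in_perchlorate)             # custom filter
print(len(pt_only), len(pt_clo4))
```

Because `filter` returns a new collection, filters can be chained, and the original collection stays unchanged.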

## Create specific entries

For certain types of data it makes sense to create specific collections and entries to simplify access to certain properties. This is illustrated for the example in the manuscript, where the resistance $R$ is calculated from a measured voltage $U$ upon applying a current $I$.

In [15]:
import astropy.units as u
import pandas as pd

from unitpackage.collection import Collection
from unitpackage.entry import Entry


class ElectroEntry(Entry):
    """Entry with convenience properties for voltage and resistance."""

    def __repr__(self):
        return f"ElectroEntry({self.identifier!r})"

    @property
    def mean_U(self):
        # Mean of the measured voltage, with the unit of the 'U' field attached
        return self.df['U'].mean() * u.Unit(self.field_unit('U'))

    @property
    def R(self):
        # Resistance from the mean voltage and the current given in the metadata
        return (self.mean_U / self.current).to(u.Ohm)


class ElectroCollection(Collection):
    """Collection whose entries are created as ElectroEntry objects."""

    Entry = ElectroEntry

    @property
    def df_summary(self):
        # Collect current, mean voltage and resistance of each entry
        I = []
        U = []
        R = []
        for entry in self:
            I.append(entry.current)
            U.append(entry.mean_U)
            R.append(entry.R)

        data = {'I': I, 'U': U, 'R': R}

        return pd.DataFrame(data)
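
The line `Entry = ElectroEntry` is what ties the specialized collection to its entry type. The following dependency-free sketch shows the general pattern, in which a collection builds its entries through a class attribute that subclasses can override. All class names here are hypothetical, and the sketch makes no claim about how `unitpackage` constructs entries internally.

```python
class BaseEntry:
    """Hypothetical minimal entry holding an identifier."""

    def __init__(self, identifier):
        self.identifier = identifier

    def __repr__(self):
        return f"{type(self).__name__}({self.identifier!r})"


class BaseCollection:
    """Hypothetical minimal collection that builds its entries via self.Entry."""

    Entry = BaseEntry  # subclasses override this attribute

    def __init__(self, identifiers):
        self.entries = [self.Entry(i) for i in identifiers]

    def __getitem__(self, identifier):
        for entry in self.entries:
            if entry.identifier == identifier:
                return entry
        raise KeyError(identifier)


class SpecialEntry(BaseEntry):
    @property
    def shout(self):
        # Example of a property that only the specialized entry offers
        return self.identifier.upper()


class SpecialCollection(BaseCollection):
    Entry = SpecialEntry


sdb = SpecialCollection(["data"])
print(sdb["data"], sdb["data"].shout)
```

With this design, the collection logic is written once in the base class, and a subclass only has to name the entry type it wants.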

Load the data into an `ElectroCollection`

In [16]:
edb = ElectroCollection.from_local('./data/generated')
edb

[ElectroEntry('data')]

and show the summary for all entries within the collection.

In [17]:
edb.df_summary

|   | I | U | R |
|---|--------|--------|---------|
| 0 | 5.0 mA | 1.0 mV | 0.2 Ohm |


The properties specific to the collection's entries can be accessed directly.

In [18]:
eentry = edb['data']
eentry.R

<Quantity 0.2 Ohm>