# signac - PyData Ann Arbor Meetup 2018

## Integration with the Python ecosystem

``signac`` is designed to be extremely lightweight, making it easy to work with other tools.
Here, we demonstrate how it can be integrated with several of them, and use these examples to compare ``signac``'s functionality with theirs.

### Sacred

The [Sacred provenance management tool](https://sacred.readthedocs.io/en/latest/) is a popular Python package for logging experiments and reproducing them later.
It provides some functionality that overlaps slightly with ``signac``, but the two packages can be used in a complementary manner.

In [None]:
# Remove left-over files from previous runs...
!rm -rf project.py experiment.py workspace signac.rc

Initialize the project with one job for each value of an arbitrary index variable *i*, stored under the state point key `foo`.

In [None]:
import signac

project = signac.init_project("Sacred")

for i in range(5):
    project.open_job({"foo": i}).init()
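Each state point maps to exactly one job, and signac names the job's workspace directory after a hash of that state point. The sketch below illustrates the mapping without signac. The helper `calc_job_id` is hypothetical, and computing the MD5 digest of the sorted-key JSON encoding is an assumption about how signac versions of this era derived job ids, not a documented guarantee.

```python
import hashlib
import json


def calc_job_id(statepoint):
    """Hypothetical stand-in for signac's job-id calculation.

    Assumption: the id is the MD5 hex digest of the state point encoded
    as JSON with sorted keys (check the hashing code of your signac version).
    """
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode()).hexdigest()


# One id per value of the index variable, mirroring the loop above.
ids = {i: calc_job_id({"foo": i}) for i in range(5)}
for i, job_id in ids.items():
    print(i, job_id)
```

Sorting the keys makes the id independent of the order in which the state point dictionary was written, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` refer to the same job.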

Then set up the *Sacred* experiment, whose commands take the state point variable `foo` as a parameter.

In [None]:
%%writefile experiment.py
from sacred import Experiment

ex = Experiment()

@ex.command
def hello(foo):
    print('hello #', foo)

@ex.command
def goodbye(foo):
    print('goodbye #', foo)
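Sacred fills the `foo` parameter of each command from the experiment's configuration by matching parameter names; `project.py` below supplies that configuration from the job's state point. As a rough illustration of name-based injection, here is a plain-Python sketch. `call_with_config` is a hypothetical helper, not part of Sacred, and it omits everything else Sacred does (observers, config scopes, run bookkeeping).

```python
import inspect


def call_with_config(func, config):
    """Call ``func``, filling its parameters from ``config`` by name.

    Simplified, hypothetical stand-in for how Sacred injects configuration
    entries into captured functions; unknown config keys are ignored.
    """
    params = inspect.signature(func).parameters
    kwargs = {name: config[name] for name in params if name in config}
    return func(**kwargs)


def hello(foo):
    return 'hello # {}'.format(foo)


# The state point {"foo": 3} plays the role of the experiment configuration.
print(call_with_config(hello, {"foo": 3}))  # hello # 3
```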

In [None]:
%%writefile project.py
from flow import FlowProject
from sacred.observers import FileStorageObserver
from experiment import ex


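class SacredProject(FlowProject):
    pass


def setup_sacred(job):
    # Use the job's state point as the Sacred configuration ...
    ex.add_config(**job.sp)
    # ... and store Sacred's run records inside the job's workspace.
    ex.observers[:] = [FileStorageObserver.create(job.fn('my_runs'))]

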
@SacredProject.operation
@SacredProject.post.true('hello')
def hello(job):
    setup_sacred(job)
    ex.run('hello')
    job.doc.hello = True


@SacredProject.operation
@SacredProject.pre.after(hello)
@SacredProject.post.true('goodbye')
def goodbye(job):
    setup_sacred(job)
    ex.run('goodbye')
    job.doc.goodbye = True


if __name__ == '__main__':
    SacredProject().main()
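The decorators in `project.py` encode a small dependency graph: `hello` counts as complete once `job.doc.hello` is true, and `goodbye` becomes eligible only after that postcondition holds. The following plain-Python model shows which operation a job is eligible for at each stage, which is what the `run` and `status` commands below act on. The function `eligible_operations` is hypothetical and is not how signac-flow is implemented.

```python
def eligible_operations(doc):
    """Return the operations a job may run, given its document ``doc``.

    Hypothetical model of the conditions declared in project.py (not
    signac-flow's implementation): ``hello`` is complete once
    doc['hello'] is true, and ``goodbye`` requires ``hello`` to be complete.
    """
    ops = []
    if not doc.get('hello'):
        ops.append('hello')
    if doc.get('hello') and not doc.get('goodbye'):
        ops.append('goodbye')
    return ops


# Walk a single job through the workflow, one operation at a time.
doc = {}
history = []
while True:
    ops = eligible_operations(doc)
    if not ops:
        break
    history.append(ops[0])
    doc[ops[0]] = True  # each operation sets its own completion flag
print(history)  # ['hello', 'goodbye']
print(doc)      # {'hello': True, 'goodbye': True}
```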

In [None]:
!python3 project.py run -n 1

In [None]:
!python3 project.py run

In [None]:
!python3 project.py status --stack --pretty --full

### pandas

The data in a signac data space can easily be converted into a format suitable for [pandas](https://pandas.pydata.org/).
Here, we showcase a simple ideal gas study, in which both the state point metadata and the document metadata are exported into a pandas `DataFrame`.

An ideal gas can be modeled with the ideal gas equation $pV = NRT$, where the product of the pressure $p$ and the volume $V$ is proportional to the amount of substance $N$ (in moles) and the absolute temperature $T$, with the ideal gas constant $R = 8.314\,\frac{\text{J}}{\text{mol K}}$ as the constant of proportionality.
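Solving the ideal gas equation for the volume gives the quantity stored as `V` in each job document. As a worked example, assuming SI units ($p$ in Pa, $T$ in K, $N$ in mol, so that $V$ comes out in m³), the state point $T = 300$, $p = 1$, $N = 1$ yields:

$$
V = \frac{NRT}{p} = \frac{1 \cdot 8.314 \cdot 300}{1}\,\text{m}^3 = 2494.2\,\text{m}^3
$$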

We start by initializing the data space.

In [None]:
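import pandas as pd
import signac

R = 8.314  # ideal gas constant in J/(mol K)

project = signac.init_project("pandas", root='pandas-project')

# One job per combination of temperature T and pressure p, for N = 1.
for T in 200, 300, 400:
    for p in 1, 10, 100:
        job = project.open_job(dict(T=T, p=p, N=1))
        job.init()
        # Store the volume from the ideal gas equation, V = NRT / p.
        job.doc.V = job.sp.N * R * job.sp.T / job.sp.p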

We then export the *project index* to a pandas `DataFrame`, flattening the state point dictionary along the way:

In [None]:
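def flatten_statepoint(doc):
    """Yield the index entry's items, prefixing state point keys with 'sp.'."""
    for key, value in doc.pop('statepoint').items():
        yield 'sp.' + key, value
    for key, value in doc.items():
        yield key, value


# Map each job id to its flattened index entry; transposing yields one row per job.
project_index = {doc['_id']: dict(flatten_statepoint(doc)) for doc in project.index()}
df = pd.DataFrame(project_index).T.set_index('_id')
df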
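To see what the flattening does to a single entry, the sketch below applies the same helper (copied so the block runs on its own) to a made-up index entry. It assumes the entry shape the cell above relies on: an `_id`, a nested `statepoint`, and the job document keys such as `V`. The values are illustrative only.

```python
def flatten_statepoint(doc):
    """Yield (key, value) pairs with state point keys prefixed by 'sp.'."""
    for key, value in doc.pop('statepoint').items():
        yield 'sp.' + key, value
    for key, value in doc.items():
        yield key, value


# A hypothetical index entry; the '_id' value is made up.
entry = {'_id': 'abc123', 'statepoint': {'T': 300, 'p': 1, 'N': 1}, 'V': 2494.2}
# Pass a shallow copy, because the helper pops 'statepoint' from its argument.
flat = dict(flatten_statepoint(dict(entry)))
print(flat)  # {'sp.T': 300, 'sp.p': 1, 'sp.N': 1, '_id': 'abc123', 'V': 2494.2}
```

Note that the helper only flattens one level; a nested state point such as `{'a': {'b': 1}}` would end up as a single `sp.a` column holding a dictionary.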

We can then apply the standard pandas selection ...

In [None]:
df[df['sp.p'] == 1]

... and aggregation mechanisms.

In [None]:
df[df['sp.p'] == 1].V.max()
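Because `df` is an ordinary `DataFrame`, the rest of the pandas toolbox applies as well. The sketch below rebuilds an equivalent table directly from the ideal gas equation, so it runs without a signac project (the name `table` is illustrative), and shows a compound selection and a grouped aggregation: the largest volume at each pressure.

```python
import pandas as pd

R = 8.314  # ideal gas constant in J/(mol K)

# Rebuild a table with the same columns as ``df`` so this sketch runs on its own.
rows = [
    {'sp.T': T, 'sp.p': p, 'sp.N': 1, 'V': 1 * R * T / p}
    for T in (200, 300, 400)
    for p in (1, 10, 100)
]
table = pd.DataFrame(rows)

# Boolean masks combine with & and |; each comparison needs its own parentheses.
subset = table[(table['sp.p'] == 1) & (table['sp.T'] >= 300)]

# Group by one state point parameter and aggregate the document values.
max_volume_per_pressure = table.groupby('sp.p')['V'].max()
print(subset)
print(max_volume_per_pressure)
```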

### Datreant

The [``datreant.core``](http://datreant.readthedocs.io/en/latest/) package is one of the closest analogues to the core ``signac`` data management package.
However, it is even less restrictive than ``signac`` in that it does not require any index; it simply offers a way to associate arbitrary directories on the filesystem with metadata.

Both packages can be used in conjunction if there is value in maintaining trees within a ``signac`` data space or *vice versa*.
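To make the distinction concrete: a signac job's metadata (its state point) determines where the job lives, whereas a Treant is simply a directory that carries its own metadata. The following standard-library sketch mimics that second idea. `set_metadata`, `get_metadata`, and the `metadata.json` file name are hypothetical and do not reflect how datreant actually stores its state.

```python
import json
import os
import tempfile


def set_metadata(directory, **categories):
    """Create ``directory`` if needed and merge ``categories`` into its metadata.

    Hypothetical stand-in for the idea behind a Treant: the metadata lives
    in a JSON file inside the directory itself (not datreant's file format).
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, 'metadata.json')
    data = {}
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    data.update(categories)
    with open(path, 'w') as f:
        json.dump(data, f)
    return data


def get_metadata(directory):
    """Return the metadata stored in ``directory``, or an empty dict."""
    path = os.path.join(directory, 'metadata.json')
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


root = tempfile.mkdtemp()
set_metadata(os.path.join(root, 'tree1'), foo=1)
set_metadata(os.path.join(root, 'tree2'), foo=2)
print(get_metadata(os.path.join(root, 'tree1')))  # {'foo': 1}
```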

In [None]:
import signac

project = signac.init_project("Datreant", root='datreant-project')

for i in range(5):
    project.open_job({"i": i}).init()

In [None]:
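import datreant.core as dtr

for job in project:
    # Inside the `with job:` block the working directory is the job's workspace,
    # so each job gets its own tree1 and tree2 directories.
    with job:
        dtr.Treant('tree1').categories['foo'] = 1
        dtr.Treant('tree2').categories['foo'] = 2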

In [None]:
from glob import glob

for job in project:
    print(job)
    with job:
        for tree in glob('tree?'):
            print(tree, dtr.Treant(tree).categories)
    print()
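Combining this with the pandas approach above, the nested metadata can be flattened into one record per tree, using the same `sp.` prefix convention as `flatten_statepoint`; the resulting list can be passed to `pd.DataFrame`. The sketch below uses plain dictionaries that mirror the categories set above, so it runs without signac or datreant installed. The helper `collect_categories` and the in-memory `jobs` list are hypothetical.

```python
def collect_categories(jobs):
    """Flatten (state point, {tree name: categories}) pairs into records.

    ``jobs`` is a hypothetical in-memory stand-in for iterating over the
    project and reading each tree's categories inside the job workspace.
    """
    records = []
    for statepoint, trees in jobs:
        for tree, categories in sorted(trees.items()):
            record = {'sp.' + key: value for key, value in statepoint.items()}
            record['tree'] = tree
            record.update(categories)
            records.append(record)
    return records


jobs = [
    ({'i': i}, {'tree1': {'foo': 1}, 'tree2': {'foo': 2}})
    for i in range(5)
]
records = collect_categories(jobs)
print(len(records))  # 10: two trees in each of five jobs
print(records[0])    # {'sp.i': 0, 'tree': 'tree1', 'foo': 1}
```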