## Managing data with datreant

`datreant` is a Python package for managing data spread across many different directories.
We'll jump straight into a messy directory of simulation results.
These are moderately organised, but a little scattered:

```
datreant_data
 +-- other_dir
 |   +-- sim_T300
 |   +-- sim_T400
 |   +-- sim_T500
 |   +-- sim_T600
 +-- sim_T100
 +-- sim_T200
 +-- sim_T300
 +-- sim_T400
```

The tree above was drawn with `asciitree` (the first three cells below). Using `datreant` we can then quickly find all the directories we have previously tagged:

In [26]:
from asciitree import LeftAligned
from collections import OrderedDict as OD

In [43]:
contents = OD(
   datreant_data=OD(
       other_dir=OD(
           sim_T300={},
           sim_T400={},
           sim_T500={},
           sim_T600={},
       ),
       sim_T100={},
       sim_T200={},
       sim_T300={},
       sim_T400={},
   )
)

In [44]:
print(LeftAligned()(contents))

datreant_data
 +-- other_dir
 |   +-- sim_T300
 |   +-- sim_T400
 |   +-- sim_T500
 |   +-- sim_T600
 +-- sim_T100
 +-- sim_T200
 +-- sim_T300
 +-- sim_T400


In [24]:
import datreant as dtr
import os

treants = dtr.discover('.')  # find all datreant objects in this directory

treants

<Bundle(['foo', 'sim_T100', 'sim_T200', 'sim_T300', 'sim_T500', 'sim_T600', 'sim_T300', 'sim_T400', 'sim_T400'])>

From these results, we can then query the tags and categories that have been set, and select directories by their values:

In [7]:
treants.tags.any

{'Simulation'}

In [8]:
treants.categories

<AggCategories({'temperature': [100, 200, 300, 500, 600, 300, 400, 400]})>

In [9]:
treants.get(temperature=300)

<Bundle(['sim_T300', 'sim_T300'])>

In [10]:
for t in treants.get(temperature=200):
    print(t.abspath)

/Users/richardgowers/code/WorkshopHackathon2018/06_AdvancedTutorials/datreant_data/sim_T200/
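
As a mental model of the category query above, here is a plain-Python sketch that filters a list of records by keyword. It does not use `datreant` and is not how `Bundle.get` is implemented. The helper name `filter_by_categories` is hypothetical, and the pairing of names with temperatures is an assumption read off the order of the two outputs above (with `foo` assumed to have no categories).

```python
# Plain-Python mental model of category filtering; NOT datreant's own code.
# Names and temperatures are copied from the outputs shown above; pairing them
# up by position (and giving 'foo' no categories) is an assumption.
records = [
    ("foo", {}),
    ("sim_T100", {"temperature": 100}),
    ("sim_T200", {"temperature": 200}),
    ("sim_T300", {"temperature": 300}),
    ("sim_T500", {"temperature": 500}),
    ("sim_T600", {"temperature": 600}),
    ("sim_T300", {"temperature": 300}),
    ("sim_T400", {"temperature": 400}),
    ("sim_T400", {"temperature": 400}),
]


def filter_by_categories(records, **categories):
    """Return the names whose categories match every keyword given."""
    return [
        name
        for name, cats in records
        if all(cats.get(key) == value for key, value in categories.items())
    ]


print(filter_by_categories(records, temperature=300))
# ['sim_T300', 'sim_T300']
```

Two directories share the name `sim_T300` because one sits under `other_dir`; this is why the loop over `abspath` above is useful for telling matches apart.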


### datreant objects

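A `Treant` is a directory on disk that carries metadata (tags and categories). A `Bundle` is an ordered collection of `Treant`s, such as the result of `dtr.discover` above. A `Leaf` represents a single file.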

### Constructing your own datreant database

Now that we've seen how this can work, we can create `Treant` objects of our own:

In [19]:
dtr.Treant?

In [20]:
dtr.Treant('foo')

<Treant: 'foo'>

In [21]:
dtr.Treant('foo').exists

True

### How does it actually work?

`datreant` works by storing small JSON files inside each directory that you have tagged.
These are kept in a "hidden" subdirectory called `.datreant`.

In [3]:
with open('./sim_T100/.datreant/categories.json', 'r') as inf:
    print(inf.read())
with open('./sim_T100/.datreant/tags.json', 'r') as inf:
    print(inf.read())

{"temperature": 100}
["Simulation"]


Because the metadata lives inside each directory, this simple implementation means that you can freely rearrange your results directories (e.g. move them from your HPC cluster to long-term storage) without destroying the `Treant` objects you have created.
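
To make this concrete, here is a standard-library-only sketch of the same idea. It builds a toy directory in a temporary location, finds directories containing a `.datreant` folder, moves one, and finds it again with its metadata intact. It does not use `datreant` itself and should not be read as the package's implementation. The helper names `write_metadata` and `find_marked` are hypothetical; only the `.datreant/tags.json` and `.datreant/categories.json` layout is taken from the cell above.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def write_metadata(directory, tags, categories):
    """Write toy tags/categories files into <directory>/.datreant (hypothetical helper)."""
    meta = Path(directory) / ".datreant"
    meta.mkdir(parents=True, exist_ok=True)
    (meta / "tags.json").write_text(json.dumps(tags))
    (meta / "categories.json").write_text(json.dumps(categories))


def find_marked(root):
    """Map each directory under root that contains .datreant to (tags, categories)."""
    found = {}
    for dirpath, dirnames, _ in os.walk(root):
        if ".datreant" in dirnames:
            meta = Path(dirpath) / ".datreant"
            tags = json.loads((meta / "tags.json").read_text())
            categories = json.loads((meta / "categories.json").read_text())
            found[Path(dirpath).relative_to(root).as_posix()] = (tags, categories)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_metadata(root / "sim_T100", ["Simulation"], {"temperature": 100})
    before = find_marked(root)

    # Move the whole results directory; the metadata travels with it.
    (root / "archive").mkdir()
    shutil.move(str(root / "sim_T100"), str(root / "archive" / "sim_T100"))
    after = find_marked(root)

print(before)  # {'sim_T100': (['Simulation'], {'temperature': 100})}
print(after)   # {'archive/sim_T100': (['Simulation'], {'temperature': 100})}
```

Nothing outside the moved directory needs updating, because no central index records where each directory lives.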

### Future work

`datreant` is under active development, and a command-line interface (CLI) is currently being added.

For more information on this, talk to David or Richard and check out the repository during the hackathon on Tuesday!