Skip to content

christian-monch/datalad-compute

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DataLad compute extension

This code is a POC, that means currently:

  • code does not thoroughly validate inputs
  • names might be inconsistent
  • few tests
  • fewer docs
  • no support for locking

This is a naive datalad compute extension that serves as a playground for the datalad remake-project.

It contains an annex remote that can compute content on demand. It uses template files that specify the operations. It encodes computation parameters in URLs that are associated with annex keys, which allows to compute dropped content instead of fetching it from some storage system. It also contains the new datalad command compute that can trigger the computation of content, generate the parameterized URLs, and associate this URL with the respective annex key. This information can then be used by the annex remote to repeat the computation.

Installation

There is no pypi-package yet. To install the extension, clone the repository and install it via pip (preferably in a virtual environment):

git clone https://github.com/christian-monch/datalad-compute.git
cd datalad-compute
pip install -r requirements-devel.txt
pip install .

Example usage

Install the extension and create a dataset

> datalad create compute-test-1
> cd compute-test-1

Create the template directory and a template

> mkdir -p .datalad/compute/methods
> cat > .datalad/compute/methods/one-to-many <<EOF
inputs = ['first', 'second', 'output']

use_shell = 'true'
executable = 'echo'
arguments = [
    "content: {first} > '{output}-1.txt';",
    "echo content: {second} > '{output}-2.txt'",
]
EOF
> datalad save -m "add `one-to-many` compute method"

Create a "compute" annex special remote:

> git annex initremote compute encryption=none type=external externaltype=compute

Execute a computation and save the result:

> datalad compute -p first=bob -p second=alice -p output=name -o name-1.txt \
-o name-2.txt one-to-many

The method one-to-many will create two files with the names <output>-1.txt and <output>-2.txt. That is why the two files name-1.txt and name-2.txt are listed as outputs in the command above.

Note that only output files that are defined by the -o/--output option will be available in the dataset after datalad compute. Similarly, only the files defined by -i/--input will be available as inputs to the computation (the computation is performed in a "scratch" directory, so the input files must be copied there and the output files must be copied back).

> cat name-1.txt
content: bob
> cat name-2.txt
content: alice

Drop the content of name-1.txt, verify it is gone, recreate it via datalad get, which "fetches" is from the compute remote:

> datalad drop name-1.txt
> cat name-1.txt
> datalad get name-1.txt
> cat name-1.txt

The command datalad compute does also support to just record the parameters that would lead to a certain computation, without actually performing the computation. We refer to this as speculative computation.

To use this feature, the following configuration value has to be set:

> git config annex.security.allow-unverified-downloads ACKTHPPT

Afterward, a speculative computation can be recorded by providing the -u option (url-only) to datalad compute.

> datalad compute -p first=john -p second=susan -p output=person \
-o person-1.txt -o person-2.txt -u one-to-many
> cat person-1.txt    # this will fail, because the computation has not yet been performed

ls -l person-1.txt will show a link to a not-downloaded URL-KEY. git annex whereis person-1.txt will show the associated computation description URL. No computation has been performed yet, datalad compute just creates an URL-KEY and associates a computation description URL with the URL-KEY.

Use datalad get to perform the computation for the first time and receive the result::

> datalad get person-1.txt
> cat person-1.txt

Contributing

See CONTRIBUTING.md if you are interested in internals or contributing to the project.

About

A playground to hone in on the datalad remake system

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages