
MathiasStokholm/alkymi


alkymi ⚗️


Alkymi is a pure Python (3.7+) library for describing and executing tasks and pipelines with built-in caching and conditional evaluation based on checksums.
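Checksum-based conditional evaluation can be illustrated without alkymi itself. The following is a minimal, standard-library-only sketch, not alkymi's implementation: the hypothetical `checksum_cached` decorator hashes a function's name, bytecode and simple constants together with a `repr()` of its arguments, and re-runs the function only when that checksum has not been seen before. The in-memory `_CACHE` dict is an assumption made for brevity; a real tool would persist it.

```python
import hashlib
import types
from typing import Any, Callable, Dict

# Results keyed by a checksum of (function code, arguments); a real tool would persist this
_CACHE: Dict[str, Any] = {}


def _fingerprint(func: Callable[..., Any]) -> bytes:
    """Bytes that change when the function's name, bytecode or simple constants change."""
    code = func.__code__
    # Nested code objects are skipped because their repr() contains a memory address
    consts = tuple(c for c in code.co_consts if not isinstance(c, types.CodeType))
    return func.__qualname__.encode() + code.co_code + repr(consts).encode()


def checksum_cached(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: re-run `func` only when its checksum changes."""

    def wrapper(*args: Any) -> Any:
        digest = hashlib.sha256(_fingerprint(func))
        digest.update(repr(args).encode())  # assumes arguments have a stable repr()
        key = digest.hexdigest()
        if key not in _CACHE:  # dirty: the code or the inputs changed
            _CACHE[key] = func(*args)
        return _CACHE[key]

    return wrapper


calls = []


@checksum_cached
def add(x: int) -> int:
    calls.append("v1")  # record each real evaluation
    return x + 1


first = [add(1), add(1)]  # the second call is a cache hit


@checksum_cached
def add(x: int) -> int:  # same name, edited body
    calls.append("v2")
    return x + 2


second = add(1)  # the checksum changed, so the new body is evaluated
print(first, second, calls)  # [2, 2] 3 ['v1', 'v2']
```

Including the function's code in the checksum is what lets an edited function invalidate old results, mirroring the "bound functions have changed" check listed under Features.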

Alkymi is easy to install, simple to use, and has very few dependencies outside of Python's standard library. The code is cross-platform, and allows you to write your pipelines once and deploy to multiple operating systems (tested on Linux, Windows and Mac).

Documentation, including a quickstart guide, is provided here.

Features

  • Easily define complex data pipelines as decorated Python functions
    • This allows you to run linting, type checking, etc. on your data pipelines
  • Return values are automatically cached to disk, regardless of type
  • Efficiently checks whether the pipeline is up-to-date
    • Checks whether external files, bound functions or pipeline dependencies have changed
  • No domain specific language (DSL) or CLI tool, just regular Python
    • Supports caching and conditional evaluation in Jupyter Notebooks
  • Cross-platform - works on Linux, Windows and Mac
  • Expose recipes as a command-line interface (CLI) using alkymi's Lab type
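Detecting whether an external file has changed can be pictured as comparing a stored content hash against a freshly computed one. The sketch below is a minimal illustration under that assumption; the helpers `file_checksum` and `is_dirty` are hypothetical and are not part of alkymi's API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def file_checksum(path: Path) -> str:
    """SHA-256 of the file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_dirty(path: Path, recorded: Dict[Path, str]) -> bool:
    """True if the file is new or its contents differ from the recorded checksum."""
    return recorded.get(path) != file_checksum(path)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "input.csv"
    data.write_text("a,b\n1,2\n")
    recorded = {data: file_checksum(data)}  # checksum stored after a successful run

    unchanged = is_dirty(data, recorded)  # False: contents are identical
    data.write_text("a,b\n1,3\n")
    changed = is_dirty(data, recorded)  # True: contents differ

print(unchanged, changed)  # False True
```

Hashing file contents rather than relying on modification times is a choice made in this sketch; it means rewriting a file with identical contents does not mark it dirty.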

Sample Usage

For examples of how to use alkymi, see the quickstart guide.

Example code:

import numpy as np
import alkymi as alk

@alk.recipe()
def long_running_task() -> np.ndarray:
    # Perform expensive computation here ...
    hard_to_compute_result = np.array([42])
    # Return value will be automatically cached to disk
    return hard_to_compute_result

result = long_running_task.brew()  # == np.array([42])

See also one of the example pipelines, e.g. MNIST.
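To make "return value will be automatically cached to disk" concrete, here is a standard-library sketch that persists a return value with Python's `pickle` module and reloads it on the next call. It is a simplification under stated assumptions: the hypothetical `disk_cached` decorator keys the cache file on the function name only, handles only picklable values, and never invalidates the file, so it omits the checksum checks alkymi performs. It does not reflect how alkymi actually serializes outputs.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable

CACHE_DIR = Path(tempfile.mkdtemp())  # hypothetical cache location for this sketch


def disk_cached(func: Callable[[], Any]) -> Callable[[], Any]:
    """Hypothetical decorator: pickle the result of a zero-argument function to disk."""
    cache_file = CACHE_DIR / f"{func.__qualname__}.pickle"

    def wrapper() -> Any:
        if cache_file.exists():  # reuse the stored result instead of recomputing
            with cache_file.open("rb") as f:
                return pickle.load(f)
        result = func()
        with cache_file.open("wb") as f:
            pickle.dump(result, f)
        return result

    return wrapper


evaluations = []


@disk_cached
def long_running_task() -> dict:
    evaluations.append(1)  # count real evaluations
    return {"answer": [42]}


print(long_running_task(), long_running_task(), len(evaluations))  # {'answer': [42]} {'answer': [42]} 1
```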

Installation

Install via pip:

pip install --user alkymi

Or see the Installation page.

Testing

After installing, you can run the test suite as follows (brew the lint, coverage and type_check recipes the same way to run linting, coverage measurement and type checking):

python3 labfile.py brew test

License

alkymi is licensed under the MIT License, as found in the LICENSE.md file.

Upcoming Features

The following features are being considered for future implementation:

  • Type annotations propagated from bound functions to recipes
  • Support for call/type checking all recipes (e.g. by adding a check command to Lab)
  • Cache maintenance functionality

Known Issues

  • alkymi currently doesn't check custom objects for altered external files when computing cleanliness (e.g. if MyClass has a self._some_path attribute that points to a file outside alkymi's internal cache)
  • alk.foreach() currently only supports enumerable inputs of type List or Dict
  • Recipes marked transient will always be dirty, and thus always require reevaluation. This functionality should be replaced by a proper means of creating recipes that don't cache outputs, but only run when needed to provide inputs for downstream recipes
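The first known issue can be illustrated with a plain-Python sketch. The `MyClass` shape follows the example in that bullet; the `naive_checksum` helper is hypothetical and is not alkymi's actual hashing. If an object is hashed by its attributes, a path attribute contributes only the path string, so editing the file it points to leaves the checksum unchanged.

```python
import hashlib
import tempfile
from pathlib import Path


class MyClass:
    def __init__(self, some_path: Path) -> None:
        self._some_path = some_path  # points to a file outside any cache


def naive_checksum(obj: object) -> str:
    """Hypothetical: hash an object's attributes; a Path contributes only its string form."""
    return hashlib.sha256(repr(sorted(vars(obj).items())).encode()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    external = Path(tmp) / "weights.bin"
    external.write_bytes(b"v1")
    obj = MyClass(external)
    before = naive_checksum(obj)
    external.write_bytes(b"v2")  # the external file changes...
    after = naive_checksum(obj)

print(before == after)  # True: ...but the checksum does not notice
```

Detecting such changes would require inspecting the object for paths and hashing the referenced file contents, as in a file-level check.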