Flor

Build, configure, run, and reproduce experiments with Flor.

What is Flor?

Flor (formerly known as Jarvis) is a system with a declarative DSL embedded in Python for managing the workflow development phase of the machine learning lifecycle. Flor enables data scientists to describe ML workflows as directed acyclic graphs (DAGs) of Actions and Artifacts, and to experiment with different configurations by automatically running the workflow many times, varying the configuration each time. To date, Flor serves as a build system for producing a desired artifact and as a versioning system that tracks the evolution of artifacts across multiple runs in support of reproducibility.
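To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Flor and is not Flor's implementation: the classes `Literal`, `Action`, `Artifact` and the function `pull` below are hypothetical stand-ins that only illustrate how pulling an artifact can resolve its upstream dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Literal:
    """A named constant input (hypothetical stand-in, not flor's class)."""
    name: str
    value: object


@dataclass
class Action:
    """A function applied to the values of its input nodes."""
    func: Callable
    inputs: List[object] = field(default_factory=list)


@dataclass
class Artifact:
    """A named output produced by a parent Action."""
    name: str
    parent: Optional[Action] = None


def pull(node):
    """Evaluate a node by first evaluating everything upstream of it."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Action):
        return node.func(*(pull(i) for i in node.inputs))
    if isinstance(node, Artifact):
        if node.parent is None:
            raise ValueError(f"artifact {node.name!r} has no producing action")
        return pull(node.parent)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# A three-node graph: two literals feed one action, which produces one artifact.
x = Literal("x", 3)
y = Literal("y", 10)
multiply = Action(lambda a, b: a * b, [x, y])
product = Artifact("product.txt", multiply)

result = pull(product)
print(result)  # 30
```

Because the graph is acyclic, the recursion always terminates at literals.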

How do I run it?

Clone or download this repository.

You'll need Anaconda, preferably version 4.4 or later.

Please read this guide to set up a Python 3.6 environment inside Anaconda. Whenever you work with Flor, make sure the Python 3.6 environment is active.

Once the Python 3.6 environment in Anaconda is active, please run the following command (use the requirements.txt file in this repo):

pip install -r requirements.txt

Next, install Ray, a Flor dependency (the brew commands below assume Homebrew is available):

brew update
brew install cmake pkg-config automake autoconf libtool boost wget

pip install numpy funcsigs click colorama psutil redis flatbuffers cython --ignore-installed six
conda install libgcc

pip install git+https://github.com/ray-project/ray.git#subdirectory=python

Next, add the directory containing this flor package (repo) to your PYTHONPATH.
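For a single script, a similar effect can be sketched at runtime by extending `sys.path` before importing the package. The directory name below is a hypothetical placeholder, not a path taken from this repo; substitute the directory that contains your clone.

```python
import os
import sys

# Hypothetical location: the directory that CONTAINS the cloned flor folder.
flor_parent_dir = os.path.expanduser("~/src")

# Prepend it to the module search path if it is not already there.
if flor_parent_dir not in sys.path:
    sys.path.insert(0, flor_parent_dir)
```

This only affects the current Python process; setting PYTHONPATH in your shell remains the persistent option.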

For examples of how to write your own Flor workflow, see:

examples/twitter.py -- classic example
examples/plate.py -- multi-trial example

Make sure you:

Import flor
Initialize a flor.Experiment
Set the experiment's groundClient to 'ground'.

Once you build the workflow, call pull() on the artifact you want to produce. You can find the produced artifact in ~/flor.d/.

If you pass a non-empty dict to pull() (see lifted_twitter.py), the call returns a pandas DataFrame whose columns are the literals and requested artifacts and whose rows are the individual trials.
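The shape of that table can be sketched without Flor or pandas. The example below assumes that trials are the Cartesian product of the iterated literals, which is an assumption rather than documented behavior; `expand_trials` is a hypothetical helper, and the values are borrowed from examples/plate.py.

```python
from itertools import product

# Literal names and values borrowed from examples/plate.py.
literals = {"ones": [1, 2, 3], "tens": [10, 100]}


def expand_trials(literals):
    """Return one dict per trial: every combination of the literal values.

    Assumption: trials are the Cartesian product of the literals.
    """
    names = list(literals)
    combos = product(*(literals[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


rows = expand_trials(literals)

# Add the artifact column: one value per trial (row).
for row in rows:
    row["product"] = row["ones"] * row["tens"]

for row in rows:
    print(row)
```

Each dict corresponds to one row; passing `rows` to `pandas.DataFrame` would give a table with `ones`, `tens`, and `product` as columns.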

Note on data

The dataset used in some of our examples has migrated.

Example program

Contents of the examples/plate.py file:

import flor

with flor.Experiment('plate_demo') as ex:

	ex.groundClient('ground')

	ones = ex.literal([1, 2, 3], "ones")
	ones.forEach()

	tens = ex.literal([10, 100], "tens")
	tens.forEach()

	@flor.func
	def multiply(x, y):
	    z = x*y
	    print(z)
	    return z

	doMultiply = ex.action(multiply, [ones, tens])
	product = ex.artifact('product.txt', doMultiply)

product.pull()
product.plot()

Running this program produces output that is not reproduced here.

Motivation

Flor should facilitate the development of auditable, reproducible, justifiable, and reusable data science workflows. Is the data scientist building the right thing? We want to encourage discipline and best practices in ML workflow development by making dependencies explicit, while improving the productivity of adopters by automating multiple runs of the workflow under different configurations.

Features

Simple and Expressive Object Model: The Flor object model consists only of Actions, Artifacts, and Literals. These are connected to form dataflow graphs.
Data-Centric Workflows: Machine learning applications have data dependencies that obscure traditional abstraction boundaries. The data "gets everywhere": into the models and into the applications that consume them. It makes sense to think about the data carefully and specifically. In Flor, data is a first-class citizen.
Artifact Versioning: Flor uses git to automatically version every Artifact (data, code, etc.) and Literal that is in a Flor workflow.
Artifact Contextualization: Flor uses Ground to store data about the context of Artifacts: their relationships, their lineage. Ground and git are complementary services used by Flor. Together, they enable experiment reproduction and replication.
Parallel Multi-Trial Experiments: Flor should enable data scientists to try more ideas quickly, which requires faster execution. We leverage parallel execution systems such as Ray to execute multiple trials in parallel.
Visualization and Exploratory Data Analysis: To establish the fitness of data for some particular purpose, or gain valuable insights about properties of the data, Flor will leverage visualization techniques in an interactive environment such as Jupyter Notebook. We use visualization for its ability to give immediate feedback and guide the creative process.
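As a rough illustration of the parallel multi-trial idea, the sketch below runs independent trials concurrently. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for Ray, and it is not how Flor schedules work; the `multiply` function and the configurations mirror examples/plate.py.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def multiply(x, y):
    """The per-trial computation, mirroring examples/plate.py."""
    return x * y


# One configuration per trial: every pairing of the two literals.
configs = list(product([1, 2, 3], [10, 100]))

# Trials are independent, so they can be submitted to a worker pool.
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda cfg: multiply(*cfg), configs))

print(results)  # [10, 100, 20, 200, 30, 300]
```

Threads are used here only to keep the sketch dependency-free; CPU-bound trials would need a process pool or a system such as Ray to run in parallel.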

License

Flor is licensed under the Apache v2 License.
