3 changes: 2 additions & 1 deletion docs/index.md
@@ -30,6 +30,7 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr
- [Introduction to Developing with Metaflow](metaflow/introduction)
- [Creating Flows](metaflow/basics) ✨*New: support for conditional and recursive steps*✨
- [Inspecting Flows and Results](metaflow/client)
- [Authoring Flows Incrementally](metaflow/authoring-flows/introduction) ✨*New: `spin` command*✨
- [Managing Flows in Notebooks and Scripts](metaflow/managing-flows/introduction)
- [Debugging Flows](metaflow/debugging)
- [Visualizing Results](metaflow/visualizing-results/)
@@ -42,7 +43,7 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr
- [Computing at Scale](scaling/remote-tasks/introduction)
- [Managing Dependencies](scaling/dependencies) ✨*New: support for `uv`*✨
- [Dealing with Failures](scaling/failures) ✨*New: support for `@exit_hook`*✨
- [Checkpointing Progress](scaling/checkpoint/introduction) ✨*New*✨
- [Checkpointing Progress](scaling/checkpoint/introduction)
- [Loading and Storing Data](scaling/data)
- [Organizing Results](scaling/tagging)
- [Accessing Secrets](scaling/secrets)
93 changes: 93 additions & 0 deletions docs/metaflow/authoring-flows/introduction.md
@@ -0,0 +1,93 @@
import ReactPlayer from 'react-player'

# Authoring Flows Incrementally

Every non-trivial piece of software is built incrementally, one piece at a time.

With Metaflow, you might start with a simple stub, perhaps just a step to load data,
and then gradually add more `@step`s, say, for data transformation, model training,
and beyond, testing the flow at each iteration. To enable a smooth development
experience, these iterations should run quickly, with minimal waiting - much
like the familiar workflow in a notebook, where you build results one cell at a time.

## The `spin` command

:::info New Feature
The `spin` command was introduced in Metaflow
2.19. [Read the announcement blog post for motivation](https://netflixtechblog.medium.com/b2d5b95c63eb).
:::

While you can certainly `run` a flow from start to end at each iteration -
similar to the "Run All" mode in notebooks - this can take a while. You
can use [`resume`](/metaflow/debugging#how-to-use-the-resume-command) to run a part of
a flow, reusing past results, but even this might be
overkill when you are focused on developing a particular `@step` and you just
want to test it with appropriate input data.

Metaflow provides a `spin` command to address this use case: rapid, iterative
development and testing of a single step. Watch this one-minute video
to see it in action:

<ReactPlayer controls url="https://youtu.be/3RNMM-lthm0" />
<br/>

As shown in the video, you can use `spin` to author flows incrementally following
this pattern:

1. Develop a stub of a flow - at the minimum, add `start` and `end` steps.
2. Use `run` to run the flow to produce an initial set of inputs.
3. Edit any step, `somestep`.
4. Use `python myflow.py spin somestep` to test the changes quickly using
the input artifacts from the latest run (or any earlier run).
5. Once `somestep` seems to work, add a next step and repeat the process from (2)
until the flow is complete.

### The properties of `spin`

As `spin` is meant for rapid testing of an individual step, it doesn't track
metadata or persist artifacts by default. Hence you won't see the `spin` iterations
on the Metaflow UI, and you can't access artifacts globally using
[the Client API](/metaflow/client). Instead, you can eyeball logs on the console and
optionally [access the output artifacts locally](/metaflow/authoring-flows/spin-input-output).
Once the step seems to work, just `run` the flow as usual to take a snapshot of all
metadata and artifacts.

Currently, `spin` doesn't support
[executing tasks remotely](/scaling/remote-tasks/requesting-resources), but you can use
`@pypi` and `@conda` for [dependency management](/scaling/dependencies) as usual. Also,
`spin` comes in handy in [visualizing results with `@card`](/metaflow/visualizing-results/effortless-task-inspection-with-default-cards#developing-cards-quickly-with-spin).

You may use `spin` programmatically using [the `Runner` API](/metaflow/managing-flows/runner),
as described in the section about using [`spin` for unit testing](/metaflow/authoring-flows/spin-input-output#using-spin-for-unit-testing).

### Testing a step with past results

:::note
You may need to upgrade your metadata service to allow `spin` to find past results
efficiently. The command will show a message about this if an upgrade is needed.
:::

By default, `spin` executes the given step with artifacts originating from the latest
run in [the current namespace](/scaling/tagging). Hence you can just `spin somestep`
without having to worry about the input artifacts changing abruptly.

Optionally, you may spin a step using artifacts from any past run. Simply provide
the full pathspec of a task as an argument for `spin`, like here:
```
python myflow.py spin 32455/train/355
```
In this case, `spin` will re-execute the `train` step in `myflow.py` using the same
inputs that were provided for the given task `32455/train/355`.

:::tip
You can use `spin` to test a step quickly with different inputs, since it can replay
any past results. For example, if you have several previous runs with varying datasets
or sample sizes, you can `spin` the step against each one to see how it behaves with
diverse inputs.
:::
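
The replay idea in the tip above can be scripted. The following is a minimal sketch and not part of Metaflow itself: it only builds the `spin` command lines for a handful of task pathspecs, using the three-part form from the example above. The `spin_commands` helper and the second pathspec are illustrative assumptions; substitute your own flow file and pathspecs.

```python
import sys


def spin_commands(flow_file, pathspecs):
    """Build one `spin` command line per task pathspec (hypothetical helper)."""
    commands = []
    for pathspec in pathspecs:
        parts = pathspec.split("/")
        # Expect three non-empty parts, as in 32455/train/355.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected run/step/task, got {pathspec!r}")
        commands.append([sys.executable, flow_file, "spin", pathspec])
    return commands


# Made-up pathspecs standing in for past runs with different datasets.
past_tasks = ["32455/train/355", "32460/train/361"]
for cmd in spin_commands("myflow.py", past_tasks):
    print(" ".join(cmd))
```

To actually execute the commands, you could pass each list to `subprocess.run(cmd, check=True)` and read the logs on the console.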

:::note
Spin may not work properly on certain flows with conditionals. We are working on
improving this support.
:::

88 changes: 88 additions & 0 deletions docs/metaflow/authoring-flows/spin-input-output.md
@@ -0,0 +1,88 @@

# Spin Inputs And Outputs

By default, `spin` uses artifacts from the most recent `run` as inputs.
It doesn’t produce any new artifacts, nor does it record metadata, making
it ideal for quick, transient smoke tests which mainly focus on logs and
errors output on the console, as well as `@card`s.

However, you can optionally override inputs, even individual artifacts,
and capture outputs for later inspection, as described below.

## Inspecting artifacts produced by `spin`

To persist artifacts for inspection, run `spin` with the `--persist`
option:
```
python myflow.py spin train --persist
```
After running `spin`, you can inspect its artifacts using
[the Client API](/metaflow/client). To do so, tell the Client to look at
the ephemeral results from `spin` instead of the usual metadata service by
pointing it to the working directory that contains the results via
`inspect_spin`, as shown below:

```python
from metaflow import Flow, inspect_spin

inspect_spin(".")
Flow("TrainingFlow").latest_run["train"].task["model"].data
```

This will fetch the results from a special local `./.metaflow_spin`
datastore. You can safely delete the `.metaflow_spin` directory when you
don't need the results anymore. Note that the `"."` argument tells `inspect_spin` which
directory contains the `.metaflow_spin` directory, so you can maintain multiple
"spin" environments in separate directories if you wish.

This way, you can quickly test and inspect artifacts without persisting
anything in the main datastore permanently.
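
Cleaning up persisted results can be automated. The following is a minimal sketch using only the Python standard library; the `clean_spin_results` helper is a hypothetical name, and the only detail taken from the text above is the `.metaflow_spin` directory name.

```python
import shutil
from pathlib import Path


def clean_spin_results(workdir="."):
    """Delete <workdir>/.metaflow_spin if it exists; return True if removed."""
    spin_dir = Path(workdir) / ".metaflow_spin"
    if spin_dir.is_dir():
        shutil.rmtree(spin_dir)
        return True
    return False


# Remove spin results from the current working directory, if there are any.
removed = clean_spin_results(".")
print("removed" if removed else "nothing to remove")
```

Calling the helper with another directory cleans up the "spin" environment stored there instead.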

### Using `spin` for unit testing

The above pattern makes `spin` useful for unit testing of individual steps
e.g. in a CI/CD pipeline.

In a unit testing script (e.g. using `pytest`), you can use
[the `Runner` API](/metaflow/managing-flows/runner) to run `spin` with
`persist=True` to capture output artifacts. Once the step has completed,
you can `assert` their correctness, like here:

```python
from metaflow import Runner

with Runner("flow.py").spin("train", persist=True) as spin:
    assert spin.task["model"].data == "kmeans"
```
Running this test in different directories creates separate `.metaflow_spin`
directories, isolating your tests from one another.

## Overriding input artifacts

As mentioned, `spin` uses the same input artifacts that were used
in the latest run of the given step, or those of [any past
run](/metaflow/authoring-flows/introduction#testing-a-step-with-past-results).

You may, however, override any or all of the artifacts individually. This can
come in handy if you want to test your step code quickly with arbitrary inputs
on the fly. Since artifacts can be any Python objects, the overrides are defined
as a special Python module (file) that contains a dictionary, `ARTIFACTS`, like
in this example:

```python
ARTIFACTS = {
    "model": "kmeans",
    "k": 15,
}
```

You can save this to a file, say, `artifacts.py`, and run `spin` as follows:
```
python myflow.py spin train --artifacts-module artifacts.py
```
In this case, the base set of artifacts is loaded from the latest run
(since no explicit pathspec was provided on the command line), and two of them,
`model` and `k`, are overridden by the module. In short, when looking for an artifact,
Metaflow first checks the `ARTIFACTS` dictionary. If the name is present there, it
returns that value; otherwise, it falls back to the artifacts passed down from the
specified run.
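
The lookup order can be modeled in a few lines of plain Python. This is a sketch of the precedence rule only, not of how Metaflow implements it: `run_artifacts` and its values are made up, and `collections.ChainMap` is merely a convenient stand-in for "check the overrides first, then the run".

```python
from collections import ChainMap

# Stand-in for the artifacts inherited from the selected run (made-up values).
run_artifacts = {"model": "dbscan", "k": 8, "dataset": "train.csv"}

# Same shape as the ARTIFACTS dictionary in artifacts.py above.
ARTIFACTS = {"model": "kmeans", "k": 15}

# ChainMap searches its mappings from left to right, so overrides win.
effective = ChainMap(ARTIFACTS, run_artifacts)

print(effective["model"])    # found in ARTIFACTS
print(effective["k"])        # found in ARTIFACTS
print(effective["dataset"])  # not overridden, so it comes from the run
```

Only the names listed in `ARTIFACTS` are replaced; every other artifact keeps the value it had in the run.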
16 changes: 10 additions & 6 deletions docs/metaflow/introduction.md
@@ -37,8 +37,9 @@ why this is a good idea and how to create flows in practice, see [Creating
Flows](/metaflow/basics).

3. Flows are living and dynamic
entities that you should be able to execute locally and improve gradually (this is where
[`resume` comes in handy!](/metaflow/debugging#how-to-use-the-resume-command)). The
entities that you should be able to execute locally and improve gradually. This is where
[`spin`](/metaflow/authoring-flows/introduction) and
[`resume`](/metaflow/debugging#how-to-use-the-resume-command) come in handy. The
workflow becomes the backbone of your application - in particular helping with [data
flow through artifacts](/metaflow/basics#artifacts) - which enables much of the
functionality in the next phases of the project.
@@ -54,10 +55,13 @@ core topics:

1. [Creating flows](/metaflow/basics)
2. [Inspecting results of flows](/metaflow/client)
3. [Managing flows programmatically](/metaflow/managing-flows/introduction)
4. [Visualizing results](/metaflow/visualizing-results)
5. [Debugging flows](/metaflow/debugging)
6. [Configuring flows](/metaflow/configuring-flows/introduction)
3. [Authoring flows incrementally](/metaflow/authoring-flows/introduction)
4. [Managing flows programmatically](/metaflow/managing-flows/introduction)
5. [Visualizing results](/metaflow/visualizing-results)
6. [Debugging flows](/metaflow/debugging)
7. [Configuring flows](/metaflow/configuring-flows/introduction)
8. [Composing flows](/metaflow/composing-flows/introduction)


These topics work locally on your workstation without any additional infrastructure, so
it is easy to get started.
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
import ReactPlayer from 'react-player'

# Effortless Task Inspection with Default Cards

Metaflow comes with a built-in _Default Card_ that shows all artifacts produced by a
@@ -87,6 +89,25 @@ the local viewer allows you to view [updating cards in real-time](dynamic-cards)
similar to Metaflow UI, while the `card view` command only shows a card that was
available at the time when you executed the command.

### Developing Cards Quickly with `spin`

You can develop cards quickly by using [the
`spin` command](/metaflow/authoring-flows/introduction) together with the local
card viewer, as shown in this short video:

<ReactPlayer controls url="https://youtu.be/hoRO5eePjqo" />
<br/>

Run the local card viewer in a terminal using `--mode spin` to watch
ephemeral spin results instead of the standard, persistent cards:
```
python defaultcard.py --mode spin card server
```

You can then use spin in another terminal to iterate on a step, viewing the results as cards in real time. Build the card content incrementally by adding artifacts and
[card components](/metaflow/visualizing-results/easy-custom-reports-with-card-components), and quickly inspect your progress by running `spin` after each change.


## Visualizing Artifacts with the Default Card

As shown in the screenshot above, the artifacts table shows all Metaflow artifacts
11 changes: 11 additions & 0 deletions sidebars.js
@@ -81,6 +81,17 @@ const sidebars = {
items: [
"metaflow/basics",
"metaflow/client",
{
type: "category",
label: "Authoring Flows Incrementally",
link: {
type: "doc",
id: "metaflow/authoring-flows/introduction",
},
items: [
"metaflow/authoring-flows/spin-input-output"
]
},
{
type: "category",
label: "Managing Flows",