From 2b625c094ab0aae274388d548e61129a06aba4cb Mon Sep 17 00:00:00 2001 From: Ville Tuulos Date: Mon, 3 Nov 2025 00:55:53 -0800 Subject: [PATCH 1/3] add spin docs --- docs/index.md | 3 +- docs/metaflow/authoring-flows/introduction.md | 80 ++++++++++++++++++ .../authoring-flows/spin-input-output.md | 81 +++++++++++++++++++ docs/metaflow/introduction.md | 15 ++-- ...less-task-inspection-with-default-cards.md | 21 +++++ sidebars.js | 11 +++ 6 files changed, 204 insertions(+), 7 deletions(-) create mode 100644 docs/metaflow/authoring-flows/introduction.md create mode 100644 docs/metaflow/authoring-flows/spin-input-output.md diff --git a/docs/index.md b/docs/index.md index e44dc369..88e0b589 100644 --- a/docs/index.md +++ b/docs/index.md @@ -30,6 +30,7 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr - [Introduction to Developing with Metaflow](metaflow/introduction) - [Creating Flows](metaflow/basics) ✨*New: support for conditional and recursive steps*✨ - [Inspecting Flows and Results](metaflow/client) +- [Authoring Flows Incrementally](metaflow/authoring-flows/introduction) ✨*New: `spin` command*✨ - [Managing Flows in Notebooks and Scripts](metaflow/managing-flows/introduction) - [Debugging Flows](metaflow/debugging) - [Visualizing Results](metaflow/visualizing-results/) @@ -42,7 +43,7 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr - [Computing at Scale](scaling/remote-tasks/introduction) - [Managing Dependencies](scaling/dependencies) ✨*New: support for `uv`*✨ - [Dealing with Failures](scaling/failures) ✨*New: support for `@exit_hook`*✨ -- [Checkpointing Progress](scaling/checkpoint/introduction) ✨*New*✨ +- [Checkpointing Progress](scaling/checkpoint/introduction) - [Loading and Storing Data](scaling/data) - [Organizing Results](scaling/tagging) - [Accessing Secrets](scaling/secrets) diff --git a/docs/metaflow/authoring-flows/introduction.md b/docs/metaflow/authoring-flows/introduction.md new file 
mode 100644 index 00000000..e091440d --- /dev/null +++ b/docs/metaflow/authoring-flows/introduction.md @@ -0,0 +1,80 @@ +import ReactPlayer from 'react-player' + +# Authoring Flows Incrementally + +Every non-trivial piece of software is built incrementally, one piece at a time. + +With Metaflow, you might start with a simple stub, perhaps just a step to load data, +and then gradually add more `@step`s, say, for data transformation, model training, +and beyond, testing the flow at each iteration. To enable a smooth development +experience, these iterations should run quickly, with minimal waiting - much +like the familiar workflow in a notebook, where you build results one cell at a time. + +## The `spin` command + +:::info New Feature +The `spin` command was introduced in Metaflow +2.19. [Read the announcement blog post for motivation](https://netflixtechblog.medium.com/b2d5b95c63eb). +::: + +While you can certainly `run` a flow from start to end at each iteration - +similar to the "Run All" mode in notebooks - but this can take a while. You +can [use `resume` to run a part of a flow](/metaflow/debugging#how-to-use-the-resume-command), reusing past results, but even this might be +overkill when you are focused on developing a particular `@step` and you just +want to test it with appropriate input data. + +Metaflow provides a `spin` command to address this use case: Rapid, iterative +development and testing of a single step. Watch this one minute video +to see it in action: + + +
+ +As shown in the video, you can use `spin` to author flows incrementally following +this pattern: + +1. Develop a stub of a flow - at the minimum, add `start` and `end` steps. +2. Use `run` to run the flow to produce an initial set of inputs. +3. Edit any step, `somestep`. +4. Use `python myflow.py spin somestep` to test the changes quickly using + the input artifacts from the latest run (or any earlier run). +5. Once `somestep` seems to work, add a next step and repeate the process from (2) until the flow is complete. + +### The properties of `spin` + +As `spin` is meant for rapid testing of an individual step, it doesn't track +metadata or persist artifacts by default. Hence you won't see the `spin` iterations +on the Metaflow UI, and you can't access artifacts globally using [the Client API](/metaflow/client). Instead, you can eyeball logs on the console and optionally [access +the output artifacts locally](/metaflow/authoring-flows/spin-input-output). Once the +step seems to work, just `run` the flow as usual to take a snapshot of all +metadata and artifacts. + +Currently `spin` doesn't support [executing tasks remotely](/scaling/remote-tasks/requesting-resources) but you can use `@pypi` and `@conda` for [dependency management](scaling/dependencies) as usual. Also, `spin` comes in handy in [visualizing +results with `@card`](/metaflow/visualizing-results/effortless-task-inspection-with-default-cards#developing-cards-quickly-with-spin). + +You may use `spin` programmatically using [the `Runner` API](/metaflow/managing-flows/runner), as described in the section about using [`spin` for unit +testing](/metaflow/authoring-flows/spin-input-output#using-spin-for-unit-testing). + +### Testing a step with past results + +:::note +You may need to upgrade your metadata service to allow `spin` to find past results efficiently. The command will show a message about this if an upgrade is needed. 
+::: + +By default, `spin` executes the given step with artifacts originating from the latest +run in [the current namespace](/scaling/tagging). Hence you can just `spin somestep` +without having to worry about the input artifacts changing abruptly. + +Optionally, you may spin a step using artifacts from any past run. Simply provide +the full pathspec of a task as an argument for `spin`, like here: +``` +python myflow.py spin 32455/train/355 +``` +In this case, `spin` will re-execute the `train` step in `myflow.py` using the same +inputs that were provided for the given task `32455/train/355`. + +:::tip +You can use `spin` to test a step quickly with different inputs, since it can replay any past results. For example, if you have several previous runs with varying datasets or sample sizes, you can `spin` the step against each one to see how it behaves with diverse inputs. +::: + + diff --git a/docs/metaflow/authoring-flows/spin-input-output.md b/docs/metaflow/authoring-flows/spin-input-output.md new file mode 100644 index 00000000..ef992d31 --- /dev/null +++ b/docs/metaflow/authoring-flows/spin-input-output.md @@ -0,0 +1,81 @@ + +# Spin Inputs And Outputs + +By default, `spin` uses artifacts from the most recent `run` as inputs. +It doesn’t produce any new artifacts, nor does it record metadata, making +it ideal for quick, transient smoke tests which mainly focus on logs and +errors output on the console, as well as `@card`s. + +However, you can optionally override inputs, even individual artifacts, +and capture outputs for later inspection, as described below. + +## Inspecting artifacts produced by `spin` + +To persist artifacts for inspection, run `spin` with the `--persist` +option: +``` +python myflow.py spin train --persist +``` +After running `spin`, you can inspect its artifacts using +[the Client API](/metaflow/client). 
To do so, tell the Client to look at +the ephemeral results from `spin` instead of the usual metadata service by +pointing it to the working directory that contains the results via +`inspect_spin`, as shown below: + +```python +from metaflow import Flow, inspect_spin + +inspect_spin(".") +Flow("TrainingFlow").latest_run["train"].task["model"].data +``` + +This will fetch the results from a special local `./.metaflow_spin` +datastore. You can safely delete the `.metaflow_spin` directory when you +don't need the results anymore. + +This way, you can quickly test and inspect artifacts without persisting +anything in the main datastore permanently. + +## Using `spin` for unit testing + +The above pattern makes `spin` useful for unit testing of individual steps +e.g. in a CI/CD pipeline. + +In a unit testing script (e.g. using `pytest`), you can use +[the `Runner` API](/metaflow/managing-flows/runner) to run `spin` with +`persist=True` to capture output artifacts, the correctness +of which you can `assert` once the step has completed, like here: + +```python +from metaflow import Runner + +with Runner("flow.py").spin("train", persist=True) as spin: + assert spin.task["model"].data == "kmeans" +``` + +## Overriding input artifacts + +By default, `spin` uses the exact same input artifacts as what were used +in the latest run of the given step, or those of [any past +run](/metaflow/authoring-flows/introduction#testing-a-step-with-past-results). + +However, you may override any or all of the artifacts individually. This can +come in handy if you want to test your step code quickly with arbitrary inputs +on the fly.
Since artifacts can be any Python objects, the overrides are defined +as a special Python module (file) that contains a dictionary, `ARTIFACTS`, like +in this example: + +```python +ARTIFACTS = { + "model": "kmeans", + "k": 15 +} +``` + +You can save this to a file, say, `artifacts.py`, and run `spin` as follows: +``` +python myflow.py spin train --artifacts-module artifacts.py +``` +In this case, the base set of artifacts is loaded from the latest run +(since no explicit pathspec was provided on the command line), and two of them, +`model` and `k`, are overridden by the module. diff --git a/docs/metaflow/introduction.md b/docs/metaflow/introduction.md index 125f4d64..831c02b5 100644 --- a/docs/metaflow/introduction.md +++ b/docs/metaflow/introduction.md @@ -37,8 +37,8 @@ why this is a good idea and how to create flows in practice, see [Creating Flows](/metaflow/basics). 3. Flows are living and dynamic -entities that you should be able to execute locally and improve gradually (this is where -[`resume` comes in handy!](/metaflow/debugging#how-to-use-the-resume-command)). The +entities that you should be able to execute locally and improve gradually. This is where +[`spin`](/metaflow/authoring-flows/introduction) and [`resume`](/metaflow/debugging#how-to-use-the-resume-command) come in handy. The workflow becomes the backbone of your application - in particular helping with [data flow through artifacts](/metaflow/basics#artifacts) - which enables much of the functionality in the next phases of the project. @@ -54,10 +54,13 @@ core topics: 1. [Creating flows](/metaflow/basics) 2. [Inspecting results of flows](/metaflow/client) - 3. [Managing flows programmatically](/metaflow/managing-flows/introduction) - 4. [Visualizing results](/metaflow/visualizing-results) - 5. [Debugging flows](/metaflow/debugging) - 6. [Configuring flows](/metaflow/configuring-flows/introduction) + 3. [Authoring flows incrementally](/metaflow/authoring-flows/introduction) + 4. 
[Managing flows programmatically](/metaflow/managing-flows/introduction) + 5. [Visualizing results](/metaflow/visualizing-results) + 6. [Debugging flows](/metaflow/debugging) + 7. [Configuring flows](/metaflow/configuring-flows/introduction) + 8. [Composing flows](/metaflow/composing-flows/introduction) + These topics work locally on your workstation without any additional infrastructure, so it is easy to get started. \ No newline at end of file diff --git a/docs/metaflow/visualizing-results/effortless-task-inspection-with-default-cards.md b/docs/metaflow/visualizing-results/effortless-task-inspection-with-default-cards.md index 9f0da95f..6830a8d9 100644 --- a/docs/metaflow/visualizing-results/effortless-task-inspection-with-default-cards.md +++ b/docs/metaflow/visualizing-results/effortless-task-inspection-with-default-cards.md @@ -1,3 +1,5 @@ +import ReactPlayer from 'react-player' + # Effortless Task Inspection with Default Cards Metaflow comes with a built-in _Default Card_ that shows all artifacts produced by a @@ -87,6 +89,25 @@ the local viewer allows you to view [updating cards in real-time](dynamic-cards) similar to Metaflow UI, while the `card view` command only shows a card that was available at the time when you executed the command. +### Developing Cards Quickly with `spin` + +You can develop cards quickly by using [the +`spin` command](/metaflow/authoring-flows/introduction) together with the local +card viewer, as shown in this short video: + + +
+ +Run the local card viewer in a terminal using `--mode spin` to watch +ephemeral spin results instead of the standard, persistent cards: +``` +python defaultcard.py --mode spin card server +``` + +You can then use `spin` in another terminal to iterate on a step, viewing the results as cards in real time. Build the card content incrementally by adding artifacts and +[card components](/metaflow/visualizing-results/easy-custom-reports-with-card-components), and quickly inspect your progress by running `spin` after each change. + + ## Visualizing Artifacts with the Default Card As shown in the screenshot above, the artifacts table shows all Metaflow artifacts diff --git a/sidebars.js b/sidebars.js index b3995dc6..4ac14f47 100644 --- a/sidebars.js +++ b/sidebars.js @@ -81,6 +81,17 @@ const sidebars = { items: [ "metaflow/basics", "metaflow/client", + { + type: "category", + label: "Authoring Flows Incrementally", + link: { + type: "doc", + id: "metaflow/authoring-flows/introduction", + }, + items: [ + "metaflow/authoring-flows/spin-input-output" + ] + }, { type: "category", label: "Managing Flows", From bd398dc4a11bae6be690b47a0ea0ea6c39bf6742 Mon Sep 17 00:00:00 2001 From: Romain Cledat Date: Mon, 3 Nov 2025 16:11:21 -0800 Subject: [PATCH 2/3] Small updates --- docs/metaflow/authoring-flows/introduction.md | 35 ++++++++++++------ .../authoring-flows/spin-input-output.md | 17 ++++++--- docs/metaflow/introduction.md | 3 +- 3 files changed, 36 insertions(+), 19 deletions(-) diff --git a/docs/metaflow/authoring-flows/introduction.md b/docs/metaflow/authoring-flows/introduction.md index e091440d..a30f8998 100644 --- a/docs/metaflow/authoring-flows/introduction.md +++ b/docs/metaflow/authoring-flows/introduction.md @@ -18,12 +18,13 @@ The `spin` command was introduced in Metaflow ::: While you can certainly `run` a flow from start to end at each iteration - -similar to the "Run All" mode in notebooks - but this can take a while.
You -can [use `resume` to run a part of a flow](/metaflow/debugging#how-to-use-the-resume-command), reusing past results, but even this might be +similar to the "Run All" mode in notebooks - this can take a while. You +can use [`resume`](/metaflow/debugging#how-to-use-the-resume-command) to run a part of +a flow, reusing past results, but even this might be overkill when you are focused on developing a particular `@step` and you just want to test it with appropriate input data. -Metaflow provides a `spin` command to address this use case: Rapid, iterative +Metaflow provides a `spin` command to address this use case: rapid, iterative development and testing of a single step. Watch this one minute video to see it in action: @@ -38,27 +39,32 @@ this pattern: 3. Edit any step, `somestep`. 4. Use `python myflow.py spin somestep` to test the changes quickly using the input artifacts from the latest run (or any earlier run). -5. Once `somestep` seems to work, add a next step and repeate the process from (2) until the flow is complete. +5. Once `somestep` seems to work, add the next step and repeat the process from (2) + until the flow is complete. ### The properties of `spin` As `spin` is meant for rapid testing of an individual step, it doesn't track metadata or persist artifacts by default. Hence you won't see the `spin` iterations -on the Metaflow UI, and you can't access artifacts globally using [the Client API](/metaflow/client). Instead, you can eyeball logs on the console and optionally [access -the output artifacts locally](/metaflow/authoring-flows/spin-input-output). Once the -step seems to work, just `run` the flow as usual to take a snapshot of all +on the Metaflow UI, and you can't access artifacts globally using +[the Client API](/metaflow/client). Instead, you can eyeball logs on the console and +optionally [access the output artifacts locally](/metaflow/authoring-flows/spin-input-output).
+Once the step seems to work, just `run` the flow as usual to take a snapshot of all metadata and artifacts. -Currently `spin` doesn't support [executing tasks remotely](/scaling/remote-tasks/requesting-resources) but you can use `@pypi` and `@conda` for [dependency management](scaling/dependencies) as usual. Also, `spin` comes in handy in [visualizing -results with `@card`](/metaflow/visualizing-results/effortless-task-inspection-with-default-cards#developing-cards-quickly-with-spin). +Currently `spin` doesn't support +[executing tasks remotely](/scaling/remote-tasks/requesting-resources), but you can use +`@pypi` and `@conda` for [dependency management](/scaling/dependencies) as usual. Also, +`spin` comes in handy for [visualizing results with `@card`](/metaflow/visualizing-results/effortless-task-inspection-with-default-cards#developing-cards-quickly-with-spin). -You may use `spin` programmatically using [the `Runner` API](/metaflow/managing-flows/runner), as described in the section about using [`spin` for unit -testing](/metaflow/authoring-flows/spin-input-output#using-spin-for-unit-testing). +You can also invoke `spin` programmatically through [the `Runner` API](/metaflow/managing-flows/runner), +as described in the section about using [`spin` for unit testing](/metaflow/authoring-flows/spin-input-output#using-spin-for-unit-testing). ### Testing a step with past results :::note -You may need to upgrade your metadata service to allow `spin` to find past results efficiently. The command will show a message about this if an upgrade is needed. +You may need to upgrade your metadata service to allow `spin` to find past results +efficiently. The command will show a message about this if an upgrade is needed. ::: By default, `spin` executes the given step with artifacts originating from the latest @@ -74,7 +80,10 @@ In this case, `spin` will re-execute the `train` step in `myflow.py` using the s inputs that were provided for the given task `32455/train/355`.
:::tip -You can use `spin` to test a step quickly with different inputs, since it can replay any past results. For example, if you have several previous runs with varying datasets or sample sizes, you can `spin` the step against each one to see how it behaves with diverse inputs. +You can use `spin` to test a step quickly with different inputs, since it can replay +any past results. For example, if you have several previous runs with varying datasets +or sample sizes, you can `spin` the step against each one to see how it behaves with +diverse inputs. ::: diff --git a/docs/metaflow/authoring-flows/spin-input-output.md b/docs/metaflow/authoring-flows/spin-input-output.md index ef992d31..50fe9093 100644 --- a/docs/metaflow/authoring-flows/spin-input-output.md +++ b/docs/metaflow/authoring-flows/spin-input-output.md @@ -31,12 +31,14 @@ Flow("TrainingFlow").latest_run["train"].task["model"].data This will fetch the results from a special local `./.metaflow_spin` datastore. You can safely delete the `.metaflow_spin` directory when you -don't need the results anymore. +don't need the results anymore. Note that the `"."` argument tells `inspect_spin` which +directory contains the `.metaflow_spin` directory, so you can also create multiple, +separate "spin" environments if you wish. This way, you can quickly test and inspect artifacts without persisting anything in the main datastore permanently. -## Using `spin` for unit testing +### Using `spin` for unit testing The above pattern makes `spin` useful for unit testing of individual steps e.g. in a CI/CD pipeline. @@ -52,14 +54,16 @@ from metaflow import Runner with Runner("flow.py").spin("train", persist=True) as spin: assert spin.task["model"].data == "kmeans" ``` +Running this command in different directories would create separate `.metaflow_spin` +directories, thereby isolating your various tests.
## Overriding input artifacts -By default, `spin` uses the exact same input artifacts as what were used +As mentioned, `spin` uses the exact same input artifacts that were used in the latest run of the given step, or those of [any past run](/metaflow/authoring-flows/introduction#testing-a-step-with-past-results). -However, you may override any or all of the artifacts individually. This can +You may, however, override any or all of the artifacts individually. This can come in handy if you want to test your step code quickly with arbitrary inputs on the fly. Since artifacts can be any Python objects, the overrides are defined as a special Python module (file) that contains a dictionary, `ARTIFACTS`, like @@ -78,4 +82,7 @@ python myflow.py spin train --artifacts-module artifacts.py ``` In this case, the base set of artifacts is loaded from the latest run (since no explicit pathspec was provided on the command line), and two of them, -`model` and `k`, are overridden by the module. +`model` and `k`, are overridden by the module. In short, when looking up an artifact, +Metaflow first checks whether its name is present in the `ARTIFACTS` dictionary. If +so, it returns that value; if not, it falls back to the artifacts passed down from +the specified run. diff --git a/docs/metaflow/introduction.md b/docs/metaflow/introduction.md index 831c02b5..f9321664 100644 --- a/docs/metaflow/introduction.md +++ b/docs/metaflow/introduction.md @@ -38,7 +38,8 @@ Flows](/metaflow/basics). 3. Flows are living and dynamic entities that you should be able to execute locally and improve gradually. This is where -[`spin`](/metaflow/authoring-flows/introduction) and [`resume`](/metaflow/debugging#how-to-use-the-resume-command) come in handy.
The workflow becomes the backbone of your application - in particular helping with [data flow through artifacts](/metaflow/basics#artifacts) - which enables much of the functionality in the next phases of the project. From 0f5ceabcfc5394f3ade1d7e0ca2c3cc155aa1ef2 Mon Sep 17 00:00:00 2001 From: Romain Cledat Date: Mon, 3 Nov 2025 22:48:01 -0800 Subject: [PATCH 3/3] Added note on conditionals and spin --- docs/metaflow/authoring-flows/introduction.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/metaflow/authoring-flows/introduction.md b/docs/metaflow/authoring-flows/introduction.md index a30f8998..903fa139 100644 --- a/docs/metaflow/authoring-flows/introduction.md +++ b/docs/metaflow/authoring-flows/introduction.md @@ -86,4 +86,8 @@ or sample sizes, you can `spin` the step against each one to see how it behaves diverse inputs. ::: +:::note +Spin may not work properly on certain flows with conditionals. We are working on +improving this support. +:::
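The override rule described in `spin-input-output.md` (a name found in `ARTIFACTS` wins, anything else falls back to the base run's artifacts) can be modeled in plain Python. This is only a mental model under stated assumptions, not Metaflow's implementation: the helpers `load_overrides` and `resolve_artifacts` and the `base` values are hypothetical, and Python's `collections.ChainMap` stands in for whatever lookup `spin --artifacts-module` actually performs internally.

```python
import importlib.util
import tempfile
from collections import ChainMap
from pathlib import Path


def load_overrides(path):
    """Hypothetical helper: import a Python file and return its ARTIFACTS dict."""
    spec = importlib.util.spec_from_file_location("spin_overrides", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return dict(getattr(module, "ARTIFACTS", {}))


def resolve_artifacts(overrides, base_artifacts):
    """Hypothetical helper: names in `overrides` shadow the base run's artifacts."""
    return ChainMap(overrides, base_artifacts)


# Made-up stand-in for the artifacts of the latest run.
base = {"model": "linear", "k": 3, "rows": 1000}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artifacts.py"
    # Same content as the artifacts.py example in the docs.
    path.write_text('ARTIFACTS = {"model": "kmeans", "k": 15}\n')
    overrides = load_overrides(path)

view = resolve_artifacts(overrides, base)
print(view["model"], view["k"], view["rows"])  # kmeans 15 1000
```

`ChainMap` only reads from `base`, so the stand-in for the past run's artifacts stays unchanged; only the lookup view differs, which matches the idea that overrides apply to a single `spin` invocation rather than to the stored run.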