Merged
4 changes: 2 additions & 2 deletions docs/index.md
@@ -40,8 +40,8 @@ Metaflow makes it easy to build and manage real-life data science, AI, and ML pr

- [Introduction to Scalable Compute and Data](scaling/introduction)
- [Computing at Scale](scaling/remote-tasks/introduction)
- [Managing Dependencies](scaling/dependencies) ✨*New support for `uv`*✨
- [Dealing with Failures](scaling/failures)
- [Managing Dependencies](scaling/dependencies) ✨*New: support for `uv`*✨
- [Dealing with Failures](scaling/failures) ✨*New: support for `@exit_hook`*✨
- [Checkpointing Progress](scaling/checkpoint/introduction) ✨*New*✨
- [Loading and Storing Data](scaling/data)
- [Organizing Results](scaling/tagging)
3 changes: 2 additions & 1 deletion docs/metaflow/composing-flows/introduction.md
@@ -13,7 +13,8 @@ steps and flows. For example, you might define shared, project-specific patterns
- Tracking data and model lineage,
- Performing feature engineering and transformations,
- Training and evaluating a model,
- Accessing an external service, e.g. an LLM endpoint through a model router.
- Accessing an external service, e.g. an LLM endpoint through a model router,
- Making tools available for agentic workflows.

You can handle cases like these by developing a shared library that encapsulates
the logic and importing it in your steps. Metaflow will [package the
@@ -308,6 +308,14 @@ production.

On Argo Workflows we support sending notifications on a successful or failed flow. To enable notifications, supply the `--notify-on-success/--notify-on-error` flags while deploying your flow. You must also configure the notification provider. The ones currently supported are

### Custom notifications

:::info
New in Metaflow 2.16
:::

You can set up a custom function to be called on success or failure on Argo Workflows using [exit hooks](/scaling/failures#exit-hooks-executing-a-function-upon-success-or-failure).

### Slack notifications

To enable Slack notifications, first create a webhook endpoint that Metaflow can send the notifications to by following the instructions at https://api.slack.com/messaging/webhooks
55 changes: 55 additions & 0 deletions docs/scaling/failures.md
@@ -329,6 +329,60 @@ if __name__ == '__main__':

This example handles a timeout in `start` gracefully without showing any exceptions.

## Exit hooks: Executing a function upon success or failure

:::info
This is a new feature in Metaflow 2.16. Exit hooks work with local runs and when
[deployed on Argo Workflows](/production/scheduling-metaflow-flows/scheduling-with-argo-workflows).
:::

Exit hooks let you define a special function that runs at the end of a flow, regardless
of whether the flow succeeds or fails. Unlike the `end` step, which is skipped if the flow
fails, exit hooks always run. This makes them suitable for tasks like sending notifications
or cleaning up resources. However, since they run outside of steps, they cannot be used to
produce artifacts.
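
To make the difference from the `end` step concrete, here is a toy model in plain Python. It is only a conceptual sketch and not how Metaflow implements exit hooks: `run_flow` and its return value are hypothetical names invented for this illustration, and the "steps" are ordinary functions.

```python
def run_flow(steps, on_success=(), on_error=()):
    """Toy runner (hypothetical): run steps in order, then call the matching hooks."""
    completed = []
    try:
        for step_fn in steps:
            step_fn()
            completed.append(step_fn.__name__)
    except Exception as exc:
        # A step failed: the remaining steps are skipped, but the error hooks still run.
        for hook in on_error:
            hook()
        return {"ok": False, "completed": completed, "error": str(exc)}
    # Every step finished: run the success hooks.
    for hook in on_success:
        hook()
    return {"ok": True, "completed": completed, "error": None}


calls = []


def start():
    calls.append("start")
    raise RuntimeError("failing as expected")


def end():
    calls.append("end")


result = run_flow(
    [start, end],
    on_success=[lambda: calls.append("success hook")],
    on_error=[lambda: calls.append("error hook")],
)
print(calls)
print(result)
```

In this toy run, `end` is never called because `start` raises, while the error hook is still invoked. The hooks also receive nothing from the steps and return nothing to them, which mirrors why exit hooks cannot produce artifacts.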

You can attach one or more exit hook functions to a flow using the `@exit_hook` decorator. For example:

```python
from metaflow import step, FlowSpec, Parameter, exit_hook, Run

# Called when the flow finishes successfully.
def success_print():
    print("✅ Flow completed successfully!")

# Called when the flow fails; `run` may be empty if the run never started.
def failure_print(run):
    if run:
        print(f"💥 Run {run.pathspec} failed. Failed tasks:")
        # List every task that did not complete successfully.
        for step in run:
            for task in step:
                if not task.successful:
                    print(f"  → {task.pathspec}")
    else:
        print("💥 Run failed during initialization")

@exit_hook(on_error=[failure_print], on_success=[success_print])
class ExitHookFlow(FlowSpec):
    should_fail = Parameter(name="should-fail", default=False)

    @step
    def start(self):
        print("Starting 👋")
        print("Should fail?", self.should_fail)
        if self.should_fail:
            raise Exception("failing as expected")
        self.next(self.end)

    @step
    def end(self):
        print("Done! 🏁")

if __name__ == "__main__":
    ExitHookFlow()
```

Note that when deployed on Argo Workflows, exit hook functions execute as separate
containers (pods), so they run even if a step fails, e.g. due to an out-of-memory condition.
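
As an illustration of a custom notification, the following sketch shows a failure hook that posts a message to a webhook. It makes several assumptions that are not part of Metaflow: the helper names (`failed_task_pathspecs`, `build_payload`, `notify_failure`) and the `NOTIFY_WEBHOOK_URL` environment variable are hypothetical, and the payload uses the `{"text": ...}` shape accepted by Slack incoming webhooks. The run is traversed the same way as in `failure_print` above.

```python
import json
import os
import urllib.request


def failed_task_pathspecs(run):
    """Collect the pathspecs of unsuccessful tasks in a run (empty if there is no run)."""
    if not run:
        return []
    return [
        task.pathspec
        for step_obj in run
        for task in step_obj
        if not task.successful
    ]


def build_payload(run):
    """Build a Slack-style webhook payload: a dict with a single "text" field."""
    if not run:
        return {"text": "Run failed during initialization"}
    lines = [f"Run {run.pathspec} failed."]
    lines += [f"- {pathspec}" for pathspec in failed_task_pathspecs(run)]
    return {"text": "\n".join(lines)}


def notify_failure(run):
    """Hypothetical exit hook: POST the payload to NOTIFY_WEBHOOK_URL, or print it if unset."""
    payload = build_payload(run)
    url = os.environ.get("NOTIFY_WEBHOOK_URL")  # assumed name, not a Metaflow setting
    if not url:
        print(payload["text"])
        return
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10)
```

Such a function could then be attached in the same way as in the example above, with `@exit_hook(on_error=[notify_failure])`. Keeping the message construction in `build_payload` separate from the network call makes the hook easy to test without a live endpoint.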

## Summary

Here is a quick summary of failure handling in Metaflow:
@@ -341,4 +395,5 @@ Here is a quick summary of failure handling in Metaflow:
safely](failures.md#how-to-prevent-retries). It is a good idea to use `times=0` for
`retry` in this case.
* Use `timeout` with any of the above if your code can get stuck.
* Use `@exit_hook` to execute custom functions upon success or failure.
