
[docs] op retries concepts section (#6818)
sorely overdue section on op retries 

resolves #6702

## Test Plan

make dev
alangenfeld committed Mar 4, 2022
1 parent 9168e73 commit 66ae1c0
Showing 9 changed files with 156 additions and 5 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
@@ -119,6 +119,10 @@
      {
        "title": "Op Hooks",
        "path": "/concepts/ops-jobs-graphs/op-hooks"
      },
      {
        "title": "Op Retries",
        "path": "/concepts/ops-jobs-graphs/op-retries"
      }
    ]
  },
2 changes: 1 addition & 1 deletion docs/content/api/modules.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/content/api/searchindex.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/content/api/sections.json

Large diffs are not rendered by default.

78 changes: 78 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/op-retries.mdx
@@ -0,0 +1,78 @@
---
title: Op Retries | Dagster
description: Retry ops on exception using RetryPolicy and RetryRequested
---

# Op Retries

When an exception occurs during op execution, Dagster provides tools to retry that op within the same job run.

## Relevant APIs

| Name                                                  | Description                                                                                           |
| ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| <PyObject module="dagster" object="RetryRequested" /> | An exception that can be raised from the body of an op to request a retry                             |
| <PyObject module="dagster" object="RetryPolicy" />    | A declarative policy, attached to an op or job, that requests a retry whenever the op raises an exception |
| <PyObject module="dagster" object="Backoff" />        | Modifies the delay between retries based on the attempt number                                        |
| <PyObject module="dagster" object="Jitter" />         | Adds random variation to the delay between retries                                                    |

## Overview

In Dagster, code is executed within an [op](/concepts/ops-jobs-graphs/ops). Sometimes this code fails for transient reasons, such as a dropped network connection, and the desired behavior is simply to run it again.

Dagster provides both declarative <PyObject module="dagster" object="RetryPolicy" />s and manual <PyObject module="dagster" object="RetryRequested" /> exceptions to enable this behavior.
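
To make the difference between the two mechanisms concrete, here is a plain-Python model of a retry loop. It is a sketch under stated assumptions, not Dagster's executor: it assumes `max_retries` counts re-executions after the first attempt (so `max_retries=1` allows two attempts), that a declarative policy retries on any exception, and that a manual request carries its own retry budget. `run_with_retries` and `ManualRetry` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ManualRetry(Exception):
    """Hypothetical stand-in for a manual retry request (not dagster.RetryRequested)."""

    def __init__(self, max_retries: int = 1, seconds_to_wait: float = 0.0):
        super().__init__(f"retry requested (max_retries={max_retries})")
        self.max_retries = max_retries
        self.seconds_to_wait = seconds_to_wait


def run_with_retries(
    fn: Callable[[], T],
    policy_max_retries: Optional[int] = None,
    policy_delay: float = 0.0,
) -> T:
    """Call `fn` until it succeeds or the retry budget is spent.

    A ManualRetry raised by `fn` is retried according to its own budget; any
    other exception is retried only when a policy (`policy_max_retries`) is set.
    """
    retries_done = 0
    while True:
        try:
            return fn()
        except ManualRetry as request:
            # Manual request: the exception carries its own retry budget and wait.
            if retries_done >= request.max_retries:
                raise
            wait = request.seconds_to_wait
        except Exception:
            # Declarative policy: retry any other exception up to the policy's budget.
            if policy_max_retries is None or retries_done >= policy_max_retries:
                raise
            wait = policy_delay
        retries_done += 1
        time.sleep(wait)


# Usage: an operation that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(run_with_retries(flaky, policy_max_retries=3), calls["n"])  # ok 3
```

With a budget of three retries the third attempt succeeds; with no policy at all, the first `ConnectionError` would propagate immediately.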

## Using Op Retries

Here we start with an op that fails intermittently. Without retries, every failure of this op forces us to re-run the whole job.

```python file=/concepts/solids_pipelines/retries.py startafter=problem_start endbefore=problem_end
@op
def problematic():
    fails_sometimes()
```

### RetryPolicy

To have this op retry when an exception occurs, we can attach a <PyObject module="dagster" object="RetryPolicy" />.

```python file=/concepts/solids_pipelines/retries.py startafter=policy_start endbefore=policy_end
@op(retry_policy=RetryPolicy())
def better():
    fails_sometimes()
```

This improves the situation, but we may need additional configuration to control how many times to retry and/or how long to wait between retries.

```python file=/concepts/solids_pipelines/retries.py startafter=policy2_start endbefore=policy2_end
@op(
    retry_policy=RetryPolicy(
        max_retries=3,
        delay=0.2,  # 200ms
        backoff=Backoff.EXPONENTIAL,
        jitter=Jitter.PLUS_MINUS,
    )
)
def even_better():
    fails_sometimes()
```
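
As a rough illustration of what `delay`, `backoff` and `jitter` control, the sketch below computes the wait before each retry. The formulas are assumptions chosen for illustration (linear or doubling growth of the base delay, full jitter anywhere between zero and the computed delay, plus/minus jitter of up to one base delay), not a copy of Dagster's internal calculation; `BackoffKind`, `JitterKind` and `retry_delay` are hypothetical local names, not imports from Dagster.

```python
import random
from enum import Enum
from typing import Optional


class BackoffKind(Enum):
    """Hypothetical local stand-in, named after dagster.Backoff."""

    LINEAR = "linear"
    EXPONENTIAL = "exponential"


class JitterKind(Enum):
    """Hypothetical local stand-in, named after dagster.Jitter."""

    FULL = "full"
    PLUS_MINUS = "plus_minus"


def retry_delay(
    attempt: int,
    base_delay: float,
    backoff: Optional[BackoffKind] = None,
    jitter: Optional[JitterKind] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if backoff is BackoffKind.EXPONENTIAL:
        delay = base_delay * (2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    elif backoff is BackoffKind.LINEAR:
        delay = base_delay * attempt  # 1x, 2x, 3x, ...
    else:
        delay = base_delay  # constant delay
    if rng is None:
        rng = random.Random()
    if jitter is JitterKind.FULL:
        # Anywhere between zero and the computed delay.
        delay = rng.uniform(0, delay)
    elif jitter is JitterKind.PLUS_MINUS:
        # Shift by up to one base delay in either direction, never below zero.
        delay = max(0.0, delay + rng.uniform(-base_delay, base_delay))
    return delay


# Usage: the un-jittered schedule for the settings in the example above.
print([retry_delay(n, 0.2, BackoffKind.EXPONENTIAL) for n in range(1, 4)])  # [0.2, 0.4, 0.8]
```

Jitter spreads retries from many failing ops over time, so they do not all hit a recovering service at the same instant.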

In addition to setting the policy directly on the op definition, we can set it on a specific invocation of an op, or on a <PyObject module="dagster" object="job" decorator /> to apply it to every op the job contains.
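
One way to picture how these levels combine is a lookup that prefers the most specific policy that has been set. The precedence modeled here (invocation, then op definition, then job-wide default) is an assumption based on the paragraph above rather than something it spells out, and `Policy` and `resolve_policy` are hypothetical names; check the `job` API reference before relying on the exact order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a retry policy."""

    max_retries: int = 1
    delay: Optional[float] = None


def resolve_policy(
    invocation: Optional[Policy],
    definition: Optional[Policy],
    job_default: Optional[Policy],
) -> Optional[Policy]:
    """Return the most specific policy that is set, or None if no level sets one."""
    for candidate in (invocation, definition, job_default):
        if candidate is not None:
            return candidate
    return None


job_default = Policy(max_retries=1)
# No op-level policy: the job-wide default applies.
print(resolve_policy(None, None, job_default))
# An invocation-level policy overrides both the op definition and the job default.
print(resolve_policy(Policy(max_retries=5), Policy(max_retries=3), job_default))
```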

### RetryRequested

In more nuanced situations, we may need to run code to decide whether to retry at all. For this, we can raise a <PyObject module="dagster" object="RetryRequested" /> exception manually.

```python file=/concepts/solids_pipelines/retries.py startafter=manual_start endbefore=manual_end
@op
def manual():
    try:
        fails_sometimes()
    except Exception as e:
        if should_retry(e):
            raise RetryRequested(max_retries=1, seconds_to_wait=1) from e
        else:
            raise
```

Using `raise ... from e` ensures that Dagster captures the original exception's information.
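
The snippet leaves `should_retry` to the reader (the example file stubs it to always return `True`). Below is a minimal sketch of one possible predicate, assuming connection errors and timeouts are the transient failures worth retrying. It also shows what `raise ... from e` records, using a hypothetical `RetryRequestedStandIn` class in place of Dagster's `RetryRequested` so the example runs without Dagster installed; `call_with_manual_retry` is likewise a hypothetical helper mirroring the op body.

```python
class RetryRequestedStandIn(Exception):
    """Hypothetical stand-in for dagster.RetryRequested."""

    def __init__(self, max_retries: int = 1, seconds_to_wait: float = 0):
        super().__init__(f"retry requested (max_retries={max_retries})")
        self.max_retries = max_retries
        self.seconds_to_wait = seconds_to_wait


def should_retry(exc: BaseException) -> bool:
    """Treat network-style failures as transient; anything else is a real bug."""
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_manual_retry(fn):
    """Mirror the `manual` op body: turn transient errors into a retry request."""
    try:
        return fn()
    except Exception as e:
        if should_retry(e):
            raise RetryRequestedStandIn(max_retries=1, seconds_to_wait=1) from e
        raise


def flaky_network_call():
    raise ConnectionError("connection reset by peer")


try:
    call_with_manual_retry(flaky_network_call)
except RetryRequestedStandIn as req:
    # `raise ... from e` stores the original exception on __cause__, so anything
    # that walks the exception chain (including Python's traceback printer) sees it.
    print(type(req.__cause__).__name__, req.__cause__)  # ConnectionError connection reset by peer
```

Non-transient errors such as a `ValueError` are re-raised unchanged, so genuine bugs fail the op instead of being retried.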
Binary file modified docs/next/public/objects.inv
Binary file not shown.
@@ -0,0 +1,64 @@
from dagster import Backoff, Jitter, RetryPolicy, RetryRequested, job, op


def fails_sometimes():
    raise Exception("jk, it's always")


def should_retry(_):
    return True


# problem_start
@op
def problematic():
    fails_sometimes()


# problem_end

# policy_start
@op(retry_policy=RetryPolicy())
def better():
    fails_sometimes()


# policy_end

# policy2_start
@op(
    retry_policy=RetryPolicy(
        max_retries=3,
        delay=0.2,  # 200ms
        backoff=Backoff.EXPONENTIAL,
        jitter=Jitter.PLUS_MINUS,
    )
)
def even_better():
    fails_sometimes()


# policy2_end


# manual_start
@op
def manual():
    try:
        fails_sometimes()
    except Exception as e:
        if should_retry(e):
            raise RetryRequested(max_retries=1, seconds_to_wait=1) from e
        else:
            raise


# manual_end


@job
def retry_job():
    problematic()
    better()
    even_better()
    manual()
@@ -24,6 +24,7 @@
    one_plus_one_from_constructor,
    tagged_add_one,
)
from docs_snippets.concepts.solids_pipelines.retries import retry_job


def test_one_plus_one():
@@ -86,3 +87,7 @@ def test_dynamic_examples():
    assert chained.execute_in_process().success
    assert other_arg.execute_in_process().success
    assert multiple.execute_in_process().success


def test_retry_examples():
    assert retry_job.execute_in_process(raise_on_error=False)  # just that it runs
4 changes: 2 additions & 2 deletions python_modules/dagster/dagster/core/definitions/events.py
@@ -689,8 +689,8 @@ class RetryRequested(Exception):
         def flakes():
             try:
                 flakey_operation()
-            except:
-                raise RetryRequested(max_retries=3)
+            except Exception as e:
+                raise RetryRequested(max_retries=3) from e
     """

     def __init__(

1 comment on commit 66ae1c0

@vercel vercel bot commented on 66ae1c0 Mar 4, 2022
