Improve documentation around Output objects and op output annotations. (#8139)

* Improve documentation around Output objects and op output annotations.

* Add documentation for returnable dynamic outputs

* Address comments
dpeng817 committed Jun 3, 2022
1 parent b47f6ee commit 5f7d322
Showing 10 changed files with 211 additions and 68 deletions.
20 changes: 20 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/dynamic-graphs.mdx
@@ -71,6 +71,26 @@ Within our `@job` decorated composition function, the object representing the dy

## Advanced Mapping Examples

### Returning Dynamic Outputs

In addition to yielding, <PyObject object="DynamicOutput" /> objects can also be returned as part of a list.

```python file=/concepts/ops_jobs_graphs/dynamic.py startafter=dyn_out_return_start endbefore=dyn_out_return_end
from dagster import DynamicOut, DynamicOutput, op
from typing import List


@op(out=DynamicOut())
def return_dynamic() -> List[DynamicOutput[str]]:
    outputs = []
    for idx, page_key in get_pages():
        outputs.append(DynamicOutput(page_key, mapping_key=idx))
    return outputs
```

<PyObject object="DynamicOutput" /> can be used as a generic type annotation describing
the expected type of the output.
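
To make the return-versus-yield equivalence concrete, here is a plain-Python sketch that does not use Dagster at all. `FakeDynamicOutput` is a hypothetical stand-in for `DynamicOutput`, and `get_pages` is an assumed helper shaped like the one the snippet above relies on. The only point is that a generator and a list-returning function can describe the same sequence of outputs.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class FakeDynamicOutput:
    """Hypothetical stand-in for dagster.DynamicOutput (illustration only)."""

    value: str
    mapping_key: str


def get_pages() -> List[Tuple[str, str]]:
    # Assumed shape: (mapping key, page key) pairs.
    return [("1", "foo"), ("2", "bar")]


def yield_outputs() -> Iterator[FakeDynamicOutput]:
    # Generator style: outputs are produced one at a time.
    for idx, page_key in get_pages():
        yield FakeDynamicOutput(page_key, mapping_key=idx)


def return_outputs() -> List[FakeDynamicOutput]:
    # List style: outputs are collected first, then returned together.
    return [
        FakeDynamicOutput(page_key, mapping_key=idx)
        for idx, page_key in get_pages()
    ]


yielded = list(yield_outputs())
returned = return_outputs()
```

Both forms carry one entry per page, each tagged with its own mapping key; which one to use is largely a matter of whether a return annotation is wanted.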

### Chaining

The following two examples are equivalent ways to establish a sequence of ops that occur for each dynamic output.
2 changes: 2 additions & 0 deletions docs/content/concepts/ops-jobs-graphs/jobs-graphs.mdx
@@ -300,6 +300,8 @@ def branching():
    branch_2_op(branch_2)
```

When using conditional branching, <PyObject object="Output" /> objects must be yielded instead of returned.
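
One way to picture why yielding suits branching is that a generator can emit an output for only the branch that was taken. The sketch below is plain Python rather than Dagster code; `FakeOutput`, `branching_op`, `fired_outputs`, and the output names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator


@dataclass(frozen=True)
class FakeOutput:
    """Hypothetical stand-in for dagster.Output (illustration only)."""

    value: Any
    output_name: str


def branching_op(take_first: bool) -> Iterator[FakeOutput]:
    # Only one of the two declared outputs is emitted per call.
    if take_first:
        yield FakeOutput(1, output_name="branch_1")
    else:
        yield FakeOutput(2, output_name="branch_2")


def fired_outputs(events: Iterator[FakeOutput]) -> Dict[str, Any]:
    # Collect emitted outputs by name; names never yielded are simply absent.
    return {event.output_name: event.value for event in events}


first = fired_outputs(branching_op(True))
second = fired_outputs(branching_op(False))
```

In this toy model, a downstream step keyed on an absent output name would have nothing to consume, which mirrors the idea of a branch being skipped.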

### Fixed Fan-in

<Image
52 changes: 28 additions & 24 deletions docs/content/concepts/ops-jobs-graphs/op-events.mdx
@@ -34,48 +34,52 @@ Metadata is attached to events at construction time. There are currently two ava

 Yielding events from within the body of an op is a useful way of communicating with the Dagster framework. The most critical event to the functionality of Dagster is the <PyObject object="Output"/> event, which allows output data to be passed on from one op to the next. However, we also provide interfaces to inform Dagster about external assets and data quality checks during the run of an op.

-### Outputs
+### Output Objects

-Because returning a value from an op is such a fundamental part of creating a data pipeline, we have a few different interfaces for this functionality, to help ease transition into writing Dagster-specific code.
+Because returning a value from an op is such a fundamental part of creating a data pipeline, we have a few different interfaces for this functionality.

-For ops with a single output, you can simply return a value directly from the `compute_fn`. Internally, this will be converted to a Dagster <PyObject object="Output"/> event with the default output name `result`:
+For many use cases, Dagster ops can be used directly with python's native type annotations without additional modification. Check out the docs on [Op Outputs](/concepts/ops-jobs-graphs/ops#outputs) to learn more about this functionality. Dagster also provides the <PyObject object="Output"/> object, which opens up additional functionality to outputs when using Dagster, such as [specifying output metadata](/concepts/ops-jobs-graphs/op-events#attaching-metadata-to-outputs) and [conditional branching](/concepts/ops-jobs-graphs/jobs-graphs#conditional-branching), all while maintaining coherent type annotations.

-```python file=/concepts/ops_jobs_graphs/op_events.py startafter=start_op_output_1 endbefore=end_op_output_1
-from dagster import op
+<PyObject object="Output" /> objects can be either returned or yielded. The Output
+type is also generic, for use with return annotations:

+```python file=/concepts/ops_jobs_graphs/op_events.py startafter=start_op_output_4 endbefore=end_op_output_4
+from dagster import Output, op
+from typing import Tuple

+# Using Output as type annotation without inner type
 @op
-def my_simple_return_op(context):
-    return 1
-```
+def my_output_op() -> Output:
+    return Output("some_value")

-While this is perhaps the most intuitive way to return a value from a function, once you have multiple outputs defined on your op, or want to yield additional, non-output information from the body of your op, explicitly returning a value is no longer an option. In these cases, you'll want to explicitly yield <PyObject object="Output"/> events. With that in mind, the above example can be converted to the equivalent yield pattern like so:

-```python file=/concepts/ops_jobs_graphs/op_events.py startafter=start_op_output_0 endbefore=end_op_output_0
-from dagster import Output, op
+# A single output with a parameterized type annotation
+@op
+def my_output_generic_op() -> Output[int]:
+    return Output(5)


-@op
-def my_simple_yield_op(context):
-    yield Output(1)
+# Multiple outputs using parameterized type annotation
+@op(out={"int_out": Out(), "str_out": Out()})
+def my_multiple_generic_output_op() -> Tuple[Output[int], Output[str]]:
+    return (Output(5), Output("foo"))
 ```

-or, if you have a specific output name other than the default `result`:
+When <PyObject object="Output" /> objects are yielded, type annotations cannot be used. Instead, type information can be specified using the `out` argument of the op decorator.

-```python file=/concepts/ops_jobs_graphs/op_events.py startafter=start_op_output_2 endbefore=end_op_output_2
+```python file=/concepts/ops_jobs_graphs/op_events.py startafter=start_yield_outputs endbefore=end_yield_outputs
 from dagster import Output, op


-@op(out={"my_output": Out(int)})
-def my_named_yield_op(context):
-    yield Output(1, output_name="my_output")
+@op(out={"out1": Out(str), "out2": Out(int)})
+def my_op_yields():
+    yield Output(5, output_name="out2")
+    yield Output("foo", output_name="out1")
 ```

-Check out the docs on [Op Outputs](/concepts/ops-jobs-graphs/ops#outputs) to learn more.
-
 #### Attaching Metadata to Outputs <Experimental/>

-If there is information specific to an <PyObject object="Output"/> that you would like to log, you may optionally represent that by passing in a `metadata` parameter containing a mapping of string labels to metadata values.
+If there is information specific to an op output that you would like to log, you can use an <PyObject object="Output"/> object to attach metadata to the op's output. To do this, use the `metadata` parameter on the object, which expects a mapping of string labels to metadata values.

 The <PyObject object="EventMetadata" /> class contains a set of static wrappers to customize the display of certain types of structured metadata.

@@ -86,9 +90,9 @@ from dagster import MetadataValue, Output, op


 @op
-def my_metadata_output(context):
+def my_metadata_output(context) -> Output:
     df = get_some_data()
-    yield Output(
+    return Output(
         df,
         metadata={
             "text_metadata": "Text-based metadata for this event",
41 changes: 39 additions & 2 deletions docs/content/concepts/ops-jobs-graphs/ops.mdx
@@ -104,16 +104,53 @@ def my_output_op():

To define multiple outputs, or to use a different output name than "result", you can provide a dictionary of <PyObject object="Out" pluralize /> to the <PyObject object="op" decorator /> decorator.

When you have more than one output, you can return a tuple of values, one for each output.

```python file=/concepts/ops_jobs_graphs/ops.py startafter=start_multi_output_op_marker endbefore=end_multi_output_op_marker
@op(out={"first_output": Out(), "second_output": Out()})
def my_multi_output_op():
    return 5, 6
```

Return type annotations can be used directly on ops. For a single output, the return annotation will be used directly for type checking.

```python file=/concepts/ops_jobs_graphs/ops.py startafter=start_return_annotation endbefore=end_return_annotation
from dagster import op


@op
def return_annotation_op() -> int:
    return 5
```

If there are multiple outputs, a tuple annotation can be specified. Each inner type of the tuple annotation should correspond to an output in the op.

```python file=/concepts/ops_jobs_graphs/ops.py startafter=start_tuple_return endbefore=end_tuple_return
from dagster import Out, op
from typing import Tuple


@op(out={"int_output": Out(), "str_output": Out()})
def my_multiple_output_annotation_op() -> Tuple[int, str]:
    return (5, "foo")
```

Outputs are expected to follow the order they are specified in the op's `out` dictionary. In the above example, the `int` output corresponds to `int_output`, and the `str` output corresponds to `str_output`.
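
The ordering rule can be pictured with a small plain-Python sketch: pair the keys of the `out` dictionary, in insertion order, with the values of the returned tuple. This only illustrates the documented behaviour; `pair_outputs` is a hypothetical helper and not part of Dagster.

```python
from typing import Any, Dict, Sequence, Tuple


def pair_outputs(out_names: Sequence[str], returned: Tuple[Any, ...]) -> Dict[str, Any]:
    # Positional pairing: the first returned value goes to the first declared output.
    if len(out_names) != len(returned):
        raise ValueError(f"expected {len(out_names)} values, got {len(returned)}")
    return dict(zip(out_names, returned))


# Python dictionaries preserve insertion order, so the keys come back as declared.
declared = {"int_output": None, "str_output": None}
paired = pair_outputs(list(declared), (5, "foo"))
```

Swapping the order of the keys in `declared` would swap which value each name receives, which is why the `out` dictionary and the returned tuple need to agree.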

Note that if you would like to specify a single tuple output and still utilize type annotations, this can be done by providing either a single <PyObject object="Out" /> to the op, or none.

```python file=/concepts/ops_jobs_graphs/ops.py startafter=start_single_output_tuple endbefore=end_single_output_tuple
from dagster import op
from typing import Tuple


@op
def my_single_tuple_output_op() -> Tuple[int, str]:
    return (5, "foo")  # Will be viewed as one output
```
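
The difference between the two tuple cases can be sketched with the standard `typing` helpers. This is an illustration of the rule described above, not Dagster's actual inference code; `annotation_per_output` is a hypothetical helper, and `"result"` is the default output name mentioned earlier.

```python
from typing import Any, Callable, Dict, Sequence, Tuple, get_args, get_type_hints


def annotation_per_output(fn: Callable[..., Any], out_names: Sequence[str]) -> Dict[str, Any]:
    return_type = get_type_hints(fn)["return"]
    if len(out_names) <= 1:
        # Zero or one declared output: the whole annotation describes that output.
        name = out_names[0] if out_names else "result"
        return {name: return_type}
    # Several declared outputs: each inner tuple type maps to one output, in order.
    return dict(zip(out_names, get_args(return_type)))


def two_values() -> Tuple[int, str]:
    return (5, "foo")


multi = annotation_per_output(two_values, ["int_output", "str_output"])
single = annotation_per_output(two_values, [])
```

With two declared outputs the annotation is split into `int` and `str`; with none declared, the full `Tuple[int, str]` stays attached to the single default output.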

Like inputs, outputs can also have [Dagster Types](/concepts/types).

While many use cases can be served using built-in python annotations, <PyObject object="Output"/> and <PyObject object="DynamicOutput"/> objects unlock additional functionality. Check out the docs on [Op Outputs](/concepts/ops-jobs-graphs/op-events#output-objects) to learn more.

### Op Context

When writing an op, users can optionally provide a first parameter, `context`. When this parameter is supplied, Dagster will supply a context object to the body of the op. The context provides access to system information like op configuration, loggers, resources, and the current run id. See <PyObject object="OpExecutionContext"/> for the full list of properties accessible from the op context.
@@ -1,3 +1,6 @@
# isort: skip_file
# pylint: disable=reimported

from dagster import DynamicOut, DynamicOutput, job, op


@@ -149,3 +152,23 @@ def multiple():


# dyn_mult_end


def get_pages():
    return [("1", "foo")]


# dyn_out_return_start
from dagster import DynamicOut, DynamicOutput, op
from typing import List


@op(out=DynamicOut())
def return_dynamic() -> List[DynamicOutput[str]]:
    outputs = []
    for idx, page_key in get_pages():
        outputs.append(DynamicOutput(page_key, mapping_key=idx))
    return outputs


# dyn_out_return_end
@@ -40,47 +40,14 @@ def flaky_operation():
     return 0


-# start_op_output_0
-from dagster import Output, op
-
-
-@op
-def my_simple_yield_op(context):
-    yield Output(1)
-
-
-# end_op_output_0
-
-# start_op_output_1
-from dagster import op
-
-
-@op
-def my_simple_return_op(context):
-    return 1
-
-
-# end_op_output_1
-
-# start_op_output_2
-from dagster import Output, op
-
-
-@op(out={"my_output": Out(int)})
-def my_named_yield_op(context):
-    yield Output(1, output_name="my_output")
-
-
-# end_op_output_2
-
 # start_op_output_3
 from dagster import MetadataValue, Output, op


 @op
-def my_metadata_output(context):
+def my_metadata_output(context) -> Output:
     df = get_some_data()
-    yield Output(
+    return Output(
         df,
         metadata={
             "text_metadata": "Text-based metadata for this event",
@@ -93,6 +60,30 @@ def my_metadata_output(context):

# end_op_output_3

# start_op_output_4
from dagster import Output, op
from typing import Tuple

# Using Output as type annotation without inner type
@op
def my_output_op() -> Output:
    return Output("some_value")


# A single output with a parameterized type annotation
@op
def my_output_generic_op() -> Output[int]:
    return Output(5)


# Multiple outputs using parameterized type annotation
@op(out={"int_out": Out(), "str_out": Out()})
def my_multiple_generic_output_op() -> Tuple[Output[int], Output[str]]:
    return (Output(5), Output("foo"))


# end_op_output_4

# start_metadata_expectation_op
from dagster import ExpectationResult, MetadataValue, op

@@ -230,3 +221,15 @@ def my_expectation_op(context, df):


# end_expectation_op

# start_yield_outputs
from dagster import Output, op


@op(out={"out1": Out(str), "out2": Out(int)})
def my_op_yields():
    yield Output(5, output_name="out2")
    yield Output("foo", output_name="out1")


# end_yield_outputs
@@ -1,4 +1,5 @@
-# pylint: disable=unused-argument
+# isort: skip_file
+# pylint: disable=unused-argument,reimported

 import requests

@@ -117,3 +118,37 @@ def my_inner_op(**kwargs):


# end_op_factory_pattern_marker

# start_return_annotation
from dagster import op


@op
def return_annotation_op() -> int:
    return 5


# end_return_annotation
# start_tuple_return
from dagster import Out, op
from typing import Tuple


@op(out={"int_output": Out(), "str_output": Out()})
def my_multiple_output_annotation_op() -> Tuple[int, str]:
    return (5, "foo")


# end_tuple_return

# start_single_output_tuple
from dagster import op
from typing import Tuple


@op
def my_single_tuple_output_op() -> Tuple[int, str]:
    return (5, "foo")  # Will be viewed as one output


# end_single_output_tuple
@@ -6,6 +6,7 @@
    multiple,
    naive,
    other_arg,
    return_dynamic,
)
from docs_snippets.concepts.ops_jobs_graphs.fan_in_job import fan_in
from docs_snippets.concepts.ops_jobs_graphs.jobs import (
@@ -74,6 +75,7 @@ def test_dynamic_examples():
    assert chained.execute_in_process().success
    assert other_arg.execute_in_process().success
    assert multiple.execute_in_process().success
    assert return_dynamic()


def test_retry_examples():