Skip to content

Commit

Permalink
[docs] add multi_assets docs page [CON-33] (#8192)
Browse files Browse the repository at this point in the history
  • Loading branch information
OwenKephart committed Jun 7, 2022
1 parent b34f04d commit 85e9c3a
Show file tree
Hide file tree
Showing 4 changed files with 218 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
Original file line number Diff line number Diff line change
Expand Up @@ -256,6 +256,10 @@
{
"title": "Asset Observations",
"path": "/concepts/assets/asset-observations"
},
{
"title": "Multi-Assets",
"path": "/concepts/assets/multi-assets"
}
]
}
Expand Down
118 changes: 118 additions & 0 deletions docs/content/concepts/assets/multi-assets.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
---
title: Multi-Assets | Dagster
description: A multi-asset represents a set of assets that are all updated by the same operation.
---

# Multi-Assets

A multi-asset represents a set of assets that are all updated by the same operation.

## Relevant APIs

| Name | Description |
| ------------------------------------------- | ---------------------------------------- |
| <PyObject object="multi_asset" decorator /> | A decorator used to define multi-assets. |

## Overview

When working with [software-defined assets](/concepts/assets/software-defined-assets), it's sometimes inconvenient or impossible to map each persisted asset to a unique operation. A multi-asset is a way to define a single operation that will produce the contents of multiple data assets at once.

Multi-assets may be useful in the following scenarios:

- A single call to an API results in multiple tables being updated (e.g. Airbyte, Fivetran, dbt)
- A DataFrame is transformed into multiple assets, and serializing it between separate ops would be prohibitively inefficient
- The same in-memory object is used to compute multiple assets

## Defining multi-assets

The function responsible for computing the contents of any software-defined asset is an [op](/concepts/ops-jobs-graphs/ops). Multi-assets are responsible for updating multiple assets, so the underlying op will have multiple outputs -- one for each associated asset.

### A basic multi-asset

The easiest way to create a multi-asset is with the <PyObject object="multi_asset" decorator /> decorator. This decorator functions similarly to the <PyObject object="asset" decorator /> decorator, but requires an `outs` parameter specifying each output asset of the function.

```python file=/concepts/assets/multi_assets.py startafter=start_basic_multi_asset endbefore=end_basic_multi_asset
from dagster import Out, multi_asset


@multi_asset(
outs={
"my_string_asset": Out(),
"my_int_asset": Out(),
}
)
def my_function():
return "abc", 123
```

By default, the names of the outputs will be used to form the asset keys of the multi-asset. The decorated function will be used to create the op for these assets and must emit an output for each of them. In this case, we can emit multiple outputs by returning a tuple of values, one for each asset.

### Customizing how assets are materialized with IO managers

As with regular assets, you can customize how each asset is materialized with [IO managers](/concepts/io-management/io-managers). To do this, specify an `io_manager_key` on each output of the multi-asset.

```python file=/concepts/assets/multi_assets.py startafter=start_io_manager_multi_asset endbefore=end_io_manager_multi_asset
from dagster import Out, multi_asset


@multi_asset(
outs={
"s3_asset": Out(io_manager_key="s3_io_manager"),
"adls_asset": Out(io_manager_key="adls2_io_manager"),
},
)
def my_assets():
return "store_me_on_s3", "store_me_on_adls2"
```

### Subsetting multi-assets

By default, it is assumed that the computation inside of a multi-asset will always produce the contents all of the associated assets. This means that attempting to execute a set of assets that produces some, but not all, of the assets defined by a given multi-asset will result in an error.

Sometimes, the underlying computation is sufficiently flexible to allow for computing an arbitrary subset of the assets associated with it. In these cases, set the `is_required` attribute of the outputs to `False`, and set the `can_subset` parameter of the decorator to `True`.

Inside the body of the function, we can use `context.selected_output_names` or `context.selected_asset_keys` to find out which computations should be run.

```python file=/concepts/assets/multi_assets.py startafter=start_subsettable_multi_asset endbefore=end_subsettable_multi_asset
from dagster import Out, Output, multi_asset


@multi_asset(
outs={
"a": Out(is_required=False),
"b": Out(is_required=False),
},
can_subset=True,
)
def split_actions(context):
if "a" in context.selected_output_names:
yield Output(value=123, output_name="a")
if "b" in context.selected_output_names:
yield Output(value=456, output_name="b")
```

Because our outputs are now optional, we can no longer rely on returning a tuple of values, as we don't know in advance which values will be computed. Instead, we explicitly `yield` each output that we're expected to create.

### Dependencies inside multi-assets

When a multi-asset is created, it is assumed that each output asset depends on all of the input assets. This may not always be the case, however.

In these situations, you may optionally provide a mapping from each output asset to the set of <PyObject object="AssetKey" />s that it depends on. This information is used to display lineage information in Dagit and for parsing selections over your asset graph.

```python file=/concepts/assets/multi_assets.py startafter=start_asset_deps_multi_asset endbefore=end_asset_deps_multi_asset
from dagster import AssetKey, Out, Output, multi_asset


@multi_asset(
outs={"c": Out(), "d": Out()},
internal_asset_deps={
"c": {AssetKey("a")},
"d": {AssetKey("b")},
},
)
def my_complex_assets(a, b):
# c only depends on a
yield Output(value=a + 1, output_name="c")
# d only depends on b
yield Output(value=b + 1, output_name="d")
```
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# isort: skip_file
# pylint: disable=unused-argument,reimported

# start_basic_multi_asset
from dagster import Out, multi_asset


@multi_asset(
outs={
"my_string_asset": Out(),
"my_int_asset": Out(),
}
)
def my_function():
return "abc", 123


# end_basic_multi_asset

# start_io_manager_multi_asset
from dagster import Out, multi_asset


@multi_asset(
outs={
"s3_asset": Out(io_manager_key="s3_io_manager"),
"adls_asset": Out(io_manager_key="adls2_io_manager"),
},
)
def my_assets():
return "store_me_on_s3", "store_me_on_adls2"


# end_io_manager_multi_asset

# start_subsettable_multi_asset
from dagster import Out, Output, multi_asset


@multi_asset(
outs={
"a": Out(is_required=False),
"b": Out(is_required=False),
},
can_subset=True,
)
def split_actions(context):
if "a" in context.selected_output_names:
yield Output(value=123, output_name="a")
if "b" in context.selected_output_names:
yield Output(value=456, output_name="b")


# end_subsettable_multi_asset

# start_asset_deps_multi_asset
from dagster import AssetKey, Out, Output, multi_asset


@multi_asset(
outs={"c": Out(), "d": Out()},
internal_asset_deps={
"c": {AssetKey("a")},
"d": {AssetKey("b")},
},
)
def my_complex_assets(a, b):
# c only depends on a
yield Output(value=a + 1, output_name="c")
# d only depends on b
yield Output(value=b + 1, output_name="d")


# end_asset_deps_multi_asset
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
from docs_snippets.concepts.assets.multi_assets import (
my_assets,
my_complex_assets,
my_function,
split_actions,
)


def test_basic():
assert len(my_function.asset_keys) == 2


def test_io_manager():
assert len(my_assets.asset_keys) == 2


def test_subset():
assert len(split_actions.asset_keys) == 2


def test_inter_asset_deps():
assert len(my_complex_assets.asset_keys) == 2

1 comment on commit 85e9c3a

@vercel
Copy link

@vercel vercel bot commented on 85e9c3a Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.