[docs] - Graph-backed assets [CON-34] (#8174)
* First pass

* Add example

* Review comments

* Remove unused imports; run snapshot

* Run isort
erinkcochran87 committed Jun 8, 2022
1 parent eea9a77 commit 003bfc0
Showing 2 changed files with 92 additions and 0 deletions.
42 changes: 42 additions & 0 deletions docs/content/concepts/assets/software-defined-assets.mdx
@@ -65,6 +65,46 @@ def downstream_asset(upstream_asset):

The [explicit dependencies](#explicit-dependencies) example covers an alternative way to specify asset dependencies without needing to match argument names to upstream asset names.

### Graph-backed assets

[Basic software-defined assets](#a-basic-software-defined-asset) can only produce one data artifact. If generating an asset involves multiple discrete computations, you can use graph-backed assets by separating each computation into an op and building a graph to combine your computations. This way, each discrete computation can be reused in other assets and jobs.

This approach to asset creation allows you to define a graph of ops which can produce one or multiple assets. **Note**: To use graphs to create an asset, the graph **must return an output**.

To define a graph-backed asset, pass the graph to the `AssetsDefinition.from_graph` method:

```python file=/concepts/assets/graph_backed_asset.py startafter=start example endbefore=end example
@op(required_resource_keys={"slack"})
def fetch_files_from_slack(context) -> DataFrame:
    files = context.resources.slack.files_list(channel="#random")
    return DataFrame(
        [
            {
                "id": file.get("id"),
                "created": file.get("created"),
                "title": file.get("title"),
                "permalink": file.get("permalink"),
            }
            for file in files
        ]
    )


@op
def store_files(files):
    return files.to_sql(name="slack_files", con=create_db_connection())


@graph
def store_slack_files_in_sql():
    store_files(fetch_files_from_slack())


graph_asset = AssetsDefinition.from_graph(store_slack_files_in_sql)
```

**Note**: All output assets must be selected when using a graph-backed asset to create a job. Dagster will select all graph outputs automatically upon creating a job.

### Asset context

Since a software-defined asset contains an op, all the typical functionality of an op - like the use of [resources](/concepts/resources) and [configuration](#asset-configuration) - is available to an asset. Supplying the `context` parameter provides access to system information for the op, for example:
@@ -92,6 +132,8 @@ def my_configurable_asset(context):

Refer to the [Config schema documentation](/concepts/configuration/config-schema) for more configuration info and examples.

---

## Combining assets in groups

To materialize assets or load them in Dagit, you first need to combine them into an <PyObject object="AssetGroup" />, which is a set of assets with no unsatisfied dependencies. For example:
@@ -0,0 +1,50 @@
from unittest import mock

from pandas import DataFrame

from dagster import AssetGroup, AssetsDefinition, ResourceDefinition, graph, op


def create_db_connection():
    return "yay"


# start example


@op(required_resource_keys={"slack"})
def fetch_files_from_slack(context) -> DataFrame:
    files = context.resources.slack.files_list(channel="#random")
    return DataFrame(
        [
            {
                "id": file.get("id"),
                "created": file.get("created"),
                "title": file.get("title"),
                "permalink": file.get("permalink"),
            }
            for file in files
        ]
    )


@op
def store_files(files):
    return files.to_sql(name="slack_files", con=create_db_connection())


@graph
def store_slack_files_in_sql():
    store_files(fetch_files_from_slack())


graph_asset = AssetsDefinition.from_graph(store_slack_files_in_sql)

# end example

slack_mock = mock.MagicMock()

store_slack_files = AssetGroup(
    [graph_asset],
    resource_defs={"slack": ResourceDefinition.hardcoded_resource(slack_mock)},
).build_job("store_slack_files")
