Commit

[docs] - Document non-argument deps for assets [CON-16] (#7962)
* Document non-argument deps

* Add test

* Remove parenthesis

* Fix spacing

* Run snapshot

* Re-enable pylint in example

* Remove resource from example

* Run formatting scripts

* Resources - Move Overview up

* Run snapshot

* Trying to fix buildkite

* Update non_argument_deps.py

* fix lint

Co-authored-by: Sandy Ryza <sandy@elementl.com>
erinkcochran87 and sryza committed Jun 8, 2022
1 parent e8f256e commit a2d7861
Showing 4 changed files with 79 additions and 14 deletions.
25 changes: 25 additions & 0 deletions docs/content/concepts/assets/software-defined-assets.mdx
@@ -177,6 +177,31 @@ Using source assets has a few advantages over having the code inside of an asset
- **Dagster can use data-loading code factored into an <PyObject object="IOManager" /> to load the contents of the source asset**.
- **Asset dependencies can be written in a consistent way,** independent of whether they're downstream from a source asset or a derived asset. This makes it easy to swap out a source asset for a derived asset and vice versa.

#### Non-argument dependencies

Alternatively, you can define a dependency in which Dagster doesn't need to load data from the upstream asset to compute the downstream asset. The `non_argument_deps` parameter of `@asset` declares a dependency between assets without passing any data between them through Dagster.

Consider the following example:

1. `upstream_asset` creates a new table (`sugary_cereals`) by selecting records from the `cereals` table
2. `downstream_asset` then creates a new table (`shopping_list`) by selecting records from `sugary_cereals`

```python file=/concepts/assets/non_argument_deps.py startafter=start_marker endbefore=end_marker
from dagster import asset


@asset
def upstream_asset():
    execute_query("CREATE TABLE sugary_cereals AS SELECT * FROM cereals")


@asset(non_argument_deps={"upstream_asset"})
def downstream_asset():
    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")
```

In this example, Dagster doesn't need to load data from `upstream_asset` to compute `downstream_asset`. `downstream_asset` still depends on `upstream_asset`, but with `non_argument_deps` no data is passed between the functions: the contents of the `sugary_cereals` table aren't passed to `downstream_asset` as an argument. Instead, `downstream_asset` reads the table directly in its query.
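To make the data flow concrete, here is a plain-Python sketch with no Dagster involved. It assumes a hypothetical in-memory SQLite database standing in for the warehouse and a hypothetical `execute_query` helper; the functions reuse the names from the example above but are undecorated. Neither function returns or receives data: the only thing the dependency has to guarantee is that `upstream_asset` runs first.

```python
import sqlite3

# Hypothetical stand-in for a data warehouse: one shared in-memory SQLite database.
conn = sqlite3.connect(":memory:")


def execute_query(query: str) -> None:
    """Hypothetical helper: run a SQL statement against the shared database."""
    conn.execute(query)
    conn.commit()


# Hypothetical seed data for the `cereals` source table.
execute_query("CREATE TABLE cereals (name TEXT, sugars INTEGER)")
conn.executemany(
    "INSERT INTO cereals VALUES (?, ?)",
    [("Froot Loops", 12), ("Cheerios", 1)],
)
conn.commit()


def upstream_asset() -> None:
    # Writes its result to the database and returns nothing.
    execute_query("CREATE TABLE sugary_cereals AS SELECT * FROM cereals")


def downstream_asset() -> None:
    # Takes no arguments: it reads `sugary_cereals` directly from the database.
    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")


# Calling these in the reverse order would raise
# sqlite3.OperationalError ("no such table: sugary_cereals"), which is why the
# dependency must still be declared even though no data passes between them.
upstream_asset()
downstream_asset()

shopping_list = conn.execute("SELECT name FROM shopping_list ORDER BY name").fetchall()
```

In the Dagster version, `non_argument_deps` plays the role of the explicit call order here: it tells Dagster to materialize `upstream_asset` before `downstream_asset` without wiring any return value into the downstream function.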

### Graph-backed assets

[Basic software-defined assets](#a-basic-software-defined-asset) can only produce one data artifact. If generating an asset involves multiple discrete computations, you can use graph-backed assets by separating each computation into an op and building a graph to combine your computations. This way, each discrete computation can be reused in other assets and jobs.
39 changes: 25 additions & 14 deletions docs/content/concepts/resources.mdx
@@ -5,7 +5,29 @@ description: Resources enable you to separate graph logic from environment, and

# Resources

Resources provide a way to manage dependencies to external components and share implementations across multiple ops in a job.
Resources provide a way to manage dependencies on external components and to share implementations across multiple ops or assets in a job.

Using resources, you could:

- Provide access to features of the execution environment to ops
- Bind a set of resources and other environment information to a job so that those resources can be available to the ops within that job
- Construct different jobs for the same graph, each with different resources, to represent the execution environments that the graph will be run in

### Why use resources?

Representing external dependencies as resources has several benefits:

- **Pluggable**: Map a resource to a key in one job, and then map a different resource to that same key in a different job. This is useful if there is a heavy external dependency that you want to use in production but not in testing.

  You can provide a different resource set for each execution case: one for production with the heavy dependency (e.g., AWS) as a resource, and one for testing with something lighter (e.g., an in-memory store) mapped to the same key. For more information about this capability, check out [Separating Business Logic from Environments](/concepts/testing#separating-business-logic-from-environments).

- **Job Scoped**: If you provide a resource to a job, it becomes available to every op in that job.

- **Configurable**: Resources can be configured using a strongly typed [configuration system](/concepts/configuration/config-schema).

- **Dependencies**: Resources can depend on other resources. This makes it possible to cleanly represent external environment objects that rely on other external environment information for initialization.
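The pluggability idea can be sketched in plain Python, without Dagster. In this sketch, `InMemoryStore`, `FileStore`, `write_report`, and `run_job` are all hypothetical: `FileStore` stands in for a heavier production dependency, and `run_job` stands in for a job that hands the same resource mapping to every op. In Dagster itself, the key-to-resource mapping is supplied through `resource_defs` on a job instead.

```python
import os
import tempfile
from typing import Callable, Dict, Protocol


class Store(Protocol):
    """Interface the business logic relies on; any implementation can be plugged in."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


class InMemoryStore:
    """Lightweight store for tests: keeps everything in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


class FileStore:
    """Hypothetical stand-in for a heavier production store: one file per key."""

    def __init__(self, root: str) -> None:
        self._root = root

    def put(self, key: str, value: str) -> None:
        with open(os.path.join(self._root, key), "w") as f:
            f.write(value)

    def get(self, key: str) -> str:
        with open(os.path.join(self._root, key)) as f:
            return f.read()


def write_report(resources: Dict[str, Store]) -> str:
    """Business logic: it only knows the "store" key, not which store it gets."""
    store = resources["store"]
    store.put("report.txt", "report contents")
    return store.get("report.txt")


def run_job(op: Callable[[Dict[str, Store]], str], resources: Dict[str, Store]) -> str:
    """Hypothetical runner: every op in the 'job' receives the same resource mapping."""
    return op(resources)


# Testing: map the "store" key to the lightweight in-memory implementation.
test_result = run_job(write_report, {"store": InMemoryStore()})

# "Production": map the same key to the heavier implementation.
with tempfile.TemporaryDirectory() as root:
    prod_result = run_job(write_report, {"store": FileStore(root)})
```

Because `write_report` depends only on the `"store"` key and the `Store` interface, swapping environments means changing the mapping, not the business logic.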

---

## Relevant APIs

@@ -17,20 +39,9 @@ Resources provide a way to manage dependencies to external components and share
| <PyObject object="build_init_resource_context"/> | Function for building an <PyObject object="InitResourceContext"/> outside of execution, intended to be used when testing a resource. |
| <PyObject object="build_resources"/> | Function for initializing a set of resources outside of the context of a job's execution. |

## Overview

You can use **resources** to provide access to features of the execution environment to ops. You can bind a set of resources (and other environment information) to a job so that those resources can be available to the ops within that job. You can construct different jobs for the same graph, each with different resources, to represent the execution environments that your graph will be run within.

### Why Use Resources

Representing external dependencies as resources, in conjunction with jobs, has very convenient properties:

- **Pluggable**: You can map a resource to a key in one job, and then map a different resource to that same key in a different job. This is useful if there is a heavy external dependency that you want to use in production, but avoid using it in testing. You can simply provide different resource sets for each execution case: one for production with the heavy dependency (e.g., AWS) as a resource, and one for testing with something lighter (i.e., in-memory store) mapped to the same key. For more information about this capability, check out [Separating Business Logic from Environments](/concepts/testing#separating-business-logic-from-environments).
- **Job Scoped**: Since resources are job scoped, if you provide a resource to a job, then it becomes available for use with every op in that job.
- **Configurable**: Resources can be configured, using a strongly typed [configuration system](/concepts/configuration/config-schema).
- **Dependencies**: Resources can depend on other resources. This makes it possible to cleanly represent external environment objects that rely on other external environment information for initialization.
---

## Defining a Resource
## Defining a resource

To define a resource, use the <PyObject object="resource" decorator/> decorator. Wrap a function whose first parameter, `init_context`, is an instance of <PyObject object="InitResourceContext"/>. From this function, return or yield the object that you would like to be available as a resource.
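Yielding instead of returning lets a resource run cleanup code after the `yield` once execution is done. The sketch below mimics that generator lifecycle in plain Python using `contextlib.contextmanager`, which is a stand-in for illustration and not Dagster's API; `database_connection` and the `events` log are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, List

# Hypothetical log used to show the order in which lifecycle steps run.
events: List[str] = []


@contextmanager
def database_connection(url: str) -> Iterator[sqlite3.Connection]:
    # Setup: runs before any code that uses the resource.
    conn = sqlite3.connect(url)
    events.append("opened")
    try:
        # The yielded object is what callers receive as the resource.
        yield conn
    finally:
        # Teardown: runs after callers finish, even if they raise.
        conn.close()
        events.append("closed")


with database_connection(":memory:") as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
    events.append("used")
```

A resource function that returns its object has no equivalent of the teardown step, so the yield form is the natural fit for connections, clients, or temporary files that must be released.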

@@ -0,0 +1,20 @@
def execute_query(query):
    # Stub for the docs snippet: the query is discarded rather than executed.
    del query


# start_marker

from dagster import asset


@asset
def upstream_asset():
    execute_query("CREATE TABLE sugary_cereals AS SELECT * FROM cereals")


@asset(non_argument_deps={"upstream_asset"})
def downstream_asset():
    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")


# end_marker
@@ -0,0 +1,9 @@
from dagster import AssetGroup
from docs_snippets.concepts.assets.non_argument_deps import (
    downstream_asset,
    upstream_asset,
)


def test_non_argument_deps():
    AssetGroup([upstream_asset, downstream_asset]).materialize()

1 comment on commit a2d7861

vercel bot commented on a2d7861 Jun 8, 2022