[docs] - Document non-argument deps for assets [CON-16] (#7962)

* Document non-argument deps
* Add test
* Remove parenthesis
* Fix spacing
* Run snapshot
* Re-enable pylint in example
* Remove resource from example
* Run formatting scripts
* Resources - Move Overview up
* Run snapshot
* Trying to fix buildkite
* Update non_argument_deps.py
* fix lint

Co-authored-by: Sandy Ryza <sandy@elementl.com>
dagster-io · Jun 8, 2022 · a2d7861
1 parent e8f256e
commit a2d7861

Showing 4 changed files with 79 additions and 14 deletions.
diff --git a/docs/content/concepts/assets/software-defined-assets.mdx b/docs/content/concepts/assets/software-defined-assets.mdx
@@ -177,6 +177,31 @@ Using source assets has a few advantages over having the code inside of an asset
 - **Dagster can use data-loading code factored into an <PyObject object="IOManager" /> to load the contents of the source asset**.
 - **Asset dependencies can be written in a consistent way,** independent of whether they're downstream from a source asset or a derived asset. This makes it easy to swap out a source asset for a derived asset and vice versa.
 
+#### Non-argument dependencies
+
+You can also define a dependency on an upstream asset whose data Dagster doesn’t need to load to compute the downstream asset. Assets listed in `non_argument_deps` are recorded as upstream dependencies, but their data isn’t passed into the decorated function.
+
+Consider the following example:
+
+1. `upstream_asset` creates a new table (`sugary_cereals`) by selecting records from the `cereals` table
+2. `downstream_asset` then creates a new table (`shopping_list`) by selecting records from `sugary_cereals`
+
+```python file=/concepts/assets/non_argument_deps.py startafter=start_marker endbefore=end_marker
+from dagster import asset
+
+
+@asset
+def upstream_asset():
+    execute_query("CREATE TABLE sugary_cereals AS SELECT * FROM cereals")
+
+
+@asset(non_argument_deps={"upstream_asset"})
+def downstream_asset():
+    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")
+```
+
+In this example, Dagster doesn’t need to load data from `upstream_asset` to compute `downstream_asset`. Although `downstream_asset` depends on `upstream_asset`, `non_argument_deps` means no data is passed between the two functions. `downstream_asset` takes no arguments and reads the `sugary_cereals` table directly in its SQL query.
+
 ### Graph-backed assets
 
 [Basic software-defined assets](#a-basic-software-defined-asset) can only produce one data artifact. If generating an asset involves multiple discrete computations, you can use graph-backed assets by separating each computation into an op and building a graph to combine your computations. This way, each discrete computation can be reused in other assets and jobs.
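The `non_argument_deps` example in the diff above can be made concrete with a minimal sketch. It uses plain Python and an in-memory SQLite database and does not use Dagster. Several details are hypothetical and not part of the docs snippet: the body of `execute_query`, the `cereals` schema and rows, and the `WHERE sugars > 10` filter. The point is that `downstream_asset` receives no arguments and finds its input table in the database. The only thing the dependency has to capture is that `upstream_asset` runs first.

```python
import sqlite3

# Hypothetical backing for `execute_query`: one in-memory SQLite database
# shared by both functions, standing in for a real warehouse.
conn = sqlite3.connect(":memory:")


def execute_query(query: str) -> None:
    """Run a SQL statement and commit it (illustrative stand-in)."""
    conn.execute(query)
    conn.commit()


# Hypothetical source data so the example has rows to select.
execute_query("CREATE TABLE cereals (name TEXT, sugars INTEGER)")
conn.executemany(
    "INSERT INTO cereals VALUES (?, ?)",
    [("Froot Loops", 12), ("Cheerios", 1), ("Frosted Flakes", 11)],
)
conn.commit()


def upstream_asset() -> None:
    # Writes its result to the database; nothing is returned downstream.
    # The WHERE clause is a hypothetical filter added for illustration.
    execute_query(
        "CREATE TABLE sugary_cereals AS SELECT * FROM cereals WHERE sugars > 10"
    )


def downstream_asset() -> None:
    # Takes no arguments: it reads `sugary_cereals` straight from the database.
    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")


# Only the ordering matters: upstream must run first,
# otherwise `sugary_cereals` would not exist yet.
upstream_asset()
downstream_asset()

shopping_list = [
    row[0] for row in conn.execute("SELECT name FROM shopping_list ORDER BY name")
]
print(shopping_list)  # ['Froot Loops', 'Frosted Flakes']
```

Calling `downstream_asset()` before `upstream_asset()` would raise `sqlite3.OperationalError` (no such table). That is the ordering the declared dependency exists to capture.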

diff --git a/docs/content/concepts/resources.mdx b/docs/content/concepts/resources.mdx
@@ -5,7 +5,29 @@ description: Resources enable you to separate graph logic from environment, and
 
 # Resources
 
-Resources provide a way to manage dependencies to external components and share implementations across multiple ops in a job.
+Resources provide a way to manage dependencies on external components and share implementations across multiple ops or assets in a job.
+
+Using resources, you can:
+
+- Give ops access to features of the execution environment
+- Bind a set of resources and other environment information to a job so that those resources are available to the ops within that job
+- Construct different jobs for the same graph, each with different resources, to represent the environments the graph runs in
+
+## Why use resources?
+
+Representing external dependencies as resources has several advantages:
+
+- **Pluggable**: Map a resource to a key in one job and then map a different resource to that same key in a different job. This is useful if there is a heavy external dependency that you want to use in production but avoid in testing.
+
+  You can simply provide different resource sets for each execution case: one for production with the heavy dependency (e.g., AWS) as a resource, and one for testing with something lighter (e.g., an in-memory store) mapped to the same key. For more information about this capability, check out [Separating Business Logic from Environments](/concepts/testing#separating-business-logic-from-environments).
+
+- **Job-scoped**: A resource provided to a job is available to every op in that job.
+
+- **Configurable**: Resources can be configured using a strongly typed [configuration system](/concepts/configuration/config-schema).
+
+- **Dependencies**: Resources can depend on other resources. This makes it possible to cleanly represent external environment objects that rely on other external environment information for initialization.
+
+---
 
 ## Relevant APIs
 
@@ -17,20 +39,9 @@ Resources provide a way to manage dependencies to external components and share
 | <PyObject object="build_init_resource_context"/> | Function for building an <PyObject object="InitResourceContext"/> outside of execution, intended to be used when testing a resource.                                                                                        |
 | <PyObject object="build_resources"/>             | Function for initializing a set of resources outside of the context of a job's execution.                                                                                                                                   |
 
-## Overview
-
-You can use **resources** to provide access to features of the execution environment to ops. You can bind a set of resources (and other environment information) to a job so that those resources can be available to the ops within that job. You can construct different jobs for the same graph, each with different resources, to represent the execution environments that your graph will be run within.
-
-### Why Use Resources
-
-Representing external dependencies as resources, in conjunction with jobs, has very convenient properties:
-
-- **Pluggable**: You can map a resource to a key in one job, and then map a different resource to that same key in a different job. This is useful if there is a heavy external dependency that you want to use in production, but avoid using it in testing. You can simply provide different resource sets for each execution case: one for production with the heavy dependency (e.g., AWS) as a resource, and one for testing with something lighter (i.e., in-memory store) mapped to the same key. For more information about this capability, check out [Separating Business Logic from Environments](/concepts/testing#separating-business-logic-from-environments).
-- **Job Scoped**: Since resources are job scoped, if you provide a resource to a job, then it becomes available for use with every op in that job.
-- **Configurable**: Resources can be configured, using a strongly typed [configuration system](/concepts/configuration/config-schema).
-- **Dependencies**: Resources can depend on other resources. This makes it possible to cleanly represent external environment objects that rely on other external environment information for initialization.
+---
 
-## Defining a Resource
+## Defining a resource
 
 To define a resource, use the <PyObject object="resource" decorator/> decorator. Wrap a function that takes an `init_context` as the first parameter, which is an instance of <PyObject object="InitResourceContext"/>. From this function, return or yield the object that you would like to be available as a resource.
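The pluggable and job-scoped properties listed in the Resources page can be illustrated without Dagster. The following is a minimal sketch in plain Python and **not** Dagster's API. The names `run_job`, `InMemoryStore`, `FileStore`, `write_report`, `read_report`, and the `"store"` key are all hypothetical. Each op looks up a resource by key, and each "job" binds that key to a different implementation. The same op code therefore runs against a lightweight store in tests and a heavier one elsewhere.

```python
import os
import tempfile
from typing import Callable, Dict, List, Protocol


class Store(Protocol):
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str: ...


class InMemoryStore:
    """Lightweight store for tests (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


class FileStore:
    """Stand-in for a heavier external store such as object storage (hypothetical)."""

    def __init__(self, root: str) -> None:
        self._root = root

    def put(self, key: str, value: str) -> None:
        with open(os.path.join(self._root, key), "w") as f:
            f.write(value)

    def get(self, key: str) -> str:
        with open(os.path.join(self._root, key)) as f:
            return f.read()


Op = Callable[[Dict[str, Store]], None]


def write_report(resources: Dict[str, Store]) -> None:
    # The op only knows the resource key "store", not which implementation backs it.
    resources["store"].put("report.txt", "3 sugary cereals")


def read_report(resources: Dict[str, Store]) -> None:
    print(resources["store"].get("report.txt"))


def run_job(ops: List[Op], resources: Dict[str, Store]) -> None:
    # Job-scoped: every op in the job receives the same resource mapping.
    for op in ops:
        op(resources)


# Same ops, two different bindings for the "store" key.
test_store = InMemoryStore()
run_job([write_report, read_report], {"store": test_store})

with tempfile.TemporaryDirectory() as root:
    file_store = FileStore(root)
    run_job([write_report, read_report], {"store": file_store})
    file_contents = file_store.get("report.txt")
```

In Dagster itself, the mapping from key to implementation is supplied when the job is constructed, rather than through a hand-written runner loop like `run_job`.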
 

diff --git a/examples/docs_snippets/docs_snippets/concepts/assets/non_argument_deps.py b/examples/docs_snippets/docs_snippets/concepts/assets/non_argument_deps.py
@@ -0,0 +1,20 @@
+def execute_query(query):
+    del query
+
+
+# start_marker
+
+from dagster import asset
+
+
+@asset
+def upstream_asset():
+    execute_query("CREATE TABLE sugary_cereals AS SELECT * FROM cereals")
+
+
+@asset(non_argument_deps={"upstream_asset"})
+def downstream_asset():
+    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")
+
+
+# end_marker
diff --git a/...s/docs_snippets/docs_snippets_tests/concepts_tests/assets_tests/test_non_argument_deps.py b/...s/docs_snippets/docs_snippets_tests/concepts_tests/assets_tests/test_non_argument_deps.py
@@ -0,0 +1,9 @@
+from dagster import AssetGroup
+from docs_snippets.concepts.assets.non_argument_deps import (
+    downstream_asset,
+    upstream_asset,
+)
+
+
+def test_non_argument_deps():
+    AssetGroup([upstream_asset, downstream_asset]).materialize()