docs for graphs that depend on assets #12597

Merged
merged 1 commit into from
Mar 1, 2023

Conversation

sryza
Contributor

@sryza sryza commented Feb 28, 2023

Summary & Motivation

Motivated by this feedback: https://dagster.slack.com/archives/C01U5LFUZJS/p1677548018200809

How I Tested These Changes

@vercel

vercel bot commented Feb 28, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated |
| --- | --- | --- | --- | --- |
| dagster | ✅ Ready (Inspect) | Visit Preview | 💬 Add your feedback | Feb 28, 2023 at 11:05PM (UTC) |

1 Ignored Deployment

| Name | Status | Preview | Comments | Updated |
| --- | --- | --- | --- | --- |
| dagit-storybook | ⬜️ Ignored (Inspect) | | | Feb 28, 2023 at 11:05PM (UTC) |

Contributor

@erinkcochran87 erinkcochran87 left a comment

Left a few comments, but looks good!

docs/content/concepts/ops-jobs-graphs/graphs.mdx
@@ -262,6 +263,36 @@ Note that in most cases, it is usually possible to pass some data dependency. In

Dagster also provides more advanced abstractions to handle dependencies and IO. If you find it difficult to model data dependencies when using external storage, check out [IO managers](/concepts/io-management/io-managers).

### Loading an asset as an input

You can supply an asset as an input to one of the ops in a graph. Dagster can then use the [IO manager](/concepts/io-management/io-managers) on the asset to load the input value for the op.
Contributor

How do I override the IOManager used by the asset? For assets I can do that with @asset(ins={"key": AssetIn(..., input_manager_key="overriding_io_mgr")}). How do I do it with ops?

Contributor Author

Contributor

Oh, I didn't put together that this is an "unconnected input" but I guess that makes sense, OK.

If the asset is partitioned, then:

- If the job is partitioned, the corresponding partition of the asset will be loaded.
- If the job is not partitioned, then all partitions of the asset will be loaded.
Contributor

What does "all partitions will be loaded" mean for the shape of the value? Is it a list, or a dictionary, or a generator, or something else? I'm wondering how I ought to write my op to handle that.

Contributor Author

The type depends on the I/O manager implementation:

  • The Pandas and PySpark type handlers of the DB IO managers (Snowflake, DuckDB, BigQuery) always return a single DataFrame, which can include values from all the partitions.
  • When loading an input that corresponds to multiple partitions, the UPathIOManager returns a dictionary that maps each input partition key to the input value for that partition key.

This needs better docs, but I don't think this is the right place to put them.
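
For anyone writing an op against an unpartitioned job in the meantime, here is a minimal sketch of one way to cope with both shapes described above. It is plain Python rather than Dagster API: the helper name `apply_per_partition` is hypothetical, and it assumes a dict always means "partition key to value" (the UPathIOManager-style shape) while anything else is one combined value (the DataFrame-style shape). An asset whose value is itself a dict would be misclassified, so treat this as an illustration rather than a recommended pattern.

```python
from typing import Any, Callable, List


def apply_per_partition(loaded: Any, fn: Callable[[Any], Any]) -> List[Any]:
    """Apply fn to an input value whose shape depends on the I/O manager.

    Assumption (not Dagster API): a dict maps partition keys to per-partition
    values; any other object is a single value covering all partitions.
    """
    if isinstance(loaded, dict):
        # One value per partition: process in sorted key order so the
        # result order is deterministic.
        return [fn(loaded[key]) for key in sorted(loaded)]
    # A single combined value, for example one DataFrame with all partitions.
    return [fn(loaded)]
```

Inside an op, the loaded input would be passed as `loaded` and the per-value logic as `fn`; the caller always gets a list back, with one element in the combined case.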

@sryza sryza merged commit dab3831 into master Mar 1, 2023
@sryza sryza deleted the source-asset-graph-input-docs branch March 1, 2023 16:18

3 participants