Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce storage kind tag, helpers to @asset, @multi_asset #22037

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented May 22, 2024

Summary

Introduces a storage_kind param to @asset, @multi_assets decorator and to AssetSpec which sets the dagster/storage_kind tag from #22029 on the corresponding assets, similar to the kind tag we set on ops from compute_kind.

This is much more committal than #22029, so ok holding off for now.

@asset(compute_kind="python", storage_kind="snowflake")
def my_snowflake_asset():
  ...
@multi_assets(
  specs=
  compute_kind="python",
  storage_kind="bigquery"
)
def my_multi_assets():
  ...
@multi_asset(
    specs=[
        AssetSpec("asset1", storage_kind="snowflake"),
        AssetSpec("asset2", storage_kind="bigquery"),
    ],
)
def my_multi_assets():
  ...

This can be displayed in the UI specially as well as automatically populated by our various integrations.

Test Plan

Unit tests.

@benpankow
Copy link
Member Author

benpankow commented May 22, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @benpankow and the rest of your teammates on Graphite Graphite

@schrockn
Copy link
Member

Definitely prefer moving forward with #22029 (which I will approve now because it is a 2-way door) for the time being and holding off on this until we understand more. Is there a doc describing this new tag type and the plans for it? I'm also curious re: @braunjj 's take on this, as he has been advocating for a storage kind or something like it for a long time.

@braunjj
Copy link
Contributor

braunjj commented May 23, 2024

I'm supportive of this (and it has been requested by users a few times ASAIK. I think it's important that Storage Kind has parity with Compute Kind. For example you should be able to filter and group assets by storage kind just like you can with compute kinds.

@benpankow has implemented a bunch of that filtering so I feel good about this.

I think we might want to rethink how these tags appear on asset nodes and list views in the future now that assets can have Compute, Storage, and other arbitrary tags; but I don't think this is blocking any short term progress. I think our users will be very happy about this change.

@benpankow
Copy link
Member Author

The goal of the storage kind tag is to enable users to be able to sort & filter their assets by destination and to give an immediate indication of what data backs and asset at a glance when browsing in the catalog.

Right now it can be tricky to immediately distinguish which assets live in a warehouse vs in blob storage vs a database or SaaS tool unless they're named strategically - compute kind is often not enough in cases where tools like dbt or ETL tools span multiple storage types.

A couple user flows which get a lot easier with this metadata:

  • A stakeholder browsing Snowflake wants to figure out how recently a table users was rebuilt, and wants to find the logic that backs it. They go to the Dagster asset catalog and search users - but multiple assets with that name exist corresponding to an asset in ClickUp and a data source in the prod db. The Snowflake compute kind lets them distinguish the right asset immediately.

  • An engineer is triaging the failure of a pipeline that they are not particularly familiar with. The pipeline failed halfway through. The storage kind of the assets which failed to materialize helps the engineer understand the blast radius of the pipeline includes Stripe records, and they are able to notify the ops team that bills may be affected.

The basic UI implementation from #22031 and #22041 looks like the following:

Screenshot 2024-05-22 at 2 09 55 PM Screenshot 2024-05-22 at 1 33 08 PM Screenshot 2024-05-22 at 1 33 04 PM Screenshot 2024-05-22 at 2 09 43 PM Screenshot 2024-05-22 at 2 14 30 PM

Base automatically changed from benpankow/storage-kind to master May 24, 2024 20:12
@ion-elgreco
Copy link
Contributor

Awesome feature!

@ion-elgreco
Copy link
Contributor

#14475 related issue

@benpankow
Copy link
Member Author

Internal discussion for cross-linking purposes https://github.com/dagster-io/internal/discussions/9937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants