Add storage_kind to assets, similar to compute_kind #14475

Closed
danielgafni opened this issue May 25, 2023 · 12 comments
Labels
area: tags Related to tagging and labeling type: feature-request

Comments

@danielgafni
Contributor

danielgafni commented May 25, 2023

What's the use case?

It seems like compute_kind was initially created for ops (is this why it's named op_tags in the frontend code?).
90% of my assets have Python compute kind :)

It would be nice to see a Parquet logo in Dagit, for example.

Perhaps a similar storage_kind (which would represent the data type) could be added to assets?

Alternatively, it could be assigned to the IOManager.

@danielgafni changed the title from "Add Parquet compute_kind icon" to "Add asset_kind to assets, similar to compute_kind" on May 25, 2023
@OwenKephart
Contributor

Hi @danielgafni! Thanks for the suggestion -- this is something we've had in the back of our minds for a while... We've generally conceptualized this as a "storage kind", and (as you mention) potentially having it sourced from the IOManager.

This would be displayed independently from the "compute_kind" (e.g. you do your computation in Pandas, then store it in Snowflake vs. you do your computation in Pandas, then store it on S3).

Tons of interesting things that could be done with this -- for example you could create views for "all assets stored in Snowflake", or if we enforce that this tag is only sourced from a resource, you could imagine a "delete asset" command literally invoking a resource to delete the asset (as opposed to the current "wipe asset" command, which just removes materializations)

@ion-elgreco
Contributor

ion-elgreco commented Feb 15, 2024

@OwenKephart any plan to pick this up..?

@alexismanin

This would be very useful, indeed.

I would personally like two distinct tags:

  • one for target storage medium: S3, local, sftp, postgresql, etc.
  • one for the output data/file format, ideally accepting (common) media-types: application/json, image/webp, etc.

Why I would like two separate tags:

  • for databases, the two will be the same in a majority of cases, but not always. Two examples that come to mind:
    • In PostgreSQL, you can store JSON as JSONB, or a file of any format in a BLOB or a TEXT column.
    • Embedded databases (DuckDB, H2, HSQL, SQLite, etc.) can be files stored in an external file-system (although, for this one, I do not know if it is really feasible to do live access on a database file on a remote file-system)
  • when storing in a local or distributed file-system, the storage medium and the asset data format are two separate things. On a file-system (local, S3, etc.), you can store Parquet files, CSV, pickle, etc.

Anyway, being able to search assets by their destination resource/storage and by data format would be useful. And when discovering/reading an asset lineage, having tags showing where and how it is stored would also be very useful.

In my opinion, having default storage/asset_kind tags assigned by the IOManager could be great. But for use-cases like external assets or manual dumping, being able to define formats on the asset decorator would be desirable.
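
As a rough sketch of the two-tag idea above (this is not an existing Dagster API), the storage medium and the data format can be modeled as independent fields and filtered separately. The names `AssetKinds`, `storage_medium`, `data_format`, and `find_assets`, as well as the sample catalog, are hypothetical and only serve the illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetKinds:
    """Hypothetical pair of tags: where an asset lives and how it is encoded."""

    storage_medium: str  # e.g. "s3", "local", "postgresql"
    data_format: str  # ideally a media type, e.g. "application/json"


def find_assets(catalog, storage_medium=None, data_format=None):
    """Return the sorted asset names that match every filter that was given."""
    return sorted(
        name
        for name, kinds in catalog.items()
        if (storage_medium is None or kinds.storage_medium == storage_medium)
        and (data_format is None or kinds.data_format == data_format)
    )


# Made-up catalog: two assets share a medium but differ in format.
catalog = {
    "raw_events": AssetKinds("s3", "text/csv"),
    "user_profiles": AssetKinds("postgresql", "application/json"),
    "thumbnails": AssetKinds("s3", "image/webp"),
}

print(find_assets(catalog, storage_medium="s3"))
print(find_assets(catalog, data_format="application/json"))
```

Keeping the two fields separate is what lets "everything on S3" and "everything stored as JSON" be answered independently; a single combined tag would need a naming convention to support both queries.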

@bmarcj

bmarcj commented Feb 15, 2024

I'd prefer one tag rather than several, just on the grounds of simplicity. The additional abstraction of 'medium' vs 'format' isn't always meaningful, and where it is, it can already be encapsulated in the proposed single tag.

If we want to tag assets with a variety of metadata - possibly even in dict format - I believe there's already a proposal for this?

@OwenKephart
Contributor

Hi all! There is a recent proposal for functionality along these lines here: #19737, which I'd definitely encourage you to add your thoughts to.

@ion-elgreco
Contributor

@OwenKephart how is that the same thing?

We are talking about adding additional UI labels here for the storage on an asset

@alexismanin

@ion-elgreco: I think there is overlap. The proposal #19737 aims to provide a generic system of tags for assets. A tag that describes the asset output could be part of it. Dagster could unify tag definition with this system.

For example, if it is designed as a set of key/value pairs, the UI could search for and pick out such tags to allow advanced search or to improve asset display (by adding a storage label, for example).

@danielgafni
Contributor Author

danielgafni commented Feb 16, 2024

Yes, it can work similarly to how partition_key is currently passed as a run tag.

There can be a special interface like:

@asset(storage_kind="parquet", compute_kind="polars", labels={"foo": "bar"})

and internally Dagster could insert them into the labels dictionary:

{
    "storage_kind": "parquet",
    "compute_kind": "polars",
    "foo": "bar",
}

storage_kind and compute_kind could then have special treatment in the UI (e.g. displaying icons, etc)
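
A minimal sketch of that merge step, assuming the proposed interface above (it is a proposal in this thread, not Dagster's implementation). The helper name `build_labels` and the choice to raise on conflicting values are hypothetical:

```python
def build_labels(storage_kind=None, compute_kind=None, labels=None):
    """Fold the two special-cased kinds into a copy of the user's labels."""
    merged = dict(labels or {})  # copy so the caller's dict is not mutated
    special = (("storage_kind", storage_kind), ("compute_kind", compute_kind))
    for key, value in special:
        if value is None:
            continue  # kind not provided, leave the labels untouched
        if key in merged and merged[key] != value:
            # Assumption: a conflicting user-supplied label is an error.
            raise ValueError(f"conflicting values for label {key!r}")
        merged[key] = value
    return merged


labels = build_labels(
    storage_kind="parquet", compute_kind="polars", labels={"foo": "bar"}
)
print(labels)
```

The UI could then look up the `storage_kind` and `compute_kind` keys for icons while treating every other entry as a plain label.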

@danielgafni
Contributor Author

I also believe that storage_kind should actually be a property of Out, so that a multi_asset could have different storage kinds for different outputs.

@mkleinbort-ic

Yes, I'd like this.

@garethbrickman added the "area: tags" (Related to tagging and labeling) label on Mar 8, 2024
@garethbrickman changed the title from "Add asset_kind to assets, similar to compute_kind" to "Add storage_kind to assets, similar to compute_kind" on May 14, 2024
@garethbrickman
Contributor

Note that as of 1.7.9, Dagster displays a “storage kind” tag on assets in the UI, similar to the existing compute kind. To set the storage kind for an asset, set its dagster/storage_kind tag.
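
A small sketch of what setting that tag could look like. Only the tag key `dagster/storage_kind` comes from the note above; the helper `with_storage_kind` and the `team` tag are hypothetical, and passing the resulting mapping to the asset decorator's `tags` argument is an assumption to check against the docs for your Dagster version:

```python
STORAGE_KIND_TAG = "dagster/storage_kind"  # tag key named in the note above


def with_storage_kind(tags, storage_kind):
    """Return a copy of `tags` with the storage kind tag set."""
    return {**(tags or {}), STORAGE_KIND_TAG: storage_kind}


tags = with_storage_kind({"team": "analytics"}, "snowflake")
print(tags)

# Assumed usage (not executed here, requires Dagster to be installed):
#
#     @asset(tags=tags)
#     def my_table(): ...
```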

@ion-elgreco
Contributor

@garethbrickman then the issue can be closed, right?


8 participants