Add `storage_kind` to assets, similar to `compute_kind` #14475

danielgafni · 2023-05-25T21:41:18Z

What's the use case?

It seems like compute_kind was initially created for ops (is this why it's named op_tags in the frontend code?).
90% of my assets have Python compute kind :)

It would be nice to see a Parquet logo in Dagit, for example.

Perhaps a similar storage_kind (which would represent the data type) could be added to assets?

Alternatively, it could be assigned to the IOManager.


OwenKephart · 2023-05-25T21:59:28Z

Hi @danielgafni! Thanks for the suggestion -- this is something we've had in the back of our minds for a while. We've generally conceptualized this as a "storage kind", and (as you mention) potentially having it sourced from the IOManager.

This would be displayed independently from the "compute_kind" (e.g. you do your computation in Pandas, then store it in Snowflake vs. you do your computation in Pandas, then store it on s3).

Tons of interesting things that could be done with this -- for example you could create views for "all assets stored in Snowflake", or if we enforce that this tag is only sourced from a resource, you could imagine a "delete asset" command literally invoking a resource to delete the asset (as opposed to the current "wipe asset" command, which just removes materializations)

ion-elgreco · 2024-02-15T06:36:07Z

@OwenKephart any plans to pick this up?

alexismanin · 2024-02-15T07:13:04Z

This would be very useful, indeed.

I would personally like two distinct tags:

- one for the target storage medium: S3, local, sftp, postgresql, etc.
- one for the output data/file format, ideally accepting (common) media types: application/json, image/webp, etc.

Why I would like two separate tags:

- For databases, the two will be the same in the majority of cases, but not always. Two examples that come to mind:
  - In PostgreSQL, you can store JSON as JSONB, or a file of any format in a BLOB or TEXT column.
  - Embedded databases (DuckDB, H2, HSQL, SQLite, etc.) can be files stored on an external file system (although, for this one, I do not know whether live access to a database file on a remote file system is really feasible).
- When storing to a local or distributed file system, the storage medium and the asset data format are two separate things. On a file system (local, S3, etc.), you can store Parquet files, CSV, pickle, etc.

Anyway, being able to search assets by their destination resource/storage and by data format would be useful. And when discovering/reading an asset lineage, having tags showing where and how it is stored would also be very useful.

In my opinion, having default storage/asset_kind tags assigned by the IOManager would be great. But for use cases like external assets or manual dumping, being able to define formats on the asset decorator would be desirable.
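To make the two-tag idea concrete, here is a minimal plain-Python sketch (not Dagster code). It assumes a hypothetical `StorageTags` record holding a medium and a media type, with IOManager-level defaults that an asset-level value can override, plus a simple search over both fields. All names here (`StorageTags`, `resolve_tags`, `find_assets`) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StorageTags:
    """Hypothetical pair of tags: where an asset lives and in what format."""
    medium: Optional[str] = None      # e.g. "s3", "postgresql"
    media_type: Optional[str] = None  # e.g. "application/json"


def resolve_tags(io_manager_default: StorageTags, asset_override: StorageTags) -> StorageTags:
    """Asset-level values win; IOManager defaults fill in whatever is unset."""
    return StorageTags(
        medium=asset_override.medium or io_manager_default.medium,
        media_type=asset_override.media_type or io_manager_default.media_type,
    )


def find_assets(
    assets: Dict[str, StorageTags],
    medium: Optional[str] = None,
    media_type: Optional[str] = None,
) -> List[str]:
    """Return the sorted names of assets matching every filter that was given."""
    return sorted(
        name
        for name, tags in assets.items()
        if (medium is None or tags.medium == medium)
        and (media_type is None or tags.media_type == media_type)
    )


# Defaults that an IOManager might supply (assumed values).
s3_csv_default = StorageTags(medium="s3", media_type="text/csv")
postgres_default = StorageTags(medium="postgresql")

assets = {
    "orders": resolve_tags(s3_csv_default, StorageTags()),
    "raw_events": resolve_tags(s3_csv_default, StorageTags(media_type="application/json")),
    "users": resolve_tags(postgres_default, StorageTags()),
}
```

With this layout, `find_assets(assets, medium="s3")` would list every asset stored on S3 regardless of format, while `find_assets(assets, media_type="application/json")` cuts across storage media.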

bmarcj · 2024-02-15T09:44:13Z

I'd prefer one tag rather than several, just on the grounds of simplicity. The additional abstraction of 'medium' vs 'format' isn't always meaningful, and where it is, it can already be encapsulated in the proposed single tag.

If we want to tag assets with a variety of metadata - possibly even in dict format - I believe there's already a proposal for this?

OwenKephart · 2024-02-15T22:57:26Z

Hi all! There is a recent proposal for functionality along these lines here: #19737, which I'd definitely encourage you to add your thoughts to.

ion-elgreco · 2024-02-16T07:08:04Z

@OwenKephart how is that the same thing?

We are talking about adding additional UI labels here for the storage on an asset

alexismanin · 2024-02-16T08:44:53Z

@ion-elgreco: I think there is overlap. The proposal #19737 aims to provide a generic system of tags for assets. A tag that describes the asset output could be part of it. Dagster could unify tag definition with this system.

For example, if it is designed as a set of key/value pairs, the UI could search for and pick such tags to allow advanced search or improve asset display (by adding a storage label, for example).

danielgafni · 2024-02-16T10:19:16Z

Yes, it can work similarly to how partition_key is currently passed as a Run tag.

There can be a special interface like:

@asset(storage_kind="parquet", compute_kind="polars", labels={"foo": "bar"})

and internally Dagster could insert them into the labels dictionary:

{
    "storage_kind": "parquet",
    "compute_kind": "polars",
    "foo": "bar",
}

storage_kind and compute_kind could then have special treatment in the UI (e.g. displaying icons, etc)
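The merge step described above could look roughly like the following plain-Python sketch. It is not how Dagster implements anything; `build_labels` is a hypothetical helper, and rejecting conflicting values is an assumption about how clashes between the dedicated arguments and the `labels` dictionary might be handled.

```python
from typing import Dict, Optional


def build_labels(
    storage_kind: Optional[str] = None,
    compute_kind: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Fold the dedicated arguments into one flat labels dictionary."""
    merged: Dict[str, str] = {}
    if storage_kind is not None:
        merged["storage_kind"] = storage_kind
    if compute_kind is not None:
        merged["compute_kind"] = compute_kind

    for key, value in (labels or {}).items():
        # Assumption: a user label may not silently contradict a dedicated argument.
        if key in merged and merged[key] != value:
            raise ValueError(
                f"label {key!r}={value!r} conflicts with the dedicated argument {merged[key]!r}"
            )
        merged[key] = value
    return merged


labels = build_labels(storage_kind="parquet", compute_kind="polars", labels={"foo": "bar"})
```

A UI could then look up the `storage_kind` and `compute_kind` keys for icons and treat every other key as a generic label.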

danielgafni · 2024-02-16T17:02:15Z

I also believe that storage_kind should actually be a property of Out, so that a multi_asset could have different storage kinds for different outputs.

mkleinbort-ic · 2024-03-08T11:57:11Z

Yes, I'd like this.

garethbrickman · 2024-06-06T23:32:20Z

Note that as of 1.7.9: Dagster will now display a “storage kind” tag on assets in the UI, similar to the existing compute kind. To set storage kind for an asset, set its dagster/storage_kind tag.

ion-elgreco · 2024-06-07T05:26:30Z

@garethbrickman then the issue can be closed, right?

danielgafni added the type: feature-request label May 25, 2023

danielgafni changed the title ~~Add Parquet compute_kind icon~~ Add asset_kind to assets, similar to compute_kind May 25, 2023

garethbrickman added the area: tags Related to tagging and labeling label Mar 8, 2024

garethbrickman changed the title ~~Add asset_kind to assets, similar to compute_kind~~ Add storage_kind to assets, similar to compute_kind May 14, 2024

ion-elgreco mentioned this issue May 24, 2024

Introduce storage kind tag, helpers to @asset, @multi_asset #22037

Open

sryza assigned benpankow May 24, 2024

garethbrickman closed this as completed Jun 7, 2024
