Graph Asset more Asset-like #19932

Open
IsmaelRDeMelo opened this issue Feb 21, 2024 · 1 comment
Labels
area: ops/graphs/jobs Related to Dagster ops, graphs and jobs type: feature-request

Comments

@IsmaelRDeMelo

What's the use case?

Hi,

I've been a native ops/jobs user of Dagster since 0.1.0 or earlier (I can't remember anymore, lmao), but anyway.
I would like to suggest that graph assets be treated like real assets. The @asset decorator gives us lots of parameters to use, and since the point of the @graph_asset decorator is to convert a DAG of ops into a single asset, I think it should have the same features as a normal asset.

Why is that? Because sometimes doing something like this

@asset(
    compute_kind="api",
    partitions_def=daily_partitions,
    metadata={"partition_expr": "created_at"},
    backfill_policy=BackfillPolicy.single_run()
)
def users(context, api: RawDataAPI):
    """A table containing all users data"""
    # during a backfill the partition range will span multiple days
    # during a single run the partition range will be for a single day
    first_partition, last_partition = context.asset_partitions_time_window_for_output()
    partition_seq = _daily_partition_seq(first_partition, last_partition)
    all_users = []
    for partition in partition_seq:
        resp = api.get_users(partition)
        users = pd.read_json(resp.json())
        all_users.append(users)

    return pd.concat(all_users)

just for the sake of using the @asset decorator isn't nice when you could use the most powerful feature in Dagster so far, which is DynamicOut. Instead of processing the partitions one by one in a for loop, each partition would be processed in its own process, all in parallel. Faster. Better to visualize. Better to debug. Imagine one partition of this asset hits a problem: the whole for loop would crash.

I know you could just use a multiprocessing library for this example, but to me it's better to use something Dagster already gives us.

"I have a dream. The dream in which I build entire pipelines with OPs and they're just assets in the end, the best of two worlds, having your steps and also having them as your assets with all the capabilities that Dagster can offer." - MELO, Ismael. (2024)

And this is just one use case of many that may exist. I don't know if what I said makes sense to you guys, but to me it makes a lot of sense, hehe. If I'm crazy, please let me know :)

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@garethbrickman garethbrickman added the area: ops/graphs/jobs Related to Dagster ops, graphs and jobs label Feb 21, 2024
@ion-elgreco
Contributor

For me, the fundamental issue with ops and graph assets is that you can't configure the ops in a graph to run in a single execution using an InMemoryIOManager on any executor.
