Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression (v1.7.8) - Increased Time Required to start a dbt Project #22709

Closed
tomsej opened this issue Jun 25, 2024 · 8 comments
Closed

Regression (v1.7.8) - Increased Time Required to start a dbt Project #22709

tomsej opened this issue Jun 25, 2024 · 8 comments
Labels
integration: dbt Related to dagster-dbt type: bug Something isn't working

Comments

@tomsej
Copy link

tomsej commented Jun 25, 2024

Dagster version

1.7.8

What's the issue?

I have a quite big dbt project (~2500 models). Prior to the 1.7.8 update, the startup process for local development was relatively swift, taking around 20 seconds. However, post-update, the same process now exceeds 1.5 minutes. Issue persists even without custom code (see below), suggesting the slowdown is linked to version 1.7.8:

Definition:

from unittest.mock import MagicMock

from dagster import (
    Definitions,
    load_assets_from_package_module,
)

from pipelines.locations.dbt_snowflake import assets

# Definitions
defs = Definitions(
    assets=load_assets_from_package_module(assets),
    resources={
        "dbt": MagicMock(),
    },
)

dbt_snowflake.py

from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest=Path.cwd() / "dbt_snowflake" / "target" / "manifest.json")
def jaffle_shop_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

I attempted to execute Dagster with the DEBUG log level using the following command: dagster dev --python-file pipelines/locations/dbt_snowflake/dbt_snowflake_definition.py --code-server-log-level debug --log-level debug, but it failed to yield additional information regarding the cause of the delay. Could you advise on alternative methods to diagnose the source of this performance issue?

The issue with prolonged startup times is not limited to dev; it also occurs in the Linux staging environment, where each step experiences substantial delays.

Flamegraphs

1.7.7.:
1 7 7

1.7.10:
1 7 10

Pyinstruments

1.7.7.:
CleanShot 2024-06-26 at 11 28 25@2x

1.7.10:
CleanShot 2024-06-26 at 11 26 07@2x

What did you expect to happen?

Performance between versions is the same.

How to reproduce?

No response

Deployment type

Local

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@tomsej tomsej added the type: bug Something isn't working label Jun 25, 2024
@garethbrickman garethbrickman added the integration: dbt Related to dagster-dbt label Jun 26, 2024
@jonathanneo
Copy link

jonathanneo commented Jul 5, 2024

Experiencing significant slowdowns too (~7000 dbt models) after upgrading dagster from 1.7.2 to 1.7.12.

Before (1.7.2) running on linux: ~20 seconds

After (1.7.12) running on linux: ~300 seconds

I've tested it with different dagster versions 1.7.2, 1.7.7, 1.7.8, 1.7.12. Dagster 1.7.8 is where the slowdowns begin.

@rexledesma
Copy link
Member

@tomsej @jonathanneo If you could link your pyspy profiles in a speedscope compatible format, that would help our investigation. On our end, we'll investigate and try to reproduce the problem and see if any offending commits landed in 1.7.8.

@gibsondan
Copy link
Member

@tomsej if possible could you also run py-spy on the following command instead of the "dagster dev" command?

dagster api grpc -p 4000 --python-file <same file argument you are passing to dagster dev>

Your definitions are being loaded using that command in a subprocess - that will help us fully understand what the bottleneck is.

Adding -f speedscope to the py-spy command will create a JSON file that we can get more detailed information from.

@sryza
Copy link
Contributor

sryza commented Jul 8, 2024

Hey @tomsej - thanks for reporting this and for all the detail you included. Based on your pyinstrument output, I suspect I've found the source of the problem: #22874.

@tomsej
Copy link
Author

tomsej commented Jul 8, 2024

@gibsondan Sure, here are json files you requested:

Let me know if you need something else.

@sryza thanks a lot! Should I test the branch?

@sryza
Copy link
Contributor

sryza commented Jul 8, 2024

If you're able to test the branch that would be awesome!

@tomsej
Copy link
Author

tomsej commented Jul 8, 2024

I apologize, unfortunately, I wasn't by my computer. I have tested that branch and it is way better!! Seems even faster than in version 1.7.3! In 1.7.3 running dagster api grpc -p 4000 took ~15s, after your change ~7s. Here is py-spy json.

sryza added a commit that referenced this issue Jul 8, 2024
## Summary & Motivation

Now that specs are the source of truth on an `AssetsDefinition`'s
dependencies, the `asset_deps` property is a derived property that's
O(N) to compute, instead of its previous constant time.

This is a source of performance regressions:
#22709.

This PR avoids using this property in favor of directly getting
dependencies from the asset specs, which is a constant time operation.

## How I Tested These Changes
sryza added a commit that referenced this issue Jul 8, 2024
## Summary & Motivation

Now that specs are the source of truth on an AssetsDefinition's
dependencies, the asset_deps property is a derived property that's O(N)
to compute, instead of its previous constant time.

This is a source of performance regressions:
#22709.

While #22874 moves away from
using it internally, users could still be expecting constant time
behavior.

More background here:
#22874 (review)

## How I Tested These Changes
@sryza
Copy link
Contributor

sryza commented Jul 8, 2024

Awesome. That change will be included in this week's release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration: dbt Related to dagster-dbt type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants