Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

Closed
2 tasks done
renanleme opened this issue Sep 22, 2023 · 12 comments · Fixed by #8696
Closed
2 tasks done

[CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 #8692

renanleme opened this issue Sep 22, 2023 · 12 comments · Fixed by #8696
Assignees
Labels
backport 1.6.latest bug Something isn't working Highest Severity critical bug that must be resolved immediately regression

Comments

@renanleme
Copy link
Contributor

renanleme commented Sep 22, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

Afterm upgrading to 1.6.2 from 1.5.1 our build is taking way longer to start the first model
From around 7 to 8 minutes to complete to more than 1 hour without starting.

I'm running dbt build --selector X

python: 3.11.5
dbt_core: 1.5.1 -> 1.6.2
dbt-snowflake: 1.5.1-> 1.6.2

Models:

Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models

Selector example

  - name: XYZ
    definition:
      union:
        - '*'
        - exclude:
            - method: selector
              value: P1
            - method: selector
              value: P2
            - method: selector
              value: P3
            - method: selector
              value: P4
            - method: selector
              value: P5

Expected/Previous Behavior

That the build would take around 8 minutes to start after compiling all the models

Steps To Reproduce

root@b78d02b2ea5d:/usr/app/dbt# dbt build  --selector XYZ
15:10:34  Running with dbt=1.5.1
15:11:43  Found 2840 models, 2277 tests, 31 snapshots, 10 analyses, 909 macros, 1 operation, 78 seed files, 499 sources, 0 exposures, 0 metrics, 0 groups
15:18:21  
15:18:52  Concurrency: 8 threads (target='dev-ssh-password')
15:18:52  
15:18:52  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Takes around 6 to 7 minutes

Same command, different version

dbt build --selector XYZ
14:42:04  Running with dbt=1.6.2
14:42:05  Registered adapter: snowflake=1.6.2
14:42:06  Unable to do partial parsing because of a version mismatch
14:42:35  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models

And it goes for more than 50 minutes without starting with CPU at 100%

Relevant log output

Checking the log file there's nothing being logged after `Found 2840 models...` line besides the CPU usage at 100%

Environment

- OS: Running inside container - `python:3.11.5-slim-bookworm`
`Linux 5.15.49-linuxkit-pr #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023 aarch64 GNU/Linux`
- Python: `3.11.5`
- dbt (working version): `1.5.1`
- dbt (regression version):  `1.6.2` and `1.6.3`

Which database adapter are you using with dbt?

snowflake

Additional Context

@renanleme renanleme added bug Something isn't working regression triage labels Sep 22, 2023
@github-actions github-actions bot changed the title [Regression] build never starts on 1.6.2 [CT-3144] [Regression] build never starts on 1.6.2 Sep 22, 2023
@dbeatty10 dbeatty10 changed the title [CT-3144] [Regression] build never starts on 1.6.2 [CT-3144] [Regression] build never starts on 1.6.2 and 1.6.3 Sep 22, 2023
@graciegoheen
Copy link
Contributor

HI!

I am curious about this line of your output:

Unable to do partial parsing because of a version mismatch

If you try doing a dbt build using 1.5.1 with the no-partial-parse flag (dbt --no-partial-parse build), are you seeing the same performance issue?

@renanleme
Copy link
Contributor Author

renanleme commented Sep 22, 2023

HI!

I am curious about this line of your output:

Unable to do partial parsing because of a version mismatch

If you try doing a dbt build using 1.5.1 with the no-partial-parse flag (dbt --no-partial-parse build), are you seeing the same performance issue?

Hi, Gracie!

Thanks for checking!

I just ran the with the flag as requested on 1.5.1 and it ran even faster

root@f5e591bc8ed1:/usr/app/dbt# dbt --no-partial-parse  build --selector XYZ
16:29:02  Running with dbt=1.5.1
16:29:36  Found 2840 models, 2277 tests, 31 snapshots, 10 analyses, 909 macros, 1 operation, 78 seed files, 499 sources, 0 exposures, 0 metrics, 0 groups
16:30:01  
16:30:16  Concurrency: 8 threads (target='dev-ssh-password')
16:30:16  
16:30:16  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Meanwhile, I'm testing the same with 1.6.3 just in case... let's see

@peterallenwebb
Copy link
Contributor

@renanleme Would you be willing to do the following?

  1. Run the command with 1.6.3
  2. Wait about 10 minutes
  3. Break with Ctrl-C to interrupt
  4. Capture the Python stack trace and reply with it here?

It might give us a hint as to what is going on so that we can reproduce the issue.

@renanleme
Copy link
Contributor Author

renanleme commented Sep 22, 2023

Hi, @peterallenwebb

Sure, it was running for more than 1 hour, here is the stack trace

root@2f12c5a4a5b4:/usr/app/dbt# dbt --no-partial-parse  build --selector XYZ
16:34:38  Running with dbt=1.6.3
16:34:39  Registered adapter: snowflake=1.6.2
16:35:07  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
^C17:55:51  Encountered an error:

17:55:51  Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 259, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/cli/main.py", line 208, in build
    results = task.run()
              ^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 449, in run
    self._runtime_initialize()
  File "/venv/lib/python3.11/site-packages/dbt/task/compile.py", line 147, in _runtime_initialize
    super()._runtime_initialize()
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 146, in _runtime_initialize
    self.job_queue = self.get_graph_queue()
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/task/runnable.py", line 139, in get_graph_queue
    return selector.get_graph_queue(spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 312, in get_graph_queue
    selected_nodes = self.get_selected(spec)
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 303, in get_selected
    selected_nodes, indirect_only = self.select_nodes(spec)
                                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 158, in select_nodes
    direct_nodes, indirect_nodes = self.select_nodes_recursively(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 130, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 128, in select_nodes_recursively
    direct_nodes, indirect_nodes = self.get_nodes_from_criteria(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 95, in get_nodes_from_criteria
    neighbors = self.collect_specified_neighbors(spec, collected)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/selector.py", line 115, in collect_specified_neighbors
    additional.update(self.graph.select_parents(selected, depth))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 72, in select_parents
    ancestors.update(self.ancestors(node, max_depth))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 31, in ancestors
    filtered_graph = self.exclude_edge_type("parent_test")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 45, in exclude_edge_type
    return nx.restricted_view(
           ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/function.py", line 506, in restricted_view
    hide_edges = nxf.hide_diedges(edges)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/filters.py", line 30, in hide_diedges
    edges = {(u, v) for u, v in edges}
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/networkx/classes/filters.py", line 30, in <setcomp>
    edges = {(u, v) for u, v in edges}
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/dbt/graph/graph.py", line 51, in <genexpr>
    if self.graph[a][b].get("edge_type") == edge_type_to_exclude
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

root@2f12c5a4a5b4:/usr/app/dbt# date
Fri Sep 22 17:56:05 UTC 2023

My guess is probably some issue with the recursion when excluding some models as we are using exclusion on our selector

@peterallenwebb
Copy link
Contributor

peterallenwebb commented Sep 22, 2023

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

@renanleme
Copy link
Contributor Author

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

Sure, I will try that in a bit

@renanleme
Copy link
Contributor Author

renanleme commented Sep 22, 2023

I strongly suspect a performance regression associated with this changeset: f1dddaa

@renanleme It's really asking you to go above and beyond, but if you can revert the changes in this changeset on dbt/graph/graph.py and re-run in 1.6.3, that would prove our case.

Hey, @peterallenwebb

Looks like that's the issue.

I reverted this one: f1dddaa

And it worked 🎉

dbt --no-partial-parse  build --selector XYZ --target dev-ssh-password                                                                                                                                                                 
19:01:19  Running with dbt=1.6.3
19:01:19  Registered adapter: snowflake=1.6.2
19:01:46  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
19:02:13  
19:02:31  Concurrency: 8 threads (target='dev-ssh-password')
19:02:31  
19:02:31  1 of 1260 START sql view model raw.dbt_meta.Z ................. [RUN]

Let me know if I can help debugging more.

@peterallenwebb
Copy link
Contributor

peterallenwebb commented Sep 22, 2023

@renanleme Thanks for your extremely fast responses! I'll let the team know and we'll get a fix released as soon as we can.

@renanleme
Copy link
Contributor Author

renanleme commented Sep 22, 2023

@renanleme Thanks for your extremely fast responses! I'll let the team know and we'll a fix released as soon as we can.

You are welcome!

I was testing and I think this this code below solves the issue here:

    def descendants(self, node: UniqueId, max_depth: Optional[int] = None) -> Set[UniqueId]:
        """Returns all nodes reachable from `node` in `graph`"""
        if not self.graph.has_node(node):
            raise DbtInternalError(f"Node {node} not found in the graph!")
        filtered_graph = self.exclude_edge_type("parent_test")
        return {child for _, child in nx.bfs_edges(filtered_graph, node, depth_limit=max_depth)}

    def exclude_edge_type(self, edge_type_to_exclude):
        return nx.subgraph_view(
            self.graph,
            filter_edge=partial(self.filter_test_edges, edge_type=edge_type_to_exclude)
        )

    def filter_test_edges(self, n1, n2, edge_type):
        return self.graph[n1][n2].get("edge_type", edge_type)

The amount of models are the same so it should be ok (I ran some tests, all passed, including tests/unit/test_graph_selection.py)

dbt --no-partial-parse  build --selector XYZ --target dev-ssh-password                                                                                                                                                        
20:15:43  Running with dbt=1.6.3
20:15:43  Registered adapter: snowflake=1.6.2
20:16:10  Found 2840 models, 31 snapshots, 10 analyses, 2277 tests, 78 seeds, 1 operation, 499 sources, 0 exposures, 0 metrics, 960 macros, 0 groups, 0 semantic models
20:17:18  
20:17:34  Concurrency: 8 threads (target='dev-ssh-password')
20:17:34  
20:17:34  1 of 1260 START sql view model raw.dbt_meta.stg_dbt__exposures ................. [RUN]
  

Thanks @peterallenwebb !

@dbeatty10 dbeatty10 removed the triage label Sep 24, 2023
@peterallenwebb peterallenwebb self-assigned this Sep 25, 2023
@graciegoheen graciegoheen added the Highest Severity critical bug that must be resolved immediately label Sep 25, 2023
@peterallenwebb
Copy link
Contributor

peterallenwebb commented Sep 25, 2023

Update on my progress. I created a quick random graph generator to test the relative performance of the Graph.ancestors function under some hopefully-reasonable assumptions. This is the script:

from dbt.graph.graph import Graph
import time
import networkx
import random

def rand_edge(u, v, prob_test_parent):
    return (u,v,{'edge_type': 'parent_test'}) if random.random() < prob_test_parent else (u, v)

def rand_dag(num_nodes, exp_out_degree, prob_test_parent, seed):
    complete_edge_count = num_nodes * (num_nodes-1) / 2.0
    edge_prob = exp_out_degree * num_nodes / complete_edge_count
    g = networkx.gnp_random_graph(num_nodes, edge_prob, seed, directed=True)
    dag = networkx.DiGraph([rand_edge(u, v, prob_test_parent) for (u,v) in g.edges() if u < v])
    return dag

def time_ancestors(g, test_nodes):
    start_time = time.perf_counter()
    
    for node in test_nodes:
        a = g.ancestors(node)
    
    stop_time = time.perf_counter()
    return stop_time - start_time

seed = 12345

g100 = Graph(rand_dag(100, 10.0, 0.15, seed))
g400 = Graph(rand_dag(400, 10.0, 0.15, seed))
g1600 = Graph(rand_dag(1600, 10.0, 0.15, seed))
g3600 = Graph(rand_dag(3600, 10.0, 0.15, seed))

print("g100: " + str(time_ancestors(g100, [10, 20, 40, 60, 80])))
print("g400: " + str(time_ancestors(g400, [40, 80, 160, 240, 320])))
print("g1600: " + str(time_ancestors(g1600, [160, 320, 640, 960, 1290])))
print("g3600: " + str(time_ancestors(g3600, [300, 600, 1000, 2000, 3200])))

The results on some different branches:

1.5.latest
g100: 0.000161666888743639
g400: 0.00036137504503130913
g1600: 0.0011218339204788208
g3600: 0.0029966658912599087

1.6.latest
g100: 0.00265895901247859
g400: 0.009041625075042248
g1600: 0.0356107079423964
g3600: 0.08249700022861362

main
g100: 0.0025829579681158066
g400: 0.009305458050221205
g1600: 0.03589854063466191
g3600: 0.08135524997487664

patched according to current PR
g100: 0.0015306249260902405
g400: 0.003078209236264229
g1600: 0.010677333921194077
g3600: 0.02506020897999406

Observations

Assuming the random graphs are a reasonable approximation of the real world, we regressed the performance of this function in 1.6.0, and it now takes forty times longer to execute. However, even with the currently proposed fix PR, the performance is still much worse (ten times longer) than in 1.5.latest.

It's possible that on real-world data the PR is a better fix than these tests indicate, but it's hard to know, so I'm working on a more fundamental fix.

@peterallenwebb
Copy link
Contributor

reopening until backport is complete

@peterallenwebb
Copy link
Contributor

@renanleme Your contribution has been merged and backported. It will appear soon in dbt 1.6.4. Thank you for contributing to dbt-core!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.6.latest bug Something isn't working Highest Severity critical bug that must be resolved immediately regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants