
[CT-899] Update manifest flat_graph after deferral, to support dbt-metrics compilation #5525

Closed
jtcohen6 opened this issue Jul 25, 2022 · 3 comments · Fixed by #5786
Labels
state Stateful selection (state:modified, defer)


@jtcohen6
Contributor

jtcohen6 commented Jul 25, 2022

Metrics + deferral don't work perfectly together, because of how the dbt-metrics package is doing the "lookup" for a metric's model parent.

I can envision two ways to solve this problem:

  1. Use a "real" ref in the dbt-metrics package here, instead of reconstructing the model relation via a graph lookup. I'd certainly feel better about it if we did it this way :)
  2. Rebuild the flat_graph after deferral. This is actually very simple, and could be a good way to go regardless. It will also enable users who are extracting metadata from the graph variable (e.g. dbt_artifacts package rewrite: Release 1.0.0b1 brooklyn-data/dbt_artifacts#132). There are two places we could do this: b26c7ca

Those aren't mutually exclusive. Both could be good ideas in their own right.

Reproduction case

-- models/model_a.sql
select 1 as id, current_timestamp as ts
# models/metric.yml
version: 2

metrics:
  - name: some_metric
    label: Some Metric
    model: ref('model_a')

    type: count
    sql: id # superfluous here, but shown as an example

    timestamp: ts
    time_grains: [day]
-- models/model_b.sql

-- {{ metric('some_metric') }}

{% if execute %}
  {% set model_ref_node = graph.nodes.values() | selectattr('name', 'equalto', 'model_a') | first %}
  {% set relation = api.Relation.create(
      database = model_ref_node.database,
      schema = model_ref_node.schema,
      identifier = model_ref_node.alias
  )
  %}
{% else %}
  {% set relation = "" %}
{% endif %}

-- this one is a real ref
select * from {{ ref('model_a') }}
union all
-- this one is synthesized via 'graph' var
select * from {{ relation }}

I have two profile targets set up: prod (with schema: analytics) and dev (with schema: dbt_jcohen). The default is dev.

  1. Run into a "prod" schema: dbt run --target prod
  2. Move the "prod" manifest (target/manifest.json) into a new folder, e.g. state/
  3. Change the metric definition (e.g. `sql: id` → `sql: user_id`)
  4. Switch to using the dev target. dbt ls -s state:modified+ --state state/ --target dev to confirm that some_metric + model_b are both selected, and model_a is not selected
  5. Ensure that dev schema is empty (e.g. drop schema dbt_jcohen cascade)
  6. dbt run -s state:modified+ --state state/ --defer --target dev
  7. Check logs / compiled SQL:
-- this one is a real ref
select * from "jerco"."analytics"."model_a"
union all
-- this one is synthesized via 'graph' var
select * from "jerco"."dbt_jcohen"."model_a"

Resolution

Following proposal 2 above, after either of the additions in b26c7ca, this works:

-- this one is a real ref
select * from "jerco"."analytics"."model_a"
union all
-- this one is synthesized via 'graph' var
select * from "jerco"."analytics"."model_a"

Any concerns?

def build_flat_graph(self):
    """This attribute is used in context.common by each node, so we want to
    only build it once and avoid any concurrency issues around it.
    Make sure you don't call this until you're done with building your
    manifest!
    """
    self.flat_graph = {
        "exposures": {k: v.to_dict(omit_none=False) for k, v in self.exposures.items()},
        "metrics": {k: v.to_dict(omit_none=False) for k, v in self.metrics.items()},
        "nodes": {k: v.to_dict(omit_none=False) for k, v in self.nodes.items()},
        "sources": {k: v.to_dict(omit_none=False) for k, v in self.sources.items()},
    }

This does require us to rebuild flat_graph after parsing is over, at the beginning of execution, since that's where deferral happens. Deferral is resolved before any individual nodes are run in threads, though, so I don't believe we'd need to worry about thread safety.
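A minimal sketch of that resolution, using hypothetical stand-in classes rather than dbt-core's real `Manifest`: after deferral swaps in node metadata from the state manifest, `build_flat_graph()` is simply called again so the `graph` Jinja variable reflects the deferred values.

```python
# Sketch only: stand-in classes, not dbt-core's real Manifest implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    schema: str

    def to_dict(self, omit_none=False):
        return {"name": self.name, "schema": self.schema}


@dataclass
class Manifest:
    nodes: dict = field(default_factory=dict)
    flat_graph: dict = field(default_factory=dict)

    def build_flat_graph(self):
        # Same shape as build_flat_graph above, trimmed to nodes only.
        self.flat_graph = {
            "nodes": {k: v.to_dict(omit_none=False) for k, v in self.nodes.items()},
        }

    def merge_from_artifact(self, other):
        # Crude stand-in for deferral: adopt node metadata from the
        # prod-state manifest, then rebuild flat_graph so the 'graph'
        # Jinja variable reflects the deferred values.
        self.nodes.update(other.nodes)
        self.build_flat_graph()


dev = Manifest(nodes={"model.proj.model_a": Node("model_a", "dbt_jcohen")})
dev.build_flat_graph()
prod = Manifest(nodes={"model.proj.model_a": Node("model_a", "analytics")})
dev.merge_from_artifact(prod)
# dev.flat_graph now shows the prod schema for model_a
```

Without the rebuild at the end of `merge_from_artifact`, `flat_graph` would still carry the pre-deferral (dev) schema, which is exactly the mismatch shown in the reproduction case.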

@jtcohen6 jtcohen6 added state Stateful selection (state:modified, defer) Team:Language labels Jul 25, 2022
@github-actions github-actions bot changed the title Update manifest flat_graph after deferral, to support dbt-metrics compilation [CT-899] Update manifest flat_graph after deferral, to support dbt-metrics compilation Jul 25, 2022
@jtcohen6 jtcohen6 added this to the v1.3 milestone Jul 26, 2022
@leahwicz
Contributor

Let's make sure to add a test case here before we close this out

@callum-mcdata
Contributor

Current State

As @jtcohen6 raised in the above issue, there are problems with how the metrics package is currently handling model references. For the curious, this is because metrics can have nested dependencies that resolve to models 1+n nodes upstream. For example, if you have an expression metric that is built on a chain of 4 other metrics, we need to resolve the model(s) reference at the end of that chain.

Right now we construct that information from graph. We can find the edges of the metric dependency, get the model name from the metric, construct that as a relation and then use it in our sql generation. This produces issues, however, when it comes to state deferral.
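The nested-dependency lookup described above can be sketched as a recursive walk over dependency edges. The unique-ID prefixes (`model.`, `metric.`) mirror dbt's manifest conventions, but the helper itself is hypothetical:

```python
# Hypothetical helper: walk a metric's dependency edges down to model nodes.
def resolve_parent_models(metric_id, depends_on):
    """Return the set of model unique IDs at the end of a metric's
    dependency chain, however many metric-to-metric hops deep it is."""
    models = set()
    for dep in depends_on[metric_id]:
        if dep.startswith("model."):
            models.add(dep)
        elif dep.startswith("metric."):
            # Expression metrics can depend on other metrics; recurse.
            models |= resolve_parent_models(dep, depends_on)
    return models


# A chain of metrics ending at one model, expressed as unique-ID edges:
depends_on = {
    "metric.proj.expr_metric": ["metric.proj.base_metric"],
    "metric.proj.base_metric": ["model.proj.model_a"],
}
```

If the relations this walk produces come from a stale `flat_graph`, they carry the wrong schema under deferral, which is the failure mode at the heart of this issue.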

Desired End State

As most of this will happen behind the scenes, there are few constraints when it comes to resolving this issue. The end goal is that the relationship would be correctly picked up and understood. @jtcohen6 raised a few ideas on how to resolve this:

  • Add dependencies at parse time. Whenever dbt-core detects a call to metric(), it also tries to look up that metric's model dependencies, to record them in depends_on.nodes, too. This is potentially very slow, and I'm not even sure it's possible — the metrics haven't been parsed yet.
  • More complex check in the RuntimeRefResolver. We adjust the logic in RuntimeRefResolver (code linked above) so it checks to ensure that either:
    • the ref'd model is in self.model.depends_on.nodes, OR
    • the ref'd model is upstream of this model, because it's upstream of a metric that is in self.model.depends_on.nodes / self.model.metrics. This feels sorta gross, and like it requires information that's not in scope here.
  • Resolve the ref for the metric, rather than the downstream model. If there were a way to "compile" a metric so that it's the one resolving the ref, everything's golden. We'd need to store the resolved form of the metric's model reference ... in the graph somewhere ... or make it accessible as another Python method.

One thing I do want to call out is that we'd need this functionality to support n+1 node depth, i.e. not just understanding metric --> model but also understanding metric --> ... --> model.
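The third idea above could, very roughly, look like the following. Everything here is hypothetical (`Metric`, `compile_metrics`, `resolve_ref` are illustrative stand-ins, not dbt-core APIs): resolve each metric's model ref once at compile time, using the deferral-aware resolver, and store the result on the metric so downstream models read it instead of re-deriving it from `graph`.

```python
# Hypothetical sketch of the third idea: resolve each metric's model ref at
# compile time and store it on the metric node itself.
class Metric:
    def __init__(self, name, model_name):
        self.name = name
        self.model_name = model_name
        self.resolved_relation = None  # populated during compilation


def compile_metrics(metrics, resolve_ref):
    # resolve_ref plays the role of dbt's (deferral-aware) ref resolver.
    for metric in metrics.values():
        metric.resolved_relation = resolve_ref(metric.model_name)


metrics = {"some_metric": Metric("some_metric", "model_a")}
# Under deferral, the resolver would return the prod relation:
compile_metrics(metrics, lambda name: f'"jerco"."analytics"."{name}"')
```

Because the metric is the one resolving the ref, downstream consumers like dbt-metrics no longer need their own graph lookup at all.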

Would love to hear thoughts about solutions from the languages team!

@jtcohen6
Contributor Author

@callum-mcdata Thanks for the detailed comment! I just opened it as a new issue, for a spike/investigation: #5637

Let's keep using this ticket for the narrowly scoped fix described above: rebuild flat_graph after deferral has been resolved
