Critical path model doesn't make sense with segmented project #83

alanmcruickshank · 2022-02-07T10:44:47Z

We very rarely refresh our whole dbt project in one go - much more likely is that individual marts are refreshed separately.

This makes the critical path model a little weird, because it currently only picks up the most recent critical path (for whichever mart ran most recently).

I have a suggestion to make this better:

In our setup, each mart is identifiable by a unique combination of model selectors (another option would be to use the dbt Cloud job id, but that wouldn't work for local deployments). That could be turned into an identifier with a schema something like: <target>|<selector>. That results in default|* for most simple projects (where the absence of explicit selectors is rendered as *), or e.g. my_target|models/mart_a|my_package for more complicated setups (where multiple selectors are also delimited by |). I'm going to call this a job_selector.
Within each job_selector, the critical path makes sense again, so adding job_selector as a column to that model would let users see the critical path for each job_selector. For users with a basic setup, this wouldn't add any additional granularity to that model, because they would only have one job_selector.
To limit explosion in larger projects we could allow configuration to limit which job_selector ids are allowed to materialise in this model (maybe not something to build in round 1 though).
This also helps to clear up the difference between the existing latest_full_model_executions model and the new current_models model, in that the latter has just one row per node, but the former has one row per node per job_selector.
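To make the identifier scheme above concrete, here is a minimal Python sketch of how a job_selector could be assembled. The function name `build_job_selector` and its arguments are hypothetical and not part of the package; the only rules taken from the proposal are the `|` delimiter and `*` standing in for "no explicit selectors".

```python
from typing import Iterable, Optional


def build_job_selector(target: str, selectors: Optional[Iterable[str]] = None) -> str:
    """Build an identifier of the form <target>|<selector>[|<selector>...].

    Hypothetical helper: only the "|" delimiter and the "*" placeholder
    come from the proposal; everything else is an assumption.
    """
    # Drop empty entries so that [] and [""] both count as "no selectors".
    parts = [s for s in (selectors or []) if s]
    # No explicit selectors is rendered as "*".
    return "|".join([target, *(parts or ["*"])])


simple = build_job_selector("default")
segmented = build_job_selector("my_target", ["models/mart_a", "my_package"])
print(simple)     # default|*
print(segmented)  # my_target|models/mart_a|my_package
```

This sketch keeps selectors in the order given. Whether they should be sorted, so that the same set of selectors always yields the same job_selector, is an open design choice that the proposal does not settle.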

Does this sound like a viable solution to the problem? @NiallRees

NiallRees · 2022-07-21T20:55:59Z

We've now reimplemented this package, opening it up to be compatible with more adapters (adding Databricks compatibility out of the gate). This model has yet to be implemented in the new version of the package, but I'm going to leave this issue here so that we can talk through a new incarnation of it. I'm wondering if this model should have a set of entries for every dbt execution, which describe the critical path in each?
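As a rough illustration of "a set of entries for every dbt execution", the Python sketch below groups node executions by a key and computes the longest-running dependency chain within each group. It is not the package's implementation: the row fields (`invocation_id`, `node_id`, `seconds`, `depends_on`) and both function names are assumptions made for the example, and the grouping key could equally be a job_selector.

```python
from collections import defaultdict


def critical_path(nodes):
    """Return (total_seconds, [node ids]) for the slowest dependency chain.

    `nodes` maps node_id -> row; parents that did not run in this group
    are ignored. Assumes the dependency graph is acyclic.
    """
    memo = {}

    def longest(node_id):
        if node_id not in memo:
            row = nodes[node_id]
            parents = [p for p in row["depends_on"] if p in nodes]
            # Slowest chain ending at any parent, or an empty chain for roots.
            best = max((longest(p) for p in parents),
                       key=lambda t: t[0], default=(0.0, []))
            memo[node_id] = (best[0] + row["seconds"], best[1] + [node_id])
        return memo[node_id]

    return max((longest(n) for n in nodes), key=lambda t: t[0])


def critical_paths(executions, key="invocation_id"):
    """Compute one critical path per distinct value of `key`."""
    groups = defaultdict(dict)
    for row in executions:
        groups[row[key]][row["node_id"]] = row
    return {group: critical_path(nodes) for group, nodes in groups.items()}


# Made-up example data: two separate executions of different marts.
executions = [
    {"invocation_id": "run_1", "node_id": "stg_a", "seconds": 10.0, "depends_on": []},
    {"invocation_id": "run_1", "node_id": "int_a", "seconds": 5.0, "depends_on": ["stg_a"]},
    {"invocation_id": "run_1", "node_id": "mart_a", "seconds": 20.0, "depends_on": ["stg_a", "int_a"]},
    {"invocation_id": "run_2", "node_id": "stg_b", "seconds": 3.0, "depends_on": []},
    {"invocation_id": "run_2", "node_id": "mart_b", "seconds": 7.0, "depends_on": ["stg_b", "stg_a"]},
]
paths = critical_paths(executions)
print(paths["run_1"])  # (35.0, ['stg_a', 'int_a', 'mart_a'])
print(paths["run_2"])  # (10.0, ['stg_b', 'mart_b'])
```

Because each group is evaluated on its own, a mart refreshed in isolation gets a critical path that only covers the nodes that actually ran in that execution.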

https://github.com/brooklyn-data/dbt_artifacts/releases/tag/1.0.0b1

alanmcruickshank mentioned this issue Jul 25, 2022

Add fct_dbt__critical_path back? #138

Closed

NiallRees closed this as completed Aug 1, 2022
