We very rarely refresh our whole dbt project in one go; much more often, individual marts are refreshed separately.
This makes the critical path model a little weird, because it currently only picks up the most recent critical path (i.e. the one for whichever mart ran most recently).
I have a suggestion to make this better:
In our setup, each mart is identifiable by a unique combination of model selectors (another option would be to use the dbt Cloud job ID, but that wouldn't work for local deployments). That combination could be turned into an identifier with a schema something like `<target>|<selector>`. For most simple projects that results in `default|*` (if the absence of explicit selectors is rendered as `*`), or in something like `my_target|models/mart_a|my_package` for more complicated setups (if multiple selectors are also delimited by `|`). I'm going to call this a `job_selector`.
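To make the proposed format concrete, here is a minimal Python sketch of how a `job_selector` string might be assembled. The function name `build_job_selector` is hypothetical (nothing like it exists in the package), and sorting the selectors is my own assumption, added so that `--select a b` and `--select b a` map to the same ID.

```python
from typing import List, Optional


def build_job_selector(target: str, selectors: Optional[List[str]] = None) -> str:
    """Build a hypothetical job_selector ID of the form <target>|<selector>[|<selector>...].

    No explicit selectors -> '*', as proposed in the issue.
    Selectors are sorted (an assumption) so the ID doesn't depend on CLI argument order.
    """
    parts = sorted(selectors) if selectors else ["*"]
    return "|".join([target, *parts])


# build_job_selector("default") gives "default|*"
# build_job_selector("my_target", ["my_package", "models/mart_a"])
#   gives "my_target|models/mart_a|my_package"
```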
Within each `job_selector`, the critical path makes sense again, so adding `job_selector` as a column to that model would let users see the critical path for each `job_selector`. For users with a basic setup, this wouldn't add any granularity to the model, because they would only have one `job_selector`.
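As an illustration of "one critical path per `job_selector`", here is an in-memory Python sketch. It only shows the grouping. The package itself would compute this in SQL, and the row shape `(job_selector, node_id, duration_seconds, parent_node_ids)` is a hypothetical simplification. The critical path is taken to be the dependency chain with the largest summed execution time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[float, List[str]]  # (total duration, ordered node IDs)


def critical_path(durations: Dict[str, float], parents: Dict[str, List[str]]) -> Path:
    """Longest (by summed duration) chain in a DAG of executed nodes.

    Parents that weren't executed in this job (absent from `durations`) are ignored.
    """
    best: Dict[str, Path] = {}

    def visit(node: str) -> Path:
        if node not in best:
            upstream = [visit(p) for p in parents.get(node, []) if p in durations]
            base_time, base_path = max(upstream, default=(0.0, []), key=lambda x: x[0])
            best[node] = (base_time + durations[node], base_path + [node])
        return best[node]

    return max((visit(n) for n in durations), default=(0.0, []), key=lambda x: x[0])


def critical_path_by_job_selector(
    rows: Iterable[Tuple[str, str, float, List[str]]]
) -> Dict[str, Path]:
    """Group execution rows by job_selector, then compute one critical path per group."""
    grouped = defaultdict(lambda: ({}, {}))  # job_selector -> (durations, parents)
    for job_selector, node_id, duration, parent_ids in rows:
        durations, parents = grouped[job_selector]
        durations[node_id] = duration
        parents[node_id] = list(parent_ids)
    return {js: critical_path(d, p) for js, (d, p) in grouped.items()}
```

With a single `job_selector` (e.g. `default|*`), the result has exactly one entry, which matches the point that basic setups gain no extra granularity.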
To limit row explosion in larger projects, we could add configuration to restrict which `job_selector` IDs are materialised in this model (maybe not something to build in round 1, though).
This also helps clarify the difference between the existing `latest_full_model_executions` model and the new `current_models` model: the latter has just one row per node, whereas the former has one row per node per `job_selector`.
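The difference in grain can be sketched in Python as "latest execution per key", where only the key changes. The `Execution` shape and the function names below are hypothetical and are not the package's models. The sketch also assumes `started_at` values are ISO-8601 strings in a single timezone, so string order matches time order.

```python
from typing import Callable, Dict, Hashable, Iterable, NamedTuple


class Execution(NamedTuple):
    job_selector: str
    node_id: str
    started_at: str  # ISO-8601, same timezone, so string order == time order


def latest_by(
    executions: Iterable[Execution], key: Callable[[Execution], Hashable]
) -> Dict[Hashable, Execution]:
    """Keep the most recent execution for each distinct key."""
    latest: Dict[Hashable, Execution] = {}
    for row in executions:
        k = key(row)
        if k not in latest or row.started_at > latest[k].started_at:
            latest[k] = row
    return latest


def one_row_per_node(executions: Iterable[Execution]) -> Dict[Hashable, Execution]:
    # current_models-style grain: whichever job ran the node most recently wins.
    return latest_by(executions, key=lambda r: r.node_id)


def one_row_per_node_per_job_selector(executions: Iterable[Execution]) -> Dict[Hashable, Execution]:
    # Proposed latest_full_model_executions-style grain: one row per (node, job_selector).
    return latest_by(executions, key=lambda r: (r.node_id, r.job_selector))
```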
Does this sound like a viable solution to the problem? @NiallRees
We've now reimplemented this package, opening it up to more adapters (with Databricks compatibility out of the gate). This model has yet to be implemented in the new version of the package, but I'm going to leave this issue open so that we can talk through a new incarnation of it. I'm wondering whether this model should have a set of entries for every dbt execution, describing the critical path in each?