Critical path model doesn't make sense with segmented project #83

Closed
alanmcruickshank opened this issue Feb 7, 2022 · 1 comment

Comments

@alanmcruickshank
Contributor

We very rarely refresh our whole dbt project in one go - much more likely is that individual marts are refreshed separately.

This makes the critical path model a little weird, because it currently only picks up the most recent critical path (for whichever mart ran most recently).

I have a suggestion to make this better:

  • In our setup, each mart is identifiable by a unique combination of model selectors (another option would be to use the dbt Cloud job id, but that wouldn't work for local deployments). That could be turned into an identifier with a schema something like: <target>|<selector>. That results in default|* for most simple projects (if the absence of explicit selectors is rendered as *), or e.g. my_target|models/mart_a|my_package for more complicated setups (if multiple selectors are also delimited by |). I'm going to call this a job_selector.
  • Within each job_selector, the critical path makes sense again, so adding this as a column to that model would mean users could see the critical path for each job_selector. For users with a basic setup, this wouldn't add any additional granularity to that model, because they would only have one job_selector.
  • To limit explosion in larger projects we could allow configuration to limit which job_selector ids are allowed to materialise in this model (maybe not something to build in round 1 though).
  • This also helps to clear up the difference between the existing latest_full_model_executions model and the new current_models model, in that the latter just has one row per node, but the former has one row per node per job_selector.
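
To make the proposal concrete, here is a rough Python sketch of how a job_selector could be built and then used to pick the most recent run within each one. It is illustrative only and not code from the package: the function names, the `(job_selector, started_at, run_id)` tuple shape, and the choice to keep selectors in the order given are all assumptions on my part.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_job_selector(target: str, selectors: Optional[Iterable[str]] = None) -> str:
    """Build an identifier of the form <target>|<selector>[|<selector>...].

    Hypothetical helper: a missing or empty selector list is rendered as "*",
    and multiple selectors are joined with "|" in the order they were given.
    """
    parts = [s for s in (selectors or []) if s]
    if not parts:
        parts = ["*"]
    return "|".join([target] + parts)


def latest_run_per_job_selector(
    runs: Iterable[Tuple[str, str, str]]
) -> Dict[str, str]:
    """Return the most recent run id for each job_selector.

    Each run is an assumed (job_selector, started_at, run_id) tuple, where
    started_at is an ISO 8601 timestamp string so that string comparison
    matches chronological order.
    """
    latest: Dict[str, Tuple[str, str]] = {}
    for selector, started_at, run_id in runs:
        if selector not in latest or started_at > latest[selector][0]:
            latest[selector] = (started_at, run_id)
    return {selector: run_id for selector, (_, run_id) in latest.items()}


if __name__ == "__main__":
    mart_a = build_job_selector("my_target", ["models/mart_a", "my_package"])
    simple = build_job_selector("default")
    runs = [
        (mart_a, "2022-02-06T01:00:00", "run_1"),
        (simple, "2022-02-06T02:00:00", "run_2"),
        (mart_a, "2022-02-07T01:00:00", "run_3"),
    ]
    print(latest_run_per_job_selector(runs))
```

The critical path would then be computed once per entry in that mapping, rather than once for whichever run happened most recently overall.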

Does this sound like a viable solution to the problem? @NiallRees

@NiallRees
Contributor

We've now reimplemented this package, opening it up to be compatible with more adapters (adding Databricks compatibility out of the gate). This model has yet to be implemented in the new version of the package, but I'm going to leave this issue open so that we can talk through a new incarnation of it. I'm wondering if this model should have a set of entries for every dbt execution, describing the critical path in each?

https://github.com/brooklyn-data/dbt_artifacts/releases/tag/1.0.0b1


2 participants