[CT-1303] Respect node selection in catalog queries run by docs generate #6014

Closed
Tracked by #8316
jtcohen6 opened this issue Oct 6, 2022 · 4 comments · Fixed by #8772
Labels:
  • enhancement (New feature or request)
  • node selection (Functionality and syntax for selecting DAG nodes)
  • performance
  • Team:Adapters (Issues designated for the adapter area of the code)
  • user docs [docs.getdbt.com] (Needs better documentation)

Comments

jtcohen6 (Contributor) commented Oct 6, 2022

When users run dbt docs generate --select <selection_criteria>, the selection limits the set of project resources for which dbt generates compiled SQL. This behavior is implicit in the GenerateTask, which inherits it from the CompileTask, where it is defined.

We should additionally make docs generate respect node selection when generating the catalog, since catalog queries can be very expensive.

To have a real effect, this change should be paired with #4997, which would filter our catalog queries down to only the resources that are actually selected. Without that change, we would successfully limit cataloging to the schema(s) containing the selected resource(s), but we would still catalog each of those schemas in full, which could include hundreds or thousands of unrelated resources.
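As a rough illustration of the two scopes described above, here is a minimal Python sketch. It is not dbt's code: `Node`, `catalog_rows`, and the in-memory "warehouse" of `(database, schema, identifier)` tuples are hypothetical stand-ins, assuming the catalog query covers either whole schemas (selection alone) or individual relations (selection plus #4997-style filtering).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a selected dbt model; not dbt's real node class.
    database: str
    schema: str
    identifier: str


def catalog_rows(warehouse, selected, relation_level):
    """Return the warehouse relations a catalog query would cover.

    warehouse: list of (database, schema, identifier) tuples.
    relation_level=False -> schema-level scope (node selection alone).
    relation_level=True  -> relation-level scope (selection plus #4997-style filtering).
    """
    if relation_level:
        wanted = {(n.database, n.schema, n.identifier) for n in selected}
        return [rel for rel in warehouse if rel in wanted]
    wanted_schemas = {(n.database, n.schema) for n in selected}
    return [rel for rel in warehouse if (rel[0], rel[1]) in wanted_schemas]


# One schema holding 1,000 models, plus an unrelated schema.
warehouse = [("analytics", "marts", f"model_{i}") for i in range(1000)]
warehouse.append(("analytics", "staging", "stg_orders"))
selected = [Node("analytics", "marts", "model_7")]

print(len(catalog_rows(warehouse, selected, relation_level=False)))  # 1000
print(len(catalog_rows(warehouse, selected, relation_level=True)))   # 1
```

Selecting one model already skips the unrelated staging schema, but without relation-level filtering the query still catalogs all 1,000 neighbors in marts.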

Use case

  • I'm working in a project with hundreds/thousands of models
  • I just made changes to one model, e.g. by updating its SQL and adding descriptions
  • I ran that one model: dbt run --select my_one_model
  • Now I want to view documentation for just that one model: dbt docs generate --select my_one_model
  • I understand that catalog info (columns, table stats, etc.) will be missing for all other models

Technical implementation

Note

To be clear, this would not accomplish what users tend to ask for when asking for more powerful docs generate --select, which is to produce a manifest.json with only the selected resources. Relevant discussions: #5096, #5244

jtcohen6 added the enhancement, performance, node selection, and Team:Adapters labels Oct 6, 2022
github-actions bot changed the title from "Respect node selection in catalog queries run by docs generate" to "[CT-1303] Respect node selection in catalog queries run by docs generate" Oct 6, 2022
scottsoithongsuk commented
Thanks @jtcohen6. Regarding your note that this would not accomplish what users tend to ask for (I'm in that boat!): I've looked at the referenced discussions, but wanted to ask whether there's any indication of a potential solution or timeline.
I'm looking for basically what's been released here, but with the catalog filtered down as well.

JavierLopezT commented
Thanks @jtcohen6 for pointing me to this from #7813.

This is exactly what I am looking for :)

kevinneville (Contributor) commented

> Thanks @jtcohen6 for pointing me to this from #7813.
>
> This is exactly what I am looking for :)

Please upvote this to give it higher priority :D

dbeatty10 (Contributor) commented

In the meantime, here is an example override of default__get_catalog that I prototyped for use with dbt-postgres (postgres__get_catalog).

It includes a diff that can be used as a guide to make a similar override for other adapters.
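The prototype itself is not reproduced here. As a hedged sketch of the core idea only, the Python below renders the kind of boolean predicate such an override might add to its catalog query so that only selected relations come back. `relation_filter_sql` is hypothetical, and the column names `table_schema` and `table_name` are an assumption based on `information_schema.columns`; an actual dbt override would express this in Jinja inside a macro such as postgres__get_catalog.

```python
def relation_filter_sql(relations):
    """Render a SQL boolean expression matching only the given (schema, table) pairs.

    Hypothetical helper for illustration; assumes the catalog query reads from a
    relation exposing table_schema and table_name columns.
    """
    if not relations:
        # Nothing selected: match no rows instead of the whole schema.
        return "false"

    def quote(value):
        # Escape embedded single quotes for a standard SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    clauses = [
        f"(table_schema = {quote(schema)} and table_name = {quote(table)})"
        for schema, table in sorted(relations)
    ]
    return "(" + " or ".join(clauses) + ")"


print(relation_filter_sql({("marts", "model_7")}))
# ((table_schema = 'marts' and table_name = 'model_7'))
```

Returning false for an empty selection keeps the query valid while cataloging nothing. Identifier quoting and case folding differ between warehouses, so a per-adapter override would need to handle those details.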


8 participants