Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding the use of --selector and --models together #803

Closed
rahul-ve opened this issue Aug 9, 2021 · 4 comments
Closed

Understanding the use of --selector and --models together #803

rahul-ve opened this issue Aug 9, 2021 · 4 comments

Comments

@rahul-ve
Copy link

rahul-ve commented Aug 9, 2021

Hi team,

I am referring to the doc about node selection, https://docs.getdbt.com/reference/node-selection/syntax
Based on my testing locally, looks like if both --models and --selector options are passed in, --models options is being ignored This is not clearly documented hence asking. Is this the expected behavior?

We have a situation where we have a selector defined to exclude a bunch of models but we wanted to select a particular model from the excluded list to also be run at times. This can be achieved by modifying the selector using union, exclude logic but we only want the excluded model to run based on some logic during CI run.

Thanks!

@rahul-ve rahul-ve changed the title Understanding the use of --selector and --model together Understanding the use of --selector and --models together Aug 9, 2021
@jtcohen6
Copy link
Collaborator

jtcohen6 commented Aug 10, 2021

@rahul-ve Thanks for opening. The current behavior is exactly as you surmised: when --selector is provided, dbt ignores --models/--select and --exclude.

We might want to raise an exception if both --selector and one of the other flags are provided, to avoid the confusion underlying this issue. If so, the right places to do that are inside the tasks that support these arguments today: list, freshness, compile (other tasks inherit from compile).

Is adding that exception something you'd be interested in contributing? :)

You're also hinting at a more clever/complex behavior than dbt is capable of today: combining a yaml selector with runtime "overrides" in the form of --models and --exclude. I'm not sure if this is something we should try to support, or what the right approach would be. If I had to pick a behavior, it would be Union[selector, Difference[models, exclude]], which would give priority to the runtime criteria over the version-controlled selector definition.

But I'm inclined to prefer the approach you offer:

This can be achieved by modifying the selector using union, exclude logic but we only want the excluded model to run based on some logic during CI run.

I'd rather not mix CLI + versioned selection logic, and instead recommend extending existing selectors, e.g. by means of yaml anchors to stay DRY:

selectors:
  - name: my_standard_selection
    definition: &standard-selection
      method: fqn
      value: "*"
      exclude: ...
  - name: my_ci_logic
    description: "For use in CI"
    definition:
      union:
        - *standard-selection
        - one_model

@rahul-ve
Copy link
Author

@jtcohen6 Thanks for the prompt reply!

The current behavior is exactly as you surmised: when --selector is provided, dbt ignores --models/--select and --exclude.

Thanks for confirming, updating the docs will clear the confusion. I can contribute if the docs are open for public submissions.

We might want to raise an exception if both --selector and one of the other flags are provided

May be a warning, exception might be jarring for something like this and unnecessarily makes the user make another hop, I think a warning will achieve the same effect.

Is adding that exception something you'd be interested in contributing? :)

Happy to contribute, will have a look at your contributing guide.

Just to provide some context...

We currently use git diff --name-only --diff-filter=d origin/master..HEAD to identify models that have changed/added to restrict CI run times while dev/testing. We select these models at run time via --models options, e.g.: dbt run -m +model1+ +model2+ ...

We have to evaluate state option, but I think it will be a replacement for the git diff process described above.

The above does not work when we also have --selector option passed in but we can use one or the other now that we know the behavior.

I'm not sure if this is something we should try to support, or what the right approach would be. If I had to pick a behavior, it would be Union[selector, Difference[models, exclude]], which would give priority to the runtime criteria over the version-controlled selector definition.

I don't think any change to the functionality is warranted, clarifying the docs will address the issue of confusion.

@jtcohen6
Copy link
Collaborator

jtcohen6 commented Sep 3, 2021

@rahul-ve Really appreciate the detailed response!

I don't think any change to the functionality is warranted, clarifying the docs will address the issue of confusion.

I'm going to transfer this issue to the docs.getdbt.com repo. I think a quick change to the node selection overview or page on yaml selectors will do the trick.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Sep 3, 2021
jtcohen6 added a commit that referenced this issue Sep 18, 2021
jtcohen6 added a commit that referenced this issue Sep 20, 2021
* Switch --models to --select

* BQ snapshot config aliases

* Configurable postgres connect timeout

* Add list --output-keys. Add list RPC method

* Adapter unique_field dbt-labs/dbt-core#3796

* PR feedback: -s replaces -m

* Add BQ execution_project

* Add default property for yaml selectors

* Update migration guide. New fields in sources.json

* Test where config macro

* Dispatch for global macros

* Update build details

* Some self review

* Greedy flag/property for test selection

* Resolve #803 while we're here

* Fix broken link typo
jtcohen6 added a commit that referenced this issue Sep 20, 2021
* Switch --models to --select

* BQ snapshot config aliases

* Configurable postgres connect timeout

* Add list --output-keys. Add list RPC method

* Adapter unique_field dbt-labs/dbt-core#3796

* PR feedback: -s replaces -m

* Add BQ execution_project

* Add default property for yaml selectors

* Update migration guide. New fields in sources.json

* Test where config macro

* Dispatch for global macros

* Update build details

* Some self review

* Greedy flag/property for test selection

* Resolve #803 while we're here

* Fix broken link typo
@danieldiamond
Copy link
Contributor

@jtcohen6 on your suggestion above using anchors, do you know how to dynamically set the standard-selection alias?

context: we have two default selector definitions (depending on which environment) i.e. each selector definition has have the default rule

default: "{{ target.type == 'snowflake' | as_bool }}"

so there are two standard-definition selectors. and for my_ci_logic, i would it to defer to the right selector. the problem is I cant see a way to set the alias i.e. this doesn't work:

...
  - name: my_ci_logic
    description: "For use in CI"
    definition:
      union:
        - "*{{ target.type }}"

i'm also unable to set variables at the top level

variables:
  - name: dynamic_default_selection_name
    value: "{{ target.type }}"
...
  - name: my_ci_logic
    description: "For use in CI"
    definition:
      union:
        - *$(dynamic_default_selection_name)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants