-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Selector Upgrade #7101
base: main
Are you sure you want to change the base?
Selector Upgrade #7101
Conversation
return self.get_selected(self.project.get_selector(spec.value)) | ||
else: | ||
method = self.get_method(spec.method, spec.method_arguments) | ||
return set(method.search(included_nodes, spec.value)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self.project.get_selector is the only place that checks if selectors actually exist
sel_def = definition.get("value") | ||
if sel_def not in selector_dict: | ||
raise DbtSelectorsError(f"Existing selector definition for {sel_def} not found.") | ||
return selector_dict[definition["value"]]["definition"] | ||
return definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only checked in graph/selector.py (or config/project.py)
definition = cls.parse_from_definition( | ||
selector["definition"], selector_dict=deepcopy(selector_dict) | ||
selector_dict[sel_name]["definition"] = cls.parse_from_definition( | ||
selector["definition"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The selector_dict doesn't initially resolve the selectors. This is left until the end to allow for it to be more dynamic
@@ -33,28 +33,64 @@ def parse_union( | |||
# turn ['a b', 'c'] -> ['a', 'b', 'c'] | |||
raw_specs = itertools.chain.from_iterable(r.split(" ") for r in components) | |||
union_components: List[SelectionSpec] = [] | |||
exclude_components: List[SelectionSpec] = [] | |||
in_exclude = False | |||
|
|||
flags = get_flags() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow for --select 'some_expression --exclude some_other_expression', having it be one string allows for the yaml syntax to support union/intersection/exclude in a single string
sel_def = definition.get("value") | ||
if sel_def not in result: | ||
raise DbtValidationError(f"Existing selector definition for {sel_def} not found.") | ||
return result[definition["value"]]["definition"] | ||
elif "method" in definition and "value" in definition: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to graph/selector.py / config/project.py
@@ -45,6 +45,7 @@ class MethodName(StrEnum): | |||
Metric = "metric" | |||
Result = "result" | |||
SourceStatus = "source_status" | |||
Selector = "selector" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "selector" is added without a corresponding class
@@ -58,7 +58,7 @@ def __init__(self, args, config, manifest): | |||
def _iterate_selected_nodes(self): | |||
selector = self.get_node_selector() | |||
spec = self.get_selection_spec() | |||
nodes = sorted(selector.get_selected(spec)) | |||
nodes = sorted(selector.get_selected(spec, self.config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_selected needs the project to call get_selector
@@ -125,7 +125,7 @@ def defer_to_manifest(self, adapter, selected_uids: AbstractSet[str]): | |||
def get_graph_queue(self) -> GraphQueue: | |||
selector = self.get_node_selector() | |||
spec = self.get_selection_spec() | |||
return selector.get_graph_queue(spec) | |||
return selector.get_graph_queue(spec, self.config) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_graph_queue needs the project to call get_selector
The --exclude operator was meant to work like a NOT operator like ...
where everything to the right of NOT would be used to exclude from the left. The reason to choose --exclude was that it was very unlikely that it would conflict with another model and break anything. Also, the cli could just append --exclude to the --select. With the selector being added, it effectively allows for parenthetical support and very advanced expressions. The exact syntax could easily change if it is desired. |
@acurtis-evi Very cool! :) I definitely want to take a look at this, and play around with some of the proposed additions from a UX perspective, before moving into more-detailed code review with the dev team. (@dbeatty10 too, if you're able & interested!) |
Just to make sure I'm following, this is a set difference, right? A quick brainstorm of ideas relating to the syntax:
|
It is a set difference. In this PR, the set difference part is isolated to one function parse_union in dbt/core/graph/cli.py. The idea is that it would work like
The other ideas in this PR are more powerful, particularly the ability to allow the selector yaml syntax to be identical to command line syntax and to be able to expand upon selectors themselves (selector:some_selector+1 for example) |
I think EXCEPT might be a better name than NOT |
Changed NOT => EXCEPT Support using --selector + --select / --exclude at the same time. A new selector named arg_selector is introduced which is the combination of --select/--exclude and if --selector is specified, that remains the primary action.
|
There is also support for the arg_selector in the selectors.yml. This PR moves the selectors.yml validation to runtime, allowing for selectors to be out of order and to refer to the arg_selector. It is possible to have cycles with this approach, but I tend to think that the benefit of the selection flexibility significantly outweighs the downside of it. |
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
It would be nice to be able to run selectors minus one or two models. Right now, the only way to do so is to define another selector, which is a cumbersome process in some environment (but would be very simple on the command line). |
resolves #5009
Description
There are some really cool features in DBT such as the ability to choose models and their descendents, graph ordered builds, and much more. However, I found that some of the syntax seemed to be lacking (particularly around the selectors yml syntax).
The definition part of the selector yaml supports some model selection operators, but requires that the end user use a different approach to describe what can be done in short-hand on the command line. For example, on the command line, I can select the children of a model using
In the yaml,
does not work and instead
must be used.
The yaml is very useful in defining routine selections. This inconsistency with the command line forces the end user to learn two approaches.
Furthermore, there is support from the command line to union and intersect using the --select operator. However, if one wishes to exclude, the separate --exclude operator is needed. The syntax is very straight forward for the command line approach, but again the yaml approach is different. I see value in the yaml approach. I just believe that the two should be compatible. In this PR, I propose that the --select can include NOT (all caps - but easy to change) as a way of embedding excluded models in a single string.
dbt ls --select 'selector:some_selector+1 NOT selector:some_selector'
In addition, dbt doesn't allow selectors to be expanded using the children/parent/ancestor operators. This PR addresses that and makes selectors just like any other method.
Syntax in the yaml is also supported
Most of the code removes the selectors dictionary from being passed around and instead opts to pass the project to the graph/cli get_selected and get_graph_queue methods.
Checklist
changie new
to create a changelog entry