-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve Ambiguous Group-By-Items in Queries #914
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
This PR needs to be rebased against main to resolve the missing snapshot test failure. |
9361ed4
to
0afccb1
Compare
a716bc7
to
5cc1518
Compare
99ad212
to
68ec4a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That took some time. Anyway, I left a ton of commentary in line. As with the others, it's all either minor stuff or things we should probably be addressing separately. This PR, in particular, is gnarly enough as it is, and since the test outputs remained consistent (with the exception of expected mechanical changes to some dataflow plan files) I think it's reasonably safe to say our behaviors haven't changed.
Let's get these merged so we can start doing user-facing testing!
return WhereFilterSpec( | ||
where_sql=f"({self.where_sql}) AND ({other.where_sql})", | ||
bind_parameters=self.bind_parameters.combine(other.bind_parameters), | ||
linkable_spec_set=self.linkable_spec_set.merge(other.linkable_spec_set), | ||
linkable_spec_set=self.linkable_spec_set.merge(other.linkable_spec_set).dedupe(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a change in behavior. Is it intentional?
Or is it a change in behavior? We do shadow-deduping on linkable spec sets all over the place so maybe it isn't a change.
def empty_instance(cls) -> WhereFilterSpec: | ||
# Need to revisit making WhereFilterSpec Mergeable as it's current not a collection, and it's odd to return this | ||
# no-op filter. Use cases would need to check whether a WhereSpec is a no-op before rendering it to avoid an | ||
# un-necessary WHERE clause. Making WhereFilterSpec map to a WhereFilterIntersection would make this more in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think we'd want to do that instead, but later.
@@ -31,7 +31,7 @@ def __init__(self, semantic_manifest_lookup: SemanticManifestLookup) -> None: # | |||
|
|||
def visit_metric_spec(self, metric_spec: MetricSpec) -> ColumnAssociation: # noqa: D | |||
return ColumnAssociation( | |||
column_name=metric_spec.element_name, | |||
column_name=metric_spec.element_name if metric_spec.alias is None else metric_spec.alias, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a behavior change for the ChangeAssociatedColumns transform class here:
Do we know if there are any places where this might not be desirable? I think it's ok, and nothing shows up in the snapshots, but I figured I'd check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this should be good given the way that this PR updates metric specs / dataflow construction was updated.
@@ -531,14 +534,6 @@ def from_reference(reference: MetricReference) -> MetricSpec: | |||
"""Initialize from a metric reference instance.""" | |||
return MetricSpec(element_name=reference.element_name) | |||
|
|||
@property | |||
def alias_spec(self) -> MetricSpec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this existed is we only wanted to use the alias at particular points in the query rendering. I don't think this matters, so I'm in favor of the simplification so long as it doesn't break anything.
|
||
@dataclass(frozen=True) | ||
class CumulativeMetricRequiresMetricTimeIssue(MetricFlowQueryResolutionIssue): | ||
"""Describes an issue where a query that includes a cumulative metric but does not include metric_time.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: where a query includes a cumulative metric
or with a query that includes a cumulative metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -1,217 +0,0 @@ | |||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query_spec = query_parser.parse_and_validate_query( | ||
metric_names=("bookings",), | ||
group_by_names=("listing__country_latest",), | ||
where_constraint_str="{{ Dimension('listing__country_latest') }} = 'us'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is so much better...
@@ -256,13 +252,12 @@ def test_query_parser_case_insensitivity(bookings_query_parser: MetricFlowQueryP | |||
|
|||
|
|||
def test_query_parser_invalid_group_by(bookings_query_parser: MetricFlowQueryParser) -> None: # noqa: D | |||
with pytest.raises(UnableToSatisfyQueryError): | |||
with pytest.raises(InvalidQueryException): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to communicate this out so MFS/SLG exception handling for alerts and stuff accounts for this in the same way they account for UnableToSatisfyQueryError.
) | ||
), | ||
) | ||
query_spec = query_parser.parse_and_validate_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES!
metric_specs=(MetricSpec(element_name="bookings"),), | ||
dimension_specs=( | ||
DimensionSpec( | ||
element_name="is_instant", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh, that's why these changed in the snapshots.
5cc1518
to
69e03d5
Compare
d866d2c
to
98374e5
Compare
… cleanup. filter_spec_resolution_lookup is needed by the DataflowPlanBuilder to properly render where filters and generate a plan that fetches the group-by-items required by the filters. Some additional changes were made for cleanup: * Change MetricSpec.constraint -> .filter_specs. * Change MetricFlowQuerySpec.where_constraint -> .filter_intersection * Add a helper to change the TimeRangeConstraint for MetricFlowQuerySpec * Make WhereFilterSpec Mergeable * Remove MetricSpec.alias_spec as the alias was already a part of MetricSpec
These are issues that will be later used in query parsing / query resolution to indicate various error cases. The issues related to metric_time reflect checks already in place in the MetricFlowQueryParser.
These query validation rules run after a query has been parsed and resolved into specs.
These are inputs into the query resolver. These provide a common abstraction that allows different kinds of inputs (strings, query parameter classes) to map to the same kind of input to the query resolver.
The WhereFilterSpec contains rendered SQL and the specs for the group-by-items needed by the filter. Creation of a WhereFilterSpec needs to be updated to use the lookup so that ambiguous group-by-items can be replaced with concrete specs.
* The query resolver takes input classes w/ a spec pattern and maps them to concrete specs. Errors with the query are returned as issues. * The query parser converts strings / query parameter classes into resolver input classes and uses the query resolver to convert them into a MetricFlowQuerySpec to match the existing interface. * The interface to the query parser is retained for ease of migration, and issues are compiled and raised as an exception. The interface to the query parser can be updated later so that the individual issues can be inspected as objects.
The DataflowPlanBuilder needs to be updated so that the lookup is used to render the where filters and to figure out which group-by-items are needed to apply the filter. This commit also includes updates to follow earlier changes to the spec classes and some consistency fixes: * The alias field in the MetricSpec is used instead of a separate alias spec. * The filter field in MetricSpec is used to describe the filters that should be applied to the metric, excluding the filter that's in the metric definition. The convention is that the MetricSpec class describes how it's different from the plain metric that would be computed just from the config definition.
The TimeGranularitySolver is no longer needed as grain resolution of ambiguous time dimensions (including metric_time) is handled / tested via the query resolver.
This method was moved to the DataflowPlanBuilder as it was mostly applicable to that class.
These tests for query parser suggestions are temporarily removed until that's added to the new query parser / resolver, which will be done in a later PR.
…rain. This allows ambiguous group-by-items to be specified in filters.
Tests cases need to be updated to handle signature changes to the MetricFlowQueryParser, MetricFlowQuerySpec, WhereFilterSpec. In cases where a filter needs to be created for a query, the query parser is is used to generate the appropriate loookup.
98374e5
to
23edb4e
Compare
Description
This PR updates MF to resolve ambiguous group-by-items in queries by incorporating the foundation that was set up in prior PRs:
DataflowPlanBuilder
to use a lookup to resolve ambiguous group-by-items in filters.*QueryParameter
classes) into a set of common intermediate input classes (ResolverInput*
) that contain spec patters.MetricFlowQueryParser
to only parse the current inputs into the common intermediate input classes, and use the query resolver to validate inputs and resolve the query items into concrete specs.In addition, the way that query validation errors were collected and reported has been changed. Insted of raising an exception when an error is encountered, issues are collected and returned together. This makes resolving those errors easier for the user as all errors can be resolved in a single pass vs. fixing one error, re-running the validation to find the next error, etc.