Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve Ambiguous Group-By-Items in Queries #914

Merged
merged 15 commits into from
Dec 16, 2023

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Nov 30, 2023

Description

This PR updates MF to resolve ambiguous group-by-items in queries by incorporating the foundation that was set up in prior PRs:

  • Update the DataflowPlanBuilder to use a lookup to resolve ambiguous group-by-items in filters.
  • Map existing input types (strings that specify group-by-items in different ways, *QueryParameter classes) into a set of common intermediate input classes (ResolverInput*) that contain spec patters.
  • Implement a query resolver that resolves the patterns in input classes into concrete specs.
  • Rewrite MetricFlowQueryParser to only parse the current inputs into the common intermediate input classes, and use the query resolver to validate inputs and resolve the query items into concrete specs.

In addition, the way that query validation errors were collected and reported has been changed. Insted of raising an exception when an error is encountered, issues are collected and returned together. This makes resolving those errors easier for the user as all errors can be resolved in a single pass vs. fixing one error, re-running the validation to find the next error, etc.

@cla-bot cla-bot bot added the cla:yes label Nov 30, 2023
Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@plypaul plypaul marked this pull request as ready for review November 30, 2023 19:34
@plypaul
Copy link
Contributor Author

plypaul commented Nov 30, 2023

This PR needs to be rebased against main to resolve the missing snapshot test failure.

@plypaul plypaul force-pushed the plypaul--58.14--resolve-ambiguous-queries branch from 9361ed4 to 0afccb1 Compare November 30, 2023 21:39
@plypaul plypaul force-pushed the plypaul--58.13--add-filter-spec-resolver branch from a716bc7 to 5cc1518 Compare December 1, 2023 00:09
@plypaul plypaul force-pushed the plypaul--58.14--resolve-ambiguous-queries branch 2 times, most recently from 99ad212 to 68ec4a2 Compare December 1, 2023 02:02
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spongebob done

That took some time. Anyway, I left a ton of commentary in line. As with the others, it's all either minor stuff or things we should probably be addressing separately. This PR, in particular, is gnarly enough as it is, and since the test outputs remained consistent (with the exception of expected mechanical changes to some dataflow plan files) I think it's reasonably safe to say our behaviors haven't changed.

Let's get these merged so we can start doing user-facing testing!

return WhereFilterSpec(
where_sql=f"({self.where_sql}) AND ({other.where_sql})",
bind_parameters=self.bind_parameters.combine(other.bind_parameters),
linkable_spec_set=self.linkable_spec_set.merge(other.linkable_spec_set),
linkable_spec_set=self.linkable_spec_set.merge(other.linkable_spec_set).dedupe(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a change in behavior. Is it intentional?

Or is it a change in behavior? We do shadow-deduping on linkable spec sets all over the place so maybe it isn't a change.

def empty_instance(cls) -> WhereFilterSpec:
# Need to revisit making WhereFilterSpec Mergeable as it's current not a collection, and it's odd to return this
# no-op filter. Use cases would need to check whether a WhereSpec is a no-op before rendering it to avoid an
# un-necessary WHERE clause. Making WhereFilterSpec map to a WhereFilterIntersection would make this more in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think we'd want to do that instead, but later.

@@ -31,7 +31,7 @@ def __init__(self, semantic_manifest_lookup: SemanticManifestLookup) -> None: #

def visit_metric_spec(self, metric_spec: MetricSpec) -> ColumnAssociation: # noqa: D
return ColumnAssociation(
column_name=metric_spec.element_name,
column_name=metric_spec.element_name if metric_spec.alias is None else metric_spec.alias,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change for the ChangeAssociatedColumns transform class here:

https://github.com/dbt-labs/metricflow/blob/main/metricflow/plan_conversion/instance_converters.py#L928

Do we know if there are any places where this might not be desirable? I think it's ok, and nothing shows up in the snapshots, but I figured I'd check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this should be good given the way that this PR updates metric specs / dataflow construction was updated.

@@ -531,14 +534,6 @@ def from_reference(reference: MetricReference) -> MetricSpec:
"""Initialize from a metric reference instance."""
return MetricSpec(element_name=reference.element_name)

@property
def alias_spec(self) -> MetricSpec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this existed is we only wanted to use the alias at particular points in the query rendering. I don't think this matters, so I'm in favor of the simplification so long as it doesn't break anything.


@dataclass(frozen=True)
class CumulativeMetricRequiresMetricTimeIssue(MetricFlowQueryResolutionIssue):
"""Describes an issue where a query that includes a cumulative metric but does not include metric_time."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: where a query includes a cumulative metric or with a query that includes a cumulative metric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -1,217 +0,0 @@
from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAY! Red diff party!

celebrate!

Comment on lines +409 to +412
query_spec = query_parser.parse_and_validate_query(
metric_names=("bookings",),
group_by_names=("listing__country_latest",),
where_constraint_str="{{ Dimension('listing__country_latest') }} = 'us'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is so much better...

@@ -256,13 +252,12 @@ def test_query_parser_case_insensitivity(bookings_query_parser: MetricFlowQueryP


def test_query_parser_invalid_group_by(bookings_query_parser: MetricFlowQueryParser) -> None: # noqa: D
with pytest.raises(UnableToSatisfyQueryError):
with pytest.raises(InvalidQueryException):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to communicate this out so MFS/SLG exception handling for alerts and stuff accounts for this in the same way they account for UnableToSatisfyQueryError.

)
),
)
query_spec = query_parser.parse_and_validate_query(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES!

metric_specs=(MetricSpec(element_name="bookings"),),
dimension_specs=(
DimensionSpec(
element_name="is_instant",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh duh, that's why these changed in the snapshots.

@plypaul plypaul force-pushed the plypaul--58.13--add-filter-spec-resolver branch from 5cc1518 to 69e03d5 Compare December 16, 2023 01:53
Base automatically changed from plypaul--58.13--add-filter-spec-resolver to main December 16, 2023 02:07
@plypaul plypaul force-pushed the plypaul--58.14--resolve-ambiguous-queries branch 2 times, most recently from d866d2c to 98374e5 Compare December 16, 2023 03:41
… cleanup.

filter_spec_resolution_lookup is needed by the DataflowPlanBuilder to properly
render where filters and generate a plan that fetches the group-by-items
required by the filters.

Some additional changes were made for cleanup:

* Change MetricSpec.constraint -> .filter_specs.
* Change MetricFlowQuerySpec.where_constraint -> .filter_intersection
* Add a helper to change the TimeRangeConstraint for MetricFlowQuerySpec
* Make WhereFilterSpec Mergeable
* Remove MetricSpec.alias_spec as the alias was already a part of MetricSpec
These are issues that will be later used in query parsing / query resolution
to indicate various error cases. The issues related to metric_time reflect
checks already in place in the MetricFlowQueryParser.
These query validation rules run after a query has been parsed and resolved into
specs.
These are inputs into the query resolver. These provide a common abstraction
that allows different kinds of inputs (strings, query parameter classes) to
map to the same kind of input to the query resolver.
The WhereFilterSpec contains rendered SQL and the specs for the group-by-items
needed by the filter. Creation of a WhereFilterSpec needs to be updated to use
the lookup so that ambiguous group-by-items can be replaced with concrete specs.
* The query resolver takes input classes w/ a spec pattern and maps them to
  concrete specs. Errors with the query are returned as issues.
* The query parser converts strings / query parameter classes into resolver
  input classes and uses the query resolver to convert them into a
  MetricFlowQuerySpec to match the existing interface.
* The interface to the query parser is retained for ease of migration, and
  issues are compiled and raised as an exception. The interface to the query
  parser can be updated later so that the individual issues can be inspected as
  objects.
The DataflowPlanBuilder needs to be updated so that the lookup is used to render
the where filters and to figure out which group-by-items are needed to apply the
filter.

This commit also includes updates to follow earlier changes to the spec classes
and some consistency fixes:

* The alias field in the MetricSpec is used instead of a separate alias spec.
* The filter field in MetricSpec is used to describe the filters that should be
  applied to the metric, excluding the filter that's in the metric definition.
  The convention is that the MetricSpec class describes how it's different from
  the plain metric that would be computed just from the config definition.
The TimeGranularitySolver is no longer needed as grain resolution of ambiguous
time dimensions (including metric_time) is handled / tested via the query
resolver.
This method was moved to the DataflowPlanBuilder as it was mostly applicable to
that class.
These tests for query parser suggestions are temporarily removed until that's
added to the new query parser / resolver, which will be done in a later PR.
…rain.

This allows ambiguous group-by-items to be specified in filters.
Tests cases need to be updated to handle signature changes to
the MetricFlowQueryParser, MetricFlowQuerySpec, WhereFilterSpec.

In cases where a filter needs to be created for a query, the query parser is
is used to generate the appropriate loookup.
@plypaul plypaul force-pushed the plypaul--58.14--resolve-ambiguous-queries branch from 98374e5 to 23edb4e Compare December 16, 2023 03:47
@plypaul plypaul merged commit 50115e0 into main Dec 16, 2023
9 checks passed
@plypaul plypaul deleted the plypaul--58.14--resolve-ambiguous-queries branch December 16, 2023 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants