
[Bug] MetricFlow does not coerce specified time granularity in all cases #714

Closed
2 tasks done
tlento opened this issue Aug 8, 2023 · 5 comments · Fixed by #797
Labels
bug Something isn't working Done linear
Comments

@tlento
Contributor

tlento commented Aug 8, 2023

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Currently, if a user defines a time dimension like this (approximately):

dimension:
  type: time
  type_params:
    time_granularity: day

We will still emit a timestamp type. That by itself is not a real problem: a timestamp type with granularity fixed to DAY simply means the sub-day components are zeroed, i.e., all values have the form YYYY-MM-DD 00:00:00.

The problem is that we don't coerce values to that granularity, so if the input data contains timestamps at second-level granularity we keep those sub-day values, which can then cause some wonky behavior with group by expressions and the like.
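To illustrate the group-by problem with a minimal sketch (hypothetical data, plain Python standing in for warehouse behavior; not MetricFlow code):

```python
from datetime import datetime

# Hypothetical second-granularity input for a dimension configured as day.
rows = [
    datetime(2023, 8, 8, 10, 15, 30),
    datetime(2023, 8, 8, 17, 42, 1),
    datetime(2023, 8, 9, 0, 0, 0),
]

# Grouping on the raw values yields one group per distinct second...
raw_groups = set(rows)

# ...while coercing to the configured granularity (the equivalent of
# SQL's DATE_TRUNC('day', ts)) yields the expected day-level groups.
day_groups = {ts.replace(hour=0, minute=0, second=0, microsecond=0) for ts in rows}

print(len(raw_groups))  # 3 distinct seconds
print(len(day_groups))  # 2 distinct days
```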

Expected Behavior

MetricFlow should always ensure time dimensions are coerced to the specified granularity, both in the config and at query time. The latter appears to be fully supported but the former is not.

Steps To Reproduce

Make a config specifying a time dimension with a coarser granularity than the underlying data provides, and observe that the input values are not truncated in all cases.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:
- metricflow:

Which database are you using?

No response

Additional Context

No response

@tlento tlento added bug Something isn't working triage Tasks that need to be triaged labels Aug 8, 2023
@tlento tlento removed the triage Tasks that need to be triaged label Aug 10, 2023
@tlento
Contributor Author

tlento commented Aug 10, 2023

Users encountering this issue can work around it by setting an expr to date_trunc.... or, preferably, by adding a dbt model that does the granularity conversions, but this may not be viable in all cases.
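A sketch of the expr workaround described above (the dimension and column names are hypothetical, and the exact date_trunc syntax depends on your warehouse):

```yaml
dimensions:
  - name: created_day
    type: time
    expr: date_trunc('day', created_at)  # hypothetical source column
    type_params:
      time_granularity: day
```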

@tlento
Contributor Author

tlento commented Aug 11, 2023

Note - this should ONLY do coercion on select level expressions (and group by for engines that don't group by the alias), because we need the filter expressions to render against the original type in some cases. Relatedly, the expr trick is problematic for partition columns.

@Jstein77 Jstein77 added this to the v0.200x milestone Aug 29, 2023
@tlento
Contributor Author

tlento commented Aug 30, 2023

I'm investigating a fix for this, as a number of users have already tripped over it.

@tlento
Contributor Author

tlento commented Sep 8, 2023

Update: we will add a new property and allow users to configure time dimensions such that an underlying granularity difference can be normalized via date_trunc without a need for a custom expr.

We decided on this because we have the following three options:

  1. Do what we do today, which is nothing
  2. Provide formal support for the user to specify that a given time dimension needs to be conformed to the proper granularity, and do that conformance on the initial SELECT
  3. Always conform time dimensions to the granularity specified in the config

Option 1 is off the table - this has to change, as it is surprising to users and produces incorrect output relative to what is specified in the semantic model.

Option 3 is kind of bad. We render more complex SQL and run more operations than needed, quite possibly to an extreme. If most data is pre-conformed in dbt to the expected granularity (which is reasonable to expect, as it is our recommended best practice), running useless date_trunc operations on every value is just wasteful.

That leaves option 2, so that is what we will go with.
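The option-2 rendering decision can be sketched roughly as follows (a hypothetical illustration, not MetricFlow's actual rendering code; the granularity ordering and function names are assumptions):

```python
# Hypothetical sketch of option 2: only wrap the column in DATE_TRUNC
# when the configured granularity is coarser than what the source provides.
GRANULARITY_ORDER = ["millisecond", "second", "minute", "hour", "day", "month", "year"]

def render_select_expr(column: str, configured: str, source: str) -> str:
    """Render the SELECT-level expression for a time dimension."""
    if GRANULARITY_ORDER.index(configured) > GRANULARITY_ORDER.index(source):
        return f"DATE_TRUNC('{configured}', {column})"
    return column  # already conformed; no extra work

print(render_select_expr("created_at", "day", "second"))  # DATE_TRUNC('day', created_at)
print(render_select_expr("created_at", "day", "day"))     # created_at
```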

@tlento
Contributor Author

tlento commented Oct 2, 2023

Another update: we will NOT add a new property to enable this, we will instead coerce to specified granularity in every case.

We are open to adding a config override to disable this later if the added date_trunc calls should prove to be too bothersome for users who have granularity matched data, but for now we'll just keep it simple so we can get this out a little faster.

We're choosing to do this for two reasons:

  1. More users are encountering this issue, and every time they do their results are incorrect
  2. Filter predicate handling is not obvious - if users don't follow our suggested practices they could end up over-filtering more granular data (e.g., by requesting metric_time = '2021-01-01' on data with an underlying millisecond granularity)

The drawback here is it complicates partition pruning and predicate pushdown rendering, but those will be addressed separately.
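The over-filtering case in point 2 can be sketched with hypothetical data (plain Python standing in for the warehouse; not MetricFlow code):

```python
from datetime import datetime

# Hypothetical millisecond-granularity data filtered with a day-level predicate.
rows = [
    datetime(2021, 1, 1, 0, 0, 0, 0),                # exactly midnight
    datetime(2021, 1, 1, 9, 30, 0, 250000),          # same day, later instant
]
target = datetime(2021, 1, 1)

# Filtering raw values (metric_time = '2021-01-01') keeps only the midnight row.
raw_matches = [ts for ts in rows if ts == target]

# Coercing to day-level granularity first keeps every row from that day.
def trunc_day(ts):
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

coerced_matches = [ts for ts in rows if trunc_day(ts) == target]

print(len(raw_matches), len(coerced_matches))  # 1 vs 2
```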

@tlento tlento linked a pull request Oct 5, 2023 that will close this issue
@Jstein77 Jstein77 added Done and removed In Progress labels Oct 24, 2023