
[Bug] MetricFlow does not coerce specified time granularity in all cases #714

Closed
2 tasks done
tlento opened this issue Aug 8, 2023 · 5 comments · Fixed by #797
Labels
bug Something isn't working Done linear
Comments

@tlento
Contributor

tlento commented Aug 8, 2023

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Currently, if a user defines a time dimension like this (approximately):

dimension:
  type: time
  type_params:
    time_granularity: day

We will still emit a timestamp type. That by itself is not a real problem: a timestamp type with granularity fixed to DAY simply means the sub-day components are zeroed, i.e., all values have the form YYYY-MM-DD 00:00:00.

The problem is that we don't coerce values to that granularity, so if the input data contains timestamps at second-level granularity we keep those sub-day values, which can then cause some wonky behavior with group by expressions and the like.
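To illustrate the group-by problem with a minimal sketch (hypothetical data, plain Python standing in for warehouse behavior; not MetricFlow code):

```python
from datetime import datetime

# Hypothetical second-granularity input for a dimension configured as day.
rows = [
    datetime(2023, 8, 8, 10, 15, 30),
    datetime(2023, 8, 8, 17, 42, 1),
    datetime(2023, 8, 9, 0, 0, 0),
]

# Grouping on the raw values yields one group per distinct second...
raw_groups = set(rows)

# ...while coercing to the configured granularity (the equivalent of
# SQL's DATE_TRUNC('day', ts)) yields the expected day-level groups.
day_groups = {ts.replace(hour=0, minute=0, second=0, microsecond=0) for ts in rows}

print(len(raw_groups))  # 3 distinct seconds
print(len(day_groups))  # 2 distinct days
```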

Expected Behavior

MetricFlow should always ensure time dimensions are coerced to the specified granularity, both in the config and at query time. The latter appears to be fully supported but the former is not.

Steps To Reproduce

Make a config specifying a time dimension with a coarser granularity than the underlying data provides, and observe that the input values are not truncated in all cases.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:
- metricflow:

Which database are you using?

No response

Additional Context

No response

@tlento tlento added bug Something isn't working triage Tasks that need to be triaged labels Aug 8, 2023
@tlento tlento removed the triage Tasks that need to be triaged label Aug 10, 2023
@tlento
Contributor Author

tlento commented Aug 10, 2023

Users encountering this issue can work around it by setting an expr to date_trunc.... or, preferably, by adding a dbt model that does the granularity conversions, but this may not be viable in all cases.
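A sketch of the expr workaround described above (the dimension and column names are hypothetical, and the exact date_trunc syntax depends on your warehouse):

```yaml
dimensions:
  - name: created_day
    type: time
    expr: date_trunc('day', created_at)  # hypothetical source column
    type_params:
      time_granularity: day
```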

@tlento
Contributor Author

tlento commented Aug 11, 2023

Note - this should ONLY do coercion on select level expressions (and group by for engines that don't group by the alias), because we need the filter expressions to render against the original type in some cases. Relatedly, the expr trick is problematic for partition columns.

@Jstein77 Jstein77 added this to the v0.200x milestone Aug 29, 2023
@tlento
Contributor Author

tlento commented Aug 30, 2023

I'm investigating a fix for this, as a number of users have already tripped over it.

@tlento
Contributor Author

tlento commented Sep 8, 2023

Update: we will add a new property and allow users to configure time dimensions such that an underlying granularity difference can be normalized via date_trunc without a need for a custom expr.

We decided on this because we have the following three options:

  1. Do what we do today, which is nothing
  2. Provide formal support for the user to specify that a given time dimension needs to be conformed to the proper granularity, and do that conformance on the initial SELECT
  3. Always conform time dimensions to the granularity specified in the config

Option 1 is off the table - this has to change, as it is surprising to users and produces incorrect output relative to what is specified in the semantic model.

Option 3 is kind of bad. We render more complex SQL and run more operations than needed, quite possibly to an extreme. If most data is pre-conformed in dbt to the expected granularity (which is reasonable to expect, as it is our recommended best practice), running useless date_trunc operations on every value is just wasteful.

That leaves option 2, so that is what we will go with.
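The option-2 rendering decision can be sketched roughly as follows (a hypothetical illustration, not MetricFlow's actual rendering code; the granularity ordering and function names are assumptions):

```python
# Hypothetical sketch of option 2: only wrap the column in DATE_TRUNC
# when the configured granularity is coarser than what the source provides.
GRANULARITY_ORDER = ["millisecond", "second", "minute", "hour", "day", "month", "year"]

def render_select_expr(column: str, configured: str, source: str) -> str:
    """Render the SELECT-level expression for a time dimension."""
    if GRANULARITY_ORDER.index(configured) > GRANULARITY_ORDER.index(source):
        return f"DATE_TRUNC('{configured}', {column})"
    return column  # already conformed; no extra work

print(render_select_expr("created_at", "day", "second"))  # DATE_TRUNC('day', created_at)
print(render_select_expr("created_at", "day", "day"))     # created_at
```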

@tlento
Contributor Author

tlento commented Oct 2, 2023

Another update: we will NOT add a new property to enable this, we will instead coerce to specified granularity in every case.

We are open to adding a config override to disable this later if the added date_trunc calls should prove to be too bothersome for users who have granularity matched data, but for now we'll just keep it simple so we can get this out a little faster.

We're choosing to do this for two reasons:

  1. More users are encountering this issue, and every time they do their results are incorrect
  2. Filter predicate handling is not obvious - if users don't follow our suggested practices they could end up over-filtering more granular data (e.g., by requesting metric_time = '2021-01-01' on data with an underlying millisecond granularity)

The drawback here is it complicates partition pruning and predicate pushdown rendering, but those will be addressed separately.
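The over-filtering case in point 2 can be sketched with hypothetical data (plain Python standing in for the warehouse; not MetricFlow code):

```python
from datetime import datetime

# Hypothetical millisecond-granularity data filtered with a day-level predicate.
rows = [
    datetime(2021, 1, 1, 0, 0, 0, 0),                # exactly midnight
    datetime(2021, 1, 1, 9, 30, 0, 250000),          # same day, later instant
]
target = datetime(2021, 1, 1)

# Filtering raw values (metric_time = '2021-01-01') keeps only the midnight row.
raw_matches = [ts for ts in rows if ts == target]

# Coercing to day-level granularity first keeps every row from that day.
def trunc_day(ts):
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

coerced_matches = [ts for ts in rows if trunc_day(ts) == target]

print(len(raw_matches), len(coerced_matches))  # 1 vs 2
```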

@tlento tlento linked a pull request Oct 5, 2023 that will close this issue
@Jstein77 Jstein77 added Done and removed In Progress labels Oct 24, 2023