[Bug] MetricFlow does not coerce specified time granularity in all cases #714
Comments
Users encountering this issue can work around it by setting an |
Note - this should ONLY do coercion on select level expressions (and group by for engines that don't group by the alias), because we need the filter expressions to render against the original type in some cases. Relatedly, the |
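As one possible illustration of why the coercion is limited to select-level expressions, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the `created_at__day` alias are hypothetical, and SQLite's `strftime` stands in for `date_trunc('day', ...)`; this is not the SQL MetricFlow actually renders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding second-level timestamps stored as text.
conn.execute("CREATE TABLE orders (created_at TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2023-08-01 09:15:30", 10),
        ("2023-08-01 17:42:05", 20),
        ("2023-08-02 00:00:01", 5),
    ],
)

# Assumption: SQLite has no date_trunc, so strftime emulates truncation to DAY.
DAY = "strftime('%Y-%m-%d 00:00:00', created_at)"

# Coercion applied in SELECT and GROUP BY only; the filter sees the raw column.
select_level = conn.execute(
    f"SELECT {DAY} AS created_at__day, SUM(amount) FROM orders "
    "WHERE created_at >= '2023-08-01 12:00:00' "
    f"GROUP BY {DAY} ORDER BY 1"
).fetchall()

# Coercion also applied inside the filter; rows from the afternoon of
# 2023-08-01 are dropped because their truncated value is midnight.
filter_coerced = conn.execute(
    f"SELECT {DAY} AS created_at__day, SUM(amount) FROM orders "
    f"WHERE {DAY} >= '2023-08-01 12:00:00' "
    f"GROUP BY {DAY} ORDER BY 1"
).fetchall()

print(select_level)
print(filter_coerced)
```

In this sketch the two queries return different row sets, which is one way a filter rendered against the truncated expression can diverge from a filter rendered against the original column.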
I'm investigating a fix for this, as a number of users have already tripped over it.
Update: we will add a new property and allow users to configure time dimensions such that an underlying granularity difference can be normalized via date_trunc without a need for a custom expr. We decided on this because we have the following three options:
Option 1 is off the table: this has to change, as it is surprising to users and produces incorrect output relative to what is specified in the semantic model. Option 3 is also unattractive. We would render more complex SQL and run more operations than needed, potentially many more. If most data is pre-conformed in dbt to the expected granularity (which is reasonable to expect, as it is our recommended best practice), running a redundant date_trunc on every value is just wasteful. That leaves option 2, so that is what we will go with.
Another update: we will NOT add a new property to enable this, we will instead coerce to specified granularity in every case. We are open to adding a config override to disable this later if the added date_trunc calls should prove to be too bothersome for users who have granularity matched data, but for now we'll just keep it simple so we can get this out a little faster. We're choosing to do this for two reasons:
The drawback is that this complicates partition pruning and predicate pushdown rendering, but those will be addressed separately.
Is this a new bug in metricflow?
Current Behavior
Currently, if a user defines a time dimension like this (approximately, pardon the probably broken YAML):
We will still emit a timestamp type. This is not a real problem, as a timestamp type with granularity fixed to DAY simply means the time-of-day components will be 0, i.e., all values will have the form
YYYY-MM-DD 00:00:00
The problem is that we don't coerce to the specified granularity, so if the input data contains timestamps at second-level granularity, we keep the second-level values as they are. This can then cause wonky behavior with group by expressions and the like, since rows from the same day no longer share a single value.
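The grouping problem can be sketched in plain Python, independent of any SQL engine. The rows and the helper names (`truncate_to_day`, `group_sum`) are hypothetical and only illustrate the effect; they are not MetricFlow code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical input rows at second-level granularity: (timestamp, amount).
rows = [
    (datetime(2023, 8, 1, 9, 15, 30), 10),
    (datetime(2023, 8, 1, 17, 42, 5), 20),
    (datetime(2023, 8, 2, 0, 0, 1), 5),
]


def truncate_to_day(ts: datetime) -> datetime:
    """Zero out the time-of-day components, like date_trunc('day', ts)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def group_sum(rows, key_fn):
    """Sum amounts per group key, where key_fn derives the key from the timestamp."""
    totals = Counter()
    for ts, amount in rows:
        totals[key_fn(ts)] += amount
    return dict(totals)


# Without coercion, every distinct second becomes its own group.
uncoerced = group_sum(rows, lambda ts: ts)
# With coercion to DAY, rows from the same day collapse into one group.
coerced = group_sum(rows, truncate_to_day)

print(len(uncoerced), len(coerced))
```

Without truncation, the dimension declared as DAY yields one group per distinct timestamp rather than one group per day.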
Expected Behavior
MetricFlow should always ensure time dimensions are coerced to the specified granularity, both the granularity set in the config and the granularity requested at query time. The latter appears to be fully supported, but the former is not.
Steps To Reproduce
Make a config specifying a time dimension with coarser granularity than whatever is provided by the underlying data and observe that the input values do not get truncated in all cases.
Relevant log output
No response
Environment
Which database are you using?
No response
Additional Context
No response