[Feature] Add support for partition elimination in BigQuery #712
Comments
I think we have a few options.
I'm sure there are other approaches I'm missing. None of these feel great to me, although I freely admit I would strongly prefer to live in a world free of time zones. Regardless, I think we'll need a bit of a brainstorming/design session.
Without having any knowledge on the internals of this codebase, I'd like to add (as the person currently running into this problem), that for me simply removing the casting to As far as I understand (timezones still hurt my brain sometimes though), doing the following:
Simply assumes that the YYYY-MM-DD is in the same timezone as the For my problem to be solved, I do not need support for timezones. I simply need to have my But there might be low hanging fruit to allow date-level querying for timestamp fields, without having to go into how to deal with timezones.
@Prokos I have a hypothesis. We currently render a Their documentation suggests that doing
Well, so much for that hypothesis. This filter expression works on a table partitioned by
This also works:
Amazingly enough, this also works, both on timestamp_trunc(x, day) and timestamp_trunc(x, hour):
So the issue is with the TIMESTAMP <-> DATETIME conversion. The mechanism we use doesn't matter. This makes sense to me, in as much as anything involving date types and BigQuery makes sense to me, but it's a bit of a bummer, because the rendering change we require to unblock you is now considerably more complicated. This was a useful detour, though, because In going through this further I think this is an issue more specific to partition pruning - if your partitions are not date/time-based you can't use filter pruning at all except by accident, and that's a problem as well. So we have two issues:
I think the second of these allows for a faster path to a working solution for you while we figure out the right approach to the first. I'll look into this more on our side and see what we can come up with. More soon!
We've decided on the following:
I'm going to update this issue to be narrowly focused on the last piece regarding time dimension partition filter rendering, which we can build and release independently as a first step towards providing the more robust filtering changes described in item 2 above.
Is this your first time submitting a feature request?
Describe the feature
Currently, MetricFlow casts the time filters passed via --start-time and --end-time to a DATETIME data type. For example, a query like this:
mf query --metrics revenue --dimensions time --start-time 2023-08-06 --end-time 2023-08-06
Will render the following SQL in BigQuery:
WHERE time BETWEEN CAST('2023-08-06' AS DATETIME) AND CAST('2023-08-06' AS DATETIME)
This clashes with BigQuery's requirements for time-partitioned tables, which expect a timestamp. It's common to require a time filter on a large, time-partitioned table, and if a user queries such a table without a time partition filter, the query fails. As a result, a MetricFlow query that specifies a time filter also fails, because we cast the filter bounds to DATETIME where a timestamp was expected. The error message is below.
Cannot query over table 'X' without a filter over column(s) 'time' that can be used for partition elimination
Additional BQ documentation is here: https://cloud.google.com/bigquery/docs/querying-partitioned-tables
Proposed solution
For the purposes of this specific issue we will alter the literal time comparison expression rendering to avoid casting the time dimension wherever possible. i.e., for BigQuery and any other engine supporting implicit literal type coercion we'd shift from something like this:
To something like this:
The specific rendering behavior in these scenarios will depend on the engine, but for BigQuery specifically it appears any valid string-literal representation of a date or time can be compared with a date, timestamp, or datetime column type without issue.
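As a rough illustration of the rendering change described above, here is a minimal Python sketch. It is not MetricFlow's actual renderer: `TimeRange` and `render_time_filter` are hypothetical names invented for this example, and the uncast output assumes the engine coerces string literals implicitly, as BigQuery appears to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeRange:
    """Inclusive bounds as given to --start-time / --end-time (hypothetical type)."""

    start: str
    end: str


def render_time_filter(column: str, time_range: TimeRange, cast_to: Optional[str] = None) -> str:
    """Render a BETWEEN filter on `column` (hypothetical helper).

    With cast_to set, each bound is wrapped in CAST(... AS <type>), which matches
    the current behavior. With cast_to=None, the bounds stay plain string literals
    and type coercion is left to the engine.
    """

    def literal(value: str) -> str:
        # Escape single quotes so the value is a valid SQL string literal.
        quoted = "'" + value.replace("'", "''") + "'"
        return f"CAST({quoted} AS {cast_to})" if cast_to else quoted

    return f"WHERE {column} BETWEEN {literal(time_range.start)} AND {literal(time_range.end)}"


day = TimeRange(start="2023-08-06", end="2023-08-06")
current_sql = render_time_filter("time", day, cast_to="DATETIME")
proposed_sql = render_time_filter("time", day)
print(current_sql)
print(proposed_sql)
```

The first call reproduces the DATETIME-cast filter shown earlier in this issue; the second leaves the bounds as bare string literals. Engines without implicit literal coercion would presumably still need an explicit cast, which the optional `cast_to` argument is meant to suggest.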
Describe alternatives you've considered
A community member tried a workaround of casting the time dimensions to a date:
However, we're still casting the start and end time as DATETIME, which doesn't work for partition pruning.
Who will this benefit?
BigQuery users.
Are you interested in contributing this feature?
No response
Anything else?
No response