Table time travel support #7292
Comments
Actually, looking at this a bit more in-depth, Then, while pre-populating the table providers for the referenced tables in the struct SessionContextProvider<'a> {
state: &'a SessionState,
tables: HashMap<(String, Option<TableVersion>), Arc<dyn TableSource>>,
} |
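To make the versioned key in the struct above concrete, here is a minimal, std-only sketch. It is not DataFusion code: `TableVersion` is a hypothetical enum (the issue only says it might cover time formats or literal versions), `TableSource` is a stand-in trait so the example compiles on its own, and `VersionedTables` and `NamedSource` are illustrative names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical version specifier (an assumption, not an existing DataFusion type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum TableVersion {
    /// Point in time, assumed here to be milliseconds since the Unix epoch.
    Timestamp(i64),
    /// Literal version identifier, e.g. a snapshot or commit number.
    Literal(String),
}

/// Stand-in for DataFusion's `TableSource`, reduced to one method for the sketch.
pub trait TableSource {
    fn describe(&self) -> String;
}

/// Trivial source used only to demonstrate lookups.
pub struct NamedSource(pub String);

impl TableSource for NamedSource {
    fn describe(&self) -> String {
        self.0.clone()
    }
}

/// Table sources keyed by (table name, optional version);
/// `None` stands for "latest version".
pub struct VersionedTables {
    tables: HashMap<(String, Option<TableVersion>), Arc<dyn TableSource>>,
}

impl VersionedTables {
    pub fn new() -> Self {
        Self { tables: HashMap::new() }
    }

    /// Pre-populate one (name, version) pair, as would happen while
    /// collecting the tables referenced by a statement.
    pub fn register(
        &mut self,
        name: &str,
        version: Option<TableVersion>,
        source: Arc<dyn TableSource>,
    ) {
        self.tables.insert((name.to_string(), version), source);
    }

    /// Look up the source for exactly this name and version.
    pub fn get(
        &self,
        name: &str,
        version: Option<&TableVersion>,
    ) -> Option<Arc<dyn TableSource>> {
        let key = (name.to_string(), version.cloned());
        self.tables.get(&key).cloned()
    }
}
```

With this shape the same table name can appear several times in one query at different versions without the entries colliding, since the version is part of the key.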
It seems like another possibility might be to add another argument to the (if we are going to update the scan method, we might want to think about making a |
True, but the main reason deferring the version resolution to the scan itself may be less favorable is that schemas can also evolve over different versions, and then we'd need to extend the Hence, why having a separate |
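The schema-evolution point can be illustrated with a small sketch in which the version is resolved at table lookup time, so the planner sees the schema of that specific version. Everything here is hypothetical: `VersionedSchemaProvider` is not the real `SchemaProvider` trait, `ResolvedTable` stands in for a table provider, and the lookup rules (latest commit at or before a timestamp, numeric literal versions) are assumptions made for the example.

```rust
/// Hypothetical version specifier (an assumption, not a DataFusion type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TableVersion {
    /// Milliseconds since the Unix epoch (assumed unit).
    Timestamp(i64),
    /// Literal version number written as text.
    Literal(String),
}

/// Stand-in for a resolved table: a version number plus its column names.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedTable {
    pub version: u64,
    pub columns: Vec<String>,
}

/// Hypothetical lookup trait that receives the requested version up front.
pub trait VersionedSchemaProvider {
    fn table(&self, name: &str, version: Option<&TableVersion>) -> Option<ResolvedTable>;
}

/// In-memory provider holding one table's history, sorted by commit time.
pub struct InMemoryProvider {
    name: String,
    history: Vec<(i64, ResolvedTable)>, // (commit time in ms, snapshot)
}

impl VersionedSchemaProvider for InMemoryProvider {
    fn table(&self, name: &str, version: Option<&TableVersion>) -> Option<ResolvedTable> {
        if name != self.name {
            return None;
        }
        match version {
            // No version requested: return the latest snapshot.
            None => self.history.last().map(|(_, t)| t.clone()),
            // Timestamp: newest snapshot committed at or before that time.
            Some(TableVersion::Timestamp(ts)) => self
                .history
                .iter()
                .rev()
                .find(|(commit, _)| commit <= ts)
                .map(|(_, t)| t.clone()),
            // Literal: exact match on the version number.
            Some(TableVersion::Literal(v)) => {
                let wanted: u64 = v.parse().ok()?;
                self.history
                    .iter()
                    .find(|(_, t)| t.version == wanted)
                    .map(|(_, t)| t.clone())
            }
        }
    }
}

/// Sample history in which version 2 adds a `currency` column.
pub fn sample_provider() -> InMemoryProvider {
    let cols = |names: &[&str]| names.iter().map(|s| s.to_string()).collect::<Vec<_>>();
    InMemoryProvider {
        name: "orders".to_string(),
        history: vec![
            (1_000, ResolvedTable { version: 1, columns: cols(&["id", "amount"]) }),
            (2_000, ResolvedTable { version: 2, columns: cols(&["id", "amount", "currency"]) }),
        ],
    }
}
```

Because each lookup returns the columns of the requested version, a query against the older version is planned against the two-column schema rather than the latest one.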
That makes sense 👍 |
Thanks; if there's a consensus on whether this would be worthwhile for DataFusion at all let me know, as I would love to contribute it. |
Maybe it would be worth sending a note to dev@arrow.apache.org to get a wider distribution. For example https://lists.apache.org/thread/837ghhjh9gd1kvg0qqnmxqs0rm62x0xt |
Great, will do that, thanks! |
While reviewing apache/datafusion-sqlparser-rs#951 I found the following reference from BigQuery about "time decorators" that might be able to support this feature without any datafusion changes: https://cloud.google.com/bigquery/docs/table-decorators#time_decorators
The downside is that the integer syntax to calculate is pretty bad; adding real SQL support is likely a nicer option. |
Oh that's a nice find, I haven't seen that. Indeed it seems it could provide time travel support without DF changes, albeit in a hacky way, by smuggling the version in the table name itself.
Agreed; real SQL support would be more expressive and explicit. |
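As a rough illustration of the "version in the table name" workaround, the sketch below splits a reference of the assumed form `name@<integer>` into the plain table name and the decorator value. The function name `split_time_decorator` is hypothetical, and how the integer is interpreted (absolute or relative time) is left to whatever catalog consumes it.

```rust
/// Split a table reference of the assumed form `name@<integer>` into the
/// table name and an optional decorator value. References without `@`
/// are returned unchanged with `None`.
pub fn split_time_decorator(table_ref: &str) -> Result<(&str, Option<i64>), String> {
    match table_ref.rsplit_once('@') {
        // No decorator present: plain table name, latest version implied.
        None => Ok((table_ref, None)),
        Some((name, raw)) => {
            if name.is_empty() {
                return Err(format!("missing table name in `{table_ref}`"));
            }
            // The decorator must be a (possibly negative) integer.
            let value = raw
                .parse::<i64>()
                .map_err(|e| format!("invalid time decorator `{raw}`: {e}"))?;
            Ok((name, Some(value)))
        }
    }
}
```

A custom catalog could call such a helper when a table is looked up by name and then load the matching version, which is why this route needs no changes in DataFusion itself.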
I think we need more responses from the community to review it. |
Yup, makes sense. Happy to leave this open (or close it) until definitive consensus is achieved. Just the sqlparser PR on its own would go a long way in terms of easing the implementation of table time travel in DataFusion-derived systems. |
That has been merged and released, so hopefully that helps. |
Closing this, as I think the sqlparser changes and extensions will suffice for now, thanks! |
Is your feature request related to a problem or challenge?
A lot of DBs and table formats (e.g. Delta Lake) rely on one form or another of MVCC, whereby writes to a given table result in a new table version. Typically, writes append new files to the table state (INSERT) and/or remove some files from the state (UPDATE/DELETE).
A core feature of such systems is the ability to travel between different table versions, so that one can query some earlier (non-latest) table state. However, this is not currently officially supported by DataFusion, though it is doable in a hacky way (see below for details or here for an overview of how this works in seafowl right now).
Describe the solution you'd like
First part of the work would be in the sqlparser crate, which would need to support the standard temporal table specifier in the form of an AS OF clause (https://en.wikipedia.org/wiki/SQL:2011). I think this should probably be captured in a new field in TableFactor::Table.
On the DataFusion side, besides capturing the parsed version in the TableScan logical plan, I imagine the main (breaking) change would be to alter the signature of the SchemaProvider::table method to something like
with TableVersion being some kind of an enum covering time formats or literal version denotations for starters. That would enable the implementer of this trait to know which specific table version needs to be loaded (if any).
Describe alternatives you've considered
The alternative that seafowl uses atm is the following:
sqlparser::ast::Statement [0]
sqlparser::ast::Query, trying to see whether a table function syntax was used [1]
TableProvider for the specific table version in a new session context/state [2]
[0] https://seafowl.io/docs/guides/querying-time-travel#querying-older-table-versions
[1] https://github.com/splitgraph/seafowl/blob/main/src/version.rs#L58-L91
[2] https://github.com/splitgraph/seafowl/blob/main/src/context.rs#L542-L565
Additional context
If this is something that is deemed to be sufficiently important for/compatible with DataFusion I'd be happy to take on the work needed to implement this.