[SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL #34072
Conversation
Can one of the admins verify this patch?
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
This feature is very useful. We can dynamically adjust table options when we submit a query. For example, we can implement time travel in Spark SQL on Iceberg.
@RussellSpitzer @aokolnychyi Could you please take a look? Thanks!
This seems like a reasonable change to me. Would we want to do similar things for writes? Or would this also work for writes if the relation is being written to rather than read from?
Yeah, I agree that the idea makes sense too. One question is whether the syntax makes sense ...
Thanks @HyukjinKwon @RussellSpitzer for your careful review.
Force-pushed from 35e7312 to 5f4f43a
@HyukjinKwon Could you help review when you have time? Thank you.
Hi @rdblue, this dynamic table option feature is really helpful when processing Iceberg tables. Could you help take a look at this PR? Thanks!
Overall, this change makes sense to me. This isn't a great way to time travel because we want to load the table at a specific version/snapshot or time. But @huaxingao is working on that in a separate issue. For everything else, this is a great improvement. Thank you, @wang-zhun! |
Haven't had a chance to look through the implementation yet, but big +1 on the feature from my side. We maintain an internal DSv2 source that requires options to leverage more advanced functionality, and currently it's not possible to use those features via the pure SQL API. cc @wmoustafa |
+1 on the feature as well. |
Shall we initiate a discussion on Spark dev mailing list? I would like to have this feature too but not sure if the syntax makes sense. |
It seems weird to have it as a hint in SQL select statement (not clear that it's part of a table scan). Maybe better as a TVF argument? |
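To illustrate the TVF suggestion, a table scan with options could hypothetically be expressed as a function call instead of a hint. This is only a sketch: `scan_with_options` is an invented name, not an actual Spark function, and the named-argument shape is illustrative.

```sql
-- Hypothetical TVF form (illustrative only; `scan_with_options` is not a real
-- Spark function). Options are clearly scoped to one table scan.
SELECT *
FROM scan_with_options(
  table   => 'jdbc_catalog.db.table',
  options => map('fetchsize', '1000')
);
```

A TVF makes the options an explicit part of the scan's arguments, rather than a hint that may or may not affect semantics.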
@rxin Thank you for your suggestion. It is very useful for us; we had not considered TVFs before.
It seems there are two types of options that could be supported through this API: options that do not change query semantics/results and only affect physical plan choices, and options that do affect query semantics. Examples of the former include setting the split size; examples of the latter include setting the snapshot ID or timestamp for time travel queries. The latter mostly applies to temporal/versioning queries. Using the hints API sounds acceptable for the former. A hints + options API is used in Flink, but it seems to control physical plan options only. If most of the semantic (i.e., beyond physical) options revolve around snapshots, versioning, and time travel, we could consider the SQL:2011 standard, which added support for temporal databases.
Just an FYI: I am working on time travel in this PR #34497 |
BTW, what's the benefit over having:

```sql
-- option `key` being passed with `value`
SET spark.datasource.jdbc.key=value;
SELECT * FROM jdbc_catalog.db.table;
RESET spark.datasource.jdbc.key;
```

besides that it's just shorter?
Those are table-level options so one may set different key/value pairs per table within the same query. Also the setting and resetting approach is not friendly to concurrent queries. |
Can we improve the PR description on that point? There already seems to be a way to resolve the issue the PR description explains.
There is also a catch to letting the hints framework control physical plan properties as opposed to query semantics: it does not seem straightforward to enforce that distinction (i.e., to prevent time travel from being communicated through hints). Does it make sense to make the
Closing this PR for now; looking forward to a better proposal and implementation.
Hi, I still wanted to see how to solve this issue, so I made PR #41683 to propose an implementation of the suggestion (TVF). The only caveat seems to be that the parser does not support TVFs for write relations, so we may need an alternate solution for writes, but I hope it will work at least for read relations.
What changes were proposed in this pull request?
Added a new hint, OPTIONS, to support SQL table-level options.

Why are the changes needed?
Currently a DataFrame API user can pass dynamic options through the DataFrameReader.option method, but Spark SQL users cannot.
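For comparison, here is how the existing DataFrame API passes per-read options today. This is a sketch: the "iceberg" format and the "snapshot-id" option key (an Iceberg read option) are used for illustration, and the table name is hypothetical.

```scala
// Existing DataFrame API: options are passed per read, with no DDL required.
// "snapshot-id" is an Iceberg read option, shown here for illustration.
val df = spark.read
  .format("iceberg")
  .option("snapshot-id", "10963874102873")
  .load("db.table")
```

The proposal aims to give pure SQL users an equivalent way to pass such per-query options.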
Table options are persisted in the catalog, and modifying them requires a separate DDL statement such as "ALTER TABLE ...". But in some cases users want to override table options dynamically just for one query:

- Iceberg tables support time travel

Setting such parameters is common and ad hoc; setting them flexibly would improve the user experience with Spark SQL, especially now that catalog plugins are supported.
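Usage might look like the following. This is a sketch based on the PR's description and on Flink's dynamic table options hint; the exact syntax accepted by this patch and the "snapshot-id" option key (an Iceberg read option) are assumptions for illustration.

```sql
-- Sketch of an OPTIONS hint overriding table-level read options for one query.
-- 'snapshot-id' is an Iceberg read option, used here only as an example.
SELECT * FROM iceberg_catalog.db.table
  /*+ OPTIONS('snapshot-id'='10963874102873') */;
```

The options apply only to the annotated table reference, so different tables in the same query can receive different key/value pairs, unlike the session-level SET/RESET approach.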
Does this PR introduce any user-facing change?
Yes, it adds the OPTIONS hint.
How was this patch tested?
Added unit tests.