[SPARK-17659] [SQL] Partitioned View is Not Supported By SHOW CREATE TABLE #15233
Test build #65874 has finished for PR 15233 at commit
|
cc @cloud-fan @yhuai
When the target view is a partitioned view, |
@@ -376,6 +376,10 @@ private[hive] class HiveClientImpl(
        unsupportedFeatures += "bucketing"
      }

      if (h.getTableType == HiveTableType.VIRTUAL_VIEW && partCols.nonEmpty) {
        unsupportedFeatures += "partitioned view"
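The check in the hunk above can be sketched in isolation. This is a minimal model, not the `HiveClientImpl` code: `TableInfo`, `TableType`, `VirtualView`, `UnsupportedFeatureCheck`, and `requireSupported` are hypothetical stand-ins, and the bucketing condition is simplified to "has bucket columns".

```scala
// Stand-in types for illustration only; these are NOT the Hive or Spark classes.
sealed trait TableType
case object ManagedTable extends TableType
case object VirtualView extends TableType

final case class TableInfo(
    name: String,
    tableType: TableType,
    partitionColumns: Seq[String],
    bucketColumns: Seq[String])

object UnsupportedFeatureCheck {
  // Collect the names of Hive-only features present on the table or view.
  def unsupportedFeatures(t: TableInfo): Seq[String] = {
    val checks = Seq(
      // Assumption: bucketing is modeled as "bucket columns are present".
      t.bucketColumns.nonEmpty -> "bucketing",
      // Mirrors the condition in the diff: a view that carries partition columns.
      (t.tableType == VirtualView && t.partitionColumns.nonEmpty) -> "partitioned view")
    checks.collect { case (true, feature) => feature }
  }

  // Hypothetical caller: fail fast instead of generating incorrect DDL.
  def requireSupported(t: TableInfo): Unit = {
    val found = unsupportedFeatures(t)
    if (found.nonEmpty) {
      throw new UnsupportedOperationException(
        s"${t.name} uses unsupported Hive feature(s): ${found.mkString(", ")}")
    }
  }
}
```

A plain view (no partition columns) passes the check, while a view with partition columns is reported as a "partitioned view".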
Can we read a partitioned view in Spark SQL? What does partition mean for a view?
Yeah, a partitioned view is a partition-aware view. Users can add or drop partitions after creation. For more details, below is the Hive design doc:
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
Let me check whether reading a partitioned view is partition-aware in Spark SQL.
Thanks!
After digging deeper, I really doubt that the initial motivation for partitioned views makes sense...
First, see the Hive design link: https://cwiki.apache.org/confluence/display/Hive/ViewDev
Update 30-Dec-2009: Prasad pointed out that even without supporting materialized views, it may be necessary to provide users with metadata about data dependencies between views and underlying table partitions so that users can avoid seeing inconsistent results during the window when not all partitions have been refreshed with the latest data. One option is to attempt to derive this information automatically (using an overconservative guess in cases where the dependency analysis can't be made smart enough); another is to allow view creators to declare the dependency rules in some fashion as part of the view definition. Based on a design review meeting, we will probably go with the automatic analysis approach once dependency tracking is implemented. The analysis will be performed on-demand, perhaps as part of describing the view or submitting a query job against it. Until this becomes available, users may be able to do their own analysis either via empirical lineage tools or via view->table dependency tracking metadata once it is implemented. See HIVE-1079.
Update 1-Feb-2011: For the latest on this, see PartitionedViews.
Basically, this feature just affects the metadata of views. It does not affect the query execution.
To add the partition info into the views, users have to manually issue the SQL:
ALTER VIEW view_name ADD [IF NOT EXISTS] partition_spec partition_spec ...
ALTER VIEW view_name DROP [IF EXISTS] partition_spec, partition_spec, ...
I read the code changes and test cases in the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-1079. I think we do not need to worry about this Hive-specific feature. The usage scenario is very limited. Maybe the code changes in the existing PR are enough.
If you think we should support it, we might also need code changes in SHOW PARTITIONS and DESC table PARTITIONS. Then, we need to change the fromHivePartition function, because getSD will be NULL for partitioned views; otherwise, we will get a NullPointerException.
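A null-safe conversion along these lines might look like the sketch below. It does not use the Hive metastore classes: `RawPartition`, `StorageDescriptor`, `CatalogPartition`, and `fromRawPartition` are hypothetical stand-ins that only model a getter which may return null.

```scala
// Stand-in types for illustration only; these are NOT the Hive metastore classes.
final case class StorageDescriptor(location: String)

// Models a Java-style API whose getSD may return null, as described for partitioned views.
final class RawPartition(val values: Seq[String], sd: StorageDescriptor) {
  def getSD: StorageDescriptor = sd
}

final case class CatalogPartition(values: Seq[String], location: Option[String])

object PartitionConversion {
  def fromRawPartition(p: RawPartition): CatalogPartition = {
    // Option(null) is None, so a missing storage descriptor is never dereferenced.
    val location = Option(p.getSD).flatMap(sd => Option(sd.location))
    CatalogPartition(p.values, location)
  }
}
```

Wrapping the nullable getter in `Option` keeps the "no storage descriptor" case explicit in the result type instead of surfacing later as a NullPointerException.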
Sorry I missed this PR. It's straightforward and makes sense to me. Can you reopen it?
Sure, let me reopen it. : )
retest this please
Test build #68386 has finished for PR 15233 at commit
|
LGTM
…ABLE

### What changes were proposed in this pull request?
`Partitioned View` is not supported by SPARK SQL. For Hive partitioned view, SHOW CREATE TABLE is unable to generate the right DDL. Thus, SHOW CREATE TABLE should not support it like the other Hive-only features. This PR is to issue an exception when detecting the view is a partitioned view.

### How was this patch tested?
Added a test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes #15233 from gatorsmile/partitionedView.

(cherry picked from commit e256392)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Merging to master/2.1. Thanks!
What is a partitioned view?
It's a Hive concept: https://cwiki.apache.org/confluence/display/Hive/PartitionedViews We don't support it and probably don't need it.
Wow, that's pretty crazy.
Yeah, the listed use cases are strange. Not sure why they implemented such a feature. Maybe some features depend on it? I did a search. This feature is not officially documented, and Hive users are also confused about it.
### What changes were proposed in this pull request?
`Partitioned View` is not supported by Spark SQL. For a Hive partitioned view, SHOW CREATE TABLE is unable to generate the right DDL. Thus, SHOW CREATE TABLE should not support it, like the other Hive-only features. This PR issues an exception when it detects that the view is a partitioned view.

### How was this patch tested?
Added a test case