[SPARK-17659] [SQL] Partitioned View is Not Supported By SHOW CREATE TABLE #15233

Closed
wants to merge 1 commit into from

Conversation

gatorsmile
Member

What changes were proposed in this pull request?

Spark SQL does not support partitioned views. For a Hive partitioned view, SHOW CREATE TABLE is unable to generate the right DDL. Thus, SHOW CREATE TABLE should reject it, as it does for the other Hive-only features. This PR issues an exception when the view is detected to be a partitioned view.

How was this patch tested?

Added a test case
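
A minimal sketch of the check described above, written against simplified stand-in types rather than Spark's real classes. The names `TableMeta`, `VirtualView`, `ShowCreateTableSketch`, and the exception type are assumptions made for illustration; only the condition itself (a view that has partition columns) comes from this PR.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-ins for Hive table metadata; not Spark's real classes.
sealed trait TableType
case object ManagedTable extends TableType
case object VirtualView extends TableType

final case class TableMeta(
    name: String,
    tableType: TableType,
    partitionColumns: Seq[String],
    bucketColumns: Seq[String] = Nil)

object ShowCreateTableSketch {
  // Collect Hive-only features that the DDL generator cannot reproduce.
  def unsupportedFeatures(t: TableMeta): Seq[String] = {
    val features = ArrayBuffer.empty[String]
    if (t.bucketColumns.nonEmpty) features += "bucketing"
    // A view that carries partition columns is a Hive partitioned view.
    if (t.tableType == VirtualView && t.partitionColumns.nonEmpty) {
      features += "partitioned view"
    }
    features.toSeq
  }

  // Refuse to generate DDL when any unsupported feature is present.
  def showCreateTable(t: TableMeta): String = {
    val unsupported = unsupportedFeatures(t)
    if (unsupported.nonEmpty) {
      throw new UnsupportedOperationException(
        s"Cannot run SHOW CREATE TABLE on ${t.name}: unsupported feature(s): " +
          unsupported.mkString(", "))
    }
    val kind = if (t.tableType == VirtualView) "VIEW" else "TABLE"
    s"CREATE $kind ${t.name} ..." // DDL body elided in this sketch
  }
}
```

Collecting feature names first and failing in one place keeps the error message complete when a table uses several unsupported features at once.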

@gatorsmile gatorsmile changed the title from "[SPARK-17659] [SQL] Partitioned View is Not Supported In SHOW CREATE TABLE" to "[SPARK-17659] [SQL] Partitioned View is Not Supported By SHOW CREATE TABLE" on Sep 25, 2016
@SparkQA

SparkQA commented Sep 25, 2016

Test build #65874 has finished for PR 15233 at commit c6d3acd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan @yhuai

@gatorsmile
Member Author

When the target view is a partitioned view, ALTER VIEW AS SELECT and CREATE OR REPLACE VIEW should issue an error too.

@@ -376,6 +376,10 @@ private[hive] class HiveClientImpl(
          unsupportedFeatures += "bucketing"
        }

        if (h.getTableType == HiveTableType.VIRTUAL_VIEW && partCols.nonEmpty) {
          unsupportedFeatures += "partitioned view"
Contributor

Can we read a partitioned view in Spark SQL? What does partitioning mean for a view?

Member Author

Yeah, a partitioned view is a partition-aware view. Users can add or drop partitions after creating it. For more details, see the Hive design doc:
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

Let me check whether reading a partitioned view is partition-aware in Spark SQL.

Thanks!

Member Author

After digging deeper, I really doubt that the original motivation for partitioned views makes sense...

First, see the Hive design link: https://cwiki.apache.org/confluence/display/Hive/ViewDev

Update 30-Dec-2009: Prasad pointed out that even without supporting materialized views, it may be necessary to provide users with metadata about data dependencies between views and underlying table partitions so that users can avoid seeing inconsistent results during the window when not all partitions have been refreshed with the latest data. One option is to attempt to derive this information automatically (using an overconservative guess in cases where the dependency analysis can't be made smart enough); another is to allow view creators to declare the dependency rules in some fashion as part of the view definition. Based on a design review meeting, we will probably go with the automatic analysis approach once dependency tracking is implemented. The analysis will be performed on-demand, perhaps as part of describing the view or submitting a query job against it. Until this becomes available, users may be able to do their own analysis either via empirical lineage tools or via view->table dependency tracking metadata once it is implemented. See HIVE-1079.
Update 1-Feb-2011: For the latest on this, see PartitionedViews.

Basically, this feature affects only the metadata of views. It does not affect query execution.

To add partition info to a view, users have to issue the SQL manually:

ALTER VIEW view_name ADD [IF NOT EXISTS] partition_spec partition_spec ...
ALTER VIEW view_name DROP [IF EXISTS] partition_spec, partition_spec, ...

I read the code changes and test cases in the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-1079. I think we do not need to worry about this Hive-specific feature. The usage scenario is very limited. Maybe the code changes in the existing PR are enough.

If you think we should support it, we might also need code changes in SHOW PARTITIONS and DESC table PARTITION. We would then also need to change the fromHivePartition function, because getSD will be NULL for partitioned views; otherwise, we will get a NullPointerException.
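
A minimal sketch of the null-safe conversion mentioned above, assuming the storage descriptor is absent for partitions of a view. The types `StorageDescriptor`, `HivePartitionLike`, and `PartitionInfo` and the function `fromHivePartitionSafe` are hypothetical stand-ins, not the Hive or Spark APIs; the sketch only shows how wrapping a possibly-null getter in `Option` avoids the NullPointerException.

```scala
// Hypothetical stand-ins for the Hive metastore partition API; names are illustrative.
final case class StorageDescriptor(location: String, inputFormat: String)

// `sd` may be null, mirroring a Java getter that returns null for view partitions.
final class HivePartitionLike(val spec: Map[String, String], val sd: StorageDescriptor)

final case class PartitionInfo(
    spec: Map[String, String],
    location: Option[String],
    inputFormat: Option[String])

object PartitionConversionSketch {
  def fromHivePartitionSafe(p: HivePartitionLike): PartitionInfo = {
    // Option(...) turns a null descriptor into None instead of failing on dereference.
    val sd = Option(p.sd)
    PartitionInfo(p.spec, sd.map(_.location), sd.map(_.inputFormat))
  }
}
```

With this shape, callers have to handle the missing storage information explicitly, which is where SHOW PARTITIONS and DESC would need their own changes.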

@gatorsmile gatorsmile closed this Nov 7, 2016
@cloud-fan
Contributor

Sorry, I missed this PR. It's straightforward and makes sense to me. Can you reopen it?

@gatorsmile
Member Author

Sure, let me reopen it. : )

@gatorsmile gatorsmile reopened this Nov 9, 2016
@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Nov 9, 2016

Test build #68386 has finished for PR 15233 at commit c6d3acd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

asfgit pushed a commit that referenced this pull request Nov 9, 2016
…ABLE

### What changes were proposed in this pull request?

`Partitioned View` is not supported by Spark SQL. For a Hive partitioned view, SHOW CREATE TABLE is unable to generate the right DDL. Thus, SHOW CREATE TABLE should reject it, as it does for the other Hive-only features. This PR issues an exception when the view is detected to be a partitioned view.

### How was this patch tested?

Added a test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes #15233 from gatorsmile/partitionedView.

(cherry picked from commit e256392)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@gatorsmile
Member Author

Merging to master/2.1. Thanks!

@asfgit asfgit closed this in e256392 Nov 9, 2016
@rxin
Contributor

rxin commented Nov 9, 2016

What is a partitioned view?

@cloud-fan
Contributor

It's a Hive concept: https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

We don't support it and probably don't need it.

@rxin
Contributor

rxin commented Nov 10, 2016

Wow that's pretty crazy.

@gatorsmile
Member Author

Yeah, the listed use cases are strange. I'm not sure why they implemented this feature; maybe other features depend on it? I searched around: the feature is not officially documented, and Hive users are confused about it too.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ABLE