
[SPARK-22214][SQL] Refactor the list hive partitions code #19444

Closed
jiangxb1987 wants to merge 2 commits into apache:master from jiangxb1987:hivePartitions

Conversation

jiangxb1987
Contributor

What changes were proposed in this pull request?

This PR makes a few changes to the Hive partition listing code, to make it more extensible.
The following changes are made (a sketch of the resulting dispatch logic follows the list):

  1. In HiveClientImpl.getPartitions(), call client.getPartitions instead of shim.getAllPartitions when spec is empty;
  2. In HiveTableScanExec, we previously always called listPartitionsByFilter when the config metastorePartitionPruning is enabled, but when partitionPruningPred is empty we should call listPartitions instead;
  3. Use sessionCatalog instead of SharedState.externalCatalog in HiveTableScanExec.
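
As a rough illustration of changes 2 and 3, here is a minimal sketch of the intended dispatch. The helper name prunedPartitions and the surrounding plumbing are assumptions for illustration, not the actual Spark source:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTablePartition, SessionCatalog}
import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper sketching the partition-listing dispatch in
// HiveTableScanExec after this refactor. Note it consults the
// sessionCatalog (change 3) rather than SharedState.externalCatalog.
def prunedPartitions(
    sessionCatalog: SessionCatalog,
    tableName: TableIdentifier,
    partitionPruningPred: Seq[Expression],
    metastorePartitionPruning: Boolean): Seq[CatalogTablePartition] = {
  if (partitionPruningPred.isEmpty) {
    // Change 2: with no pruning predicates, a plain listing suffices even
    // when metastore partition pruning is enabled.
    sessionCatalog.listPartitions(tableName)
  } else if (metastorePartitionPruning) {
    // Push the predicates down to the Hive metastore.
    sessionCatalog.listPartitionsByFilter(tableName, partitionPruningPred)
  } else {
    // List all partitions; Spark prunes them on the driver side.
    sessionCatalog.listPartitions(tableName)
  }
}
```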

How was this patch tested?

Covered by existing test cases; since this is a code refactor, no regression or behavior change is expected.

@@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(
       table: CatalogTable,
       spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = withHiveState {
     val hiveTable = toHiveTable(table, Some(userName))
-    val parts = spec match {
-      case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition)
Contributor Author


After this change, HiveShim.getAllPartitions is only used to support HiveShim.getPartitionsByFilter for Hive 0.12; we may consider removing the method completely in the future.
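
For reference, a hedged sketch of the refactored getPartitions body, reconstructed from the diff above; the exact handling of the spec is an assumption:

```scala
import scala.collection.JavaConverters._

override def getPartitions(
    table: CatalogTable,
    spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = withHiveState {
  val hiveTable = toHiveTable(table, Some(userName))
  // Normalize the optional spec: None becomes an empty (partial) spec, so a
  // single client.getPartitions call covers both cases and
  // shim.getAllPartitions is no longer needed on this path (change 1).
  val partSpec = spec match {
    case None => Map.empty[String, String]
    case Some(s) => s
  }
  client.getPartitions(hiveTable, partSpec.asJava).asScala.map(fromHivePartition)
}
```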

@SparkQA

SparkQA commented Oct 6, 2017

Test build #82509 has finished for PR 19444 at commit 8f50c7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor Author

cc @gatorsmile @cloud-fan

@@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(
       table: CatalogTable,
       spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = withHiveState {
     val hiveTable = toHiveTable(table, Some(userName))
-    val parts = spec match {
-      case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition)
+    val partialPartSpec = spec match {
Member


partialPartSpec -> partSpec

@gatorsmile
Member

LGTM except a minor comment.

@gatorsmile
Member

LGTM

@SparkQA

SparkQA commented Oct 6, 2017

Test build #82519 has finished for PR 19444 at commit 1e119ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 08b204f Oct 6, 2017
@jiangxb1987 jiangxb1987 deleted the hivePartitions branch October 9, 2017 05:35
/**
* Initialize an empty spec.
*/
lazy val emptyTablePartitionSpec: TablePartitionSpec = Map.empty[String, String]
Contributor


Map.empty is already an object, I think we can just inline it.

Contributor Author


We want to refer to the val emptyTablePartitionSpec as a TablePartitionSpec, not a Map[String, String], even though the two types are equal.
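
A small sketch of the typing point (a hypothetical snippet, not PR code): both vals hold the same empty map, but only the annotated one advertises the domain-specific alias at use sites.

```scala
object Example {
  type TablePartitionSpec = Map[String, String]

  // Inferred type is Map[String, String]: the alias is lost at use sites.
  lazy val inlinedEmpty = Map.empty[String, String]

  // Declared as TablePartitionSpec: signatures and docs show the intent,
  // even though the runtime value is identical.
  lazy val emptyTablePartitionSpec: TablePartitionSpec = Map.empty[String, String]
}
```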
