
[SPARK-18949] [SQL] Add recoverPartitions API to Catalog #16356

Closed
wants to merge 6 commits

Conversation

gatorsmile
Member

@gatorsmile gatorsmile commented Dec 20, 2016

What changes were proposed in this pull request?

Currently, we only have a SQL interface for recovering all the partitions in a table's directory and updating the catalog: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Actually, `MSCK` is very hard for me to remember, and I have no clue what it means.)

After the new "Scalable Partition Handling" work, table repair becomes much more important: it is what makes existing data visible in a newly created partitioned data source table.

Thus, this PR adds it to the `Catalog` interface. After this PR, users can repair a table with:

```Scala
spark.catalog.recoverPartitions("testTable")
```

How was this patch tested?

Modified the existing test cases.

@gatorsmile
Member Author

cc @rxin @cloud-fan @ericl

@rxin
Contributor

rxin commented Dec 20, 2016

What is the SQL equivalent command? MSCK? Should we match that?

@gatorsmile
Member Author

We have two SQL equivalent commands:

  • ALTER TABLE table RECOVER PARTITIONS;
  • MSCK REPAIR TABLE table;

I am not good at naming. How about recoverPartitions?

@SparkQA

SparkQA commented Dec 20, 2016

Test build #70416 has finished for PR 16356 at commit 1f71236.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 20, 2016

Yea, recoverPartitions sounds a lot better.

@rxin
Contributor

rxin commented Dec 20, 2016

We should also add the Python API.

@gatorsmile
Member Author

Sure, will do. Thanks!
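For reference, the Python API discussed here would presumably be a thin wrapper that delegates to the JVM-side catalog through Py4J, along these lines (a sketch, not the merged Spark code; the `since` helper below is a simplified stand-in for PySpark's decorator):

```python
# Sketch of a Python Catalog.recoverPartitions wrapper (assumed shape, not
# the merged Spark code). `since` is a simplified stand-in for pyspark's
# decorator, which records the version a method was added in its docstring.

def since(version):
    """Append a ".. versionadded::" note to the decorated function's docstring."""
    def deco(f):
        f.__doc__ = (f.__doc__ or "") + "\n.. versionadded:: %s\n" % version
        return f
    return deco

class Catalog(object):
    def __init__(self, jcatalog):
        # In real PySpark this would be the Py4J handle to the JVM Catalog.
        self._jcatalog = jcatalog

    @since("2.1.1")
    def recoverPartitions(self, tableName):
        """Recover all the partitions of the given table and update the catalog."""
        self._jcatalog.recoverPartitions(tableName)
```

The interesting part is only the one-line delegation; partition discovery itself happens on the JVM side, just as `refreshTable` delegates in the diff excerpts later in this thread.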

@gatorsmile gatorsmile changed the title [SPARK-18949] [SQL] Add repairTable API to Catalog [SPARK-18949] [SQL] Add recoverPartitions API to Catalog Dec 20, 2016
@rxin
Contributor

rxin commented Dec 20, 2016

We can also merge this into branch-2.1, so let's use 2.1.1 as the since version.

-ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.util.sketch.CountMinSketch.toByteArray")
+ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.util.sketch.CountMinSketch.toByteArray"),
+// [SPARK-18949] [SQL] Add repairTable API to Catalog
+ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.recoverPartitions")
Member Author

When backporting this PR to 2.1.1, we might need to move this to the next section 2.1.x.

@SparkQA

SparkQA commented Dec 20, 2016

Test build #70422 has finished for PR 16356 at commit 494da5f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 20, 2016

LGTM pending tests.

@@ -258,6 +258,11 @@ def refreshTable(self, tableName):
"""Invalidate and refresh all the cached metadata of the given table."""
self._jcatalog.refreshTable(tableName)

@since(2.1.1)
Member Author

Will change it to 2.1; using 2.1.1 breaks the doc build. If needed, I can investigate further how to make it work for 2.1.1.

@@ -258,6 +258,11 @@ def refreshTable(self, tableName):
"""Invalidate and refresh all the cached metadata of the given table."""
self._jcatalog.refreshTable(tableName)

@since(2.1)
Contributor

You can do "2.1.1" as a string.

Member Author

I see. Thanks!
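A likely reason the bare 2.1.1 broke the build (the thread doesn't spell it out, so this is an inference): `2.1.1` is not a valid Python numeric literal, so a module containing `@since(2.1.1)` fails to parse at all, while a float like `2.1` or the string `"2.1.1"` is fine. A quick check:

```python
# Check which @since(...) argument forms are even parseable Python.
# `2.1` is a float literal and `"2.1.1"` is a string, but `2.1.1` is not a
# valid literal and raises SyntaxError at compile time.
import ast

def parses(src):
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

print(parses("since(2.1)"))      # True: float literal
print(parses("since(2.1.1)"))    # False: invalid numeric literal
print(parses('since("2.1.1")'))  # True: version passed as a string
```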

@SparkQA

SparkQA commented Dec 20, 2016

Test build #70418 has finished for PR 16356 at commit fb26533.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70426 has finished for PR 16356 at commit 451ab05.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70423 has finished for PR 16356 at commit 7cb5e3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 21, 2016

Test build #3511 has finished for PR 16356 at commit 451ab05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 21, 2016

Merging in master/branch-2.1.

@rxin
Contributor

rxin commented Dec 21, 2016

Can you send a PR for branch-2.1?

@asfgit asfgit closed this in 24c0c94 Dec 21, 2016
@gatorsmile
Member Author

Sure, let me do it now.

asfgit pushed a commit that referenced this pull request Dec 21, 2016
### What changes were proposed in this pull request?

This PR is to backport #16356 to the Spark 2.1.1 branch.

----

Currently, we only have a SQL interface for recovering all the partitions in a table's directory and updating the catalog: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Actually, `MSCK` is very hard for me to remember, and I have no clue what it means.)

After the new "Scalable Partition Handling" work, table repair becomes much more important: it is what makes existing data visible in a newly created partitioned data source table.

Thus, this PR adds it to the `Catalog` interface. After this PR, users can repair a table with:
```Scala
spark.catalog.recoverPartitions("testTable")
```

### How was this patch tested?
Modified the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16372 from gatorsmile/repairTable2.1.1.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
### What changes were proposed in this pull request?

Currently, we only have a SQL interface for recovering all the partitions in a table's directory and updating the catalog: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Actually, `MSCK` is very hard for me to remember, and I have no clue what it means.)

After the new "Scalable Partition Handling" work, table repair becomes much more important: it is what makes existing data visible in a newly created partitioned data source table.

Thus, this PR adds it to the `Catalog` interface. After this PR, users can repair a table with:
```Scala
spark.catalog.recoverPartitions("testTable")
```

### How was this patch tested?
Modified the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#16356 from gatorsmile/repairTable.