
[SPARK-18949] [SQL] [BACKPORT-2.1] Add recoverPartitions API to Catalog #16372

Closed
wants to merge 1 commit into from

Conversation

gatorsmile
Member

What changes were proposed in this pull request?

This PR is to backport #16356 to Spark 2.1.1 branch.


Currently, we only have a SQL interface for recovering all the partitions in the directory of a table and update the catalog. MSCK REPAIR TABLE or ALTER TABLE table RECOVER PARTITIONS. (Actually, very hard for me to remember MSCK and have no clue what it means)

After the new "Scalable Partition Handling" work, table repair becomes much more important: it is what makes the data in a newly created, partitioned data source table visible.

Thus, this PR adds it to the Catalog interface. After this PR, users can repair a table by:

```Scala
spark.catalog.recoverPartitions("testTable")
```
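(Not part of the PR; an illustrative, Spark-free sketch.) Under the hood, partition recovery walks the table's directory tree and parses Hive-style `key=value` path segments into partition specs before registering them in the catalog. The `PartitionPathParser` object below is a hypothetical helper showing just that parsing step:

```scala
// Hypothetical sketch of the path-parsing step behind recoverPartitions /
// MSCK REPAIR TABLE: turn a relative path like "year=2016/month=12/part-0.parquet"
// into a partition spec Map("year" -> "2016", "month" -> "12").
object PartitionPathParser {
  // Segments that are not in key=value form (e.g. data file names) are skipped.
  def parse(relativePath: String): Map[String, String] =
    relativePath
      .split("/")
      .toSeq
      .flatMap { segment =>
        segment.split("=", 2) match {
          case Array(k, v) if k.nonEmpty => Some(k -> v)
          case _                         => None
        }
      }
      .toMap
}
```

The real implementation additionally lists the filesystem in parallel and merges the discovered specs into the metastore; this sketch only covers how `key=value` directory names map to partition columns.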

How was this patch tested?

Modified the existing test cases.

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70468 has finished for PR 16372 at commit 3ae670b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70479 has finished for PR 16372 at commit 3ae670b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 21, 2016

Merged in branch-2.1. Can you close the PR?

asfgit pushed a commit that referenced this pull request Dec 21, 2016

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16372 from gatorsmile/repairTable2.1.1.
@gatorsmile
Member Author

Thanks!

@gatorsmile gatorsmile closed this Dec 22, 2016