[SPARK-18949] [SQL] Add recoverPartitions API to Catalog #16356
Conversation
What is the SQL equivalent command? MSCK? Should we match that?
We have two equivalent SQL commands: `MSCK REPAIR TABLE table` and `ALTER TABLE table RECOVER PARTITIONS`.
I am not good at naming. How about `recoverPartitions`?
Test build #70416 has finished for PR 16356 at commit.
Yeah, `recoverPartitions` sounds a lot better.
We should also add the Python API.
Sure, will do. Thanks!
We can also merge this into branch-2.1, so let's use 2.1.1 as the `@since` version.
```diff
-    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.util.sketch.CountMinSketch.toByteArray")
+    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.util.sketch.CountMinSketch.toByteArray"),
+    // [SPARK-18949] [SQL] Add repairTable API to Catalog
+    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.recoverPartitions")
```
When backporting this PR to 2.1.1, we might need to move this to the next section, 2.1.x.
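For context, here is a rough sketch (a paraphrase, not the exact Spark source) of why MiMa flags this: the PR adds a new method to the public abstract `Catalog` class, so any external subclass compiled against the old API would no longer implement the full interface, hence the `ReversedMissingMethodProblem` exclusion:

```scala
package org.apache.spark.sql.catalog

abstract class Catalog {
  // ... existing methods such as refreshTable, listTables, etc. ...

  /**
   * Recovers all the partitions in the directory of a table and updates
   * the catalog. The SQL equivalents are `MSCK REPAIR TABLE` and
   * `ALTER TABLE table RECOVER PARTITIONS`.
   */
  def recoverPartitions(tableName: String): Unit
}
```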
Test build #70422 has finished for PR 16356 at commit.
LGTM pending tests.
```diff
@@ -258,6 +258,11 @@ def refreshTable(self, tableName):
         """Invalidate and refresh all the cached metadata of the given table."""
         self._jcatalog.refreshTable(tableName)
 
+    @since(2.1.1)
```
Will change it to `2.1`. `2.1.1` causes the doc build to break. If needed, I can investigate further to see how to do it for 2.1.1.
```diff
@@ -258,6 +258,11 @@ def refreshTable(self, tableName):
         """Invalidate and refresh all the cached metadata of the given table."""
         self._jcatalog.refreshTable(tableName)
 
+    @since(2.1)
```
You can use `"2.1.1"` as a string.
I see. Thanks!
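For reference, a minimal sketch of how the pyspark method could look with the suggestion applied (this is an assumed final form, following the `refreshTable` pattern in the diff above): the bare literal `2.1.1` is not valid Python syntax, which is why the decorator needs either `2.1` or a quoted `"2.1.1"`.

```python
# Sketch of the new method in pyspark's Catalog class (python/pyspark/sql/catalog.py),
# which simply delegates to the JVM-side Catalog, like refreshTable does.
# The version is quoted because 2.1.1 is not a valid numeric literal.
@since("2.1.1")
def recoverPartitions(self, tableName):
    """Recover all the partitions of the given table and update the catalog."""
    self._jcatalog.recoverPartitions(tableName)
```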
Test build #70418 has finished for PR 16356 at commit.
Test build #70426 has finished for PR 16356 at commit.
Test build #70423 has finished for PR 16356 at commit.
Test build #3511 has finished for PR 16356 at commit.
Merging in master/branch-2.1.
Can you send a PR for branch-2.1?
Sure, let me do it now.
### What changes were proposed in this pull request?

This PR backports #16356 to the Spark 2.1.1 branch.

Currently, we only have a SQL interface for recovering all the partitions in the directory of a table and updating the catalog: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Actually, it is very hard for me to remember `MSCK`, and I have no clue what it means.) After the new "Scalable Partition Handling", table repair becomes much more important for making the data in a newly created data source partitioned table visible. Thus, this PR adds it to the Catalog interface. After this PR, users can repair a table with:

```Scala
spark.catalog.recoverPartitions("testTable")
```

### How was this patch tested?

Modified the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16372 from gatorsmile/repairTable2.1.1.
What changes were proposed in this pull request?

Currently, we only have a SQL interface for recovering all the partitions in the directory of a table and updating the catalog: `MSCK REPAIR TABLE` or `ALTER TABLE table RECOVER PARTITIONS`. (Actually, it is very hard for me to remember `MSCK`, and I have no clue what it means.)

After the new "Scalable Partition Handling", table repair becomes much more important for making the data in a newly created data source partitioned table visible.

Thus, this PR adds it to the Catalog interface. After this PR, users can repair a table with:

```Scala
spark.catalog.recoverPartitions("testTable")
```

How was this patch tested?

Modified the existing test cases.
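To make the new API concrete, here is a short end-to-end sketch. The table name comes from the description; the session setup and the data written are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("recover-partitions-demo")
  .enableHiveSupport()
  .getOrCreate()

// Create a partitioned data source table (illustrative data).
spark.range(10)
  .selectExpr("id", "id % 2 AS part")
  .write
  .partitionBy("part")
  .mode("overwrite")
  .saveAsTable("testTable")

// Suppose new part=... directories are later added to the table's location
// by an external process; the catalog does not know about them yet.

// The API introduced by this PR:
spark.catalog.recoverPartitions("testTable")

// The SQL equivalents mentioned in the description:
spark.sql("MSCK REPAIR TABLE testTable")
spark.sql("ALTER TABLE testTable RECOVER PARTITIONS")
```

All three calls scan the table's directory for partition subdirectories and register any missing partitions in the catalog, which is what makes externally added data visible to queries.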