-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16119][sql] Support PURGE option to drop table / partition. #13831
Conversation
This option is used by Hive to directly delete the files instead of moving them to the trash. This is needed in certain configurations where moving the files does not work. For non-Hive tables and partitions, Spark already behaves as if the PURGE option was set, so there's no need to do anything. Hive support for PURGE was added in 0.14 (for tables) and 1.2 (for partitions), so the code reflects that: trying to use the option with older versions of Hive will cause an exception to be thrown. The change is a little noisier than I would like, because of the code to propagate the new flag through all the interfaces and implementations; the main changes are in the parser and in HiveShim, aside from the tests (DDLCommandSuite, VersionsSuite). Tested by running sql and catalyst unit tests, plus VersionsSuite which has been updated to test the version-specific behavior. I also ran an internal test suite that uses PURGE and would not pass previously.
@andrewor14 @yhuai not sure who should review this? Also, using a parameter with a default value might avoid some of the noise, but I didn't see that approach used anywhere else in these APIs, so didn't go that route. |
Test build #60987 has finished for PR 13831 at commit
|
@vanzin Thank you for the PR. I probably will not be able to review it until we get 2.0 out. Will take a look after the release. |
Test build #61004 has finished for PR 13831 at commit
|
Test build #61406 has finished for PR 13831 at commit
|
Test build #61859 has finished for PR 13831 at commit
|
@@ -776,21 +778,29 @@ class DDLCommandSuite extends PlanTest { | |||
val parsed2 = parser.parsePlan(s"DROP TABLE IF EXISTS $tableName1") | |||
val parsed3 = parser.parsePlan(s"DROP TABLE $tableName2") | |||
val parsed4 = parser.parsePlan(s"DROP TABLE IF EXISTS $tableName2") | |||
assertUnsupported(s"DROP TABLE IF EXISTS $tableName2 PURGE") | |||
val parsed5 = parser.parsePlan(s"DROP TABLE $tableName2 PURGE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for completeness, probably worth also checking IF EXISTS ... PURGE combo
you could avoid changing all the callsites by adding a default of EDIT: oops, now I see you made the same comment earlier. |
@@ -179,7 +179,8 @@ case class DescribeDatabaseCommand( | |||
case class DropTableCommand( | |||
tableName: TableIdentifier, | |||
ifExists: Boolean, | |||
isView: Boolean) extends RunnableCommand { | |||
isView: Boolean, | |||
purge: Boolean) extends RunnableCommand { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor but it would be nice to have an assert(!isView || !purge, "Cannot purge a view")
. presumably guaranteed by the parser anyway, but it would be nice to have it codified here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not personally against the assert but I can't find anywhere else where the code is asserting something the parser already checks. So I'll pass on this unless someone that plays with this code more says it's ok.
it would be nice to test using purge actually does a purge, ie nothing is in the trash afterwards. I don't see any easy way to do that, though -- maybe using |
minor comments andrequest for some extra tests, but overall looks good |
None of the tests run on actual HDFS cluster, so I doubt that would be possible without a lot more work... |
Test build #61991 has finished for PR 13831 at commit
|
I'd like to get this in soon, so pending tests, I'll leave this open until tomorrow and then push, unless I hear otherwise. |
Test build #62104 has finished for PR 13831 at commit
|
dropTableMethod.invoke(hive, dbName, tableName, deleteData: JBoolean, | ||
ignoreIfNotExists: JBoolean, purge: JBoolean) | ||
} catch { | ||
case e: InvocationTargetException => throw e.getCause() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, why include this extra handling here? its not used on any of the other reflective calls. I'm guessing it just makes the exception a little cleaner, so really could be everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(See test failures before that commit.)
hmm, yeah I was hoping you could fake this with a localfilesystem, but I futzed with it for a little bit and couldn't get it to work. ok then. lgtm |
Merging to master. |
This option is used by Hive to directly delete the files instead of
moving them to the trash. This is needed in certain configurations
where moving the files does not work. For non-Hive tables and partitions,
Spark already behaves as if the PURGE option was set, so there's no
need to do anything.
Hive support for PURGE was added in 0.14 (for tables) and 1.2 (for
partitions), so the code reflects that: trying to use the option with
older versions of Hive will cause an exception to be thrown.
The change is a little noisier than I would like, because of the code
to propagate the new flag through all the interfaces and implementations;
the main changes are in the parser and in HiveShim, aside from the tests
(DDLCommandSuite, VersionsSuite).
Tested by running sql and catalyst unit tests, plus VersionsSuite which
has been updated to test the version-specific behavior. I also ran an
internal test suite that uses PURGE and would not pass previously.