Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16119][sql] Support PURGE option to drop table / partition. #13831

Closed
wants to merge 7 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Jun 22, 2016

This option is used by Hive to directly delete the files instead of
moving them to the trash. This is needed in certain configurations
where moving the files does not work. For non-Hive tables and partitions,
Spark already behaves as if the PURGE option was set, so there's no
need to do anything.

Hive support for PURGE was added in 0.14 (for tables) and 1.2 (for
partitions), so the code reflects that: trying to use the option with
older versions of Hive will cause an exception to be thrown.

The change is a little noisier than I would like, because of the code
to propagate the new flag through all the interfaces and implementations;
the main changes are in the parser and in HiveShim, aside from the tests
(DDLCommandSuite, VersionsSuite).

Tested by running sql and catalyst unit tests, plus VersionsSuite which
has been updated to test the version-specific behavior. I also ran an
internal test suite that uses PURGE and would not pass previously.

This option is used by Hive to directly delete the files instead of
moving them to the trash. This is needed in certain configurations
where moving the files does not work. For non-Hive tables and partitions,
Spark already behaves as if the PURGE option was set, so there's no
need to do anything.

Hive support for PURGE was added in 0.14 (for tables) and 1.2 (for
partitions), so the code reflects that: trying to use the option with
older versions of Hive will cause an exception to be thrown.

The change is a little noisier than I would like, because of the code
to propagate the new flag through all the interfaces and implementations;
the main changes are in the parser and in HiveShim, aside from the tests
(DDLCommandSuite, VersionsSuite).

Tested by running sql and catalyst unit tests, plus VersionsSuite which
has been updated to test the version-specific behavior. I also ran an
internal test suite that uses PURGE and would not pass previously.
@vanzin
Copy link
Contributor Author

vanzin commented Jun 22, 2016

@andrewor14 @yhuai not sure who should review this?

Also, using a parameter with a default value might avoid some of the noise, but I didn't see that approach used anywhere else in these APIs, so didn't go that route.

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #60987 has finished for PR 13831 at commit a53b7fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Jun 22, 2016

@vanzin Thank you for the PR. I probably will not be able to review it until we get 2.0 out. Will take a look after the release.

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #61004 has finished for PR 13831 at commit b084081.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 28, 2016

Test build #61406 has finished for PR 13831 at commit 3dca099.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class Normalizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class PolynomialExpansion @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • public class JavaPackage
    • case class ShowFunctionsCommand(
    • case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

@SparkQA
Copy link

SparkQA commented Jul 6, 2016

Test build #61859 has finished for PR 13831 at commit cd0f2bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -776,21 +778,29 @@ class DDLCommandSuite extends PlanTest {
val parsed2 = parser.parsePlan(s"DROP TABLE IF EXISTS $tableName1")
val parsed3 = parser.parsePlan(s"DROP TABLE $tableName2")
val parsed4 = parser.parsePlan(s"DROP TABLE IF EXISTS $tableName2")
assertUnsupported(s"DROP TABLE IF EXISTS $tableName2 PURGE")
val parsed5 = parser.parsePlan(s"DROP TABLE $tableName2 PURGE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for completeness, probably worth also checking IF EXISTS ... PURGE combo

@squito
Copy link
Contributor

squito commented Jul 8, 2016

you could avoid changing all the callsites by adding a default of purge = false, but I'm guessing that is discouraged in the sql code (since eg. there is not default for ignoreIfNotExists).

EDIT: oops, now I see you made the same comment earlier.

@@ -179,7 +179,8 @@ case class DescribeDatabaseCommand(
case class DropTableCommand(
tableName: TableIdentifier,
ifExists: Boolean,
isView: Boolean) extends RunnableCommand {
isView: Boolean,
purge: Boolean) extends RunnableCommand {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor but it would be nice to have an assert(!isView || !purge, "Cannot purge a view"). presumably guaranteed by the parser anyway, but it would be nice to have it codified here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not personally against the assert but I can't find anywhere else where the code is asserting something the parser already checks. So I'll pass on this unless someone that plays with this code more says it's ok.

@squito
Copy link
Contributor

squito commented Jul 8, 2016

it would be nice to test using purge actually does a purge, ie nothing is in the trash afterwards. I don't see any easy way to do that, though -- maybe using TrashPolicy.getCurrentTrashDir()?

@squito
Copy link
Contributor

squito commented Jul 8, 2016

minor comments andrequest for some extra tests, but overall looks good

@vanzin
Copy link
Contributor Author

vanzin commented Jul 8, 2016

it would be nice to test using purge actually does a purge

None of the tests run on actual HDFS cluster, so I doubt that would be possible without a lot more work...

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61991 has finished for PR 13831 at commit c108930.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Jul 11, 2016

I'd like to get this in soon, so pending tests, I'll leave this open until tomorrow and then push, unless I hear otherwise.

@SparkQA
Copy link

SparkQA commented Jul 11, 2016

Test build #62104 has finished for PR 13831 at commit f5fcc84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dropTableMethod.invoke(hive, dbName, tableName, deleteData: JBoolean,
ignoreIfNotExists: JBoolean, purge: JBoolean)
} catch {
case e: InvocationTargetException => throw e.getCause()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, why include this extra handling here? its not used on any of the other reflective calls. I'm guessing it just makes the exception a little cleaner, so really could be everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b084081

(See test failures before that commit.)

@squito
Copy link
Contributor

squito commented Jul 11, 2016

it would be nice to test using purge actually does a purge

None of the tests run on actual HDFS cluster, so I doubt that would be possible without a lot more work...

hmm, yeah I was hoping you could fake this with a localfilesystem, but I futzed with it for a little bit and couldn't get it to work. ok then.

lgtm

@vanzin
Copy link
Contributor Author

vanzin commented Jul 12, 2016

Merging to master.

@asfgit asfgit closed this in 7f96886 Jul 12, 2016
@vanzin vanzin deleted the SPARK-16119 branch July 12, 2016 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants