[SPARK-15538][SPARK-15536][SQL] Truncate table fixes round 2 #13315

Closed
andrewor14 wants to merge 6 commits into apache:master from andrewor14:truncate-table

Conversation

@andrewor14
Contributor

@andrewor14 andrewor14 commented May 26, 2016

What changes were proposed in this pull request?

Two more changes:
(1) Fix truncate table for data source tables (only for cases without PARTITION)
(2) Disallow truncating external tables or views
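
The two rules above can be sketched as a small validation function. This is an illustrative model only, not the code in this patch: `TableKind`, `TableInfo`, and `validateTruncate` are hypothetical names, the error strings are made up, and rejecting the `PARTITION` form on data source tables is an assumption about what "only for cases without PARTITION" means.

```scala
// Hypothetical, simplified model of a catalog entry; not Spark's CatalogTable.
sealed trait TableKind
case object Managed extends TableKind
case object External extends TableKind
case object View extends TableKind

final case class TableInfo(
    name: String,
    kind: TableKind,
    isDataSourceTable: Boolean)

// Returns Left(reason) when the TRUNCATE should be rejected, Right(()) otherwise.
def validateTruncate(
    table: TableInfo,
    partitionSpec: Option[Map[String, String]]): Either[String, Unit] =
  table.kind match {
    case External =>
      Left(s"Cannot truncate external table ${table.name}")
    case View =>
      Left(s"Cannot truncate view ${table.name}")
    case Managed if table.isDataSourceTable && partitionSpec.isDefined =>
      // Assumption: the data source fix covers only the form without PARTITION.
      Left(s"PARTITION is not supported when truncating data source table ${table.name}")
    case Managed =>
      Right(())
  }
```

Returning an `Either` keeps the sketch free of any Spark dependency; the real command would more plausibly raise an analysis error instead.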

How was this patch tested?

DDLSuite

@SparkQA

SparkQA commented May 26, 2016

Test build #59320 has finished for PR 13315 at commit f02b591.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 andrewor14 changed the title [SPARK-15538][SQL] Fix TRUNCATE TABLE for datasource tables [SPARK-15538][SPARK-15539][SQL] Truncate table fixes round 2 May 26, 2016
@andrewor14
Contributor Author

@hvanhovell @yhuai

@SparkQA

SparkQA commented May 26, 2016

Test build #59323 has finished for PR 13315 at commit 2377f13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor

@andrewor14
I tried this PR. It does not seem to work, or raise an error, if the user attempts to truncate a partitioned table without specifying the partition spec.
val df = Seq((1, "john", "CA"), (2, "Mike", "NY"), (3, "Robert", "CA")).toDF("id", "name", "state")
df.write.format("parquet").partitionBy("state").saveAsTable("emp16")
sql("truncate table emp16")
sql("select count(*) from emp16").show()

@andrewor14
Contributor Author

andrewor14 commented May 26, 2016

@sureshthalamati what behavior did you expect? I would think that truncating a partitioned table without specifying the specs should just delete data from all partitions, which is what this patch does.

scala> sql("SELECT * FROM emp16").show()
+---+------+-----+
| id|  name|state|
+---+------+-----+
|  3|Robert|   CA|
|  1|  john|   CA|
|  2|  Mike|   NY|
+---+------+-----+
scala> sql("TRUNCATE TABLE emp16")
scala> sql("SELECT * FROM emp16").show()
+---+----+-----+
| id|name|state|
+---+----+-----+
+---+----+-----+
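
The behavior described in this comment (no partition spec means every partition is cleared) can be sketched in plain Scala. This is a hedged illustration, not the patch itself: `Partition`, `locationsToTruncate`, and the `/warehouse/...` paths are hypothetical stand-ins for catalog entries and their storage locations.

```scala
// Hypothetical stand-in for a partition entry in the catalog.
final case class Partition(spec: Map[String, String], location: String)

// Pick the directories whose contents a TRUNCATE should clear.
// With no partition spec, every partition of the table is selected.
def locationsToTruncate(
    partitions: Seq[Partition],
    partitionSpec: Option[Map[String, String]]): Seq[String] =
  partitionSpec match {
    case Some(spec) =>
      // Keep partitions whose spec agrees with every requested key/value pair.
      partitions
        .filter(p => spec.forall { case (k, v) => p.spec.get(k).contains(v) })
        .map(_.location)
    case None =>
      partitions.map(_.location)
  }

// Made-up partition layout for the emp16 example table.
val emp16 = Seq(
  Partition(Map("state" -> "CA"), "/warehouse/emp16/state=CA"),
  Partition(Map("state" -> "NY"), "/warehouse/emp16/state=NY"))
```

Calling `locationsToTruncate(emp16, None)` selects both partition locations, while passing `Some(Map("state" -> "NY"))` selects only the NY one.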

@yhuai
Contributor

yhuai commented May 26, 2016

`sql("truncate table emp16")` should delete all partitions, right?

val locations = if (partitionSpec.isDefined) {
catalog.listPartitions(tableName, partitionSpec).map(_.storage.locationUri)
if (table.partitionColumnNames.nonEmpty) {
catalog.listPartitions(tableName).map(_.storage.locationUri)
Contributor

This is for Hive tables, right?

@sureshthalamati
Contributor

@andrewor14 The behavior you described is what I expected too. For some reason it does not work in my environment. I was trying it in a spark-shell. What catalog are you using? I probably just need to try with a clean build.

@sureshthalamati
Contributor

@andrewor14 After I cleaned up the old metastore_db, truncate worked as expected. Thanks.

@SparkQA

SparkQA commented May 26, 2016

Test build #59401 has finished for PR 13315 at commit d994460.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor

@andrewor14 One minor thing I noticed when I looked at the table data directories: for regular tables, truncate keeps the partition directories, whereas for data source tables the partition directories are deleted as well. It may be OK, but I thought I would mention it in case we want to be consistent.

@andrewor14
Contributor Author

I tested partitioned data source tables too. If you add data back then the partitions will be created again. I think that's OK.
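
The directory difference noted above can be reproduced on a local filesystem. The sketch below assumes truncation amounts to "delete a location recursively, then recreate it empty"; that mechanism, the helper names (`deleteAndRecreate`, `makeTable`, `children`), and the file name `part-00000` are all illustrative assumptions, and the code uses `java.io.File` rather than the Hadoop filesystem API Spark would use.

```scala
import java.io.File
import java.nio.file.Files

// Delete a file or directory tree.
def deleteRecursively(f: File): Unit = {
  // listFiles() returns null for plain files, hence the Option wrapper.
  Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// Assumed truncation primitive: remove the location, then recreate it empty.
def deleteAndRecreate(dir: File): Unit = {
  deleteRecursively(dir)
  dir.mkdirs()
}

// Build a throwaway table directory with two partition directories.
def makeTable(): File = {
  val root = Files.createTempDirectory("emp16").toFile
  for (state <- Seq("CA", "NY")) {
    val part = new File(root, s"state=$state")
    part.mkdirs()
    Files.write(new File(part, "part-00000").toPath, "row".getBytes("UTF-8"))
  }
  root
}

def children(dir: File): Seq[String] = dir.list().toSeq.sorted

// Clearing each partition location keeps the (now empty) partition directories.
val perPartition = makeTable()
Seq("CA", "NY").foreach(s => deleteAndRecreate(new File(perPartition, s"state=$s")))

// Clearing the table root removes the partition directories as well.
val wholeTable = makeTable()
deleteAndRecreate(wholeTable)
```

After this runs, `children(perPartition)` still lists both `state=` directories, while `children(wholeTable)` is empty. Writing new data would recreate the partition directories in the second case, which matches the observation in the comment above.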

@yhuai
Contributor

yhuai commented May 26, 2016

lgtm

@sureshthalamati
Contributor

lgtm

@hvanhovell
Contributor

LGTM

@SparkQA

SparkQA commented May 26, 2016

Test build #59417 has finished for PR 13315 at commit e857027.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 27, 2016

Test build #59434 has finished for PR 13315 at commit 5d3028d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

Thanks for all the LGTMs. I'm going to merge this into master and 2.0.

asfgit pushed a commit that referenced this pull request May 27, 2016
## What changes were proposed in this pull request?

Two more changes:
(1) Fix truncate table for data source tables (only for cases without `PARTITION`)
(2) Disallow truncating external tables or views

## How was this patch tested?

`DDLSuite`

Author: Andrew Or <andrew@databricks.com>

Closes #13315 from andrewor14/truncate-table.
@andrewor14 andrewor14 changed the title [SPARK-15538][SPARK-15539][SQL] Truncate table fixes round 2 [SPARK-15538][SPARK-15536][SQL] Truncate table fixes round 2 May 27, 2016
@asfgit asfgit closed this in 008a537 May 27, 2016
@andrewor14 andrewor14 deleted the truncate-table branch May 27, 2016 04:15