Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19166][SQL]rename from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to deleteMatchingPrefix #16545

Closed
wants to merge 3 commits into from

Conversation

windpiger
Copy link
Contributor

What changes were proposed in this pull request?

InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions delete all files that match a static prefix, such as a partition file path(/table/foo=1), or a no partition file path(/xxx/a.json).

while the method name deleteMatchingPartitions indicates that only the partition file will be deleted. This name make a confused.

It is better to rename the method name.

How was this patch tested?

…ommand.deleteMatchingPartitions to InsertIntoHadoopFsRelationCommand.deleteMatchingPrefix
@windpiger windpiger changed the title [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to InsertIntoHadoopFsRelationCommand.deleteMatchingPrefix [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to deleteMatchingPrefix Jan 11, 2017
@SparkQA
Copy link

SparkQA commented Jan 11, 2017

Test build #71190 has started for PR 16545 at commit 0622c04.

@srowen
Copy link
Member

srowen commented Jan 11, 2017

I don't think this kind of thing is worth changing; the docs look correct. When would you have other files as the peer of a partition directory?

@windpiger
Copy link
Contributor Author

thanks! even though it is a no-partition file, it will also be deleted, so I think this change will more clear

@srowen
Copy link
Member

srowen commented Jan 13, 2017

Why would a file exist there? if it doesn't exist in any normal operation then I don't see a good motive for changing this, as it works as designed already.

@windpiger
Copy link
Contributor Author

thanks!

val df = spark.read.json("/path/jsonfile")
val df1 = spark.read.json("/path/jsonfile_1")

df.createOrReplaceTempView("t")
df1.createOrReplaceTempView("t1")

spark.sql("insert overwrite table t select * from t1")

this sql executed will hit the function deleteMatchingPartitions here
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L104

while /path/jsonfile is not a partition path file, deleteMatchingPartitions is not properly here?

@srowen
Copy link
Member

srowen commented Jan 14, 2017

Are you saying the problem arises when the path to one dataset/table is a prefix of another?

@windpiger
Copy link
Contributor Author

sorry, it is not the point. the example make some confuse.
replaced it with another example.

val df = spark.read.json("/path/a")
val df1 = spark.read.json("/path/b")

df.createOrReplaceTempView("x")
df1.createOrReplaceTempView("y")

spark.sql("insert overwrite table x select * from y")

this sql executed will hit the function deleteMatchingPartitions here
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L104

the point is /path/a is not a partition path(e.g. /path/a=1),we still delete it through deleteMatchingPartitions.

From the method name MatchingPartitions ,we may think it is only apply to a partition file(e.g. /path/a=1). If it also apply to no partition file, we will confused.

if we change the name from deleteMatchingPartitions to deleteMatchingPrefix , is't more clear?

@srowen
Copy link
Member

srowen commented Jan 15, 2017

So you're saying it works fine, you just take issue with the internal method/doc name? OK, but in the non-partitioned case, deleting everything is also a matter of deleting /a*. Implicitly there is one partition in the sense used here. I just don't see a big deal here.

But the example in the comment you added seems to refer to a different case, about a file matching the same prefix being deleted?

@windpiger
Copy link
Contributor Author

windpiger commented Jan 16, 2017

  1. yes, it take issue with the internal method/doc name, just a minor improvement .
    after all, partitioned and non-partitioned case are not different.

2.a file matching the same prefix being deleted, this doesn't happen, it is ok.
it seems like deleteMatchingPrefix also make some confused.

maybe deleteMatchingPaths more properly.

@srowen

@SparkQA
Copy link

SparkQA commented Jan 16, 2017

Test build #71413 has finished for PR 16545 at commit 6d1defb.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2017

Test build #71896 has finished for PR 16545 at commit 15f6759.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger windpiger closed this Jan 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants