[SPARK-27140][SQL] The feature 'insert overwrite local directory' has inconsistent behavior in different environments. #23950
beliefer wants to merge 2 commits into apache:master from beliefer:test-insert-overwrite-noexist-local-path
@maropu I need your help: please tell Jenkins to test this.
Test build #4589 has finished for PR 23950 at commit
@maropu Please tell Jenkins to test again; I want to know the configuration of master. Thanks a lot!
@maropu I need your help: please tell Jenkins to test this.
Test build #4595 has finished for PR 23950 at commit
@maropu @dongjoon-hyun Please help me find the reason.
cc @maropu @gatorsmile @dongjoon-hyun @janewangfb @cloud-fan
@maropu Please review this PR again, thanks.
@srowen Maybe you could help me review this PR. Thanks a lot!
Is there anything to review here? It looks like you're trying to get it to fail, and there was a failure two weeks ago. We're having Jenkins trouble right now, but you can retrigger it when it comes back up.
@srowen Thanks a lot! I can always see your reply, as I expected. |
@dongjoon-hyun Could you review this PR and help me find the reason? Thanks.
I don't get it. Is this related to #23841?
I really want to know the reason for the inconsistent behavior in different environments. I suspect this may be a bug, because I can't find any point where the target path is created if it doesn't exist yet.
OK, I don't know if anyone can or will help though. Let's stick to your original PR. A PR isn't for investigation. |
OK. Maybe I could open an issue.
No, you already opened two. Let's stick to your other PR/JIRA |
OK. |
What changes were proposed in this pull request?
maropu and I had a conversation about `insert overwrite` into a nonexistent local path.
In local[*] mode, maropu gave a test case as follows:
This test case proves that Spark creates the nonexistent path and moves the intermediate result from the local temporary path to the created path. This test is based on the newest master.
I followed the test case provided by maropu, but found a different behavior.
I ran the SQL that maropu provided in local[*] deploy mode on Spark 2.3.0.
The inconsistent behavior appears as follows:
Then I pulled the master branch, compiled it, and deployed it on my Hadoop cluster. I got the inconsistent behavior again.
The Spark version tested is 3.0.0.
`/tmp/noexistdir/t` is a file there too.
I want to add a UT to master and have Jenkins run it, either to confirm this or to give me more information.
The UT results are the same as those of maropu's test, but different from mine.
`insert overwrite local directory` uses `LocalFileSystem`. I have checked the source of Hadoop's `LocalFileSystem`: it doesn't implement the `rename` method itself. `LocalFileSystem` extends `ChecksumFileSystem`, and the latter implements `rename`. The `rename` method of `ChecksumFileSystem` works as follows:

- If the target path is a directory, `ChecksumFileSystem` moves the source file into the target path.
- If the target path is not a directory, `ChecksumFileSystem` renames the source file to the target file.

`ChecksumFileSystem` holds a field named `fs`, which is a `RawLocalFileSystem`. `RawLocalFileSystem` in turn ends up calling the `rename` method of `UnixFileSystem` or `WinNTFileSystem`.

I have tried to find out why the UT and my Spark deployment behave differently when executing `insert overwrite local directory` in local mode, but I failed. According to the source code of `InsertIntoHiveDirCommand`, there is no point at which the target path is created if it doesn't exist yet. Could you help me find the reason? Thanks!
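To make the two branches of the `ChecksumFileSystem.rename` rule concrete, here is a small Java sketch that mimics them on the local disk with `java.nio.file`. It is an illustration under stated assumptions, not Hadoop's code: the class and method names (`RenameRuleSketch`, `renameLikeChecksumFs`) are hypothetical, and the sketch ignores `.crc` checksum files and any fallback logic inside `RawLocalFileSystem`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of the rename rule described above (not Hadoop's code):
 * if the target is an existing directory, the source is moved into it;
 * otherwise the source is renamed to the target path itself.
 * Checksum (.crc) handling and RawLocalFileSystem fallbacks are ignored.
 */
class RenameRuleSketch {

    static Path renameLikeChecksumFs(Path src, Path dst) throws IOException {
        Path target = dst;
        if (Files.isDirectory(dst)) {
            // Existing directory: move the source into it, keeping its file name.
            target = dst.resolve(src.getFileName());
        }
        // Otherwise the source becomes the target path (file -> file).
        // Files.move requires the parent of the target to exist already.
        return Files.move(src, target);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rename-sketch");

        // Case 1: target "t" does not exist, so the result is a file named "t".
        Path part1 = Files.createFile(root.resolve("part-00000"));
        Path result1 = renameLikeChecksumFs(part1, root.resolve("t"));
        System.out.println(result1.getFileName() + " is file: " + Files.isRegularFile(result1));

        // Case 2: target "d" is an existing directory, so the file lands inside it.
        Path dir = Files.createDirectory(root.resolve("d"));
        Path part2 = Files.createFile(root.resolve("part-00001"));
        Path result2 = renameLikeChecksumFs(part2, dir);
        System.out.println(root.relativize(result2) + " is file: " + Files.isRegularFile(result2));
    }
}
```

In the first case of the sketch, the source file itself becomes the target, so the target path ends up as a file rather than a directory. That resembles the `/tmp/noexistdir/t` outcome reported above, although the sketch alone does not explain why the environments differ.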
How was this patch tested?
UT