Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18582. skip unnecessary cleanup logic in distcp #5251

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

MaxWk
Copy link
Contributor

@MaxWk MaxWk commented Dec 22, 2022

Description of PR

reduce execution time when distcp in direct or append mode.https://issues.apache.org/jira/browse/HADOOP-18582

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (HADOOP-18582)?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 3s trunk passed
+1 💚 compile 0m 31s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 26s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 31s trunk passed
+1 💚 javadoc 0m 33s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 24s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 57s trunk passed
+1 💚 shadedclient 23m 5s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 23s the patch passed
+1 💚 compile 0m 24s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 24s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 14s the patch passed
+1 💚 mvnsite 0m 24s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 48s the patch passed
+1 💚 shadedclient 22m 38s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 45m 20s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
141m 54s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/1/artifact/out/Dockerfile
GITHUB PR #5251
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 6150b08acf80 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7e62438
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/1/testReport/
Max. process+thread count 536 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxWk , thank you for the patch. I entered one comment. Can you also please confirm if your Apache JIRA ID is 10000kang? If so, I can get HADOOP-18582 assigned to you.

@@ -152,6 +152,15 @@ public void abortJob(JobContext jobContext,
}

private void cleanupTempFiles(JobContext context) {
final boolean directWrite = context.getConfiguration().getBoolean(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing call to Configuration conf = context.getConfiguration(); below. Can you please lift that up and then reuse it for the conf.getBoolean(...) calls that you need to add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@MaxWk
Copy link
Contributor Author

MaxWk commented Dec 23, 2022

yes,that's me

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 5s trunk passed
+1 💚 compile 0m 35s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 37s trunk passed
+1 💚 javadoc 0m 37s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 1s trunk passed
+1 💚 shadedclient 19m 59s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 26s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 25s the patch passed
+1 💚 compile 0m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 17s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 4 new + 10 unchanged - 0 fixed = 14 total (was 10)
+1 💚 mvnsite 0m 24s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 51s the patch passed
+1 💚 shadedclient 19m 35s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 21m 25s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
111m 4s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/2/artifact/out/Dockerfile
GITHUB PR #5251
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5b0d062f8cd0 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b7a32fe
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/2/testReport/
Max. process+thread count 694 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@MaxWk MaxWk requested a review from cnauroth December 23, 2022 16:49
Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxWk MaxWk force-pushed the trunk branch 2 times, most recently from 021e8f2 to 1d7dcd5 Compare December 24, 2022 02:06
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 55s trunk passed
+1 💚 compile 0m 30s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 26s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 33s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 24s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 4s trunk passed
+1 💚 shadedclient 26m 6s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 28s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 15s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 4 new + 10 unchanged - 0 fixed = 14 total (was 10)
+1 💚 mvnsite 0m 23s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 52s the patch passed
+1 💚 shadedclient 23m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 40m 3s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
141m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/3/artifact/out/Dockerfile
GITHUB PR #5251
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e551d29cbc27 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 021e8f2
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/3/testReport/
Max. process+thread count 593 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 4s trunk passed
+1 💚 compile 0m 35s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 33s trunk passed
+1 💚 mvnsite 0m 37s trunk passed
+1 💚 javadoc 0m 37s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 2s trunk passed
+1 💚 shadedclient 20m 0s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 25s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 25s the patch passed
+1 💚 compile 0m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 25s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 52s the patch passed
+1 💚 shadedclient 19m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 3s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
111m 55s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/4/artifact/out/Dockerfile
GITHUB PR #5251
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 7c892506ede5 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1d7dcd5
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/4/testReport/
Max. process+thread count 762 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

production code looks good, just test tuning. we always want meaninful messages on failure, and ContractTestUtils in hadoop-common does a lot for asserting about filesystem state

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 59s trunk passed
+1 💚 compile 0m 30s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 26s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 31s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 24s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 57s trunk passed
+1 💚 shadedclient 23m 25s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 23s the patch passed
+1 💚 compile 0m 24s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 24s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 14s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10)
+1 💚 mvnsite 0m 23s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 16s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 0m 47s the patch passed
+1 💚 shadedclient 22m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 44m 29s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
141m 34s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/5/artifact/out/Dockerfile
GITHUB PR #5251
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 65413776e525 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7511907
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/5/testReport/
Max. process+thread count 536 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5251/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@MaxWk MaxWk requested review from steveloughran and cnauroth and removed request for cnauroth and steveloughran December 27, 2022 02:31
Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@MaxWk , thank you for the contribution and for incorporating code review feedback.

I won't commit yet, in case Steve wants to approve too. (I see Ayush has already approved.)

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has been no additional feedback. I'll plan on committing this one later this week.

@cnauroth cnauroth merged commit 3b7b79b into apache:trunk Jan 24, 2023
cnauroth pushed a commit that referenced this pull request Jan 24, 2023
Co-authored-by: 万康 <mingge@xiaohongshu.com>
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 3b7b79b)
Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have committed this to trunk and branch-3.3. (This code doesn't compile when applied to branch-3.2, so I couldn't commit it there.)

@MaxWk , thank you for the contribution. @steveloughran and @ayushtkn , thank you for code reviewing.

@steveloughran
Copy link
Contributor

nice.

final boolean append = conf.getBoolean(
DistCpOptionSwitch.APPEND.getConfigLabel(), false);
final boolean useTempTarget = !append && !directWrite;
if (!useTempTarget) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing this, I think the cleanup should only be skipped on direct not append.

why so, append will choose between overwrite and append depending on the state of the destination file -a decision made case by case. when doing distcp with -append, new files or those whose dest checksum doesn't match that of the source for the same #of bytes will use a temp file unless -direct is set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that same check as here in RetriableFileCopyCommand

    final boolean toAppend = action == FileAction.APPEND;
    final boolean useTempTarget = !toAppend && !directWrite;
    Path targetPath = useTempTarget ? getTempFile(target, context) : target;

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java#L135-L137

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's doing it file by file: some files will be appended to, but new ones, ones with checksum mismatch are overwritten by first writing to temp, then rename, as usual

any -append job which added files can create temp paths

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx for the explanation. Will give some time to the original author, In case no one volunteers for few days, I will fix this.

@steveloughran
Copy link
Contributor

Going to have to do a followup here; we still need to do cleanup on -append.

anyone want to take that up?

@ayushtkn
Copy link
Member

Have removed it for append in #5409

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants