Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18793: S3A StagingCommitter does not clean up staging-uploads directory #5818

Merged
merged 1 commit into from
Jul 8, 2023

Conversation

hdaikoku
Copy link
Contributor

@hdaikoku hdaikoku commented Jul 7, 2023

Description of PR

This PR fixes a bug in StagingCommitter, which leaks the staging uploads directory, by deleting the directory in StagingCommitter#cleanup().

How was this patch tested?

Ran the integration tests in org.apache.hadoop.fs.s3a.commit.staging.integration against a LocalStack instance running on my laptop.

    <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.ap-south-1.localhost.localstack.cloud:4566</value>
    </property>

    <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
    </property>
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
[INFO] Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.196 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.62 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
[WARNING] Tests run: 24, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 26.316 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.166 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO] 
[INFO] Results:
[INFO] 
[WARNING] Tests run: 74, Failures: 0, Errors: 0, Skipped: 1

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

Comment on lines -559 to -565
Path attemptPath = getJobAttemptPath(context);
ignoreIOExceptions(LOG,
"Deleting Job attempt Path", attemptPath.toString(),
() -> deleteWithWarning(
getJobAttemptFileSystem(context),
attemptPath,
true));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these as attemptPath here is a sub dir under ${UUID}/staging-uploads, which has already been deleted in deleteStagingUploadsParentDirectory().

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 21s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 37s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 33s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 38s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 13s trunk passed
+1 💚 shadedclient 33m 39s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 22s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 27s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 shadedclient 33m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
130m 15s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5818/1/artifact/out/Dockerfile
GITHUB PR #5818
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux aad5e097edc1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 47aefe5
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5818/1/testReport/
Max. process+thread count 699 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5818/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect! nice tests in particular.

return job;
}

public JobContext getJContext() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use JobContext and TaskContext in getters, even if field names aren't as informative

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i'm not worrying about this; will merge as is.

@steveloughran steveloughran merged commit e3683a9 into apache:trunk Jul 8, 2023
1 check passed
asfgit pushed a commit that referenced this pull request Jul 8, 2023
@hdaikoku
Copy link
Contributor Author

hdaikoku commented Jul 8, 2023

Thank you for the review 🙏

@steveloughran
Copy link
Contributor

oh, thank for a lovely little patch!

jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants