Skip to content

Conversation

@cbevard1
Copy link
Contributor

@cbevard1 cbevard1 commented Jun 23, 2023

Description of PR

This PR improves the ability to recovery partial S3A uploads.

Changed the handleSyncableInvocation() to call flush() after warning that the syncable API isn't supported. This mirrors the downgradeSyncable behavior of BufferedIOStatisticsOutputStream and RawLocalFileSystem.
Changed the DiskBlock temporary file names to include the S3 key to allow partial uploads to be recovered.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 44s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 16s trunk passed
+1 💚 shadedclient 24m 27s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 28s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 8 new + 5 unchanged - 0 fixed = 13 total (was 5)
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 shadedclient 24m 21s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
104m 3s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5771/1/artifact/out/Dockerfile
GITHUB PR #5771
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1304a7468aeb 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 95138e0
Default Java Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5771/1/testReport/
Max. process+thread count 608 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5771/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cbevard1
Copy link
Contributor Author

cbevard1 commented Jul 5, 2023

@steveloughran Can you take a look at this new PR when you get a chance?

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented; most on the tests. sorry...but I spend more time waiting for tests or debugging failures than anything else, so I'm very fussy there.

* Buffer blocks to disk.
*/
static class DiskBlockFactory extends BlockFactory {
private static final String ESCAPED_FORWARD_SLASH = "EFS";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think I might prefer something more distinguishable from text, e.g "FS" and "BS" so its easier to read in a dir listing

{
// avoid validating multiple times.
// if the jvm running is version 9+ then defer to java.io.File validation implementation
if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go in org.apache.hadoop.util.Shell; there's already something similar. will need a test somehow.

int maxRandomSuffixLen = 19; // Long.toUnsignedString(Long.MAX_VALUE).length()

String name;
int nameMax = 255; // unable to access the underlying FS directly, so assume 255
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a constant, e.g ASSUMED_MAX_FILENAME

int nameMax = 255; // unable to access the underlying FS directly, so assume 255
int excess = prefixLength + maxRandomSuffixLen + suffixLength - nameMax;

// shorten the prefix length if the file name exceeds 255 chars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and replace explicit size with "too long"

// shorten the prefix length if the file name exceeds 255 chars
if (excess > 0) {
// Attempt to shorten the prefix length to no less than 3
prefixLength = shortenSubName(prefixLength, excess, 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, make a constant and use in both places

mock(S3ADataBlocks.BlockFactory.class);
long blockSize = Constants.DEFAULT_MULTIPART_SIZE;
WriteOperationHelper oHelper = mock(WriteOperationHelper.class);
AuditSpan auditSpan = mock(AuditSpan.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use org.apache.hadoop.fs.s3a.audit.impl.NoopSpan here; one less thing to mock.

import org.junit.Test;
import org.junit.rules.Timeout;

import java.io.File;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ordering...these should come first.


@Before
public void init() throws IOException {
Files.createDirectories(TEMP_DIR.toPath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use S3ATestUtils code here for handling of parallel test cases and ease of future maintenance

Configuration conf = new Configuration(false)
S3ATestUtils.prepareTestConfiguration(conf)
conf.get(BUFFER_DIR)

public void testSafeCreateTempFile() throws Throwable {
// fitting name isn't changed
File noChangesRequired = S3AFileSystem.safeCreateTempFile("noChangesRequired", ".tmp", TEMP_DIR);
assertTrue(noChangesRequired.exists());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad news, use AssertJ. more verbose but the assertions it raised include details about the failure. which is what I insist on, sorry. If a yetus build fails, i want more than a line number. if this isn't done automatically, used .describedAs() which takes String.format() string + varargs list

assertTrue(noChangesRequiredName.endsWith(".tmp"));

// a long prefix should be truncated
File excessivelyLongPrefix = S3AFileSystem.safeCreateTempFile(longStr, ".tmp", TEMP_DIR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each of these should be split into their own test method so they can fail independently. I know, we ignore that in ITests, but that is because they're integration tests with longer setup overhead

@github-actions
Copy link
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 19, 2025
@github-actions github-actions bot closed this Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants