
HADOOP-16202. Enhance openFile() for better read performance against object stores #2584

Conversation


@steveloughran steveloughran commented Jan 4, 2021

This PR

  • declares a standard set of openFile() parameters for setting length, seek/read policy
    and split start/end.
  • supports all of these in s3a
  • as well as the threshold which triggers an async drain of the s3a stream in a seek,
    fs.s3a.input.async.drain.threshold
  • documents all of this
  • fixes up all places in the code which read a whole file (fs shell, distcp etc.)
    to set their read policy to whole-file.

As a result of this, on the s3a client you can open files without needing
a HEAD request, or even a FileStatus: just the length.
And if, in a cluster, you set the default read policy to random,
shell and distcp read performance doesn't collapse.
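A minimal sketch of what that looks like from an application, assuming the standard option names added in this PR (fs.option.openfile.read.policy, fs.option.openfile.length, fs.option.openfile.split.start/end); whether a given filesystem acts on each hint is up to the implementation, and the variable names are illustrative placeholders:

```java
// Hedged sketch: open an s3a file for random IO without a HEAD request,
// supplying the length and split bounds already known from a listing/split.
CompletableFuture<FSDataInputStream> future = fs.openFile(path)
    .opt("fs.option.openfile.read.policy", "random")   // seek/read policy hint
    .opt("fs.option.openfile.length", knownLength)     // lets s3a skip the HEAD probe
    .opt("fs.option.openfile.split.start", splitStart)
    .opt("fs.option.openfile.split.end", splitEnd)
    .build();
FSDataInputStream in = FutureIO.awaitFuture(future);    // or future.get()
```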

@steveloughran
Contributor Author

MR client not compiling; not seeing useful information from yetus.

@steveloughran steveloughran force-pushed the s3/HADOOP-16202-enhance-openfile branch from 3b3a4f8 to 41aa610 on January 8, 2021 16:30
@steveloughran
Contributor Author

style

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:210:    FSBuilder<T, U> propagateOptions(: 'FSBuilder' has incorrect indentation level 4, expected level should be 6. [Indentation]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:182:import static org.apache.hadoop.fs.impl.AbstractFSBuilderImpl.rejectUnknownMandatoryKeys;:15: Unused import - org.apache.hadoop.fs.impl.AbstractFSBuilderImpl.rejectUnknownMandatoryKeys. [UnusedImports]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/S3AOpenFileOperation.java:220:        fileStatus = createStatus(path, fileLength, blockSize);: 'if' child has incorrect indentation level 8, expected level should be 6. [Indentation]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/S3AOpenFileOperation.java:221:      }: 'if rcurly' has incorrect indentation level 6, expected level should be 4. [Indentation]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/S3AOpenFileOperation.java:323:    private OpenFileInformation(:13: More than 7 parameters (found 10). [ParameterNumber]
./hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestS3AOpenFileOperation.java:119:    assertOpenFile(: 'assertOpenFile' has incorrect indentation level 4, expected level should be 6. [Indentation]

javadocs

[ERROR]                      ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:126: error: bad use of '>'
[ERROR]    *   fs.example.s3a.option => s3a:option
[ERROR]                               ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:127: error: bad use of '>'
[ERROR]    *   fs.example.fs.io.policy => s3a.io.policy
[ERROR]                                 ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:128: error: bad use of '>'
[ERROR]    *   fs.example.something => something
[ERROR]                              ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:154: error: bad use of '>'
[ERROR]    *   fs.example.s3a.option => s3a:option
[ERROR]                               ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:155: error: bad use of '>'
[ERROR]    *   fs.example.fs.io.policy => s3a.io.policy
[ERROR]                                 ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FutureIOSupport.java:156: error: bad use of '>'
[ERROR]    *   fs.example.something => something
[ERROR]                              ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:197: error: bad use of '>'
[ERROR]    *   fs.example.s3a.option => s3a:option
[ERROR]                               ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:198: error: bad use of '>'
[ERROR]    *   fs.example.fs.io.policy => s3a.io.policy
[ERROR]                                 ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:199: error: bad use of '>'
[ERROR]    *   fs.example.something => something
[ERROR]                              ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:227: error: bad use of '>'
[ERROR]    *   fs.example.s3a.option => s3a:option
[ERROR]                               ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:228: error: bad use of '>'
[ERROR]    *   fs.example.fs.io.policy => s3a.io.policy
[ERROR]                                 ^
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2584/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FutureIO.java:229: error: bad use of '>'
[ERROR]    *   fs.example.something => something

@steveloughran
Contributor Author

steveloughran commented Jan 13, 2021

I'm thinking we should be more ambitious in read policy than just "fadvise", because we can then use it as a declaration for the input streams to tune all their params, e.g. buffer sizing and whether to do async prefetch.

Then we could allow stores to support not just seek policies, but declarations of what you are planning to read, e.g. "parquet-bytebuffer", to mean "I'm reading parquet files through the bytebuffer positioned read API".

openFile("s3a://datasets/set1/input.parquet).
  opt("fs.openfile.policy, "parquet-vectored, parquet,random")
 .build().get()

For example, opt("fs.openfile.read.policy", "parquet-vectored, parquet, random") would mean "optimise for parquet for vectored IO, then generic vectored IO, then generic random IO". Store implementors would get to make their own decisions as to what to set based on profiling etc. We'd need the applications to set the policy on openFile(), so they would need to know what names to use. That we can discuss with them, maybe by predefining some options which may be supported.

@steveloughran steveloughran force-pushed the s3/HADOOP-16202-enhance-openfile branch from 41aa610 to 0214072 on January 13, 2021 18:51
@hadoop-yetus

This comment has been minimized.

@apache apache deleted a comment from hadoop-yetus Jan 15, 2021
@steveloughran
Contributor Author

@ThomasMarquardt could you take a look at this?

  • I've updated the docs as suggested
  • proposed making the policy broader than just seek policy, allowing stores to turn on whatever other tuning options they have, especially for file types they've profiled.
    The goal is that, rather than setting cluster-wide options which work well for some data types but are suboptimal for others, the app passes more information down.

The weakness there is that with multiple libraries working with Parquet data (Spark, parquet.jar, Iceberg, Impala), it's not enough to declare the format. You'd really need to declare your app and version.

@ThomasMarquardt ThomasMarquardt left a comment

My biggest concern, or point of confusion, is the must() vs opt() thing. It feels too complex to me. Instead, I think we should stick with opt() and allow filesystems to ignore options they don't understand. I guess I'm thinking along the lines that the app passes options as hints to the filesystem in the hope that the hints might help, but things will still work if the filesystem does not understand the app's hint.

Comment on lines 887 to +892
parameters.getMandatoryKeys(),
Collections.emptySet(),
FS_OPTION_OPENFILE_STANDARD_OPTIONS,
Contributor

With this change, ChecksumFileSystem.openFileWithOptions has the same implementation as the base class, so you can remove this override.

* @see #opt(String, String)
*/
B opt(@Nonnull String key, long value);

Contributor

We could use properties (getter and setter) for the standard options. We'd have one for buffer size, one for file length, etc.

Contributor Author

I agree, but as well as it already being out there, I want to let applications compile against any version of Hadoop with the API, even if a specific FS option isn't available, alongside allowing for custom FS options. As an example, I have a PoC of a Parquet library which uses this and is designed to compile against 3.3.x. (I haven't published it; I use it to see how this stuff could be added to a library. It highlights what is broken right now: S3A openFile().withFileStatus() fails if used via Hive, because Hive wraps the FileStatus in a different type from S3AFileStatus.)

@@ -4655,7 +4656,7 @@ public FutureDataInputStreamBuilder openFile(PathHandle pathHandle)
final OpenFileParameters parameters) throws IOException {
AbstractFSBuilderImpl.rejectUnknownMandatoryKeys(
Contributor

This rejectUnknownMandatoryKeys function would not be necessary if the mandatory keys were strongly typed fields.

Contributor Author

I agree, but we need flexibility of linking, the ability of clients to work with any FS implementation, etc.

* @param deleteSource delete the source?
* @param overwrite overwrite files at destination?
* @param conf configuration to use when opening files
* @return true if the operation succeeded.
Contributor

What is the benefit of returning true on success and throwing on a failure, as opposed to returning void and throwing on failure?

Contributor Author

None. It's just always been that way. And yes, it is wrong.

/**
* Prefix for all standard filesystem options: {@value}.
*/
public static final String FILESYSTEM_OPTION = "fs.option.";
Contributor

I think you could remove this and define the constants below with a string literal, or at least make it private.

Contributor Author

made it private

try {
// Always do sequential reads.
try (FSDataInputStream in = item.openFile(
FS_OPTION_OPENFILE_READ_POLICY_NORMAL)) {
Contributor

The comment above says do sequential reads, but then the sequential option isn't passed here, but seems like it should be.

Contributor Author

fixed

.withFileStatus(statusFromListing)
.build()
.get();
```

Here the seek policy of `random` has been specified,
Contributor

Now the terminology is "read policy".

relevant for object stores where the probes for existence are expensive, and,
even with an asynchronous open, may be considered needless.

### <a name="openfile(pathhandle)"></a> `FutureDataInputStreamBuilder openFile(PathHandle)`
Contributor

Consider combining the docs for openFile(Path) and openFile(PathHandle) to avoid repetition.

Comment on lines 150 to 154
If set as an `opt()` parameter, unsupported "standard" options MUST be ignored,
as MUST unrecognized standard options.

If set as an `must()` parameter, unsupported "standard" options MUST be ignored.
unrecognized standard options MUST be rejected.
Contributor

I've read this a few times and it is not clear. I think you mean to say:

An option set via opt() that is not supported by the filesystem must be ignored. On the other hand, an option set via must() that is not supported by the filesystem must result in an exception.

Comment on lines 162 to 163
This means that it is not a requirement for the stores to actually read the
the read policy or file length values and use them when opening files.
Contributor

Now I'm very confused. Sounds like options set via must() can also be ignored, so why do we have opt() and must()?

I think we should keep this simple, and only have opt(). Get rid of must(). If a file system does not support the option, it should be ignored. Nice and simple.

Contributor Author

I'm trying to say that the read and length policies must be understood, but may not be used, if the implementation chooses not to.

@steveloughran
Contributor Author

  1. we've already shipped that
  2. it's in createFile too

Imagine in future you want to do something that is more than a hint. Case in point: the SQL API in AWS S3. There's no way an app could issue SQL commands which include transforming the response without knowing whether or not they are processed; hence must().

How about I clarify this in the docs and recommend use of opt() except in such special cases?

@ThomasMarquardt
Contributor

OK, so must() has already shipped, in that case sure, it should stay. I still have a question. If I pass an option via must(), can it be ignored by the implementation? I read your documentation, specifically, "This means that it is not a requirement for the stores to actually read the the read policy or file length values and use them when opening files." To me, this says that options passed via must() can be ignored. If this is true, then what is the difference between must() and opt()? Is it that must() can result in an unsupported exception, but opt() never results in an unsupported exception?

@steveloughran
Contributor Author

Correct.
must() == raise an error if not recognised.
opt() == entirely optional.

Note that createFile() has the same API/rules/common builder codebase. I hope to add some options there for the stores to say "skip all probes for paths existing, being a dir, having a parent..."; the goal is maximum-performance writes on paths where the writer knows it is building up a dir tree from scratch, or using unique names, in particular flink/spark checkpoints.
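As a hedged illustration of that distinction (the builder calls and option names are the ones defined in this PR; which options a given store actually honours remains implementation-specific, and exactly where the rejection surfaces is a detail of each filesystem):

```java
// opt(): a pure hint. A filesystem that does not recognise the key ignores it.
fs.openFile(path)
    .opt("fs.option.openfile.read.policy", "sequential")
    .build().get();

// must(): the key has to be recognised; an unknown key fails the open rather
// than being silently dropped. The motivating case above is something like an
// S3 Select/SQL request, whose silent loss would corrupt the results.
fs.openFile(path)
    .must("fs.option.openfile.read.policy", "whole-file")
    .build().get();
```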

@steveloughran
Contributor Author

Thomas, to clarify a bit more:

Yes, an FS can choose to ignore an option, but it must recognise the option and so make a conscious decision that "this doesn't matter".

  • so with the standard length/read policy options, an FS can conclude "I see that and don't support it" and just ignore it, a decision made on an option-by-option basis.
  • if something unrecognised comes in as .must(), the FS must raise an exception: "I don't recognise that".
  • if something it does recognise comes in as a .must() which it knows it doesn't support and which it MUST be able to honour, then it MUST raise an exception.

Example: vectored IO will only work if the FS supports seek, which ftp doesn't. Assuming we add an option to say "I want vectored IO", then all filesystems will get this out of the box as a well-known and supported feature, but ftp will need some explicit code to say "fail if vectored IO is requested".
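A rough sketch of that store-side decision logic, written as pseudo-Java rather than the actual Hadoop helper methods (parameters.getMandatoryKeys() and FS_OPTION_OPENFILE_STANDARD_OPTIONS appear in the diff excerpts above; the helper isStoreSpecificOptionThisFsKnows is purely hypothetical):

```java
// For every key the caller marked as mandatory via must():
for (String key : parameters.getMandatoryKeys()) {
  if (FS_OPTION_OPENFILE_STANDARD_OPTIONS.contains(key)
      || isStoreSpecificOptionThisFsKnows(key)) {       // hypothetical helper
    // Recognised: the store may still decide the option doesn't matter to it,
    // unless it is a capability it knows it cannot honour (e.g. vectored IO
    // on ftp), in which case it must fail the open instead.
  } else {
    throw new IllegalArgumentException("Unknown mandatory key " + key);
  }
}
// Keys passed via opt() are never validated: unknown ones are simply ignored.
```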

@steveloughran steveloughran added the fs and fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint) labels Jan 22, 2021
@steveloughran steveloughran force-pushed the s3/HADOOP-16202-enhance-openfile branch from 26dacf1 to cac2661 on February 1, 2021 18:27
@steveloughran
Contributor Author

Rebased to fix compile problems; the final patch is the one with changes since Thomas's last review:

  • move all the text on options into the fsdatainputstreambuilder.md file
  • try to define the validation of must/opt in pseudo-python
  • make sure the doc is consistent in saying FNFE and permission issues MUST be delayed until future.get() and MAY be delayed until the first actual read. This is to give object stores the maximum time for async probes. Example: you could initiate a HEAD request in build() but not block for its completion until read()

S3A added some extra tuning of the input stream/openFile logic, so the stream no longer gets access to the S3 client, only callbacks.

Testing: in progress

@steveloughran
Contributor Author

@mehakmeet thanks, yes, sounds like it. file a JIRA 😁

@dannycjones dannycjones left a comment

Disclaimer: I've mainly been focusing on S3A, so I'm new to the common libs here.

Looks good!

I have a few comments on documentation since I was referring to it a lot. I can absolutely see the benefits of this API for future enhancements like prefetching.

I haven't run any tests. If requester pays has issues with endpoint, feel free to assign the JIRA to me.

Comment on lines +200 to +206
@Override
public B opt(@Nonnull final String key, final long value) {
mandatoryKeys.remove(key);
optionalKeys.add(key);
options.setLong(key, value);
return getThisBuilder();
}
Contributor

JavaDoc?

  /**
   * Set optional long parameter for the Builder.
   *
   * @see #opt(String, String)
   */

@@ -53,6 +52,7 @@ private FutureIOSupport() {
/**
* Given a future, evaluate it. Raised exceptions are
* extracted and handled.
* See {@link FutureIO#awaitFuture(Future, long, TimeUnit)}.
Contributor

I think we want to reference the awaitFuture with only a future as arg?

Suggested change
* See {@link FutureIO#awaitFuture(Future, long, TimeUnit)}.
* See {@link FutureIO#awaitFuture(Future)}.

Contributor Author

No, because this is the (deprecated, internal) FutureIOSupport class, which forwards to the public FutureIO.awaitFuture(future, timeout, unit) in o.a.h.util.functional.
Looking at that method though, I can see it is incompletely javadoc'd, so I've updated it.

@@ -136,56 +129,39 @@ private FutureIOSupport() {
* @param <U> type of builder
* @return the builder passed in.
*/
@Deprecated
Contributor

Why deprecate this method when other methods promoted to FutureIO are happy without a deprecated flag?

Should we encourage Hadoop developers to move to FutureIO once promoted from FutureIOSupport?

Contributor Author

FutureIOSupport is private fs.impl, tagged private/unstable; only FS implementations should be using it. I've just made sure there are no refs in our own code, so it's only needed to ensure that third-party code still links.

Comment on lines +456 to +457
.opt(FS_OPTION_OPENFILE_LENGTH,
srcStatus.getLen()) // file length hint for object stores
Contributor

When should we use FS_OPTION_OPENFILE_LENGTH option vs. .withFileStatus(status)?
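For reference, the two calls being contrasted, both taken from elsewhere in this PR; a hedged side-by-side sketch, not an answer to the question:

```java
// 1. Pass the whole FileStatus, e.g. one returned from a directory listing.
fs.openFile(path)
    .withFileStatus(statusFromListing)
    .build().get();

// 2. Pass only the length as a hint, e.g. when no FileStatus of the right
//    type is to hand (cf. the Hive/S3AFileStatus wrapping issue mentioned above).
fs.openFile(path)
    .opt(FS_OPTION_OPENFILE_LENGTH, srcStatus.getLen())
    .build().get();
```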

Comment on lines 73 to 76
// this is so stream draining is always blocking, allowing
// assertions to be safely made without worrying
// about any race conditions
conf.setInt(ASYNC_DRAIN_THRESHOLD, 128_000);
Contributor

Hoping to better understand why the change is needed - what did the race conditions look like?

Contributor

Only after posting this has it clicked - we just want to make sure any assertions on the stream are completed after drain? Makes sense.

Integer.MAX_VALUE might make it more explicit - I was wondering the significance of 128_000.

Contributor Author

done

*/
@Deprecated
Contributor

Is this method missing an @Override tag? FileSystem#getDefaultBlockSize() is deprecated

if (fileStatus == null) {
// we check here for the passed in status
// being a directory
if (fileStatus.isDirectory()) {
throw new FileNotFoundException(path.toString() + " is a directory");
}
} else {
Contributor

This comment is no longer accurate, I think? It belongs with line 4893 (or we can drop the comment).

// we check here for the passed in status
// being a directory

Comment on lines +283 to +285
.withAsyncDrainThreshold(
options.getLong(ASYNC_DRAIN_THRESHOLD,
defaultReadAhead))
Contributor

Default should be defaultAsyncDrainThreshold, not defaultReadAhead?

Comment on lines 173 to 176
@Test
public void testOpenFileLongerLength() throws Throwable {
// do a second read with the length declared as short.
// we now expect the bytes read to be shorter.
Contributor

Comment needs updating for this test?

Contributor Author

reviewed and updated all comments

// the stream gets opened during read
long readLen = verifyMetrics(() ->
readStream(in),
always(NO_IO),
Contributor

I didn't understand here - what do we mean by NO_IO? We are reading all of the stream, right?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 19 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 52s Maven dependency ordering for branch
+1 💚 mvninstall 27m 46s trunk passed
+1 💚 compile 24m 52s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 21m 21s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 5s trunk passed
+1 💚 mvnsite 7m 44s trunk passed
+1 💚 javadoc 6m 8s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 6m 31s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 12m 12s trunk passed
+1 💚 shadedclient 24m 0s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 24m 22s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 4m 53s the patch passed
+1 💚 compile 24m 14s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
-1 ❌ javac 24m 14s /results-compile-javac-root-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt root-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 generated 1 new + 1810 unchanged - 0 fixed = 1811 total (was 1810)
+1 💚 compile 21m 33s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
-1 ❌ javac 21m 33s /results-compile-javac-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu120.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu120.04-b07 generated 1 new + 1684 unchanged - 0 fixed = 1685 total (was 1684)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 59s root: The patch generated 0 new + 847 unchanged - 2 fixed = 847 total (was 849)
+1 💚 mvnsite 7m 39s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 6m 1s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 37s hadoop-common in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 53s hadoop-yarn-common in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 32s hadoop-mapreduce-client-core in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 34s hadoop-mapreduce-client-app in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 31s hadoop-distcp in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 33s hadoop-mapreduce-examples in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 32s hadoop-streaming in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 39s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu120.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu120.04-b07 generated 0 new + 38 unchanged - 1 fixed = 38 total (was 39)
+1 💚 javadoc 0m 36s hadoop-azure in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 spotbugs 13m 40s the patch passed
+1 💚 shadedclient 24m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 43s hadoop-common in the patch passed.
+1 💚 unit 4m 45s hadoop-yarn-common in the patch passed.
+1 💚 unit 6m 19s hadoop-mapreduce-client-core in the patch passed.
+1 💚 unit 8m 26s hadoop-mapreduce-client-app in the patch passed.
+1 💚 unit 45m 45s hadoop-distcp in the patch passed.
+1 💚 unit 0m 51s hadoop-mapreduce-examples in the patch passed.
+1 💚 unit 6m 57s hadoop-streaming in the patch passed.
+1 💚 unit 2m 36s hadoop-aws in the patch passed.
+1 💚 unit 2m 11s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
365m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/22/artifact/out/Dockerfile
GITHUB PR #2584
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux ec4ef1f6463e 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e7b29ef
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/22/testReport/
Max. process+thread count 2202 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-mapreduce-project/hadoop-mapreduce-examples hadoop-tools/hadoop-streaming hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/22/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Feedback from dannycjones.

Change-Id: I546f28411c2475e1254b259c7e0734cc868ea9f0
Change-Id: If30684e9b4d39e9d1ba9cfdf50963b655c20144f
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 21 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 27s Maven dependency ordering for branch
+1 💚 mvninstall 28m 19s trunk passed
+1 💚 compile 25m 21s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 21m 58s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 37s trunk passed
+1 💚 mvnsite 11m 18s trunk passed
+1 💚 javadoc 9m 28s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 9m 56s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 15m 41s trunk passed
+1 💚 shadedclient 24m 34s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 25m 1s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 5m 43s the patch passed
+1 💚 compile 24m 18s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
-1 ❌ javac 24m 18s /results-compile-javac-root-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt root-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 generated 1 new + 1810 unchanged - 0 fixed = 1811 total (was 1810)
+1 💚 compile 21m 56s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
-1 ❌ javac 21m 56s /results-compile-javac-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu120.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu120.04-b07 generated 1 new + 1684 unchanged - 0 fixed = 1685 total (was 1684)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 28s /results-checkstyle-root.txt root: The patch generated 1 new + 691 unchanged - 2 fixed = 692 total (was 693)
+1 💚 mvnsite 11m 7s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 9m 29s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 0s hadoop-common in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 1m 16s hadoop-yarn-common in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 55s hadoop-mapreduce-client-core in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 58s hadoop-mapreduce-client-app in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 54s hadoop-distcp in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 57s hadoop-mapreduce-examples in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 0m 55s hadoop-streaming in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 javadoc 1m 2s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu120.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu120.04-b07 generated 0 new + 38 unchanged - 1 fixed = 38 total (was 39)
+1 💚 javadoc 1m 0s hadoop-azure in the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
+1 💚 spotbugs 17m 0s the patch passed
+1 💚 shadedclient 25m 11s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 12s hadoop-common in the patch passed.
+1 💚 unit 5m 11s hadoop-yarn-common in the patch passed.
+1 💚 unit 6m 37s hadoop-mapreduce-client-core in the patch passed.
+1 💚 unit 8m 54s hadoop-mapreduce-client-app in the patch passed.
+1 💚 unit 46m 17s hadoop-distcp in the patch passed.
+1 💚 unit 1m 16s hadoop-mapreduce-examples in the patch passed.
+1 💚 unit 7m 25s hadoop-streaming in the patch passed.
+1 💚 unit 3m 1s hadoop-aws in the patch passed.
+1 💚 unit 2m 35s hadoop-azure in the patch passed.
+1 💚 asflicense 1m 21s The patch does not generate ASF License warnings.
404m 50s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/24/artifact/out/Dockerfile
GITHUB PR #2584
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux 8a3c71cc6719 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / bf8e1d4
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/24/testReport/
Max. process+thread count 1375 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-tools/hadoop-distcp hadoop-mapreduce-project/hadoop-mapreduce-examples hadoop-tools/hadoop-streaming hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2584/24/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Apr 22, 2022
Change-Id: I64ac45369e4a6e1e9cd651b01acd380d258782fb
@mukund-thakur mukund-thakur left a comment

LGTM +1
Ran the AWS tests as well; didn't see any new failure.

asfgit pushed a commit that referenced this pull request Apr 24, 2022
This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
asfgit pushed a commit that referenced this pull request Apr 24, 2022
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
asfgit pushed a commit that referenced this pull request Apr 24, 2022
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
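As a hedged aside on that commit: the threshold it describes could be raised cluster-wide along these lines (the property name and its 16000-byte default come from the commit message; the value below is only an example, not a recommendation):

```java
// Only drain the remaining HTTP stream asynchronously when more than 1 MB is left.
Configuration conf = new Configuration();
conf.setLong("fs.s3a.input.async.drain.threshold", 1_048_576);
FileSystem fs = FileSystem.get(new URI("s3a://bucket/"), conf);
```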
asfgit pushed a commit that referenced this pull request Apr 24, 2022
Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.

Contributed by Steve Loughran.

Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Apr 27, 2022
…1)

This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Apr 27, 2022
…e#2584/2)

These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Apr 27, 2022
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Apr 27, 2022
Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.

Contributed by Steve Loughran.

Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
asfgit pushed a commit that referenced this pull request Apr 27, 2022
This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
asfgit pushed a commit that referenced this pull request Apr 27, 2022
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
asfgit pushed a commit that referenced this pull request Apr 27, 2022
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
asfgit pushed a commit that referenced this pull request Apr 27, 2022
Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.

Contributed by Steve Loughran.

Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
@apache apache deleted a comment from hadoop-yetus Apr 29, 2022
@steveloughran
Contributor Author

merged

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…1)

This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…e#2584/2)

These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.

Contributed by Steve Loughran.

Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
@steveloughran steveloughran deleted the s3/HADOOP-16202-enhance-openfile branch December 15, 2022 14:18
Labels
fs, fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint)
Projects
None yet
6 participants