
HADOOP-16202. Enhanced openFile() -branch-3.3 backport #4238

Conversation

steveloughran
Contributor

Description of PR

Backport of HADOOP-16202. Enhanced openFile() to branch-3.3, plus a couple of other cherrypicks from trunk to ease the backporting.

If Yetus is happy I will merge the entire sequence in as the ordered chain of commits.

How was this patch tested?

Cloud store testing in progress against AWS London and Azure Cardiff.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

sumangala-patki and others added 5 commits April 27, 2022 12:34
…1)

This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policies, in order of preference.
 Standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
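The options listed above are passed through the openFile() builder. A minimal sketch of a caller using them, assuming Hadoop 3.3+ with this patch on the classpath (the bucket, path, and split bounds are hypothetical placeholders):

```java
import java.util.concurrent.CompletableFuture;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("s3a://example-bucket/data/part-0000.csv"); // hypothetical
    FileSystem fs = path.getFileSystem(conf);

    // Declare how the file will be read and which split this task owns;
    // connectors may use these hints to choose fetch/cache/seek policy.
    CompletableFuture<FSDataInputStream> future = fs.openFile(path)
        .opt("fs.option.openfile.read.policy", "sequential")
        .opt("fs.option.openfile.split.start", "0")
        .opt("fs.option.openfile.split.end", "1048576")
        .build();

    try (FSDataInputStream in = future.get()) {
      // process the split
    }
  }
}
```

The options are hints, so unknown keys passed via opt() are ignored by stores which do not understand them; must() would instead fail the open.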
…e#2584/2)

These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
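A sketch of the length hint described above: if the file's length is already known (for example from an earlier listing), declaring it lets S3A open the object without the HEAD probe. Variable names are illustrative, not from the patch:

```java
// FileStatus from an earlier listing; passing its length in
// fs.option.openfile.length lets S3A skip the HEAD request.
FileStatus st = fs.getFileStatus(path);
FSDataInputStream in = fs.openFile(path)
    .opt("fs.option.openfile.read.policy", "random")
    .opt("fs.option.openfile.length", Long.toString(st.getLen()))
    .build()
    .get();
```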
Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.

Contributed by Steve Loughran.

Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
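The call path in question, sketched with illustrative names: the FileStatus handed to withFileStatus() may come from any listing, and after this change abfs accepts it silently rather than warning that it is not its own VersionedFileStatus:

```java
// Any FileStatus may be passed; abfs no longer logs a warning
// when it is not a VersionedFileStatus.
FileStatus st = fs.getFileStatus(path);
FSDataInputStream in = fs.openFile(path)
    .withFileStatus(st)
    .build()
    .get();
```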
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 10m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ branch-3.3 Compile Tests _
+0 🆗 mvndep 14m 55s Maven dependency ordering for branch
+1 💚 mvninstall 26m 52s branch-3.3 passed
+1 💚 compile 18m 45s branch-3.3 passed
+1 💚 checkstyle 3m 21s branch-3.3 passed
+1 💚 mvnsite 10m 43s branch-3.3 passed
+1 💚 javadoc 9m 48s branch-3.3 passed
+1 💚 spotbugs 15m 21s branch-3.3 passed
+1 💚 shadedclient 27m 13s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for patch
+1 💚 mvninstall 6m 11s the patch passed
+1 💚 compile 18m 34s the patch passed
-1 ❌ javac 18m 34s /results-compile-javac-root.txt root generated 1 new + 1926 unchanged - 0 fixed = 1927 total (was 1926)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 12s root: The patch generated 0 new + 696 unchanged - 2 fixed = 696 total (was 698)
+1 💚 mvnsite 10m 49s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 1m 57s hadoop-common in the patch passed.
+1 💚 javadoc 1m 21s hadoop-yarn-common in the patch passed.
+1 💚 javadoc 0m 51s hadoop-mapreduce-client-core in the patch passed.
+1 💚 javadoc 0m 55s hadoop-mapreduce-client-app in the patch passed.
+1 💚 javadoc 0m 53s hadoop-mapreduce-examples in the patch passed.
+1 💚 javadoc 0m 52s hadoop-streaming in the patch passed.
+1 💚 javadoc 0m 51s hadoop-distcp in the patch passed.
+1 💚 javadoc 0m 57s hadoop-tools_hadoop-aws generated 0 new + 38 unchanged - 1 fixed = 38 total (was 39)
+1 💚 javadoc 0m 55s hadoop-azure in the patch passed.
+1 💚 spotbugs 17m 44s the patch passed
+1 💚 shadedclient 28m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 46s hadoop-common in the patch passed.
+1 💚 unit 5m 3s hadoop-yarn-common in the patch passed.
+1 💚 unit 6m 30s hadoop-mapreduce-client-core in the patch passed.
+1 💚 unit 8m 47s hadoop-mapreduce-client-app in the patch passed.
+1 💚 unit 1m 13s hadoop-mapreduce-examples in the patch passed.
+1 💚 unit 6m 53s hadoop-streaming in the patch passed.
+1 💚 unit 15m 40s hadoop-distcp in the patch passed.
+1 💚 unit 2m 39s hadoop-aws in the patch passed.
+1 💚 unit 2m 38s hadoop-azure in the patch passed.
+1 💚 asflicense 1m 18s The patch does not generate ASF License warnings.
Total: 304m 3s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4238/1/artifact/out/Dockerfile
GITHUB PR #4238
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint xml
uname Linux 8bcdd5b97190 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.3 / 74d3b18
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4238/1/testReport/
Max. process+thread count 2240 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-examples hadoop-tools/hadoop-streaming hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4238/1/console
versions git=2.17.1 maven=3.6.0 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

merged locally; closing
