diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 2b5014b7b234d..2da45be7d6f98 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,6 +1,19 @@
-## NOTICE
+
+
+### Description of PR
+
+
+### How was this patch tested?
+
+
+### For code changes:
+
+- [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
+- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
+- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
+- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
-Please create an issue in ASF JIRA before opening a pull request,
-and you need to set the title of the pull request which starts with
-the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
-For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
diff --git a/BUILDING.txt b/BUILDING.txt
index c34946aa993b7..5f40a0d7dc3b3 100644
--- a/BUILDING.txt
+++ b/BUILDING.txt
@@ -8,10 +8,10 @@ Requirements:
* Maven 3.3 or later
* Boost 1.72 (if compiling native code)
* Protocol Buffers 3.7.1 (if compiling native code)
-* CMake 3.1 or newer (if compiling native code)
+* CMake 3.19 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* Cyrus SASL devel (if compiling native code)
-* One of the compilers that support thread_local storage: GCC 4.8.1 or later, Visual Studio,
+* One of the compilers that support thread_local storage: GCC 9.3.0 or later, Visual Studio,
Clang (community version), Clang (version for iOS 9 and later) (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
@@ -51,39 +51,47 @@ Known issues:
and run your IDE and Docker etc inside that VM.
----------------------------------------------------------------------------------
-Installing required packages for clean install of Ubuntu 14.04 LTS Desktop:
+Installing required packages for a clean install of Ubuntu 18.04 LTS Desktop.
+(For Ubuntu 20.04, the gcc/g++ and cmake bundled with Ubuntu can be used.
+Refer to dev-support/docker/Dockerfile):
-* Oracle JDK 1.8 (preferred)
- $ sudo apt-get purge openjdk*
- $ sudo apt-get install software-properties-common
- $ sudo add-apt-repository ppa:webupd8team/java
+* OpenJDK 1.8
$ sudo apt-get update
- $ sudo apt-get install oracle-java8-installer
+ $ sudo apt-get -y install openjdk-8-jdk
* Maven
$ sudo apt-get -y install maven
* Native libraries
$ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev libsasl2-dev
+* GCC 9.3.0
+ $ sudo apt-get -y install software-properties-common
+ $ sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
+ $ sudo apt-get update
+ $ sudo apt-get -y install g++-9 gcc-9
+ $ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-9
+* CMake 3.19
+ $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz
+ $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0
+ $ ./bootstrap
+ $ make -j$(nproc)
+ $ sudo make install
* Protocol Buffers 3.7.1 (required to build native code)
- $ mkdir -p /opt/protobuf-3.7-src \
- && curl -L -s -S \
- https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz \
- -o /opt/protobuf-3.7.1.tar.gz \
- && tar xzf /opt/protobuf-3.7.1.tar.gz --strip-components 1 -C /opt/protobuf-3.7-src \
- && cd /opt/protobuf-3.7-src \
- && ./configure\
- && make install \
- && rm -rf /opt/protobuf-3.7-src
+ $ curl -L -s -S https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz -o protobuf-3.7.1.tar.gz
+ $ mkdir protobuf-3.7-src
+ $ tar xzf protobuf-3.7.1.tar.gz --strip-components 1 -C protobuf-3.7-src && cd protobuf-3.7-src
+ $ ./configure
+ $ make -j$(nproc)
+ $ sudo make install
* Boost
- $ curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2 \
- && tar --bzip2 -xf boost_1_72_0.tar.bz2 \
- && cd boost_1_72_0 \
- && ./bootstrap.sh --prefix=/usr/ \
- && ./b2 --without-python install
+ $ curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2
+ $ tar --bzip2 -xf boost_1_72_0.tar.bz2 && cd boost_1_72_0
+ $ ./bootstrap.sh --prefix=/usr/
+ $ ./b2 --without-python
+ $ sudo ./b2 --without-python install
Optional packages:
* Snappy compression (only used for hadoop-mapreduce-client-nativetask)
- $ sudo apt-get install snappy libsnappy-dev
+ $ sudo apt-get install libsnappy-dev
* Intel ISA-L library for erasure coding
Please refer to https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
(OR https://github.com/01org/isa-l)
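A quick sanity check after installing the toolchain above (a minimal sketch,
assuming the default installation prefixes were used; the expected versions are
simply the minimums from the Requirements section):

  $ gcc --version      # expect 9.3.0 or later
  $ cmake --version    # expect 3.19 or later
  $ protoc --version   # expect libprotoc 3.7.1
  $ mvn -version       # expect Apache Maven 3.3 or later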
@@ -103,7 +111,7 @@ Maven main modules:
- hadoop-project (Parent POM for all Hadoop Maven modules. )
(All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- - hadoop-annotations (Generates the Hadoop doclet used to generated the Javadocs)
+ - hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-maven-plugins (Maven plugins used in project)
- hadoop-build-tools (Build tools like checkstyle, etc.)
@@ -120,7 +128,7 @@ Maven main modules:
----------------------------------------------------------------------------------
Where to run Maven from?
- It can be run from any module. The only catch is that if not run from utrunk
+ It can be run from any module. The only catch is that if not run from trunk
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.
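As a hedged illustration of that rule (the module path is real, the flags are
the usual ones from this document, and nothing here is the only valid
invocation):

  $ mvn install -DskipTests                 # once, from the top of the source tree
  $ cd hadoop-common-project/hadoop-common  # afterwards any module can be built on its own
  $ mvn compile                             # sibling modules resolve from the local Maven cache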
@@ -131,11 +139,11 @@ Maven build goals:
* Compile : mvn compile [-Pnative]
* Run tests : mvn test [-Pnative] [-Pshelltest]
* Create JAR : mvn package
- * Run findbugs : mvn compile findbugs:findbugs
+ * Run spotbugs : mvn compile spotbugs:spotbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
- * Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
+ * Run clover : mvn test -Pclover
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs][-Pyarn-ui]
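A hedged example combining several of the goals above on a single module (the
-pl module selection and the chosen module path are illustrative only):

  $ mvn clean install -DskipTests -DskipShade
  $ mvn compile spotbugs:spotbugs checkstyle:checkstyle -pl hadoop-common-project/hadoop-common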
@@ -176,7 +184,6 @@ Maven build goals:
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
-
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
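For example, a native build against a snappy installed outside the package
manager might look like this (the /opt/snappy prefix is hypothetical):

  $ mvn package -Pdist,native -DskipTests -Dtar -Dsnappy.prefix=/opt/snappy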
@@ -319,40 +326,35 @@ to update SNAPSHOTs from external repos.
----------------------------------------------------------------------------------
Importing projects to eclipse
-When you import the project to eclipse, install hadoop-maven-plugins at first.
+First, install the artifacts, including hadoop-maven-plugins, from the top of the source tree.
- $ cd hadoop-maven-plugins
- $ mvn install
+ $ mvn clean install -DskipTests -DskipShade
-Then, generate eclipse project files.
-
- $ mvn eclipse:eclipse -DskipTests
-
-At last, import to eclipse by specifying the root directory of the project via
-[File] > [Import] > [Existing Projects into Workspace].
+Then, import to eclipse by specifying the root directory of the project via
+[File] > [Import] > [Maven] > [Existing Maven Projects].
----------------------------------------------------------------------------------
Building distributions:
-Create binary distribution without native code and without documentation:
+Create binary distribution without native code and without Javadocs:
$ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
-Create binary distribution with native code and with documentation:
+Create binary distribution with native code:
- $ mvn package -Pdist,native,docs -DskipTests -Dtar
+ $ mvn package -Pdist,native -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
-Create source and binary distributions with native code and documentation:
+Create source and binary distributions with native code:
- $ mvn package -Pdist,native,docs,src -DskipTests -Dtar
+ $ mvn package -Pdist,native,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
- $ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
+ $ mvn site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site
Note that the site needs to be built in a second pass after other artifacts.
@@ -453,6 +455,17 @@ Building on CentOS 8
* Install libraries provided by CentOS 8.
$ sudo dnf install libtirpc-devel zlib-devel lz4-devel bzip2-devel openssl-devel cyrus-sasl-devel libpmem-devel
+* Install GCC 9.3.0
+ $ sudo dnf -y install gcc-toolset-9-gcc gcc-toolset-9-gcc-c++
+ $ source /opt/rh/gcc-toolset-9/enable
+
+* Install CMake 3.19
+ $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz
+ $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0
+ $ ./bootstrap
+ $ make -j$(nproc)
+ $ sudo make install
+
* Install boost.
$ curl -L -o boost_1_72_0.tar.bz2 https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download
$ tar xjf boost_1_72_0.tar.bz2
@@ -489,7 +502,7 @@ Requirements:
* Maven 3.0 or later
* Boost 1.72
* Protocol Buffers 3.7.1
-* CMake 3.1 or newer
+* CMake 3.19 or newer
* Visual Studio 2010 Professional or Higher
* Windows SDK 8.1 (if building CPU rate control for the container executor)
* zlib headers (if building native code bindings for zlib)
diff --git a/Jenkinsfile b/Jenkinsfile
deleted file mode 100644
index 944a35b868b3a..0000000000000
--- a/Jenkinsfile
+++ /dev/null
@@ -1,220 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-pipeline {
-
- agent {
- label 'Hadoop'
- }
-
- options {
- buildDiscarder(logRotator(numToKeepStr: '5'))
- timeout (time: 20, unit: 'HOURS')
- timestamps()
- checkoutToSubdirectory('src')
- }
-
- environment {
- SOURCEDIR = 'src'
- // will also need to change notification section below
- PATCHDIR = 'out'
- DOCKERFILE = "${SOURCEDIR}/dev-support/docker/Dockerfile"
- YETUS='yetus'
- // Branch or tag name. Yetus release tags are 'rel/X.Y.Z'
- YETUS_VERSION='6ab19e71eaf3234863424c6f684b34c1d3dcc0ce'
- }
-
- parameters {
- string(name: 'JIRA_ISSUE_KEY',
- defaultValue: '',
- description: 'The JIRA issue that has a patch needing pre-commit testing. Example: HADOOP-1234')
- }
-
- stages {
- stage ('install yetus') {
- steps {
- dir("${WORKSPACE}/${YETUS}") {
- checkout([
- $class: 'GitSCM',
- branches: [[name: "${env.YETUS_VERSION}"]],
- userRemoteConfigs: [[ url: 'https://github.com/apache/yetus.git']]]
- )
- }
- }
- }
-
- stage ('precommit-run') {
- steps {
- withCredentials(
- [usernamePassword(credentialsId: 'apache-hadoop-at-github.com',
- passwordVariable: 'GITHUB_TOKEN',
- usernameVariable: 'GITHUB_USER'),
- usernamePassword(credentialsId: 'hadoopqa-at-asf-jira',
- passwordVariable: 'JIRA_PASSWORD',
- usernameVariable: 'JIRA_USER')]) {
- sh '''#!/usr/bin/env bash
-
- set -e
-
- TESTPATCHBIN="${WORKSPACE}/${YETUS}/precommit/src/main/shell/test-patch.sh"
-
- # this must be clean for every run
- if [[ -d "${WORKSPACE}/${PATCHDIR}" ]]; then
- rm -rf "${WORKSPACE}/${PATCHDIR}"
- fi
- mkdir -p "${WORKSPACE}/${PATCHDIR}"
-
- # if given a JIRA issue, process it. If CHANGE_URL is set
- # (e.g., Github Branch Source plugin), process it.
- # otherwise exit, because we don't want Hadoop to do a
- # full build. We wouldn't normally do this check for smaller
- # projects. :)
- if [[ -n "${JIRA_ISSUE_KEY}" ]]; then
- YETUS_ARGS+=("${JIRA_ISSUE_KEY}")
- elif [[ -z "${CHANGE_URL}" ]]; then
- echo "Full build skipped" > "${WORKSPACE}/${PATCHDIR}/report.html"
- exit 0
- fi
-
- YETUS_ARGS+=("--patch-dir=${WORKSPACE}/${PATCHDIR}")
-
- # where the source is located
- YETUS_ARGS+=("--basedir=${WORKSPACE}/${SOURCEDIR}")
-
- # our project defaults come from a personality file
- YETUS_ARGS+=("--project=hadoop")
- YETUS_ARGS+=("--personality=${WORKSPACE}/${SOURCEDIR}/dev-support/bin/hadoop.sh")
-
- # lots of different output formats
- YETUS_ARGS+=("--brief-report-file=${WORKSPACE}/${PATCHDIR}/brief.txt")
- YETUS_ARGS+=("--console-report-file=${WORKSPACE}/${PATCHDIR}/console.txt")
- YETUS_ARGS+=("--html-report-file=${WORKSPACE}/${PATCHDIR}/report.html")
-
- # enable writing back to Github
- YETUS_ARGS+=(--github-token="${GITHUB_TOKEN}")
-
- # enable writing back to ASF JIRA
- YETUS_ARGS+=(--jira-password="${JIRA_PASSWORD}")
- YETUS_ARGS+=(--jira-user="${JIRA_USER}")
-
- # auto-kill any surefire stragglers during unit test runs
- YETUS_ARGS+=("--reapermode=kill")
-
- # set relatively high limits for ASF machines
- # changing these to higher values may cause problems
- # with other jobs on systemd-enabled machines
- YETUS_ARGS+=("--proclimit=5500")
- YETUS_ARGS+=("--dockermemlimit=20g")
-
- # -1 findbugs issues that show up prior to the patch being applied
- YETUS_ARGS+=("--findbugs-strict-precheck")
-
- # rsync these files back into the archive dir
- YETUS_ARGS+=("--archive-list=checkstyle-errors.xml,findbugsXml.xml")
-
- # URL for user-side presentation in reports and such to our artifacts
- # (needs to match the archive bits below)
- YETUS_ARGS+=("--build-url-artifacts=artifact/out")
-
- # plugins to enable
- YETUS_ARGS+=("--plugins=all")
-
- # use Hadoop's bundled shelldocs
- YETUS_ARGS+=("--shelldocs=${WORKSPACE}/${SOURCEDIR}/dev-support/bin/shelldocs")
-
- # don't let these tests cause -1s because we aren't really paying that
- # much attention to them
- YETUS_ARGS+=("--tests-filter=checkstyle")
-
- # run in docker mode and specifically point to our
- # Dockerfile since we don't want to use the auto-pulled version.
- YETUS_ARGS+=("--docker")
- YETUS_ARGS+=("--dockerfile=${DOCKERFILE}")
- YETUS_ARGS+=("--mvn-custom-repos")
-
- # effectively treat dev-suport as a custom maven module
- YETUS_ARGS+=("--skip-dirs=dev-support")
-
- # help keep the ASF boxes clean
- YETUS_ARGS+=("--sentinel")
-
- # use emoji vote so it is easier to find the broken line
- YETUS_ARGS+=("--github-use-emoji-vote")
-
- # test with Java 8 and 11
- YETUS_ARGS+=("--java-home=/usr/lib/jvm/java-8-openjdk-amd64")
- YETUS_ARGS+=("--multijdkdirs=/usr/lib/jvm/java-11-openjdk-amd64")
- YETUS_ARGS+=("--multijdktests=compile")
-
- # custom javadoc goals
- YETUS_ARGS+=("--mvn-javadoc-goals=process-sources,javadoc:javadoc-no-fork")
-
- "${TESTPATCHBIN}" "${YETUS_ARGS[@]}"
- '''
- }
- }
- }
-
- }
-
- post {
- always {
- script {
- // Yetus output
- archiveArtifacts "${env.PATCHDIR}/**"
- // Publish the HTML report so that it can be looked at
- // Has to be relative to WORKSPACE.
- publishHTML (target: [
- allowMissing: true,
- keepAll: true,
- alwaysLinkToLastBuild: true,
- // Has to be relative to WORKSPACE
- reportDir: "${env.PATCHDIR}",
- reportFiles: 'report.html',
- reportName: 'Yetus Report'
- ])
- // Publish JUnit results
- try {
- junit "${env.SOURCEDIR}/**/target/surefire-reports/*.xml"
- } catch(e) {
- echo 'junit processing: ' + e.toString()
- }
- }
- }
-
- // Jenkins pipeline jobs fill slaves on PRs without this :(
- cleanup() {
- script {
- sh '''
- # See YETUS-764
- if [ -f "${WORKSPACE}/${PATCHDIR}/pidfile.txt" ]; then
- echo "test-patch process appears to still be running: killing"
- kill `cat "${WORKSPACE}/${PATCHDIR}/pidfile.txt"` || true
- sleep 10
- fi
- if [ -f "${WORKSPACE}/${PATCHDIR}/cidfile.txt" ]; then
- echo "test-patch container appears to still be running: killing"
- docker kill `cat "${WORKSPACE}/${PATCHDIR}/cidfile.txt"` || true
- fi
- # See HADOOP-13951
- chmod -R u+rxw "${WORKSPACE}"
- '''
- deleteDir()
- }
- }
- }
-}
diff --git a/LICENSE-binary b/LICENSE-binary
index 4a4b953913c8f..de4e1cb75b356 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -214,18 +214,18 @@ com.aliyun:aliyun-java-sdk-core:3.4.0
com.aliyun:aliyun-java-sdk-ecs:4.2.0
com.aliyun:aliyun-java-sdk-ram:3.0.0
com.aliyun:aliyun-java-sdk-sts:3.0.0
-com.aliyun.oss:aliyun-sdk-oss:3.4.1
+com.aliyun.oss:aliyun-sdk-oss:3.13.2
com.amazonaws:aws-java-sdk-bundle:1.11.901
com.cedarsoftware:java-util:1.9.0
com.cedarsoftware:json-io:2.5.1
-com.fasterxml.jackson.core:jackson-annotations:2.9.9
-com.fasterxml.jackson.core:jackson-core:2.9.9
-com.fasterxml.jackson.core:jackson-databind:2.9.9.2
-com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:2.9.9
-com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:2.9.9
-com.fasterxml.jackson.module:jackson-module-jaxb-annotations:2.9.9
+com.fasterxml.jackson.core:jackson-annotations:2.13.0
+com.fasterxml.jackson.core:jackson-core:2.13.0
+com.fasterxml.jackson.core:jackson-databind:2.13.0
+com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:2.13.0
+com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:2.13.0
+com.fasterxml.jackson.module:jackson-module-jaxb-annotations:2.13.0
com.fasterxml.uuid:java-uuid-generator:3.1.4
-com.fasterxml.woodstox:woodstox-core:5.0.3
+com.fasterxml.woodstox:woodstox-core:5.3.0
com.github.davidmoten:rxjava-extras:0.8.0.17
com.github.stephenc.jcip:jcip-annotations:1.0-1
com.google:guice:4.0
@@ -240,17 +240,16 @@ com.google.guava:guava:20.0
com.google.guava:guava:27.0-jre
com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava
com.microsoft.azure:azure-storage:7.0.0
-com.nimbusds:nimbus-jose-jwt:4.41.1
+com.nimbusds:nimbus-jose-jwt:9.8.1
com.squareup.okhttp:okhttp:2.7.5
com.squareup.okio:okio:1.6.0
-com.zaxxer:HikariCP-java7:2.4.12
+com.zaxxer:HikariCP:4.0.3
commons-beanutils:commons-beanutils:1.9.3
commons-cli:commons-cli:1.2
commons-codec:commons-codec:1.11
commons-collections:commons-collections:3.2.2
commons-daemon:commons-daemon:1.0.13
-commons-io:commons-io:2.5
-commons-lang:commons-lang:2.6
+commons-io:commons-io:2.8.0
commons-logging:commons-logging:1.1.3
commons-net:commons-net:3.6
de.ruedigermoeller:fst:2.50
@@ -283,30 +282,30 @@ javax.inject:javax.inject:1
log4j:log4j:1.2.17
net.java.dev.jna:jna:5.2.0
net.minidev:accessors-smart:1.2
-net.minidev:json-smart:2.3
+net.minidev:json-smart:2.4.7
org.apache.avro:avro:1.7.7
org.apache.commons:commons-collections4:4.2
-org.apache.commons:commons-compress:1.19
+org.apache.commons:commons-compress:1.21
org.apache.commons:commons-configuration2:2.1.1
org.apache.commons:commons-csv:1.0
org.apache.commons:commons-digester:1.8.1
-org.apache.commons:commons-lang3:3.7
+org.apache.commons:commons-lang3:3.12.0
org.apache.commons:commons-math3:3.1.1
org.apache.commons:commons-text:1.4
org.apache.commons:commons-validator:1.6
-org.apache.curator:curator-client:2.13.0
-org.apache.curator:curator-framework:2.13.0
-org.apache.curator:curator-recipes:2.13.0
+org.apache.curator:curator-client:5.2.0
+org.apache.curator:curator-framework:5.2.0
+org.apache.curator:curator-recipes:5.2.0
org.apache.geronimo.specs:geronimo-jcache_1.0_spec:1.0-alpha-1
-org.apache.hbase:hbase-annotations:1.4.8
-org.apache.hbase:hbase-client:1.4.8
-org.apache.hbase:hbase-common:1.4.8
-org.apache.hbase:hbase-protocol:1.4.8
+org.apache.hbase:hbase-annotations:1.7.1
+org.apache.hbase:hbase-client:1.7.1
+org.apache.hbase:hbase-common:1.7.1
+org.apache.hbase:hbase-protocol:1.7.1
org.apache.htrace:htrace-core:3.1.0-incubating
org.apache.htrace:htrace-core4:4.1.0-incubating
org.apache.httpcomponents:httpclient:4.5.6
org.apache.httpcomponents:httpcore:4.4.10
-org.apache.kafka:kafka-clients:2.4.0
+org.apache.kafka:kafka-clients:2.8.1
org.apache.kerby:kerb-admin:1.0.1
org.apache.kerby:kerb-client:1.0.1
org.apache.kerby:kerb-common:1.0.1
@@ -322,29 +321,30 @@ org.apache.kerby:kerby-pkix:1.0.1
org.apache.kerby:kerby-util:1.0.1
org.apache.kerby:kerby-xdr:1.0.1
org.apache.kerby:token-provider:1.0.1
+org.apache.solr:solr-solrj:8.8.2
org.apache.yetus:audience-annotations:0.5.0
-org.apache.zookeeper:zookeeper:3.4.13
+org.apache.zookeeper:zookeeper:3.6.3
org.codehaus.jackson:jackson-core-asl:1.9.13
org.codehaus.jackson:jackson-jaxrs:1.9.13
org.codehaus.jackson:jackson-mapper-asl:1.9.13
org.codehaus.jackson:jackson-xc:1.9.13
org.codehaus.jettison:jettison:1.1
-org.eclipse.jetty:jetty-annotations:9.3.27.v20190418
-org.eclipse.jetty:jetty-http:9.3.27.v20190418
-org.eclipse.jetty:jetty-io:9.3.27.v20190418
-org.eclipse.jetty:jetty-jndi:9.3.27.v20190418
-org.eclipse.jetty:jetty-plus:9.3.27.v20190418
-org.eclipse.jetty:jetty-security:9.3.27.v20190418
-org.eclipse.jetty:jetty-server:9.3.27.v20190418
-org.eclipse.jetty:jetty-servlet:9.3.27.v20190418
-org.eclipse.jetty:jetty-util:9.3.27.v20190418
-org.eclipse.jetty:jetty-util-ajax:9.3.27.v20190418
-org.eclipse.jetty:jetty-webapp:9.3.27.v20190418
-org.eclipse.jetty:jetty-xml:9.3.27.v20190418
-org.eclipse.jetty.websocket:javax-websocket-client-impl:9.3.27.v20190418
-org.eclipse.jetty.websocket:javax-websocket-server-impl:9.3.27.v20190418
+org.eclipse.jetty:jetty-annotations:9.4.44.v20210927
+org.eclipse.jetty:jetty-http:9.4.44.v20210927
+org.eclipse.jetty:jetty-io:9.4.44.v20210927
+org.eclipse.jetty:jetty-jndi:9.4.44.v20210927
+org.eclipse.jetty:jetty-plus:9.4.44.v20210927
+org.eclipse.jetty:jetty-security:9.4.44.v20210927
+org.eclipse.jetty:jetty-server:9.4.44.v20210927
+org.eclipse.jetty:jetty-servlet:9.4.44.v20210927
+org.eclipse.jetty:jetty-util:9.4.44.v20210927
+org.eclipse.jetty:jetty-util-ajax:9.4.44.v20210927
+org.eclipse.jetty:jetty-webapp:9.4.44.v20210927
+org.eclipse.jetty:jetty-xml:9.4.44.v20210927
+org.eclipse.jetty.websocket:javax-websocket-client-impl:9.4.44.v20210927
+org.eclipse.jetty.websocket:javax-websocket-server-impl:9.4.44.v20210927
org.ehcache:ehcache:3.3.1
-org.lz4:lz4-java:1.6.0
+org.lz4:lz4-java:1.7.1
org.objenesis:objenesis:2.6
org.xerial.snappy:snappy-java:1.0.5
org.yaml:snakeyaml:1.16:
@@ -364,9 +364,9 @@ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/com
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/compat/{fstatat|openat|unlinkat}.h
-com.github.luben:zstd-jni:1.4.3-1
+com.github.luben:zstd-jni:1.4.9-1
dnsjava:dnsjava:2.1.7
-org.codehaus.woodstox:stax2-api:3.1.4
+org.codehaus.woodstox:stax2-api:4.2.1
BSD 3-Clause
@@ -405,7 +405,7 @@ hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dataTables.bootstrap.css
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dataTables.bootstrap.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dust-full-2.0.0.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dust-helpers-1.1.1.min.js
-hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-3.5.1.min.js
+hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-3.6.0.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery.dataTables.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/bootstrap.min.js
@@ -468,8 +468,8 @@ com.microsoft.azure:azure-cosmosdb-gateway:2.4.5
com.microsoft.azure:azure-data-lake-store-sdk:2.3.3
com.microsoft.azure:azure-keyvault-core:1.0.0
com.microsoft.sqlserver:mssql-jdbc:6.2.1.jre7
-org.bouncycastle:bcpkix-jdk15on:1.60
-org.bouncycastle:bcprov-jdk15on:1.60
+org.bouncycastle:bcpkix-jdk15on:1.68
+org.bouncycastle:bcprov-jdk15on:1.68
org.checkerframework:checker-qual:2.5.2
org.codehaus.mojo:animal-sniffer-annotations:1.17
org.jruby.jcodings:jcodings:1.0.13
@@ -495,6 +495,7 @@ javax.annotation:javax.annotation-api:1.3.2
javax.servlet:javax.servlet-api:3.1.0
javax.servlet.jsp:jsp-api:2.1
javax.websocket:javax.websocket-api:1.0
+javax.ws.rs:javax.ws.rs-api:2.1.1
javax.ws.rs:jsr311-api:1.1.1
javax.xml.bind:jaxb-api:2.2.11
@@ -502,7 +503,7 @@ javax.xml.bind:jaxb-api:2.2.11
Eclipse Public License 1.0
--------------------------
-junit:junit:4.12
+junit:junit:4.13.2
HSQL License
@@ -514,7 +515,7 @@ org.hsqldb:hsqldb:2.3.4
JDOM License
------------
-org.jdom:jdom:1.1
+org.jdom:jdom2:2.0.6.1
Public Domain
diff --git a/LICENSE.txt b/LICENSE.txt
index 3c079898b9071..763cf2ce53f4d 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -245,7 +245,7 @@ hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dataTables.bootstrap.css
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dataTables.bootstrap.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dust-full-2.0.0.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/dust-helpers-1.1.1.min.js
-hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-3.5.1.min.js
+hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-3.6.0.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery.dataTables.min.js
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/bootstrap.min.js
diff --git a/NOTICE-binary b/NOTICE-binary
index 2f8a9241a8d00..b96e052658876 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -66,7 +66,7 @@ available from http://www.digip.org/jansson/.
AWS SDK for Java
-Copyright 2010-2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Copyright 2010-2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
This product includes software developed by
Amazon Technologies, Inc (http://www.amazon.com/).
diff --git a/dev-support/Jenkinsfile b/dev-support/Jenkinsfile
new file mode 100644
index 0000000000000..0ec32e385d275
--- /dev/null
+++ b/dev-support/Jenkinsfile
@@ -0,0 +1,338 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+def getGithubCreds() {
+ return [usernamePassword(credentialsId: 'apache-hadoop-at-github.com',
+ passwordVariable: 'GITHUB_TOKEN',
+ usernameVariable: 'GITHUB_USER')]
+}
+
+// Publish JUnit results only if there are XML files under surefire-reports
+def publishJUnitResults() {
+ def findCmdExitCode = sh script: "find ${SOURCEDIR} -wholename */target/surefire-reports/*.xml | egrep .", returnStatus: true
+ boolean surefireReportsExist = findCmdExitCode == 0
+ if (surefireReportsExist) {
+ echo "XML files found under surefire-reports, running junit"
+ // The path should be relative to WORKSPACE for the junit.
+ SRC = "${SOURCEDIR}/**/target/surefire-reports/*.xml".replace("$WORKSPACE/","")
+ try {
+ junit "${SRC}"
+ } catch(e) {
+ echo 'junit processing: ' + e.toString()
+ }
+ } else {
+ echo "No XML files found under surefire-reports, skipping junit"
+ }
+}
+
+pipeline {
+
+ agent {
+ label 'Hadoop'
+ }
+
+ options {
+ buildDiscarder(logRotator(numToKeepStr: '5'))
+ timeout (time: 24, unit: 'HOURS')
+ timestamps()
+ checkoutToSubdirectory('src')
+ }
+
+ environment {
+ YETUS='yetus'
+ // Branch or tag name. Yetus release tags are 'rel/X.Y.Z'
+ YETUS_VERSION='f9ba0170a5787a5f4662d3769804fef0226a182f'
+ }
+
+ parameters {
+ string(name: 'JIRA_ISSUE_KEY',
+ defaultValue: '',
+ description: 'The JIRA issue that has a patch needing pre-commit testing. Example: HADOOP-1234')
+ }
+
+ stages {
+ stage ('install yetus') {
+ steps {
+ dir("${WORKSPACE}/${YETUS}") {
+ checkout([
+ $class: 'GitSCM',
+ branches: [[name: "${env.YETUS_VERSION}"]],
+ userRemoteConfigs: [[ url: 'https://github.com/apache/yetus.git']]]
+ )
+ }
+ }
+ }
+
+    // Set up the codebase so that each platform's build happens in its own exclusive copy of the
+ // codebase.
+ // Primarily because YETUS messes up the git branch information and affects the subsequent
+ // optional stages after the first one.
+ stage ('setup sources') {
+ steps {
+ dir("${WORKSPACE}/centos-7") {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp ${WORKSPACE}/src ${WORKSPACE}/centos-7
+ '''
+ }
+
+ dir("${WORKSPACE}/centos-8") {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp ${WORKSPACE}/src ${WORKSPACE}/centos-8
+ '''
+ }
+
+ dir("${WORKSPACE}/debian-10") {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp ${WORKSPACE}/src ${WORKSPACE}/debian-10
+ '''
+ }
+
+ dir("${WORKSPACE}/ubuntu-focal") {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp ${WORKSPACE}/src ${WORKSPACE}/ubuntu-focal
+ '''
+ }
+ }
+ }
+
+ // This is an optional stage which runs only when there's a change in
+ // C++/C++ build/platform.
+ // This stage serves as a means of cross platform validation, which is
+ // really needed to ensure that any C++ related/platform change doesn't
+ // break the Hadoop build on Centos 7.
+ stage ('precommit-run Centos 7') {
+ environment {
+ SOURCEDIR = "${WORKSPACE}/centos-7/src"
+ PATCHDIR = "${WORKSPACE}/centos-7/out"
+ DOCKERFILE = "${SOURCEDIR}/dev-support/docker/Dockerfile_centos_7"
+ IS_OPTIONAL = 1
+ }
+
+ steps {
+ withCredentials(getGithubCreds()) {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" run_ci
+ '''
+ }
+ }
+
+ post {
+ // Since this is an optional platform, we want to copy the artifacts
+      // and archive them only if the build fails, to help with debugging.
+ failure {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp "${WORKSPACE}/centos-7/out" "${WORKSPACE}"
+ '''
+ archiveArtifacts "out/**"
+ }
+
+ cleanup() {
+ script {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" cleanup_ci_proc
+ '''
+ }
+ }
+ }
+ }
+
+ // This is an optional stage which runs only when there's a change in
+ // C++/C++ build/platform.
+ // This stage serves as a means of cross platform validation, which is
+ // really needed to ensure that any C++ related/platform change doesn't
+ // break the Hadoop build on Centos 8.
+ stage ('precommit-run Centos 8') {
+ environment {
+ SOURCEDIR = "${WORKSPACE}/centos-8/src"
+ PATCHDIR = "${WORKSPACE}/centos-8/out"
+ DOCKERFILE = "${SOURCEDIR}/dev-support/docker/Dockerfile_centos_8"
+ IS_OPTIONAL = 1
+ }
+
+ steps {
+ withCredentials(getGithubCreds()) {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" run_ci
+ '''
+ }
+ }
+
+ post {
+ // Since this is an optional platform, we want to copy the artifacts
+      // and archive them only if the build fails, to help with debugging.
+ failure {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp "${WORKSPACE}/centos-8/out" "${WORKSPACE}"
+ '''
+ archiveArtifacts "out/**"
+ }
+
+ cleanup() {
+ script {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" cleanup_ci_proc
+ '''
+ }
+ }
+ }
+ }
+
+ // This is an optional stage which runs only when there's a change in
+ // C++/C++ build/platform.
+ // This stage serves as a means of cross platform validation, which is
+ // really needed to ensure that any C++ related/platform change doesn't
+ // break the Hadoop build on Debian 10.
+ stage ('precommit-run Debian 10') {
+ environment {
+ SOURCEDIR = "${WORKSPACE}/debian-10/src"
+ PATCHDIR = "${WORKSPACE}/debian-10/out"
+ DOCKERFILE = "${SOURCEDIR}/dev-support/docker/Dockerfile_debian_10"
+ IS_OPTIONAL = 1
+ }
+
+ steps {
+ withCredentials(getGithubCreds()) {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" run_ci
+ '''
+ }
+ }
+
+ post {
+ // Since this is an optional platform, we want to copy the artifacts
+      // and archive them only if the build fails, to help with debugging.
+ failure {
+ sh '''#!/usr/bin/env bash
+
+ cp -Rp "${WORKSPACE}/debian-10/out" "${WORKSPACE}"
+ '''
+ archiveArtifacts "out/**"
+ }
+
+ cleanup() {
+ script {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" cleanup_ci_proc
+ '''
+ }
+ }
+ }
+ }
+
+ // We want to use Ubuntu Focal as our main CI and thus, this stage
+ // isn't optional (runs for all the PRs).
+ stage ('precommit-run Ubuntu focal') {
+ environment {
+ SOURCEDIR = "${WORKSPACE}/ubuntu-focal/src"
+ PATCHDIR = "${WORKSPACE}/ubuntu-focal/out"
+ DOCKERFILE = "${SOURCEDIR}/dev-support/docker/Dockerfile"
+ IS_OPTIONAL = 0
+ }
+
+ steps {
+ withCredentials(getGithubCreds()) {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" run_ci
+ '''
+ }
+ }
+
+ post {
+ always {
+ script {
+ // Publish status if it was missed (YETUS-1059)
+ withCredentials(
+ [usernamePassword(credentialsId: '683f5dcf-5552-4b28-9fb1-6a6b77cf53dd',
+ passwordVariable: 'GITHUB_TOKEN',
+ usernameVariable: 'GITHUB_USER')]) {
+ sh '''#!/usr/bin/env bash
+
+ # Copy the artifacts of Ubuntu focal build to workspace
+ cp -Rp "${WORKSPACE}/ubuntu-focal/out" "${WORKSPACE}"
+
+ # Send Github status
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" github_status_recovery
+ '''
+ }
+
+ // YETUS output
+ archiveArtifacts "out/**"
+
+ // Publish the HTML report so that it can be looked at
+ // Has to be relative to WORKSPACE.
+ publishHTML (target: [
+ allowMissing: true,
+ keepAll: true,
+ alwaysLinkToLastBuild: true,
+ // Has to be relative to WORKSPACE
+ reportDir: "out",
+ reportFiles: 'report.html',
+ reportName: 'Yetus Report'
+ ])
+
+ publishJUnitResults()
+ }
+ }
+
+ cleanup() {
+ script {
+ sh '''#!/usr/bin/env bash
+
+ chmod u+x "${SOURCEDIR}/dev-support/jenkins.sh"
+ "${SOURCEDIR}/dev-support/jenkins.sh" cleanup_ci_proc
+ '''
+ }
+ }
+ }
+ }
+ }
+
+ post {
+ // Jenkins pipeline jobs fill slaves on PRs without this :(
+ cleanup() {
+ script {
+ sh '''#!/usr/bin/env bash
+
+ # See HADOOP-13951
+ chmod -R u+rxw "${WORKSPACE}"
+ '''
+ deleteDir()
+ }
+ }
+ }
+}
diff --git a/dev-support/bin/checkcompatibility.py b/dev-support/bin/checkcompatibility.py
index ad1e9cbe47ff2..e8c0e26a712db 100755
--- a/dev-support/bin/checkcompatibility.py
+++ b/dev-support/bin/checkcompatibility.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
@@ -30,33 +30,16 @@
import shutil
import subprocess
import sys
-import urllib2
-try:
- import argparse
-except ImportError:
- sys.stderr.write("Please install argparse, e.g. via `pip install argparse`.")
- sys.exit(2)
+import urllib.request
+import argparse
# Various relative paths
REPO_DIR = os.getcwd()
def check_output(*popenargs, **kwargs):
- r"""Run command with arguments and return its output as a byte string.
- Backported from Python 2.7 as it's implemented as pure python on stdlib.
- >>> check_output(['/usr/bin/python', '--version'])
- Python 2.6.2
- """
- process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
- output, _ = process.communicate()
- retcode = process.poll()
- if retcode:
- cmd = kwargs.get("args")
- if cmd is None:
- cmd = popenargs[0]
- error = subprocess.CalledProcessError(retcode, cmd)
- error.output = output
- raise error
- return output
+ """ Run command with arguments and return its output as a string. """
+ return subprocess.check_output(*popenargs, **kwargs, encoding='utf-8')
+
def get_repo_dir():
""" Return the path to the top of the repo. """
@@ -139,7 +122,7 @@ def checkout_java_acc(force):
url = "https://github.com/lvc/japi-compliance-checker/archive/1.8.tar.gz"
scratch_dir = get_scratch_dir()
path = os.path.join(scratch_dir, os.path.basename(url))
- jacc = urllib2.urlopen(url)
+ jacc = urllib.request.urlopen(url)
with open(path, 'wb') as w:
w.write(jacc.read())
@@ -192,9 +175,9 @@ def run_java_acc(src_name, src_jars, dst_name, dst_jars, annotations):
if annotations is not None:
annotations_path = os.path.join(get_scratch_dir(), "annotations.txt")
- with file(annotations_path, "w") as f:
+ with open(annotations_path, "w") as f:
for ann in annotations:
- print >>f, ann
+ print(ann, file=f)
args += ["-annotations-list", annotations_path]
subprocess.check_call(args)
@@ -264,8 +247,8 @@ def main():
parser.add_argument("--skip-build",
action="store_true",
help="Skip building the projects.")
- parser.add_argument("src_rev", nargs=1, help="Source revision.")
- parser.add_argument("dst_rev", nargs="?", default="HEAD",
+ parser.add_argument("src_rev", nargs=1, type=str, help="Source revision.")
+ parser.add_argument("dst_rev", nargs="?", type=str, default="HEAD",
help="Destination revision. " +
"If not specified, will use HEAD.")
diff --git a/dev-support/bin/create-release b/dev-support/bin/create-release
index 39a5d0d319837..31ae6ee1b0659 100755
--- a/dev-support/bin/create-release
+++ b/dev-support/bin/create-release
@@ -514,7 +514,7 @@ function dockermode
echo "USER ${user_name}"
printf "\n\n"
- ) | docker build -t "${imgname}" -
+ ) | docker build -t "${imgname}" -f - "${BASEDIR}"/dev-support/docker/
run docker run -i -t \
--privileged \
diff --git a/dev-support/bin/dist-copynativelibs b/dev-support/bin/dist-copynativelibs
index 7f2b6ad1f5649..95de186e7e729 100755
--- a/dev-support/bin/dist-copynativelibs
+++ b/dev-support/bin/dist-copynativelibs
@@ -164,7 +164,7 @@ fi
# Windows doesn't have a LIB_DIR, everything goes into bin
-if [[ -d "${BIN_DIR}" ]] ; then
+if [[ -d "${BIN_DIR}" && $(ls -A "${BIN_DIR}") ]] ; then
mkdir -p "${TARGET_BIN_DIR}"
cd "${BIN_DIR}" || exit 1
${TAR} ./* | (cd "${TARGET_BIN_DIR}"/ || exit 1; ${UNTAR})
diff --git a/dev-support/bin/hadoop.sh b/dev-support/bin/hadoop.sh
index 3343014aae8bb..763b0507e4114 100755
--- a/dev-support/bin/hadoop.sh
+++ b/dev-support/bin/hadoop.sh
@@ -355,6 +355,7 @@ function personality_modules
fi
;;
unit)
+ extra="-Dsurefire.rerunFailingTestsCount=2"
if [[ "${BUILDMODE}" = full ]]; then
ordering=mvnsrc
elif [[ "${CHANGED_MODULES[*]}" =~ \. ]]; then
@@ -363,7 +364,7 @@ function personality_modules
if [[ ${TEST_PARALLEL} = "true" ]] ; then
if hadoop_test_parallel; then
- extra="-Pparallel-tests"
+ extra="${extra} -Pparallel-tests"
if [[ -n ${TEST_THREADS:-} ]]; then
extra="${extra} -DtestsThreadCount=${TEST_THREADS}"
fi
@@ -482,7 +483,7 @@ function personality_file_tests
fi
if [[ ${filename} =~ \.java$ ]]; then
- add_test findbugs
+ add_test spotbugs
fi
}
@@ -512,7 +513,7 @@ function shadedclient_initialize
maven_add_install shadedclient
}
-## @description build client facing shaded artifacts and test them
+## @description build client facing shaded and non-shaded artifacts and test them
## @audience private
## @stability evolving
## @param repostatus
@@ -545,12 +546,19 @@ function shadedclient_rebuild
return 0
fi
- big_console_header "Checking client artifacts on ${repostatus}"
+ big_console_header "Checking client artifacts on ${repostatus} with shaded clients"
echo_and_redirect "${logfile}" \
"${MAVEN}" "${MAVEN_ARGS[@]}" verify -fae --batch-mode -am \
"${modules[@]}" \
- -Dtest=NoUnitTests -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Dfindbugs.skip=true
+ -Dtest=NoUnitTests -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
+
+ big_console_header "Checking client artifacts on ${repostatus} with non-shaded clients"
+
+ echo_and_redirect "${logfile}" \
+ "${MAVEN}" "${MAVEN_ARGS[@]}" verify -fae --batch-mode -am \
+ "${modules[@]}" \
+ -DskipShade -Dtest=NoUnitTests -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
count=$("${GREP}" -c '\[ERROR\]' "${logfile}")
if [[ ${count} -gt 0 ]]; then
diff --git a/dev-support/bin/test-patch b/dev-support/bin/test-patch
index 8ff8119b3e086..5faf472d325e8 100755
--- a/dev-support/bin/test-patch
+++ b/dev-support/bin/test-patch
@@ -15,4 +15,4 @@
# limitations under the License.
BINDIR=$(cd -P -- "$(dirname -- "${BASH_SOURCE-0}")" >/dev/null && pwd -P)
-exec "${BINDIR}/yetus-wrapper" test-patch --project=hadoop --skip-dir=dev-support "$@"
+exec "${BINDIR}/yetus-wrapper" test-patch --project=hadoop --skip-dirs=dev-support "$@"
diff --git a/dev-support/bin/yetus-wrapper b/dev-support/bin/yetus-wrapper
index bca2316ae6784..8532d1749701b 100755
--- a/dev-support/bin/yetus-wrapper
+++ b/dev-support/bin/yetus-wrapper
@@ -77,7 +77,7 @@ WANTED="$1"
shift
ARGV=("$@")
-HADOOP_YETUS_VERSION=${HADOOP_YETUS_VERSION:-0.10.0}
+HADOOP_YETUS_VERSION=${HADOOP_YETUS_VERSION:-0.13.0}
BIN=$(yetus_abs "${BASH_SOURCE-$0}")
BINDIR=$(dirname "${BIN}")
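Because the wrapper honours this environment variable, a Yetus release can be
pinned per run without editing the script (a sketch; the version and issue key
are only examples):

  $ HADOOP_YETUS_VERSION=0.13.0 dev-support/bin/test-patch HADOOP-1234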
diff --git a/dev-support/code-formatter/hadoop_idea_formatter.xml b/dev-support/code-formatter/hadoop_idea_formatter.xml
new file mode 100644
index 0000000000000..a69acd9698d55
--- /dev/null
+++ b/dev-support/code-formatter/hadoop_idea_formatter.xml
@@ -0,0 +1,75 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/dev-support/determine-flaky-tests-hadoop.py b/dev-support/determine-flaky-tests-hadoop.py
deleted file mode 100755
index 8644299bba4a2..0000000000000
--- a/dev-support/determine-flaky-tests-hadoop.py
+++ /dev/null
@@ -1,245 +0,0 @@
-#!/usr/bin/env python
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# Given a jenkins test job, this script examines all runs of the job done
-# within specified period of time (number of days prior to the execution
-# time of this script), and reports all failed tests.
-#
-# The output of this script includes a section for each run that has failed
-# tests, with each failed test name listed.
-#
-# More importantly, at the end, it outputs a summary section to list all failed
-# tests within all examined runs, and indicate how many runs a same test
-# failed, and sorted all failed tests by how many runs each test failed.
-#
-# This way, when we see failed tests in PreCommit build, we can quickly tell
-# whether a failed test is a new failure, or it failed before and how often it
-# failed, so to have idea whether it may just be a flaky test.
-#
-# Of course, to be 100% sure about the reason of a test failure, closer look
-# at the failed test for the specific run is necessary.
-#
-import sys
-import platform
-sysversion = sys.hexversion
-onward30 = False
-if sysversion < 0x020600F0:
- sys.exit("Minimum supported python version is 2.6, the current version is " +
- "Python" + platform.python_version())
-
-if sysversion == 0x030000F0:
- sys.exit("There is a known bug with Python" + platform.python_version() +
- ", please try a different version");
-
-if sysversion < 0x03000000:
- import urllib2
-else:
- onward30 = True
- import urllib.request
-
-import datetime
-import json as simplejson
-import logging
-from optparse import OptionParser
-import time
-
-# Configuration
-DEFAULT_JENKINS_URL = "https://builds.apache.org"
-DEFAULT_JOB_NAME = "Hadoop-Common-trunk"
-DEFAULT_NUM_PREVIOUS_DAYS = 14
-DEFAULT_TOP_NUM_FAILED_TEST = -1
-
-SECONDS_PER_DAY = 86400
-
-# total number of runs to examine
-numRunsToExamine = 0
-
-#summary mode
-summary_mode = False
-
-#total number of errors
-error_count = 0
-
-""" Parse arguments """
-def parse_args():
- parser = OptionParser()
- parser.add_option("-J", "--jenkins-url", type="string",
- dest="jenkins_url", help="Jenkins URL",
- default=DEFAULT_JENKINS_URL)
- parser.add_option("-j", "--job-name", type="string",
- dest="job_name", help="Job name to look at",
- default=DEFAULT_JOB_NAME)
- parser.add_option("-n", "--num-days", type="int",
- dest="num_prev_days", help="Number of days to examine",
- default=DEFAULT_NUM_PREVIOUS_DAYS)
- parser.add_option("-t", "--top", type="int",
- dest="num_failed_tests",
- help="Summary Mode, only show top number of failed tests",
- default=DEFAULT_TOP_NUM_FAILED_TEST)
-
- (options, args) = parser.parse_args()
- if args:
- parser.error("unexpected arguments: " + repr(args))
- return options
-
-""" Load data from specified url """
-def load_url_data(url):
- if onward30:
- ourl = urllib.request.urlopen(url)
- codec = ourl.info().get_param('charset')
- content = ourl.read().decode(codec)
- data = simplejson.loads(content, strict=False)
- else:
- ourl = urllib2.urlopen(url)
- data = simplejson.load(ourl, strict=False)
- return data
-
-""" List all builds of the target project. """
-def list_builds(jenkins_url, job_name):
- global summary_mode
- url = "%(jenkins)s/job/%(job_name)s/api/json?tree=builds[url,result,timestamp]" % dict(
- jenkins=jenkins_url,
- job_name=job_name)
-
- try:
- data = load_url_data(url)
-
- except:
- if not summary_mode:
- logging.error("Could not fetch: %s" % url)
- error_count += 1
- raise
- return data['builds']
-
-""" Find the names of any tests which failed in the given build output URL. """
-def find_failing_tests(testReportApiJson, jobConsoleOutput):
- global summary_mode
- global error_count
- ret = set()
- try:
- data = load_url_data(testReportApiJson)
-
- except:
- if not summary_mode:
- logging.error(" Could not open testReport, check " +
- jobConsoleOutput + " for why it was reported failed")
- error_count += 1
- return ret
-
- for suite in data['suites']:
- for cs in suite['cases']:
- status = cs['status']
- errDetails = cs['errorDetails']
- if (status == 'REGRESSION' or status == 'FAILED' or (errDetails is not None)):
- ret.add(cs['className'] + "." + cs['name'])
-
- if len(ret) == 0 and (not summary_mode):
- logging.info(" No failed tests in testReport, check " +
- jobConsoleOutput + " for why it was reported failed.")
- return ret
-
-""" Iterate runs of specfied job within num_prev_days and collect results """
-def find_flaky_tests(jenkins_url, job_name, num_prev_days):
- global numRunsToExamine
- global summary_mode
- all_failing = dict()
- # First list all builds
- builds = list_builds(jenkins_url, job_name)
-
- # Select only those in the last N days
- min_time = int(time.time()) - SECONDS_PER_DAY * num_prev_days
- builds = [b for b in builds if (int(b['timestamp']) / 1000) > min_time]
-
- # Filter out only those that failed
- failing_build_urls = [(b['url'] , b['timestamp']) for b in builds
- if (b['result'] in ('UNSTABLE', 'FAILURE'))]
-
- tnum = len(builds)
- num = len(failing_build_urls)
- numRunsToExamine = tnum
- if not summary_mode:
- logging.info(" THERE ARE " + str(num) + " builds (out of " + str(tnum)
- + ") that have failed tests in the past " + str(num_prev_days) + " days"
- + ((".", ", as listed below:\n")[num > 0]))
-
- for failed_build_with_time in failing_build_urls:
- failed_build = failed_build_with_time[0];
- jobConsoleOutput = failed_build + "Console";
- testReport = failed_build + "testReport";
- testReportApiJson = testReport + "/api/json";
-
- ts = float(failed_build_with_time[1]) / 1000.
- st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
- if not summary_mode:
- logging.info("===>%s" % str(testReport) + " (" + st + ")")
- failing = find_failing_tests(testReportApiJson, jobConsoleOutput)
- if failing:
- for ftest in failing:
- if not summary_mode:
- logging.info(" Failed test: %s" % ftest)
- all_failing[ftest] = all_failing.get(ftest,0)+1
-
- return all_failing
-
-def main():
- global numRunsToExamine
- global summary_mode
- logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.INFO)
-
- # set up logger to write to stdout
- soh = logging.StreamHandler(sys.stdout)
- soh.setLevel(logging.INFO)
- logger = logging.getLogger()
- logger.removeHandler(logger.handlers[0])
- logger.addHandler(soh)
-
- opts = parse_args()
- logging.info("****Recently FAILED builds in url: " + opts.jenkins_url
- + "/job/" + opts.job_name + "")
-
- if opts.num_failed_tests != -1:
- summary_mode = True
-
- all_failing = find_flaky_tests(opts.jenkins_url, opts.job_name,
- opts.num_prev_days)
- if len(all_failing) == 0:
- raise SystemExit(0)
-
- if summary_mode and opts.num_failed_tests < len(all_failing):
- logging.info("\nAmong " + str(numRunsToExamine) +
- " runs examined, top " + str(opts.num_failed_tests) +
- " failed tests <#failedRuns: testName>:")
- else:
- logging.info("\nAmong " + str(numRunsToExamine) +
- " runs examined, all failed tests <#failedRuns: testName>:")
-
- # print summary section: all failed tests sorted by how many times they failed
- line_count = 0
- for tn in sorted(all_failing, key=all_failing.get, reverse=True):
- logging.info(" " + str(all_failing[tn])+ ": " + tn)
- if summary_mode:
- line_count += 1
- if line_count == opts.num_failed_tests:
- break
-
- if summary_mode and error_count > 0:
- logging.info("\n" + str(error_count) + " errors found, you may "
- + "re-run in non summary mode to see error details.");
-
-if __name__ == "__main__":
- main()
diff --git a/dev-support/docker/Dockerfile b/dev-support/docker/Dockerfile
index 4bce9cf71d729..fac364bbd4363 100644
--- a/dev-support/docker/Dockerfile
+++ b/dev-support/docker/Dockerfile
@@ -1,4 +1,3 @@
-
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -18,7 +17,7 @@
# Dockerfile for installing the necessary dependencies for building Hadoop.
# See BUILDING.txt.
-FROM ubuntu:bionic
+FROM ubuntu:focal
WORKDIR /root
@@ -33,162 +32,69 @@ RUN echo APT::Install-Suggests "0"\; >> /etc/apt/apt.conf.d/10disableextras
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_TERSE true
-# hadolint ignore=DL3008
+######
+# Platform package dependency resolver
+######
+COPY pkg-resolver pkg-resolver
+RUN chmod a+x pkg-resolver/*.sh pkg-resolver/*.py \
+ && chmod a+r pkg-resolver/*.json
+
+######
+# Install packages from apt
+######
+# hadolint ignore=DL3008,SC2046
RUN apt-get -q update \
- && apt-get -q install -y --no-install-recommends \
- ant \
- apt-utils \
- bats \
- build-essential \
- bzip2 \
- clang \
- cmake \
- curl \
- doxygen \
- findbugs \
- fuse \
- g++ \
- gcc \
- git \
- gnupg-agent \
- libbcprov-java \
- libbz2-dev \
- libcurl4-openssl-dev \
- libfuse-dev \
- libprotobuf-dev \
- libprotoc-dev \
- libsasl2-dev \
- libsnappy-dev \
- libssl-dev \
- libtool \
- libzstd-dev \
- locales \
- make \
- maven \
- openjdk-11-jdk \
- openjdk-8-jdk \
- pinentry-curses \
- pkg-config \
- python \
- python2.7 \
- python-pip \
- python-pkg-resources \
- python-setuptools \
- python-wheel \
- rsync \
- shellcheck \
- software-properties-common \
- sudo \
- valgrind \
- zlib1g-dev \
+ && apt-get -q install -y --no-install-recommends python3 \
+ && apt-get -q install -y --no-install-recommends $(pkg-resolver/resolve.py ubuntu:focal) \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
+RUN locale-gen en_US.UTF-8
+ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'
+ENV PYTHONIOENCODING=utf-8
+
######
# Set env vars required to build Hadoop
######
ENV MAVEN_HOME /usr
# JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003)
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
-ENV FINDBUGS_HOME /usr
#######
-# Install Boost 1.72 (1.65 ships with Bionic)
+# Set env vars for SpotBugs 4.2.2
#######
-# hadolint ignore=DL3003
-RUN mkdir -p /opt/boost-library \
- && curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2 \
- && mv boost_1_72_0.tar.bz2 /opt/boost-library \
- && cd /opt/boost-library \
- && tar --bzip2 -xf boost_1_72_0.tar.bz2 \
- && cd /opt/boost-library/boost_1_72_0 \
- && ./bootstrap.sh --prefix=/usr/ \
- && ./b2 --without-python install \
- && cd /root \
- && rm -rf /opt/boost-library
+ENV SPOTBUGS_HOME /opt/spotbugs
-######
-# Install Google Protobuf 3.7.1 (3.0.0 ships with Bionic)
-######
-# hadolint ignore=DL3003
-RUN mkdir -p /opt/protobuf-src \
- && curl -L -s -S \
- https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz \
- -o /opt/protobuf.tar.gz \
- && tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src \
- && cd /opt/protobuf-src \
- && ./configure --prefix=/opt/protobuf \
- && make install \
- && cd /root \
- && rm -rf /opt/protobuf-src
+#######
+# Set env vars for Google Protobuf 3.7.1
+#######
ENV PROTOBUF_HOME /opt/protobuf
ENV PATH "${PATH}:/opt/protobuf/bin"
-####
-# Install pylint at fixed version (2.0.0 removed python2 support)
-# https://github.com/PyCQA/pylint/issues/2294
-####
-RUN pip2 install \
- astroid==1.6.6 \
- isort==4.3.21 \
- configparser==4.0.2 \
- pylint==1.9.2
-
-####
-# Install dateutil.parser
-####
-RUN pip2 install python-dateutil==2.7.3
-
-###
-# Install node.js 10.x for web UI framework (4.2.6 ships with Xenial)
-###
-# hadolint ignore=DL3008
-RUN curl -L -s -S https://deb.nodesource.com/setup_10.x | bash - \
- && apt-get install -y --no-install-recommends nodejs \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/* \
- && npm install -g bower@1.8.8
-
-###
-## Install Yarn 1.12.1 for web UI framework
-####
-RUN curl -s -S https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
- && echo 'deb https://dl.yarnpkg.com/debian/ stable main' > /etc/apt/sources.list.d/yarn.list \
- && apt-get -q update \
- && apt-get install -y --no-install-recommends yarn=1.21.1-1 \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/*
-
-###
-# Install hadolint
-####
-RUN curl -L -s -S \
- https://github.com/hadolint/hadolint/releases/download/v1.11.1/hadolint-Linux-x86_64 \
- -o /bin/hadolint \
- && chmod a+rx /bin/hadolint \
- && shasum -a 512 /bin/hadolint | \
- awk '$1!="734e37c1f6619cbbd86b9b249e69c9af8ee1ea87a2b1ff71dccda412e9dac35e63425225a95d71572091a3f0a11e9a04c2fc25d9e91b840530c26af32b9891ca" {exit(1)}'
-
###
# Avoid out of memory errors in builds
###
-ENV MAVEN_OPTS -Xms256m -Xmx1536m
+ENV MAVEN_OPTS -Xms256m -Xmx3072m
# Skip gpg verification when downloading Yetus via yetus-wrapper
ENV HADOOP_SKIP_YETUS_VERIFICATION true
+####
+# Install packages
+####
+RUN pkg-resolver/install-common-pkgs.sh
+RUN pkg-resolver/install-spotbugs.sh ubuntu:focal
+RUN pkg-resolver/install-boost.sh ubuntu:focal
+RUN pkg-resolver/install-protobuf.sh ubuntu:focal
+RUN pkg-resolver/install-hadolint.sh ubuntu:focal
+RUN pkg-resolver/install-intel-isa-l.sh ubuntu:focal
+
###
# Everything past this point is either not needed for testing or breaks Yetus.
# So tell Yetus not to read the rest of the file:
# YETUS CUT HERE
###
-# Hugo static website generator for new hadoop site
-RUN curl -L -o hugo.deb https://github.com/gohugoio/hugo/releases/download/v0.58.3/hugo_0.58.3_Linux-64bit.deb \
- && dpkg --install hugo.deb \
- && rm hugo.deb
-
-
# Add a welcome message and environment checks.
COPY hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
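A minimal sketch of building and entering this image locally (the image tag is
arbitrary; the build context must be dev-support/docker so that the
pkg-resolver directory can be copied in):

  $ docker build -t hadoop-build-env -f dev-support/docker/Dockerfile dev-support/docker
  $ docker run --rm -it -v "$(pwd)":/hadoop -w /hadoop hadoop-build-env bash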
diff --git a/dev-support/docker/Dockerfile_aarch64 b/dev-support/docker/Dockerfile_aarch64
index 19cfd13b5c763..dd0348961f464 100644
--- a/dev-support/docker/Dockerfile_aarch64
+++ b/dev-support/docker/Dockerfile_aarch64
@@ -17,7 +17,7 @@
# Dockerfile for installing the necessary dependencies for building Hadoop.
# See BUILDING.txt.
-FROM ubuntu:bionic
+FROM ubuntu:focal
WORKDIR /root
@@ -33,146 +33,44 @@ ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_TERSE true
######
-# Install common dependencies from packages. Versions here are either
-# sufficient or irrelevant.
+# Platform package dependency resolver
######
-# hadolint ignore=DL3008
+COPY pkg-resolver pkg-resolver
+RUN chmod a+x pkg-resolver/*.sh pkg-resolver/*.py \
+ && chmod a+r pkg-resolver/*.json
+
+######
+# Install packages from apt
+######
+# hadolint ignore=DL3008,SC2046
RUN apt-get -q update \
- && apt-get -q install -y --no-install-recommends \
- ant \
- apt-utils \
- bats \
- build-essential \
- bzip2 \
- clang \
- cmake \
- curl \
- doxygen \
- findbugs \
- fuse \
- g++ \
- gcc \
- git \
- gnupg-agent \
- libbcprov-java \
- libbz2-dev \
- libcurl4-openssl-dev \
- libfuse-dev \
- libprotobuf-dev \
- libprotoc-dev \
- libsasl2-dev \
- libsnappy-dev \
- libssl-dev \
- libtool \
- libzstd-dev \
- locales \
- make \
- maven \
- openjdk-11-jdk \
- openjdk-8-jdk \
- pinentry-curses \
- pkg-config \
- python \
- python2.7 \
- python-pip \
- python-pkg-resources \
- python-setuptools \
- python-wheel \
- rsync \
- shellcheck \
- software-properties-common \
- sudo \
- valgrind \
- zlib1g-dev \
+ && apt-get -q install -y --no-install-recommends python3 \
+ && apt-get -q install -y --no-install-recommends $(pkg-resolver/resolve.py ubuntu:focal::arch64) \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
+RUN locale-gen en_US.UTF-8
+ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'
+ENV PYTHONIOENCODING=utf-8
+
######
# Set env vars required to build Hadoop
######
ENV MAVEN_HOME /usr
# JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003)
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-arm64
-ENV FINDBUGS_HOME /usr
#######
-# Install Boost 1.72 (1.65 ships with Bionic)
+# Set env vars for SpotBugs 4.2.2
#######
-# hadolint ignore=DL3003
-RUN mkdir -p /opt/boost-library \
- && curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2 \
- && mv boost_1_72_0.tar.bz2 /opt/boost-library \
- && cd /opt/boost-library \
- && tar --bzip2 -xf boost_1_72_0.tar.bz2 \
- && cd /opt/boost-library/boost_1_72_0 \
- && ./bootstrap.sh --prefix=/usr/ \
- && ./b2 --without-python install \
- && cd /root \
- && rm -rf /opt/boost-library
+ENV SPOTBUGS_HOME /opt/spotbugs
-######
-# Install Google Protobuf 3.7.1 (3.0.0 ships with Bionic)
-######
-# hadolint ignore=DL3003
-RUN mkdir -p /opt/protobuf-src \
- && curl -L -s -S \
- https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz \
- -o /opt/protobuf.tar.gz \
- && tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src \
- && cd /opt/protobuf-src \
- && ./configure --prefix=/opt/protobuf \
- && make install \
- && cd /root \
- && rm -rf /opt/protobuf-src
+#######
+# Set env vars for Google Protobuf 3.7.1
+#######
ENV PROTOBUF_HOME /opt/protobuf
ENV PATH "${PATH}:/opt/protobuf/bin"
-####
-# Install pylint at fixed version (2.0.0 removed python2 support)
-# https://github.com/PyCQA/pylint/issues/2294
-####
-RUN pip2 install \
- astroid==1.6.6 \
- isort==4.3.21 \
- configparser==4.0.2 \
- pylint==1.9.2
-
-####
-# Install dateutil.parser
-####
-RUN pip2 install python-dateutil==2.7.3
-
-###
-# Install node.js 10.x for web UI framework (4.2.6 ships with Xenial)
-###
-# hadolint ignore=DL3008
-RUN curl -L -s -S https://deb.nodesource.com/setup_10.x | bash - \
- && apt-get install -y --no-install-recommends nodejs \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/* \
- && npm install -g bower@1.8.8
-
-###
-## Install Yarn 1.12.1 for web UI framework
-####
-RUN curl -s -S https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
- && echo 'deb https://dl.yarnpkg.com/debian/ stable main' > /etc/apt/sources.list.d/yarn.list \
- && apt-get -q update \
- && apt-get install -y --no-install-recommends yarn=1.21.1-1 \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/*
-
-###
-# Install phantomjs built for aarch64
-####
-RUN mkdir -p /opt/phantomjs \
- && curl -L -s -S \
- https://github.com/liusheng/phantomjs/releases/download/2.1.1/phantomjs-2.1.1-linux-aarch64.tar.bz2 \
- -o /opt/phantomjs/phantomjs-2.1.1-linux-aarch64.tar.bz2 \
- && tar xvjf /opt/phantomjs/phantomjs-2.1.1-linux-aarch64.tar.bz2 --strip-components 1 -C /opt/phantomjs \
- && cp /opt/phantomjs/bin/phantomjs /usr/bin/ \
- && rm -rf /opt/phantomjs
-
###
# Avoid out of memory errors in builds
###
@@ -181,18 +79,23 @@ ENV MAVEN_OPTS -Xms256m -Xmx1536m
# Skip gpg verification when downloading Yetus via yetus-wrapper
ENV HADOOP_SKIP_YETUS_VERIFICATION true
+# Force PhantomJS to be in 'headless' mode, do not connect to Xwindow
+ENV QT_QPA_PLATFORM offscreen
+
+####
+# Install packages
+####
+RUN pkg-resolver/install-common-pkgs.sh
+RUN pkg-resolver/install-spotbugs.sh ubuntu:focal::arch64
+RUN pkg-resolver/install-boost.sh ubuntu:focal::arch64
+RUN pkg-resolver/install-protobuf.sh ubuntu:focal::arch64
+
###
# Everything past this point is either not needed for testing or breaks Yetus.
# So tell Yetus not to read the rest of the file:
# YETUS CUT HERE
###
-# Hugo static website generator (for new hadoop site docs)
-RUN curl -L -o hugo.deb https://github.com/gohugoio/hugo/releases/download/v0.58.3/hugo_0.58.3_Linux-ARM64.deb \
- && dpkg --install hugo.deb \
- && rm hugo.deb
-
-
# Add a welcome message and environment checks.
COPY hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
diff --git a/dev-support/docker/Dockerfile_centos_7 b/dev-support/docker/Dockerfile_centos_7
new file mode 100644
index 0000000000000..ccb445be269fe
--- /dev/null
+++ b/dev-support/docker/Dockerfile_centos_7
@@ -0,0 +1,96 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Dockerfile for installing the necessary dependencies for building Hadoop.
+# See BUILDING.txt.
+
+FROM centos:7
+
+WORKDIR /root
+
+SHELL ["/bin/bash", "-o", "pipefail", "-c"]
+
+######
+# Platform package dependency resolver
+######
+COPY pkg-resolver pkg-resolver
+RUN chmod a+x pkg-resolver/*.sh pkg-resolver/*.py \
+ && chmod a+r pkg-resolver/*.json
+
+######
+# Install packages from yum
+######
+# hadolint ignore=DL3008,SC2046
+RUN yum update -y \
+ && yum groupinstall -y "Development Tools" \
+ && yum install -y \
+ centos-release-scl \
+ python3 \
+ && yum install -y $(pkg-resolver/resolve.py centos:7)
+
+# Set GCC 9 as the default C/C++ compiler
+RUN echo "source /opt/rh/devtoolset-9/enable" >> /etc/bashrc
+SHELL ["/bin/bash", "--login", "-c"]
+
+######
+# Set the environment variables needed for CMake
+# to find and use GCC 9 for compilation
+######
+ENV GCC_HOME "/opt/rh/devtoolset-9"
+ENV CC "${GCC_HOME}/root/usr/bin/gcc"
+ENV CXX "${GCC_HOME}/root/usr/bin/g++"
+ENV SHLVL 1
+ENV LD_LIBRARY_PATH "${GCC_HOME}/root/usr/lib64:${GCC_HOME}/root/usr/lib:${GCC_HOME}/root/usr/lib64/dyninst:${GCC_HOME}/root/usr/lib/dyninst:${GCC_HOME}/root/usr/lib64:${GCC_HOME}/root/usr/lib:/usr/lib:/usr/lib64"
+ENV PCP_DIR "${GCC_HOME}/root"
+ENV MANPATH "${GCC_HOME}/root/usr/share/man:"
+ENV PATH "${GCC_HOME}/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+ENV PKG_CONFIG_PATH "${GCC_HOME}/root/usr/lib64/pkgconfig"
+ENV INFOPATH "${GCC_HOME}/root/usr/share/info"
+
+# TODO: Set locale
+
+######
+# Set env vars required to build Hadoop
+######
+ENV MAVEN_HOME /opt/maven
+ENV PATH "${PATH}:${MAVEN_HOME}/bin"
+# JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003)
+ENV JAVA_HOME /usr/lib/jvm/java-1.8.0
+
+#######
+# Set env vars for SpotBugs
+#######
+ENV SPOTBUGS_HOME /opt/spotbugs
+
+#######
+# Set env vars for Google Protobuf
+#######
+ENV PROTOBUF_HOME /opt/protobuf
+ENV PATH "${PATH}:/opt/protobuf/bin"
+
+######
+# Install packages
+######
+RUN pkg-resolver/install-maven.sh centos:7
+RUN pkg-resolver/install-cmake.sh centos:7
+RUN pkg-resolver/install-zstandard.sh centos:7
+RUN pkg-resolver/install-yasm.sh centos:7
+RUN pkg-resolver/install-protobuf.sh centos:7
+RUN pkg-resolver/install-boost.sh centos:7
+RUN pkg-resolver/install-spotbugs.sh centos:7
+RUN pkg-resolver/install-nodejs.sh centos:7
+RUN pkg-resolver/install-git.sh centos:7
+RUN pkg-resolver/install-common-pkgs.sh
diff --git a/dev-support/docker/Dockerfile_centos_8 b/dev-support/docker/Dockerfile_centos_8
new file mode 100644
index 0000000000000..8f3b008f7ba03
--- /dev/null
+++ b/dev-support/docker/Dockerfile_centos_8
@@ -0,0 +1,118 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Dockerfile for installing the necessary dependencies for building Hadoop.
+# See BUILDING.txt.
+
+FROM centos:8
+
+WORKDIR /root
+
+SHELL ["/bin/bash", "-o", "pipefail", "-c"]
+
+######
+# Platform package dependency resolver
+######
+COPY pkg-resolver pkg-resolver
+RUN chmod a+x pkg-resolver/*.sh pkg-resolver/*.py \
+ && chmod a+r pkg-resolver/*.json
+
+######
+# Centos 8 has reached its EOL and the packages
+# are no longer available on mirror.centos.org site.
+# Please see https://www.centos.org/centos-linux-eol/
+######
+RUN pkg-resolver/set-vault-as-baseurl-centos.sh centos:8
+
+######
+# Install packages from yum
+######
+# hadolint ignore=DL3008,SC2046
+RUN yum update -y \
+ && yum install -y python3 \
+ && yum install -y $(pkg-resolver/resolve.py centos:8)
+
+####
+# Install EPEL
+####
+RUN pkg-resolver/install-epel.sh centos:8
+
+RUN dnf --enablerepo=powertools install -y \
+ doxygen \
+ snappy-devel \
+ yasm
+
+RUN dnf install -y \
+ bouncycastle \
+ gcc-toolset-9-gcc \
+ gcc-toolset-9-gcc-c++ \
+ libpmem-devel
+
+# Set GCC 9 as the default C/C++ compiler
+RUN echo "source /opt/rh/gcc-toolset-9/enable" >> /etc/bashrc
+SHELL ["/bin/bash", "--login", "-c"]
+
+######
+# Set the environment variables needed for CMake
+# to find and use GCC 9 for compilation
+######
+ENV GCC_HOME "/opt/rh/gcc-toolset-9"
+ENV CC "${GCC_HOME}/root/usr/bin/gcc"
+ENV CXX "${GCC_HOME}/root/usr/bin/g++"
+ENV MODULES_RUN_QUARANTINE "LD_LIBRARY_PATH LD_PRELOAD"
+ENV MODULES_CMD "/usr/share/Modules/libexec/modulecmd.tcl"
+ENV SHLVL 1
+ENV MODULEPATH "/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles"
+ENV MODULEPATH_modshare "/usr/share/modulefiles:1:/usr/share/Modules/modulefiles:1:/etc/modulefiles:1"
+ENV MODULESHOME "/usr/share/Modules"
+ENV LD_LIBRARY_PATH "${GCC_HOME}/root/usr/lib64:${GCC_HOME}/root/usr/lib:${GCC_HOME}/root/usr/lib64/dyninst:${GCC_HOME}/root/usr/lib/dyninst:${GCC_HOME}/root/usr/lib64:${GCC_HOME}/root/usr/lib:/usr/lib:/usr/lib64"
+ENV PCP_DIR "${GCC_HOME}/root"
+ENV MANPATH "${GCC_HOME}/root/usr/share/man::"
+ENV PATH "${GCC_HOME}/root/usr/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+ENV PKG_CONFIG_PATH "${GCC_HOME}/root/usr/lib64/pkgconfig"
+ENV INFOPATH "${GCC_HOME}/root/usr/share/info"
+
+# TODO: Set locale
+
+######
+# Set env vars required to build Hadoop
+######
+ENV MAVEN_HOME /opt/maven
+ENV PATH "${PATH}:${MAVEN_HOME}/bin"
+# JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003)
+ENV JAVA_HOME /usr/lib/jvm/java-1.8.0
+
+#######
+# Set env vars for SpotBugs
+#######
+ENV SPOTBUGS_HOME /opt/spotbugs
+
+#######
+# Set env vars for Google Protobuf
+#######
+ENV PROTOBUF_HOME /opt/protobuf
+ENV PATH "${PATH}:/opt/protobuf/bin"
+
+######
+# Install packages
+######
+RUN pkg-resolver/install-maven.sh centos:8
+RUN pkg-resolver/install-cmake.sh centos:8
+RUN pkg-resolver/install-boost.sh centos:8
+RUN pkg-resolver/install-spotbugs.sh centos:8
+RUN pkg-resolver/install-protobuf.sh centos:8
+RUN pkg-resolver/install-zstandard.sh centos:8
+RUN pkg-resolver/install-common-pkgs.sh
diff --git a/dev-support/docker/Dockerfile_debian_10 b/dev-support/docker/Dockerfile_debian_10
new file mode 100644
index 0000000000000..256f0d5786ab9
--- /dev/null
+++ b/dev-support/docker/Dockerfile_debian_10
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Dockerfile for installing the necessary dependencies for building Hadoop.
+# See BUILDING.txt.
+
+FROM debian:10
+
+WORKDIR /root
+
+SHELL ["/bin/bash", "-o", "pipefail", "-c"]
+
+#####
+# Disable suggests/recommends
+#####
+RUN echo APT::Install-Recommends "0"\; > /etc/apt/apt.conf.d/10disableextras
+RUN echo APT::Install-Suggests "0"\; >> /etc/apt/apt.conf.d/10disableextras
+
+ENV DEBIAN_FRONTEND noninteractive
+ENV DEBCONF_TERSE true
+
+######
+# Platform package dependency resolver
+######
+COPY pkg-resolver pkg-resolver
+RUN chmod a+x pkg-resolver/install-pkg-resolver.sh
+RUN pkg-resolver/install-pkg-resolver.sh debian:10
+
+######
+# Install packages from apt
+######
+# hadolint ignore=DL3008,SC2046
+RUN apt-get -q update \
+ && apt-get -q install -y --no-install-recommends $(pkg-resolver/resolve.py debian:10) \
+ && echo 'deb http://deb.debian.org/debian bullseye main' >> /etc/apt/sources.list \
+ && apt-get -q update \
+ && apt-get -q install -y --no-install-recommends -t bullseye $(pkg-resolver/resolve.py --release=bullseye debian:10) \
+ && apt-get clean \
+ && rm -rf /var/lib/apt/lists/*
+
+# TODO : Set locale
+
+######
+# Set env vars required to build Hadoop
+######
+ENV MAVEN_HOME /usr
+# JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003)
+ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
+
+#######
+# Set env vars for SpotBugs 4.2.2
+#######
+ENV SPOTBUGS_HOME /opt/spotbugs
+
+#######
+# Set env vars for Google Protobuf 3.7.1
+#######
+ENV PROTOBUF_HOME /opt/protobuf
+ENV PATH "${PATH}:/opt/protobuf/bin"
+
+###
+# Avoid out of memory errors in builds
+###
+ENV MAVEN_OPTS -Xms256m -Xmx3072m
+
+# Skip gpg verification when downloading Yetus via yetus-wrapper
+ENV HADOOP_SKIP_YETUS_VERIFICATION true
+
+####
+# Install packages
+####
+RUN pkg-resolver/install-spotbugs.sh debian:10
+RUN pkg-resolver/install-boost.sh debian:10
+RUN pkg-resolver/install-protobuf.sh debian:10
+RUN pkg-resolver/install-hadolint.sh debian:10
+RUN pkg-resolver/install-intel-isa-l.sh debian:10
+
+###
+# Everything past this point is either not needed for testing or breaks Yetus.
+# So tell Yetus not to read the rest of the file:
+# YETUS CUT HERE
+###
+
+# Add a welcome message and environment checks.
+COPY hadoop_env_checks.sh /root/hadoop_env_checks.sh
+RUN chmod 755 /root/hadoop_env_checks.sh
+# hadolint ignore=SC2016
+RUN echo '${HOME}/hadoop_env_checks.sh' >> /root/.bashrc
diff --git a/dev-support/docker/README.md b/dev-support/docker/README.md
new file mode 100644
index 0000000000000..4419b6c06f339
--- /dev/null
+++ b/dev-support/docker/README.md
@@ -0,0 +1,114 @@
+
+
+# Docker images for building Hadoop
+
+This folder contains the Dockerfiles for building Hadoop on various platforms.
+
+# Dependency management
+
+How the dependencies needed for building Hadoop get installed varies from one platform to another.
+Different platforms ship different toolchains, and a package that is readily available in one
+platform's toolchain is often missing from another. In such cases we resort to building and
+installing the package from source, which duplicates code because the same steps have to be
+repeated in every platform's Dockerfile. We therefore need a system that tracks a dependency - for
+a package, for a platform and, optionally, for a release - and `pkg-resolver` handles exactly that
+diversity of package dependencies.
+
+## Supported platforms
+
+`pkg-resolver/platforms.json` contains a list of the supported platforms for dependency management.
+
+## Package dependencies
+
+`pkg-resolver/packages.json` maps each dependency to the package (or packages) that provide it on
+a given platform. Here's the schema of this JSON.
+
+```json
+{
+ "dependency_1": {
+ "platform_1": "package_1",
+ "platform_2": [
+ "package_1",
+ "package_2"
+ ]
+ },
+ "dependency_2": {
+ "platform_1": [
+ "package_1",
+ "package_2",
+ "package_3"
+ ]
+ },
+ "dependency_3": {
+ "platform_1": {
+ "release_1": "package_1_1_1",
+ "release_2": [
+ "package_1_2_1",
+ "package_1_2_2"
+ ]
+ },
+ "platform_2": [
+ "package_2_1",
+ {
+ "release_1": "package_2_1_1"
+ }
+ ]
+ }
+}
+```
+
+The root JSON element contains one child per unique _dependency_. Each dependency in turn maps a
+_platform_ name to the package, or list of packages, to be installed for that platform. To show how
+the above JSON is interpreted (a hypothetical `resolve.py` run over this schema is sketched after
+the list):
+
+1. For `dependency_1`, `package_1` needs to be installed for `platform_1`.
+2. For `dependency_1`, `package_1` and `package_2` need to be installed for `platform_2`.
+3. For `dependency_2`, `package_1`, `package_2` and `package_3` need to be installed for
+   `platform_1`.
+4. For `dependency_3`, `package_1_1_1` gets installed only if `release_1` has been specified
+ for `platform_1`.
+5. For `dependency_3`, the packages `package_1_2_1` and `package_1_2_2` get installed only
+   if `release_2` has been specified for `platform_1`.
+6. For `dependency_3`, for `platform_2`, `package_2_1` is always installed, but `package_2_1_1` gets
+ installed only if `release_1` has been specified.
+
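+As a rough sketch of how the resolver treats the example schema above (here `platform_1` and
+`release_2` come from the example JSON, not from the real `platforms.json`):
+
+```shell
+# Plain run: resolves package_1 (from dependency_1) plus package_1, package_2 and
+# package_3 (from dependency_2); the release-gated dependency_3 entries are skipped.
+$ pkg-resolver/resolve.py platform_1
+
+# Release-filtered run: resolves only the packages gated on release_2, i.e.
+# package_1_2_1 and package_1_2_2. Dockerfile_debian_10 follows the same pattern,
+# invoking resolve.py a second time with --release=bullseye.
+$ pkg-resolver/resolve.py -r release_2 platform_1
+```
+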
+### Tool help
+
+```shell
+$ pkg-resolver/resolve.py -h
+usage: resolve.py [-h] [-r RELEASE] platform
+
+Platform package dependency resolver for building Apache Hadoop
+
+positional arguments:
+ platform The name of the platform to resolve the dependencies for
+
+optional arguments:
+ -h, --help show this help message and exit
+ -r RELEASE, --release RELEASE
+ The release label to filter the packages for the given platform
+```
+
+## Standalone packages
+
+Some packages are not available in the toolchains of every platform and have to be built and
+installed from source. Doing that separately in each Dockerfile would duplicate the build steps and
+make them a hassle to maintain, so each such package gets its own `pkg-resolver/install-*.sh` script
+(for example, `install-boost.sh`) that the Dockerfiles invoke.
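+
+For instance (a sketch based on the Dockerfiles in this change, not an exhaustive list), a platform
+Dockerfile invokes such a script with the platform label and, optionally, a version to install:
+
+```shell
+# Install the default Boost version (1.72.0) for the ubuntu:focal platform.
+RUN pkg-resolver/install-boost.sh ubuntu:focal
+
+# Install Protobuf on centos:7, passing the version explicitly; a version the
+# script does not know about falls back to its default with a warning.
+RUN pkg-resolver/install-protobuf.sh centos:7 3.7.1
+```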
\ No newline at end of file
diff --git a/dev-support/docker/pkg-resolver/check_platform.py b/dev-support/docker/pkg-resolver/check_platform.py
new file mode 100644
index 0000000000000..fa5529a58be20
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/check_platform.py
@@ -0,0 +1,50 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Checks whether the given platform is supported for building Apache Hadoop
+"""
+
+import json
+import sys
+
+
+def get_platforms():
+ """
+ :return: A list of the supported platforms managed by pkg-resolver.
+ """
+
+ with open('pkg-resolver/platforms.json', encoding='utf-8', mode='r') as platforms_file:
+ return json.loads(platforms_file.read())
+
+
+def is_supported_platform(platform):
+ """
+ :param platform: The name of the platform
+ :return: Whether the platform is supported
+ """
+ return platform in get_platforms()
+
+
+if __name__ == '__main__':
+ if len(sys.argv) != 2:
+ print('ERROR: Expecting 1 argument, {} were provided'.format(len(sys.argv) - 1),
+ file=sys.stderr)
+ sys.exit(1)
+
+ sys.exit(0 if is_supported_platform(sys.argv[1]) else 1)
diff --git a/dev-support/docker/pkg-resolver/install-boost.sh b/dev-support/docker/pkg-resolver/install-boost.sh
new file mode 100644
index 0000000000000..eaca09effa2c0
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-boost.sh
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="1.72.0"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "1.72.0" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "1.72.0" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /opt/boost-library &&
+ curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download >boost_1_72_0.tar.bz2 &&
+ mv boost_1_72_0.tar.bz2 /opt/boost-library &&
+ cd /opt/boost-library &&
+ tar --bzip2 -xf boost_1_72_0.tar.bz2 &&
+ cd /opt/boost-library/boost_1_72_0 &&
+ ./bootstrap.sh --prefix=/usr/ &&
+ ./b2 --without-python install &&
+ cd /root &&
+ rm -rf /opt/boost-library
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-cmake.sh b/dev-support/docker/pkg-resolver/install-cmake.sh
new file mode 100644
index 0000000000000..29e2733e70196
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-cmake.sh
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="3.19.0"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "3.19.0" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "3.19.0" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /tmp/cmake /opt/cmake &&
+ curl -L -s -S https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz -o /tmp/cmake/cmake-3.19.0.tar.gz &&
+ tar xzf /tmp/cmake/cmake-3.19.0.tar.gz --strip-components 1 -C /opt/cmake &&
+ cd /opt/cmake || exit && ./bootstrap &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root || exit
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-common-pkgs.sh b/dev-support/docker/pkg-resolver/install-common-pkgs.sh
new file mode 100644
index 0000000000000..f91617db6c143
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-common-pkgs.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+######
+# Install pylint and python-dateutil
+######
+pip3 install pylint==2.6.0 python-dateutil==2.8.1
diff --git a/dev-support/docker/pkg-resolver/install-epel.sh b/dev-support/docker/pkg-resolver/install-epel.sh
new file mode 100644
index 0000000000000..875dce3a9ae85
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-epel.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="8"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "8" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "8" ]; then
+ mkdir -p /tmp/epel &&
+ curl -L -s -S https://download-ib01.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm \
+ -o /tmp/epel/epel-release-latest-8.noarch.rpm &&
+ rpm -Uvh /tmp/epel/epel-release-latest-8.noarch.rpm
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-git.sh b/dev-support/docker/pkg-resolver/install-git.sh
new file mode 100644
index 0000000000000..353641819842c
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-git.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="2.9.5"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "2.9.5" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "2.9.5" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /tmp/git /opt/git &&
+ curl -L -s -S https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.9.5.tar.gz >/tmp/git/git-2.9.5.tar.gz &&
+ tar xzf /tmp/git/git-2.9.5.tar.gz --strip-components 1 -C /opt/git &&
+ cd /opt/git || exit &&
+ make configure &&
+ ./configure --prefix=/usr/local &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root || exit
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-hadolint.sh b/dev-support/docker/pkg-resolver/install-hadolint.sh
new file mode 100644
index 0000000000000..1e2081f38c403
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-hadolint.sh
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+curl -L -s -S \
+ https://github.com/hadolint/hadolint/releases/download/v1.11.1/hadolint-Linux-x86_64 \
+ -o /bin/hadolint &&
+ chmod a+rx /bin/hadolint &&
+ shasum -a 512 /bin/hadolint |
+ awk '$1!="734e37c1f6619cbbd86b9b249e69c9af8ee1ea87a2b1ff71dccda412e9dac35e63425225a95d71572091a3f0a11e9a04c2fc25d9e91b840530c26af32b9891ca" {exit(1)}'
diff --git a/dev-support/docker/pkg-resolver/install-intel-isa-l.sh b/dev-support/docker/pkg-resolver/install-intel-isa-l.sh
new file mode 100644
index 0000000000000..c6b4de782282e
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-intel-isa-l.sh
@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="2.29.0"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "2.29.0" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "2.29.0" ]; then
+ # hadolint ignore=DL3003,DL3008
+ mkdir -p /opt/isa-l-src &&
+ curl -L -s -S \
+ https://github.com/intel/isa-l/archive/v2.29.0.tar.gz \
+ -o /opt/isa-l.tar.gz &&
+ tar xzf /opt/isa-l.tar.gz --strip-components 1 -C /opt/isa-l-src &&
+ cd /opt/isa-l-src &&
+ ./autogen.sh &&
+ ./configure &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root &&
+ rm -rf /opt/isa-l-src
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-maven.sh b/dev-support/docker/pkg-resolver/install-maven.sh
new file mode 100644
index 0000000000000..f9ff961a190f9
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-maven.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="3.6.3"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "3.6.3" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "3.6.3" ]; then
+ mkdir -p /opt/maven /tmp/maven &&
+ curl -L -s -S https://mirrors.estointernet.in/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz \
+ -o /tmp/maven/apache-maven-3.6.3-bin.tar.gz &&
+ tar xzf /tmp/maven/apache-maven-3.6.3-bin.tar.gz --strip-components 1 -C /opt/maven
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-nodejs.sh b/dev-support/docker/pkg-resolver/install-nodejs.sh
new file mode 100644
index 0000000000000..5ba1c22808640
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-nodejs.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="14.16.1"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "14.16.1" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "14.16.1" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /tmp/node &&
+ curl -L -s -S https://nodejs.org/dist/v14.16.1/node-v14.16.1.tar.gz -o /tmp/node-v14.16.1.tar.gz &&
+ tar xzf /tmp/node-v14.16.1.tar.gz --strip-components 1 -C /tmp/node &&
+ cd /tmp/node || exit &&
+ ./configure &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root || exit
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-pkg-resolver.sh b/dev-support/docker/pkg-resolver/install-pkg-resolver.sh
new file mode 100644
index 0000000000000..70e94b3792d9c
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-pkg-resolver.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: No platform specified, please specify one"
+ exit 1
+fi
+
+chmod a+x pkg-resolver/*.sh pkg-resolver/*.py
+chmod a+r pkg-resolver/*.json
+
+if [ "$1" == "debian:10" ]; then
+ apt-get -q update
+ apt-get -q install -y --no-install-recommends python3 \
+ python3-pip \
+ python3-pkg-resources \
+ python3-setuptools \
+ python3-wheel
+ pip3 install pylint==2.6.0 python-dateutil==2.8.1
+else
+ # Need to add the code for the rest of the platforms - HADOOP-17920
+ echo "ERROR: The given platform $1 is not yet supported or is invalid"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-protobuf.sh b/dev-support/docker/pkg-resolver/install-protobuf.sh
new file mode 100644
index 0000000000000..7303b4048226a
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-protobuf.sh
@@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="3.7.1"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "3.7.1" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "3.7.1" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /opt/protobuf-src &&
+ curl -L -s -S \
+ https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz \
+ -o /opt/protobuf.tar.gz &&
+ tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src &&
+ cd /opt/protobuf-src &&
+ ./configure --prefix=/opt/protobuf &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root &&
+ rm -rf /opt/protobuf-src
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-spotbugs.sh b/dev-support/docker/pkg-resolver/install-spotbugs.sh
new file mode 100644
index 0000000000000..65a8f2e692418
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-spotbugs.sh
@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="4.2.2"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "4.2.2" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "4.2.2" ]; then
+ mkdir -p /opt/spotbugs &&
+ curl -L -s -S https://github.com/spotbugs/spotbugs/releases/download/4.2.2/spotbugs-4.2.2.tgz \
+ -o /opt/spotbugs.tgz &&
+ tar xzf /opt/spotbugs.tgz --strip-components 1 -C /opt/spotbugs &&
+ chmod +x /opt/spotbugs/bin/*
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-yasm.sh b/dev-support/docker/pkg-resolver/install-yasm.sh
new file mode 100644
index 0000000000000..a5f6162bc38d7
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-yasm.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="1.2.0-4"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "1.2.0-4" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "1.2.0-4" ]; then
+ mkdir -p /tmp/yasm &&
+ curl -L -s -S https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/y/yasm-1.2.0-4.el7.x86_64.rpm \
+ -o /tmp/yasm-1.2.0-4.el7.x86_64.rpm &&
+ rpm -Uvh /tmp/yasm-1.2.0-4.el7.x86_64.rpm
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/install-zstandard.sh b/dev-support/docker/pkg-resolver/install-zstandard.sh
new file mode 100644
index 0000000000000..3aafd469d2be3
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/install-zstandard.sh
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+pkg-resolver/check_platform.py "$1"
+if [ $? -eq 1 ]; then
+ echo "ERROR: Unsupported platform $1"
+ exit 1
+fi
+
+default_version="1.4.9"
+version_to_install=$default_version
+if [ -n "$2" ]; then
+ version_to_install="$2"
+fi
+
+if [ "$version_to_install" != "1.4.9" ]; then
+ echo "WARN: Don't know how to install version $version_to_install, installing the default version $default_version instead"
+ version_to_install=$default_version
+fi
+
+if [ "$version_to_install" == "1.4.9" ]; then
+ # hadolint ignore=DL3003
+ mkdir -p /opt/zstd /tmp/zstd &&
+ curl -L -s -S https://github.com/facebook/zstd/archive/refs/tags/v1.4.9.tar.gz -o /tmp/zstd/v1.4.9.tar.gz &&
+ tar xzf /tmp/zstd/v1.4.9.tar.gz --strip-components 1 -C /opt/zstd &&
+ cd /opt/zstd || exit &&
+ make "-j$(nproc)" &&
+ make install &&
+ cd /root || exit
+else
+ echo "ERROR: Don't know how to install version $version_to_install"
+ exit 1
+fi
diff --git a/dev-support/docker/pkg-resolver/packages.json b/dev-support/docker/pkg-resolver/packages.json
new file mode 100644
index 0000000000000..afe8a7a32b107
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/packages.json
@@ -0,0 +1,361 @@
+{
+ "ant": {
+ "debian:10": "ant",
+ "ubuntu:focal": "ant",
+ "ubuntu:focal::arch64": "ant",
+ "centos:7": "ant",
+ "centos:8": "ant"
+ },
+ "apt-utils": {
+ "debian:10": "apt-utils",
+ "ubuntu:focal": "apt-utils",
+ "ubuntu:focal::arch64": "apt-utils"
+ },
+ "automake": {
+ "debian:10": "automake",
+ "ubuntu:focal": "automake",
+ "ubuntu:focal::arch64": "automake",
+ "centos:7": "automake",
+ "centos:8": "automake"
+ },
+ "autoconf": {
+ "centos:7": "autoconf"
+ },
+ "bats": {
+ "debian:10": "bats",
+ "ubuntu:focal": "bats",
+ "ubuntu:focal::arch64": "bats"
+ },
+ "build-essential": {
+ "debian:10": "build-essential",
+ "ubuntu:focal": "build-essential",
+ "ubuntu:focal::arch64": "build-essential",
+ "centos:7": "build-essential"
+ },
+ "bzip2": {
+ "debian:10": [
+ "bzip2",
+ "libbz2-dev"
+ ],
+ "ubuntu:focal": [
+ "bzip2",
+ "libbz2-dev"
+ ],
+ "ubuntu:focal::arch64": [
+ "bzip2",
+ "libbz2-dev"
+ ],
+ "centos:7": [
+ "bzip2",
+ "bzip2-devel"
+ ],
+ "centos:8": [
+ "bzip2",
+ "bzip2-devel"
+ ]
+ },
+ "clang": {
+ "debian:10": "clang",
+ "ubuntu:focal": "clang",
+ "ubuntu:focal::arch64": "clang",
+ "centos:7": "clang",
+ "centos:8": "clang"
+ },
+ "cmake": {
+ "debian:10": "cmake",
+ "ubuntu:focal": "cmake",
+ "ubuntu:focal::arch64": "cmake"
+ },
+ "curl": {
+ "debian:10": [
+ "curl",
+ "libcurl4-openssl-dev"
+ ],
+ "ubuntu:focal": [
+ "curl",
+ "libcurl4-openssl-dev"
+ ],
+ "ubuntu:focal::arch64": [
+ "curl",
+ "libcurl4-openssl-dev"
+ ],
+ "centos:7": [
+ "curl",
+ "libcurl-devel"
+ ],
+ "centos:8": [
+ "curl",
+ "libcurl-devel"
+ ]
+ },
+ "doxygen": {
+ "debian:10": "doxygen",
+ "ubuntu:focal": "doxygen",
+ "ubuntu:focal::arch64": "doxygen",
+ "centos:7": "doxygen"
+ },
+ "dnf": {
+ "centos:8": "dnf"
+ },
+ "fuse": {
+ "debian:10": [
+ "fuse",
+ "libfuse-dev"
+ ],
+ "ubuntu:focal": [
+ "fuse",
+ "libfuse-dev"
+ ],
+ "ubuntu:focal::arch64": [
+ "fuse",
+ "libfuse-dev"
+ ],
+ "centos:7": [
+ "fuse",
+ "fuse-libs",
+ "fuse-devel"
+ ],
+ "centos:8": [
+ "fuse",
+ "fuse-libs",
+ "fuse-devel"
+ ]
+ },
+ "gcc": {
+ "debian:10": {
+ "bullseye": [
+ "gcc",
+ "g++"
+ ]
+ },
+ "ubuntu:focal": [
+ "gcc",
+ "g++"
+ ],
+ "ubuntu:focal::arch64": [
+ "gcc",
+ "g++"
+ ],
+ "centos:7": [
+ "centos-release-scl",
+ "devtoolset-9"
+ ]
+ },
+ "gettext": {
+ "centos:7": "gettext-devel"
+ },
+ "git": {
+ "debian:10": "git",
+ "ubuntu:focal": "git",
+ "ubuntu:focal::arch64": "git",
+ "centos:8": "git"
+ },
+ "gnupg-agent": {
+ "debian:10": "gnupg-agent",
+ "ubuntu:focal": "gnupg-agent",
+ "ubuntu:focal::arch64": "gnupg-agent"
+ },
+ "hugo": {
+ "debian:10": "hugo",
+ "ubuntu:focal": "hugo",
+ "ubuntu:focal::arch64": "hugo"
+ },
+ "libbcprov-java": {
+ "debian:10": "libbcprov-java",
+ "ubuntu:focal": "libbcprov-java",
+ "ubuntu:focal::arch64": "libbcprov-java"
+ },
+ "libtool": {
+ "debian:10": "libtool",
+ "ubuntu:focal": "libtool",
+ "ubuntu:focal::arch64": "libtool",
+ "centos:7": "libtool",
+ "centos:8": "libtool"
+ },
+ "openssl": {
+ "debian:10": "libssl-dev",
+ "ubuntu:focal": "libssl-dev",
+ "ubuntu:focal::arch64": "libssl-dev",
+ "centos:7": "openssl-devel",
+ "centos:8": "openssl-devel"
+ },
+ "perl": {
+ "centos:7": [
+ "perl-CPAN",
+ "perl-devel"
+ ]
+ },
+ "protocol-buffers": {
+ "debian:10": [
+ "libprotobuf-dev",
+ "libprotoc-dev"
+ ],
+ "ubuntu:focal": [
+ "libprotobuf-dev",
+ "libprotoc-dev"
+ ],
+ "ubuntu:focal::arch64": [
+ "libprotobuf-dev",
+ "libprotoc-dev"
+ ]
+ },
+ "sasl": {
+ "debian:10": "libsasl2-dev",
+ "ubuntu:focal": "libsasl2-dev",
+ "ubuntu:focal::arch64": "libsasl2-dev",
+ "centos:7": "cyrus-sasl-devel",
+ "centos:8": "cyrus-sasl-devel"
+ },
+ "snappy": {
+ "debian:10": "libsnappy-dev",
+ "ubuntu:focal": "libsnappy-dev",
+ "ubuntu:focal::arch64": "libsnappy-dev",
+ "centos:7": "snappy-devel"
+ },
+ "zlib": {
+ "debian:10": [
+ "libzstd-dev",
+ "zlib1g-dev"
+ ],
+ "ubuntu:focal": [
+ "libzstd-dev",
+ "zlib1g-dev"
+ ],
+ "ubuntu:focal::arch64": [
+ "libzstd-dev",
+ "zlib1g-dev"
+ ],
+ "centos:7": [
+ "zlib-devel",
+ "lz4-devel"
+ ],
+ "centos:8": [
+ "zlib-devel",
+ "lz4-devel"
+ ]
+ },
+ "locales": {
+ "debian:10": "locales",
+ "ubuntu:focal": "locales",
+ "ubuntu:focal::arch64": "locales"
+ },
+ "libtirpc-devel": {
+ "centos:7": "libtirpc-devel",
+ "centos:8": "libtirpc-devel"
+ },
+ "libpmem": {
+ "centos:7": "libpmem-devel"
+ },
+ "make": {
+ "debian:10": "make",
+ "ubuntu:focal": "make",
+ "ubuntu:focal::arch64": "make",
+ "centos:7": "make",
+ "centos:8": "make"
+ },
+ "maven": {
+ "debian:10": "maven",
+ "ubuntu:focal": "maven",
+ "ubuntu:focal::arch64": "maven"
+ },
+ "java": {
+ "debian:10": "openjdk-11-jdk",
+ "ubuntu:focal": [
+ "openjdk-8-jdk",
+ "openjdk-11-jdk"
+ ],
+ "ubuntu:focal::arch64": [
+ "openjdk-8-jdk",
+ "openjdk-11-jdk"
+ ]
+ },
+ "pinentry-curses": {
+ "debian:10": "pinentry-curses",
+ "ubuntu:focal": "pinentry-curses",
+ "ubuntu:focal::arch64": "pinentry-curses",
+ "centos:7": "pinentry-curses",
+ "centos:8": "pinentry-curses"
+ },
+ "pkg-config": {
+ "debian:10": "pkg-config",
+ "ubuntu:focal": "pkg-config",
+ "ubuntu:focal::arch64": "pkg-config",
+ "centos:8": "pkg-config"
+ },
+ "python": {
+ "debian:10": [
+ "python3",
+ "python3-pip",
+ "python3-pkg-resources",
+ "python3-setuptools",
+ "python3-wheel"
+ ],
+ "ubuntu:focal": [
+ "python3",
+ "python3-pip",
+ "python3-pkg-resources",
+ "python3-setuptools",
+ "python3-wheel"
+ ],
+ "ubuntu:focal::arch64": [
+ "python2.7",
+ "python3",
+ "python3-pip",
+ "python3-pkg-resources",
+ "python3-setuptools",
+ "python3-wheel"
+ ],
+ "centos:7": [
+ "python3",
+ "python3-pip",
+ "python3-setuptools",
+ "python3-wheel"
+ ],
+ "centos:8": [
+ "python3",
+ "python3-pip",
+ "python3-setuptools",
+ "python3-wheel"
+ ]
+ },
+ "rsync": {
+ "debian:10": "rsync",
+ "ubuntu:focal": "rsync",
+ "ubuntu:focal::arch64": "rsync",
+ "centos:7": "rsync",
+ "centos:8": "rsync"
+ },
+ "shellcheck": {
+ "debian:10": "shellcheck",
+ "ubuntu:focal": "shellcheck",
+ "ubuntu:focal::arch64": "shellcheck"
+ },
+ "shasum": {
+ "centos:7": "perl-Digest-SHA",
+ "centos:8": "perl-Digest-SHA"
+ },
+ "software-properties-common": {
+ "debian:10": "software-properties-common",
+ "ubuntu:focal": "software-properties-common",
+ "ubuntu:focal::arch64": "software-properties-common"
+ },
+ "sudo": {
+ "debian:10": "sudo",
+ "ubuntu:focal": "sudo",
+ "ubuntu:focal::arch64": "sudo",
+ "centos:7": "sudo",
+ "centos:8": "sudo"
+ },
+ "valgrind": {
+ "debian:10": "valgrind",
+ "ubuntu:focal": "valgrind",
+ "ubuntu:focal::arch64": "valgrind",
+ "centos:7": "valgrind",
+ "centos:8": "valgrind"
+ },
+ "yasm": {
+ "debian:10": "yasm",
+ "ubuntu:focal": "yasm",
+ "ubuntu:focal::arch64": "yasm"
+ }
+}
diff --git a/dev-support/docker/pkg-resolver/platforms.json b/dev-support/docker/pkg-resolver/platforms.json
new file mode 100644
index 0000000000000..93e2a93df4220
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/platforms.json
@@ -0,0 +1,7 @@
+[
+ "ubuntu:focal",
+ "ubuntu:focal::arch64",
+ "centos:7",
+ "centos:8",
+ "debian:10"
+]
\ No newline at end of file
diff --git a/dev-support/docker/pkg-resolver/resolve.py b/dev-support/docker/pkg-resolver/resolve.py
new file mode 100644
index 0000000000000..bf3b8491f9407
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/resolve.py
@@ -0,0 +1,98 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Platform package dependency resolver for building Apache Hadoop.
+"""
+
+import argparse
+import json
+import sys
+from check_platform import is_supported_platform
+
+
+def get_packages(platform, release=None):
+ """
+ Resolve and get the list of packages to install for the given platform.
+
+ :param platform: The platform for which the packages need to be resolved.
+ :param release: An optional parameter that filters the packages of the given platform for the
+ specified release.
+ :return: A list of resolved packages to install.
+ """
+ with open('pkg-resolver/packages.json', encoding='utf-8', mode='r') as pkg_file:
+ pkgs = json.loads(pkg_file.read())
+ packages = []
+
+ def process_package(package, in_release=False):
+ """
+ Processes the given package object that belongs to a platform and adds it to the packages
+ list variable in the parent scope.
+ In essence, this method recursively traverses the JSON structure defined in packages.json
+ and performs the core filtering.
+
+ :param package: The package object to process.
+ :param in_release: A boolean that indicates whether the current traversal belongs to a package
+ that needs to be filtered for the given release label.
+ """
+ if isinstance(package, list):
+ for entry in package:
+ process_package(entry, in_release)
+ elif isinstance(package, dict):
+ if release is None:
+ return
+ for entry in package.get(release, []):
+ process_package(entry, in_release=True)
+ elif isinstance(package, str):
+ # Filter out the package that doesn't belong to this release,
+ # if a release label has been specified.
+ if release is not None and not in_release:
+ return
+ packages.append(package)
+ else:
+ raise Exception('Unknown package of type: {}'.format(type(package)))
+
+ for platforms in filter(lambda x: x.get(platform) is not None, pkgs.values()):
+ process_package(platforms.get(platform))
+ return packages
+
+
+if __name__ == '__main__':
+ if len(sys.argv) < 2:
+ print('ERROR: Need at least 1 argument, {} were provided'.format(len(sys.argv) - 1),
+ file=sys.stderr)
+ sys.exit(1)
+
+ arg_parser = argparse.ArgumentParser(
+ description='Platform package dependency resolver for building Apache Hadoop')
+ arg_parser.add_argument('-r', '--release', nargs=1, type=str,
+ help='The release label to filter the packages for the given platform')
+ arg_parser.add_argument('platform', nargs=1, type=str,
+ help='The name of the platform to resolve the dependencies for')
+ args = arg_parser.parse_args()
+
+ if not is_supported_platform(args.platform[0]):
+ print(
+ 'ERROR: The given platform {} is not supported. '
+ 'Please refer to platforms.json for a list of supported platforms'.format(
+ args.platform[0]), file=sys.stderr)
+ sys.exit(1)
+
+ packages_to_install = get_packages(args.platform[0],
+ args.release[0] if args.release is not None else None)
+ print(' '.join(packages_to_install))
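+
+# Illustrative usage (a sketch): since packages.json is opened via the
+# relative path 'pkg-resolver/packages.json', run this script from the
+# directory that contains pkg-resolver/, e.g.:
+#   python3 pkg-resolver/resolve.py centos:7
+#   python3 pkg-resolver/resolve.py --release <release-label> ubuntu:focal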
diff --git a/dev-support/docker/pkg-resolver/set-vault-as-baseurl-centos.sh b/dev-support/docker/pkg-resolver/set-vault-as-baseurl-centos.sh
new file mode 100644
index 0000000000000..4be4cd956b15b
--- /dev/null
+++ b/dev-support/docker/pkg-resolver/set-vault-as-baseurl-centos.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if [ $# -lt 1 ]; then
+ echo "ERROR: Need at least 1 argument, $# were provided"
+ exit 1
+fi
+
+if [ "$1" == "centos:7" ] || [ "$1" == "centos:8" ]; then
+ cd /etc/yum.repos.d/ || exit &&
+ sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-* &&
+ sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-* &&
+ yum update -y &&
+ cd /root || exit
+else
+ echo "ERROR: Setting the archived baseurl is only supported for centos 7 and 8 environments"
+ exit 1
+fi
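+
+# Illustrative usage (a sketch; the argument must be one of the platform
+# labels handled above):
+#   ./set-vault-as-baseurl-centos.sh centos:8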
diff --git a/dev-support/git-jira-validation/README.md b/dev-support/git-jira-validation/README.md
new file mode 100644
index 0000000000000..308c54228d17c
--- /dev/null
+++ b/dev-support/git-jira-validation/README.md
@@ -0,0 +1,134 @@
+
+
+Apache Hadoop Git/Jira FixVersion validation
+============================================================
+
+Git commits in Apache Hadoop contain a Jira number in the format
+HADOOP-XXXX, HDFS-XXXX, YARN-XXXX or MAPREDUCE-XXXX.
+While creating a release candidate, we also include a changelist,
+which can be identified from the Fixed/Closed Jiras that carry the
+correct fix versions. However, we sometimes face inconsistencies
+between the fixed Jiras and the Git commit messages.
+
+The git_jira_fix_version_check.py script identifies all git commits
+whose commit messages have any of these issues:
+
+1. The commit is reverted as per the commit message
+2. The commit message does not contain a Jira number in the expected format
+3. The Jira does not have the expected fixVersion
+4. The Jira has the expected fixVersion, but it is not yet resolved
+
+Moreover, this script also finds any resolved Jira with the expected
+fixVersion but without a corresponding commit.
+
+This should be useful as part of RC preparation.
+
+git_jira_fix_version_check supports Python 3 and requires the
+jira package to be installed:
+
+```
+$ python3 --version
+Python 3.9.7
+
+$ python3 -m venv ./venv
+
+$ ./venv/bin/pip install -r dev-support/git-jira-validation/requirements.txt
+
+$ ./venv/bin/python dev-support/git-jira-validation/git_jira_fix_version_check.py
+
+```
+
+The script also requires the inputs below:
+```
+1. First commit hash to start excluding commits from history:
+ Usually we can provide the latest commit hash from the last tagged release
+ so that the script will only loop through the commits in the git commit
+ history before this commit hash. e.g. for the 3.3.2 release, we can provide
+ git hash: fa4915fdbbbec434ab41786cb17b82938a613f16
+ because this commit bumps up hadoop pom versions to 3.3.2:
+ https://github.com/apache/hadoop/commit/fa4915fdbbbec434ab41786cb17b82938a613f16
+
+2. Fix Version:
+ The exact fixVersion that we would like to compare all Jiras' fixVersions
+ with. e.g. for the 3.3.2 release, it should be 3.3.2.
+
+3. JIRA Project Name:
+ The exact, case-sensitive name of the project, e.g. HADOOP / OZONE
+
+4. Path of project's working dir with release branch checked-in:
+ The path of the project from which we want to compare git hashes. The local
+ fork of the project should be up to date with upstream, and the expected
+ release branch should be checked out.
+
+5. Jira server url (default url: https://issues.apache.org/jira):
+ The default server value points to the ASF Jira, but this script can also
+ be used with other Jira servers.
+```
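+
+For scripted runs, the same inputs can also be supplied on standard input,
+since the script reads them through interactive prompts in the order listed
+above. A minimal sketch (the commit hash and fix version reuse the 3.3.2
+examples above, the project path is a placeholder, and the empty line
+accepts the default Jira server url):
+
+```
+$ ./venv/bin/python dev-support/git-jira-validation/git_jira_fix_version_check.py <<EOF
+HADOOP
+fa4915fdbbbec434ab41786cb17b82938a613f16
+3.3.2
+
+/path/to/local/hadoop
+EOF
+```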
+
+
+Example of script execution:
+```
+JIRA Project Name (e.g HADOOP / OZONE etc): HADOOP
+First commit hash to start excluding commits from history: fa4915fdbbbec434ab41786cb17b82938a613f16
+Fix Version: 3.3.2
+Jira server url (default: https://issues.apache.org/jira):
+Path of project's working dir with release branch checked-in: /Users/vjasani/Documents/src/hadoop-3.3/hadoop
+
+Check git status output and verify expected branch
+
+On branch branch-3.3.2
+Your branch is up to date with 'origin/branch-3.3.2'.
+
+nothing to commit, working tree clean
+
+
+Jira/Git commit message diff starting: ##############################################
+Jira not present with version: 3.3.2. Commit: 8cd8e435fb43a251467ca74fadcb14f21a3e8163 HADOOP-17198. Support S3 Access Points (#3260) (branch-3.3.2) (#3955)
+WARN: Jira not found. Commit: 8af28b7cca5c6020de94e739e5373afc69f399e5 Updated the index as per 3.3.2 release
+WARN: Jira not found. Commit: e42e483d0085aa46543ebcb1196dd155ddb447d0 Make upstream aware of 3.3.1 release
+Commit seems reverted. Commit: 6db1165380cd308fb74c9d17a35c1e57174d1e09 Revert "HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)"
+Commit seems reverted. Commit: 1e3f94fa3c3d4a951d4f7438bc13e6f008f228f4 Revert "HDFS-16333. fix balancer bug when transfer an EC block (#3679)"
+Jira not present with version: 3.3.2. Commit: ce0bc7b473a62a580c1227a4de6b10b64b045d3a HDFS-16344. Improve DirectoryScanner.Stats#toString (#3695)
+Jira not present with version: 3.3.2. Commit: 30f0629d6e6f735c9f4808022f1a1827c5531f75 HDFS-16339. Show the threshold when mover threads quota is exceeded (#3689)
+Jira not present with version: 3.3.2. Commit: e449daccf486219e3050254d667b74f92e8fc476 YARN-11007. Correct words in YARN documents (#3680)
+Commit seems reverted. Commit: 5c189797828e60a3329fd920ecfb99bcbccfd82d Revert "HDFS-16336. Addendum: De-flake TestRollingUpgrade#testRollback (#3686)"
+Jira not present with version: 3.3.2. Commit: 544dffd179ed756bc163e4899e899a05b93d9234 HDFS-16171. De-flake testDecommissionStatus (#3280)
+Jira not present with version: 3.3.2. Commit: c6914b1cb6e4cab8263cd3ae5cc00bc7a8de25de HDFS-16350. Datanode start time should be set after RPC server starts successfully (#3711)
+Jira not present with version: 3.3.2. Commit: 328d3b84dfda9399021ccd1e3b7afd707e98912d HDFS-16336. Addendum: De-flake TestRollingUpgrade#testRollback (#3686)
+Jira not present with version: 3.3.2. Commit: 3ae8d4ccb911c9ababd871824a2fafbb0272c016 HDFS-16336. De-flake TestRollingUpgrade#testRollback (#3686)
+Jira not present with version: 3.3.2. Commit: 15d3448e25c797b7d0d401afdec54683055d4bb5 HADOOP-17975. Fallback to simple auth does not work for a secondary DistributedFileSystem instance. (#3579)
+Jira not present with version: 3.3.2. Commit: dd50261219de71eaa0a1ad28529953e12dfb92e0 YARN-10991. Fix to ignore the grouping "[]" for resourcesStr in parseResourcesString method (#3592)
+Jira not present with version: 3.3.2. Commit: ef462b21bf03b10361d2f9ea7b47d0f7360e517f HDFS-16332. Handle invalid token exception in sasl handshake (#3677)
+WARN: Jira not found. Commit: b55edde7071419410ea5bea4ce6462b980e48f5b Also update hadoop.version to 3.3.2
+...
+...
+...
+Found first commit hash after which git history is redundant. commit: fa4915fdbbbec434ab41786cb17b82938a613f16
+Exiting successfully
+Jira/Git commit message diff completed: ##############################################
+
+Any resolved Jira with fixVersion 3.3.2 but corresponding commit not present
+Starting diff: ##############################################
+HADOOP-18066 is marked resolved with fixVersion 3.3.2 but no corresponding commit found
+HADOOP-17936 is marked resolved with fixVersion 3.3.2 but no corresponding commit found
+Completed diff: ##############################################
+
+
+```
+
diff --git a/dev-support/git-jira-validation/git_jira_fix_version_check.py b/dev-support/git-jira-validation/git_jira_fix_version_check.py
new file mode 100644
index 0000000000000..c2e12a13aae22
--- /dev/null
+++ b/dev-support/git-jira-validation/git_jira_fix_version_check.py
@@ -0,0 +1,118 @@
+#!/usr/bin/env python3
+############################################################################
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+############################################################################
+"""An application to assist Release Managers with ensuring that histories in
+Git and fixVersions in JIRA are in agreement. See README.md for a detailed
+explanation.
+"""
+
+
+import os
+import re
+import subprocess
+
+from jira import JIRA
+
+jira_project_name = input("JIRA Project Name (e.g HADOOP / OZONE etc): ") \
+ or "HADOOP"
+# Define project_jira_keys with - appended. e.g for HADOOP Jiras,
+# project_jira_keys should include HADOOP-, HDFS-, YARN-, MAPREDUCE-
+project_jira_keys = [jira_project_name + '-']
+if jira_project_name == 'HADOOP':
+ project_jira_keys.append('HDFS-')
+ project_jira_keys.append('YARN-')
+ project_jira_keys.append('MAPREDUCE-')
+
+first_exclude_commit_hash = input("First commit hash to start excluding commits from history: ")
+fix_version = input("Fix Version: ")
+
+jira_server_url = input(
+ "Jira server url (default: https://issues.apache.org/jira): ") \
+ or "https://issues.apache.org/jira"
+
+jira = JIRA(server=jira_server_url)
+
+local_project_dir = input("Path of project's working dir with release branch checked-in: ")
+os.chdir(local_project_dir)
+
+GIT_STATUS_MSG = subprocess.check_output(['git', 'status']).decode("utf-8")
+print('\nCheck git status output and verify expected branch\n')
+print(GIT_STATUS_MSG)
+
+print('\nJira/Git commit message diff starting: ##############################################')
+
+issue_set_from_commit_msg = set()
+
+for commit in subprocess.check_output(['git', 'log', '--pretty=oneline']).decode(
+ "utf-8").splitlines():
+ if commit.startswith(first_exclude_commit_hash):
+ print("Found first commit hash after which git history is redundant. commit: "
+ + first_exclude_commit_hash)
+ print("Exiting successfully")
+ break
+ if re.search('revert', commit, re.IGNORECASE):
+ print("Commit seems reverted. \t\t\t Commit: " + commit)
+ continue
+ ACTUAL_PROJECT_JIRA = None
+ for project_jira in project_jira_keys:
+ if project_jira in commit:
+ ACTUAL_PROJECT_JIRA = project_jira
+ break
+ if not ACTUAL_PROJECT_JIRA:
+ print("WARN: Jira not found. \t\t\t Commit: " + commit)
+ continue
+ JIRA_NUM = ''
+ for c in commit.split(ACTUAL_PROJECT_JIRA)[1]:
+ if c.isdigit():
+ JIRA_NUM = JIRA_NUM + c
+ else:
+ break
+ issue = jira.issue(ACTUAL_PROJECT_JIRA + JIRA_NUM)
+ EXPECTED_FIX_VERSION = False
+ for version in issue.fields.fixVersions:
+ if version.name == fix_version:
+ EXPECTED_FIX_VERSION = True
+ break
+ if not EXPECTED_FIX_VERSION:
+ print("Jira not present with version: " + fix_version + ". \t Commit: " + commit)
+ continue
+ if issue.fields.status is None or issue.fields.status.name not in ('Resolved', 'Closed'):
+ print("Jira is not resolved yet? \t\t Commit: " + commit)
+ else:
+ # This means the Jira corresponding to the current commit message is resolved
+ # with the expected fixVersion.
+ # This is a no-op by default; if needed, convert it to a print statement.
+ issue_set_from_commit_msg.add(ACTUAL_PROJECT_JIRA + JIRA_NUM)
+
+print('Jira/Git commit message diff completed: ##############################################')
+
+print('\nAny resolved Jira with fixVersion ' + fix_version
+ + ' but corresponding commit not present')
+print('Starting diff: ##############################################')
+all_issues_with_fix_version = jira.search_issues(
+ 'project=' + jira_project_name + ' and status in (Resolved,Closed) and fixVersion='
+ + fix_version)
+
+for issue in all_issues_with_fix_version:
+ if issue.key not in issue_set_from_commit_msg:
+ print(issue.key + ' is marked resolved with fixVersion ' + fix_version
+ + ' but no corresponding commit found')
+
+print('Completed diff: ##############################################')
diff --git a/dev-support/git-jira-validation/requirements.txt b/dev-support/git-jira-validation/requirements.txt
new file mode 100644
index 0000000000000..ae7535a119fa9
--- /dev/null
+++ b/dev-support/git-jira-validation/requirements.txt
@@ -0,0 +1,18 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+jira==3.1.1
diff --git a/dev-support/hadoop-vote.sh b/dev-support/hadoop-vote.sh
new file mode 100755
index 0000000000000..3d381fb0b4be2
--- /dev/null
+++ b/dev-support/hadoop-vote.sh
@@ -0,0 +1,201 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This script is useful for performing basic sanity tests on a given
+# Hadoop RC. It checks the checksum, signature, Rat check,
+# build from source, and building the tarball from the source.
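+#
+# Illustrative invocation (a sketch; the source URL and key ID follow the
+# examples given in the usage text below):
+#   ./dev-support/hadoop-vote.sh --source https://dist.apache.org/repos/dist/dev/hadoop/hadoop-RC0/ --key 9AD2AE49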
+
+set -e -o pipefail
+
+usage() {
+ SCRIPT=$(basename "${BASH_SOURCE[@]}")
+
+ cat << __EOF
+hadoop-vote. A script for standard vote which verifies the following items
+1. Checksum of sources and binaries
+2. Signature of sources and binaries
+3. Rat check
+4. Built from source
+5. Built tar from source
+
+Usage: ${SCRIPT} -s | --source [-k | --key ] [-f | --keys-file-url ] [-o | --output-dir ] [-D property[=value]] [-P profiles]
+ ${SCRIPT} -h | --help
+
+ -h | --help Show this screen.
+ -s | --source '' A URL pointing to the release candidate sources and binaries
+ e.g. https://dist.apache.org/repos/dist/dev/hadoop/hadoop-RC0/
+ -k | --key '' The ID of the public signing key, e.g. 9AD2AE49
+ -f | --keys-file-url '' the URL of the key file, default is
+ https://downloads.apache.org/hadoop/common/KEYS
+ -o | --output-dir '' directory which has the stdout and stderr of each verification target
+ -D | list of maven properties to set for the mvn invocations, e.g. <-D hbase.profile=2.0 -D skipTests> Defaults to unset
+ -P | list of maven profiles to set for the build from source, e.g. <-P native -P yarn-ui>
+__EOF
+}
+
+MVN_PROPERTIES=()
+MVN_PROFILES=()
+
+while ((${#})); do
+ case "${1}" in
+ -h | --help )
+ usage; exit 0 ;;
+ -s | --source )
+ SOURCE_URL="${2}"; shift 2 ;;
+ -k | --key )
+ SIGNING_KEY="${2}"; shift 2 ;;
+ -f | --keys-file-url )
+ KEY_FILE_URL="${2}"; shift 2 ;;
+ -o | --output-dir )
+ OUTPUT_DIR="${2}"; shift 2 ;;
+ -D )
+ MVN_PROPERTIES+=("-D ${2}"); shift 2 ;;
+ -P )
+ MVN_PROFILES+=("-P ${2}"); shift 2 ;;
+ * )
+ usage >&2; exit 1 ;;
+ esac
+done
+
+# Source url must be provided
+if [ -z "${SOURCE_URL}" ]; then
+ usage;
+ exit 1
+fi
+
+cat << __EOF
+Although this tool helps verify the Hadoop RC build and unit tests, the
+operator may still consider verifying the following manually:
+1. Verify the API compatibility report
+2. Integration/performance/benchmark tests
+3. Object store specific integration tests against an endpoint
+4. Verify overall unit test stability from Jenkins builds or locally
+5. Other concerns if any
+__EOF
+
+[[ "${SOURCE_URL}" != */ ]] && SOURCE_URL="${SOURCE_URL}/"
+HADOOP_RC_VERSION=$(tr "/" "\n" <<< "${SOURCE_URL}" | tail -n2)
+HADOOP_VERSION=$(echo "${HADOOP_RC_VERSION}" | sed -e 's/-RC[0-9]//g' | sed -e 's/hadoop-//g')
+JAVA_VERSION=$(java -version 2>&1 | cut -f3 -d' ' | head -n1 | sed -e 's/"//g')
+OUTPUT_DIR="${OUTPUT_DIR:-$(pwd)}"
+
+if [ ! -d "${OUTPUT_DIR}" ]; then
+ echo "Output directory ${OUTPUT_DIR} does not exist, please create it before running this script."
+ exit 1
+fi
+
+OUTPUT_PATH_PREFIX="${OUTPUT_DIR}"/"${HADOOP_RC_VERSION}"
+
+# default value for verification targets, 0 = failed
+SIGNATURE_PASSED=0
+CHECKSUM_PASSED=0
+RAT_CHECK_PASSED=0
+BUILD_FROM_SOURCE_PASSED=0
+BUILD_TAR_FROM_SOURCE_PASSED=0
+
+function download_and_import_keys() {
+ KEY_FILE_URL="${KEY_FILE_URL:-https://downloads.apache.org/hadoop/common/KEYS}"
+ echo "Obtain and import the publisher key(s) from ${KEY_FILE_URL}"
+ # download the keys file into file KEYS
+ wget -O KEYS "${KEY_FILE_URL}"
+ gpg --import KEYS
+ if [ -n "${SIGNING_KEY}" ]; then
+ gpg --list-keys "${SIGNING_KEY}"
+ fi
+}
+
+function download_release_candidate () {
+ # get all files from release candidate repo
+ wget -r -np -N -nH --cut-dirs 4 "${SOURCE_URL}"
+}
+
+function verify_signatures() {
+ rm -f "${OUTPUT_PATH_PREFIX}"_verify_signatures
+ for file in *.tar.gz; do
+ gpg --verify "${file}".asc "${file}" 2>&1 | tee -a "${OUTPUT_PATH_PREFIX}"_verify_signatures && SIGNATURE_PASSED=1 || SIGNATURE_PASSED=0
+ done
+}
+
+function verify_checksums() {
+ rm -f "${OUTPUT_PATH_PREFIX}"_verify_checksums
+ SHA_EXT=$(find . -name "*.sha*" | awk -F '.' '{ print $NF }' | head -n 1)
+ for file in *.tar.gz; do
+ sha512sum --tag "${file}" > "${file}"."${SHA_EXT}".tmp
+ diff "${file}"."${SHA_EXT}".tmp "${file}"."${SHA_EXT}" 2>&1 | tee -a "${OUTPUT_PATH_PREFIX}"_verify_checksums && CHECKSUM_PASSED=1 || CHECKSUM_PASSED=0
+ rm -f "${file}"."${SHA_EXT}".tmp
+ done
+}
+
+function unzip_from_source() {
+ tar -zxvf hadoop-"${HADOOP_VERSION}"-src.tar.gz
+ cd hadoop-"${HADOOP_VERSION}"-src
+}
+
+function rat_test() {
+ rm -f "${OUTPUT_PATH_PREFIX}"_rat_test
+ mvn clean apache-rat:check "${MVN_PROPERTIES[@]}" 2>&1 | tee "${OUTPUT_PATH_PREFIX}"_rat_test && RAT_CHECK_PASSED=1
+}
+
+function build_from_source() {
+ rm -f "${OUTPUT_PATH_PREFIX}"_build_from_source
+ # No unit test run.
+ mvn clean install "${MVN_PROPERTIES[@]}" -DskipTests "${MVN_PROFILES[@]}" 2>&1 | tee "${OUTPUT_PATH_PREFIX}"_build_from_source && BUILD_FROM_SOURCE_PASSED=1
+}
+
+function build_tar_from_source() {
+ rm -f "${OUTPUT_PATH_PREFIX}"_build_tar_from_source
+ # No unit test run.
+ mvn clean package "${MVN_PROPERTIES[@]}" -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true 2>&1 | tee "${OUTPUT_PATH_PREFIX}"_build_tar_from_source && BUILD_TAR_FROM_SOURCE_PASSED=1
+}
+
+function execute() {
+ ${1} || print_when_exit
+}
+
+function print_when_exit() {
+ cat << __EOF
+ * Signature: $( ((SIGNATURE_PASSED)) && echo "ok" || echo "failed" )
+ * Checksum : $( ((CHECKSUM_PASSED)) && echo "ok" || echo "failed" )
+ * Rat check (${JAVA_VERSION}): $( ((RAT_CHECK_PASSED)) && echo "ok" || echo "failed" )
+ - mvn clean apache-rat:check ${MVN_PROPERTIES[@]}
+ * Built from source (${JAVA_VERSION}): $( ((BUILD_FROM_SOURCE_PASSED)) && echo "ok" || echo "failed" )
+ - mvn clean install ${MVN_PROPERTIES[@]} -DskipTests ${MVN_PROFILES[@]}
+ * Built tar from source (${JAVA_VERSION}): $( ((BUILD_TAR_FROM_SOURCE_PASSED)) && echo "ok" || echo "failed" )
+ - mvn clean package ${MVN_PROPERTIES[@]} -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
+__EOF
+ if ((CHECKSUM_PASSED)) && ((SIGNATURE_PASSED)) && ((RAT_CHECK_PASSED)) && ((BUILD_FROM_SOURCE_PASSED)) && ((BUILD_TAR_FROM_SOURCE_PASSED)) ; then
+ exit 0
+ fi
+ exit 1
+}
+
+pushd "${OUTPUT_DIR}"
+
+download_and_import_keys
+download_release_candidate
+
+execute verify_signatures
+execute verify_checksums
+execute unzip_from_source
+execute rat_test
+execute build_from_source
+execute build_tar_from_source
+
+popd
+
+print_when_exit
diff --git a/dev-support/jenkins.sh b/dev-support/jenkins.sh
new file mode 100644
index 0000000000000..1bb080d19cabc
--- /dev/null
+++ b/dev-support/jenkins.sh
@@ -0,0 +1,250 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This script is called from the Jenkinsfile, which ultimately runs
+# the CI through Yetus.
+# We use Ubuntu Focal as the main platform for building Hadoop, so
+# it runs for all the PRs. Additionally, we also ensure that
+# Hadoop builds across the supported platforms whenever there's a change
+# in any of the C/C++ files, the C/C++ build files, or the platform files.
+
+## @description Check if the given extension is related to C/C++
+## @param seeking
+## @return 0 if yes
+## @return 1 if no
+is_c_cpp_extension() {
+ local c_cpp_extension=("c" "cc" "cpp" "h" "hpp")
+ local seeking=$1
+
+ for element in "${c_cpp_extension[@]}"; do
+ if [[ $element == "$seeking" ]]; then
+ return 0
+ fi
+ done
+ return 1
+}
+
+## @description Check if the given relative path corresponds to
+## change in platform files
+## @param in_path
+## @return 0 if yes
+## @return 1 if no
+is_platform_change() {
+ declare in_path
+ in_path="${SOURCEDIR}"/"${1}"
+
+ for path in "${SOURCEDIR}"/dev-support/docker/Dockerfile* "${SOURCEDIR}"/dev-support/docker/pkg-resolver/*.json; do
+ if [ "${in_path}" == "${path}" ]; then
+ echo "Found C/C++ platform related changes in ${in_path}"
+ return 0
+ fi
+ done
+ return 1
+}
+
+## @description Checks if the given path corresponds to a change
+## in C/C++ files or related to C/C++ build system
+## @param path
+## @return 0 if yes
+## @return 1 if no
+is_c_cpp_change() {
+ shopt -s nocasematch
+
+ local path=$1
+ declare filename
+ filename=$(basename -- "${path}")
+ extension=${filename##*.}
+
+ if is_c_cpp_extension "${extension}"; then
+ echo "Found C/C++ changes in ${path}"
+ return 0
+ fi
+
+ if [[ $filename =~ CMakeLists\.txt ]]; then
+ echo "Found C/C++ build related changes in ${path}"
+ return 0
+ fi
+ return 1
+}
+
+## @description Check if the CI needs to be run - CI will always run if
+## IS_OPTIONAL is 0, or if there's any change in
+## C/C++ files, the C/C++ build files, or the platform files
+## @return 0 if yes
+## @return 1 if no
+function check_ci_run() {
+ # Get the base commit (the tip of origin/trunk) against which this PR's changes are diffed
+ firstCommitOfThisPr=$(git --git-dir "${SOURCEDIR}/.git" rev-parse origin/trunk)
+
+ # Loop over the paths of all the changed files and check if the criteria
+ # to run the CI has been satisfied
+ for path in $(git --git-dir "${SOURCEDIR}/.git" diff --name-only "${firstCommitOfThisPr}" HEAD); do
+ if is_c_cpp_change "${path}"; then
+ return 0
+ fi
+
+ if is_platform_change "${path}"; then
+ return 0
+ fi
+ done
+
+ # We must run the CI if it's not optional
+ if [ "$IS_OPTIONAL" -eq 0 ]; then
+ return 0
+ fi
+ return 1
+}
+
+## @description Run the CI using YETUS
+function run_ci() {
+ TESTPATCHBIN="${WORKSPACE}/${YETUS}/precommit/src/main/shell/test-patch.sh"
+
+ # this must be clean for every run
+ if [[ -d "${PATCHDIR}" ]]; then
+ rm -rf "${PATCHDIR:?}"
+ fi
+ mkdir -p "${PATCHDIR}"
+
+ # if given a JIRA issue, process it. If CHANGE_URL is set
+ # (e.g., Github Branch Source plugin), process it.
+ # otherwise exit, because we don't want Hadoop to do a
+ # full build. We wouldn't normally do this check for smaller
+ # projects. :)
+ if [[ -n "${JIRA_ISSUE_KEY}" ]]; then
+ YETUS_ARGS+=("${JIRA_ISSUE_KEY}")
+ elif [[ -z "${CHANGE_URL}" ]]; then
+ echo "Full build skipped" >"${PATCHDIR}/report.html"
+ exit 0
+ fi
+
+ YETUS_ARGS+=("--patch-dir=${PATCHDIR}")
+
+ # where the source is located
+ YETUS_ARGS+=("--basedir=${SOURCEDIR}")
+
+ # our project defaults come from a personality file
+ YETUS_ARGS+=("--project=hadoop")
+ YETUS_ARGS+=("--personality=${SOURCEDIR}/dev-support/bin/hadoop.sh")
+
+ # lots of different output formats
+ YETUS_ARGS+=("--brief-report-file=${PATCHDIR}/brief.txt")
+ YETUS_ARGS+=("--console-report-file=${PATCHDIR}/console.txt")
+ YETUS_ARGS+=("--html-report-file=${PATCHDIR}/report.html")
+
+ # enable writing back to Github
+ YETUS_ARGS+=("--github-token=${GITHUB_TOKEN}")
+
+ # auto-kill any surefire stragglers during unit test runs
+ YETUS_ARGS+=("--reapermode=kill")
+
+ # set relatively high limits for ASF machines
+ # changing these to higher values may cause problems
+ # with other jobs on systemd-enabled machines
+ YETUS_ARGS+=("--proclimit=5500")
+ YETUS_ARGS+=("--dockermemlimit=22g")
+
+ # -1 spotbugs issues that show up prior to the patch being applied
+ YETUS_ARGS+=("--spotbugs-strict-precheck")
+
+ # rsync these files back into the archive dir
+ YETUS_ARGS+=("--archive-list=checkstyle-errors.xml,spotbugsXml.xml")
+
+ # URL for user-side presentation in reports and such to our artifacts
+ # (needs to match the archive bits below)
+ YETUS_ARGS+=("--build-url-artifacts=artifact/out")
+
+ # plugins to enable
+ YETUS_ARGS+=("--plugins=all,-jira")
+
+ # don't let these tests cause -1s because we aren't really paying that
+ # much attention to them
+ YETUS_ARGS+=("--tests-filter=checkstyle")
+
+ # run in docker mode and specifically point to our
+ # Dockerfile since we don't want to use the auto-pulled version.
+ YETUS_ARGS+=("--docker")
+ YETUS_ARGS+=("--dockerfile=${DOCKERFILE}")
+ YETUS_ARGS+=("--mvn-custom-repos")
+
+ # effectively treat dev-support as a custom maven module
+ YETUS_ARGS+=("--skip-dirs=dev-support")
+
+ # help keep the ASF boxes clean
+ YETUS_ARGS+=("--sentinel")
+
+ # test with Java 8 and 11
+ YETUS_ARGS+=("--java-home=/usr/lib/jvm/java-8-openjdk-amd64")
+ YETUS_ARGS+=("--multijdkdirs=/usr/lib/jvm/java-11-openjdk-amd64")
+ YETUS_ARGS+=("--multijdktests=compile")
+
+ # custom javadoc goals
+ YETUS_ARGS+=("--mvn-javadoc-goals=process-sources,javadoc:javadoc-no-fork")
+
+ # write Yetus report as GitHub comment (YETUS-1102)
+ YETUS_ARGS+=("--github-write-comment")
+ YETUS_ARGS+=("--github-use-emoji-vote")
+
+ "${TESTPATCHBIN}" "${YETUS_ARGS[@]}"
+}
+
+## @description Cleans up the processes started by YETUS
+function cleanup_ci_proc() {
+ # See YETUS-764
+ if [ -f "${PATCHDIR}/pidfile.txt" ]; then
+ echo "test-patch process appears to still be running: killing"
+ kill "$(cat "${PATCHDIR}/pidfile.txt")" || true
+ sleep 10
+ fi
+ if [ -f "${PATCHDIR}/cidfile.txt" ]; then
+ echo "test-patch container appears to still be running: killing"
+ docker kill "$(cat "${PATCHDIR}/cidfile.txt")" || true
+ fi
+}
+
+## @description Invokes github_status_recovery in YETUS's precommit
+function github_status_recovery() {
+ YETUS_ARGS+=("--github-token=${GITHUB_TOKEN}")
+ YETUS_ARGS+=("--patch-dir=${PATCHDIR}")
+ TESTPATCHBIN="${WORKSPACE}/${YETUS}/precommit/src/main/shell/github-status-recovery.sh"
+ /usr/bin/env bash "${TESTPATCHBIN}" "${YETUS_ARGS[@]}" "${EXTRA_ARGS}" || true
+}
+
+if [ -z "$1" ]; then
+ echo "Must specify an argument for jenkins.sh"
+ echo "run_ci - Runs the CI based on platform image as defined by DOCKERFILE"
+ echo "cleanup_ci_proc - Cleans up the processes spawned for running the CI"
+ echo "github_status_recovery - Sends Github status (refer to YETUS precommit for more details)"
+ exit 1
+fi
+
+# Process arguments to jenkins.sh
+if [ "$1" == "run_ci" ]; then
+ # Check if the CI needs to be run, if so, do so :)
+ if check_ci_run; then
+ run_ci
+ else
+ echo "No C/C++ file or C/C++ build or platform changes found, will not run CI for this platform"
+ fi
+elif [ "$1" == "cleanup_ci_proc" ]; then
+ cleanup_ci_proc
+elif [ "$1" == "github_status_recovery" ]; then
+ github_status_recovery
+else
+ echo "Don't know how to process $1"
+ exit 1
+fi
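+
+# Illustrative invocation (a sketch; in practice the environment variables
+# referenced above, such as SOURCEDIR, PATCHDIR, DOCKERFILE, WORKSPACE,
+# YETUS, IS_OPTIONAL and GITHUB_TOKEN, are provided by the caller, i.e. the
+# Jenkinsfile):
+#   bash dev-support/jenkins.sh run_ci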
diff --git a/hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml b/hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml
index 51f9acc4015ce..ca8d137dd5e49 100644
--- a/hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml
+++ b/hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml
@@ -69,7 +69,9 @@
-
+
+
+
@@ -120,9 +122,8 @@
-
-
-
+
+
@@ -158,7 +159,9 @@
-
+
+
+
diff --git a/hadoop-client-modules/hadoop-client-api/pom.xml b/hadoop-client-modules/hadoop-client-api/pom.xml
index 8f3de76ca9462..9b70f8c8b01e0 100644
--- a/hadoop-client-modules/hadoop-client-api/pom.xml
+++ b/hadoop-client-modules/hadoop-client-api/pom.xml
@@ -67,6 +67,13 @@
+
+
+ org.xerial.snappy
+ snappy-java
+
@@ -87,6 +94,10 @@
org.apache.maven.pluginsmaven-shade-plugin
+
+ true
+ true
+ org.apache.hadoop
@@ -105,6 +116,10 @@
org.apache.hadoop:*
+
+
+ org.xerial.snappy:*
+
@@ -126,9 +141,7 @@
org/apache/hadoop/*org/apache/hadoop/**/*
-
- org/apache/htrace/*
- org/apache/htrace/**/*
+
org/slf4j/*org/slf4j/**/*org/apache/commons/logging/*
@@ -145,6 +158,9 @@
org/xml/sax/**/*org/bouncycastle/*org/bouncycastle/**/*
+
+ org/xerial/snappy/*
+ org/xerial/snappy/**/*
@@ -163,6 +179,8 @@
com/sun/security/**/*com/sun/jndi/**/*com/sun/management/**/*
+ com/ibm/security/*
+ com/ibm/security/**/*
@@ -221,6 +239,9 @@
net/topology/*net/topology/**/*
+
+ net/jpountz/*
+ net/jpountz/**/*
diff --git a/hadoop-client-modules/hadoop-client-check-invariants/pom.xml b/hadoop-client-modules/hadoop-client-check-invariants/pom.xml
index 144f2a66ff7d7..b1c00678406d7 100644
--- a/hadoop-client-modules/hadoop-client-check-invariants/pom.xml
+++ b/hadoop-client-modules/hadoop-client-check-invariants/pom.xml
@@ -56,7 +56,7 @@
org.codehaus.mojoextra-enforcer-rules
- 1.0-beta-3
+ 1.5.1
@@ -80,8 +80,6 @@
but enforcer still sees it.
-->
org.apache.hadoop:hadoop-annotations
-
- org.apache.htrace:htrace-core4org.slf4j:slf4j-api
@@ -92,6 +90,8 @@
com.google.code.findbugs:jsr305org.bouncycastle:*
+
+ org.xerial.snappy:*
diff --git a/hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh b/hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
index 7242ade356fda..2e927402d2542 100644
--- a/hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
+++ b/hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
@@ -67,6 +67,8 @@ allowed_expr+="|^krb5_udp-template.conf$"
# Jetty uses this style sheet for directory listings. TODO ensure our
# internal use of jetty disallows directory listings and remove this.
allowed_expr+="|^jetty-dir.css$"
+# snappy-java is a native library. We cannot relocate it under org/apache/hadoop.
+allowed_expr+="|^org/xerial/"
allowed_expr+=")"
declare -i bad_artifacts=0
diff --git a/hadoop-client-modules/hadoop-client-check-test-invariants/pom.xml b/hadoop-client-modules/hadoop-client-check-test-invariants/pom.xml
index 1a5d27ce213aa..0e576ac6f0666 100644
--- a/hadoop-client-modules/hadoop-client-check-test-invariants/pom.xml
+++ b/hadoop-client-modules/hadoop-client-check-test-invariants/pom.xml
@@ -60,7 +60,7 @@
org.codehaus.mojoextra-enforcer-rules
- 1.0-beta-3
+ 1.5.1
@@ -84,8 +84,6 @@
but enforcer still sees it.
-->
org.apache.hadoop:hadoop-annotations
-
- org.apache.htrace:htrace-core4org.slf4j:slf4j-api
@@ -100,6 +98,8 @@
com.google.code.findbugs:jsr305org.bouncycastle:*
+
+ org.xerial.snappy:*
diff --git a/hadoop-client-modules/hadoop-client-check-test-invariants/src/test/resources/ensure-jars-have-correct-contents.sh b/hadoop-client-modules/hadoop-client-check-test-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
index d77424e6b7899..0dbfefbf4f16d 100644
--- a/hadoop-client-modules/hadoop-client-check-test-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
+++ b/hadoop-client-modules/hadoop-client-check-test-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
@@ -58,13 +58,6 @@ allowed_expr+="|^org.apache.hadoop.application-classloader.properties$"
allowed_expr+="|^java.policy$"
# * Used by javax.annotation
allowed_expr+="|^jndi.properties$"
-# * allowing native libraries from rocksdb. Leaving native libraries as it is.
-allowed_expr+="|^librocksdbjni-linux32.so"
-allowed_expr+="|^librocksdbjni-linux64.so"
-allowed_expr+="|^librocksdbjni-osx.jnilib"
-allowed_expr+="|^librocksdbjni-win64.dll"
-allowed_expr+="|^librocksdbjni-linux-ppc64le.so"
-
allowed_expr+=")"
declare -i bad_artifacts=0
diff --git a/hadoop-client-modules/hadoop-client-integration-tests/pom.xml b/hadoop-client-modules/hadoop-client-integration-tests/pom.xml
index 978918e406773..ba593ebd1b42d 100644
--- a/hadoop-client-modules/hadoop-client-integration-tests/pom.xml
+++ b/hadoop-client-modules/hadoop-client-integration-tests/pom.xml
@@ -52,6 +52,11 @@
junittest
+
+ org.lz4
+ lz4-java
+ test
+
@@ -179,6 +184,12 @@
hadoop-hdfstesttest-jar
+
+
+ org.ow2.asm
+ asm-commons
+
+ org.apache.hadoop
@@ -186,6 +197,12 @@
testtest-jar
+
+ org.apache.hadoop
+ hadoop-common
+ test
+ test-jar
+
diff --git a/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseHadoopCodecs.java b/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseHadoopCodecs.java
new file mode 100644
index 0000000000000..fd0effa143b95
--- /dev/null
+++ b/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseHadoopCodecs.java
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+package org.apache.hadoop.example;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+
+import java.io.*;
+import java.util.Arrays;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.io.DataOutputBuffer;
+import org.apache.hadoop.io.RandomDatum;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.CompressionOutputStream;
+import org.apache.hadoop.io.compress.zlib.ZlibFactory;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Ensure that we can perform codec operations given the API and runtime jars
+ * by performing some simple smoke tests.
+ */
+public class ITUseHadoopCodecs {
+
+ private static final Logger LOG = LoggerFactory.getLogger(ITUseHadoopCodecs.class);
+
+ private Configuration hadoopConf = new Configuration();
+ private int dataCount = 100;
+ private int dataSeed = new Random().nextInt();
+
+ @Test
+ public void testGzipCodec() throws IOException {
+ ZlibFactory.setNativeZlibLoaded(false);
+ assertFalse(ZlibFactory.isNativeZlibLoaded(hadoopConf));
+ codecTest(hadoopConf, dataSeed, 0, "org.apache.hadoop.io.compress.GzipCodec");
+ codecTest(hadoopConf, dataSeed, dataCount, "org.apache.hadoop.io.compress.GzipCodec");
+ }
+
+ @Test
+ public void testSnappyCodec() throws IOException {
+ codecTest(haddopConf, dataSeed, 0, "org.apache.hadoop.io.compress.SnappyCodec");
+ codecTest(haddopConf, dataSeed, dataCount, "org.apache.hadoop.io.compress.SnappyCodec");
+ }
+
+ @Test
+ public void testLz4Codec() {
+ Arrays.asList(false, true).forEach(config -> {
+ hadoopConf.setBoolean(
+ CommonConfigurationKeys.IO_COMPRESSION_CODEC_LZ4_USELZ4HC_KEY,
+ config);
+ try {
+ codecTest(haddopConf, dataSeed, 0, "org.apache.hadoop.io.compress.Lz4Codec");
+ codecTest(haddopConf, dataSeed, dataCount, "org.apache.hadoop.io.compress.Lz4Codec");
+ } catch (IOException e) {
+ throw new RuntimeException("failed when running codecTest", e);
+ }
+ });
+ }
+
+ private void codecTest(Configuration conf, int seed, int count, String codecClass)
+ throws IOException {
+
+ // Create the codec
+ CompressionCodec codec = null;
+ try {
+ codec = (CompressionCodec)
+ ReflectionUtils.newInstance(conf.getClassByName(codecClass), conf);
+ } catch (ClassNotFoundException cnfe) {
+ throw new IOException("Illegal codec!");
+ }
+ LOG.info("Created a Codec object of type: " + codecClass);
+
+ // Generate data
+ DataOutputBuffer data = new DataOutputBuffer();
+ RandomDatum.Generator generator = new RandomDatum.Generator(seed);
+ for(int i = 0; i < count; ++i) {
+ generator.next();
+ RandomDatum key = generator.getKey();
+ RandomDatum value = generator.getValue();
+
+ key.write(data);
+ value.write(data);
+ }
+ LOG.info("Generated " + count + " records");
+
+ // Compress data
+ DataOutputBuffer compressedDataBuffer = new DataOutputBuffer();
+ try (CompressionOutputStream deflateFilter =
+ codec.createOutputStream(compressedDataBuffer);
+ DataOutputStream deflateOut =
+ new DataOutputStream(new BufferedOutputStream(deflateFilter))) {
+ deflateOut.write(data.getData(), 0, data.getLength());
+ deflateOut.flush();
+ deflateFilter.finish();
+ }
+
+ // De-compress data
+ DataInputBuffer deCompressedDataBuffer = new DataInputBuffer();
+ deCompressedDataBuffer.reset(compressedDataBuffer.getData(), 0,
+ compressedDataBuffer.getLength());
+ DataInputBuffer originalData = new DataInputBuffer();
+ originalData.reset(data.getData(), 0, data.getLength());
+ try (CompressionInputStream inflateFilter =
+ codec.createInputStream(deCompressedDataBuffer);
+ DataInputStream originalIn =
+ new DataInputStream(new BufferedInputStream(originalData))) {
+
+ // Check
+ int expected;
+ do {
+ expected = originalIn.read();
+ assertEquals("Inflated stream read by byte does not match",
+ expected, inflateFilter.read());
+ } while (expected != -1);
+ }
+
+ LOG.info("SUCCESS! Completed checking " + count + " records");
+ }
+}
diff --git a/hadoop-client-modules/hadoop-client-minicluster/pom.xml b/hadoop-client-modules/hadoop-client-minicluster/pom.xml
index 70a627cdc06d2..d70198ac428fa 100644
--- a/hadoop-client-modules/hadoop-client-minicluster/pom.xml
+++ b/hadoop-client-modules/hadoop-client-minicluster/pom.xml
@@ -40,6 +40,12 @@
hadoop-client-apiruntime
+
+
+ org.xerial.snappy
+ snappy-java
+ runtime
+ org.apache.hadoophadoop-client-runtime
@@ -326,6 +332,10 @@
org.apache.hadoop.thirdpartyhadoop-shaded-guava
+
+ org.ow2.asm
+ asm-commons
+
- com.sun.jersey
- jersey-core
+ com.sun.jersey
+ jersey-coretrue
@@ -445,9 +455,19 @@
true
- com.sun.jersey
- jersey-servlet
+ com.sun.jersey
+ jersey-servlettrue
+
+
+ javax.servlet
+ servlet-api
+
+
+ javax.enterprise
+ cdi-api
+
+
@@ -672,7 +692,6 @@
org.apache.hadoop:hadoop-client-apiorg.apache.hadoop:hadoop-client-runtime
- org.apache.htrace:htrace-core4org.slf4j:slf4j-apicommons-logging:commons-loggingjunit:junit
@@ -683,6 +702,9 @@
org.bouncycastle:*
+
+ org.xerial.snappy:*
+ javax.ws.rs:javax.ws.rs-api
@@ -729,6 +751,12 @@
testdata/*
+
+ com.fasterxml.jackson.*:*
+
+ META-INF/versions/11/module-info.class
+
+
@@ -761,13 +789,6 @@
xml.xsd
-
-
- org.rocksdb:rocksdbjni
-
- HISTORY-JAVA.md
-
- org.eclipse.jetty:*
@@ -840,6 +861,18 @@
*/**
+
+ org.eclipse.jetty:jetty-util-ajax
+
+ */**
+
+
+
+ org.eclipse.jetty:jetty-server
+
+ jetty-dir.css
+
+
@@ -858,9 +891,7 @@
org/apache/hadoop/*org/apache/hadoop/**/*
-
- org/apache/htrace/*
- org/apache/htrace/**/*
+
org/slf4j/*org/slf4j/**/*org/apache/commons/logging/*
@@ -881,6 +912,9 @@
org/xml/sax/**/*org/bouncycastle/*org/bouncycastle/**/*
+
+ org/xerial/snappy/*
+ org/xerial/snappy/**/*
@@ -906,6 +940,8 @@
com/sun/security/**/*com/sun/jndi/**/*com/sun/management/**/*
+ com/ibm/security/*
+ com/ibm/security/**/*
@@ -999,6 +1035,9 @@
net/topology/*net/topology/**/*
+
+ net/jpountz/*
+ net/jpountz/**/*
diff --git a/hadoop-client-modules/hadoop-client-runtime/pom.xml b/hadoop-client-modules/hadoop-client-runtime/pom.xml
index ebaafff89bbb3..35fbd7665fb26 100644
--- a/hadoop-client-modules/hadoop-client-runtime/pom.xml
+++ b/hadoop-client-modules/hadoop-client-runtime/pom.xml
@@ -60,6 +60,12 @@
hadoop-client-apiruntime
+
+
+ org.xerial.snappy
+ snappy-java
+ runtime
+
@@ -75,15 +81,9 @@
-
- org.apache.htrace
- htrace-core4
- runtime
- org.slf4jslf4j-api
@@ -146,8 +146,6 @@
org.apache.hadoop:hadoop-client-api
-
- org.apache.htrace:htrace-core4org.slf4j:slf4j-api
@@ -163,6 +161,9 @@
org.ow2.asm:*org.bouncycastle:*
+
+ org.xerial.snappy:*
+ javax.ws.rs:javax.ws.rs-api
@@ -242,6 +243,12 @@
google/protobuf/**/*.proto
+
+ com.fasterxml.jackson.*:*
+
+ META-INF/versions/11/module-info.class
+
+
@@ -250,9 +257,7 @@
org/apache/hadoop/*org/apache/hadoop/**/*
-
- org/apache/htrace/*
- org/apache/htrace/**/*
+
org/slf4j/*org/slf4j/**/*org/apache/commons/logging/*
@@ -269,6 +274,9 @@
org/xml/sax/**/*org/bouncycastle/*org/bouncycastle/**/*
+
+ org/xerial/snappy/*
+ org/xerial/snappy/**/*
@@ -287,6 +295,8 @@
com/sun/security/**/*com/sun/jndi/**/*com/sun/management/**/*
+ com/ibm/security/*
+ com/ibm/security/**/*
@@ -359,6 +369,9 @@
net/topology/*net/topology/**/*
+
+ net/jpountz/*
+ net/jpountz/**/*
diff --git a/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml b/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml
index 11b092674cf4f..33d3f9578172f 100644
--- a/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml
+++ b/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml
@@ -101,6 +101,10 @@
org.apache.zookeeperzookeeper
+
+ org.projectlombok
+ lombok
+
@@ -133,5 +137,10 @@
hadoop-coscompile
+
+ org.apache.hadoop
+ hadoop-huaweicloud
+ compile
+
diff --git a/hadoop-cloud-storage-project/hadoop-cos/pom.xml b/hadoop-cloud-storage-project/hadoop-cos/pom.xml
index d18b09f450408..fa47e354c7998 100644
--- a/hadoop-cloud-storage-project/hadoop-cos/pom.xml
+++ b/hadoop-cloud-storage-project/hadoop-cos/pom.xml
@@ -64,10 +64,9 @@
- org.codehaus.mojo
- findbugs-maven-plugin
+ com.github.spotbugs
+ spotbugs-maven-plugin
- truetrue${basedir}/dev-support/findbugs-exclude.xml
diff --git a/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/CosNFileSystem.java b/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/CosNFileSystem.java
index 94b10ad44012b..4dda1260731d3 100644
--- a/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/CosNFileSystem.java
+++ b/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/CosNFileSystem.java
@@ -28,11 +28,11 @@
import java.util.HashMap;
import java.util.Set;
import java.util.TreeSet;
+import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListeningExecutorService;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
@@ -71,8 +71,8 @@ public class CosNFileSystem extends FileSystem {
private String owner = "Unknown";
private String group = "Unknown";
- private ListeningExecutorService boundedIOThreadPool;
- private ListeningExecutorService boundedCopyThreadPool;
+ private ExecutorService boundedIOThreadPool;
+ private ExecutorService boundedCopyThreadPool;
public CosNFileSystem() {
}
diff --git a/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/auth/COSCredentialsProviderList.java b/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/auth/COSCredentialsProviderList.java
index d2d2f8c9a7cab..66ef4b1c6fd87 100644
--- a/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/auth/COSCredentialsProviderList.java
+++ b/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/auth/COSCredentialsProviderList.java
@@ -24,7 +24,7 @@
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
-import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
+import org.apache.hadoop.util.Preconditions;
import com.qcloud.cos.auth.AnonymousCOSCredentials;
import com.qcloud.cos.auth.COSCredentials;
import com.qcloud.cos.auth.COSCredentialsProvider;
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/dev-support/findbugs-exclude.xml b/hadoop-cloud-storage-project/hadoop-huaweicloud/dev-support/findbugs-exclude.xml
new file mode 100644
index 0000000000000..40d78d0cd6cec
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/dev-support/findbugs-exclude.xml
@@ -0,0 +1,18 @@
+
+
+
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/pom.xml b/hadoop-cloud-storage-project/hadoop-huaweicloud/pom.xml
new file mode 100755
index 0000000000000..b96883b9ac80d
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/pom.xml
@@ -0,0 +1,191 @@
+
+
+
+ 4.0.0
+
+ org.apache.hadoop
+ hadoop-project
+ 3.4.0-SNAPSHOT
+ ../../hadoop-project
+
+ hadoop-huaweicloud
+ 3.4.0-SNAPSHOT
+ Apache Hadoop OBS support
+
+ This module contains code to support integration with OBS.
+ It also declares the dependencies needed to work with OBS services.
+
+ jar
+
+ UTF-8
+ true
+ 3.20.4.2
+
+
+
+
+ tests-off
+
+
+ src/test/resources/auth-keys.xml
+
+
+
+ true
+
+
+
+ tests-on
+
+
+ src/test/resources/auth-keys.xml
+
+
+
+ false
+
+
+
+
+
+
+
+ com.github.spotbugs
+ spotbugs-maven-plugin
+
+ true
+ ${basedir}/dev-support/findbugs-exclude.xml
+
+ Max
+
+
+
+ org.apache.maven.plugins
+ maven-surefire-plugin
+
+ 3600
+
+
+
+ org.apache.maven.plugins
+ maven-dependency-plugin
+
+
+ deplist
+ compile
+
+ list
+
+
+ ${project.basedir}/target/hadoop-cloud-storage-deps/${project.artifactId}.cloud-storage-optional.txt
+
+
+
+
+
+
+
+
+ org.apache.hadoop
+ hadoop-common
+ provided
+
+
+ jdk.tools
+ jdk.tools
+
+
+ org.javassist
+ javassist
+
+
+
+
+ org.apache.hadoop
+ hadoop-common
+ test
+ test-jar
+
+
+ junit
+ junit
+ ${junit.version}
+ test
+
+
+ org.mockito
+ mockito-all
+ 1.10.19
+ test
+
+
+ org.apache.hadoop
+ hadoop-mapreduce-client-jobclient
+ test
+
+
+ org.apache.hadoop
+ hadoop-yarn-server-tests
+ test
+ test-jar
+
+
+ org.apache.hadoop
+ hadoop-mapreduce-examples
+ test
+ jar
+
+
+ org.apache.hadoop
+ hadoop-distcp
+ test
+
+
+ org.apache.hadoop
+ hadoop-distcp
+ test
+ test-jar
+
+
+ com.huaweicloud
+ esdk-obs-java
+ ${esdk.version}
+
+
+ okio
+ com.squareup.okio
+
+
+ log4j-core
+ org.apache.logging.log4j
+
+
+ log4j-api
+ org.apache.logging.log4j
+
+
+
+
+ org.powermock
+ powermock-api-mockito
+ 1.7.4
+ test
+
+
+ org.powermock
+ powermock-module-junit4
+ 1.7.4
+ test
+
+
+
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/BasicSessionCredential.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/BasicSessionCredential.java
new file mode 100644
index 0000000000000..7110af101ae00
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/BasicSessionCredential.java
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+/**
+ * Interface class for getting basic session credential.
+ */
+public interface BasicSessionCredential {
+ /**
+ * Get OBS access key.
+ *
+ * @return OBS access key
+ */
+ String getOBSAccessKeyId();
+
+ /**
+ * Get OBS secret key.
+ *
+ * @return OBS secret key
+ */
+ String getOBSSecretKey();
+
+ /**
+ * Get session token.
+ *
+ * @return session token
+ */
+ String getSessionToken();
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/DefaultOBSClientFactory.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/DefaultOBSClientFactory.java
new file mode 100644
index 0000000000000..e46a21bba7ad4
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/DefaultOBSClientFactory.java
@@ -0,0 +1,361 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.IObsCredentialsProvider;
+import com.obs.services.ObsClient;
+import com.obs.services.ObsConfiguration;
+import com.obs.services.internal.ext.ExtObsConfiguration;
+import com.obs.services.model.AuthTypeEnum;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.net.URI;
+import java.util.Optional;
+
+/**
+ * The default factory implementation, which calls the OBS SDK to configure and
+ * create an {@link ObsClient} that communicates with the OBS service.
+ */
+class DefaultOBSClientFactory extends Configured implements OBSClientFactory {
+
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ DefaultOBSClientFactory.class);
+
+ /**
+ * Initializes all OBS SDK settings related to connection management.
+ *
+ * @param conf Hadoop configuration
+ * @param obsConf OBS SDK configuration
+ */
+ @SuppressWarnings("deprecation")
+ private static void initConnectionSettings(final Configuration conf,
+ final ExtObsConfiguration obsConf) {
+
+ obsConf.setMaxConnections(
+ OBSCommonUtils.intOption(conf, OBSConstants.MAXIMUM_CONNECTIONS,
+ OBSConstants.DEFAULT_MAXIMUM_CONNECTIONS,
+ 1));
+
+ boolean secureConnections = conf.getBoolean(
+ OBSConstants.SECURE_CONNECTIONS,
+ OBSConstants.DEFAULT_SECURE_CONNECTIONS);
+
+ obsConf.setHttpsOnly(secureConnections);
+
+ obsConf.setMaxErrorRetry(
+ OBSCommonUtils.intOption(conf, OBSConstants.MAX_ERROR_RETRIES,
+ OBSConstants.DEFAULT_MAX_ERROR_RETRIES, 0));
+
+ obsConf.setConnectionTimeout(
+ OBSCommonUtils.intOption(conf, OBSConstants.ESTABLISH_TIMEOUT,
+ OBSConstants.DEFAULT_ESTABLISH_TIMEOUT, 0));
+
+ obsConf.setSocketTimeout(
+ OBSCommonUtils.intOption(conf, OBSConstants.SOCKET_TIMEOUT,
+ OBSConstants.DEFAULT_SOCKET_TIMEOUT, 0));
+
+ obsConf.setIdleConnectionTime(
+ OBSCommonUtils.intOption(conf, OBSConstants.IDLE_CONNECTION_TIME,
+ OBSConstants.DEFAULT_IDLE_CONNECTION_TIME,
+ 1));
+
+ obsConf.setMaxIdleConnections(
+ OBSCommonUtils.intOption(conf, OBSConstants.MAX_IDLE_CONNECTIONS,
+ OBSConstants.DEFAULT_MAX_IDLE_CONNECTIONS,
+ 1));
+
+ obsConf.setReadBufferSize(
+ OBSCommonUtils.intOption(conf, OBSConstants.READ_BUFFER_SIZE,
+ OBSConstants.DEFAULT_READ_BUFFER_SIZE,
+ -1)); // to be modified
+ obsConf.setWriteBufferSize(
+ OBSCommonUtils.intOption(conf, OBSConstants.WRITE_BUFFER_SIZE,
+ OBSConstants.DEFAULT_WRITE_BUFFER_SIZE,
+ -1)); // to be modified
+ obsConf.setUploadStreamRetryBufferSize(
+ OBSCommonUtils.intOption(conf,
+ OBSConstants.UPLOAD_STREAM_RETRY_SIZE,
+ OBSConstants.DEFAULT_UPLOAD_STREAM_RETRY_SIZE, 1));
+
+ obsConf.setSocketReadBufferSize(
+ OBSCommonUtils.intOption(conf, OBSConstants.SOCKET_RECV_BUFFER,
+ OBSConstants.DEFAULT_SOCKET_RECV_BUFFER, -1));
+ obsConf.setSocketWriteBufferSize(
+ OBSCommonUtils.intOption(conf, OBSConstants.SOCKET_SEND_BUFFER,
+ OBSConstants.DEFAULT_SOCKET_SEND_BUFFER, -1));
+
+ obsConf.setKeepAlive(conf.getBoolean(OBSConstants.KEEP_ALIVE,
+ OBSConstants.DEFAULT_KEEP_ALIVE));
+ obsConf.setValidateCertificate(
+ conf.getBoolean(OBSConstants.VALIDATE_CERTIFICATE,
+ OBSConstants.DEFAULT_VALIDATE_CERTIFICATE));
+ obsConf.setVerifyResponseContentType(
+ conf.getBoolean(OBSConstants.VERIFY_RESPONSE_CONTENT_TYPE,
+ OBSConstants.DEFAULT_VERIFY_RESPONSE_CONTENT_TYPE));
+ obsConf.setCname(
+ conf.getBoolean(OBSConstants.CNAME, OBSConstants.DEFAULT_CNAME));
+ obsConf.setIsStrictHostnameVerification(
+ conf.getBoolean(OBSConstants.STRICT_HOSTNAME_VERIFICATION,
+ OBSConstants.DEFAULT_STRICT_HOSTNAME_VERIFICATION));
+
+ // sdk auth type negotiation enable
+ obsConf.setAuthTypeNegotiation(
+ conf.getBoolean(OBSConstants.SDK_AUTH_TYPE_NEGOTIATION_ENABLE,
+ OBSConstants.DEFAULT_SDK_AUTH_TYPE_NEGOTIATION_ENABLE));
+ // set SDK auth type to OBS when auth type negotiation is disabled
+ if (!obsConf.isAuthTypeNegotiation()) {
+ obsConf.setAuthType(AuthTypeEnum.OBS);
+ }
+
+ // okhttp retryOnConnectionFailure switch, default set to true
+ obsConf.retryOnConnectionFailureInOkhttp(
+ conf.getBoolean(OBSConstants.SDK_RETRY_ON_CONNECTION_FAILURE_ENABLE,
+ OBSConstants.DEFAULT_SDK_RETRY_ON_CONNECTION_FAILURE_ENABLE));
+
+ // SDK max retry count on unexpected end-of-stream exceptions;
+ // default -1 means don't retry
+ int retryTime = conf.getInt(
+ OBSConstants.SDK_RETRY_TIMES_ON_UNEXPECTED_END_EXCEPTION,
+ OBSConstants.DEFAULT_SDK_RETRY_TIMES_ON_UNEXPECTED_END_EXCEPTION);
+ if (retryTime > 0
+ && retryTime < OBSConstants.DEFAULT_MAX_SDK_CONNECTION_RETRY_TIMES
+ || !obsConf.isRetryOnConnectionFailureInOkhttp() && retryTime < 0) {
+ retryTime = OBSConstants.DEFAULT_MAX_SDK_CONNECTION_RETRY_TIMES;
+ }
+ obsConf.setMaxRetryOnUnexpectedEndException(retryTime);
+ }
+
+ /**
+ * Initializes OBS SDK proxy support if configured.
+ *
+ * @param conf Hadoop configuration
+ * @param obsConf OBS SDK configuration
+ * @throws IllegalArgumentException if misconfigured
+ * @throws IOException on any failure to initialize proxy
+ */
+ private static void initProxySupport(final Configuration conf,
+ final ExtObsConfiguration obsConf)
+ throws IllegalArgumentException, IOException {
+ String proxyHost = conf.getTrimmed(OBSConstants.PROXY_HOST, "");
+ int proxyPort = conf.getInt(OBSConstants.PROXY_PORT, -1);
+
+ if (!proxyHost.isEmpty() && proxyPort < 0) {
+ if (conf.getBoolean(OBSConstants.SECURE_CONNECTIONS,
+ OBSConstants.DEFAULT_SECURE_CONNECTIONS)) {
+ LOG.warn("Proxy host set without port. Using HTTPS default "
+ + OBSConstants.DEFAULT_HTTPS_PORT);
+ obsConf.getHttpProxy()
+ .setProxyPort(OBSConstants.DEFAULT_HTTPS_PORT);
+ } else {
+ LOG.warn("Proxy host set without port. Using HTTP default "
+ + OBSConstants.DEFAULT_HTTP_PORT);
+ obsConf.getHttpProxy()
+ .setProxyPort(OBSConstants.DEFAULT_HTTP_PORT);
+ }
+ }
+ String proxyUsername = conf.getTrimmed(OBSConstants.PROXY_USERNAME);
+ String proxyPassword = null;
+ char[] proxyPass = conf.getPassword(OBSConstants.PROXY_PASSWORD);
+ if (proxyPass != null) {
+ proxyPassword = new String(proxyPass).trim();
+ }
+ if ((proxyUsername == null) != (proxyPassword == null)) {
+ String msg =
+ "Proxy error: " + OBSConstants.PROXY_USERNAME + " or "
+ + OBSConstants.PROXY_PASSWORD
+ + " set without the other.";
+ LOG.error(msg);
+ throw new IllegalArgumentException(msg);
+ }
+ obsConf.setHttpProxy(proxyHost, proxyPort, proxyUsername,
+ proxyPassword);
+ if (LOG.isDebugEnabled()) {
+ LOG.debug(
+ "Using proxy server {}:{} as user {} on "
+ + "domain {} as workstation {}",
+ obsConf.getHttpProxy().getProxyAddr(),
+ obsConf.getHttpProxy().getProxyPort(),
+ obsConf.getHttpProxy().getProxyUName(),
+ obsConf.getHttpProxy().getDomain(),
+ obsConf.getHttpProxy().getWorkstation());
+ }
+ }
+
+ /**
+ * Creates an {@link ObsClient} from the established configuration.
+ *
+ * @param conf Hadoop configuration
+ * @param obsConf ObsConfiguration
+ * @param name URL
+ * @return ObsClient client
+ * @throws IOException on any failure to create Huawei OBS client
+ */
+ private static ObsClient createHuaweiObsClient(final Configuration conf,
+ final ObsConfiguration obsConf, final URI name)
+ throws IOException {
+ Class<?> credentialsProviderClass;
+ BasicSessionCredential credentialsProvider;
+ ObsClient obsClient;
+
+ try {
+ credentialsProviderClass = conf.getClass(
+ OBSConstants.OBS_CREDENTIALS_PROVIDER, null);
+ } catch (RuntimeException e) {
+ Throwable c = e.getCause() != null ? e.getCause() : e;
+ throw new IOException(
+ "From option " + OBSConstants.OBS_CREDENTIALS_PROVIDER + ' '
+ + c, c);
+ }
+
+ if (credentialsProviderClass == null) {
+ return createObsClientWithoutCredentialsProvider(conf, obsConf,
+ name);
+ }
+
+ try {
+ Constructor<?> cons =
+ credentialsProviderClass.getDeclaredConstructor(URI.class,
+ Configuration.class);
+ credentialsProvider = (BasicSessionCredential) cons.newInstance(
+ name, conf);
+ } catch (NoSuchMethodException
+ | SecurityException
+ | IllegalAccessException
+ | InstantiationException
+ | InvocationTargetException e) {
+ Throwable c = e.getCause() != null ? e.getCause() : e;
+ throw new IOException(
+ "From option " + OBSConstants.OBS_CREDENTIALS_PROVIDER + ' '
+ + c, c);
+ }
+
+ String sessionToken = credentialsProvider.getSessionToken();
+ String ak = credentialsProvider.getOBSAccessKeyId();
+ String sk = credentialsProvider.getOBSSecretKey();
+ String endPoint = conf.getTrimmed(OBSConstants.ENDPOINT, "");
+ obsConf.setEndPoint(endPoint);
+ if (sessionToken != null && sessionToken.length() != 0) {
+ obsClient = new ObsClient(ak, sk, sessionToken, obsConf);
+ } else {
+ obsClient = new ObsClient(ak, sk, obsConf);
+ }
+ return obsClient;
+ }
+
+ private static ObsClient createObsClientWithoutCredentialsProvider(
+ final Configuration conf, final ObsConfiguration obsConf,
+ final URI name) throws IOException {
+ ObsClient obsClient;
+ OBSLoginHelper.Login creds = OBSCommonUtils.getOBSAccessKeys(name,
+ conf);
+
+ String ak = creds.getUser();
+ String sk = creds.getPassword();
+ String token = creds.getToken();
+
+ String endPoint = conf.getTrimmed(OBSConstants.ENDPOINT, "");
+ obsConf.setEndPoint(endPoint);
+
+ if (!StringUtils.isEmpty(ak) || !StringUtils.isEmpty(sk)) {
+ obsClient = new ObsClient(ak, sk, token, obsConf);
+ return obsClient;
+ }
+
+ Class<?> securityProviderClass;
+ try {
+ securityProviderClass = conf.getClass(
+ OBSConstants.OBS_SECURITY_PROVIDER, null);
+ LOG.info("From option {} get {}",
+ OBSConstants.OBS_SECURITY_PROVIDER, securityProviderClass);
+ } catch (RuntimeException e) {
+ Throwable c = e.getCause() != null ? e.getCause() : e;
+ throw new IOException(
+ "From option " + OBSConstants.OBS_SECURITY_PROVIDER + ' ' + c,
+ c);
+ }
+
+ if (securityProviderClass == null) {
+ obsClient = new ObsClient(ak, sk, token, obsConf);
+ return obsClient;
+ }
+
+ IObsCredentialsProvider securityProvider;
+ try {
+ Optional<Constructor> cons = tryGetConstructor(
+ securityProviderClass,
+ new Class[] {URI.class, Configuration.class});
+
+ if (cons.isPresent()) {
+ securityProvider = (IObsCredentialsProvider) cons.get()
+ .newInstance(name, conf);
+ } else {
+ securityProvider
+ = (IObsCredentialsProvider) securityProviderClass
+ .getDeclaredConstructor().newInstance();
+ }
+
+ } catch (NoSuchMethodException
+ | IllegalAccessException
+ | InstantiationException
+ | InvocationTargetException
+ | RuntimeException e) {
+ Throwable c = e.getCause() != null ? e.getCause() : e;
+ throw new IOException(
+ "From option " + OBSConstants.OBS_SECURITY_PROVIDER + ' ' + c,
+ c);
+ }
+ obsClient = new ObsClient(securityProvider, obsConf);
+
+ return obsClient;
+ }
+
+ public static Optional<Constructor> tryGetConstructor(final Class mainClss,
+ final Class[] args) {
+ try {
+ Constructor constructor = mainClss.getDeclaredConstructor(args);
+ return Optional.ofNullable(constructor);
+ } catch (NoSuchMethodException e) {
+ // ignore
+ return Optional.empty();
+ }
+ }
+
+ @Override
+ public ObsClient createObsClient(final URI name) throws IOException {
+ Configuration conf = getConf();
+ ExtObsConfiguration obsConf = new ExtObsConfiguration();
+ initConnectionSettings(conf, obsConf);
+ initProxySupport(conf, obsConf);
+
+ return createHuaweiObsClient(conf, obsConf, name);
+ }
+}
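+
+// Usage sketch, assuming package-private access and a Configuration that
+// already carries the endpoint and credential options referenced by
+// OBSConstants ("obs://example-bucket/" is a placeholder URI):
+//
+//   Configuration conf = new Configuration();
+//   DefaultOBSClientFactory factory = new DefaultOBSClientFactory();
+//   factory.setConf(conf);
+//   ObsClient client = factory.createObsClient(URI.create("obs://example-bucket/"));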
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestLocalMetadataStoreScale.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/FileConflictException.java
similarity index 60%
rename from hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestLocalMetadataStoreScale.java
rename to hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/FileConflictException.java
index 7477adeeb07b5..7384251b70830 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestLocalMetadataStoreScale.java
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/FileConflictException.java
@@ -16,23 +16,25 @@
* limitations under the License.
*/
-package org.apache.hadoop.fs.s3a.scale;
-
-import org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore;
-import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
-import org.apache.hadoop.fs.s3a.s3guard.S3Guard;
+package org.apache.hadoop.fs.obs;
import java.io.IOException;
/**
- * Scale test for LocalMetadataStore.
+ * OBS file conflict exception.
*/
-public class ITestLocalMetadataStoreScale
- extends AbstractITestS3AMetadataStoreScale {
- @Override
- public MetadataStore createMetadataStore() throws IOException {
- MetadataStore ms = new LocalMetadataStore();
- ms.initialize(getFileSystem(), new S3Guard.TtlTimeProvider(getConf()));
- return ms;
+class FileConflictException extends IOException {
+ private static final long serialVersionUID = -897856973823710492L;
+
+ /**
+ * Constructs a FileConflictException with the specified detail
+ * message. The string s can be retrieved later by the
+ * {@link Throwable#getMessage}
+ * method of class java.lang.Throwable.
+ *
+ * @param s the detail message.
+ */
+ FileConflictException(final String s) {
+ super(s);
}
}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBS.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBS.java
new file mode 100644
index 0000000000000..3f05f007ee578
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBS.java
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.DelegateToFileSystem;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+
+/**
+ * OBS implementation of AbstractFileSystem, which delegates to the {@link
+ * OBSFileSystem}.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+public final class OBS extends DelegateToFileSystem {
+
+ /**
+ * @param theUri URI of the file system
+ * @param conf Configuration for the file system
+ * @throws IOException on any failure to initialize this instance
+ * @throws URISyntaxException if theUri has a syntax error
+ */
+ public OBS(final URI theUri, final Configuration conf)
+ throws IOException, URISyntaxException {
+ super(theUri, new OBSFileSystem(), conf, "obs", false);
+ }
+
+ @Override
+ public int getUriDefaultPort() {
+ return OBSConstants.OBS_DEFAULT_PORT;
+ }
+}
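+
+// Configuration sketch for resolving obs:// paths through FileContext. The
+// property name below follows the usual fs.AbstractFileSystem.<scheme>.impl
+// convention and is an assumption, not something defined in this patch:
+//
+//   Configuration conf = new Configuration();
+//   conf.set("fs.AbstractFileSystem.obs.impl", "org.apache.hadoop.fs.obs.OBS");
+//   FileContext fc = FileContext.getFileContext(URI.create("obs://example-bucket/"), conf);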
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSBlockOutputStream.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSBlockOutputStream.java
new file mode 100644
index 0000000000000..22c6cb5c350c9
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSBlockOutputStream.java
@@ -0,0 +1,814 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.util.Preconditions;
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.Futures;
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListenableFuture;
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListeningExecutorService;
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.MoreExecutors;
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.CompleteMultipartUploadResult;
+import com.obs.services.model.PartEtag;
+import com.obs.services.model.PutObjectRequest;
+import com.obs.services.model.UploadPartRequest;
+import com.obs.services.model.UploadPartResult;
+import com.obs.services.model.fs.WriteFileRequest;
+import com.sun.istack.NotNull;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.Syncable;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+/**
+ * OBS output stream based on block buffering.
+ *
+ * Upload files/parts directly via different buffering mechanisms, including
+ * memory and disk.
+ *
+ * <p>If the stream is closed and no update has started, then the upload is
+ * instead done as a single PUT operation.
+ *
+ * <p>Unstable: statistics and error handling might evolve.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+class OBSBlockOutputStream extends OutputStream implements Syncable {
+
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSBlockOutputStream.class);
+
+ /**
+ * Owner FileSystem.
+ */
+ private final OBSFileSystem fs;
+
+ /**
+ * Key of the object being uploaded.
+ */
+ private final String key;
+
+ /**
+ * Length of object.
+ */
+ private long objectLen;
+
+ /**
+ * Size of all blocks.
+ */
+ private final int blockSize;
+
+ /**
+ * Executor service for asynchronous block uploads.
+ */
+ private final ListeningExecutorService executorService;
+
+ /**
+ * Factory for creating blocks.
+ */
+ private final OBSDataBlocks.BlockFactory blockFactory;
+
+ /**
+ * Preallocated byte buffer for writing single characters.
+ */
+ private final byte[] singleCharWrite = new byte[1];
+
+ /**
+ * Closed flag.
+ */
+ private final AtomicBoolean closed = new AtomicBoolean(false);
+
+ /**
+ * Has exception flag.
+ */
+ private final AtomicBoolean hasException = new AtomicBoolean(false);
+
+ /**
+ * Append mode flag; set when data should be appended to the existing object
+ * (e.g. after a flush or sync).
+ */
+ private final AtomicBoolean appendAble;
+
+ /**
+ * Multipart upload details; null means none started.
+ */
+ private MultiPartUpload multiPartUpload;
+
+ /**
+ * Current data block. Null means none currently active.
+ */
+ private OBSDataBlocks.DataBlock activeBlock;
+
+ /**
+ * Count of blocks uploaded.
+ */
+ private long blockCount = 0;
+
+ /**
+ * Write operation helper; encapsulation of the filesystem operations.
+ */
+ private OBSWriteOperationHelper writeOperationHelper;
+
+ /**
+ * Flag for mocking upload part error.
+ */
+ private boolean mockUploadPartError = false;
+
+ /**
+ * An OBS output stream which uploads partitions in a separate pool of
+ * threads; different {@link OBSDataBlocks.BlockFactory} instances can control
+ * where data is buffered.
+ *
+ * @param owner OBSFilesystem
+ * @param obsObjectKey OBS object to work on
+ * @param objLen object length
+ * @param execService the executor service to use to schedule work
+ * @param isAppendable if append is supported
+ * @throws IOException on any problem
+ */
+ OBSBlockOutputStream(
+ final OBSFileSystem owner,
+ final String obsObjectKey,
+ final long objLen,
+ final ExecutorService execService,
+ final boolean isAppendable)
+ throws IOException {
+ this.appendAble = new AtomicBoolean(isAppendable);
+ this.fs = owner;
+ this.key = obsObjectKey;
+ this.objectLen = objLen;
+ this.blockFactory = owner.getBlockFactory();
+ this.blockSize = (int) owner.getPartSize();
+ this.writeOperationHelper = owner.getWriteHelper();
+ Preconditions.checkArgument(
+ owner.getPartSize() >= OBSConstants.MULTIPART_MIN_SIZE,
+ "Block size is too small: %d", owner.getPartSize());
+ this.executorService = MoreExecutors.listeningDecorator(
+ execService);
+ this.multiPartUpload = null;
+ // create that first block. This guarantees that an open + close
+ // sequence writes a 0-byte entry.
+ createBlockIfNeeded();
+ LOG.debug(
+ "Initialized OBSBlockOutputStream for {}" + " output to {}",
+ owner.getWriteHelper(),
+ activeBlock);
+ }
+
+ /**
+ * Demand create a destination block.
+ *
+ * @return the active block; null if there isn't one.
+ * @throws IOException on any failure to create
+ */
+ private synchronized OBSDataBlocks.DataBlock createBlockIfNeeded()
+ throws IOException {
+ if (activeBlock == null) {
+ blockCount++;
+ if (blockCount >= OBSConstants.MAX_MULTIPART_COUNT) {
+ LOG.warn(
+ "Number of partitions in stream exceeds limit for OBS: "
+ + OBSConstants.MAX_MULTIPART_COUNT
+ + " write may fail.");
+ }
+ activeBlock = blockFactory.create(blockCount, this.blockSize);
+ }
+ return activeBlock;
+ }
+
+ /**
+ * Synchronized accessor to the active block.
+ *
+ * @return the active block; null if there isn't one.
+ */
+ synchronized OBSDataBlocks.DataBlock getActiveBlock() {
+ return activeBlock;
+ }
+
+ /**
+ * Set mock error.
+ *
+ * @param isException mock error
+ */
+ @VisibleForTesting
+ public void mockPutPartError(final boolean isException) {
+ this.mockUploadPartError = isException;
+ }
+
+ /**
+ * Predicate to query whether or not there is an active block.
+ *
+ * @return true if there is an active block.
+ */
+ private synchronized boolean hasActiveBlock() {
+ return activeBlock != null;
+ }
+
+ /**
+ * Clear the active block.
+ */
+ private synchronized void clearActiveBlock() {
+ if (activeBlock != null) {
+ LOG.debug("Clearing active block");
+ }
+ activeBlock = null;
+ }
+
+ /**
+ * Check for the filesystem being open.
+ *
+ * @throws IOException if the filesystem is closed.
+ */
+ private void checkOpen() throws IOException {
+ if (closed.get()) {
+ throw new IOException(
+ "Filesystem " + writeOperationHelper.toString(key) + " closed");
+ }
+ }
+
+ /**
+ * The flush operation does not trigger an upload; that awaits the next block
+ * being full. What it does do is call {@code flush() } on the current block,
+ * leaving it to choose how to react.
+ *
+ * @throws IOException Any IO problem.
+ */
+ @Override
+ public synchronized void flush() throws IOException {
+ checkOpen();
+ OBSDataBlocks.DataBlock dataBlock = getActiveBlock();
+ if (dataBlock != null) {
+ dataBlock.flush();
+ }
+ }
+
+ /**
+ * Writes a byte to the destination. If this causes the buffer to reach its
+ * limit, the actual upload is submitted to the threadpool.
+ *
+ * @param b the int of which the lowest byte is written
+ * @throws IOException on any problem
+ */
+ @Override
+ public synchronized void write(final int b) throws IOException {
+ singleCharWrite[0] = (byte) b;
+ write(singleCharWrite, 0, 1);
+ }
+
+ /**
+ * Writes a range of bytes to the memory buffer. If this causes the
+ * buffer to reach its limit, the actual upload is submitted to the threadpool
+ * and the remainder of the array is written to memory (recursively).
+ *
+ * @param source byte array containing the data to write
+ * @param offset offset in array where to start
+ * @param len number of bytes to be written
+ * @throws IOException on any problem
+ */
+ @Override
+ public synchronized void write(@NotNull final byte[] source,
+ final int offset, final int len)
+ throws IOException {
+ if (hasException.get()) {
+ String closeWarning = String.format(
+ "write has error. bs : pre upload obs[%s] has error.", key);
+ LOG.warn(closeWarning);
+ throw new IOException(closeWarning);
+ }
+ OBSDataBlocks.validateWriteArgs(source, offset, len);
+ checkOpen();
+ if (len == 0) {
+ return;
+ }
+
+ OBSDataBlocks.DataBlock block = createBlockIfNeeded();
+ int written = block.write(source, offset, len);
+ int remainingCapacity = block.remainingCapacity();
+ try {
+ innerWrite(source, offset, len, written, remainingCapacity);
+ } catch (IOException e) {
+ LOG.error(
+ "Write data for key {} of bucket {} error, error message {}",
+ key, fs.getBucket(),
+ e.getMessage());
+ throw e;
+ }
+ }
+
+ private synchronized void innerWrite(final byte[] source, final int offset,
+ final int len,
+ final int written, final int remainingCapacity)
+ throws IOException {
+
+ if (written < len) {
+ // not everything was written: the block has run out of capacity.
+ // Trigger an upload then process the remainder.
+ LOG.debug(
+ "writing more data than block has capacity -triggering upload");
+ if (appendAble.get()) {
+ // to write a buffer then append to obs
+ LOG.debug("[Append] open stream and single write size {} "
+ + "greater than buffer size {}, append buffer to obs.",
+ len, blockSize);
+ flushCurrentBlock();
+ } else {
+ // block output stream logic, multi-part upload
+ uploadCurrentBlock();
+ }
+ // tail recursion is mildly expensive, but given that buffer sizes
+ // must be MBs, it's unlikely to recurse very deeply.
+ this.write(source, offset + written, len - written);
+ } else {
+ if (remainingCapacity == 0) {
+ // the whole buffer is done, trigger an upload
+ if (appendAble.get()) {
+ // to write a buffer then append to obs
+ LOG.debug("[Append] open stream and already write size "
+ + "equal to buffer size {}, append buffer to obs.",
+ blockSize);
+ flushCurrentBlock();
+ } else {
+ // block output stream logic, multi-part upload
+ uploadCurrentBlock();
+ }
+ }
+ }
+ }
+
+ /**
+ * Start an asynchronous upload of the current block.
+ *
+ * @throws IOException Problems opening the destination for upload or
+ * initializing the upload.
+ */
+ private synchronized void uploadCurrentBlock() throws IOException {
+ Preconditions.checkState(hasActiveBlock(), "No active block");
+ LOG.debug("Writing block # {}", blockCount);
+
+ try {
+ if (multiPartUpload == null) {
+ LOG.debug("Initiating Multipart upload");
+ multiPartUpload = new MultiPartUpload();
+ }
+ multiPartUpload.uploadBlockAsync(getActiveBlock());
+ } catch (IOException e) {
+ hasException.set(true);
+ LOG.error("Upload current block on ({}/{}) failed.", fs.getBucket(),
+ key, e);
+ throw e;
+ } finally {
+ // set the block to null, so the next write will create a new block.
+ clearActiveBlock();
+ }
+ }
+
+ /**
+ * Close the stream.
+ *
+ * <p>This will not return until the upload is complete or the attempt to
+ * perform the upload has failed. Exceptions raised in this method are
+ * indicative that the write has failed and data is at risk of being lost.
+ *
+ * @throws IOException on any failure.
+ */
+ @Override
+ public synchronized void close() throws IOException {
+ if (closed.getAndSet(true)) {
+ // already closed
+ LOG.debug("Ignoring close() as stream is already closed");
+ return;
+ }
+ if (hasException.get()) {
+ String closeWarning = String.format(
+ "closed has error. bs : pre write obs[%s] has error.", key);
+ LOG.warn(closeWarning);
+ throw new IOException(closeWarning);
+ }
+ // do upload
+ completeCurrentBlock();
+
+ // clear
+ clearHFlushOrSync();
+
+ // All end of write operations, including deleting fake parent
+ // directories
+ writeOperationHelper.writeSuccessful(key);
+ }
+
+ /**
+ * If a flush has taken place, append to the file; otherwise put the object.
+ *
+ * @throws IOException any problem in append or put object
+ */
+ private synchronized void putObjectIfNeedAppend() throws IOException {
+ if (appendAble.get() && fs.exists(
+ OBSCommonUtils.keyToQualifiedPath(fs, key))) {
+ appendFsFile();
+ } else {
+ putObject();
+ }
+ }
+
+ /**
+ * Append posix file.
+ *
+ * @throws IOException any problem
+ */
+ private synchronized void appendFsFile() throws IOException {
+ LOG.debug("bucket is posix, to append file. key is {}", key);
+ final OBSDataBlocks.DataBlock block = getActiveBlock();
+ WriteFileRequest writeFileReq;
+ if (block instanceof OBSDataBlocks.DiskBlock) {
+ writeFileReq = OBSCommonUtils.newAppendFileRequest(fs, key,
+ objectLen, (File) block.startUpload());
+ } else {
+ writeFileReq = OBSCommonUtils.newAppendFileRequest(fs, key,
+ objectLen, (InputStream) block.startUpload());
+ }
+ OBSCommonUtils.appendFile(fs, writeFileReq);
+ objectLen += block.dataSize();
+ }
+
+ /**
+ * Upload the current block as a single PUT request; if the buffer is empty a
+ * 0-byte PUT will be invoked, as it is needed to create an entry at the far
+ * end.
+ *
+ * @throws IOException any problem.
+ */
+ private synchronized void putObject() throws IOException {
+ LOG.debug("Executing regular upload for {}",
+ writeOperationHelper.toString(key));
+
+ final OBSDataBlocks.DataBlock block = getActiveBlock();
+ clearActiveBlock();
+ final int size = block.dataSize();
+ final PutObjectRequest putObjectRequest;
+ if (block instanceof OBSDataBlocks.DiskBlock) {
+ putObjectRequest = writeOperationHelper.newPutRequest(key,
+ (File) block.startUpload());
+
+ } else {
+ putObjectRequest =
+ writeOperationHelper.newPutRequest(key,
+ (InputStream) block.startUpload(), size);
+
+ }
+ putObjectRequest.setAcl(fs.getCannedACL());
+ fs.getSchemeStatistics().incrementWriteOps(1);
+ try {
+ // the putObject call automatically closes the input
+ // stream afterwards.
+ writeOperationHelper.putObject(putObjectRequest);
+ } finally {
+ OBSCommonUtils.closeAll(block);
+ }
+ }
+
+ @Override
+ public synchronized String toString() {
+ final StringBuilder sb = new StringBuilder("OBSBlockOutputStream{");
+ sb.append(writeOperationHelper.toString());
+ sb.append(", blockSize=").append(blockSize);
+ OBSDataBlocks.DataBlock block = activeBlock;
+ if (block != null) {
+ sb.append(", activeBlock=").append(block);
+ }
+ sb.append('}');
+ return sb.toString();
+ }
+
+ public synchronized void sync() {
+ // need to do
+ }
+
+ @Override
+ public synchronized void hflush() throws IOException {
+ // hflush and hsync behave the same
+ flushOrSync();
+ }
+
+ /**
+ * Flush the local buffer or multipart upload to OBS. Note: buckets that are
+ * not posix do not support this operation.
+ *
+ * @throws IOException io exception
+ */
+ private synchronized void flushOrSync() throws IOException {
+
+ checkOpen();
+ if (hasException.get()) {
+ String flushWarning = String.format(
+ "flushOrSync has error. bs : pre write obs[%s] has error.",
+ key);
+ LOG.warn(flushWarning);
+ throw new IOException(flushWarning);
+ }
+ if (fs.isFsBucket()) {
+ // upload
+ flushCurrentBlock();
+
+ // clear
+ clearHFlushOrSync();
+ } else {
+ LOG.warn("not posix bucket, not support hflush or hsync.");
+ flush();
+ }
+ }
+
+ /**
+ * Clear for hflush or hsync.
+ */
+ private synchronized void clearHFlushOrSync() {
+ appendAble.set(true);
+ multiPartUpload = null;
+ }
+
+ /**
+ * Upload the written blocks to OBS.
+ *
+ * @param block the current block
+ * @param hasBlock whether there is an active block
+ * @throws IOException io exception
+ */
+ private synchronized void uploadWriteBlocks(
+ final OBSDataBlocks.DataBlock block,
+ final boolean hasBlock)
+ throws IOException {
+ if (multiPartUpload == null) {
+ if (hasBlock) {
+ // no uploads of data have taken place, put the single block
+ // up. This must happen even if there is no data, so that 0 byte
+ // files are created.
+ putObjectIfNeedAppend();
+ }
+ } else {
+ // there has already been at least one block scheduled for upload;
+ // put up the current then wait
+ if (hasBlock && block.hasData()) {
+ // send last part
+ uploadCurrentBlock();
+ }
+ // wait for the partial uploads to finish
+ final List<PartEtag> partETags
+ = multiPartUpload.waitForAllPartUploads();
+ // then complete the operation
+ multiPartUpload.complete(partETags);
+ }
+ LOG.debug("Upload complete for {}", writeOperationHelper.toString(key));
+ }
+
+ private synchronized void completeCurrentBlock() throws IOException {
+ OBSDataBlocks.DataBlock block = getActiveBlock();
+ boolean hasBlock = hasActiveBlock();
+ LOG.debug("{}: complete block #{}: current block= {}", this, blockCount,
+ hasBlock ? block : "(none)");
+ try {
+ uploadWriteBlocks(block, hasBlock);
+ } catch (IOException ioe) {
+ LOG.error("Upload data to obs error. io exception : {}",
+ ioe.getMessage());
+ throw ioe;
+ } catch (Exception e) {
+ LOG.error("Upload data to obs error. other exception : {}",
+ e.getMessage());
+ throw e;
+ } finally {
+ OBSCommonUtils.closeAll(block);
+ clearActiveBlock();
+ }
+ }
+
+ private synchronized void flushCurrentBlock() throws IOException {
+ OBSDataBlocks.DataBlock block = getActiveBlock();
+ boolean hasBlock = hasActiveBlock();
+ LOG.debug(
+ "{}: complete block #{}: current block= {}", this, blockCount,
+ hasBlock ? block : "(none)");
+ try {
+ uploadWriteBlocks(block, hasBlock);
+ } catch (IOException ioe) {
+ LOG.error("hflush data to obs error. io exception : {}",
+ ioe.getMessage());
+ hasException.set(true);
+ throw ioe;
+ } catch (Exception e) {
+ LOG.error("hflush data to obs error. other exception : {}",
+ e.getMessage());
+ hasException.set(true);
+ throw e;
+ } finally {
+ OBSCommonUtils.closeAll(block);
+ clearActiveBlock();
+ }
+ }
+
+ @Override
+ public synchronized void hsync() throws IOException {
+ flushOrSync();
+ }
+
+ /**
+ * Multiple partition upload.
+ */
+ private class MultiPartUpload {
+ /**
+ * Upload id for multipart upload.
+ */
+ private final String uploadId;
+
+ /**
+ * List for async part upload future.
+ */
+ private final List<ListenableFuture<PartEtag>> partETagsFutures;
+
+ MultiPartUpload() throws IOException {
+ this.uploadId = writeOperationHelper.initiateMultiPartUpload(key);
+ this.partETagsFutures = new ArrayList<>(2);
+ LOG.debug(
+ "Initiated multi-part upload for {} with id '{}', the key is {}",
+ writeOperationHelper,
+ uploadId,
+ key);
+ }
+
+ /**
+ * Upload a block of data asynchronously.
+ *
+ * @param block block to upload
+ * @throws IOException upload failure
+ */
+ private void uploadBlockAsync(final OBSDataBlocks.DataBlock block)
+ throws IOException {
+ LOG.debug("Queueing upload of {}", block);
+
+ final int size = block.dataSize();
+ final int currentPartNumber = partETagsFutures.size() + 1;
+ final UploadPartRequest request;
+ if (block instanceof OBSDataBlocks.DiskBlock) {
+ request = writeOperationHelper.newUploadPartRequest(
+ key,
+ uploadId,
+ currentPartNumber,
+ size,
+ (File) block.startUpload());
+ } else {
+ request = writeOperationHelper.newUploadPartRequest(
+ key,
+ uploadId,
+ currentPartNumber,
+ size,
+ (InputStream) block.startUpload());
+
+ }
+ ListenableFuture<PartEtag> partETagFuture = executorService.submit(
+ () -> {
+ // this is the queued upload operation
+ LOG.debug("Uploading part {} for id '{}'",
+ currentPartNumber, uploadId);
+ // do the upload
+ PartEtag partETag = null;
+ try {
+ if (mockUploadPartError) {
+ throw new ObsException("mock upload part error");
+ }
+ UploadPartResult uploadPartResult
+ = OBSCommonUtils.uploadPart(fs, request);
+ partETag =
+ new PartEtag(uploadPartResult.getEtag(),
+ uploadPartResult.getPartNumber());
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Completed upload of {} to part {}",
+ block, partETag);
+ }
+ } catch (ObsException e) {
+ // catch all exception
+ hasException.set(true);
+ LOG.error("UploadPart failed (ObsException). {}",
+ OBSCommonUtils.translateException("UploadPart", key,
+ e).getMessage());
+ } finally {
+ // close the stream and block
+ OBSCommonUtils.closeAll(block);
+ }
+ return partETag;
+ });
+ partETagsFutures.add(partETagFuture);
+ }
+
+ /**
+ * Block awaiting all outstanding uploads to complete.
+ *
+ * @return list of results
+ * @throws IOException IO Problems
+ */
+ private List<PartEtag> waitForAllPartUploads() throws IOException {
+ LOG.debug("Waiting for {} uploads to complete",
+ partETagsFutures.size());
+ try {
+ return Futures.allAsList(partETagsFutures).get();
+ } catch (InterruptedException ie) {
+ LOG.warn("Interrupted partUpload", ie);
+ LOG.debug("Cancelling futures");
+ for (ListenableFuture<PartEtag> future : partETagsFutures) {
+ future.cancel(true);
+ }
+ // abort multipartupload
+ this.abort();
+ throw new IOException(
+ "Interrupted multi-part upload with id '" + uploadId
+ + "' to " + key);
+ } catch (ExecutionException ee) {
+ // there is no way of recovering so abort
+ // cancel all partUploads
+ LOG.debug("While waiting for upload completion", ee);
+ LOG.debug("Cancelling futures");
+ for (ListenableFuture<PartEtag> future : partETagsFutures) {
+ future.cancel(true);
+ }
+ // abort multipartupload
+ this.abort();
+ throw OBSCommonUtils.extractException(
+ "Multi-part upload with id '" + uploadId + "' to " + key,
+ key, ee);
+ }
+ }
+
+ /**
+ * This completes a multipart upload. Sometimes it fails; here retries are
+ * handled to avoid losing all data on a transient failure.
+ *
+ * @param partETags list of partial uploads
+ * @return result for completing multipart upload
+ * @throws IOException on any problem
+ */
+ private CompleteMultipartUploadResult complete(
+ final List<PartEtag> partETags) throws IOException {
+ String operation = String.format(
+ "Completing multi-part upload for key '%s',"
+ + " id '%s' with %s partitions ",
+ key, uploadId, partETags.size());
+ try {
+ LOG.debug(operation);
+ return writeOperationHelper.completeMultipartUpload(key,
+ uploadId, partETags);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(operation, key, e);
+ }
+ }
+
+ /**
+ * Abort a multi-part upload. Retries are attempted on failures.
+ * IOExceptions are caught; this is expected to be run as a cleanup
+ * process.
+ */
+ void abort() {
+ String operation =
+ String.format(
+ "Aborting multi-part upload for '%s', id '%s'",
+ writeOperationHelper, uploadId);
+ try {
+ LOG.debug(operation);
+ writeOperationHelper.abortMultipartUpload(key, uploadId);
+ } catch (ObsException e) {
+ LOG.warn(
+ "Unable to abort multipart upload, you may need to purge "
+ + "uploaded parts",
+ e);
+ }
+ }
+ }
+}
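+
+// Write-path sketch, assuming "fs" is an initialized OBSFileSystem on a posix
+// (fs) bucket so hflush()/hsync() are supported; "data" and "moreData" are
+// placeholder byte arrays. On object buckets flushOrSync() degrades to a
+// plain flush(), as implemented above:
+//
+//   FSDataOutputStream out = fs.create(new Path("obs://example-bucket/dir/file"));
+//   out.write(data);      // buffered into an OBSDataBlocks block
+//   out.hsync();          // flushes the current block; later data is appended
+//   out.write(moreData);
+//   out.close();          // completes the upload (PUT, append, or multipart)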
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSClientFactory.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSClientFactory.java
new file mode 100644
index 0000000000000..fbd54feae803a
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSClientFactory.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.ObsClient;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Factory for creating OBS client instance to be used by {@link
+ * OBSFileSystem}.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+interface OBSClientFactory {
+ /**
+ * Creates a new {@link ObsClient}. This method accepts the OBS file
+ * system URI in raw input form, which may be useful in logging.
+ *
+ * @param name raw input OBS file system URI
+ * @return OBS client
+ * @throws IOException IO problem
+ */
+ ObsClient createObsClient(URI name) throws IOException;
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSCommonUtils.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSCommonUtils.java
new file mode 100644
index 0000000000000..3a06961d3acd9
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSCommonUtils.java
@@ -0,0 +1,1546 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.util.Preconditions;
+import com.obs.services.ObsClient;
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.AbortMultipartUploadRequest;
+import com.obs.services.model.DeleteObjectsRequest;
+import com.obs.services.model.DeleteObjectsResult;
+import com.obs.services.model.KeyAndVersion;
+import com.obs.services.model.ListMultipartUploadsRequest;
+import com.obs.services.model.ListObjectsRequest;
+import com.obs.services.model.MultipartUpload;
+import com.obs.services.model.MultipartUploadListing;
+import com.obs.services.model.ObjectListing;
+import com.obs.services.model.ObjectMetadata;
+import com.obs.services.model.ObsObject;
+import com.obs.services.model.PutObjectRequest;
+import com.obs.services.model.PutObjectResult;
+import com.obs.services.model.UploadPartRequest;
+import com.obs.services.model.UploadPartResult;
+import com.obs.services.model.fs.FSStatusEnum;
+import com.obs.services.model.fs.GetAttributeRequest;
+import com.obs.services.model.fs.GetBucketFSStatusRequest;
+import com.obs.services.model.fs.GetBucketFSStatusResult;
+import com.obs.services.model.fs.ObsFSAttribute;
+import com.obs.services.model.fs.WriteFileRequest;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileAlreadyExistsException;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.InvalidRequestException;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIOException;
+import org.apache.hadoop.security.ProviderUtils;
+import org.apache.hadoop.util.Lists;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.nio.file.AccessDeniedException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Date;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+
+/**
+ * Common utils for {@link OBSFileSystem}.
+ */
+final class OBSCommonUtils {
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSCommonUtils.class);
+
+ /**
+ * Moved permanently response code.
+ */
+ static final int MOVED_PERMANENTLY_CODE = 301;
+
+ /**
+ * Unauthorized response code.
+ */
+ static final int UNAUTHORIZED_CODE = 401;
+
+ /**
+ * Forbidden response code.
+ */
+ static final int FORBIDDEN_CODE = 403;
+
+ /**
+ * Not found response code.
+ */
+ static final int NOT_FOUND_CODE = 404;
+
+ /**
+ * File conflict.
+ */
+ static final int CONFLICT_CODE = 409;
+
+ /**
+ * Gone response code.
+ */
+ static final int GONE_CODE = 410;
+
+ /**
+ * EOF response code.
+ */
+ static final int EOF_CODE = 416;
+
+ /**
+ * Core property for provider path. Duplicated here for consistent code across
+ * Hadoop versions: {@value}.
+ */
+ static final String CREDENTIAL_PROVIDER_PATH
+ = "hadoop.security.credential.provider.path";
+
+ /**
+ * Max number of retry times.
+ */
+ static final int MAX_RETRY_TIME = 3;
+
+ /**
+ * Delay time between two retries.
+ */
+ static final int DELAY_TIME = 10;
+
+ /**
+ * Max number of listing keys for checking folder empty.
+ */
+ static final int MAX_KEYS_FOR_CHECK_FOLDER_EMPTY = 3;
+
+ /**
+ * Mask for converting a byte to an unsigned int.
+ */
+ static final int BYTE_TO_INT_MASK = 0xFF;
+
+ private OBSCommonUtils() {
+ }
+
+ /**
+ * Get the fs status of the bucket.
+ *
+ * @param obs OBS client instance
+ * @param bucketName bucket name
+ * @return boolean value indicating if this bucket is a posix bucket
+ * @throws FileNotFoundException the bucket is absent
+ * @throws IOException any other problem talking to OBS
+ */
+ static boolean getBucketFsStatus(final ObsClient obs,
+ final String bucketName)
+ throws FileNotFoundException, IOException {
+ try {
+ GetBucketFSStatusRequest getBucketFsStatusRequest
+ = new GetBucketFSStatusRequest();
+ getBucketFsStatusRequest.setBucketName(bucketName);
+ GetBucketFSStatusResult getBucketFsStatusResult =
+ obs.getBucketFSStatus(getBucketFsStatusRequest);
+ FSStatusEnum fsStatus = getBucketFsStatusResult.getStatus();
+ return fsStatus == FSStatusEnum.ENABLED;
+ } catch (ObsException e) {
+ LOG.error(e.toString());
+ throw translateException("getBucketFsStatus", bucketName, e);
+ }
+ }
+
+ /**
+ * Turns a path (relative or otherwise) into an OBS key.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param path input path, may be relative to the working dir
+ * @return a key excluding the leading "/", or, if it is the root path, ""
+ */
+ static String pathToKey(final OBSFileSystem owner, final Path path) {
+ Path absolutePath = path;
+ if (!path.isAbsolute()) {
+ absolutePath = new Path(owner.getWorkingDirectory(), path);
+ }
+
+ if (absolutePath.toUri().getScheme() != null && absolutePath.toUri()
+ .getPath()
+ .isEmpty()) {
+ return "";
+ }
+
+ return absolutePath.toUri().getPath().substring(1);
+ }
+
+ /**
+ * Add a trailing "/" to a key if the key is not the root key and does not
+ * already end with "/".
+ *
+ * @param key obs key or ""
+ * @return the key with a trailing "/", or, if it is the root key, ""
+ */
+ static String maybeAddTrailingSlash(final String key) {
+ if (!StringUtils.isEmpty(key) && !key.endsWith("/")) {
+ return key + '/';
+ } else {
+ return key;
+ }
+ }
+
+ /**
+ * Convert a key back to a path.
+ *
+ * @param key input key
+ * @return the path from this key
+ */
+ static Path keyToPath(final String key) {
+ return new Path("/" + key);
+ }
+
+ /**
+ * Convert a key to a fully qualified path.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key input key
+ * @return the fully qualified path including URI scheme and bucket name.
+ */
+ static Path keyToQualifiedPath(final OBSFileSystem owner,
+ final String key) {
+ return qualify(owner, keyToPath(key));
+ }
+
+ /**
+ * Qualify a path.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param path path to qualify
+ * @return a qualified path.
+ */
+ static Path qualify(final OBSFileSystem owner, final Path path) {
+ return path.makeQualified(owner.getUri(), owner.getWorkingDirectory());
+ }
+
+ /**
+ * Remove a leading '/' from an OBS key, if present.
+ *
+ * @param key object key
+ * @return new key
+ */
+ static String maybeDeleteBeginningSlash(final String key) {
+ return !StringUtils.isEmpty(key) && key.startsWith("/") ? key.substring(
+ 1) : key;
+ }
+
+ /**
+ * Add a leading '/' to an OBS key, if absent.
+ *
+ * @param key object key
+ * @return new key
+ */
+ static String maybeAddBeginningSlash(final String key) {
+ return !StringUtils.isEmpty(key) && !key.startsWith("/")
+ ? "/" + key
+ : key;
+ }
+
+ /**
+ * Translate an exception raised in an operation into an IOException. HTTP
+ * error codes are examined and can be used to build a more specific
+ * response.
+ *
+ * @param operation operation
+ * @param path path operated on (may be null)
+ * @param exception obs exception raised
+ * @return an IOE which wraps the caught exception.
+ */
+ static IOException translateException(
+ final String operation, final String path,
+ final ObsException exception) {
+ String message = String.format("%s%s: status [%d] - request id [%s] "
+ + "- error code [%s] - error message [%s] - trace :%s ",
+ operation, path != null ? " on " + path : "",
+ exception.getResponseCode(), exception.getErrorRequestId(),
+ exception.getErrorCode(),
+ exception.getErrorMessage(), exception);
+
+ IOException ioe;
+
+ int status = exception.getResponseCode();
+ switch (status) {
+ case MOVED_PERMANENTLY_CODE:
+ message =
+ String.format("Received permanent redirect response, "
+ + "status [%d] - request id [%s] - "
+ + "error code [%s] - message [%s]",
+ exception.getResponseCode(),
+ exception.getErrorRequestId(), exception.getErrorCode(),
+ exception.getErrorMessage());
+ ioe = new OBSIOException(message, exception);
+ break;
+ // permissions
+ case UNAUTHORIZED_CODE:
+ case FORBIDDEN_CODE:
+ ioe = new AccessDeniedException(path, null, message);
+ ioe.initCause(exception);
+ break;
+
+ // the object isn't there
+ case NOT_FOUND_CODE:
+ case GONE_CODE:
+ ioe = new FileNotFoundException(message);
+ ioe.initCause(exception);
+ break;
+
+ // out of range. This may happen if an object is overwritten with
+ // a shorter one while it is being read.
+ case EOF_CODE:
+ ioe = new EOFException(message);
+ break;
+
+ default:
+ // no specific exit code. Choose an IOE subclass based on the
+ // class of the caught exception
+ ioe = new OBSIOException(message, exception);
+ break;
+ }
+ return ioe;
+ }
+
+ /**
+ * Reject any request to delete an object where the key is root.
+ *
+ * @param bucket bucket name
+ * @param key key to validate
+ * @throws InvalidRequestException if the request was rejected due to a
+ * mistaken attempt to delete the root
+ * directory.
+ */
+ static void blockRootDelete(final String bucket, final String key)
+ throws InvalidRequestException {
+ if (key.isEmpty() || "/".equals(key)) {
+ throw new InvalidRequestException(
+ "Bucket " + bucket + " cannot be deleted");
+ }
+ }
+
+ /**
+ * Delete an object. Increments the {@code OBJECT_DELETE_REQUESTS} and write
+ * operation statistics.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key to blob to delete.
+ * @throws IOException on any failure to delete object
+ */
+ static void deleteObject(final OBSFileSystem owner, final String key)
+ throws IOException {
+ blockRootDelete(owner.getBucket(), key);
+ ObsException lastException = null;
+ for (int retryTime = 1; retryTime <= MAX_RETRY_TIME; retryTime++) {
+ try {
+ owner.getObsClient().deleteObject(owner.getBucket(), key);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ return;
+ } catch (ObsException e) {
+ lastException = e;
+ LOG.warn("Delete path failed with [{}], "
+ + "retry time [{}] - request id [{}] - "
+ + "error code [{}] - error message [{}]",
+ e.getResponseCode(), retryTime, e.getErrorRequestId(),
+ e.getErrorCode(), e.getErrorMessage());
+ if (retryTime < MAX_RETRY_TIME) {
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw translateException("delete", key, e);
+ }
+ }
+ }
+ }
+ throw translateException(
+ String.format("retry max times [%s] delete failed", MAX_RETRY_TIME),
+ key, lastException);
+ }
+
+ /**
+ * Perform a bulk object delete operation. Increments the {@code
+ * OBJECT_DELETE_REQUESTS} and write operation statistics.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param deleteRequest keys to delete on the obs-backend
+ * @throws IOException on any failure to delete objects
+ */
+ static void deleteObjects(final OBSFileSystem owner,
+ final DeleteObjectsRequest deleteRequest) throws IOException {
+ DeleteObjectsResult result;
+ deleteRequest.setQuiet(true);
+ try {
+ result = owner.getObsClient().deleteObjects(deleteRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ } catch (ObsException e) {
+ LOG.warn("delete objects failed, request [{}], request id [{}] - "
+ + "error code [{}] - error message [{}]",
+ deleteRequest, e.getErrorRequestId(), e.getErrorCode(),
+ e.getErrorMessage());
+ for (KeyAndVersion keyAndVersion
+ : deleteRequest.getKeyAndVersionsList()) {
+ deleteObject(owner, keyAndVersion.getKey());
+ }
+ return;
+ }
+
+ // delete one by one if there are errors
+ if (result != null) {
+ List<DeleteObjectsResult.ErrorResult> errorResults
+ = result.getErrorResults();
+ if (!errorResults.isEmpty()) {
+ LOG.warn("bulk delete {} objects, {} failed, begin to delete "
+ + "one by one.",
+ deleteRequest.getKeyAndVersionsList().size(),
+ errorResults.size());
+ for (DeleteObjectsResult.ErrorResult errorResult
+ : errorResults) {
+ deleteObject(owner, errorResult.getObjectKey());
+ }
+ }
+ }
+ }
+
+ /**
+ * Create a putObject request. Adds the ACL and metadata
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key of object
+ * @param metadata metadata header
+ * @param srcfile source file
+ * @return the request
+ */
+ static PutObjectRequest newPutObjectRequest(final OBSFileSystem owner,
+ final String key, final ObjectMetadata metadata, final File srcfile) {
+ Preconditions.checkNotNull(srcfile);
+ PutObjectRequest putObjectRequest = new PutObjectRequest(
+ owner.getBucket(), key, srcfile);
+ putObjectRequest.setAcl(owner.getCannedACL());
+ putObjectRequest.setMetadata(metadata);
+ if (owner.getSse().isSseCEnable()) {
+ putObjectRequest.setSseCHeader(owner.getSse().getSseCHeader());
+ } else if (owner.getSse().isSseKmsEnable()) {
+ putObjectRequest.setSseKmsHeader(owner.getSse().getSseKmsHeader());
+ }
+ return putObjectRequest;
+ }
+
+ /**
+ * Create a {@link PutObjectRequest} request. The metadata is assumed to have
+ * been configured with the size of the operation.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key of object
+ * @param metadata metadata header
+ * @param inputStream source data.
+ * @return the request
+ */
+ static PutObjectRequest newPutObjectRequest(final OBSFileSystem owner,
+ final String key, final ObjectMetadata metadata,
+ final InputStream inputStream) {
+ Preconditions.checkNotNull(inputStream);
+ PutObjectRequest putObjectRequest = new PutObjectRequest(
+ owner.getBucket(), key, inputStream);
+ putObjectRequest.setAcl(owner.getCannedACL());
+ putObjectRequest.setMetadata(metadata);
+ if (owner.getSse().isSseCEnable()) {
+ putObjectRequest.setSseCHeader(owner.getSse().getSseCHeader());
+ } else if (owner.getSse().isSseKmsEnable()) {
+ putObjectRequest.setSseKmsHeader(owner.getSse().getSseKmsHeader());
+ }
+ return putObjectRequest;
+ }
+
+ /**
+ * PUT an object directly (i.e. not via the transfer manager). Byte length is
+ * calculated from the file length, or, if there is no file, from the content
+ * length of the header. Important: this call will close any input stream
+ * in the request.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param putObjectRequest the request
+ * @return the upload initiated
+ * @throws ObsException on problems
+ */
+ static PutObjectResult putObjectDirect(final OBSFileSystem owner,
+ final PutObjectRequest putObjectRequest) throws ObsException {
+ long len;
+ if (putObjectRequest.getFile() != null) {
+ len = putObjectRequest.getFile().length();
+ } else {
+ len = putObjectRequest.getMetadata().getContentLength();
+ }
+
+ PutObjectResult result = owner.getObsClient()
+ .putObject(putObjectRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ owner.getSchemeStatistics().incrementBytesWritten(len);
+ return result;
+ }
+
+ /**
+ * Upload part of a multi-partition file. Increments the write and put
+ * counters. Important: this call does not close any input stream in the
+ * request.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param request request
+ * @return the result of the operation.
+ * @throws ObsException on problems
+ */
+ static UploadPartResult uploadPart(final OBSFileSystem owner,
+ final UploadPartRequest request) throws ObsException {
+ long len = request.getPartSize();
+ UploadPartResult uploadPartResult = owner.getObsClient()
+ .uploadPart(request);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ owner.getSchemeStatistics().incrementBytesWritten(len);
+ return uploadPartResult;
+ }
+
+ static void removeKeys(final OBSFileSystem owner,
+      final List<KeyAndVersion> keysToDelete, final boolean clearKeys,
+ final boolean checkRootDelete) throws IOException {
+ if (keysToDelete.isEmpty()) {
+ // exit fast if there are no keys to delete
+ return;
+ }
+
+ if (checkRootDelete) {
+ for (KeyAndVersion keyVersion : keysToDelete) {
+ blockRootDelete(owner.getBucket(), keyVersion.getKey());
+ }
+ }
+
+ if (!owner.isEnableMultiObjectDelete()
+ || keysToDelete.size() < owner.getMultiDeleteThreshold()) {
+ // delete one by one.
+ for (KeyAndVersion keyVersion : keysToDelete) {
+ deleteObject(owner, keyVersion.getKey());
+ }
+ } else if (keysToDelete.size() <= owner.getMaxEntriesToDelete()) {
+ // Only one batch.
+ DeleteObjectsRequest deleteObjectsRequest
+ = new DeleteObjectsRequest(owner.getBucket());
+ deleteObjectsRequest.setKeyAndVersions(
+ keysToDelete.toArray(new KeyAndVersion[0]));
+ deleteObjects(owner, deleteObjectsRequest);
+ } else {
+ // Multi batches.
+      List<KeyAndVersion> keys = new ArrayList<>(
+ owner.getMaxEntriesToDelete());
+ for (KeyAndVersion key : keysToDelete) {
+ keys.add(key);
+ if (keys.size() == owner.getMaxEntriesToDelete()) {
+ // Delete one batch.
+ removeKeys(owner, keys, true, false);
+ }
+ }
+ // Delete the last batch
+ removeKeys(owner, keys, true, false);
+ }
+
+ if (clearKeys) {
+ keysToDelete.clear();
+ }
+ }
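+  // Usage sketch (illustrative only; assumes an initialized OBSFileSystem "fs"
+  // and a populated List<KeyAndVersion> "keys"):
+  //   OBSCommonUtils.removeKeys(fs, keys, true, true);
+  // deletes the keys (batched when multi-object delete is enabled), clears the
+  // list afterwards, and rejects any attempt to delete the bucket root.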
+
+ /**
+ * Translate an exception raised in an operation into an IOException. The
+ * specific type of IOException depends on the class of {@link ObsException}
+ * passed in, and any status codes included in the operation. That is: HTTP
+ * error codes are examined and can be used to build a more specific
+ * response.
+ *
+ * @param operation operation
+ * @param path path operated on (must not be null)
+ * @param exception obs exception raised
+ * @return an IOE which wraps the caught exception.
+ */
+ static IOException translateException(final String operation,
+ final Path path, final ObsException exception) {
+ return translateException(operation, path.toString(), exception);
+ }
+
+ /**
+ * List the statuses of the files/directories in the given path if the path is
+ * a directory.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param f given path
+ * @param recursive flag indicating if list is recursive
+   * @return the statuses of the files/directories in the given path
+ * @throws FileNotFoundException when the path does not exist;
+ * @throws IOException due to an IO problem.
+ * @throws ObsException on failures inside the OBS SDK
+ */
+ static FileStatus[] innerListStatus(final OBSFileSystem owner, final Path f,
+ final boolean recursive)
+ throws FileNotFoundException, IOException, ObsException {
+ Path path = qualify(owner, f);
+ String key = pathToKey(owner, path);
+
+    List<FileStatus> result;
+ final FileStatus fileStatus = owner.getFileStatus(path);
+
+ if (fileStatus.isDirectory()) {
+ key = maybeAddTrailingSlash(key);
+ String delimiter = recursive ? null : "/";
+ ListObjectsRequest request = createListObjectsRequest(owner, key,
+ delimiter);
+ LOG.debug(
+ "listStatus: doing listObjects for directory {} - recursive {}",
+ f, recursive);
+
+ OBSListing.FileStatusListingIterator files = owner.getObsListing()
+ .createFileStatusListingIterator(
+ path, request, OBSListing.ACCEPT_ALL,
+ new OBSListing.AcceptAllButSelfAndS3nDirs(path));
+ result = new ArrayList<>(files.getBatchSize());
+ while (files.hasNext()) {
+ result.add(files.next());
+ }
+
+ return result.toArray(new FileStatus[0]);
+ } else {
+ LOG.debug("Adding: rd (not a dir): {}", path);
+ FileStatus[] stats = new FileStatus[1];
+ stats[0] = fileStatus;
+ return stats;
+ }
+ }
+
+ /**
+ * Create a {@code ListObjectsRequest} request against this bucket.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key for request
+ * @param delimiter any delimiter
+ * @return the request
+ */
+ static ListObjectsRequest createListObjectsRequest(
+ final OBSFileSystem owner, final String key, final String delimiter) {
+ return createListObjectsRequest(owner, key, delimiter, -1);
+ }
+
+ static ListObjectsRequest createListObjectsRequest(
+ final OBSFileSystem owner, final String key, final String delimiter,
+ final int maxKeyNum) {
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setBucketName(owner.getBucket());
+ if (maxKeyNum > 0 && maxKeyNum < owner.getMaxKeys()) {
+ request.setMaxKeys(maxKeyNum);
+ } else {
+ request.setMaxKeys(owner.getMaxKeys());
+ }
+ request.setPrefix(key);
+ if (delimiter != null) {
+ request.setDelimiter(delimiter);
+ }
+ return request;
+ }
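+  // Usage sketch (illustrative only; "fs" is an assumed OBSFileSystem and
+  // "data/" a sample prefix):
+  //   ListObjectsRequest req =
+  //       OBSCommonUtils.createListObjectsRequest(fs, "data/", "/");
+  //   ObjectListing listing = OBSCommonUtils.listObjects(fs, req);
+  // lists the direct children under "data/"; a null delimiter would list the
+  // whole subtree instead.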
+
+ /**
+ * Implements the specific logic to reject root directory deletion. The caller
+ * must return the result of this call, rather than attempt to continue with
+ * the delete operation: deleting root directories is never allowed. This
+ * method simply implements the policy of when to return an exit code versus
+ * raise an exception.
+ *
+ * @param bucket bucket name
+ * @param isEmptyDir flag indicating if the directory is empty
+ * @param recursive recursive flag from command
+ * @return a return code for the operation
+ * @throws PathIOException if the operation was explicitly rejected.
+ */
+ static boolean rejectRootDirectoryDelete(final String bucket,
+ final boolean isEmptyDir,
+ final boolean recursive)
+ throws IOException {
+ LOG.info("obs delete the {} root directory of {}", bucket, recursive);
+ if (isEmptyDir) {
+ return true;
+ }
+ if (recursive) {
+ return false;
+ } else {
+ // reject
+ throw new PathIOException(bucket, "Cannot delete root path");
+ }
+ }
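+  // Behaviour sketch of the method above: deleting an empty root returns true
+  // (a no-op success), a recursive delete of a non-empty root returns false,
+  // and a non-recursive delete of a non-empty root throws PathIOException.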
+
+ /**
+ * Make the given path and all non-existent parents into directories.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param path path to create
+ * @return true if a directory was created
+ * @throws FileAlreadyExistsException there is a file at the path specified
+ * @throws IOException other IO problems
+ * @throws ObsException on failures inside the OBS SDK
+ */
+ static boolean innerMkdirs(final OBSFileSystem owner, final Path path)
+ throws IOException, FileAlreadyExistsException, ObsException {
+ LOG.debug("Making directory: {}", path);
+ FileStatus fileStatus;
+ try {
+ fileStatus = owner.getFileStatus(path);
+
+ if (fileStatus.isDirectory()) {
+ return true;
+ } else {
+ throw new FileAlreadyExistsException("Path is a file: " + path);
+ }
+ } catch (FileNotFoundException e) {
+ Path fPart = path.getParent();
+ do {
+ try {
+ fileStatus = owner.getFileStatus(fPart);
+ if (fileStatus.isDirectory()) {
+ break;
+ }
+ if (fileStatus.isFile()) {
+ throw new FileAlreadyExistsException(
+ String.format("Can't make directory for path '%s'"
+ + " since it is a file.", fPart));
+ }
+ } catch (FileNotFoundException fnfe) {
+ LOG.debug("file {} not fount, but ignore.", path);
+ }
+ fPart = fPart.getParent();
+ } while (fPart != null);
+
+ String key = pathToKey(owner, path);
+ if (owner.isFsBucket()) {
+ OBSPosixBucketUtils.fsCreateFolder(owner, key);
+ } else {
+ OBSObjectBucketUtils.createFakeDirectory(owner, key);
+ }
+ return true;
+ }
+ }
+
+ /**
+ * Initiate a {@code listObjects} operation, incrementing metrics in the
+ * process.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param request request to initiate
+ * @return the results
+ * @throws IOException on any failure to list objects
+ */
+ static ObjectListing listObjects(final OBSFileSystem owner,
+ final ListObjectsRequest request) throws IOException {
+ if (request.getDelimiter() == null && request.getMarker() == null
+ && owner.isFsBucket() && owner.isObsClientDFSListEnable()) {
+ return OBSFsDFSListing.fsDFSListObjects(owner, request);
+ }
+
+ return commonListObjects(owner, request);
+ }
+
+ static ObjectListing commonListObjects(final OBSFileSystem owner,
+ final ListObjectsRequest request) {
+ for (int retryTime = 1; retryTime < MAX_RETRY_TIME; retryTime++) {
+ try {
+ owner.getSchemeStatistics().incrementReadOps(1);
+ return owner.getObsClient().listObjects(request);
+ } catch (ObsException e) {
+ LOG.warn("Failed to commonListObjects for request[{}], retry "
+ + "time [{}], due to exception[{}]",
+ request, retryTime, e);
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ LOG.error("Failed to commonListObjects for request[{}], "
+ + "retry time [{}], due to exception[{}]",
+ request, retryTime, e);
+ throw e;
+ }
+ }
+ }
+
+ owner.getSchemeStatistics().incrementReadOps(1);
+ return owner.getObsClient().listObjects(request);
+ }
+
+ /**
+ * List the next set of objects.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param objects paged result
+ * @return the next result object
+ * @throws IOException on any failure to list the next set of objects
+ */
+ static ObjectListing continueListObjects(final OBSFileSystem owner,
+ final ObjectListing objects) throws IOException {
+ if (objects.getDelimiter() == null && owner.isFsBucket()
+ && owner.isObsClientDFSListEnable()) {
+ return OBSFsDFSListing.fsDFSContinueListObjects(owner,
+ (OBSFsDFSListing) objects);
+ }
+
+ return commonContinueListObjects(owner, objects);
+ }
+
+ private static ObjectListing commonContinueListObjects(
+ final OBSFileSystem owner, final ObjectListing objects) {
+ String delimiter = objects.getDelimiter();
+ int maxKeyNum = objects.getMaxKeys();
+ // LOG.debug("delimiters: "+objects.getDelimiter());
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setMarker(objects.getNextMarker());
+ request.setBucketName(owner.getBucket());
+ request.setPrefix(objects.getPrefix());
+ if (maxKeyNum > 0 && maxKeyNum < owner.getMaxKeys()) {
+ request.setMaxKeys(maxKeyNum);
+ } else {
+ request.setMaxKeys(owner.getMaxKeys());
+ }
+ if (delimiter != null) {
+ request.setDelimiter(delimiter);
+ }
+ return commonContinueListObjects(owner, request);
+ }
+
+ static ObjectListing commonContinueListObjects(final OBSFileSystem owner,
+ final ListObjectsRequest request) {
+ for (int retryTime = 1; retryTime < MAX_RETRY_TIME; retryTime++) {
+ try {
+ owner.getSchemeStatistics().incrementReadOps(1);
+ return owner.getObsClient().listObjects(request);
+ } catch (ObsException e) {
+ LOG.warn("Continue list objects failed for request[{}], retry"
+ + " time[{}], due to exception[{}]",
+ request, retryTime, e);
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ LOG.error("Continue list objects failed for request[{}], "
+ + "retry time[{}], due to exception[{}]",
+ request, retryTime, e);
+ throw e;
+ }
+ }
+ }
+
+ owner.getSchemeStatistics().incrementReadOps(1);
+ return owner.getObsClient().listObjects(request);
+ }
+
+ /**
+   * Predicate: does the object represent a directory?
+ *
+ * @param name object name
+ * @param size object size
+   * @return true if it meets the criteria for representing a directory
+ */
+ public static boolean objectRepresentsDirectory(final String name,
+ final long size) {
+ return !name.isEmpty() && name.charAt(name.length() - 1) == '/'
+ && size == 0L;
+ }
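+  // For example (illustrative only): a zero-byte object whose key ends in "/"
+  // is treated as a directory marker, anything else as a file:
+  //   objectRepresentsDirectory("warehouse/", 0L)            -> true
+  //   objectRepresentsDirectory("warehouse/part-0000", 10L)  -> false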
+
+ /**
+ * Date to long conversion. Handles null Dates that can be returned by OBS by
+   * returning 0.
+ *
+ * @param date date from OBS query
+ * @return timestamp of the object
+ */
+ public static long dateToLong(final Date date) {
+ if (date == null) {
+ return 0L;
+ }
+
+ return date.getTime() / OBSConstants.SEC2MILLISEC_FACTOR
+ * OBSConstants.SEC2MILLISEC_FACTOR;
+ }
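+  // Note: the divide-then-multiply by SEC2MILLISEC_FACTOR rounds the timestamp
+  // down to whole seconds, e.g. (illustrative only)
+  //   dateToLong(new Date(1620000123456L)) == 1620000123000L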
+
+ // Used to check if a folder is empty or not.
+ static boolean isFolderEmpty(final OBSFileSystem owner, final String key)
+ throws FileNotFoundException, ObsException {
+ for (int retryTime = 1; retryTime < MAX_RETRY_TIME; retryTime++) {
+ try {
+ return innerIsFolderEmpty(owner, key);
+ } catch (ObsException e) {
+ LOG.warn(
+ "Failed to check empty folder for [{}], retry time [{}], "
+ + "exception [{}]", key, retryTime, e);
+
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ return innerIsFolderEmpty(owner, key);
+ }
+
+ // Used to check if a folder is empty or not by counting the number of
+ // sub objects in list.
+ private static boolean isFolderEmpty(final String key,
+ final ObjectListing objects) {
+ int count = objects.getObjects().size();
+ if (count >= 2) {
+ // There is a sub file at least.
+ return false;
+ } else if (count == 1 && !objects.getObjects()
+ .get(0)
+ .getObjectKey()
+ .equals(key)) {
+ // There is a sub file at least.
+ return false;
+ }
+
+ count = objects.getCommonPrefixes().size();
+    // Check whether there is any sub directory (common prefix).
+ if (count >= 2) {
+      // There is a sub directory at least.
+ return false;
+ } else {
+ return count != 1 || objects.getCommonPrefixes().get(0).equals(key);
+ }
+ }
+
+ // Used to check if a folder is empty or not.
+ static boolean innerIsFolderEmpty(final OBSFileSystem owner,
+ final String key)
+ throws FileNotFoundException, ObsException {
+ String obsKey = maybeAddTrailingSlash(key);
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setBucketName(owner.getBucket());
+ request.setPrefix(obsKey);
+ request.setDelimiter("/");
+ request.setMaxKeys(MAX_KEYS_FOR_CHECK_FOLDER_EMPTY);
+ owner.getSchemeStatistics().incrementReadOps(1);
+ ObjectListing objects = owner.getObsClient().listObjects(request);
+
+ if (!objects.getCommonPrefixes().isEmpty() || !objects.getObjects()
+ .isEmpty()) {
+ if (isFolderEmpty(obsKey, objects)) {
+ LOG.debug("Found empty directory {}", obsKey);
+ return true;
+ }
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Found path as directory (with /): {}/{}",
+ objects.getCommonPrefixes().size(),
+ objects.getObjects().size());
+
+ for (ObsObject summary : objects.getObjects()) {
+ LOG.debug("Summary: {} {}", summary.getObjectKey(),
+ summary.getMetadata().getContentLength());
+ }
+ for (String prefix : objects.getCommonPrefixes()) {
+ LOG.debug("Prefix: {}", prefix);
+ }
+ }
+ LOG.debug("Found non-empty directory {}", obsKey);
+ return false;
+ } else if (obsKey.isEmpty()) {
+ LOG.debug("Found root directory");
+ return true;
+ } else if (owner.isFsBucket()) {
+ LOG.debug("Found empty directory {}", obsKey);
+ return true;
+ }
+
+ LOG.debug("Not Found: {}", obsKey);
+ throw new FileNotFoundException("No such file or directory: " + obsKey);
+ }
+
+ /**
+ * Build a {@link LocatedFileStatus} from a {@link FileStatus} instance.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param status file status
+ * @return a located status with block locations set up from this FS.
+ * @throws IOException IO Problems.
+ */
+ static LocatedFileStatus toLocatedFileStatus(final OBSFileSystem owner,
+ final FileStatus status) throws IOException {
+ return new LocatedFileStatus(
+ status, status.isFile() ? owner.getFileBlockLocations(status, 0,
+ status.getLen()) : null);
+ }
+
+ /**
+   * Create an appendFile request. Adds the ACL and metadata.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key of object
+ * @param tmpFile temp file or input stream
+ * @param recordPosition client record next append position
+ * @return the request
+ * @throws IOException any problem
+ */
+ static WriteFileRequest newAppendFileRequest(final OBSFileSystem owner,
+ final String key, final long recordPosition, final File tmpFile)
+ throws IOException {
+ Preconditions.checkNotNull(key);
+ Preconditions.checkNotNull(tmpFile);
+ ObsFSAttribute obsFsAttribute;
+ try {
+ GetAttributeRequest getAttributeReq = new GetAttributeRequest(
+ owner.getBucket(), key);
+ obsFsAttribute = owner.getObsClient().getAttribute(getAttributeReq);
+ } catch (ObsException e) {
+ throw translateException("GetAttributeRequest", key, e);
+ }
+
+ long appendPosition = Math.max(recordPosition,
+ obsFsAttribute.getContentLength());
+ if (recordPosition != obsFsAttribute.getContentLength()) {
+ LOG.warn("append url[{}] position[{}], file contentLength[{}] not"
+ + " equal to recordPosition[{}].", key, appendPosition,
+ obsFsAttribute.getContentLength(), recordPosition);
+ }
+ WriteFileRequest writeFileReq = new WriteFileRequest(owner.getBucket(),
+ key, tmpFile, appendPosition);
+ writeFileReq.setAcl(owner.getCannedACL());
+ return writeFileReq;
+ }
+
+ /**
+   * Create an appendFile request. Adds the ACL and metadata.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param key key of object
+ * @param inputStream temp file or input stream
+ * @param recordPosition client record next append position
+ * @return the request
+ * @throws IOException any problem
+ */
+ static WriteFileRequest newAppendFileRequest(final OBSFileSystem owner,
+ final String key, final long recordPosition,
+ final InputStream inputStream) throws IOException {
+ Preconditions.checkNotNull(key);
+ Preconditions.checkNotNull(inputStream);
+ ObsFSAttribute obsFsAttribute;
+ try {
+ GetAttributeRequest getAttributeReq = new GetAttributeRequest(
+ owner.getBucket(), key);
+ obsFsAttribute = owner.getObsClient().getAttribute(getAttributeReq);
+ } catch (ObsException e) {
+ throw translateException("GetAttributeRequest", key, e);
+ }
+
+ long appendPosition = Math.max(recordPosition,
+ obsFsAttribute.getContentLength());
+ if (recordPosition != obsFsAttribute.getContentLength()) {
+ LOG.warn("append url[{}] position[{}], file contentLength[{}] not"
+ + " equal to recordPosition[{}].", key, appendPosition,
+ obsFsAttribute.getContentLength(), recordPosition);
+ }
+ WriteFileRequest writeFileReq = new WriteFileRequest(owner.getBucket(),
+ key, inputStream, appendPosition);
+ writeFileReq.setAcl(owner.getCannedACL());
+ return writeFileReq;
+ }
+
+ /**
+ * Append File.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param appendFileRequest append object request
+ * @throws IOException on any failure to append file
+ */
+ static void appendFile(final OBSFileSystem owner,
+ final WriteFileRequest appendFileRequest) throws IOException {
+ long len = 0;
+ if (appendFileRequest.getFile() != null) {
+ len = appendFileRequest.getFile().length();
+ }
+
+ try {
+ LOG.debug("Append file, key {} position {} size {}",
+ appendFileRequest.getObjectKey(),
+ appendFileRequest.getPosition(),
+ len);
+ owner.getObsClient().writeFile(appendFileRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ owner.getSchemeStatistics().incrementBytesWritten(len);
+ } catch (ObsException e) {
+ throw translateException("AppendFile",
+ appendFileRequest.getObjectKey(), e);
+ }
+ }
+
+ /**
+ * Close the Closeable objects and ignore any Exception or null
+ * pointers. (This is the SLF4J equivalent of that in {@code IOUtils}).
+ *
+ * @param closeables the objects to close
+ */
+ static void closeAll(final java.io.Closeable... closeables) {
+ for (java.io.Closeable c : closeables) {
+ if (c != null) {
+ try {
+ if (LOG != null) {
+ LOG.debug("Closing {}", c);
+ }
+ c.close();
+ } catch (Exception e) {
+ if (LOG != null && LOG.isDebugEnabled()) {
+ LOG.debug("Exception in closing {}", c, e);
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ * Extract an exception from a failed future, and convert to an IOE.
+ *
+ * @param operation operation which failed
+ * @param path path operated on (may be null)
+ * @param ee execution exception
+ * @return an IOE which can be thrown
+ */
+ static IOException extractException(final String operation,
+ final String path, final ExecutionException ee) {
+ IOException ioe;
+ Throwable cause = ee.getCause();
+ if (cause instanceof ObsException) {
+ ioe = translateException(operation, path, (ObsException) cause);
+ } else if (cause instanceof IOException) {
+ ioe = (IOException) cause;
+ } else {
+ ioe = new IOException(operation + " failed: " + cause, cause);
+ }
+ return ioe;
+ }
+
+ /**
+   * Create a file status instance from a listing.
+ *
+ * @param keyPath path to entry
+ * @param summary summary from OBS
+ * @param blockSize block size to declare.
+ * @param owner owner of the file
+ * @return a status entry
+ */
+ static OBSFileStatus createFileStatus(
+ final Path keyPath, final ObsObject summary, final long blockSize,
+ final String owner) {
+ if (objectRepresentsDirectory(
+ summary.getObjectKey(), summary.getMetadata().getContentLength())) {
+ return new OBSFileStatus(keyPath, owner);
+ } else {
+ return new OBSFileStatus(
+ summary.getMetadata().getContentLength(),
+ dateToLong(summary.getMetadata().getLastModified()),
+ keyPath,
+ blockSize,
+ owner);
+ }
+ }
+
+ /**
+ * Return the access key and secret for OBS API use. Credentials may exist in
+ * configuration, within credential providers or indicated in the UserInfo of
+ * the name URI param.
+ *
+ * @param name the URI for which we need the access keys.
+ * @param conf the Configuration object to interrogate for keys.
+ * @return OBSAccessKeys
+ * @throws IOException problems retrieving passwords from KMS.
+ */
+ static OBSLoginHelper.Login getOBSAccessKeys(final URI name,
+ final Configuration conf)
+ throws IOException {
+ OBSLoginHelper.Login login
+ = OBSLoginHelper.extractLoginDetailsWithWarnings(name);
+ Configuration c =
+ ProviderUtils.excludeIncompatibleCredentialProviders(conf,
+ OBSFileSystem.class);
+ String accessKey = getPassword(c, OBSConstants.ACCESS_KEY,
+ login.getUser());
+ String secretKey = getPassword(c, OBSConstants.SECRET_KEY,
+ login.getPassword());
+ String sessionToken = getPassword(c, OBSConstants.SESSION_TOKEN,
+ login.getToken());
+ return new OBSLoginHelper.Login(accessKey, secretKey, sessionToken);
+ }
+
+ /**
+ * Get a password from a configuration, or, if a value is passed in, pick that
+ * up instead.
+ *
+ * @param conf configuration
+ * @param key key to look up
+ * @param val current value: if non empty this is used instead of querying
+ * the configuration.
+ * @return a password or "".
+ * @throws IOException on any problem
+ */
+ private static String getPassword(final Configuration conf,
+ final String key, final String val) throws IOException {
+ return StringUtils.isEmpty(val) ? lookupPassword(conf, key) : val;
+ }
+
+ /**
+ * Get a password from a configuration/configured credential providers.
+ *
+ * @param conf configuration
+ * @param key key to look up
+   * @return a password or "" if none was found
+ * @throws IOException on any problem
+ */
+ private static String lookupPassword(final Configuration conf,
+ final String key) throws IOException {
+ try {
+ final char[] pass = conf.getPassword(key);
+ return pass != null ? new String(pass).trim() : "";
+ } catch (IOException ioe) {
+ throw new IOException("Cannot find password option " + key, ioe);
+ }
+ }
+
+ /**
+ * String information about a summary entry for debug messages.
+ *
+ * @param summary summary object
+ * @return string value
+ */
+ static String stringify(final ObsObject summary) {
+ return summary.getObjectKey() + " size=" + summary.getMetadata()
+ .getContentLength();
+ }
+
+ /**
+   * Get an integer option not smaller than the minimum allowed value.
+ *
+ * @param conf configuration
+ * @param key key to look up
+ * @param defVal default value
+ * @param min minimum value
+ * @return the value
+ * @throws IllegalArgumentException if the value is below the minimum
+ */
+ static int intOption(final Configuration conf, final String key,
+ final int defVal,
+ final int min) {
+ int v = conf.getInt(key, defVal);
+ Preconditions.checkArgument(
+ v >= min,
+ String.format("Value of %s: %d is below the minimum value %d", key,
+ v, min));
+ LOG.debug("Value of {} is {}", key, v);
+ return v;
+ }
+
+ /**
+ * Get a long option not smaller than the minimum allowed value.
+ *
+ * @param conf configuration
+ * @param key key to look up
+ * @param defVal default value
+ * @param min minimum value
+ * @return the value
+ * @throws IllegalArgumentException if the value is below the minimum
+ */
+ static long longOption(final Configuration conf, final String key,
+ final long defVal,
+ final long min) {
+ long v = conf.getLong(key, defVal);
+ Preconditions.checkArgument(
+ v >= min,
+ String.format("Value of %s: %d is below the minimum value %d", key,
+ v, min));
+ LOG.debug("Value of {} is {}", key, v);
+ return v;
+ }
+
+ /**
+ * Get a long option not smaller than the minimum allowed value, supporting
+ * memory prefixes K,M,G,T,P.
+ *
+ * @param conf configuration
+ * @param key key to look up
+ * @param defVal default value
+ * @param min minimum value
+ * @return the value
+ * @throws IllegalArgumentException if the value is below the minimum
+ */
+ static long longBytesOption(final Configuration conf, final String key,
+ final long defVal,
+ final long min) {
+ long v = conf.getLongBytes(key, defVal);
+ Preconditions.checkArgument(
+ v >= min,
+ String.format("Value of %s: %d is below the minimum value %d", key,
+ v, min));
+ LOG.debug("Value of {} is {}", key, v);
+ return v;
+ }
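+  // For example (illustrative only): with "fs.obs.multipart.size" set to "64M",
+  //   longBytesOption(conf, OBSConstants.MULTIPART_SIZE,
+  //       OBSConstants.DEFAULT_MULTIPART_SIZE, 1)
+  // returns 67108864, since Configuration.getLongBytes() understands the
+  // K/M/G/T/P suffixes.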
+
+ /**
+ * Get a size property from the configuration: this property must be at least
+ * equal to {@link OBSConstants#MULTIPART_MIN_SIZE}. If it is too small, it is
+ * rounded up to that minimum, and a warning printed.
+ *
+ * @param conf configuration
+ * @param property property name
+ * @param defVal default value
+ * @return the value, guaranteed to be above the minimum size
+ */
+ public static long getMultipartSizeProperty(final Configuration conf,
+ final String property, final long defVal) {
+ long partSize = conf.getLongBytes(property, defVal);
+ if (partSize < OBSConstants.MULTIPART_MIN_SIZE) {
+ LOG.warn("{} must be at least 5 MB; configured value is {}",
+ property, partSize);
+ partSize = OBSConstants.MULTIPART_MIN_SIZE;
+ }
+ return partSize;
+ }
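+  // For example (illustrative only): a configured value of "1M" is below
+  // MULTIPART_MIN_SIZE, so
+  //   getMultipartSizeProperty(conf, OBSConstants.MULTIPART_SIZE,
+  //       OBSConstants.DEFAULT_MULTIPART_SIZE)
+  // logs a warning and returns 5 * 1024 * 1024.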
+
+ /**
+ * Ensure that the long value is in the range of an integer.
+ *
+ * @param name property name for error messages
+ * @param size original size
+ * @return the size, guaranteed to be less than or equal to the max value of
+ * an integer.
+ */
+ static int ensureOutputParameterInRange(final String name,
+ final long size) {
+ if (size > Integer.MAX_VALUE) {
+ LOG.warn(
+ "obs: {} capped to ~2.14GB"
+ + " (maximum allowed size with current output mechanism)",
+ name);
+ return Integer.MAX_VALUE;
+ } else {
+ return (int) size;
+ }
+ }
+
+ /**
+ * Propagates bucket-specific settings into generic OBS configuration keys.
+ * This is done by propagating the values of the form {@code
+ * fs.obs.bucket.${bucket}.key} to {@code fs.obs.key}, for all values of "key"
+ * other than a small set of unmodifiable values.
+ *
+   * <p>The source of the updated property is set to the key name of the
+ * bucket property, to aid in diagnostics of where things came from.
+ *
+   * <p>Returns a new configuration. Why the clone? You can use the same conf
+ * for different filesystems, and the original values are not updated.
+ *
+   * <p>The {@code fs.obs.impl} property cannot be set, nor can any with the
+ * prefix {@code fs.obs.bucket}.
+ *
+   * <p>This method does not propagate security provider path information
+ * from the OBS property into the Hadoop common provider: callers must call
+ * {@link #patchSecurityCredentialProviders(Configuration)} explicitly.
+ *
+ * @param source Source Configuration object.
+ * @param bucket bucket name. Must not be empty.
+ * @return a (potentially) patched clone of the original.
+ */
+ static Configuration propagateBucketOptions(final Configuration source,
+ final String bucket) {
+
+ Preconditions.checkArgument(StringUtils.isNotEmpty(bucket), "bucket");
+ final String bucketPrefix = OBSConstants.FS_OBS_BUCKET_PREFIX + bucket
+ + '.';
+ LOG.debug("Propagating entries under {}", bucketPrefix);
+ final Configuration dest = new Configuration(source);
+    for (Map.Entry<String, String> entry : source) {
+ final String key = entry.getKey();
+ // get the (unexpanded) value.
+ final String value = entry.getValue();
+ if (!key.startsWith(bucketPrefix) || bucketPrefix.equals(key)) {
+ continue;
+ }
+ // there's a bucket prefix, so strip it
+ final String stripped = key.substring(bucketPrefix.length());
+ if (stripped.startsWith("bucket.") || "impl".equals(stripped)) {
+ // tell user off
+ LOG.debug("Ignoring bucket option {}", key);
+ } else {
+ // propagate the value, building a new origin field.
+ // to track overwrites, the generic key is overwritten even if
+ // already matches the new one.
+ final String generic = OBSConstants.FS_OBS_PREFIX + stripped;
+ LOG.debug("Updating {}", generic);
+ dest.set(generic, value, key);
+ }
+ }
+ return dest;
+ }
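+  // For example (hypothetical bucket name and endpoint, for illustration only):
+  // with "fs.obs.bucket.mybucket.endpoint" set to "obs.example.com",
+  //   Configuration patched = propagateBucketOptions(conf, "mybucket");
+  // returns a clone whose "fs.obs.endpoint" is "obs.example.com", while keys
+  // such as "fs.obs.bucket.mybucket.impl" are ignored.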
+
+ /**
+ * Patch the security credential provider information in {@link
+ * #CREDENTIAL_PROVIDER_PATH} with the providers listed in {@link
+ * OBSConstants#OBS_SECURITY_CREDENTIAL_PROVIDER_PATH}.
+ *
+   * <p>This allows different buckets to use different credential files.
+ *
+ * @param conf configuration to patch
+ */
+ static void patchSecurityCredentialProviders(final Configuration conf) {
+    Collection<String> customCredentials =
+ conf.getStringCollection(
+ OBSConstants.OBS_SECURITY_CREDENTIAL_PROVIDER_PATH);
+    Collection<String> hadoopCredentials = conf.getStringCollection(
+ CREDENTIAL_PROVIDER_PATH);
+ if (!customCredentials.isEmpty()) {
+      List<String> all = Lists.newArrayList(customCredentials);
+ all.addAll(hadoopCredentials);
+ String joined = StringUtils.join(all, ',');
+ LOG.debug("Setting {} to {}", CREDENTIAL_PROVIDER_PATH, joined);
+ conf.set(CREDENTIAL_PROVIDER_PATH, joined, "patch of "
+ + OBSConstants.OBS_SECURITY_CREDENTIAL_PROVIDER_PATH);
+ }
+ }
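+  // For example (hypothetical provider URI, for illustration only): if
+  // "fs.obs.security.credential.provider.path" is set to
+  // "jceks://hdfs@nn/obs.jceks", that provider is prepended to whatever
+  // "hadoop.security.credential.provider.path" already contains.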
+
+ /**
+ * Verify that the bucket exists. This does not check permissions, not even
+ * read access.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @throws FileNotFoundException the bucket is absent
+ * @throws IOException any other problem talking to OBS
+ */
+ static void verifyBucketExists(final OBSFileSystem owner)
+ throws FileNotFoundException, IOException {
+ int retryTime = 1;
+ while (true) {
+ try {
+ if (!owner.getObsClient().headBucket(owner.getBucket())) {
+ throw new FileNotFoundException(
+ "Bucket " + owner.getBucket() + " does not exist");
+ }
+ return;
+ } catch (ObsException e) {
+ LOG.warn("Failed to head bucket for [{}], retry time [{}], "
+ + "exception [{}]", owner.getBucket(), retryTime,
+ translateException("doesBucketExist", owner.getBucket(),
+ e));
+
+ if (MAX_RETRY_TIME == retryTime) {
+ throw translateException("doesBucketExist",
+ owner.getBucket(), e);
+ }
+
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ retryTime++;
+ }
+ }
+
+ /**
+   * Initialize multipart uploads, purging any existing uploads older than the
+   * value of PURGE_EXISTING_MULTIPART_AGE.
+ *
+ * @param owner the owner OBSFileSystem instance
+ * @param conf the configuration to use for the FS
+ * @throws IOException on any failure to initialize multipart upload
+ */
+ static void initMultipartUploads(final OBSFileSystem owner,
+ final Configuration conf)
+ throws IOException {
+ boolean purgeExistingMultipart =
+ conf.getBoolean(OBSConstants.PURGE_EXISTING_MULTIPART,
+ OBSConstants.DEFAULT_PURGE_EXISTING_MULTIPART);
+ long purgeExistingMultipartAge =
+ longOption(conf, OBSConstants.PURGE_EXISTING_MULTIPART_AGE,
+ OBSConstants.DEFAULT_PURGE_EXISTING_MULTIPART_AGE, 0);
+
+ if (!purgeExistingMultipart) {
+ return;
+ }
+
+ final Date purgeBefore = new Date(
+ new Date().getTime() - purgeExistingMultipartAge * 1000);
+
+ try {
+ ListMultipartUploadsRequest request
+ = new ListMultipartUploadsRequest(owner.getBucket());
+ while (true) {
+ // List + purge
+ MultipartUploadListing uploadListing = owner.getObsClient()
+ .listMultipartUploads(request);
+ for (MultipartUpload upload
+ : uploadListing.getMultipartTaskList()) {
+ if (upload.getInitiatedDate().compareTo(purgeBefore) < 0) {
+ owner.getObsClient().abortMultipartUpload(
+ new AbortMultipartUploadRequest(
+ owner.getBucket(), upload.getObjectKey(),
+ upload.getUploadId()));
+ }
+ }
+ if (!uploadListing.isTruncated()) {
+ break;
+ }
+ request.setUploadIdMarker(
+ uploadListing.getNextUploadIdMarker());
+ request.setKeyMarker(uploadListing.getNextKeyMarker());
+ }
+ } catch (ObsException e) {
+ if (e.getResponseCode() == FORBIDDEN_CODE) {
+ LOG.debug("Failed to purging multipart uploads against {},"
+ + " FS may be read only", owner.getBucket(),
+ e);
+ } else {
+ throw translateException("purging multipart uploads",
+ owner.getBucket(), e);
+ }
+ }
+ }
+
+ static void shutdownAll(final ExecutorService... executors) {
+ for (ExecutorService exe : executors) {
+ if (exe != null) {
+ try {
+ if (LOG != null) {
+ LOG.debug("Shutdown {}", exe);
+ }
+ exe.shutdown();
+ } catch (Exception e) {
+ if (LOG != null && LOG.isDebugEnabled()) {
+ LOG.debug("Exception in shutdown {}", exe, e);
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSConstants.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSConstants.java
new file mode 100644
index 0000000000000..ac72e0404c4ac
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSConstants.java
@@ -0,0 +1,726 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * All constants used by {@link OBSFileSystem}.
+ *
+ * <p>Some of the strings are marked as {@code Unstable}. This means that they
+ * may be unsupported in future; at which point they will be marked as
+ * deprecated and simply ignored.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+final class OBSConstants {
+ /**
+ * Minimum multipart size which OBS supports.
+ */
+ static final int MULTIPART_MIN_SIZE = 5 * 1024 * 1024;
+
+ /**
+ * OBS access key.
+ */
+ static final String ACCESS_KEY = "fs.obs.access.key";
+
+ /**
+ * OBS secret key.
+ */
+ static final String SECRET_KEY = "fs.obs.secret.key";
+
+ /**
+ * OBS credentials provider.
+ */
+ static final String OBS_CREDENTIALS_PROVIDER
+ = "fs.obs.credentials.provider";
+
+ /**
+ * OBS client security provider.
+ */
+ static final String OBS_SECURITY_PROVIDER = "fs.obs.security.provider";
+
+ /**
+ * Extra set of security credentials which will be prepended to that set in
+ * {@code "hadoop.security.credential.provider.path"}. This extra option
+ * allows for per-bucket overrides.
+ */
+ static final String OBS_SECURITY_CREDENTIAL_PROVIDER_PATH =
+ "fs.obs.security.credential.provider.path";
+
+ /**
+ * Session token for when using TemporaryOBSCredentialsProvider.
+ */
+ static final String SESSION_TOKEN = "fs.obs.session.token";
+
+ /**
+ * Maximum number of simultaneous connections to obs.
+ */
+ static final String MAXIMUM_CONNECTIONS = "fs.obs.connection.maximum";
+
+ /**
+ * Default value of {@link #MAXIMUM_CONNECTIONS}.
+ */
+ static final int DEFAULT_MAXIMUM_CONNECTIONS = 1000;
+
+ /**
+ * Connect to obs over ssl.
+ */
+ static final String SECURE_CONNECTIONS = "fs.obs.connection.ssl.enabled";
+
+ /**
+ * Default value of {@link #SECURE_CONNECTIONS}.
+ */
+ static final boolean DEFAULT_SECURE_CONNECTIONS = false;
+
+ /**
+ * Use a custom endpoint.
+ */
+ static final String ENDPOINT = "fs.obs.endpoint";
+
+ /**
+ * Host for connecting to OBS through proxy server.
+ */
+ static final String PROXY_HOST = "fs.obs.proxy.host";
+
+ /**
+ * Port for connecting to OBS through proxy server.
+ */
+ static final String PROXY_PORT = "fs.obs.proxy.port";
+
+ /**
+ * User name for connecting to OBS through proxy server.
+ */
+ static final String PROXY_USERNAME = "fs.obs.proxy.username";
+
+ /**
+ * Password for connecting to OBS through proxy server.
+ */
+ static final String PROXY_PASSWORD = "fs.obs.proxy.password";
+
+ /**
+ * Default port for HTTPS.
+ */
+ static final int DEFAULT_HTTPS_PORT = 443;
+
+ /**
+ * Default port for HTTP.
+ */
+ static final int DEFAULT_HTTP_PORT = 80;
+
+ /**
+ * Number of times we should retry errors.
+ */
+ static final String MAX_ERROR_RETRIES = "fs.obs.attempts.maximum";
+
+ /**
+ * Default value of {@link #MAX_ERROR_RETRIES}.
+ */
+ static final int DEFAULT_MAX_ERROR_RETRIES = 3;
+
+ /**
+ * Seconds until we give up trying to establish a connection to obs.
+ */
+ static final String ESTABLISH_TIMEOUT
+ = "fs.obs.connection.establish.timeout";
+
+ /**
+ * Default value of {@link #ESTABLISH_TIMEOUT}.
+ */
+ static final int DEFAULT_ESTABLISH_TIMEOUT = 120000;
+
+ /**
+ * Seconds until we give up on a connection to obs.
+ */
+ static final String SOCKET_TIMEOUT = "fs.obs.connection.timeout";
+
+ /**
+ * Default value of {@link #SOCKET_TIMEOUT}.
+ */
+ static final int DEFAULT_SOCKET_TIMEOUT = 120000;
+
+ /**
+ * Socket send buffer to be used in OBS SDK.
+ */
+ static final String SOCKET_SEND_BUFFER = "fs.obs.socket.send.buffer";
+
+ /**
+ * Default value of {@link #SOCKET_SEND_BUFFER}.
+ */
+ static final int DEFAULT_SOCKET_SEND_BUFFER = 256 * 1024;
+
+ /**
+ * Socket receive buffer to be used in OBS SDK.
+ */
+ static final String SOCKET_RECV_BUFFER = "fs.obs.socket.recv.buffer";
+
+ /**
+ * Default value of {@link #SOCKET_RECV_BUFFER}.
+ */
+ static final int DEFAULT_SOCKET_RECV_BUFFER = 256 * 1024;
+
+ /**
+ * Number of records to get while paging through a directory listing.
+ */
+ static final String MAX_PAGING_KEYS = "fs.obs.paging.maximum";
+
+ /**
+ * Default value of {@link #MAX_PAGING_KEYS}.
+ */
+ static final int DEFAULT_MAX_PAGING_KEYS = 1000;
+
+ /**
+ * Maximum number of threads to allow in the pool used by TransferManager.
+ */
+ static final String MAX_THREADS = "fs.obs.threads.max";
+
+ /**
+ * Default value of {@link #MAX_THREADS}.
+ */
+ static final int DEFAULT_MAX_THREADS = 20;
+
+ /**
+ * Maximum number of tasks cached if all threads are already uploading.
+ */
+ static final String MAX_TOTAL_TASKS = "fs.obs.max.total.tasks";
+
+ /**
+ * Default value of {@link #MAX_TOTAL_TASKS}.
+ */
+ static final int DEFAULT_MAX_TOTAL_TASKS = 20;
+
+ /**
+ * Max number of copy threads.
+ */
+ static final String MAX_COPY_THREADS = "fs.obs.copy.threads.max";
+
+ /**
+ * Default value of {@link #MAX_COPY_THREADS}.
+ */
+ static final int DEFAULT_MAX_COPY_THREADS = 40;
+
+ /**
+ * Max number of delete threads.
+ */
+ static final String MAX_DELETE_THREADS = "fs.obs.delete.threads.max";
+
+ /**
+ * Default value of {@link #MAX_DELETE_THREADS}.
+ */
+ static final int DEFAULT_MAX_DELETE_THREADS = 20;
+
+ /**
+ * Unused option: maintained for compile-time compatibility. If set, a warning
+ * is logged in OBS during init.
+ */
+ @Deprecated
+ static final String CORE_THREADS = "fs.obs.threads.core";
+
+ /**
+ * The time that an idle thread waits before terminating.
+ */
+ static final String KEEPALIVE_TIME = "fs.obs.threads.keepalivetime";
+
+ /**
+ * Default value of {@link #KEEPALIVE_TIME}.
+ */
+ static final int DEFAULT_KEEPALIVE_TIME = 60;
+
+ /**
+   * Size of each multipart piece in bytes.
+ */
+ static final String MULTIPART_SIZE = "fs.obs.multipart.size";
+
+ /**
+ * Default value of {@link #MULTIPART_SIZE}.
+ */
+ static final long DEFAULT_MULTIPART_SIZE = 104857600; // 100 MB
+
+ /**
+ * Enable multi-object delete calls.
+ */
+ static final String ENABLE_MULTI_DELETE = "fs.obs.multiobjectdelete.enable";
+
+ /**
+ * Max number of objects in one multi-object delete call. This option takes
+ * effect only when the option 'ENABLE_MULTI_DELETE' is set to 'true'.
+ */
+ static final String MULTI_DELETE_MAX_NUMBER
+ = "fs.obs.multiobjectdelete.maximum";
+
+ /**
+ * Default value of {@link #MULTI_DELETE_MAX_NUMBER}.
+ */
+ static final int DEFAULT_MULTI_DELETE_MAX_NUMBER = 1000;
+
+ /**
+ * Delete recursively or not.
+ */
+ static final String MULTI_DELETE_RECURSION
+ = "fs.obs.multiobjectdelete.recursion";
+
+ /**
+ * Minimum number of objects in one multi-object delete call.
+ */
+ static final String MULTI_DELETE_THRESHOLD
+ = "fs.obs.multiobjectdelete.threshold";
+
+ /**
+ * Default value of {@link #MULTI_DELETE_THRESHOLD}.
+ */
+ static final int MULTI_DELETE_DEFAULT_THRESHOLD = 3;
+
+ /**
+ * Comma separated list of directories.
+ */
+ static final String BUFFER_DIR = "fs.obs.buffer.dir";
+
+ /**
+ * Switch to the fast block-by-block upload mechanism.
+ */
+ static final String FAST_UPLOAD = "fs.obs.fast.upload";
+
+ /**
+ * What buffer to use. Default is {@link #FAST_UPLOAD_BUFFER_DISK} Value:
+ * {@value}
+ */
+ @InterfaceStability.Unstable
+ static final String FAST_UPLOAD_BUFFER = "fs.obs.fast.upload.buffer";
+
+ /**
+ * Buffer blocks to disk: {@value}. Capacity is limited to available disk
+ * space.
+ */
+ @InterfaceStability.Unstable
+ static final String FAST_UPLOAD_BUFFER_DISK = "disk";
+
+ /**
+   * Use an in-memory array. Fast but will run out of heap rapidly: {@value}.
+ */
+ @InterfaceStability.Unstable
+ static final String FAST_UPLOAD_BUFFER_ARRAY = "array";
+
+ /**
+ * Use a byte buffer. May be more memory efficient than the {@link
+ * #FAST_UPLOAD_BUFFER_ARRAY}: {@value}.
+ */
+ @InterfaceStability.Unstable
+ static final String FAST_UPLOAD_BYTEBUFFER = "bytebuffer";
+
+ /**
+   * Maximum number of blocks a single output stream can have active (uploading,
+   * or queued to the central FileSystem instance's pool of queued operations).
+   * This stops a single stream overloading the shared thread pool. {@value}
+ *
+   * <p>Default is {@link #DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS}
+ */
+ @InterfaceStability.Unstable
+ static final String FAST_UPLOAD_ACTIVE_BLOCKS
+ = "fs.obs.fast.upload.active.blocks";
+
+ /**
+ * Limit of queued block upload operations before writes block. Value:
+ * {@value}
+ */
+ @InterfaceStability.Unstable
+ static final int DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS = 4;
+
+ /**
+ * Canned acl options: Private | PublicRead | PublicReadWrite |
+ * AuthenticatedRead | LogDeliveryWrite | BucketOwnerRead |
+ * BucketOwnerFullControl.
+ */
+ static final String CANNED_ACL = "fs.obs.acl.default";
+
+ /**
+ * Default value of {@link #CANNED_ACL}.
+ */
+ static final String DEFAULT_CANNED_ACL = "";
+
+ /**
+ * Should we try to purge old multipart uploads when starting up.
+ */
+ static final String PURGE_EXISTING_MULTIPART = "fs.obs.multipart.purge";
+
+ /**
+ * Default value of {@link #PURGE_EXISTING_MULTIPART}.
+ */
+ static final boolean DEFAULT_PURGE_EXISTING_MULTIPART = false;
+
+ /**
+ * Purge any multipart uploads older than this number of seconds.
+ */
+ static final String PURGE_EXISTING_MULTIPART_AGE
+ = "fs.obs.multipart.purge.age";
+
+ /**
+ * Default value of {@link #PURGE_EXISTING_MULTIPART_AGE}.
+ */
+ static final long DEFAULT_PURGE_EXISTING_MULTIPART_AGE = 86400;
+
+ /**
+ * OBS folder suffix.
+ */
+ static final String OBS_FOLDER_SUFFIX = "_$folder$";
+
+ /**
+ * Block size for
+ * {@link org.apache.hadoop.fs.FileSystem#getDefaultBlockSize()}.
+ */
+ static final String FS_OBS_BLOCK_SIZE = "fs.obs.block.size";
+
+ /**
+ * Default value of {@link #FS_OBS_BLOCK_SIZE}.
+ */
+ static final int DEFAULT_FS_OBS_BLOCK_SIZE = 128 * 1024 * 1024;
+
+ /**
+ * OBS scheme.
+ */
+ static final String OBS_SCHEME = "obs";
+
+ /**
+ * Prefix for all OBS properties: {@value}.
+ */
+ static final String FS_OBS_PREFIX = "fs.obs.";
+
+ /**
+ * Prefix for OBS bucket-specific properties: {@value}.
+ */
+ static final String FS_OBS_BUCKET_PREFIX = "fs.obs.bucket.";
+
+ /**
+ * OBS default port.
+ */
+ static final int OBS_DEFAULT_PORT = -1;
+
+ /**
+ * User agent prefix.
+ */
+ static final String USER_AGENT_PREFIX = "fs.obs.user.agent.prefix";
+
+ /**
+ * Read ahead buffer size to prevent connection re-establishments.
+ */
+ static final String READAHEAD_RANGE = "fs.obs.readahead.range";
+
+ /**
+ * Default value of {@link #READAHEAD_RANGE}.
+ */
+ static final long DEFAULT_READAHEAD_RANGE = 1024 * 1024;
+
+ /**
+ * Flag indicating if {@link OBSInputStream#read(long, byte[], int, int)} will
+ * use the implementation of
+ * {@link org.apache.hadoop.fs.FSInputStream#read(long,
+ * byte[], int, int)}.
+ */
+ static final String READ_TRANSFORM_ENABLE = "fs.obs.read.transform.enable";
+
+ /**
+ * OBS client factory implementation class.
+ */
+ @InterfaceAudience.Private
+ @InterfaceStability.Unstable
+ static final String OBS_CLIENT_FACTORY_IMPL
+ = "fs.obs.client.factory.impl";
+
+ /**
+ * Default value of {@link #OBS_CLIENT_FACTORY_IMPL}.
+ */
+ @InterfaceAudience.Private
+ @InterfaceStability.Unstable
+  static final Class<? extends OBSClientFactory>
+ DEFAULT_OBS_CLIENT_FACTORY_IMPL =
+ DefaultOBSClientFactory.class;
+
+ /**
+ * Maximum number of partitions in a multipart upload: {@value}.
+ */
+ @InterfaceAudience.Private
+ static final int MAX_MULTIPART_COUNT = 10000;
+
+ // OBS Client configuration
+
+ /**
+ * Idle connection time.
+ */
+ static final String IDLE_CONNECTION_TIME = "fs.obs.idle.connection.time";
+
+ /**
+ * Default value of {@link #IDLE_CONNECTION_TIME}.
+ */
+ static final int DEFAULT_IDLE_CONNECTION_TIME = 30000;
+
+ /**
+ * Maximum number of idle connections.
+ */
+ static final String MAX_IDLE_CONNECTIONS = "fs.obs.max.idle.connections";
+
+ /**
+ * Default value of {@link #MAX_IDLE_CONNECTIONS}.
+ */
+ static final int DEFAULT_MAX_IDLE_CONNECTIONS = 1000;
+
+ /**
+ * Keep alive.
+ */
+ static final String KEEP_ALIVE = "fs.obs.keep.alive";
+
+ /**
+ * Default value of {@link #KEEP_ALIVE}.
+ */
+ static final boolean DEFAULT_KEEP_ALIVE = true;
+
+ /**
+ * Validate certificate.
+ */
+ static final String VALIDATE_CERTIFICATE = "fs.obs.validate.certificate";
+
+ /**
+ * Default value of {@link #VALIDATE_CERTIFICATE}.
+ */
+ static final boolean DEFAULT_VALIDATE_CERTIFICATE = false;
+
+ /**
+ * Verify response content type.
+ */
+ static final String VERIFY_RESPONSE_CONTENT_TYPE
+ = "fs.obs.verify.response.content.type";
+
+ /**
+ * Default value of {@link #VERIFY_RESPONSE_CONTENT_TYPE}.
+ */
+ static final boolean DEFAULT_VERIFY_RESPONSE_CONTENT_TYPE = true;
+
+ /**
+ * UploadStreamRetryBufferSize.
+ */
+ static final String UPLOAD_STREAM_RETRY_SIZE
+ = "fs.obs.upload.stream.retry.buffer.size";
+
+ /**
+ * Default value of {@link #UPLOAD_STREAM_RETRY_SIZE}.
+ */
+ static final int DEFAULT_UPLOAD_STREAM_RETRY_SIZE = 512 * 1024;
+
+ /**
+ * Read buffer size.
+ */
+ static final String READ_BUFFER_SIZE = "fs.obs.read.buffer.size";
+
+ /**
+ * Default value of {@link #READ_BUFFER_SIZE}.
+ */
+ static final int DEFAULT_READ_BUFFER_SIZE = 256 * 1024;
+
+ /**
+ * Write buffer size.
+ */
+ static final String WRITE_BUFFER_SIZE = "fs.obs.write.buffer.size";
+
+ /**
+ * Default value of {@link #WRITE_BUFFER_SIZE}.
+ */
+ static final int DEFAULT_WRITE_BUFFER_SIZE = 256 * 1024;
+
+ /**
+ * Canonical name.
+ */
+ static final String CNAME = "fs.obs.cname";
+
+ /**
+ * Default value of {@link #CNAME}.
+ */
+ static final boolean DEFAULT_CNAME = false;
+
+ /**
+ * Strict host name verification.
+ */
+ static final String STRICT_HOSTNAME_VERIFICATION
+ = "fs.obs.strict.hostname.verification";
+
+ /**
+ * Default value of {@link #STRICT_HOSTNAME_VERIFICATION}.
+ */
+ static final boolean DEFAULT_STRICT_HOSTNAME_VERIFICATION = false;
+
+ /**
+ * Size of object copy part pieces in bytes.
+ */
+ static final String COPY_PART_SIZE = "fs.obs.copypart.size";
+
+ /**
+ * Maximum value of {@link #COPY_PART_SIZE}.
+ */
+ static final long MAX_COPY_PART_SIZE = 5368709120L; // 5GB
+
+ /**
+ * Default value of {@link #COPY_PART_SIZE}.
+ */
+ static final long DEFAULT_COPY_PART_SIZE = 104857600L; // 100MB
+
+ /**
+ * Maximum number of copy part threads.
+ */
+ static final String MAX_COPY_PART_THREADS = "fs.obs.copypart.threads.max";
+
+ /**
+ * Default value of {@link #MAX_COPY_PART_THREADS}.
+ */
+ static final int DEFAULT_MAX_COPY_PART_THREADS = 40;
+
+ /**
+ * Number of core list threads.
+ */
+ static final String CORE_LIST_THREADS = "fs.obs.list.threads.core";
+
+ /**
+ * Default value of {@link #CORE_LIST_THREADS}.
+ */
+ static final int DEFAULT_CORE_LIST_THREADS = 30;
+
+ /**
+ * Maximum number of list threads.
+ */
+ static final String MAX_LIST_THREADS = "fs.obs.list.threads.max";
+
+ /**
+ * Default value of {@link #MAX_LIST_THREADS}.
+ */
+ static final int DEFAULT_MAX_LIST_THREADS = 60;
+
+ /**
+ * Capacity of list work queue.
+ */
+ static final String LIST_WORK_QUEUE_CAPACITY
+ = "fs.obs.list.workqueue.capacity";
+
+ /**
+ * Default value of {@link #LIST_WORK_QUEUE_CAPACITY}.
+ */
+ static final int DEFAULT_LIST_WORK_QUEUE_CAPACITY = 1024;
+
+ /**
+ * List parallel factor.
+ */
+ static final String LIST_PARALLEL_FACTOR = "fs.obs.list.parallel.factor";
+
+ /**
+ * Default value of {@link #LIST_PARALLEL_FACTOR}.
+ */
+ static final int DEFAULT_LIST_PARALLEL_FACTOR = 30;
+
+ /**
+ * Switch for the fast delete.
+ */
+ static final String TRASH_ENABLE = "fs.obs.trash.enable";
+
+ /**
+ * Enable obs content summary or not.
+ */
+ static final String OBS_CONTENT_SUMMARY_ENABLE
+ = "fs.obs.content.summary.enable";
+
+ /**
+ * Enable obs client dfs list or not.
+ */
+ static final String OBS_CLIENT_DFS_LIST_ENABLE
+ = "fs.obs.client.dfs.list.enable";
+
+ /**
+   * Default value of {@link #TRASH_ENABLE}: false.
+ */
+ static final boolean DEFAULT_TRASH = false;
+
+ /**
+ * The fast delete recycle directory.
+ */
+ static final String TRASH_DIR = "fs.obs.trash.dir";
+
+ /**
+ * Encryption type is sse-kms or sse-c.
+ */
+ static final String SSE_TYPE = "fs.obs.server-side-encryption-type";
+
+ /**
+   * KMS key id for sse-kms, or the Base64-encoded key content for sse-c.
+ */
+ static final String SSE_KEY = "fs.obs.server-side-encryption-key";
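+
+  // Example configuration (hypothetical values, for illustration only):
+  //   fs.obs.server-side-encryption-type = sse-kms
+  //   fs.obs.server-side-encryption-key  = <KMS key id>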
+
+ /**
+ * Array first block size.
+ */
+ static final String FAST_UPLOAD_BUFFER_ARRAY_FIRST_BLOCK_SIZE
+ = "fs.obs.fast.upload.array.first.buffer";
+
+ /**
+ * The fast upload buffer array first block default size.
+ */
+ static final int FAST_UPLOAD_BUFFER_ARRAY_FIRST_BLOCK_SIZE_DEFAULT = 1024
+ * 1024;
+
+ /**
+ * Auth Type Negotiation Enable Switch.
+ */
+ static final String SDK_AUTH_TYPE_NEGOTIATION_ENABLE
+ = "fs.obs.authtype.negotiation.enable";
+
+ /**
+ * Default value of {@link #SDK_AUTH_TYPE_NEGOTIATION_ENABLE}.
+ */
+ static final boolean DEFAULT_SDK_AUTH_TYPE_NEGOTIATION_ENABLE = false;
+
+ /**
+ * Okhttp retryOnConnectionFailure switch.
+ */
+ static final String SDK_RETRY_ON_CONNECTION_FAILURE_ENABLE
+ = "fs.obs.connection.retry.enable";
+
+ /**
+ * Default value of {@link #SDK_RETRY_ON_CONNECTION_FAILURE_ENABLE}.
+ */
+ static final boolean DEFAULT_SDK_RETRY_ON_CONNECTION_FAILURE_ENABLE = true;
+
+ /**
+   * Maximum number of SDK retries on an unexpected end-of-stream exception.
+   * Default: -1 (do not retry).
+ */
+ static final String SDK_RETRY_TIMES_ON_UNEXPECTED_END_EXCEPTION
+ = "fs.obs.unexpectedend.retrytime";
+
+ /**
+ * Default value of {@link #SDK_RETRY_TIMES_ON_UNEXPECTED_END_EXCEPTION}.
+ */
+ static final int DEFAULT_SDK_RETRY_TIMES_ON_UNEXPECTED_END_EXCEPTION = -1;
+
+ /**
+   * Maximum SDK connection retry times. Default: 2000.
+ */
+ static final int DEFAULT_MAX_SDK_CONNECTION_RETRY_TIMES = 2000;
+
+ /**
+ * Second to millisecond factor.
+ */
+ static final int SEC2MILLISEC_FACTOR = 1000;
+
+ private OBSConstants() {
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSDataBlocks.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSDataBlocks.java
new file mode 100644
index 0000000000000..e347970ee8446
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSDataBlocks.java
@@ -0,0 +1,1020 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.LocalDirAllocator;
+import org.apache.hadoop.util.DirectBufferPool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedOutputStream;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.Closeable;
+import java.io.EOFException;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+/**
+ * Set of classes to support output streaming into blocks which are then
+ * uploaded to OBS as a single PUT, or as part of a multipart request.
+ */
+final class OBSDataBlocks {
+
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSDataBlocks.class);
+
+ private OBSDataBlocks() {
+ }
+
+ /**
+ * Validate args to a write command. These are the same validation checks
+ * expected for any implementation of {@code OutputStream.write()}.
+ *
+ * @param b byte array containing data
+ * @param off offset in array where to start
+ * @param len number of bytes to be written
+ * @throws NullPointerException for a null buffer
+ * @throws IndexOutOfBoundsException if indices are out of range
+ */
+ static void validateWriteArgs(final byte[] b, final int off,
+ final int len) {
+ Preconditions.checkNotNull(b);
+ if (off < 0 || off > b.length || len < 0 || off + len > b.length
+ || off + len < 0) {
+ throw new IndexOutOfBoundsException(
+ "write (b[" + b.length + "], " + off + ", " + len + ')');
+ }
+ }
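+  // For example (illustrative only): with an 8-byte buffer "buf",
+  //   validateWriteArgs(buf, 6, 4);
+  // throws IndexOutOfBoundsException because off + len exceeds buf.length.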
+
+ /**
+ * Create a factory.
+ *
+ * @param owner factory owner
+   * @param name factory name - the option from {@link OBSConstants}.
+ * @return the factory, ready to be initialized.
+ * @throws IllegalArgumentException if the name is unknown.
+ */
+ static BlockFactory createFactory(final OBSFileSystem owner,
+ final String name) {
+ switch (name) {
+ case OBSConstants.FAST_UPLOAD_BUFFER_ARRAY:
+ return new ByteArrayBlockFactory(owner);
+ case OBSConstants.FAST_UPLOAD_BUFFER_DISK:
+ return new DiskBlockFactory(owner);
+ case OBSConstants.FAST_UPLOAD_BYTEBUFFER:
+ return new ByteBufferBlockFactory(owner);
+ default:
+ throw new IllegalArgumentException(
+ "Unsupported block buffer" + " \"" + name + '"');
+ }
+ }
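+  // Usage sketch (illustrative only; "fs" is an assumed OBSFileSystem whose
+  // buffer option would normally come from OBSConstants.FAST_UPLOAD_BUFFER):
+  //   BlockFactory factory =
+  //       OBSDataBlocks.createFactory(fs, OBSConstants.FAST_UPLOAD_BUFFER_DISK);
+  //   DataBlock block = factory.create(1, 8 * 1024 * 1024);
+  // The disk-backed factory buffers each block on local disk (fs.obs.buffer.dir)
+  // before it is uploaded.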
+
+ /**
+ * Base class for block factories.
+ */
+ abstract static class BlockFactory {
+ /**
+ * OBS file system type.
+ */
+ private final OBSFileSystem owner;
+
+ protected BlockFactory(final OBSFileSystem obsFileSystem) {
+ this.owner = obsFileSystem;
+ }
+
+ /**
+ * Create a block.
+ *
+ * @param index index of block
+ * @param limit limit of the block.
+ * @return a new block.
+ * @throws IOException on any failure to create block
+ */
+ abstract DataBlock create(long index, int limit) throws IOException;
+
+ /**
+ * Owner.
+ *
+ * @return obsFileSystem instance
+ */
+ protected OBSFileSystem getOwner() {
+ return owner;
+ }
+ }
+
+ /**
+ * This represents a block being uploaded.
+ */
+ abstract static class DataBlock implements Closeable {
+
+ /**
+ * Data block index.
+ */
+ private final long index;
+
+ /**
+     * Dest state can be: Writing/Upload/Closed.
+ */
+ private volatile DestState state = DestState.Writing;
+
+ protected DataBlock(final long dataIndex) {
+ this.index = dataIndex;
+ }
+
+ /**
+ * Atomically enter a state, verifying current state.
+ *
+ * @param current current state. null means "no check"
+ * @param next next state
+ * @throws IllegalStateException if the current state is not as expected
+ */
+ protected final synchronized void enterState(final DestState current,
+ final DestState next)
+ throws IllegalStateException {
+ verifyState(current);
+ LOG.debug("{}: entering state {}", this, next);
+ state = next;
+ }
+
+ /**
+ * Verify that the block is in the declared state.
+ *
+ * @param expected expected state.
+ * @throws IllegalStateException if the DataBlock is in the wrong state
+ */
+ protected final void verifyState(final DestState expected)
+ throws IllegalStateException {
+ if (expected != null && state != expected) {
+ throw new IllegalStateException(
+ "Expected stream state " + expected
+ + " -but actual state is " + state + " in " + this);
+ }
+ }
+
+ /**
+ * Current state.
+ *
+ * @return the current state.
+ */
+ protected final DestState getState() {
+ return state;
+ }
+
+ protected long getIndex() {
+ return index;
+ }
+
+ /**
+ * Return the current data size.
+ *
+ * @return the size of the data
+ */
+ abstract int dataSize();
+
+ /**
+ * Predicate to verify that the block has the capacity to write the given
+ * set of bytes.
+ *
+ * @param bytes number of bytes desired to be written.
+ * @return true if there is enough space.
+ */
+ abstract boolean hasCapacity(long bytes);
+
+ /**
+ * Predicate to check if there is data in the block.
+ *
+ * @return true if there is
+ */
+ boolean hasData() {
+ return dataSize() > 0;
+ }
+
+ /**
+ * The remaining capacity in the block before it is full.
+ *
+ * @return the number of bytes remaining.
+ */
+ abstract int remainingCapacity();
+
+ /**
+ * Write a series of bytes from the buffer, from the offset. Returns the
+ * number of bytes written. Only valid in the state {@code Writing}. Base
+ * class verifies the state but does no writing.
+ *
+ * @param buffer buffer
+ * @param offset offset
+ * @param length length of write
+ * @return number of bytes written
+ * @throws IOException trouble
+ */
+ int write(final byte[] buffer, final int offset, final int length)
+ throws IOException {
+ verifyState(DestState.Writing);
+ Preconditions.checkArgument(buffer != null, "Null buffer");
+ Preconditions.checkArgument(length >= 0, "length is negative");
+ Preconditions.checkArgument(offset >= 0, "offset is negative");
+ Preconditions.checkArgument(
+ !(buffer.length - offset < length),
+ "buffer shorter than amount of data to write");
+ return 0;
+ }
+
+ /**
+ * Flush the output. Only valid in the state {@code Writing}. In the base
+ * class, this is a no-op
+ *
+ * @throws IOException any IO problem.
+ */
+ void flush() throws IOException {
+ verifyState(DestState.Writing);
+ }
+
+ /**
+ * Switch to the upload state and return a stream for uploading. Base class
+ * calls {@link #enterState(DestState, DestState)} to manage the state
+ * machine.
+ *
+ * @return the stream
+ * @throws IOException trouble
+ */
+ Object startUpload() throws IOException {
+ LOG.debug("Start datablock[{}] upload", index);
+ enterState(DestState.Writing, DestState.Upload);
+ return null;
+ }
+
+ /**
+ * Enter the closed state.
+ *
+ * @return true if the class was in any other state, implying that the
+ * subclass should do its close operations
+ */
+ protected synchronized boolean enterClosedState() {
+ if (!state.equals(DestState.Closed)) {
+ enterState(null, DestState.Closed);
+ return true;
+ } else {
+ return false;
+ }
+ }
+
+ @Override
+ public void close() throws IOException {
+ if (enterClosedState()) {
+ LOG.debug("Closed {}", this);
+ innerClose();
+ }
+ }
+
+ /**
+ * Inner close logic for subclasses to implement.
+ *
+ * @throws IOException on any failure to close
+ */
+ protected abstract void innerClose() throws IOException;
+
+ /**
+ * Destination state definition for a data block.
+ */
+ enum DestState {
+ /**
+ * destination state : writing.
+ */
+ Writing,
+ /**
+ * destination state : upload.
+ */
+ Upload,
+ /**
+ * destination state : closed.
+ */
+ Closed
+ }
+ }
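
The Writing -> Upload -> Closed lifecycle enforced by enterState()/verifyState() is easiest to see with a toy block. A minimal, JDK-only sketch (ToyBlock is a hypothetical stand-in for a DataBlock subclass, not part of this patch):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    public class ToyBlock {
      enum State { Writing, Upload, Closed }

      private State state = State.Writing;
      private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

      int write(byte[] b, int off, int len) {
        if (state != State.Writing) {
          throw new IllegalStateException("expected Writing, got " + state);
        }
        buffer.write(b, off, len);
        return len;
      }

      InputStream startUpload() {
        if (state != State.Writing) {
          throw new IllegalStateException("expected Writing, got " + state);
        }
        state = State.Upload;                    // Writing -> Upload
        return new ByteArrayInputStream(buffer.toByteArray());
      }

      void close() {
        state = State.Closed;                    // any state -> Closed
      }

      public static void main(String[] args) throws Exception {
        ToyBlock block = new ToyBlock();
        byte[] data = "hello".getBytes("UTF-8");
        block.write(data, 0, data.length);       // only legal while Writing
        InputStream in = block.startUpload();
        System.out.println("available=" + in.available());
        block.close();
      }
    }

Any write() attempted after startUpload() fails the state check, which is how the real blocks guard against use once the upload has begun.
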
+
+ /**
+ * Use byte arrays on the heap for storage.
+ */
+ static class ByteArrayBlockFactory extends BlockFactory {
+ ByteArrayBlockFactory(final OBSFileSystem owner) {
+ super(owner);
+ }
+
+ @Override
+ DataBlock create(final long index, final int limit) {
+ int firstBlockSize = super.owner.getConf()
+ .getInt(OBSConstants.FAST_UPLOAD_BUFFER_ARRAY_FIRST_BLOCK_SIZE,
+ OBSConstants
+ .FAST_UPLOAD_BUFFER_ARRAY_FIRST_BLOCK_SIZE_DEFAULT);
+ return new ByteArrayBlock(0, limit, firstBlockSize);
+ }
+ }
+
+ /**
+ * OBS specific byte array output stream.
+ */
+ static class OBSByteArrayOutputStream extends ByteArrayOutputStream {
+ OBSByteArrayOutputStream(final int size) {
+ super(size);
+ }
+
+ /**
+ * InputStream backed by the internal byte array.
+ *
+ * @return input stream
+ */
+ ByteArrayInputStream getInputStream() {
+ ByteArrayInputStream bin = new ByteArrayInputStream(this.buf, 0,
+ count);
+ this.reset();
+ this.buf = null;
+ return bin;
+ }
+ }
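
getInputStream() above avoids copying the buffered bytes: because ByteArrayOutputStream exposes its internal buf and count fields to subclasses, the data can be wrapped directly in a ByteArrayInputStream. A JDK-only sketch of the same trick (class and method names are illustrative):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    public class NoCopyByteArrayOutputStream extends ByteArrayOutputStream {
      NoCopyByteArrayOutputStream(int size) {
        super(size);
      }

      // Wrap the internal array directly instead of calling toByteArray(),
      // which would duplicate the data.
      InputStream toInputStream() {
        InputStream in = new ByteArrayInputStream(this.buf, 0, this.count);
        this.reset();
        this.buf = null;   // the stream must not be written to afterwards
        return in;
      }

      public static void main(String[] args) throws Exception {
        NoCopyByteArrayOutputStream out = new NoCopyByteArrayOutputStream(16);
        out.write("payload".getBytes("UTF-8"));
        InputStream in = out.toInputStream();
        System.out.println("bytes readable: " + in.available());
      }
    }
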
+
+ /**
+ * Stream to memory via a {@code ByteArrayOutputStream}.
+ *
+ * <p>This was taken from {@code OBSBlockOutputStream} and has the same
+ * problem which surfaced there: it can consume a lot of heap space
+ * proportional to the mismatch between writes to the stream and the JVM-wide
+ * upload bandwidth to the OBS endpoint. The memory consumption can be limited
+ * by tuning the filesystem settings to restrict the number of queued/active
+ * uploads.
+ */
+ static class ByteArrayBlock extends DataBlock {
+ /**
+ * Memory limit.
+ */
+ private final int limit;
+
+ /**
+ * Output stream.
+ */
+ private OBSByteArrayOutputStream buffer;
+
+ /**
+ * Cache data size so that it is consistent after the buffer is reset.
+ */
+ private Integer dataSize;
+
+ /**
+ * First block size.
+ */
+ private int firstBlockSize;
+
+ /**
+ * Input stream.
+ */
+ private ByteArrayInputStream inputStream = null;
+
+ ByteArrayBlock(final long index, final int limitBlockSize,
+ final int blockSize) {
+ super(index);
+ this.limit = limitBlockSize;
+ this.buffer = new OBSByteArrayOutputStream(blockSize);
+ this.firstBlockSize = blockSize;
+ }
+
+ /**
+ * Returns the first block size.
+ *
+ * @return the first block size
+ */
+ @VisibleForTesting
+ public int firstBlockSize() {
+ return this.firstBlockSize;
+ }
+
+ /**
+ * Get the amount of data; if there is no buffer then the size is 0.
+ *
+ * @return the amount of data available to upload.
+ */
+ @Override
+ int dataSize() {
+ return dataSize != null ? dataSize : buffer.size();
+ }
+
+ @Override
+ InputStream startUpload() throws IOException {
+ super.startUpload();
+ dataSize = buffer.size();
+ inputStream = buffer.getInputStream();
+ return inputStream;
+ }
+
+ @Override
+ boolean hasCapacity(final long bytes) {
+ return dataSize() + bytes <= limit;
+ }
+
+ @Override
+ int remainingCapacity() {
+ return limit - dataSize();
+ }
+
+ @Override
+ int write(final byte[] b, final int offset, final int len)
+ throws IOException {
+ super.write(b, offset, len);
+ int written = Math.min(remainingCapacity(), len);
+ buffer.write(b, offset, written);
+ return written;
+ }
+
+ @Override
+ protected void innerClose() throws IOException {
+ if (buffer != null) {
+ buffer.close();
+ buffer = null;
+ }
+
+ if (inputStream != null) {
+ inputStream.close();
+ inputStream = null;
+ }
+ }
+
+ @Override
+ public String toString() {
+ return "ByteArrayBlock{"
+ + "index="
+ + getIndex()
+ + ", state="
+ + getState()
+ + ", limit="
+ + limit
+ + ", dataSize="
+ + dataSize
+ + '}';
+ }
+ }
+
+ /**
+ * Stream via Direct ByteBuffers; these are allocated off heap via {@link
+ * DirectBufferPool}.
+ */
+ static class ByteBufferBlockFactory extends BlockFactory {
+
+ /**
+ * The direct buffer pool.
+ */
+ private static final DirectBufferPool BUFFER_POOL
+ = new DirectBufferPool();
+
+ /**
+ * Count of outstanding buffers.
+ */
+ private static final AtomicInteger BUFFERS_OUTSTANDING
+ = new AtomicInteger(0);
+
+ ByteBufferBlockFactory(final OBSFileSystem owner) {
+ super(owner);
+ }
+
+ @Override
+ ByteBufferBlock create(final long index, final int limit) {
+ return new ByteBufferBlock(index, limit);
+ }
+
+ public static ByteBuffer requestBuffer(final int limit) {
+ LOG.debug("Requesting buffer of size {}", limit);
+ BUFFERS_OUTSTANDING.incrementAndGet();
+ return BUFFER_POOL.getBuffer(limit);
+ }
+
+ public static void releaseBuffer(final ByteBuffer buffer) {
+ LOG.debug("Releasing buffer");
+ BUFFER_POOL.returnBuffer(buffer);
+ BUFFERS_OUTSTANDING.decrementAndGet();
+ }
+
+ /**
+ * Get count of outstanding buffers.
+ *
+ * @return the current buffer count
+ */
+ public int getOutstandingBufferCount() {
+ return BUFFERS_OUTSTANDING.get();
+ }
+
+ @Override
+ public String toString() {
+ return "ByteBufferBlockFactory{" + "buffersOutstanding="
+ + BUFFERS_OUTSTANDING + '}';
+ }
+ }
+
+ /**
+ * A DataBlock which requests a buffer from pool on creation; returns it when
+ * it is closed.
+ */
+ static class ByteBufferBlock extends DataBlock {
+ /**
+ * Buffer size.
+ */
+ private final int bufferSize;
+
+ /**
+ * Block buffer.
+ */
+ private ByteBuffer blockBuffer;
+
+ /**
+ * Cache data size so that it is consistent after the buffer is reset.
+ */
+ private Integer dataSize;
+
+ /**
+ * Input stream for uploading the block.
+ */
+ private ByteBufferInputStream inputStream;
+
+ /**
+ * Instantiate. This will request a ByteBuffer of the desired size.
+ *
+ * @param index block index
+ * @param initBufferSize buffer size
+ */
+ ByteBufferBlock(final long index, final int initBufferSize) {
+ super(index);
+ this.bufferSize = initBufferSize;
+ blockBuffer = ByteBufferBlockFactory.requestBuffer(initBufferSize);
+ }
+
+ /**
+ * Get the amount of data; if there is no buffer then the size is 0.
+ *
+ * @return the amount of data available to upload.
+ */
+ @Override
+ int dataSize() {
+ return dataSize != null ? dataSize : bufferCapacityUsed();
+ }
+
+ @Override
+ InputStream startUpload() throws IOException {
+ super.startUpload();
+ dataSize = bufferCapacityUsed();
+ // set the buffer up for reading from the beginning
+ blockBuffer.limit(blockBuffer.position());
+ blockBuffer.position(0);
+ inputStream = new ByteBufferInputStream(dataSize, blockBuffer);
+ return inputStream;
+ }
+
+ @Override
+ public boolean hasCapacity(final long bytes) {
+ return bytes <= remainingCapacity();
+ }
+
+ @Override
+ public int remainingCapacity() {
+ return blockBuffer != null ? blockBuffer.remaining() : 0;
+ }
+
+ private int bufferCapacityUsed() {
+ return blockBuffer.capacity() - blockBuffer.remaining();
+ }
+
+ @Override
+ int write(final byte[] b, final int offset, final int len)
+ throws IOException {
+ super.write(b, offset, len);
+ int written = Math.min(remainingCapacity(), len);
+ blockBuffer.put(b, offset, written);
+ return written;
+ }
+
+ /**
+ * Closing the block will release the buffer.
+ */
+ @Override
+ protected void innerClose() {
+ if (blockBuffer != null) {
+ ByteBufferBlockFactory.releaseBuffer(blockBuffer);
+ blockBuffer = null;
+ }
+ if (inputStream != null) {
+ inputStream.close();
+ inputStream = null;
+ }
+ }
+
+ @Override
+ public String toString() {
+ return "ByteBufferBlock{"
+ + "index="
+ + getIndex()
+ + ", state="
+ + getState()
+ + ", dataSize="
+ + dataSize()
+ + ", limit="
+ + bufferSize
+ + ", remainingCapacity="
+ + remainingCapacity()
+ + '}';
+ }
+
+ /**
+ * Provide an input stream from a byte buffer; supporting {@link
+ * #mark(int)}, which is required to enable replay of failed PUT attempts.
+ */
+ class ByteBufferInputStream extends InputStream {
+
+ /**
+ * Input stream size.
+ */
+ private final int size;
+
+ /**
+ * Backing byte buffer.
+ */
+ private ByteBuffer byteBuffer;
+
+ ByteBufferInputStream(final int streamSize,
+ final ByteBuffer streamByteBuffer) {
+ LOG.debug("Creating ByteBufferInputStream of size {}",
+ streamSize);
+ this.size = streamSize;
+ this.byteBuffer = streamByteBuffer;
+ }
+
+ /**
+ * After the stream is closed, set the local reference to the byte buffer
+ * to null; this guarantees that future attempts to use stream methods
+ * will fail.
+ */
+ @Override
+ public synchronized void close() {
+ LOG.debug("ByteBufferInputStream.close() for {}",
+ ByteBufferBlock.super.toString());
+ byteBuffer = null;
+ }
+
+ /**
+ * Verify that the stream is open.
+ *
+ * @throws IOException if the stream is closed
+ */
+ private void verifyOpen() throws IOException {
+ if (byteBuffer == null) {
+ throw new IOException(FSExceptionMessages.STREAM_IS_CLOSED);
+ }
+ }
+
+ public synchronized int read() {
+ if (available() > 0) {
+ return byteBuffer.get() & OBSCommonUtils.BYTE_TO_INT_MASK;
+ } else {
+ return -1;
+ }
+ }
+
+ @Override
+ public synchronized long skip(final long offset)
+ throws IOException {
+ verifyOpen();
+ long newPos = position() + offset;
+ if (newPos < 0) {
+ throw new EOFException(FSExceptionMessages.NEGATIVE_SEEK);
+ }
+ if (newPos > size) {
+ throw new EOFException(
+ FSExceptionMessages.CANNOT_SEEK_PAST_EOF);
+ }
+ byteBuffer.position((int) newPos);
+ return newPos;
+ }
+
+ @Override
+ public synchronized int available() {
+ Preconditions.checkState(byteBuffer != null,
+ FSExceptionMessages.STREAM_IS_CLOSED);
+ return byteBuffer.remaining();
+ }
+
+ /**
+ * Get the current buffer position.
+ *
+ * @return the buffer position
+ */
+ public synchronized int position() {
+ return byteBuffer.position();
+ }
+
+ /**
+ * Check if there is data left.
+ *
+ * @return true if there is data remaining in the buffer.
+ */
+ public synchronized boolean hasRemaining() {
+ return byteBuffer.hasRemaining();
+ }
+
+ @Override
+ public synchronized void mark(final int readlimit) {
+ LOG.debug("mark at {}", position());
+ byteBuffer.mark();
+ }
+
+ @Override
+ public synchronized void reset() {
+ LOG.debug("reset");
+ byteBuffer.reset();
+ }
+
+ @Override
+ public boolean markSupported() {
+ return true;
+ }
+
+ /**
+ * Read in data.
+ *
+ * @param b destination buffer
+ * @param offset offset within the buffer
+ * @param length length of bytes to read
+ * @return read size
+ * @throws EOFException if the position is negative
+ * @throws IndexOutOfBoundsException if there isn't space for the amount
+ * of data requested.
+ * @throws IllegalArgumentException other arguments are invalid.
+ */
+ public synchronized int read(final byte[] b, final int offset,
+ final int length)
+ throws IOException {
+ Preconditions.checkArgument(length >= 0, "length is negative");
+ Preconditions.checkArgument(b != null, "Null buffer");
+ if (b.length - offset < length) {
+ throw new IndexOutOfBoundsException(
+ FSExceptionMessages.TOO_MANY_BYTES_FOR_DEST_BUFFER
+ + ": request length ="
+ + length
+ + ", with offset ="
+ + offset
+ + "; buffer capacity ="
+ + (b.length - offset));
+ }
+ verifyOpen();
+ if (!hasRemaining()) {
+ return -1;
+ }
+
+ int toRead = Math.min(length, available());
+ byteBuffer.get(b, offset, toRead);
+ return toRead;
+ }
+
+ @Override
+ public String toString() {
+ final StringBuilder sb = new StringBuilder(
+ "ByteBufferInputStream{");
+ sb.append("size=").append(size);
+ ByteBuffer buf = this.byteBuffer;
+ if (buf != null) {
+ sb.append(", available=").append(buf.remaining());
+ }
+ sb.append(", ").append(ByteBufferBlock.super.toString());
+ sb.append('}');
+ return sb.toString();
+ }
+ }
+ }
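
startUpload() in ByteBufferBlock rewinds the buffer with limit(position()) followed by position(0), which is what ByteBuffer.flip() does; keeping the explicit form lets the block record the used size first. A small JDK-only sketch of the write-then-drain pattern on a direct buffer:

    import java.nio.ByteBuffer;

    public class DirectBufferRoundTrip {
      public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);

        byte[] payload = {1, 2, 3, 4, 5};
        buffer.put(payload);                     // "write" phase

        int used = buffer.position();            // bytes written so far
        buffer.limit(buffer.position());         // equivalent to buffer.flip()
        buffer.position(0);

        byte[] copy = new byte[used];            // "upload" phase: drain it
        buffer.get(copy);
        System.out.println("read back " + copy.length + " bytes");
      }
    }
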
+
+ /**
+ * Buffer blocks to disk.
+ */
+ static class DiskBlockFactory extends BlockFactory {
+ /**
+ * Allocator for the local buffer directories.
+ */
+ private static LocalDirAllocator directoryAllocator;
+
+ DiskBlockFactory(final OBSFileSystem owner) {
+ super(owner);
+ }
+
+ /**
+ * Create a temp file and a {@link DiskBlock} instance to manage it.
+ *
+ * @param index block index
+ * @param limit limit of the block.
+ * @return the new block
+ * @throws IOException IO problems
+ */
+ @Override
+ DataBlock create(final long index, final int limit) throws IOException {
+ File destFile = createTmpFileForWrite(
+ String.format("obs-block-%04d-", index), limit,
+ getOwner().getConf());
+ return new DiskBlock(destFile, limit, index);
+ }
+
+ /**
+ * Demand create the directory allocator, then create a temporary file.
+ * {@link LocalDirAllocator#createTmpFileForWrite(String, long,
+ * Configuration)}.
+ *
+ * @param pathStr prefix for the temporary file
+ * @param size the size of the file that is going to be written
+ * @param conf the Configuration object
+ * @return a unique temporary file
+ * @throws IOException IO problems
+ */
+ static synchronized File createTmpFileForWrite(final String pathStr,
+ final long size, final Configuration conf)
+ throws IOException {
+ if (directoryAllocator == null) {
+ String bufferDir = conf.get(OBSConstants.BUFFER_DIR) != null
+ ? OBSConstants.BUFFER_DIR
+ : "hadoop.tmp.dir";
+ directoryAllocator = new LocalDirAllocator(bufferDir);
+ }
+ return directoryAllocator.createTmpFileForWrite(pathStr, size,
+ conf);
+ }
+ }
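
The disk path buffers writes into a temp file picked by Hadoop's LocalDirAllocator (keyed on the configured buffer directory, falling back to hadoop.tmp.dir) and later hands that file to the uploader. A minimal JDK-only sketch of the same buffering shape, using File.createTempFile as a stand-in for the allocator (prefix and contents are illustrative):

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class DiskBufferSketch {
      public static void main(String[] args) throws IOException {
        // Stand-in for LocalDirAllocator.createTmpFileForWrite()
        File bufferFile = File.createTempFile("obs-block-", ".tmp");

        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream(bufferFile))) {
          out.write("buffered block data".getBytes("UTF-8"));
          out.flush();
        }

        // The real code would now hand bufferFile to the multipart uploader;
        // here we just report its size and clean up.
        System.out.println("buffered " + bufferFile.length() + " bytes");
        if (!bufferFile.delete()) {
          System.err.println("could not delete " + bufferFile.getAbsolutePath());
        }
      }
    }
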
+
+ /**
+ * Stream to a file. This will stop at the limit; the caller is expected to
+ * create a new block.
+ */
+ static class DiskBlock extends DataBlock {
+
+ /**
+ * Buffer file.
+ */
+ private final File bufferFile;
+
+ /**
+ * Buffer size limit.
+ */
+ private final int limit;
+
+ /**
+ * Flag indicating whether the block has been closed.
+ */
+ private final AtomicBoolean closed = new AtomicBoolean(false);
+
+ /**
+ * Written bytes count.
+ */
+ private int bytesWritten;
+
+ /**
+ * Buffered output stream to the buffer file.
+ */
+ private BufferedOutputStream out;
+
+ DiskBlock(final File destBufferFile, final int limitSize,
+ final long index)
+ throws FileNotFoundException {
+ super(index);
+ this.limit = limitSize;
+ this.bufferFile = destBufferFile;
+ out = new BufferedOutputStream(
+ new FileOutputStream(destBufferFile));
+ }
+
+ @Override
+ int dataSize() {
+ return bytesWritten;
+ }
+
+ @Override
+ boolean hasCapacity(final long bytes) {
+ return dataSize() + bytes <= limit;
+ }
+
+ @Override
+ int remainingCapacity() {
+ return limit - bytesWritten;
+ }
+
+ @Override
+ int write(final byte[] b, final int offset, final int len)
+ throws IOException {
+ super.write(b, offset, len);
+ int written = Math.min(remainingCapacity(), len);
+ out.write(b, offset, written);
+ bytesWritten += written;
+ return written;
+ }
+
+ @Override
+ File startUpload() throws IOException {
+ super.startUpload();
+ try {
+ out.flush();
+ } finally {
+ out.close();
+ out = null;
+ }
+ return bufferFile;
+ }
+
+ /**
+ * The close operation will delete the destination file if it still exists.
+ */
+ @Override
+ protected void innerClose() {
+ final DestState state = getState();
+ LOG.debug("Closing {}", this);
+ switch (state) {
+ case Writing:
+ if (bufferFile.exists()) {
+ // file was not uploaded
+ LOG.debug(
+ "Block[{}]: Deleting buffer file as upload "
+ + "did not start",
+ getIndex());
+ closeBlock();
+ }
+ break;
+
+ case Upload:
+ LOG.debug(
+ "Block[{}]: Buffer file {} exists close upload stream",
+ getIndex(), bufferFile);
+ break;
+
+ case Closed:
+ closeBlock();
+ break;
+
+ default:
+ // this state can never be reached, but checkstyle
+ // complains, so it is here.
+ }
+ }
+
+ /**
+ * Flush operation will flush to disk.
+ *
+ * @throws IOException IOE raised on FileOutputStream
+ */
+ @Override
+ void flush() throws IOException {
+ super.flush();
+ out.flush();
+ }
+
+ @Override
+ public String toString() {
+ return "FileBlock{index=" + getIndex() + ", destFile=" + bufferFile
+ + ", state=" + getState() + ", dataSize="
+ + dataSize() + ", limit=" + limit + '}';
+ }
+
+ /**
+ * Close the block. This will delete the block's buffer file if the block
+ * has not previously been closed.
+ */
+ void closeBlock() {
+ LOG.debug("block[{}]: closeBlock()", getIndex());
+ if (!closed.getAndSet(true)) {
+ if (!bufferFile.delete() && bufferFile.exists()) {
+ LOG.warn("delete({}) returned false",
+ bufferFile.getAbsoluteFile());
+ }
+ } else {
+ LOG.debug("block[{}]: skipping re-entrant closeBlock()",
+ getIndex());
+ }
+ }
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileStatus.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileStatus.java
new file mode 100644
index 0000000000000..448115554f84c
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileStatus.java
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+
+/**
+ * File status for an OBS file.
+ *
+ * <p>The subclass is private as it should not be created directly.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+class OBSFileStatus extends FileStatus {
+ /**
+ * Create a directory status.
+ *
+ * @param path the path
+ * @param owner the owner
+ */
+ OBSFileStatus(final Path path, final String owner) {
+ super(0, true, 1, 0, 0, path);
+ setOwner(owner);
+ setGroup(owner);
+ }
+
+ /**
+ * Create a directory status.
+ *
+ * @param modificationTime modification time
+ * @param path the path
+ * @param owner the owner
+ */
+ OBSFileStatus(final Path path, final long modificationTime,
+ final String owner) {
+ super(0, true, 1, 0, modificationTime, path);
+ setOwner(owner);
+ setGroup(owner);
+ }
+
+ /**
+ * Create a directory status.
+ *
+ * @param modificationTime modification time
+ * @param accessTime access time
+ * @param path the path
+ * @param owner the owner
+ */
+ OBSFileStatus(final Path path, final long modificationTime,
+ final long accessTime,
+ final String owner) {
+ super(0, true, 1, 0, modificationTime, accessTime, null, owner, owner,
+ path);
+ }
+
+ /**
+ * A simple file.
+ *
+ * @param length file length
+ * @param modificationTime mod time
+ * @param path path
+ * @param blockSize block size
+ * @param owner owner
+ */
+ OBSFileStatus(
+ final long length, final long modificationTime, final Path path,
+ final long blockSize,
+ final String owner) {
+ super(length, false, 1, blockSize, modificationTime, path);
+ setOwner(owner);
+ setGroup(owner);
+ }
+}
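
Since OBSFileStatus is package-private, client code only ever sees these entries through the public FileStatus API. A short sketch of the fields populated by the constructors above (the path, length and times are made-up values for illustration):

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;

    public class FileStatusFields {
      public static void main(String[] args) {
        // length, isdir, replication, blocksize, modification time, path
        FileStatus status = new FileStatus(
            1024L, false, 1, 128 * 1024 * 1024L,
            System.currentTimeMillis(), new Path("obs://bucket/dir/file"));

        System.out.println("path        = " + status.getPath());
        System.out.println("isDirectory = " + status.isDirectory());
        System.out.println("length      = " + status.getLen());
        System.out.println("blockSize   = " + status.getBlockSize());
        System.out.println("modTime     = " + status.getModificationTime());
      }
    }
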
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileSystem.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileSystem.java
new file mode 100644
index 0000000000000..042466bd60365
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFileSystem.java
@@ -0,0 +1,1562 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.classification.VisibleForTesting;
+import com.obs.services.ObsClient;
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.AccessControlList;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.ContentSummary;
+import org.apache.hadoop.fs.CreateFlag;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileAlreadyExistsException;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Options.ChecksumOpt;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.hadoop.util.Progressable;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.SemaphoredDelegatingExecutor;
+import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.net.URI;
+import java.util.EnumSet;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+/**
+ * The core OBS Filesystem implementation.
+ *
+ * <p>This subclass is marked as private as code should not be creating it
+ * directly; use {@link FileSystem#get(Configuration)} and variants to create
+ * one.
+ *
+ * <p>If cast to {@code OBSFileSystem}, extra methods and features may be
+ * accessed. Consider those private and unstable.
+ *
+ * <p>Because it prints some of the state of the instrumentation, the output of
+ * {@link #toString()} must also be considered unstable.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+public final class OBSFileSystem extends FileSystem {
+ /**
+ * Class logger.
+ */
+ public static final Logger LOG = LoggerFactory.getLogger(
+ OBSFileSystem.class);
+
+ /**
+ * Flag indicating if the filesystem instance is closed.
+ */
+ private final AtomicBoolean closed = new AtomicBoolean(false);
+
+ /**
+ * URI of the filesystem.
+ */
+ private URI uri;
+
+ /**
+ * Current working directory of the filesystem.
+ */
+ private Path workingDir;
+
+ /**
+ * Short name of the user who instantiated the filesystem.
+ */
+ private String username;
+
+ /**
+ * OBS client instance.
+ */
+ private ObsClient obs;
+
+ /**
+ * Flag indicating if posix bucket is used.
+ */
+ private boolean enablePosix = false;
+
+ /**
+ * Flag indicating if multi-object delete recursion is enabled.
+ */
+ private boolean enableMultiObjectDeleteRecursion = true;
+
+ /**
+ * Flag indicating if OBS specific content summary is enabled.
+ */
+ private boolean obsContentSummaryEnable = true;
+
+ /**
+ * Flag indicating if OBS client specific depth first search (DFS) list is
+ * enabled.
+ */
+ private boolean obsClientDFSListEnable = true;
+
+ /**
+ * Bucket name.
+ */
+ private String bucket;
+
+ /**
+ * Max number of keys to get while paging through a directory listing.
+ */
+ private int maxKeys;
+
+ /**
+ * OBSListing instance.
+ */
+ private OBSListing obsListing;
+
+ /**
+ * Helper for an ongoing write operation.
+ */
+ private OBSWriteOperationHelper writeHelper;
+
+ /**
+ * Part size for multipart upload.
+ */
+ private long partSize;
+
+ /**
+ * Flag indicating if multi-object delete is enabled.
+ */
+ private boolean enableMultiObjectDelete;
+
+ /**
+ * Minimum number of objects in one multi-object delete call.
+ */
+ private int multiDeleteThreshold;
+
+ /**
+ * Maximum number of entries in one multi-object delete call.
+ */
+ private int maxEntriesToDelete;
+
+ /**
+ * Bounded thread pool for multipart upload.
+ */
+ private ExecutorService boundedMultipartUploadThreadPool;
+
+ /**
+ * Bounded thread pool for copy.
+ */
+ private ThreadPoolExecutor boundedCopyThreadPool;
+
+ /**
+ * Bounded thread pool for delete.
+ */
+ private ThreadPoolExecutor boundedDeleteThreadPool;
+
+ /**
+ * Bounded thread pool for copy part.
+ */
+ private ThreadPoolExecutor boundedCopyPartThreadPool;
+
+ /**
+ * Bounded thread pool for list.
+ */
+ private ThreadPoolExecutor boundedListThreadPool;
+
+ /**
+ * List parallel factor.
+ */
+ private int listParallelFactor;
+
+ /**
+ * Read ahead range.
+ */
+ private long readAheadRange;
+
+ /**
+ * Flag indicating if {@link OBSInputStream#read(long, byte[], int, int)} will
+ * be transformed into {@link org.apache.hadoop.fs.FSInputStream#read(long,
+ * byte[], int, int)}.
+ */
+ private boolean readTransformEnable = true;
+
+ /**
+ * Factory for creating blocks.
+ */
+ private OBSDataBlocks.BlockFactory blockFactory;
+
+ /**
+ * Maximum number of active blocks a single output stream can submit to {@link
+ * #boundedMultipartUploadThreadPool}.
+ */
+ private int blockOutputActiveBlocks;
+
+ /**
+ * Copy part size.
+ */
+ private long copyPartSize;
+
+ /**
+ * Flag indicating if fast delete is enabled.
+ */
+ private boolean enableTrash = false;
+
+ /**
+ * Trash directory for fast delete.
+ */
+ private String trashDir;
+
+ /**
+ * OBS redefined access control list.
+ */
+ private AccessControlList cannedACL;
+
+ /**
+ * Server-side encryption wrapper.
+ */
+ private SseWrapper sse;
+
+ /**
+ * Block size for {@link FileSystem#getDefaultBlockSize()}.
+ */
+ private long blockSize;
+
+ /**
+ * Initialize a FileSystem. Called after a new FileSystem instance is
+ * constructed.
+ *
+ * @param name a URI whose authority section names the host, port,
+ * etc. for this FileSystem
+ * @param originalConf the configuration to use for the FS. The
+ * bucket-specific options are patched over the base ones
+ * before any use is made of the config.
+ */
+ @Override
+ public void initialize(final URI name, final Configuration originalConf)
+ throws IOException {
+ uri = URI.create(name.getScheme() + "://" + name.getAuthority());
+ bucket = name.getAuthority();
+ // clone the configuration into one with propagated bucket options
+ Configuration conf = OBSCommonUtils.propagateBucketOptions(originalConf,
+ bucket);
+ OBSCommonUtils.patchSecurityCredentialProviders(conf);
+ super.initialize(name, conf);
+ setConf(conf);
+ try {
+
+ // Username is the current user at the time the FS was instantiated.
+ username = UserGroupInformation.getCurrentUser().getShortUserName();
+ workingDir = new Path("/user", username).makeQualified(this.uri,
+ this.getWorkingDirectory());
+
+ Class<? extends OBSClientFactory> obsClientFactoryClass =
+ conf.getClass(
+ OBSConstants.OBS_CLIENT_FACTORY_IMPL,
+ OBSConstants.DEFAULT_OBS_CLIENT_FACTORY_IMPL,
+ OBSClientFactory.class);
+ obs = ReflectionUtils.newInstance(obsClientFactoryClass, conf)
+ .createObsClient(name);
+ sse = new SseWrapper(conf);
+
+ OBSCommonUtils.verifyBucketExists(this);
+ enablePosix = OBSCommonUtils.getBucketFsStatus(obs, bucket);
+
+ maxKeys = OBSCommonUtils.intOption(conf,
+ OBSConstants.MAX_PAGING_KEYS,
+ OBSConstants.DEFAULT_MAX_PAGING_KEYS, 1);
+ obsListing = new OBSListing(this);
+ partSize = OBSCommonUtils.getMultipartSizeProperty(conf,
+ OBSConstants.MULTIPART_SIZE,
+ OBSConstants.DEFAULT_MULTIPART_SIZE);
+
+ // validate and record the default block size
+ blockSize = OBSCommonUtils.longBytesOption(conf,
+ OBSConstants.FS_OBS_BLOCK_SIZE,
+ OBSConstants.DEFAULT_FS_OBS_BLOCK_SIZE, 1);
+ enableMultiObjectDelete = conf.getBoolean(
+ OBSConstants.ENABLE_MULTI_DELETE, true);
+ maxEntriesToDelete = conf.getInt(
+ OBSConstants.MULTI_DELETE_MAX_NUMBER,
+ OBSConstants.DEFAULT_MULTI_DELETE_MAX_NUMBER);
+ enableMultiObjectDeleteRecursion = conf.getBoolean(
+ OBSConstants.MULTI_DELETE_RECURSION, true);
+ obsContentSummaryEnable = conf.getBoolean(
+ OBSConstants.OBS_CONTENT_SUMMARY_ENABLE, true);
+ readAheadRange = OBSCommonUtils.longBytesOption(conf,
+ OBSConstants.READAHEAD_RANGE,
+ OBSConstants.DEFAULT_READAHEAD_RANGE, 0);
+ readTransformEnable = conf.getBoolean(
+ OBSConstants.READ_TRANSFORM_ENABLE, true);
+ multiDeleteThreshold = conf.getInt(
+ OBSConstants.MULTI_DELETE_THRESHOLD,
+ OBSConstants.MULTI_DELETE_DEFAULT_THRESHOLD);
+
+ initThreadPools(conf);
+
+ writeHelper = new OBSWriteOperationHelper(this);
+
+ initCannedAcls(conf);
+
+ OBSCommonUtils.initMultipartUploads(this, conf);
+
+ String blockOutputBuffer = conf.getTrimmed(
+ OBSConstants.FAST_UPLOAD_BUFFER,
+ OBSConstants.FAST_UPLOAD_BUFFER_DISK);
+ partSize = OBSCommonUtils.ensureOutputParameterInRange(
+ OBSConstants.MULTIPART_SIZE, partSize);
+ blockFactory = OBSDataBlocks.createFactory(this, blockOutputBuffer);
+ blockOutputActiveBlocks =
+ OBSCommonUtils.intOption(conf,
+ OBSConstants.FAST_UPLOAD_ACTIVE_BLOCKS,
+ OBSConstants.DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS, 1);
+ LOG.debug(
+ "Using OBSBlockOutputStream with buffer = {}; block={};"
+ + " queue limit={}",
+ blockOutputBuffer,
+ partSize,
+ blockOutputActiveBlocks);
+
+ enableTrash = conf.getBoolean(OBSConstants.TRASH_ENABLE,
+ OBSConstants.DEFAULT_TRASH);
+ if (enableTrash) {
+ if (!isFsBucket()) {
+ String errorMsg = String.format(
+ "The bucket [%s] is not a posix bucket; trash is "
+ + "not supported.", bucket);
+ LOG.warn(errorMsg);
+ enableTrash = false;
+ trashDir = null;
+ } else {
+ trashDir = conf.get(OBSConstants.TRASH_DIR);
+ if (StringUtils.isEmpty(trashDir)) {
+ String errorMsg =
+ String.format(
+ "The trash feature(fs.obs.trash.enable) is "
+ + "enabled, but the "
+ + "configuration(fs.obs.trash.dir [%s]) "
+ + "is empty.",
+ trashDir);
+ LOG.error(errorMsg);
+ throw new ObsException(errorMsg);
+ }
+ trashDir = OBSCommonUtils.maybeAddBeginningSlash(trashDir);
+ trashDir = OBSCommonUtils.maybeAddTrailingSlash(trashDir);
+ }
+ }
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("initializing ",
+ new Path(name), e);
+ }
+ }
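
As the class javadoc says, callers should not construct OBSFileSystem directly; FileSystem.get() with an obs:// URI resolves the scheme and invokes initialize() on their behalf. A hedged sketch of that path; the fs.obs.* property names below are assumptions modelled on the fs.obs. prefix used elsewhere in this patch, not verified configuration keys:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ObsQuickStart {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed property names, for illustration only.
        conf.set("fs.obs.access.key", "<access key>");
        conf.set("fs.obs.secret.key", "<secret key>");
        conf.set("fs.obs.endpoint", "<endpoint>");
        // If the connector is not registered via ServiceLoader, the scheme may
        // also need: conf.set("fs.obs.impl",
        //     "org.apache.hadoop.fs.obs.OBSFileSystem");

        // FileSystem.get() resolves the "obs" scheme and calls initialize().
        FileSystem fs = FileSystem.get(URI.create("obs://example-bucket/"), conf);

        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
          out.write("hello obs".getBytes("UTF-8"));
        }
        System.out.println("exists: " + fs.exists(file));
        fs.close();
      }
    }
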
+
+ private void initThreadPools(final Configuration conf) {
+ long keepAliveTime = OBSCommonUtils.longOption(conf,
+ OBSConstants.KEEPALIVE_TIME,
+ OBSConstants.DEFAULT_KEEPALIVE_TIME, 0);
+
+ int maxThreads = conf.getInt(OBSConstants.MAX_THREADS,
+ OBSConstants.DEFAULT_MAX_THREADS);
+ if (maxThreads < 2) {
+ LOG.warn(OBSConstants.MAX_THREADS
+ + " must be at least 2: forcing to 2.");
+ maxThreads = 2;
+ }
+ int totalTasks = OBSCommonUtils.intOption(conf,
+ OBSConstants.MAX_TOTAL_TASKS,
+ OBSConstants.DEFAULT_MAX_TOTAL_TASKS, 1);
+ boundedMultipartUploadThreadPool =
+ BlockingThreadPoolExecutorService.newInstance(
+ maxThreads,
+ maxThreads + totalTasks,
+ keepAliveTime,
+ TimeUnit.SECONDS,
+ "obs-transfer-shared");
+
+ int maxDeleteThreads = conf.getInt(OBSConstants.MAX_DELETE_THREADS,
+ OBSConstants.DEFAULT_MAX_DELETE_THREADS);
+ if (maxDeleteThreads < 2) {
+ LOG.warn(OBSConstants.MAX_DELETE_THREADS
+ + " must be at least 2: forcing to 2.");
+ maxDeleteThreads = 2;
+ }
+ int coreDeleteThreads = (int) Math.ceil(maxDeleteThreads / 2.0);
+ boundedDeleteThreadPool =
+ new ThreadPoolExecutor(
+ coreDeleteThreads,
+ maxDeleteThreads,
+ keepAliveTime,
+ TimeUnit.SECONDS,
+ new LinkedBlockingQueue<>(),
+ BlockingThreadPoolExecutorService.newDaemonThreadFactory(
+ "obs-delete-transfer-shared"));
+ boundedDeleteThreadPool.allowCoreThreadTimeOut(true);
+
+ if (enablePosix) {
+ obsClientDFSListEnable = conf.getBoolean(
+ OBSConstants.OBS_CLIENT_DFS_LIST_ENABLE, true);
+ if (obsClientDFSListEnable) {
+ int coreListThreads = conf.getInt(
+ OBSConstants.CORE_LIST_THREADS,
+ OBSConstants.DEFAULT_CORE_LIST_THREADS);
+ int maxListThreads = conf.getInt(OBSConstants.MAX_LIST_THREADS,
+ OBSConstants.DEFAULT_MAX_LIST_THREADS);
+ int listWorkQueueCapacity = conf.getInt(
+ OBSConstants.LIST_WORK_QUEUE_CAPACITY,
+ OBSConstants.DEFAULT_LIST_WORK_QUEUE_CAPACITY);
+ listParallelFactor = conf.getInt(
+ OBSConstants.LIST_PARALLEL_FACTOR,
+ OBSConstants.DEFAULT_LIST_PARALLEL_FACTOR);
+ if (listParallelFactor < 1) {
+ LOG.warn(OBSConstants.LIST_PARALLEL_FACTOR
+ + " must be at least 1: forcing to 1.");
+ listParallelFactor = 1;
+ }
+ boundedListThreadPool =
+ new ThreadPoolExecutor(
+ coreListThreads,
+ maxListThreads,
+ keepAliveTime,
+ TimeUnit.SECONDS,
+ new LinkedBlockingQueue<>(listWorkQueueCapacity),
+ BlockingThreadPoolExecutorService
+ .newDaemonThreadFactory(
+ "obs-list-transfer-shared"));
+ boundedListThreadPool.allowCoreThreadTimeOut(true);
+ }
+ } else {
+ int maxCopyThreads = conf.getInt(OBSConstants.MAX_COPY_THREADS,
+ OBSConstants.DEFAULT_MAX_COPY_THREADS);
+ if (maxCopyThreads < 2) {
+ LOG.warn(OBSConstants.MAX_COPY_THREADS
+ + " must be at least 2: forcing to 2.");
+ maxCopyThreads = 2;
+ }
+ int coreCopyThreads = (int) Math.ceil(maxCopyThreads / 2.0);
+ boundedCopyThreadPool =
+ new ThreadPoolExecutor(
+ coreCopyThreads,
+ maxCopyThreads,
+ keepAliveTime,
+ TimeUnit.SECONDS,
+ new LinkedBlockingQueue<>(),
+ BlockingThreadPoolExecutorService.newDaemonThreadFactory(
+ "obs-copy-transfer-shared"));
+ boundedCopyThreadPool.allowCoreThreadTimeOut(true);
+
+ copyPartSize = OBSCommonUtils.longOption(conf,
+ OBSConstants.COPY_PART_SIZE,
+ OBSConstants.DEFAULT_COPY_PART_SIZE, 0);
+ if (copyPartSize > OBSConstants.MAX_COPY_PART_SIZE) {
+ LOG.warn(
+ "obs: {} capped to ~5GB (maximum allowed part size with "
+ + "current output mechanism)",
+ OBSConstants.COPY_PART_SIZE);
+ copyPartSize = OBSConstants.MAX_COPY_PART_SIZE;
+ }
+
+ int maxCopyPartThreads = conf.getInt(
+ OBSConstants.MAX_COPY_PART_THREADS,
+ OBSConstants.DEFAULT_MAX_COPY_PART_THREADS);
+ if (maxCopyPartThreads < 2) {
+ LOG.warn(OBSConstants.MAX_COPY_PART_THREADS
+ + " must be at least 2: forcing to 2.");
+ maxCopyPartThreads = 2;
+ }
+ int coreCopyPartThreads = (int) Math.ceil(maxCopyPartThreads / 2.0);
+ boundedCopyPartThreadPool =
+ new ThreadPoolExecutor(
+ coreCopyPartThreads,
+ maxCopyPartThreads,
+ keepAliveTime,
+ TimeUnit.SECONDS,
+ new LinkedBlockingQueue<>(),
+ BlockingThreadPoolExecutorService.newDaemonThreadFactory(
+ "obs-copy-part-transfer-shared"));
+ boundedCopyPartThreadPool.allowCoreThreadTimeOut(true);
+ }
+ }
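
initThreadPools() repeats one pattern for the copy/delete/list pools: core threads = ceil(max / 2), a LinkedBlockingQueue, daemon worker threads, and allowCoreThreadTimeOut(true) so an idle pool shrinks to nothing. A JDK-only sketch of that pattern with arbitrary sizes:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadFactory;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    public class BoundedPoolSketch {
      public static void main(String[] args) {
        int maxThreads = 20;
        int coreThreads = (int) Math.ceil(maxThreads / 2.0);
        long keepAliveSeconds = 60;

        AtomicInteger counter = new AtomicInteger();
        ThreadFactory daemonFactory = runnable -> {
          Thread t = new Thread(runnable,
              "obs-demo-pool-" + counter.incrementAndGet());
          t.setDaemon(true);
          return t;
        };

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            coreThreads, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(), daemonFactory);
        pool.allowCoreThreadTimeOut(true);   // let idle core threads exit

        pool.submit(() -> System.out.println("running on "
            + Thread.currentThread().getName()));
        pool.shutdown();
      }
    }
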
+
+ /**
+ * Is posix bucket or not.
+ *
+ * @return is it posix bucket
+ */
+ boolean isFsBucket() {
+ return enablePosix;
+ }
+
+ /**
+ * Get read transform switch stat.
+ *
+ * @return is read transform enabled
+ */
+ boolean isReadTransformEnabled() {
+ return readTransformEnable;
+ }
+
+ /**
+ * Initialize bucket acl for upload, write operation.
+ *
+ * @param conf the configuration to use for the FS.
+ */
+ private void initCannedAcls(final Configuration conf) {
+ // No canned acl in obs
+ String cannedACLName = conf.get(OBSConstants.CANNED_ACL,
+ OBSConstants.DEFAULT_CANNED_ACL);
+ if (!cannedACLName.isEmpty()) {
+ switch (cannedACLName) {
+ case "Private":
+ case "PublicRead":
+ case "PublicReadWrite":
+ case "AuthenticatedRead":
+ case "LogDeliveryWrite":
+ case "BucketOwnerRead":
+ case "BucketOwnerFullControl":
+ cannedACL = new AccessControlList();
+ break;
+ default:
+ cannedACL = null;
+ }
+ } else {
+ cannedACL = null;
+ }
+ }
+
+ /**
+ * Get the bucket acl of user setting.
+ *
+ * @return bucket acl {@link AccessControlList}
+ */
+ AccessControlList getCannedACL() {
+ return cannedACL;
+ }
+
+ /**
+ * Return the protocol scheme for the FileSystem.
+ *
+ * @return "obs"
+ */
+ @Override
+ public String getScheme() {
+ return "obs";
+ }
+
+ /**
+ * Return a URI whose scheme and authority identify this FileSystem.
+ *
+ * @return the URI of this filesystem.
+ */
+ @Override
+ public URI getUri() {
+ return uri;
+ }
+
+ /**
+ * Return the default port for this FileSystem.
+ *
+ * @return -1 to indicate the port is undefined, which agrees with the
+ * contract of {@link URI#getPort()}
+ */
+ @Override
+ public int getDefaultPort() {
+ return OBSConstants.OBS_DEFAULT_PORT;
+ }
+
+ /**
+ * Return the OBS client used by this filesystem.
+ *
+ * @return OBS client
+ */
+ @VisibleForTesting
+ ObsClient getObsClient() {
+ return obs;
+ }
+
+ /**
+ * Return the read ahead range used by this filesystem.
+ *
+ * @return read ahead range
+ */
+ @VisibleForTesting
+ long getReadAheadRange() {
+ return readAheadRange;
+ }
+
+ /**
+ * Return the bucket of this filesystem.
+ *
+ * @return the bucket
+ */
+ String getBucket() {
+ return bucket;
+ }
+
+ /**
+ * Check that a Path belongs to this FileSystem. Unlike the superclass, this
+ * version does not look at authority, but only hostname.
+ *
+ * @param path the path to check
+ * @throws IllegalArgumentException if there is an FS mismatch
+ */
+ @Override
+ public void checkPath(final Path path) {
+ OBSLoginHelper.checkPath(getConf(), getUri(), path, getDefaultPort());
+ }
+
+ /**
+ * Canonicalize the given URI.
+ *
+ * @param rawUri the URI to be canonicalized
+ * @return the canonicalized URI
+ */
+ @Override
+ protected URI canonicalizeUri(final URI rawUri) {
+ return OBSLoginHelper.canonicalizeUri(rawUri, getDefaultPort());
+ }
+
+ /**
+ * Open an FSDataInputStream at the indicated Path.
+ *
+ * @param f the file path to open
+ * @param bufferSize the size of the buffer to be used
+ * @return the FSDataInputStream for the file
+ * @throws IOException on any failure to open the file
+ */
+ @Override
+ public FSDataInputStream open(final Path f, final int bufferSize)
+ throws IOException {
+ LOG.debug("Opening '{}' for reading.", f);
+ final FileStatus fileStatus = getFileStatus(f);
+ if (fileStatus.isDirectory()) {
+ throw new FileNotFoundException(
+ "Can't open " + f + " because it is a directory");
+ }
+
+ return new FSDataInputStream(
+ new OBSInputStream(bucket, OBSCommonUtils.pathToKey(this, f),
+ fileStatus.getLen(),
+ obs, statistics, readAheadRange, this));
+ }
+
+ /**
+ * Create an FSDataOutputStream at the indicated Path with write-progress
+ * reporting.
+ *
+ * @param f the file path to create
+ * @param permission the permission to set
+ * @param overwrite if a file with this name already exists, then if true,
+ * the file will be overwritten, and if false an error will
+ * be thrown
+ * @param bufferSize the size of the buffer to be used
+ * @param replication required block replication for the file
+ * @param blkSize the requested block size
+ * @param progress the progress reporter
+ * @throws IOException on any failure to create the file
+ * @see #setPermission(Path, FsPermission)
+ */
+ @Override
+ public FSDataOutputStream create(
+ final Path f,
+ final FsPermission permission,
+ final boolean overwrite,
+ final int bufferSize,
+ final short replication,
+ final long blkSize,
+ final Progressable progress)
+ throws IOException {
+ String key = OBSCommonUtils.pathToKey(this, f);
+ FileStatus status;
+ long objectLen = 0;
+ try {
+ // get the status or throw an exception
+ status = getFileStatus(f);
+ objectLen = status.getLen();
+ // if the thread reaches here, there is something at the path
+ if (status.isDirectory()) {
+ // path references a directory: automatic error
+ throw new FileAlreadyExistsException(f + " is a directory");
+ }
+ if (!overwrite) {
+ // path references a file and overwrite is disabled
+ throw new FileAlreadyExistsException(f + " already exists");
+ }
+ LOG.debug("create: Overwriting file {}", f);
+ } catch (FileNotFoundException e) {
+ // this means the file is not found
+ LOG.debug("create: Creating new file {}", f);
+ }
+ return new FSDataOutputStream(
+ new OBSBlockOutputStream(
+ this,
+ key,
+ objectLen,
+ new SemaphoredDelegatingExecutor(
+ boundedMultipartUploadThreadPool,
+ blockOutputActiveBlocks, true),
+ false),
+ null);
+ }
+
+ /**
+ * Return the part size for multipart upload used by {@link
+ * OBSBlockOutputStream}.
+ *
+ * @return the part size
+ */
+ long getPartSize() {
+ return partSize;
+ }
+
+ /**
+ * Return the block factory used by {@link OBSBlockOutputStream}.
+ *
+ * @return the block factory
+ */
+ OBSDataBlocks.BlockFactory getBlockFactory() {
+ return blockFactory;
+ }
+
+ /**
+ * Return the write helper used by {@link OBSBlockOutputStream}.
+ *
+ * @return the write helper
+ */
+ OBSWriteOperationHelper getWriteHelper() {
+ return writeHelper;
+ }
+
+ /**
+ * Create an FSDataOutputStream at the indicated Path with write-progress
+ * reporting.
+ *
+ * @param f the file name to create
+ * @param permission file permission to set
+ * @param flags {@link CreateFlag}s to use for this stream
+ * @param bufferSize the size of the buffer to be used
+ * @param replication required block replication for the file
+ * @param blkSize block size
+ * @param progress progress
+ * @param checksumOpt checksum option
+ * @throws IOException io exception
+ */
+ @Override
+ @SuppressWarnings("checkstyle:parameternumber")
+ public FSDataOutputStream create(
+ final Path f,
+ final FsPermission permission,
+ final EnumSet<CreateFlag> flags,
+ final int bufferSize,
+ final short replication,
+ final long blkSize,
+ final Progressable progress,
+ final ChecksumOpt checksumOpt)
+ throws IOException {
+ LOG.debug("create: Creating new file {}, flags:{}, isFsBucket:{}", f,
+ flags, isFsBucket());
+ if (null != flags && flags.contains(CreateFlag.APPEND)) {
+ if (!isFsBucket()) {
+ throw new UnsupportedOperationException(
+ "non-posix bucket. Append is not supported by "
+ + "OBSFileSystem");
+ }
+ String key = OBSCommonUtils.pathToKey(this, f);
+ FileStatus status;
+ long objectLen = 0;
+ try {
+ // get the status or throw an FNFE
+ status = getFileStatus(f);
+ objectLen = status.getLen();
+ // if the thread reaches here, there is something at the path
+ if (status.isDirectory()) {
+ // path references a directory: automatic error
+ throw new FileAlreadyExistsException(f + " is a directory");
+ }
+ } catch (FileNotFoundException e) {
+ LOG.debug("FileNotFoundException, create: Creating new file {}",
+ f);
+ }
+
+ return new FSDataOutputStream(
+ new OBSBlockOutputStream(
+ this,
+ key,
+ objectLen,
+ new SemaphoredDelegatingExecutor(
+ boundedMultipartUploadThreadPool,
+ blockOutputActiveBlocks, true),
+ true),
+ null);
+ } else {
+ return create(
+ f,
+ permission,
+ flags == null || flags.contains(CreateFlag.OVERWRITE),
+ bufferSize,
+ replication,
+ blkSize,
+ progress);
+ }
+ }
+
+ /**
+ * Open an FSDataOutputStream at the indicated Path with write-progress
+ * reporting. Same as create(), except fails if parent directory doesn't
+ * already exist.
+ *
+ * @param path the file path to create
+ * @param permission file permission
+ * @param flags {@link CreateFlag}s to use for this stream
+ * @param bufferSize the size of the buffer to be used
+ * @param replication required block replication for the file
+ * @param blkSize block size
+ * @param progress the progress reporter
+ * @throws IOException IO failure
+ */
+ @Override
+ public FSDataOutputStream createNonRecursive(
+ final Path path,
+ final FsPermission permission,
+ final EnumSet<CreateFlag> flags,
+ final int bufferSize,
+ final short replication,
+ final long blkSize,
+ final Progressable progress)
+ throws IOException {
+ Path parent = path.getParent();
+ if (parent != null && !getFileStatus(parent).isDirectory()) {
+ // expect this to raise an exception if there is no parent
+ throw new FileAlreadyExistsException("Not a directory: " + parent);
+ }
+ return create(
+ path,
+ permission,
+ flags.contains(CreateFlag.OVERWRITE),
+ bufferSize,
+ replication,
+ blkSize,
+ progress);
+ }
+
+ /**
+ * Append to an existing file (optional operation).
+ *
+ * @param f the existing file to be appended
+ * @param bufferSize the size of the buffer to be used
+ * @param progress for reporting progress if it is not null
+ * @throws IOException indicating that append is not supported
+ */
+ @Override
+ public FSDataOutputStream append(final Path f, final int bufferSize,
+ final Progressable progress)
+ throws IOException {
+ if (!isFsBucket()) {
+ throw new UnsupportedOperationException(
+ "non-posix bucket. Append is not supported "
+ + "by OBSFileSystem");
+ }
+ LOG.debug("append: Append file {}.", f);
+ String key = OBSCommonUtils.pathToKey(this, f);
+
+ // get the status or throw an FNFE
+ FileStatus status = getFileStatus(f);
+ long objectLen = status.getLen();
+ // if the thread reaches here, there is something at the path
+ if (status.isDirectory()) {
+ // path references a directory: automatic error
+ throw new FileAlreadyExistsException(f + " is a directory");
+ }
+
+ return new FSDataOutputStream(
+ new OBSBlockOutputStream(
+ this,
+ key,
+ objectLen,
+ new SemaphoredDelegatingExecutor(
+ boundedMultipartUploadThreadPool,
+ blockOutputActiveBlocks, true),
+ true),
+ null);
+ }
+
+ /**
+ * Check if a path exists.
+ *
+ * @param f source path
+ * @return true if the path exists
+ * @throws IOException IO failure
+ */
+ @Override
+ public boolean exists(final Path f) throws IOException {
+ try {
+ return getFileStatus(f) != null;
+ } catch (FileNotFoundException | FileConflictException e) {
+ return false;
+ }
+ }
+
+ /**
+ * Rename Path src to Path dst.
+ *
+ * @param src path to be renamed
+ * @param dst new path after rename
+ * @return true if rename is successful
+ * @throws IOException on IO failure
+ */
+ @Override
+ public boolean rename(final Path src, final Path dst) throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ LOG.debug("Rename path {} to {} start", src, dst);
+ try {
+ if (enablePosix) {
+ return OBSPosixBucketUtils.renameBasedOnPosix(this, src, dst);
+ } else {
+ return OBSObjectBucketUtils.renameBasedOnObject(this, src, dst);
+ }
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(
+ "rename(" + src + ", " + dst + ")", src, e);
+ } catch (RenameFailedException e) {
+ LOG.error(e.getMessage());
+ return e.getExitCode();
+ } catch (FileNotFoundException e) {
+ LOG.error(e.toString());
+ return false;
+ } finally {
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "Rename path {} to {} finished, thread:{}, "
+ + "timeUsedInMilliSec:{}.", src, dst, threadId,
+ endTime - startTime);
+ }
+ }
+
+ /**
+ * Return maximum number of entries in one multi-object delete call.
+ *
+ * @return the maximum number of entries in one multi-object delete call
+ */
+ int getMaxEntriesToDelete() {
+ return maxEntriesToDelete;
+ }
+
+ /**
+ * Return list parallel factor.
+ *
+ * @return the list parallel factor
+ */
+ int getListParallelFactor() {
+ return listParallelFactor;
+ }
+
+ /**
+ * Return bounded thread pool for list.
+ *
+ * @return bounded thread pool for list
+ */
+ ThreadPoolExecutor getBoundedListThreadPool() {
+ return boundedListThreadPool;
+ }
+
+ /**
+ * Return a flag that indicates if OBS client specific depth first search
+ * (DFS) list is enabled.
+ *
+ * @return the flag
+ */
+ boolean isObsClientDFSListEnable() {
+ return obsClientDFSListEnable;
+ }
+
+ /**
+ * Return the {@link Statistics} instance used by this filesystem.
+ *
+ * @return the used {@link Statistics} instance
+ */
+ Statistics getSchemeStatistics() {
+ return statistics;
+ }
+
+ /**
+ * Return the minimum number of objects in one multi-object delete call.
+ *
+ * @return the minimum number of objects in one multi-object delete call
+ */
+ int getMultiDeleteThreshold() {
+ return multiDeleteThreshold;
+ }
+
+ /**
+ * Return a flag that indicates if multi-object delete is enabled.
+ *
+ * @return the flag
+ */
+ boolean isEnableMultiObjectDelete() {
+ return enableMultiObjectDelete;
+ }
+
+ /**
+ * Delete a Path. This operation is at least {@code O(files)}, with added
+ * overheads to enumerate the path. It is also not atomic.
+ *
+ * @param f the path to delete
+ * @param recursive if path is a directory and set to true, the directory is
+ * deleted else throws an exception. In case of a file the
+ * recursive can be set to either true or false
+ * @return true if delete is successful else false
+ * @throws IOException due to inability to delete a directory or file
+ */
+ @Override
+ public boolean delete(final Path f, final boolean recursive)
+ throws IOException {
+ try {
+ FileStatus status = getFileStatus(f);
+ LOG.debug("delete: path {} - recursive {}", status.getPath(),
+ recursive);
+
+ if (enablePosix) {
+ return OBSPosixBucketUtils.fsDelete(this, status, recursive);
+ }
+
+ return OBSObjectBucketUtils.objectDelete(this, status, recursive);
+ } catch (FileNotFoundException e) {
+ LOG.warn("Couldn't delete {} - does not exist", f);
+ return false;
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("delete", f, e);
+ }
+ }
+
+ /**
+ * Return a flag that indicates if fast delete is enabled.
+ *
+ * @return the flag
+ */
+ boolean isEnableTrash() {
+ return enableTrash;
+ }
+
+ /**
+ * Return trash directory for fast delete.
+ *
+ * @return the trash directory
+ */
+ String getTrashDir() {
+ return trashDir;
+ }
+
+ /**
+ * Return a flag that indicates if multi-object delete recursion is enabled.
+ *
+ * @return the flag
+ */
+ boolean isEnableMultiObjectDeleteRecursion() {
+ return enableMultiObjectDeleteRecursion;
+ }
+
+ /**
+ * List the statuses of the files/directories in the given path if the path is
+ * a directory.
+ *
+ * @param f given path
+ * @return the statuses of the files/directories in the given path
+ * @throws FileNotFoundException when the path does not exist
+ * @throws IOException see specific implementation
+ */
+ @Override
+ public FileStatus[] listStatus(final Path f)
+ throws FileNotFoundException, IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ try {
+ FileStatus[] statuses = OBSCommonUtils.innerListStatus(this, f,
+ false);
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "List status for path:{}, thread:{}, timeUsedInMilliSec:{}", f,
+ threadId, endTime - startTime);
+ return statuses;
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("listStatus", f, e);
+ }
+ }
+
+ /**
+ * This public interface is provided specifically for Huawei MRS. List the
+ * statuses of the files/directories in the given path if the path is a
+ * directory. When recursive is true, iterate over all objects in the given
+ * path and its subdirectories.
+ *
+ * @param f given path
+ * @param recursive whether to iterate over objects in subdirectories
+ * @return the statuses of the files/directories in the given path
+ * @throws FileNotFoundException when the path does not exist
+ * @throws IOException see specific implementation
+ */
+ public FileStatus[] listStatus(final Path f, final boolean recursive)
+ throws FileNotFoundException, IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ try {
+ FileStatus[] statuses = OBSCommonUtils.innerListStatus(this, f,
+ recursive);
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "List status for path:{}, thread:{}, timeUsedInMilliSec:{}", f,
+ threadId, endTime - startTime);
+ return statuses;
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(
+ "listStatus with recursive flag["
+ + (recursive ? "true] " : "false] "), f, e);
+ }
+ }
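
From the caller's side the listing is just the standard FileSystem API; a short sketch (bucket, directory and credentials are placeholders, with the same assumed fs.obs.* setup as the earlier sketch):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListDirectory {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("obs://example-bucket/"), conf);

        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
          System.out.printf("%s%s %d bytes%n",
              status.getPath(),
              status.isDirectory() ? "/" : "",
              status.getLen());
        }
        fs.close();
      }
    }
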
+
+ /**
+ * Return the OBSListing instance used by this filesystem.
+ *
+ * @return the OBSListing instance
+ */
+ OBSListing getObsListing() {
+ return obsListing;
+ }
+
+ /**
+ * Return the current working directory for the given file system.
+ *
+ * @return the directory pathname
+ */
+ @Override
+ public Path getWorkingDirectory() {
+ return workingDir;
+ }
+
+ /**
+ * Set the current working directory for the file system. All relative paths
+ * will be resolved relative to it.
+ *
+ * @param newDir the new working directory
+ */
+ @Override
+ public void setWorkingDirectory(final Path newDir) {
+ workingDir = newDir;
+ }
+
+ /**
+ * Return the username of the filesystem.
+ *
+ * @return the short name of the user who instantiated the filesystem
+ */
+ String getUsername() {
+ return username;
+ }
+
+ /**
+ * Make the given path and all non-existent parents into directories. Has the
+ * semantics of Unix {@code 'mkdir -p'}. Existence of the directory hierarchy
+ * is not an error.
+ *
+ * @param path path to create
+ * @param permission to apply to f
+ * @return true if a directory was created
+ * @throws FileAlreadyExistsException there is a file at the path specified
+ * @throws IOException other IO problems
+ */
+ @Override
+ public boolean mkdirs(final Path path, final FsPermission permission)
+ throws IOException, FileAlreadyExistsException {
+ try {
+ return OBSCommonUtils.innerMkdirs(this, path);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("mkdirs", path, e);
+ }
+ }
+
+ /**
+ * Return a file status object that represents the path.
+ *
+ * @param f the path we want information from
+ * @return a FileStatus object
+ * @throws FileNotFoundException when the path does not exist
+ * @throws IOException on other problems
+ */
+ @Override
+ public FileStatus getFileStatus(final Path f)
+ throws FileNotFoundException, IOException {
+ for (int retryTime = 1;
+ retryTime < OBSCommonUtils.MAX_RETRY_TIME; retryTime++) {
+ try {
+ return innerGetFileStatus(f);
+ } catch (FileNotFoundException | FileConflictException e) {
+ throw e;
+ } catch (IOException e) {
+ LOG.warn("Failed to get file status for [{}], retry time [{}], "
+ + "exception [{}]", f, retryTime, e);
+
+ try {
+ Thread.sleep(OBSCommonUtils.DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ return innerGetFileStatus(f);
+ }
+
+ /**
+ * Inner implementation without retry for {@link #getFileStatus(Path)}.
+ *
+ * @param f the path we want information from
+ * @return a FileStatus object
+ * @throws IOException on IO failure
+ */
+ @VisibleForTesting
+ OBSFileStatus innerGetFileStatus(final Path f) throws IOException {
+ if (enablePosix) {
+ return OBSPosixBucketUtils.innerFsGetObjectStatus(this, f);
+ }
+
+ return OBSObjectBucketUtils.innerGetObjectStatus(this, f);
+ }
+
+ /**
+ * Return the {@link ContentSummary} of a given {@link Path}.
+ *
+ * @param f path to use
+ * @return the {@link ContentSummary}
+ * @throws FileNotFoundException if the path does not resolve
+ * @throws IOException IO failure
+ */
+ @Override
+ public ContentSummary getContentSummary(final Path f)
+ throws FileNotFoundException, IOException {
+ if (!obsContentSummaryEnable) {
+ return super.getContentSummary(f);
+ }
+
+ FileStatus status = getFileStatus(f);
+ if (status.isFile()) {
+ // f is a file
+ long length = status.getLen();
+ return new ContentSummary.Builder().length(length)
+ .fileCount(1).directoryCount(0).spaceConsumed(length).build();
+ }
+
+ // f is a directory
+ if (enablePosix) {
+ return OBSPosixBucketUtils.fsGetDirectoryContentSummary(this,
+ OBSCommonUtils.pathToKey(this, f));
+ } else {
+ return OBSObjectBucketUtils.getDirectoryContentSummary(this,
+ OBSCommonUtils.pathToKey(this, f));
+ }
+ }
+
+ /**
+ * Copy the {@code src} file on the local disk to the filesystem at the given
+ * {@code dst} name.
+ *
+ * @param delSrc whether to delete the src
+ * @param overwrite whether to overwrite an existing file
+ * @param src path
+ * @param dst path
+ * @throws FileAlreadyExistsException if the destination file exists and
+ * overwrite == false
+ * @throws IOException IO problem
+ */
+ @Override
+ public void copyFromLocalFile(final boolean delSrc, final boolean overwrite,
+ final Path src, final Path dst) throws FileAlreadyExistsException,
+ IOException {
+ try {
+ super.copyFromLocalFile(delSrc, overwrite, src, dst);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(
+ "copyFromLocalFile(" + src + ", " + dst + ")", src, e);
+ }
+ }
+
+ /**
+ * Close the filesystem. This shuts down all transfers.
+ *
+ * @throws IOException IO problem
+ */
+ @Override
+ public void close() throws IOException {
+ LOG.debug("This Filesystem closed by user, clear resource.");
+ if (closed.getAndSet(true)) {
+ // already closed
+ return;
+ }
+
+ try {
+ super.close();
+ } finally {
+ OBSCommonUtils.shutdownAll(
+ boundedMultipartUploadThreadPool,
+ boundedCopyThreadPool,
+ boundedDeleteThreadPool,
+ boundedCopyPartThreadPool,
+ boundedListThreadPool);
+ }
+ }
+
+ /**
+ * Override {@code getCanonicalServiceName} and return {@code null} since
+ * delegation tokens are not supported.
+ */
+ @Override
+ public String getCanonicalServiceName() {
+ // Does not support Token
+ return null;
+ }
+
+ /**
+ * Return copy part size.
+ *
+ * @return copy part size
+ */
+ long getCopyPartSize() {
+ return copyPartSize;
+ }
+
+ /**
+ * Return bounded thread pool for copy part.
+ *
+ * @return the bounded thread pool for copy part
+ */
+ ThreadPoolExecutor getBoundedCopyPartThreadPool() {
+ return boundedCopyPartThreadPool;
+ }
+
+ /**
+ * Return bounded thread pool for copy.
+ *
+ * @return the bounded thread pool for copy
+ */
+ ThreadPoolExecutor getBoundedCopyThreadPool() {
+ return boundedCopyThreadPool;
+ }
+
+ /**
+ * For compatibility, imitate HDFS and return the number of bytes into which
+ * large input files should optimally be split to minimize I/O time.
+ *
+ * @deprecated use {@link #getDefaultBlockSize(Path)} instead
+ */
+ @Override
+ public long getDefaultBlockSize() {
+ return blockSize;
+ }
+
+ /**
+ * Imitate HDFS and return the number of bytes into which large input files
+ * should optimally be split to minimize I/O time. The given path will be used to
+ * locate the actual filesystem. The full path does not have to exist.
+ *
+ * @param f path of file
+ * @return the default block size for the path's filesystem
+ */
+ @Override
+ public long getDefaultBlockSize(final Path f) {
+ return blockSize;
+ }
+
+ /**
+ * Return a string that describes this filesystem instance.
+ *
+ * @return the string
+ */
+ @Override
+ public String toString() {
+ final StringBuilder sb = new StringBuilder("OBSFileSystem{");
+ sb.append("uri=").append(uri);
+ sb.append(", workingDir=").append(workingDir);
+ sb.append(", partSize=").append(partSize);
+ sb.append(", enableMultiObjectsDelete=")
+ .append(enableMultiObjectDelete);
+ sb.append(", maxKeys=").append(maxKeys);
+ if (cannedACL != null) {
+ sb.append(", cannedACL=").append(cannedACL.toString());
+ }
+ sb.append(", readAheadRange=").append(readAheadRange);
+ sb.append(", blockSize=").append(getDefaultBlockSize());
+ if (blockFactory != null) {
+ sb.append(", blockFactory=").append(blockFactory);
+ }
+ sb.append(", boundedMultipartUploadThreadPool=")
+ .append(boundedMultipartUploadThreadPool);
+ sb.append(", statistics {").append(statistics).append("}");
+ sb.append(", metrics {").append("}");
+ sb.append('}');
+ return sb.toString();
+ }
+
+ /**
+ * Return the maximum number of keys to get while paging through a directory
+ * listing.
+ *
+ * @return the maximum number of keys
+ */
+ int getMaxKeys() {
+ return maxKeys;
+ }
+
+ /**
+ * List the statuses and block locations of the files in the given path. Does
+ * not guarantee to return the iterator that traverses statuses of the files
+ * in a sorted order.
+ *
+ *
+ * If the path is a directory,
+ * if recursive is false, returns files in the directory;
+ * if recursive is true, return files in the subtree rooted at the path.
+ * If the path is a file, return the file's status and block locations.
+ *
+ *
+ * @param f a path
+ * @param recursive if the subdirectories need to be traversed recursively
+ * @return an iterator that traverses statuses of the files/directories in the
+ * given path
+ * @throws FileNotFoundException if {@code path} does not exist
+ * @throws IOException if any I/O error occurred
+ */
+ @Override
+ public RemoteIterator<LocatedFileStatus> listFiles(final Path f,
+ final boolean recursive)
+ throws FileNotFoundException, IOException {
+ Path path = OBSCommonUtils.qualify(this, f);
+ LOG.debug("listFiles({}, {})", path, recursive);
+ try {
+ // lookup dir triggers existence check
+ final FileStatus fileStatus = getFileStatus(path);
+ if (fileStatus.isFile()) {
+ // simple case: File
+ LOG.debug("Path is a file");
+ return new OBSListing
+ .SingleStatusRemoteIterator(
+ OBSCommonUtils.toLocatedFileStatus(this, fileStatus));
+ } else {
+ LOG.debug(
+ "listFiles: doing listFiles of directory {} - recursive {}",
+ path, recursive);
+ // directory: do a bulk operation
+ String key = OBSCommonUtils.maybeAddTrailingSlash(
+ OBSCommonUtils.pathToKey(this, path));
+ String delimiter = recursive ? null : "/";
+ LOG.debug("Requesting all entries under {} with delimiter '{}'",
+ key, delimiter);
+ return obsListing.createLocatedFileStatusIterator(
+ obsListing.createFileStatusListingIterator(
+ path,
+ OBSCommonUtils.createListObjectsRequest(this, key,
+ delimiter),
+ OBSListing.ACCEPT_ALL,
+ new OBSListing.AcceptFilesOnly(path)));
+ }
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("listFiles", path, e);
+ }
+ }
+
+ /**
+ * List the statuses of the files/directories in the given path if the path is
+ * a directory. Return the file's status and block locations if the path is a
+ * file.
+ *
+ * If a returned status is a file, it contains the file's block locations.
+ *
+ * @param f is the path
+ * @return an iterator that traverses statuses of the files/directories in the
+ * given path
+ * @throws FileNotFoundException If f does not exist
+ * @throws IOException If an I/O error occurred
+ */
+ @Override
+ public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f)
+ throws FileNotFoundException, IOException {
+ return listLocatedStatus(f,
+ OBSListing.ACCEPT_ALL);
+ }
+
+ /**
+ * List a directory. The returned results include its block location if it is
+ * a file. The results are filtered by the given path filter.
+ *
+ * @param f a path
+ * @param filter a path filter
+ * @return an iterator that traverses statuses of the files/directories in the
+ * given path
+ * @throws FileNotFoundException if f does not exist
+ * @throws IOException if any I/O error occurred
+ */
+ @Override
+ public RemoteIterator<LocatedFileStatus> listLocatedStatus(final Path f,
+ final PathFilter filter)
+ throws FileNotFoundException, IOException {
+ Path path = OBSCommonUtils.qualify(this, f);
+ LOG.debug("listLocatedStatus({}, {}", path, filter);
+ try {
+ // lookup dir triggers existence check
+ final FileStatus fileStatus = getFileStatus(path);
+ if (fileStatus.isFile()) {
+ // simple case: File
+ LOG.debug("Path is a file");
+ return new OBSListing.SingleStatusRemoteIterator(
+ filter.accept(path) ? OBSCommonUtils.toLocatedFileStatus(
+ this, fileStatus) : null);
+ } else {
+ // directory: trigger a lookup
+ String key = OBSCommonUtils.maybeAddTrailingSlash(
+ OBSCommonUtils.pathToKey(this, path));
+ return obsListing.createLocatedFileStatusIterator(
+ obsListing.createFileStatusListingIterator(
+ path,
+ OBSCommonUtils.createListObjectsRequest(this, key, "/"),
+ filter,
+ new OBSListing.AcceptAllButSelfAndS3nDirs(path)));
+ }
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("listLocatedStatus", path,
+ e);
+ }
+ }
+
+ /**
+ * Return server-side encryption wrapper used by this filesystem instance.
+ *
+ * @return the server-side encryption wrapper
+ */
+ SseWrapper getSse() {
+ return sse;
+ }
+}
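Reviewer note (not part of the patch): the methods above are all reached through the standard Hadoop FileSystem API, so a short usage sketch may help when reading the listing code. The obs:// URI, bucket name and configuration values below are placeholders, and the property names are assumed from the connector's documentation conventions rather than taken from this diff.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class ObsListingExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder endpoint and credentials; consult the hadoop-huaweicloud
        // documentation for the authoritative property names.
        conf.set("fs.obs.endpoint", "obs.example-region.example.com");
        conf.set("fs.obs.access.key", "<access-key>");
        conf.set("fs.obs.secret.key", "<secret-key>");

        try (FileSystem fs = FileSystem.get(URI.create("obs://example-bucket/"), conf)) {
          // Shallow listing: exercises OBSFileSystem#listStatus(Path).
          for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath() + " " + status.getLen());
          }
          // Recursive listing: exercises listFiles(Path, true), which returns a
          // RemoteIterator<LocatedFileStatus> backed by paged object listings.
          RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/data"), true);
          while (it.hasNext()) {
            System.out.println(it.next().getPath());
          }
        }
      }
    }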
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFsDFSListing.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFsDFSListing.java
new file mode 100644
index 0000000000000..bbf29df14f32c
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSFsDFSListing.java
@@ -0,0 +1,744 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.model.ListObjectsRequest;
+import com.obs.services.model.ObjectListing;
+import com.obs.services.model.ObjectMetadata;
+import com.obs.services.model.ObsObject;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InterruptedIOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.ListIterator;
+import java.util.Queue;
+import java.util.Stack;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Future;
+
+/**
+ * OBS depth first search listing implementation for posix bucket.
+ */
+class OBSFsDFSListing extends ObjectListing {
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSFsDFSListing.class);
+
+ static void increaseLevelStats(final List<LevelStats> levelStatsList,
+ final int level,
+ final boolean isDir) {
+ int currMaxLevel = levelStatsList.size() - 1;
+ if (currMaxLevel < level) {
+ for (int i = 0; i < level - currMaxLevel; i++) {
+ levelStatsList.add(new LevelStats(currMaxLevel + 1 + i));
+ }
+ }
+
+ if (isDir) {
+ levelStatsList.get(level).increaseDirNum();
+ } else {
+ levelStatsList.get(level).increaseFileNum();
+ }
+ }
+
+ static String fsDFSListNextBatch(final OBSFileSystem owner,
+ final Stack<ListEntity> listStack,
+ final Queue<ListEntity> resultQueue,
+ final String marker,
+ final int maxKeyNum,
+ final List<ObsObject> objectSummaries,
+ final List<LevelStats> levelStatsList) throws IOException {
+ // 0. check if marker matches with the peek of result queue when marker
+ // is given
+ if (marker != null) {
+ if (resultQueue.isEmpty()) {
+ throw new IllegalArgumentException(
+ "result queue is empty, but marker is not empty: "
+ + marker);
+ } else if (resultQueue.peek().getType()
+ == ListEntityType.LIST_TAIL) {
+ throw new RuntimeException(
+ "cannot put list tail (" + resultQueue.peek()
+ + ") into result queue");
+ } else if (!marker.equals(
+ resultQueue.peek().getType() == ListEntityType.COMMON_PREFIX
+ ? resultQueue.peek().getCommonPrefix()
+ : resultQueue.peek().getObjectSummary().getObjectKey())) {
+ throw new IllegalArgumentException("marker (" + marker
+ + ") does not match with result queue peek ("
+ + resultQueue.peek() + ")");
+ }
+ }
+
+ // 1. fetch some list results from local result queue
+ int resultNum = fetchListResultLocally(owner.getBucket(), resultQueue,
+ maxKeyNum, objectSummaries,
+ levelStatsList);
+
+ // 2. fetch more list results by doing one-level lists in parallel
+ fetchListResultRemotely(owner, listStack, resultQueue, maxKeyNum,
+ objectSummaries, levelStatsList, resultNum);
+
+ // 3. check if list operation ends
+ if (!listStack.empty() && resultQueue.isEmpty()) {
+ throw new RuntimeException(
+ "result queue is empty, but list stack is not empty: "
+ + listStack);
+ }
+
+ String nextMarker = null;
+ if (!resultQueue.isEmpty()) {
+ if (resultQueue.peek().getType() == ListEntityType.LIST_TAIL) {
+ throw new RuntimeException(
+ "cannot put list tail (" + resultQueue.peek()
+ + ") into result queue");
+ } else {
+ nextMarker =
+ resultQueue.peek().getType() == ListEntityType.COMMON_PREFIX
+ ? resultQueue
+ .peek().getCommonPrefix()
+ : resultQueue.peek().getObjectSummary().getObjectKey();
+ }
+ }
+ return nextMarker;
+ }
+
+ static void fetchListResultRemotely(final OBSFileSystem owner,
+ final Stack<ListEntity> listStack,
+ final Queue<ListEntity> resultQueue, final int maxKeyNum,
+ final List<ObsObject> objectSummaries,
+ final List<LevelStats> levelStatsList,
+ final int resultNum) throws IOException {
+ int newResultNum = resultNum;
+ while (!listStack.empty() && (newResultNum < maxKeyNum
+ || resultQueue.isEmpty())) {
+ List<ListObjectsRequest> oneLevelListRequests = new ArrayList<>();
+ List<Future<ObjectListing>> oneLevelListFutures = new ArrayList<>();
+ List<Integer> levels = new ArrayList<>();
+ List<ObjectListing> oneLevelObjectListings = new ArrayList<>();
+ // a. submit some one-level list tasks in parallel
+ submitOneLevelListTasks(owner, listStack, maxKeyNum,
+ oneLevelListRequests, oneLevelListFutures, levels);
+
+ // b. wait these tasks to complete
+ waitForOneLevelListTasksFinished(oneLevelListRequests,
+ oneLevelListFutures, oneLevelObjectListings);
+
+ // c. put each subdir/file into the result commonPrefixes and
+ // objectSummaries; if the number of results reaches maxKeyNum,
+ // cache it into resultQueue for the next list batch. Note: unlike
+ // standard DFS, we put subdirs directly into the result list to
+ // avoid caching them using more space
+ newResultNum = handleOneLevelListTaskResult(resultQueue, maxKeyNum,
+ objectSummaries, levelStatsList, newResultNum,
+ oneLevelListRequests, levels, oneLevelObjectListings);
+
+ // d. push subdirs and list continuing tail/end into list stack in
+ // reversed order, so that we can pop them from the stack in order
+ // later
+ addNewListStackEntities(listStack, oneLevelListRequests, levels,
+ oneLevelObjectListings);
+ }
+ }
+
+ @SuppressWarnings("checkstyle:parameternumber")
+ static int handleOneLevelListTaskResult(final Queue<ListEntity> resultQueue,
+ final int maxKeyNum,
+ final List<ObsObject> objectSummaries,
+ final List<LevelStats> levelStatsList,
+ final int resultNum,
+ final List<ListObjectsRequest> oneLevelListRequests,
+ final List<Integer> levels,
+ final List<ObjectListing> oneLevelObjectListings) {
+ int newResultNum = resultNum;
+ for (int i = 0; i < oneLevelObjectListings.size(); i++) {
+ LOG.debug(
+ "one level listing with prefix=" + oneLevelListRequests.get(i)
+ .getPrefix()
+ + ", marker=" + (
+ oneLevelListRequests.get(i).getMarker() != null
+ ? oneLevelListRequests.get(i)
+ .getMarker()
+ : ""));
+
+ ObjectListing oneLevelObjectListing = oneLevelObjectListings.get(i);
+ LOG.debug("# of CommonPrefixes/Objects: {}/{}",
+ oneLevelObjectListing.getCommonPrefixes().size(),
+ oneLevelObjectListing.getObjects().size());
+
+ if (oneLevelObjectListing.getCommonPrefixes().isEmpty()
+ && oneLevelObjectListing.getObjects().isEmpty()) {
+ continue;
+ }
+
+ for (String commonPrefix
+ : oneLevelObjectListing.getCommonPrefixes()) {
+ if (commonPrefix.equals(
+ oneLevelListRequests.get(i).getPrefix())) {
+ // skip prefix itself
+ continue;
+ }
+
+ LOG.debug("common prefix: " + commonPrefix);
+ if (newResultNum < maxKeyNum) {
+ addCommonPrefixIntoObjectList(
+ oneLevelListRequests.get(i).getBucketName(),
+ objectSummaries,
+ commonPrefix);
+ increaseLevelStats(levelStatsList, levels.get(i), true);
+ newResultNum++;
+ } else {
+ resultQueue.add(
+ new ListEntity(commonPrefix, levels.get(i)));
+ }
+ }
+
+ for (ObsObject obj : oneLevelObjectListing.getObjects()) {
+ if (obj.getObjectKey()
+ .equals(oneLevelListRequests.get(i).getPrefix())) {
+ // skip prefix itself
+ continue;
+ }
+
+ LOG.debug("object: {}, size: {}", obj.getObjectKey(),
+ obj.getMetadata().getContentLength());
+ if (newResultNum < maxKeyNum) {
+ objectSummaries.add(obj);
+ increaseLevelStats(levelStatsList, levels.get(i),
+ obj.getObjectKey().endsWith("/"));
+ newResultNum++;
+ } else {
+ resultQueue.add(new ListEntity(obj, levels.get(i)));
+ }
+ }
+ }
+ return newResultNum;
+ }
+
+ static void waitForOneLevelListTasksFinished(
+ final List<ListObjectsRequest> oneLevelListRequests,
+ final List<Future<ObjectListing>> oneLevelListFutures,
+ final List<ObjectListing> oneLevelObjectListings)
+ throws IOException {
+ for (int i = 0; i < oneLevelListFutures.size(); i++) {
+ try {
+ oneLevelObjectListings.add(oneLevelListFutures.get(i).get());
+ } catch (InterruptedException e) {
+ LOG.warn("Interrupted while listing using DFS, prefix="
+ + oneLevelListRequests.get(i).getPrefix() + ", marker="
+ + (oneLevelListRequests.get(i).getMarker() != null
+ ? oneLevelListRequests.get(i).getMarker()
+ : ""));
+ throw new InterruptedIOException(
+ "Interrupted while listing using DFS, prefix="
+ + oneLevelListRequests.get(i).getPrefix() + ", marker="
+ + (oneLevelListRequests.get(i).getMarker() != null
+ ? oneLevelListRequests.get(i).getMarker()
+ : ""));
+ } catch (ExecutionException e) {
+ LOG.error("Exception while listing using DFS, prefix="
+ + oneLevelListRequests.get(i).getPrefix() + ", marker="
+ + (oneLevelListRequests.get(i).getMarker() != null
+ ? oneLevelListRequests.get(i).getMarker()
+ : ""),
+ e);
+ for (Future<ObjectListing> future : oneLevelListFutures) {
+ future.cancel(true);
+ }
+
+ throw OBSCommonUtils.extractException(
+ "Listing using DFS with exception, marker="
+ + (oneLevelListRequests.get(i).getMarker() != null
+ ? oneLevelListRequests.get(i).getMarker()
+ : ""),
+ oneLevelListRequests.get(i).getPrefix(), e);
+ }
+ }
+ }
+
+ static void submitOneLevelListTasks(final OBSFileSystem owner,
+ final Stack<ListEntity> listStack, final int maxKeyNum,
+ final List<ListObjectsRequest> oneLevelListRequests,
+ final List<Future<ObjectListing>> oneLevelListFutures,
+ final List<Integer> levels) {
+ for (int i = 0;
+ i < owner.getListParallelFactor() && !listStack.empty(); i++) {
+ ListEntity listEntity = listStack.pop();
+ if (listEntity.getType() == ListEntityType.LIST_TAIL) {
+ if (listEntity.getNextMarker() != null) {
+ ListObjectsRequest oneLevelListRequest
+ = new ListObjectsRequest();
+ oneLevelListRequest.setBucketName(owner.getBucket());
+ oneLevelListRequest.setPrefix(listEntity.getPrefix());
+ oneLevelListRequest.setMarker(listEntity.getNextMarker());
+ oneLevelListRequest.setMaxKeys(
+ Math.min(maxKeyNum, owner.getMaxKeys()));
+ oneLevelListRequest.setDelimiter("/");
+ oneLevelListRequests.add(oneLevelListRequest);
+ oneLevelListFutures.add(owner.getBoundedListThreadPool()
+ .submit(() -> OBSCommonUtils.commonContinueListObjects(
+ owner, oneLevelListRequest)));
+ levels.add(listEntity.getLevel());
+ }
+
+ // avoid adding list tasks in different levels later
+ break;
+ } else {
+ String oneLevelListPrefix =
+ listEntity.getType() == ListEntityType.COMMON_PREFIX
+ ? listEntity.getCommonPrefix()
+ : listEntity.getObjectSummary().getObjectKey();
+ ListObjectsRequest oneLevelListRequest = OBSCommonUtils
+ .createListObjectsRequest(owner, oneLevelListPrefix, "/",
+ maxKeyNum);
+ oneLevelListRequests.add(oneLevelListRequest);
+ oneLevelListFutures.add(owner.getBoundedListThreadPool()
+ .submit(() -> OBSCommonUtils.commonListObjects(owner,
+ oneLevelListRequest)));
+ levels.add(listEntity.getLevel() + 1);
+ }
+ }
+ }
+
+ static void addNewListStackEntities(final Stack<ListEntity> listStack,
+ final List<ListObjectsRequest> oneLevelListRequests,
+ final List<Integer> levels,
+ final List<ObjectListing> oneLevelObjectListings) {
+ for (int i = oneLevelObjectListings.size() - 1; i >= 0; i--) {
+ ObjectListing oneLevelObjectListing = oneLevelObjectListings.get(i);
+
+ if (oneLevelObjectListing.getCommonPrefixes().isEmpty()
+ && oneLevelObjectListing.getObjects()
+ .isEmpty()) {
+ continue;
+ }
+
+ listStack.push(new ListEntity(oneLevelObjectListing.getPrefix(),
+ oneLevelObjectListing.isTruncated()
+ ? oneLevelObjectListing.getNextMarker()
+ : null,
+ levels.get(i)));
+
+ ListIterator<String> commonPrefixListIterator
+ = oneLevelObjectListing.getCommonPrefixes()
+ .listIterator(oneLevelObjectListing.getCommonPrefixes().size());
+ while (commonPrefixListIterator.hasPrevious()) {
+ String commonPrefix = commonPrefixListIterator.previous();
+
+ if (commonPrefix.equals(
+ oneLevelListRequests.get(i).getPrefix())) {
+ // skip prefix itself
+ continue;
+ }
+
+ listStack.push(new ListEntity(commonPrefix, levels.get(i)));
+ }
+
+ ListIterator<ObsObject> objectSummaryListIterator
+ = oneLevelObjectListing.getObjects()
+ .listIterator(oneLevelObjectListing.getObjects().size());
+ while (objectSummaryListIterator.hasPrevious()) {
+ ObsObject objectSummary = objectSummaryListIterator.previous();
+
+ if (objectSummary.getObjectKey()
+ .equals(oneLevelListRequests.get(i).getPrefix())) {
+ // skip prefix itself
+ continue;
+ }
+
+ if (objectSummary.getObjectKey().endsWith("/")) {
+ listStack.push(
+ new ListEntity(objectSummary, levels.get(i)));
+ }
+ }
+ }
+ }
+
+ static int fetchListResultLocally(final String bucketName,
+ final Queue<ListEntity> resultQueue, final int maxKeyNum,
+ final List<ObsObject> objectSummaries,
+ final List<LevelStats> levelStatsList) {
+ int resultNum = 0;
+ while (!resultQueue.isEmpty() && resultNum < maxKeyNum) {
+ ListEntity listEntity = resultQueue.poll();
+ if (listEntity.getType() == ListEntityType.LIST_TAIL) {
+ throw new RuntimeException("cannot put list tail (" + listEntity
+ + ") into result queue");
+ } else if (listEntity.getType() == ListEntityType.COMMON_PREFIX) {
+ addCommonPrefixIntoObjectList(bucketName, objectSummaries,
+ listEntity.getCommonPrefix());
+ increaseLevelStats(levelStatsList, listEntity.getLevel(), true);
+ resultNum++;
+ } else {
+ objectSummaries.add(listEntity.getObjectSummary());
+ increaseLevelStats(levelStatsList, listEntity.getLevel(),
+ listEntity.getObjectSummary().getObjectKey().endsWith("/"));
+ resultNum++;
+ }
+ }
+ return resultNum;
+ }
+
+ static void addCommonPrefixIntoObjectList(final String bucketName,
+ final List<ObsObject> objectSummaries,
+ final String commonPrefix) {
+ ObsObject objectSummary = new ObsObject();
+ ObjectMetadata objectMetadata = new ObjectMetadata();
+ objectMetadata.setContentLength(0L);
+ objectSummary.setBucketName(bucketName);
+ objectSummary.setObjectKey(commonPrefix);
+ objectSummary.setMetadata(objectMetadata);
+ objectSummaries.add(objectSummary);
+ }
+
+ static OBSFsDFSListing fsDFSListObjects(final OBSFileSystem owner,
+ final ListObjectsRequest request) throws IOException {
+ List<ObsObject> objectSummaries = new ArrayList<>();
+ List<String> commonPrefixes = new ArrayList<>();
+ String bucketName = owner.getBucket();
+ String prefix = request.getPrefix();
+ int maxKeyNum = request.getMaxKeys();
+ if (request.getDelimiter() != null) {
+ throw new IllegalArgumentException(
+ "illegal delimiter: " + request.getDelimiter());
+ }
+ if (request.getMarker() != null) {
+ throw new IllegalArgumentException(
+ "illegal marker: " + request.getMarker());
+ }
+
+ Stack<ListEntity> listStack = new Stack<>();
+ Queue<ListEntity> resultQueue = new LinkedList<>();
+ List<LevelStats> levelStatsList = new ArrayList<>();
+
+ listStack.push(new ListEntity(prefix, 0));
+ increaseLevelStats(levelStatsList, 0, true);
+
+ String nextMarker = fsDFSListNextBatch(owner, listStack, resultQueue,
+ null, maxKeyNum, objectSummaries,
+ levelStatsList);
+
+ if (nextMarker == null) {
+ StringBuilder levelStatsStringBuilder = new StringBuilder();
+ levelStatsStringBuilder.append("bucketName=").append(bucketName)
+ .append(", prefix=").append(prefix).append(": ");
+ for (LevelStats levelStats : levelStatsList) {
+ levelStatsStringBuilder.append("level=")
+ .append(levelStats.getLevel())
+ .append(", dirNum=")
+ .append(levelStats.getDirNum())
+ .append(", fileNum=")
+ .append(levelStats.getFileNum())
+ .append("; ");
+ }
+ LOG.debug("[list level statistics info] "
+ + levelStatsStringBuilder.toString());
+ }
+
+ return new OBSFsDFSListing(request,
+ objectSummaries,
+ commonPrefixes,
+ nextMarker,
+ listStack,
+ resultQueue,
+ levelStatsList);
+ }
+
+ static OBSFsDFSListing fsDFSContinueListObjects(final OBSFileSystem owner,
+ final OBSFsDFSListing obsFsDFSListing)
+ throws IOException {
+ List<ObsObject> objectSummaries = new ArrayList<>();
+ List<String> commonPrefixes = new ArrayList<>();
+ String bucketName = owner.getBucket();
+ String prefix = obsFsDFSListing.getPrefix();
+ String marker = obsFsDFSListing.getNextMarker();
+ int maxKeyNum = obsFsDFSListing.getMaxKeys();
+ if (obsFsDFSListing.getDelimiter() != null) {
+ throw new IllegalArgumentException(
+ "illegal delimiter: " + obsFsDFSListing.getDelimiter());
+ }
+
+ Stack<ListEntity> listStack = obsFsDFSListing.getListStack();
+ Queue<ListEntity> resultQueue = obsFsDFSListing.getResultQueue();
+ List<LevelStats> levelStatsList = obsFsDFSListing.getLevelStatsList();
+
+ String nextMarker = fsDFSListNextBatch(owner, listStack, resultQueue,
+ marker, maxKeyNum, objectSummaries,
+ levelStatsList);
+
+ if (nextMarker == null) {
+ StringBuilder levelStatsStringBuilder = new StringBuilder();
+ levelStatsStringBuilder.append("bucketName=").append(bucketName)
+ .append(", prefix=").append(prefix).append(": ");
+ for (LevelStats levelStats : levelStatsList) {
+ levelStatsStringBuilder.append("level=")
+ .append(levelStats.getLevel())
+ .append(", dirNum=")
+ .append(levelStats.getDirNum())
+ .append(", fileNum=")
+ .append(levelStats.getFileNum())
+ .append("; ");
+ }
+ LOG.debug("[list level statistics info] "
+ + levelStatsStringBuilder.toString());
+ }
+
+ return new OBSFsDFSListing(obsFsDFSListing,
+ objectSummaries,
+ commonPrefixes,
+ nextMarker,
+ listStack,
+ resultQueue,
+ levelStatsList);
+ }
+
+ /**
+ * List entity type definition.
+ */
+ enum ListEntityType {
+ /**
+ * Common prefix.
+ */
+ COMMON_PREFIX,
+ /**
+ * Object summary.
+ */
+ OBJECT_SUMMARY,
+ /**
+ * List tail.
+ */
+ LIST_TAIL
+ }
+
+ /**
+ * List entity for OBS depth first search listing.
+ */
+ static class ListEntity {
+ /**
+ * List entity type.
+ */
+ private ListEntityType type;
+
+ /**
+ * Entity level.
+ */
+ private final int level;
+
+ /**
+ * For COMMON_PREFIX.
+ */
+ private String commonPrefix = null;
+
+ /**
+ * For OBJECT_SUMMARY.
+ */
+ private ObsObject objectSummary = null;
+
+ /**
+ * For LIST_TAIL.
+ */
+ private String prefix = null;
+
+ /**
+ * Next marker.
+ */
+ private String nextMarker = null;
+
+ ListEntity(final String comPrefix, final int entityLevel) {
+ this.type = ListEntityType.COMMON_PREFIX;
+ this.commonPrefix = comPrefix;
+ this.level = entityLevel;
+ }
+
+ ListEntity(final ObsObject summary, final int entityLevel) {
+ this.type = ListEntityType.OBJECT_SUMMARY;
+ this.objectSummary = summary;
+ this.level = entityLevel;
+ }
+
+ ListEntity(final String pf, final String nextMk,
+ final int entityLevel) {
+ this.type = ListEntityType.LIST_TAIL;
+ this.prefix = pf;
+ this.nextMarker = nextMk;
+ this.level = entityLevel;
+ }
+
+ ListEntityType getType() {
+ return type;
+ }
+
+ int getLevel() {
+ return level;
+ }
+
+ String getCommonPrefix() {
+ return commonPrefix;
+ }
+
+ ObsObject getObjectSummary() {
+ return objectSummary;
+ }
+
+ public String getPrefix() {
+ return prefix;
+ }
+
+ String getNextMarker() {
+ return nextMarker;
+ }
+
+ @Override
+ public String toString() {
+ return "type: " + type
+ + ", commonPrefix: " + (commonPrefix != null
+ ? commonPrefix
+ : "")
+ + ", objectSummary: " + (objectSummary != null
+ ? objectSummary
+ : "")
+ + ", prefix: " + (prefix != null ? prefix : "")
+ + ", nextMarker: " + (nextMarker != null ? nextMarker : "");
+ }
+ }
+
+ /**
+ * Level statistics for OBS depth first search listing.
+ */
+ static class LevelStats {
+ /**
+ * Entity level.
+ */
+ private int level;
+
+ /**
+ * Directory num.
+ */
+ private long dirNum;
+
+ /**
+ * File num.
+ */
+ private long fileNum;
+
+ LevelStats(final int entityLevel) {
+ this.level = entityLevel;
+ this.dirNum = 0;
+ this.fileNum = 0;
+ }
+
+ void increaseDirNum() {
+ dirNum++;
+ }
+
+ void increaseFileNum() {
+ fileNum++;
+ }
+
+ int getLevel() {
+ return level;
+ }
+
+ long getDirNum() {
+ return dirNum;
+ }
+
+ long getFileNum() {
+ return fileNum;
+ }
+ }
+
+ /**
+ * Stack of list entities.
+ */
+ private Stack<ListEntity> listStack;
+
+ /**
+ * Queue of list entities.
+ */
+ private Queue<ListEntity> resultQueue;
+
+ /**
+ * List of levelStats.
+ */
+ private List<LevelStats> levelStatsList;
+
+ OBSFsDFSListing(final ListObjectsRequest request,
+ final List<ObsObject> objectSummaries,
+ final List<String> commonPrefixes,
+ final String nextMarker,
+ final Stack<ListEntity> listEntityStack,
+ final Queue<ListEntity> listEntityQueue,
+ final List<LevelStats> listLevelStats) {
+ super(objectSummaries,
+ commonPrefixes,
+ request.getBucketName(),
+ nextMarker != null,
+ request.getPrefix(),
+ null,
+ request.getMaxKeys(),
+ null,
+ nextMarker,
+ null);
+ this.listStack = listEntityStack;
+ this.resultQueue = listEntityQueue;
+ this.levelStatsList = listLevelStats;
+ }
+
+ OBSFsDFSListing(final OBSFsDFSListing obsFsDFSListing,
+ final List<ObsObject> objectSummaries,
+ final List<String> commonPrefixes,
+ final String nextMarker,
+ final Stack<ListEntity> listEntityStack,
+ final Queue<ListEntity> listEntityQueue,
+ final List<LevelStats> listLevelStats) {
+ super(objectSummaries,
+ commonPrefixes,
+ obsFsDFSListing.getBucketName(),
+ nextMarker != null,
+ obsFsDFSListing.getPrefix(),
+ obsFsDFSListing.getNextMarker(),
+ obsFsDFSListing.getMaxKeys(),
+ null,
+ nextMarker,
+ null);
+ this.listStack = listEntityStack;
+ this.resultQueue = listEntityQueue;
+ this.levelStatsList = listLevelStats;
+ }
+
+ Stack<ListEntity> getListStack() {
+ return listStack;
+ }
+
+ Queue<ListEntity> getResultQueue() {
+ return resultQueue;
+ }
+
+ List<LevelStats> getLevelStatsList() {
+ return levelStatsList;
+ }
+}
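Reviewer note (not part of the patch): OBSFsDFSListing walks a posix-bucket directory tree depth-first, issuing one-level listings in parallel and carrying any results beyond maxKeys over to the next batch via resultQueue. A much-simplified, single-threaded sketch of that batching idea follows; the listOneLevel callback and all names are hypothetical, and the real class additionally parallelises the one-level listings, tracks list tails for paging, and records per-level statistics.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Deque;
    import java.util.List;
    import java.util.Queue;
    import java.util.function.Function;

    final class DfsListingSketch {
      static List<String> nextBatch(Deque<String> stack,
                                    Queue<String> overflow,
                                    int maxKeys,
                                    Function<String, List<String>> listOneLevel) {
        List<String> batch = new ArrayList<>();
        // 1. drain entries carried over from the previous batch first
        while (!overflow.isEmpty() && batch.size() < maxKeys) {
          batch.add(overflow.poll());
        }
        // 2. expand prefixes depth-first until the batch is full and the
        //    overflow holds the next marker, or the tree is exhausted
        while (!stack.isEmpty() && (batch.size() < maxKeys || overflow.isEmpty())) {
          String prefix = stack.pop();
          for (String child : listOneLevel.apply(prefix)) {
            if (child.endsWith("/")) {
              stack.push(child);     // descend into this sub-directory later
            }
            if (batch.size() < maxKeys) {
              batch.add(child);      // sub-dirs go straight into the results too
            } else {
              overflow.add(child);   // becomes the start of the next batch
            }
          }
        }
        return batch;
      }

      public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("data/");
        Queue<String> overflow = new ArrayDeque<>();
        // Toy one-level lister over a fixed in-memory tree.
        Function<String, List<String>> lister = prefix ->
            prefix.equals("data/")
                ? Arrays.asList("data/a.txt", "data/sub/")
                : Arrays.asList(prefix + "b.txt");
        // Prints [data/a.txt, data/sub/, data/sub/b.txt]
        System.out.println(nextBatch(stack, overflow, 10, lister));
      }
    }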
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSIOException.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSIOException.java
new file mode 100644
index 0000000000000..3f99fd610efa5
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSIOException.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.util.Preconditions;
+import com.obs.services.exception.ObsException;
+
+import java.io.IOException;
+
+/**
+ * IOException equivalent to {@link ObsException}.
+ */
+class OBSIOException extends IOException {
+ private static final long serialVersionUID = -1582681108285856259L;
+
+ /**
+ * Operation message.
+ */
+ private final String operation;
+
+ OBSIOException(final String operationMsg, final ObsException cause) {
+ super(cause);
+ Preconditions.checkArgument(operationMsg != null,
+ "Null 'operation' argument");
+ Preconditions.checkArgument(cause != null, "Null 'cause' argument");
+ this.operation = operationMsg;
+ }
+
+ public ObsException getCause() {
+ return (ObsException) super.getCause();
+ }
+
+ @Override
+ public String getMessage() {
+ return operation + ": " + getCause().getErrorMessage()
+ + ", detailMessage: " + super.getMessage();
+ }
+}
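Reviewer note (not part of the patch): OBSIOException exists so that OBS SDK failures can be rethrown to FileSystem callers as checked java.io exceptions. A hypothetical sketch of that wrap-and-rethrow shape follows; it is not the connector's OBSCommonUtils.translateException(), which maps response codes to more specific IOException subclasses.

    import java.io.IOException;
    import java.util.concurrent.Callable;

    import com.obs.services.exception.ObsException;

    final class TranslateSketch {
      // Run one OBS SDK call and surface any ObsException as a plain IOException,
      // so callers of the FileSystem API never see SDK-specific exception types.
      static <T> T once(String operation, Callable<T> call) throws IOException {
        try {
          return call.call();
        } catch (ObsException e) {
          throw new IOException(operation + ": " + e.getErrorMessage(), e);
        } catch (Exception e) {
          throw new IOException(operation, e);
        }
      }
    }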
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSInputStream.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSInputStream.java
new file mode 100644
index 0000000000000..3f7e9888889b5
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSInputStream.java
@@ -0,0 +1,1047 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.util.Preconditions;
+import com.obs.services.ObsClient;
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.GetObjectRequest;
+import com.sun.istack.NotNull;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.ByteBufferReadable;
+import org.apache.hadoop.fs.CanSetReadahead;
+import org.apache.hadoop.fs.FSExceptionMessages;
+import org.apache.hadoop.fs.FSInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+import static org.apache.hadoop.fs.obs.OBSCommonUtils.translateException;
+
+/**
+ * Input stream for an OBS object.
+ *
+ *
+ * <p>As this stream seeks within an object, it may close then re-open the
+ * stream. When this happens, any updated stream data may be retrieved, and,
+ * given the consistency model of Huawei OBS, outdated data may in fact be
+ * picked up.
+ *
+ *
+ * <p>As a result, the outcome of reading from a stream of an object which is
+ * actively manipulated during the read process is "undefined".
+ *
+ *
+ * <p>The class is marked as private as code should not be creating instances
+ * themselves. Any extra feature (e.g. instrumentation) should be considered
+ * unstable.
+ *
+ *
+ * <p>Because it prints some of the state of the instrumentation, the output of
+ * {@link #toString()} must also be considered unstable.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+class OBSInputStream extends FSInputStream
+ implements CanSetReadahead, ByteBufferReadable {
+ /**
+ * Class logger.
+ */
+ public static final Logger LOG = LoggerFactory.getLogger(
+ OBSInputStream.class);
+
+ /**
+ * Maximum number of read retries.
+ */
+ private static final int READ_RETRY_TIME = 3;
+
+ /**
+ * Maximum number of seek retries.
+ */
+ private static final int SEEK_RETRY_TIME = 9;
+
+ /**
+ * Delay between retries, in milliseconds.
+ */
+ private static final long DELAY_TIME = 10;
+
+ /**
+ * The statistics for OBS file system.
+ */
+ private final FileSystem.Statistics statistics;
+
+ /**
+ * Obs client.
+ */
+ private final ObsClient client;
+
+ /**
+ * Bucket name.
+ */
+ private final String bucket;
+
+ /**
+ * Bucket key.
+ */
+ private final String key;
+
+ /**
+ * Content length.
+ */
+ private final long contentLength;
+
+ /**
+ * Object uri.
+ */
+ private final String uri;
+
+ /**
+ * Obs file system instance.
+ */
+ private OBSFileSystem fs;
+
+ /**
+ * This is the public position; the one set in {@link #seek(long)} and
+ * returned in {@link #getPos()}.
+ */
+ private long streamCurrentPos;
+
+ /**
+ * Closed bit. Volatile so reads are non-blocking. Updates must be in a
+ * synchronized block to guarantee an atomic check and set.
+ */
+ private volatile boolean closed;
+
+ /**
+ * Input stream.
+ */
+ private InputStream wrappedStream = null;
+
+ /**
+ * Read ahead range.
+ */
+ private long readAheadRange = OBSConstants.DEFAULT_READAHEAD_RANGE;
+
+ /**
+ * This is the actual position within the object, used by lazy seek to decide
+ * whether to seek on the next read or not.
+ */
+ private long nextReadPos;
+
+ /**
+ * The end of the content range of the last request. This is an absolute value
+ * of the range, not a length field.
+ */
+ private long contentRangeFinish;
+
+ /**
+ * The start of the content range of the last request.
+ */
+ private long contentRangeStart;
+
+ OBSInputStream(
+ final String bucketName,
+ final String bucketKey,
+ final long fileStatusLength,
+ final ObsClient obsClient,
+ final FileSystem.Statistics stats,
+ final long readaheadRange,
+ final OBSFileSystem obsFileSystem) {
+ Preconditions.checkArgument(StringUtils.isNotEmpty(bucketName),
+ "No Bucket");
+ Preconditions.checkArgument(StringUtils.isNotEmpty(bucketKey),
+ "No Key");
+ Preconditions.checkArgument(fileStatusLength >= 0,
+ "Negative content length");
+ this.bucket = bucketName;
+ this.key = bucketKey;
+ this.contentLength = fileStatusLength;
+ this.client = obsClient;
+ this.statistics = stats;
+ this.uri = "obs://" + this.bucket + "/" + this.key;
+ this.fs = obsFileSystem;
+ setReadahead(readaheadRange);
+ }
+
+ /**
+ * Calculate the limit for a get request, based on input policy and state of
+ * object.
+ *
+ * @param targetPos position of the read
+ * @param length length of bytes requested; if less than zero
+ * "unknown"
+ * @param contentLength total length of file
+ * @param readahead current readahead value
+ * @return the absolute value of the limit of the request.
+ */
+ static long calculateRequestLimit(
+ final long targetPos, final long length, final long contentLength,
+ final long readahead) {
+ // cannot read past the end of the object
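+ // Worked example: with targetPos=100 and length=-1 (unknown) the request
+ // always runs to contentLength; with targetPos=100, length=10,
+ // readahead=65536 and a large object it is capped at 100 + 65536 = 65636.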
+ return Math.min(contentLength, length < 0 ? contentLength
+ : targetPos + Math.max(readahead, length));
+ }
+
+ /**
+ * Opens up the stream at specified target position and for given length.
+ *
+ * @param reason reason for reopen
+ * @param targetPos target position
+ * @param length length requested
+ * @throws IOException on any failure to open the object
+ */
+ private synchronized void reopen(final String reason, final long targetPos,
+ final long length)
+ throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ if (wrappedStream != null) {
+ closeStream("reopen(" + reason + ")", contentRangeFinish);
+ }
+
+ contentRangeFinish =
+ calculateRequestLimit(targetPos, length, contentLength,
+ readAheadRange);
+
+ try {
+ GetObjectRequest request = new GetObjectRequest(bucket, key);
+ request.setRangeStart(targetPos);
+ request.setRangeEnd(contentRangeFinish);
+ if (fs.getSse().isSseCEnable()) {
+ request.setSseCHeader(fs.getSse().getSseCHeader());
+ }
+ wrappedStream = client.getObject(request).getObjectContent();
+ contentRangeStart = targetPos;
+ if (wrappedStream == null) {
+ throw new IOException(
+ "Null IO stream from reopen of (" + reason + ") " + uri);
+ }
+ } catch (ObsException e) {
+ throw translateException("Reopen at position " + targetPos, uri, e);
+ }
+
+ this.streamCurrentPos = targetPos;
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "reopen({}) for {} range[{}-{}], length={},"
+ + " streamPosition={}, nextReadPosition={}, thread={}, "
+ + "timeUsedInMilliSec={}",
+ uri,
+ reason,
+ targetPos,
+ contentRangeFinish,
+ length,
+ streamCurrentPos,
+ nextReadPos,
+ threadId,
+ endTime - startTime
+ );
+ }
+
+ @Override
+ public synchronized long getPos() {
+ return nextReadPos < 0 ? 0 : nextReadPos;
+ }
+
+ @Override
+ public synchronized void seek(final long targetPos) throws IOException {
+ checkNotClosed();
+
+ // Do not allow negative seek
+ if (targetPos < 0) {
+ throw new EOFException(
+ FSExceptionMessages.NEGATIVE_SEEK + " " + targetPos);
+ }
+
+ if (this.contentLength <= 0) {
+ return;
+ }
+
+ // Lazy seek
+ nextReadPos = targetPos;
+ }
+
+ /**
+ * Seek without raising any exception. This is for use in {@code finally}
+ * clauses.
+ *
+ * @param positiveTargetPos a target position which must be positive.
+ */
+ private void seekQuietly(final long positiveTargetPos) {
+ try {
+ seek(positiveTargetPos);
+ } catch (IOException ioe) {
+ LOG.debug("Ignoring IOE on seek of {} to {}", uri,
+ positiveTargetPos, ioe);
+ }
+ }
+
+ /**
+ * Adjust the stream to a specific position.
+ *
+ * @param targetPos target seek position
+ * @throws IOException on any failure to seek
+ */
+ private void seekInStream(final long targetPos) throws IOException {
+ checkNotClosed();
+ if (wrappedStream == null) {
+ return;
+ }
+ // compute how much more to skip
+ long diff = targetPos - streamCurrentPos;
+ if (diff > 0) {
+ // forward seek - this is where data can be skipped
+
+ int available = wrappedStream.available();
+ // always seek at least as far as what is available
+ long forwardSeekRange = Math.max(readAheadRange, available);
+ // work out how much is actually left in the stream
+ // then choose whichever comes first: the range or the EOF
+ long remainingInCurrentRequest = remainingInCurrentRequest();
+
+ long forwardSeekLimit = Math.min(remainingInCurrentRequest,
+ forwardSeekRange);
+ boolean skipForward = remainingInCurrentRequest > 0
+ && diff <= forwardSeekLimit;
+ if (skipForward) {
+ // the forward seek range is within the limits
+ LOG.debug("Forward seek on {}, of {} bytes", uri, diff);
+ long skippedOnce = wrappedStream.skip(diff);
+ while (diff > 0 && skippedOnce > 0) {
+ streamCurrentPos += skippedOnce;
+ diff -= skippedOnce;
+ incrementBytesRead(skippedOnce);
+ skippedOnce = wrappedStream.skip(diff);
+ }
+
+ if (streamCurrentPos == targetPos) {
+ // all is well
+ return;
+ } else {
+ // log a warning; continue to attempt to re-open
+ LOG.info("Failed to seek on {} to {}. Current position {}",
+ uri, targetPos, streamCurrentPos);
+ }
+ }
+ } else if (diff == 0 && remainingInCurrentRequest() > 0) {
+ // targetPos == streamCurrentPos
+ // if there is data left in the stream, keep going
+ return;
+ }
+
+ // if the code reaches here, the stream needs to be reopened.
+ // close the stream; on the next read the object will be opened at
+ // the new streamCurrentPos
+ closeStream("seekInStream()", this.contentRangeFinish);
+ streamCurrentPos = targetPos;
+ }
+
+ @Override
+ public boolean seekToNewSource(final long targetPos) {
+ return false;
+ }
+
+ /**
+ * Perform lazy seek and adjust stream to correct position for reading.
+ *
+ * @param targetPos position from where data should be read
+ * @param len length of the content that needs to be read
+ * @throws IOException on any failure to lazy seek
+ */
+ private void lazySeek(final long targetPos, final long len)
+ throws IOException {
+ for (int i = 0; i < SEEK_RETRY_TIME; i++) {
+ try {
+ // For lazy seek
+ seekInStream(targetPos);
+
+ // re-open at specific location if needed
+ if (wrappedStream == null) {
+ reopen("read from new offset", targetPos, len);
+ }
+
+ break;
+ } catch (IOException e) {
+ if (wrappedStream != null) {
+ closeStream("lazySeek() seekInStream has exception ",
+ this.contentRangeFinish);
+ }
+ Throwable cause = e.getCause();
+ if (cause instanceof ObsException) {
+ ObsException obsException = (ObsException) cause;
+ int status = obsException.getResponseCode();
+ switch (status) {
+ case OBSCommonUtils.UNAUTHORIZED_CODE:
+ case OBSCommonUtils.FORBIDDEN_CODE:
+ case OBSCommonUtils.NOT_FOUND_CODE:
+ case OBSCommonUtils.GONE_CODE:
+ case OBSCommonUtils.EOF_CODE:
+ throw e;
+ default:
+ break;
+ }
+ }
+
+ LOG.warn("IOException occurred in lazySeek, retry: {}", i, e);
+ if (i == SEEK_RETRY_TIME - 1) {
+ throw e;
+ }
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+ }
+
+ /**
+ * Increment the bytes read counter if there is a stats instance and the
+ * number of bytes read is more than zero.
+ *
+ * @param bytesRead number of bytes read
+ */
+ private void incrementBytesRead(final long bytesRead) {
+ if (statistics != null && bytesRead > 0) {
+ statistics.incrementBytesRead(bytesRead);
+ }
+ }
+
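+ /**
+ * Pause for DELAY_TIME milliseconds between read retries. Implemented with
+ * {@code wait()} so that the monitor held by the synchronized read methods
+ * is released while waiting.
+ */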
+ private void sleepInLock() throws InterruptedException {
+ long start = System.currentTimeMillis();
+ long now = start;
+ while (now - start < OBSInputStream.DELAY_TIME) {
+ wait(start + OBSInputStream.DELAY_TIME - now);
+ now = System.currentTimeMillis();
+ }
+ }
+
+ @Override
+ public synchronized int read() throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ checkNotClosed();
+ if (this.contentLength == 0 || nextReadPos >= contentLength) {
+ return -1;
+ }
+
+ int byteRead = -1;
+ try {
+ lazySeek(nextReadPos, 1);
+ } catch (EOFException e) {
+ onReadFailure(e, 1);
+ return -1;
+ }
+
+ IOException exception = null;
+ for (int retryTime = 1; retryTime <= READ_RETRY_TIME; retryTime++) {
+ try {
+ byteRead = wrappedStream.read();
+ exception = null;
+ break;
+ } catch (EOFException e) {
+ onReadFailure(e, 1);
+ return -1;
+ } catch (IOException e) {
+ exception = e;
+ onReadFailure(e, 1);
+ LOG.warn(
+ "read of [{}] failed, retry time[{}], due to exception[{}]",
+ uri, retryTime, exception);
+ if (retryTime < READ_RETRY_TIME) {
+ try {
+ sleepInLock();
+ } catch (InterruptedException ie) {
+ LOG.error(
+ "read of [{}] failed, retry time[{}], due to "
+ + "exception[{}]",
+ uri, retryTime,
+ exception);
+ throw exception;
+ }
+ }
+ }
+ }
+
+ if (exception != null) {
+ LOG.error(
+ "read of [{}] failed, retry time[{}], due to exception[{}]",
+ uri, READ_RETRY_TIME, exception);
+ throw exception;
+ }
+
+ if (byteRead >= 0) {
+ streamCurrentPos++;
+ nextReadPos++;
+ }
+
+ if (byteRead >= 0) {
+ incrementBytesRead(1);
+ }
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "read-0arg uri:{}, contentLength:{}, position:{}, readValue:{}, "
+ + "thread:{}, timeUsedMilliSec:{}",
+ uri, contentLength, byteRead >= 0 ? nextReadPos - 1 : nextReadPos,
+ byteRead, threadId,
+ endTime - startTime);
+ return byteRead;
+ }
+
+ /**
+ * Handle an IOE on a read by attempting to re-open the stream. The
+ * filesystem's readException count will be incremented.
+ *
+ * @param ioe exception caught.
+ * @param length length of data being attempted to read
+ * @throws IOException any exception thrown on the re-open attempt.
+ */
+ private void onReadFailure(final IOException ioe, final int length)
+ throws IOException {
+ LOG.debug(
+ "Got exception while trying to read from stream {}"
+ + " trying to recover: " + ioe, uri);
+ int i = 1;
+ while (true) {
+ try {
+ reopen("failure recovery", streamCurrentPos, length);
+ return;
+ } catch (OBSIOException e) {
+ LOG.warn(
+ "OBSIOException occurred in reopen for failure recovery, "
+ + "the {} retry time",
+ i, e);
+ if (i == READ_RETRY_TIME) {
+ throw e;
+ }
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ i++;
+ }
+ }
+
+ @Override
+ public synchronized int read(final ByteBuffer byteBuffer)
+ throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ LOG.debug("read byteBuffer: {}", byteBuffer.toString());
+ checkNotClosed();
+
+ int len = byteBuffer.remaining();
+ if (len == 0) {
+ return 0;
+ }
+
+ byte[] buf = new byte[len];
+
+ if (this.contentLength == 0 || nextReadPos >= contentLength) {
+ return -1;
+ }
+
+ try {
+ lazySeek(nextReadPos, len);
+ } catch (EOFException e) {
+ onReadFailure(e, len);
+ // the end of the file has moved
+ return -1;
+ }
+
+ int bytesRead = 0;
+ IOException exception = null;
+ for (int retryTime = 1; retryTime <= READ_RETRY_TIME; retryTime++) {
+ try {
+ bytesRead = tryToReadFromInputStream(wrappedStream, buf, 0,
+ len);
+ if (bytesRead == -1) {
+ return -1;
+ }
+ exception = null;
+ break;
+ } catch (EOFException e) {
+ onReadFailure(e, len);
+ return -1;
+ } catch (IOException e) {
+ exception = e;
+ onReadFailure(e, len);
+ LOG.warn(
+ "read len[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}]",
+ len, uri, retryTime, exception);
+ if (retryTime < READ_RETRY_TIME) {
+ try {
+ sleepInLock();
+ } catch (InterruptedException ie) {
+ LOG.error(
+ "read len[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}]",
+ len, uri, retryTime, exception);
+ throw exception;
+ }
+ }
+ }
+ }
+
+ if (exception != null) {
+ LOG.error(
+ "read len[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}]",
+ len, uri, READ_RETRY_TIME, exception);
+ throw exception;
+ }
+
+ if (bytesRead > 0) {
+ streamCurrentPos += bytesRead;
+ nextReadPos += bytesRead;
+ byteBuffer.put(buf, 0, bytesRead);
+ }
+ incrementBytesRead(bytesRead);
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "Read-ByteBuffer uri:{}, contentLength:{}, destLen:{}, readLen:{}, "
+ + "position:{}, thread:{}, timeUsedMilliSec:{}",
+ uri, contentLength, len, bytesRead,
+ bytesRead >= 0 ? nextReadPos - bytesRead : nextReadPos, threadId,
+ endTime - startTime);
+ return bytesRead;
+ }
+
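+ /**
+ * Read from the wrapped stream until {@code len} bytes have been buffered or
+ * EOF is reached. Returns -1 only if EOF is hit before any byte is read;
+ * otherwise returns the number of bytes actually read.
+ */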
+ private int tryToReadFromInputStream(final InputStream in, final byte[] buf,
+ final int off, final int len) throws IOException {
+ int bytesRead = 0;
+ while (bytesRead < len) {
+ int bytes = in.read(buf, off + bytesRead, len - bytesRead);
+ if (bytes == -1) {
+ if (bytesRead == 0) {
+ return -1;
+ } else {
+ break;
+ }
+ }
+ bytesRead += bytes;
+ }
+
+ return bytesRead;
+ }
+
+ /**
+ * {@inheritDoc}
+ *
+ *
+ * <p>This updates the statistics on read operations started and whether or
+ * not the read operation "completed", that is: returned the exact number of
+ * bytes requested.
+ *
+ * @throws IOException if there are other problems
+ */
+ @Override
+ public synchronized int read(@NotNull final byte[] buf, final int off,
+ final int len) throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ checkNotClosed();
+ validatePositionedReadArgs(nextReadPos, buf, off, len);
+ if (len == 0) {
+ return 0;
+ }
+
+ if (this.contentLength == 0 || nextReadPos >= contentLength) {
+ return -1;
+ }
+
+ try {
+ lazySeek(nextReadPos, len);
+ } catch (EOFException e) {
+ onReadFailure(e, len);
+ // the end of the file has moved
+ return -1;
+ }
+
+ int bytesRead = 0;
+ IOException exception = null;
+ for (int retryTime = 1; retryTime <= READ_RETRY_TIME; retryTime++) {
+ try {
+ bytesRead = tryToReadFromInputStream(wrappedStream, buf, off,
+ len);
+ if (bytesRead == -1) {
+ return -1;
+ }
+ exception = null;
+ break;
+ } catch (EOFException e) {
+ onReadFailure(e, len);
+ return -1;
+ } catch (IOException e) {
+ exception = e;
+ onReadFailure(e, len);
+ LOG.warn(
+ "read offset[{}] len[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}]",
+ off, len, uri, retryTime, exception);
+ if (retryTime < READ_RETRY_TIME) {
+ try {
+ sleepInLock();
+ } catch (InterruptedException ie) {
+ LOG.error(
+ "read offset[{}] len[{}] of [{}] failed, "
+ + "retry time[{}], due to exception[{}]",
+ off, len, uri, retryTime, exception);
+ throw exception;
+ }
+ }
+ }
+ }
+
+ if (exception != null) {
+ LOG.error(
+ "read offset[{}] len[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}]",
+ off, len, uri, READ_RETRY_TIME, exception);
+ throw exception;
+ }
+
+ if (bytesRead > 0) {
+ streamCurrentPos += bytesRead;
+ nextReadPos += bytesRead;
+ }
+ incrementBytesRead(bytesRead);
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "Read-3args uri:{}, contentLength:{}, destLen:{}, readLen:{}, "
+ + "position:{}, thread:{}, timeUsedMilliSec:{}",
+ uri, contentLength, len, bytesRead,
+ bytesRead >= 0 ? nextReadPos - bytesRead : nextReadPos, threadId,
+ endTime - startTime);
+ return bytesRead;
+ }
+
+ /**
+ * Verify that the input stream is open. Non blocking; this gives the last
+ * state of the volatile {@link #closed} field.
+ *
+ * @throws IOException if the connection is closed.
+ */
+ private void checkNotClosed() throws IOException {
+ if (closed) {
+ throw new IOException(
+ uri + ": " + FSExceptionMessages.STREAM_IS_CLOSED);
+ }
+ }
+
+ /**
+ * Close the stream. This triggers publishing of the stream statistics back to
+ * the filesystem statistics. This operation is synchronized, so that only one
+ * thread can attempt to close the connection; all later/blocked calls are
+ * no-ops.
+ *
+ * @throws IOException on any problem
+ */
+ @Override
+ public synchronized void close() throws IOException {
+ if (!closed) {
+ closed = true;
+ // close or abort the stream
+ closeStream("close() operation", this.contentRangeFinish);
+ // this is actually a no-op
+ super.close();
+ }
+ }
+
+ /**
+ * Close a stream: decide whether to abort or close, based on the length of
+ * the stream and the current position. If a close() is attempted and fails,
+ * the operation escalates to an abort.
+ *
+ *
+ * <p>This does not set the {@link #closed} flag.
+ *
+ * @param reason reason for stream being closed; used in messages
+ * @param length length of the stream
+ * @throws IOException on any failure to close stream
+ */
+ private synchronized void closeStream(final String reason,
+ final long length)
+ throws IOException {
+ if (wrappedStream != null) {
+ try {
+ wrappedStream.close();
+ } catch (IOException e) {
+ // exception escalates to an abort
+ LOG.debug("When closing {} stream for {}", uri, reason, e);
+ throw e;
+ }
+
+ LOG.debug(
+ "Stream {} : {}; streamPos={}, nextReadPos={},"
+ + " request range {}-{} length={}",
+ uri,
+ reason,
+ streamCurrentPos,
+ nextReadPos,
+ contentRangeStart,
+ contentRangeFinish,
+ length);
+ wrappedStream = null;
+ }
+ }
+
+ @Override
+ public synchronized int available() throws IOException {
+ checkNotClosed();
+
+ long remaining = remainingInFile();
+ if (remaining > Integer.MAX_VALUE) {
+ return Integer.MAX_VALUE;
+ }
+ return (int) remaining;
+ }
+
+ /**
+ * Bytes left in stream.
+ *
+ * @return how many bytes are left to read
+ */
+ @InterfaceAudience.Private
+ @InterfaceStability.Unstable
+ public synchronized long remainingInFile() {
+ return this.contentLength - this.streamCurrentPos;
+ }
+
+ /**
+ * Bytes left in the current request. Only valid if there is an active
+ * request.
+ *
+ * @return how many bytes are left to read in the current GET.
+ */
+ @InterfaceAudience.Private
+ @InterfaceStability.Unstable
+ public synchronized long remainingInCurrentRequest() {
+ return this.contentRangeFinish - this.streamCurrentPos;
+ }
+
+ @Override
+ public boolean markSupported() {
+ return false;
+ }
+
+ /**
+ * String value includes statistics as well as stream state. Important:
+ * there are no guarantees as to the stability of this value.
+ *
+ * @return a string value for printing in logs/diagnostics
+ */
+ @Override
+ @InterfaceStability.Unstable
+ public String toString() {
+ synchronized (this) {
+ return "OBSInputStream{" + uri
+ + " wrappedStream=" + (wrappedStream != null
+ ? "open"
+ : "closed")
+ + " streamCurrentPos=" + streamCurrentPos
+ + " nextReadPos=" + nextReadPos
+ + " contentLength=" + contentLength
+ + " contentRangeStart=" + contentRangeStart
+ + " contentRangeFinish=" + contentRangeFinish
+ + " remainingInCurrentRequest=" + remainingInCurrentRequest()
+ + '}';
+ }
+ }
+
+ /**
+ * Subclass {@code readFully()} operation which only seeks at the start of the
+ * series of operations; seeking back at the end.
+ *
+ * <p>This is significantly higher performance if multiple read attempts
+ * are needed to fetch the data, as it does not break the HTTP connection.
+ *
+ * <p>To maintain thread safety requirements, this operation is
+ * synchronized for the duration of the sequence. {@inheritDoc}
+ */
+ @Override
+ public void readFully(final long position, final byte[] buffer,
+ final int offset,
+ final int length)
+ throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ checkNotClosed();
+ validatePositionedReadArgs(position, buffer, offset, length);
+ if (length == 0) {
+ return;
+ }
+ int nread = 0;
+ synchronized (this) {
+ long oldPos = getPos();
+ try {
+ seek(position);
+ while (nread < length) {
+ int nbytes = read(buffer, offset + nread, length - nread);
+ if (nbytes < 0) {
+ throw new EOFException(
+ FSExceptionMessages.EOF_IN_READ_FULLY);
+ }
+ nread += nbytes;
+ }
+ } finally {
+ seekQuietly(oldPos);
+ }
+ }
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "ReadFully uri:{}, contentLength:{}, destLen:{}, readLen:{}, "
+ + "position:{}, thread:{}, timeUsedMilliSec:{}",
+ uri, contentLength, length, nread, position, threadId,
+ endTime - startTime);
+ }
+
+ /**
+ * Read bytes starting from the specified position.
+ *
+ * @param position start read from this position
+ * @param buffer read buffer
+ * @param offset offset into buffer
+ * @param length number of bytes to read
+ * @return actual number of bytes read
+ * @throws IOException on any failure to read
+ */
+ @Override
+ public int read(final long position, final byte[] buffer, final int offset,
+ final int length)
+ throws IOException {
+ int len = length;
+ checkNotClosed();
+ validatePositionedReadArgs(position, buffer, offset, len);
+ if (position < 0 || position >= contentLength) {
+ return -1;
+ }
+ if ((position + len) > contentLength) {
+ len = (int) (contentLength - position);
+ }
+
+ if (fs.isReadTransformEnabled()) {
+ return super.read(position, buffer, offset, len);
+ }
+
+ return randomReadWithNewInputStream(position, buffer, offset, len);
+ }
+
+ private int randomReadWithNewInputStream(final long position,
+ final byte[] buffer, final int offset, final int length)
+ throws IOException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ int bytesRead = 0;
+ InputStream inputStream = null;
+ IOException exception = null;
+ GetObjectRequest request = new GetObjectRequest(bucket, key);
+ request.setRangeStart(position);
+ request.setRangeEnd(position + length);
+ if (fs.getSse().isSseCEnable()) {
+ request.setSseCHeader(fs.getSse().getSseCHeader());
+ }
+
+ for (int retryTime = 1; retryTime <= READ_RETRY_TIME; retryTime++) {
+ try {
+ inputStream = client.getObject(request).getObjectContent();
+ if (inputStream == null) {
+ break;
+ }
+ bytesRead = tryToReadFromInputStream(inputStream, buffer,
+ offset, length);
+ if (bytesRead == -1) {
+ return -1;
+ }
+
+ exception = null;
+ break;
+ } catch (ObsException | IOException e) {
+ if (e instanceof ObsException) {
+ exception = translateException(
+ "Read at position " + position, uri, (ObsException) e);
+ } else {
+ exception = (IOException) e;
+ }
+ LOG.warn(
+ "read position[{}] destLen[{}] offset[{}] readLen[{}] "
+ + "of [{}] failed, retry time[{}], due to "
+ + "exception[{}] e[{}]",
+ position, length, offset, bytesRead, uri, retryTime,
+ exception, e);
+ if (retryTime < READ_RETRY_TIME) {
+ try {
+ Thread.sleep(DELAY_TIME);
+ } catch (InterruptedException ie) {
+ LOG.error(
+ "read position[{}] destLen[{}] offset[{}] "
+ + "readLen[{}] of [{}] failed, retry time[{}], "
+ + "due to exception[{}] e[{}]",
+ position, length, offset, bytesRead, uri, retryTime,
+ exception, e);
+ throw exception;
+ }
+ }
+ } finally {
+ if (inputStream != null) {
+ inputStream.close();
+ }
+ }
+ }
+
+ if (inputStream == null || exception != null) {
+ LOG.error(
+ "read position[{}] destLen[{}] offset[{}] len[{}] failed, "
+ + "retry time[{}], due to exception[{}]",
+ position, length, offset, bytesRead, READ_RETRY_TIME,
+ exception);
+ throw new IOException("read failed of " + uri + ", inputStream is "
+ + (inputStream == null ? "null" : "not null"), exception);
+
+ }
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug(
+ "Read-4args uri:{}, contentLength:{}, destLen:{}, readLen:{}, "
+ + "position:{}, thread:{}, timeUsedMilliSec:{}",
+ uri, contentLength, length, bytesRead, position, threadId,
+ endTime - startTime);
+ return bytesRead;
+ }
+
+ @Override
+ public synchronized void setReadahead(final Long newReadaheadRange) {
+ if (newReadaheadRange == null) {
+ this.readAheadRange = OBSConstants.DEFAULT_READAHEAD_RANGE;
+ } else {
+ Preconditions.checkArgument(newReadaheadRange >= 0,
+ "Negative readahead value");
+ this.readAheadRange = newReadaheadRange;
+ }
+ }
+}
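The positioned-read path above (randomReadWithNewInputStream) retries a ranged GET up to READ_RETRY_TIME times, sleeping DELAY_TIME between attempts and rethrowing the last translated exception if every attempt fails. A minimal, self-contained sketch of that retry shape is below; the RangedReader interface, the constant values and the flaky demo reader are illustrative assumptions, not part of this patch or the OBS SDK.

```java
import java.io.IOException;

/** Minimal sketch of the bounded retry-with-delay read pattern used above. */
public final class RetryReadSketch {

  /** Hypothetical stand-in for a ranged object read against the store. */
  interface RangedReader {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
  }

  // Assumed values; the real connector takes these from its own constants.
  private static final int READ_RETRY_TIME = 3;
  private static final long DELAY_TIME_MS = 10L;

  static int readWithRetry(RangedReader reader, long position,
      byte[] buffer, int offset, int length) throws IOException {
    IOException lastFailure = null;
    for (int retry = 1; retry <= READ_RETRY_TIME; retry++) {
      try {
        // One attempt: on success the result is returned immediately.
        return reader.read(position, buffer, offset, length);
      } catch (IOException e) {
        lastFailure = e;
        if (retry < READ_RETRY_TIME) {
          try {
            Thread.sleep(DELAY_TIME_MS);   // back off before the next attempt
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw lastFailure;             // give up early when interrupted
          }
        }
      }
    }
    throw lastFailure;                     // all attempts failed
  }

  public static void main(String[] args) throws IOException {
    // A reader that fails twice and then succeeds, mimicking a transient error.
    RangedReader flaky = new RangedReader() {
      private int calls;
      @Override
      public int read(long pos, byte[] buf, int off, int len) throws IOException {
        if (++calls < 3) {
          throw new IOException("transient failure " + calls);
        }
        return Math.min(len, 16);
      }
    };
    System.out.println("bytes read: " + readWithRetry(flaky, 0L, new byte[32], 0, 32));
  }
}
```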
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSListing.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSListing.java
new file mode 100644
index 0000000000000..4072feb2cac9d
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSListing.java
@@ -0,0 +1,656 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.ListObjectsRequest;
+import com.obs.services.model.ObjectListing;
+import com.obs.services.model.ObsObject;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.ListIterator;
+import java.util.NoSuchElementException;
+
+/**
+ * OBS listing implementation.
+ */
+class OBSListing {
+ /**
+ * A Path filter which accepts all filenames.
+ */
+ static final PathFilter ACCEPT_ALL =
+ new PathFilter() {
+ @Override
+ public boolean accept(final Path file) {
+ return true;
+ }
+
+ @Override
+ public String toString() {
+ return "ACCEPT_ALL";
+ }
+ };
+
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(OBSListing.class);
+
+ /**
+ * OBS File System instance.
+ */
+ private final OBSFileSystem owner;
+
+ OBSListing(final OBSFileSystem ownerFS) {
+ this.owner = ownerFS;
+ }
+
+ /**
+ * Create a FileStatus iterator against a path, with a given list object
+ * request.
+ *
+ * @param listPath path of the listing
+ * @param request initial request to make
+ * @param filter the filter on which paths to accept
+ * @param acceptor the class/predicate to decide which entries to accept in
+ * the listing based on the full file status.
+ * @return the iterator
+ * @throws IOException IO Problems
+ */
+ FileStatusListingIterator createFileStatusListingIterator(
+ final Path listPath,
+ final ListObjectsRequest request,
+ final PathFilter filter,
+ final FileStatusAcceptor acceptor)
+ throws IOException {
+ return new FileStatusListingIterator(
+ new ObjectListingIterator(listPath, request), filter, acceptor);
+ }
+
+ /**
+ * Create a located status iterator over a file status iterator.
+ *
+ * @param statusIterator an iterator over the remote status entries
+ * @return a new remote iterator
+ */
+ LocatedFileStatusIterator createLocatedFileStatusIterator(
+ final RemoteIterator<FileStatus> statusIterator) {
+ return new LocatedFileStatusIterator(statusIterator);
+ }
+
+ /**
+ * Interface to implement by the logic deciding whether to accept a summary
+ * entry or path as a valid file or directory.
+ */
+ interface FileStatusAcceptor {
+
+ /**
+ * Predicate to decide whether or not to accept a summary entry.
+ *
+ * @param keyPath qualified path to the entry
+ * @param summary summary entry
+ * @return true if the entry is accepted (i.e. that a status entry should be
+ * generated.)
+ */
+ boolean accept(Path keyPath, ObsObject summary);
+
+ /**
+ * Predicate to decide whether or not to accept a prefix.
+ *
+ * @param keyPath qualified path to the entry
+ * @param commonPrefix the prefix
+ * @return true if the entry is accepted (i.e. that a status entry should be
+ * generated.)
+ */
+ boolean accept(Path keyPath, String commonPrefix);
+ }
+
+ /**
+ * A remote iterator which only iterates over a single `LocatedFileStatus`
+ * value.
+ *
+ * <p>If the status value is null, the iterator declares that it has no
+ * data. This iterator is used to handle
+ * {@link OBSFileSystem#listStatus(Path)} calls where the path handed in
+ * refers to a file, not a directory: this is
+ * the iterator returned.
+ */
+ static final class SingleStatusRemoteIterator
+ implements RemoteIterator<LocatedFileStatus> {
+
+ /**
+ * The status to return; set to null after the first iteration.
+ */
+ private LocatedFileStatus status;
+
+ /**
+ * Constructor.
+ *
+ * @param locatedFileStatus status value: may be null, in which case the
+ * iterator is empty.
+ */
+ SingleStatusRemoteIterator(final LocatedFileStatus locatedFileStatus) {
+ this.status = locatedFileStatus;
+ }
+
+ /**
+ * {@inheritDoc}
+ *
+ * @return true if there is a file status to return: this is always false
+ * for the second iteration, and may be false for the first.
+ */
+ @Override
+ public boolean hasNext() {
+ return status != null;
+ }
+
+ /**
+ * {@inheritDoc}
+ *
+ * @return the non-null status element passed in when the instance was
+ * constructed, if it has not already been retrieved.
+ * @throws NoSuchElementException if this is the second call, or it is the
+ * first call and a null
+ * {@link LocatedFileStatus}
+ * entry was passed to the constructor.
+ */
+ @Override
+ public LocatedFileStatus next() {
+ if (hasNext()) {
+ LocatedFileStatus s = this.status;
+ status = null;
+ return s;
+ } else {
+ throw new NoSuchElementException();
+ }
+ }
+ }
+
+ /**
+ * Accept all entries except the base path and those which map to OBS pseudo
+ * directory markers.
+ */
+ static class AcceptFilesOnly implements FileStatusAcceptor {
+ /**
+ * path to qualify.
+ */
+ private final Path qualifiedPath;
+
+ AcceptFilesOnly(final Path path) {
+ this.qualifiedPath = path;
+ }
+
+ /**
+ * Reject a summary entry if the key path is the qualified Path, or it ends
+ * with {@code "_$folder$"}.
+ *
+ * @param keyPath key path of the entry
+ * @param summary summary entry
+ * @return true if the entry is accepted (i.e. that a status entry should be
+ * generated.)
+ */
+ @Override
+ public boolean accept(final Path keyPath, final ObsObject summary) {
+ return !keyPath.equals(qualifiedPath)
+ && !summary.getObjectKey()
+ .endsWith(OBSConstants.OBS_FOLDER_SUFFIX)
+ && !OBSCommonUtils.objectRepresentsDirectory(
+ summary.getObjectKey(),
+ summary.getMetadata().getContentLength());
+ }
+
+ /**
+ * Accept no directory paths.
+ *
+ * @param keyPath qualified path to the entry
+ * @param prefix common prefix in listing.
+ * @return false, always.
+ */
+ @Override
+ public boolean accept(final Path keyPath, final String prefix) {
+ return false;
+ }
+ }
+
+ /**
+ * Accept all entries except the base path and those which map to OBS pseudo
+ * directory markers.
+ */
+ static class AcceptAllButSelfAndS3nDirs implements FileStatusAcceptor {
+
+ /**
+ * Base path.
+ */
+ private final Path qualifiedPath;
+
+ /**
+ * Constructor.
+ *
+ * @param path an already-qualified path.
+ */
+ AcceptAllButSelfAndS3nDirs(final Path path) {
+ this.qualifiedPath = path;
+ }
+
+ /**
+ * Reject a summary entry if the key path is the qualified Path, or it ends
+ * with {@code "_$folder$"}.
+ *
+ * @param keyPath key path of the entry
+ * @param summary summary entry
+ * @return true if the entry is accepted (i.e. that a status entry should be
+ * generated.)
+ */
+ @Override
+ public boolean accept(final Path keyPath, final ObsObject summary) {
+ return !keyPath.equals(qualifiedPath) && !summary.getObjectKey()
+ .endsWith(OBSConstants.OBS_FOLDER_SUFFIX);
+ }
+
+ /**
+ * Accept all prefixes except the one for the base path, "self".
+ *
+ * @param keyPath qualified path to the entry
+ * @param prefix common prefix in listing.
+ * @return true if the entry is accepted (i.e. that a status entry should be
+ * generated.)
+ */
+ @Override
+ public boolean accept(final Path keyPath, final String prefix) {
+ return !keyPath.equals(qualifiedPath);
+ }
+ }
+
+ /**
+ * Wraps up object listing into a remote iterator which will ask for more
+ * listing data if needed.
+ *
+ * <p>This is a complex operation, especially the process to determine if
+ * there are more entries remaining. If there are no more results remaining in
+ * the (filtered) results of the current listing request, then another request
+ * is made
+ * and those results filtered before the iterator can declare that
+ * there is more data available.
+ *
+ * <p>The need to filter the results precludes the iterator from simply
+ * declaring that if the {@link ObjectListingIterator#hasNext()} is true then
+ * there are more results. Instead the next batch of results must be retrieved
+ * and filtered.
+ *
+ * <p>What does this mean? It means that remote requests to retrieve new
+ * batches of object listings are made in the {@link #hasNext()} call; the
+ * {@link #next()} call simply returns the filtered results of the last
+ * listing processed. However, do note that {@link #next()} calls {@link
+ * #hasNext()} during its operation. This is critical to ensure that a listing
+ * obtained through a sequence of {@link #next()} will complete with the same
+ * set of results as a classic {@code while(it.hasNext())} loop.
+ *
+ * <p>Thread safety: None.
+ */
+ class FileStatusListingIterator implements RemoteIterator<FileStatus> {
+
+ /**
+ * Source of objects.
+ */
+ private final ObjectListingIterator source;
+
+ /**
+ * Filter of paths from API call.
+ */
+ private final PathFilter filter;
+
+ /**
+ * Filter of entries from file status.
+ */
+ private final FileStatusAcceptor acceptor;
+
+ /**
+ * Request batch size.
+ */
+ private int batchSize;
+
+ /**
+ * Iterator over the current set of results.
+ */
+ private ListIterator<FileStatus> statusBatchIterator;
+
+ /**
+ * Create an iterator over file status entries.
+ *
+ * @param listPath the listing iterator from a listObjects call.
+ * @param pathFilter the filter on which paths to accept
+ * @param fileStatusAcceptor the class/predicate to decide which entries to
+ * accept in the listing based on the full file
+ * status.
+ * @throws IOException IO Problems
+ */
+ FileStatusListingIterator(
+ final ObjectListingIterator listPath, final PathFilter pathFilter,
+ final FileStatusAcceptor fileStatusAcceptor)
+ throws IOException {
+ this.source = listPath;
+ this.filter = pathFilter;
+ this.acceptor = fileStatusAcceptor;
+ // build the first set of results. This will not trigger any
+ // remote IO, assuming the source iterator is in its initial
+ // iteration
+ requestNextBatch();
+ }
+
+ /**
+ * Report whether or not there is new data available. If there is data in
+ * the local filtered list, return true. Else: request more data until that
+ * condition is met, or there is no more remote listing data.
+ *
+ * @return true if a call to {@link #next()} will succeed.
+ * @throws IOException on any failure to request next batch
+ */
+ @Override
+ public boolean hasNext() throws IOException {
+ return statusBatchIterator.hasNext() || requestNextBatch();
+ }
+
+ @Override
+ public FileStatus next() throws IOException {
+ if (!hasNext()) {
+ throw new NoSuchElementException();
+ }
+ return statusBatchIterator.next();
+ }
+
+ /**
+ * Try to retrieve another batch. Note that for the initial batch, {@link
+ * ObjectListingIterator} does not generate a request; it simply returns the
+ * initial set.
+ *
+ * @return true if a new batch was created.
+ * @throws IOException IO problems
+ */
+ private boolean requestNextBatch() throws IOException {
+ // look for more object listing batches being available
+ while (source.hasNext()) {
+ // if available, retrieve it and build the next status
+ if (buildNextStatusBatch(source.next())) {
+ // this batch successfully generated entries matching
+ // the filters/acceptors;
+ // declare that the request was successful
+ return true;
+ } else {
+ LOG.debug(
+ "All entries in batch were filtered...continuing");
+ }
+ }
+ // if this code is reached, it means that all remaining
+ // object lists have been retrieved, and there are no new entries
+ // to return.
+ return false;
+ }
+
+ /**
+ * Build the next status batch from a listing.
+ *
+ * @param objects the next object listing
+ * @return true if this added any entries after filtering
+ */
+ private boolean buildNextStatusBatch(final ObjectListing objects) {
+ // counters for debug logs
+ int added = 0;
+ int ignored = 0;
+ // list to fill in with results. Initial size will be list maximum.
+ List<FileStatus> stats =
+ new ArrayList<>(
+ objects.getObjects().size() + objects.getCommonPrefixes()
+ .size());
+ // objects
+ for (ObsObject summary : objects.getObjects()) {
+ String key = summary.getObjectKey();
+ Path keyPath = OBSCommonUtils.keyToQualifiedPath(owner, key);
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("{}: {}", keyPath,
+ OBSCommonUtils.stringify(summary));
+ }
+ // Skip over keys that are ourselves and old OBS _$folder$ files
+ if (acceptor.accept(keyPath, summary) && filter.accept(
+ keyPath)) {
+ FileStatus status =
+ OBSCommonUtils.createFileStatus(
+ keyPath, summary,
+ owner.getDefaultBlockSize(keyPath),
+ owner.getUsername());
+ LOG.debug("Adding: {}", status);
+ stats.add(status);
+ added++;
+ } else {
+ LOG.debug("Ignoring: {}", keyPath);
+ ignored++;
+ }
+ }
+
+ // prefixes: always directories
+ for (ObsObject prefix : objects.getExtenedCommonPrefixes()) {
+ String key = prefix.getObjectKey();
+ Path keyPath = OBSCommonUtils.keyToQualifiedPath(owner, key);
+ if (acceptor.accept(keyPath, key) && filter.accept(keyPath)) {
+ long lastModified =
+ prefix.getMetadata().getLastModified() == null
+ ? System.currentTimeMillis()
+ : OBSCommonUtils.dateToLong(
+ prefix.getMetadata().getLastModified());
+ FileStatus status = new OBSFileStatus(keyPath, lastModified,
+ lastModified, owner.getUsername());
+ LOG.debug("Adding directory: {}", status);
+ added++;
+ stats.add(status);
+ } else {
+ LOG.debug("Ignoring directory: {}", keyPath);
+ ignored++;
+ }
+ }
+
+ // finish up
+ batchSize = stats.size();
+ statusBatchIterator = stats.listIterator();
+ boolean hasNext = statusBatchIterator.hasNext();
+ LOG.debug(
+ "Added {} entries; ignored {}; hasNext={}; hasMoreObjects={}",
+ added,
+ ignored,
+ hasNext,
+ objects.isTruncated());
+ return hasNext;
+ }
+
+ /**
+ * Get the number of entries in the current batch.
+ *
+ * @return a number, possibly zero.
+ */
+ public int getBatchSize() {
+ return batchSize;
+ }
+ }
+
+ /**
+ * Wraps up OBS `ListObjects` requests in a remote iterator which will ask for
+ * more listing data if needed.
+ *
+ * <p>That is:
+ *
+ * <p>1. The first invocation of the {@link #next()} call will return the
+ * results of the first request, the one created during the construction of
+ * the instance.
+ *
+ * <p>2. Second and later invocations will continue the ongoing listing,
+ * calling {@link OBSCommonUtils#continueListObjects} to request the next
+ * batch of results.
+ *
+ * <p>3. The {@link #hasNext()} predicate returns true for the initial call,
+ * where {@link #next()} will return the initial results. It declares that it
+ * has future results iff the last executed request was truncated.
+ *
+ * <p>Thread safety: none.
+ */
+ class ObjectListingIterator implements RemoteIterator<ObjectListing> {
+
+ /**
+ * The path listed.
+ */
+ private final Path listPath;
+
+ /**
+ * The most recent listing results.
+ */
+ private ObjectListing objects;
+
+ /**
+ * Indicator that this is the first listing.
+ */
+ private boolean firstListing = true;
+
+ /**
+ * Count of how many listings have been requested (including initial
+ * result).
+ */
+ private int listingCount = 1;
+
+ /**
+ * Maximum keys in a request.
+ */
+ private int maxKeys;
+
+ /**
+ * Constructor: calls {@link OBSCommonUtils#listObjects} on the request to
+ * populate the initial set of results/fail if there was a problem talking
+ * to the bucket.
+ *
+ * @param path path of the listing
+ * @param request initial request to make
+ * @throws IOException on any failure to list objects
+ */
+ ObjectListingIterator(final Path path,
+ final ListObjectsRequest request)
+ throws IOException {
+ this.listPath = path;
+ this.maxKeys = owner.getMaxKeys();
+ this.objects = OBSCommonUtils.listObjects(owner, request);
+ }
+
+ /**
+ * Declare that the iterator has data if it is either the initial
+ * iteration or it is a later one and the last listing obtained was
+ * incomplete.
+ */
+ @Override
+ public boolean hasNext() {
+ return firstListing || objects.isTruncated();
+ }
+
+ /**
+ * Ask for the next listing. For the first invocation, this returns the
+ * initial set, with no remote IO. For later requests, OBS will be queried,
+ * hence the calls may block or fail.
+ *
+ * @return the next object listing.
+ * @throws IOException if a query made of OBS fails.
+ * @throws NoSuchElementException if there is no more data to list.
+ */
+ @Override
+ public ObjectListing next() throws IOException {
+ if (firstListing) {
+ // on the first listing, don't request more data.
+ // Instead just clear the firstListing flag so that future
+ // calls will request new data.
+ firstListing = false;
+ } else {
+ try {
+ if (!objects.isTruncated()) {
+ // nothing more to request: fail.
+ throw new NoSuchElementException(
+ "No more results in listing of " + listPath);
+ }
+ // need to request a new set of objects.
+ LOG.debug("[{}], Requesting next {} objects under {}",
+ listingCount, maxKeys, listPath);
+ objects = OBSCommonUtils.continueListObjects(owner,
+ objects);
+ listingCount++;
+ LOG.debug("New listing status: {}", this);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("listObjects()",
+ listPath, e);
+ }
+ }
+ return objects;
+ }
+
+ @Override
+ public String toString() {
+ return "Object listing iterator against "
+ + listPath
+ + "; listing count "
+ + listingCount
+ + "; isTruncated="
+ + objects.isTruncated();
+ }
+
+ }
+
+ /**
+ * Take a remote iterator over a set of {@link FileStatus} instances and
+ * return a remote iterator of {@link LocatedFileStatus} instances.
+ */
+ class LocatedFileStatusIterator
+ implements RemoteIterator<LocatedFileStatus> {
+ /**
+ * File status.
+ */
+ private final RemoteIterator<FileStatus> statusIterator;
+
+ /**
+ * Constructor.
+ *
+ * @param statusRemoteIterator an iterator over the remote status entries
+ */
+ LocatedFileStatusIterator(
+ final RemoteIterator<FileStatus> statusRemoteIterator) {
+ this.statusIterator = statusRemoteIterator;
+ }
+
+ @Override
+ public boolean hasNext() throws IOException {
+ return statusIterator.hasNext();
+ }
+
+ @Override
+ public LocatedFileStatus next() throws IOException {
+ return OBSCommonUtils.toLocatedFileStatus(owner,
+ statusIterator.next());
+ }
+ }
+}
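FileStatusListingIterator above drives paging from hasNext(): it keeps requesting and filtering remote batches until one of them yields entries, while next() only hands out items from the current filtered batch. The sketch below reproduces that shape against an in-memory page source; the PageSource interface, the sample keys and the filter are assumptions used purely for illustration, not the connector's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Sketch of a batch-filtering iterator in the style of FileStatusListingIterator. */
public final class BatchedIteratorSketch<T> implements Iterator<T> {

  /** Hypothetical source of raw result pages (e.g. one listObjects response each). */
  interface PageSource<T> {
    boolean hasNextPage();
    List<T> nextPage();
  }

  private final PageSource<T> source;
  private final Predicate<T> filter;
  private Iterator<T> currentBatch = java.util.Collections.emptyIterator();

  BatchedIteratorSketch(PageSource<T> source, Predicate<T> filter) {
    this.source = source;
    this.filter = filter;
  }

  /** May fetch further pages until one survives filtering, as in the connector. */
  @Override
  public boolean hasNext() {
    while (!currentBatch.hasNext() && source.hasNextPage()) {
      List<T> filtered = new ArrayList<>();
      for (T item : source.nextPage()) {
        if (filter.test(item)) {
          filtered.add(item);
        }
      }
      currentBatch = filtered.iterator();
    }
    return currentBatch.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return currentBatch.next();
  }

  public static void main(String[] args) {
    List<List<String>> pages = Arrays.asList(
        Arrays.asList("a/", "a/file1"),
        Arrays.asList("a/_$folder$"),   // this page is entirely filtered out
        Arrays.asList("a/file2"));
    Iterator<List<String>> pageIt = pages.iterator();
    PageSource<String> source = new PageSource<String>() {
      @Override public boolean hasNextPage() { return pageIt.hasNext(); }
      @Override public List<String> nextPage() { return pageIt.next(); }
    };
    BatchedIteratorSketch<String> it = new BatchedIteratorSketch<>(
        source, key -> !key.endsWith("/") && !key.endsWith("_$folder$"));
    while (it.hasNext()) {
      System.out.println(it.next());    // prints a/file1 then a/file2
    }
  }
}
```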
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSLoginHelper.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSLoginHelper.java
new file mode 100644
index 0000000000000..cd9853369af88
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSLoginHelper.java
@@ -0,0 +1,350 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.net.URLDecoder;
+import java.util.Objects;
+
+import static org.apache.commons.lang3.StringUtils.equalsIgnoreCase;
+
+/**
+ * Helper for OBS login.
+ */
+final class OBSLoginHelper {
+ /**
+ * login warning.
+ */
+ public static final String LOGIN_WARNING =
+ "The Filesystem URI contains login details."
+ + " This is insecure and may be unsupported in future.";
+
+ /**
+ * plus warning.
+ */
+ public static final String PLUS_WARNING =
+ "Secret key contains a special character that should be URL encoded! "
+ + "Attempting to resolve...";
+
+ /**
+ * defined plus unencoded char.
+ */
+ public static final String PLUS_UNENCODED = "+";
+
+ /**
+ * defined plus encoded char.
+ */
+ public static final String PLUS_ENCODED = "%2B";
+
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSLoginHelper.class);
+
+ private OBSLoginHelper() {
+ }
+
+ /**
+ * Build the filesystem URI. This can include stripping down of part of the
+ * URI.
+ *
+ * @param uri filesystem uri
+ * @return the URI to use as the basis for FS operation and qualifying paths.
+ * @throws IllegalArgumentException if the URI is in some way invalid.
+ */
+ public static URI buildFSURI(final URI uri) {
+ Objects.requireNonNull(uri, "null uri");
+ Objects.requireNonNull(uri.getScheme(), "null uri.getScheme()");
+ if (uri.getHost() == null && uri.getAuthority() != null) {
+ Objects.requireNonNull(
+ uri.getHost(),
+ "null uri host."
+ + " This can be caused by unencoded / in the "
+ + "password string");
+ }
+ Objects.requireNonNull(uri.getHost(), "null uri host.");
+ return URI.create(uri.getScheme() + "://" + uri.getHost());
+ }
+
+ /**
+ * Create a stripped down string value for error messages.
+ *
+ * @param pathUri URI
+ * @return a shortened schema://host/path value
+ */
+ public static String toString(final URI pathUri) {
+ return pathUri != null
+ ? String.format("%s://%s/%s", pathUri.getScheme(),
+ pathUri.getHost(), pathUri.getPath())
+ : "(null URI)";
+ }
+
+ /**
+ * Extract the login details from a URI, logging a warning if the URI contains
+ * these.
+ *
+ * @param name URI of the filesystem
+ * @return a login tuple, possibly empty.
+ */
+ public static Login extractLoginDetailsWithWarnings(final URI name) {
+ Login login = extractLoginDetails(name);
+ if (login.hasLogin()) {
+ LOG.warn(LOGIN_WARNING);
+ }
+ return login;
+ }
+
+ /**
+ * Extract the login details from a URI.
+ *
+ * @param name URI of the filesystem
+ * @return a login tuple, possibly empty.
+ */
+ public static Login extractLoginDetails(final URI name) {
+ try {
+ String authority = name.getAuthority();
+ if (authority == null) {
+ return Login.EMPTY;
+ }
+ int loginIndex = authority.indexOf('@');
+ if (loginIndex < 0) {
+ // no login
+ return Login.EMPTY;
+ }
+ String login = authority.substring(0, loginIndex);
+ int loginSplit = login.indexOf(':');
+ if (loginSplit > 0) {
+ String user = login.substring(0, loginSplit);
+ String encodedPassword = login.substring(loginSplit + 1);
+ if (encodedPassword.contains(PLUS_UNENCODED)) {
+ LOG.warn(PLUS_WARNING);
+ encodedPassword = encodedPassword.replaceAll(
+ "\\" + PLUS_UNENCODED, PLUS_ENCODED);
+ }
+ String password = URLDecoder.decode(encodedPassword, "UTF-8");
+ return new Login(user, password);
+ } else if (loginSplit == 0) {
+ // there is no user, just a password. In this case,
+ // there's no login
+ return Login.EMPTY;
+ } else {
+ return new Login(login, "");
+ }
+ } catch (UnsupportedEncodingException e) {
+ // this should never happen; translate it if it does.
+ throw new RuntimeException(e);
+ }
+ }
+
+ /**
+ * Canonicalize the given URI.
+ *
+ * <p>This strips out login information.
+ *
+ * @param uri the URI to canonicalize
+ * @param defaultPort default port to use in canonicalized URI if the input
+ * URI has no port and this value is greater than 0
+ * @return a new, canonicalized URI.
+ */
+ public static URI canonicalizeUri(final URI uri, final int defaultPort) {
+ URI newUri = uri;
+ if (uri.getPort() == -1 && defaultPort > 0) {
+ // reconstruct the uri with the default port set
+ try {
+ newUri =
+ new URI(
+ newUri.getScheme(),
+ null,
+ newUri.getHost(),
+ defaultPort,
+ newUri.getPath(),
+ newUri.getQuery(),
+ newUri.getFragment());
+ } catch (URISyntaxException e) {
+ // Should never happen!
+ throw new AssertionError(
+ "Valid URI became unparseable: " + newUri);
+ }
+ }
+
+ return newUri;
+ }
+
+ /**
+ * Check the path, ignoring authentication details. See {@link
+ * OBSFileSystem#checkPath(Path)} for the operation of this.
+ *
+ * <p>Essentially
+ *
+ * <ol>
+ * <li>The URI is canonicalized.
+ * <li>If the schemas match, the hosts are compared.
+ * <li>If there is a mismatch between null/non-null host,
+ * the default FS values are used to patch in the host.
+ * </ol>
+ *
+ * That all originates in the core FS; the sole change here being to use
+ * {@link URI#getHost()} over {@link URI#getAuthority()}. Some of that code
+ * looks like a relic of the code anti-pattern of using "hdfs:file.txt" to define
+ * the path without declaring the hostname. It's retained for compatibility.
+ *
+ * @param conf FS configuration
+ * @param fsUri the FS URI
+ * @param path path to check
+ * @param defaultPort default port of FS
+ */
+ public static void checkPath(final Configuration conf, final URI fsUri,
+ final Path path, final int defaultPort) {
+ URI pathUri = path.toUri();
+ String thatScheme = pathUri.getScheme();
+ if (thatScheme == null) {
+ // fs is relative
+ return;
+ }
+ URI thisUri = canonicalizeUri(fsUri, defaultPort);
+ String thisScheme = thisUri.getScheme();
+ // hostname and scheme are not case sensitive in these checks
+ if (equalsIgnoreCase(thisScheme, thatScheme)) { // schemes match
+ String thisHost = thisUri.getHost();
+ String thatHost = pathUri.getHost();
+ if (thatHost == null
+ && // path's host is null
+ thisHost != null) { // fs has a host
+ URI defaultUri = FileSystem.getDefaultUri(conf);
+ if (equalsIgnoreCase(thisScheme, defaultUri.getScheme())) {
+ pathUri
+ = defaultUri; // schemes match, so use this uri instead
+ } else {
+ pathUri = null; // can't determine auth of the path
+ }
+ }
+ if (pathUri != null) {
+ // canonicalize uri before comparing with this fs
+ pathUri = canonicalizeUri(pathUri, defaultPort);
+ thatHost = pathUri.getHost();
+ if (equalsIgnoreCase(thisHost, thatHost)) {
+ return;
+ }
+ }
+ }
+ // make sure the exception strips out any auth details
+ throw new IllegalArgumentException(
+ "Wrong FS " + OBSLoginHelper.toString(pathUri) + " -expected "
+ + fsUri);
+ }
+
+ /**
+ * Simple tuple of login details.
+ */
+ public static class Login {
+ /**
+ * Defined empty login instance.
+ */
+ public static final Login EMPTY = new Login();
+
+ /**
+ * Defined user name.
+ */
+ private final String user;
+
+ /**
+ * Defined password.
+ */
+ private final String password;
+
+ /**
+ * Login token.
+ */
+ private final String token;
+
+ /**
+ * Create an instance with no login details. Calls to {@link #hasLogin()}
+ * return false.
+ */
+ Login() {
+ this("", "");
+ }
+
+ Login(final String userName, final String passwd) {
+ this(userName, passwd, null);
+ }
+
+ Login(final String userName, final String passwd,
+ final String sessionToken) {
+ this.user = userName;
+ this.password = passwd;
+ this.token = sessionToken;
+ }
+
+ /**
+ * Predicate to verify login details are defined.
+ *
+ * @return true if the username is defined (not null, not empty).
+ */
+ public boolean hasLogin() {
+ return StringUtils.isNotEmpty(user);
+ }
+
+ /**
+ * Equality test matches user and password.
+ *
+ * @param o other object
+ * @return true if the objects are considered equivalent.
+ */
+ @Override
+ public boolean equals(final Object o) {
+ if (this == o) {
+ return true;
+ }
+ if (o == null || getClass() != o.getClass()) {
+ return false;
+ }
+ Login that = (Login) o;
+ return Objects.equals(user, that.user) && Objects.equals(password,
+ that.password);
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(user, password);
+ }
+
+ public String getUser() {
+ return user;
+ }
+
+ public String getPassword() {
+ return password;
+ }
+
+ public String getToken() {
+ return token;
+ }
+ }
+}
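extractLoginDetails above splits a user:secret pair out of the URI authority and, before URL-decoding, re-encodes any literal '+' so it is not turned into a space. A compact standalone sketch of that parsing order follows; the sample URI, the key value and the class name are hypothetical and only illustrate the sequence, not the helper's exact behaviour.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

/** Sketch of parsing user:secret credentials out of a filesystem URI authority. */
public final class UriLoginSketch {

  public static void main(String[] args) throws UnsupportedEncodingException {
    // Illustrative URI only; embedding secrets in URIs is discouraged.
    URI uri = URI.create("obs://AKIAEXAMPLE:secret+key@bucket.example.com/path");

    String authority = uri.getAuthority();
    int at = authority.indexOf('@');
    if (at < 0) {
      System.out.println("no login details");
      return;
    }

    String login = authority.substring(0, at);
    int colon = login.indexOf(':');
    String user = colon > 0 ? login.substring(0, colon) : login;
    String encodedSecret = colon > 0 ? login.substring(colon + 1) : "";

    // A literal '+' would otherwise decode to a space, corrupting the secret.
    encodedSecret = encodedSecret.replace("+", "%2B");
    String secret = URLDecoder.decode(encodedSecret, "UTF-8");

    System.out.println("user   = " + user);          // AKIAEXAMPLE
    System.out.println("secret = " + secret);        // secret+key
    System.out.println("host   = " + uri.getHost()); // bucket.example.com
  }
}
```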
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSObjectBucketUtils.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSObjectBucketUtils.java
new file mode 100644
index 0000000000000..ca29a965e9911
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSObjectBucketUtils.java
@@ -0,0 +1,897 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.AbortMultipartUploadRequest;
+import com.obs.services.model.CompleteMultipartUploadRequest;
+import com.obs.services.model.CopyObjectRequest;
+import com.obs.services.model.CopyObjectResult;
+import com.obs.services.model.CopyPartRequest;
+import com.obs.services.model.CopyPartResult;
+import com.obs.services.model.DeleteObjectsRequest;
+import com.obs.services.model.GetObjectMetadataRequest;
+import com.obs.services.model.InitiateMultipartUploadRequest;
+import com.obs.services.model.InitiateMultipartUploadResult;
+import com.obs.services.model.KeyAndVersion;
+import com.obs.services.model.ListObjectsRequest;
+import com.obs.services.model.ObjectListing;
+import com.obs.services.model.ObjectMetadata;
+import com.obs.services.model.ObsObject;
+import com.obs.services.model.PartEtag;
+import com.obs.services.model.PutObjectRequest;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.ContentSummary;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.ParentNotDirectoryException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InterruptedIOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Future;
+
+/**
+ * Object bucket specific utils for {@link OBSFileSystem}.
+ */
+final class OBSObjectBucketUtils {
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSObjectBucketUtils.class);
+
+ private OBSObjectBucketUtils() {
+
+ }
+
+ /**
+ * The inner rename operation.
+ *
+ * @param owner OBS File System instance
+ * @param src path to be renamed
+ * @param dst new path after rename
+ * @return boolean
+ * @throws RenameFailedException if some criteria for a state changing rename
+ * was not met. This means work didn't happen;
+ * it's not something which is reported upstream
+ * to the FileSystem APIs, for which the
+ * semantics of "false" are pretty vague.
+ * @throws FileNotFoundException there's no source file.
+ * @throws IOException on IO failure.
+ * @throws ObsException on failures inside the OBS SDK
+ */
+ static boolean renameBasedOnObject(final OBSFileSystem owner,
+ final Path src, final Path dst) throws RenameFailedException,
+ FileNotFoundException, IOException,
+ ObsException {
+ String srcKey = OBSCommonUtils.pathToKey(owner, src);
+ String dstKey = OBSCommonUtils.pathToKey(owner, dst);
+
+ if (srcKey.isEmpty()) {
+ LOG.error("rename: src [{}] is root directory", src);
+ throw new IOException(src + " is root directory");
+ }
+
+ // get the source file status; this raises a FNFE if there is no source
+ // file.
+ FileStatus srcStatus = owner.getFileStatus(src);
+
+ FileStatus dstStatus;
+ try {
+ dstStatus = owner.getFileStatus(dst);
+ // if there is no destination entry, an exception is raised.
+ // hence this code sequence can assume that there is something
+ // at the end of the path; the only detail being what it is and
+ // whether or not it can be the destination of the rename.
+ if (dstStatus.isDirectory()) {
+ String newDstKey = OBSCommonUtils.maybeAddTrailingSlash(dstKey);
+ String filename = srcKey.substring(
+ OBSCommonUtils.pathToKey(owner, src.getParent()).length()
+ + 1);
+ newDstKey = newDstKey + filename;
+ dstKey = newDstKey;
+ dstStatus = owner.getFileStatus(
+ OBSCommonUtils.keyToPath(dstKey));
+ if (dstStatus.isDirectory()) {
+ throw new RenameFailedException(src, dst,
+ "new destination is an existed directory")
+ .withExitCode(false);
+ } else {
+ throw new RenameFailedException(src, dst,
+ "new destination is an existed file")
+ .withExitCode(false);
+ }
+ } else {
+
+ if (srcKey.equals(dstKey)) {
+ LOG.warn(
+ "rename: src and dest refer to the same file or"
+ + " directory: {}",
+ dst);
+ return true;
+ } else {
+ throw new RenameFailedException(src, dst,
+ "destination is an existed file")
+ .withExitCode(false);
+ }
+ }
+ } catch (FileNotFoundException e) {
+ LOG.debug("rename: destination path {} not found", dst);
+
+ // Parent must exist
+ checkDestinationParent(owner, src, dst);
+ }
+
+ if (dstKey.startsWith(srcKey)
+ && dstKey.charAt(srcKey.length()) == Path.SEPARATOR_CHAR) {
+ LOG.error("rename: dest [{}] cannot be a descendant of src [{}]",
+ dst, src);
+ return false;
+ }
+
+ // Ok! Time to start
+ if (srcStatus.isFile()) {
+ LOG.debug("rename: renaming file {} to {}", src, dst);
+
+ renameFile(owner, srcKey, dstKey, srcStatus);
+ } else {
+ LOG.debug("rename: renaming directory {} to {}", src, dst);
+
+ // This is a directory to directory copy
+ dstKey = OBSCommonUtils.maybeAddTrailingSlash(dstKey);
+ srcKey = OBSCommonUtils.maybeAddTrailingSlash(srcKey);
+
+ renameFolder(owner, srcKey, dstKey);
+ }
+
+ if (src.getParent() != dst.getParent()) {
+ // deleteUnnecessaryFakeDirectories(dst.getParent());
+ createFakeDirectoryIfNecessary(owner, src.getParent());
+ }
+
+ return true;
+ }
+
+ private static void checkDestinationParent(final OBSFileSystem owner,
+ final Path src,
+ final Path dst) throws IOException {
+ Path parent = dst.getParent();
+ if (!OBSCommonUtils.pathToKey(owner, parent).isEmpty()) {
+ try {
+ FileStatus dstParentStatus = owner.getFileStatus(
+ dst.getParent());
+ if (!dstParentStatus.isDirectory()) {
+ throw new ParentNotDirectoryException(
+ "destination parent [" + dst.getParent()
+ + "] is not a directory");
+ }
+ } catch (FileNotFoundException e2) {
+ throw new RenameFailedException(src, dst,
+ "destination has no parent ");
+ }
+ }
+ }
+
+ /**
+ * Implement rename file.
+ *
+ * @param owner OBS File System instance
+ * @param srcKey source object key
+ * @param dstKey destination object key
+ * @param srcStatus source object status
+ * @throws IOException any problem with rename operation
+ */
+ private static void renameFile(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey,
+ final FileStatus srcStatus)
+ throws IOException {
+ long startTime = System.nanoTime();
+
+ copyFile(owner, srcKey, dstKey, srcStatus.getLen());
+ objectDelete(owner, srcStatus, false);
+
+ if (LOG.isDebugEnabled()) {
+ long delay = System.nanoTime() - startTime;
+ LOG.debug("OBSFileSystem rename: "
+ + ", {src="
+ + srcKey
+ + ", dst="
+ + dstKey
+ + ", delay="
+ + delay
+ + "}");
+ }
+ }
+
+ static boolean objectDelete(final OBSFileSystem owner,
+ final FileStatus status,
+ final boolean recursive) throws IOException {
+ Path f = status.getPath();
+ String key = OBSCommonUtils.pathToKey(owner, f);
+
+ if (status.isDirectory()) {
+ LOG.debug("delete: Path is a directory: {} - recursive {}", f,
+ recursive);
+
+ key = OBSCommonUtils.maybeAddTrailingSlash(key);
+ if (!key.endsWith("/")) {
+ key = key + "/";
+ }
+
+ boolean isEmptyDir = OBSCommonUtils.isFolderEmpty(owner, key);
+ if (key.equals("/")) {
+ return OBSCommonUtils.rejectRootDirectoryDelete(
+ owner.getBucket(), isEmptyDir, recursive);
+ }
+
+ if (!recursive && !isEmptyDir) {
+ throw new PathIsNotEmptyDirectoryException(f.toString());
+ }
+
+ if (isEmptyDir) {
+ LOG.debug(
+ "delete: Deleting fake empty directory {} - recursive {}",
+ f, recursive);
+ OBSCommonUtils.deleteObject(owner, key);
+ } else {
+ LOG.debug(
+ "delete: Deleting objects for directory prefix {} "
+ + "- recursive {}",
+ f, recursive);
+ deleteNonEmptyDir(owner, recursive, key);
+ }
+
+ } else {
+ LOG.debug("delete: Path is a file");
+ OBSCommonUtils.deleteObject(owner, key);
+ }
+
+ Path parent = f.getParent();
+ if (parent != null) {
+ createFakeDirectoryIfNecessary(owner, parent);
+ }
+ return true;
+ }
+
+ /**
+ * Implement rename folder.
+ *
+ * @param owner OBS File System instance
+ * @param srcKey source folder key
+ * @param dstKey destination folder key
+ * @throws IOException any problem with rename folder
+ */
+ static void renameFolder(final OBSFileSystem owner, final String srcKey,
+ final String dstKey)
+ throws IOException {
+ long startTime = System.nanoTime();
+
+ List<KeyAndVersion> keysToDelete = new ArrayList<>();
+
+ createFakeDirectory(owner, dstKey);
+
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setBucketName(owner.getBucket());
+ request.setPrefix(srcKey);
+ request.setMaxKeys(owner.getMaxKeys());
+
+ ObjectListing objects = OBSCommonUtils.listObjects(owner, request);
+
+ List<Future<CopyObjectResult>> copyfutures = new LinkedList<>();
+ while (true) {
+ for (ObsObject summary : objects.getObjects()) {
+ if (summary.getObjectKey().equals(srcKey)) {
+ // skip prefix itself
+ continue;
+ }
+
+ keysToDelete.add(new KeyAndVersion(summary.getObjectKey()));
+ String newDstKey = dstKey + summary.getObjectKey()
+ .substring(srcKey.length());
+ // copyFile(summary.getObjectKey(), newDstKey,
+ // summary.getMetadata().getContentLength());
+ copyfutures.add(
+ copyFileAsync(owner, summary.getObjectKey(), newDstKey,
+ summary.getMetadata().getContentLength()));
+
+ if (keysToDelete.size() == owner.getMaxEntriesToDelete()) {
+ waitAllCopyFinished(copyfutures);
+ copyfutures.clear();
+ }
+ }
+
+ if (!objects.isTruncated()) {
+ if (!keysToDelete.isEmpty()) {
+ waitAllCopyFinished(copyfutures);
+ copyfutures.clear();
+ }
+ break;
+ }
+ objects = OBSCommonUtils.continueListObjects(owner, objects);
+ }
+
+ keysToDelete.add(new KeyAndVersion(srcKey));
+
+ DeleteObjectsRequest deleteObjectsRequest = new DeleteObjectsRequest(
+ owner.getBucket());
+ deleteObjectsRequest.setKeyAndVersions(
+ keysToDelete.toArray(new KeyAndVersion[0]));
+ OBSCommonUtils.deleteObjects(owner, deleteObjectsRequest);
+
+ if (LOG.isDebugEnabled()) {
+ long delay = System.nanoTime() - startTime;
+ LOG.debug(
+ "OBSFileSystem rename: "
+ + ", {src="
+ + srcKey
+ + ", dst="
+ + dstKey
+ + ", delay="
+ + delay
+ + "}");
+ }
+ }
+
+ private static void waitAllCopyFinished(
+ final List> copyFutures)
+ throws IOException {
+ try {
+ for (Future<CopyObjectResult> copyFuture : copyFutures) {
+ copyFuture.get();
+ }
+ } catch (InterruptedException e) {
+ LOG.warn("Interrupted while copying objects (copy)");
+ throw new InterruptedIOException(
+ "Interrupted while copying objects (copy)");
+ } catch (ExecutionException e) {
+ for (Future<CopyObjectResult> future : copyFutures) {
+ future.cancel(true);
+ }
+
+ throw OBSCommonUtils.extractException(
+ "waitAllCopyFinished", copyFutures.toString(), e);
+ }
+ }
+
+ /**
+ * Request object metadata; increments counters in the process.
+ *
+ * @param owner OBS File System instance
+ * @param key key
+ * @return the metadata
+ */
+ protected static ObjectMetadata getObjectMetadata(final OBSFileSystem owner,
+ final String key) {
+ GetObjectMetadataRequest request = new GetObjectMetadataRequest();
+ request.setBucketName(owner.getBucket());
+ request.setObjectKey(key);
+ if (owner.getSse().isSseCEnable()) {
+ request.setSseCHeader(owner.getSse().getSseCHeader());
+ }
+ ObjectMetadata meta = owner.getObsClient().getObjectMetadata(request);
+ owner.getSchemeStatistics().incrementReadOps(1);
+ return meta;
+ }
+
+ /**
+ * Create a new object metadata instance. Any standard metadata headers are
+ * added here, for example: encryption.
+ *
+ * @param length length of data to set in header.
+ * @return a new metadata instance
+ */
+ static ObjectMetadata newObjectMetadata(final long length) {
+ final ObjectMetadata om = new ObjectMetadata();
+ if (length >= 0) {
+ om.setContentLength(length);
+ }
+ return om;
+ }
+
+ private static void deleteNonEmptyDir(final OBSFileSystem owner,
+ final boolean recursive, final String key) throws IOException {
+ String delimiter = recursive ? null : "/";
+ ListObjectsRequest request = OBSCommonUtils.createListObjectsRequest(
+ owner, key, delimiter);
+
+ ObjectListing objects = OBSCommonUtils.listObjects(owner, request);
+ List<KeyAndVersion> keys = new ArrayList<>(objects.getObjects().size());
+ while (true) {
+ for (ObsObject summary : objects.getObjects()) {
+ if (summary.getObjectKey().equals(key)) {
+ // skip prefix itself
+ continue;
+ }
+
+ keys.add(new KeyAndVersion(summary.getObjectKey()));
+ LOG.debug("Got object to delete {}", summary.getObjectKey());
+
+ if (keys.size() == owner.getMaxEntriesToDelete()) {
+ OBSCommonUtils.removeKeys(owner, keys, true, true);
+ }
+ }
+
+ if (!objects.isTruncated()) {
+ keys.add(new KeyAndVersion(key));
+ OBSCommonUtils.removeKeys(owner, keys, false, true);
+
+ break;
+ }
+ objects = OBSCommonUtils.continueListObjects(owner, objects);
+ }
+ }
+
+ static void createFakeDirectoryIfNecessary(final OBSFileSystem owner,
+ final Path f)
+ throws IOException, ObsException {
+
+ String key = OBSCommonUtils.pathToKey(owner, f);
+ if (!key.isEmpty() && !owner.exists(f)) {
+ LOG.debug("Creating new fake directory at {}", f);
+ createFakeDirectory(owner, key);
+ }
+ }
+
+ static void createFakeDirectory(final OBSFileSystem owner,
+ final String objectName)
+ throws ObsException, IOException {
+ String newObjectName = objectName;
+ newObjectName = OBSCommonUtils.maybeAddTrailingSlash(newObjectName);
+ createEmptyObject(owner, newObjectName);
+ }
+
+ // Used to create an empty file that represents an empty directory
+ private static void createEmptyObject(final OBSFileSystem owner,
+ final String objectName)
+ throws ObsException, IOException {
+ for (int retryTime = 1;
+ retryTime < OBSCommonUtils.MAX_RETRY_TIME; retryTime++) {
+ try {
+ innerCreateEmptyObject(owner, objectName);
+ return;
+ } catch (ObsException e) {
+ LOG.warn("Failed to create empty object [{}], retry time [{}], "
+ + "exception [{}]", objectName, retryTime, e);
+ try {
+ Thread.sleep(OBSCommonUtils.DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ innerCreateEmptyObject(owner, objectName);
+ }
+
+ // Used to create an empty file that represents an empty directory
+ private static void innerCreateEmptyObject(final OBSFileSystem owner,
+ final String objectName)
+ throws ObsException, IOException {
+ final InputStream im =
+ new InputStream() {
+ @Override
+ public int read() {
+ return -1;
+ }
+ };
+
+ PutObjectRequest putObjectRequest = OBSCommonUtils
+ .newPutObjectRequest(owner, objectName, newObjectMetadata(0L), im);
+
+ long len;
+ if (putObjectRequest.getFile() != null) {
+ len = putObjectRequest.getFile().length();
+ } else {
+ len = putObjectRequest.getMetadata().getContentLength();
+ }
+
+ try {
+ owner.getObsClient().putObject(putObjectRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ owner.getSchemeStatistics().incrementBytesWritten(len);
+ } finally {
+ im.close();
+ }
+ }
+
+ /**
+ * Copy a single object in the bucket via a COPY operation.
+ *
+ * @param owner OBS File System instance
+ * @param srcKey source object path
+ * @param dstKey destination object path
+ * @param size object size
+ * @throws InterruptedIOException the operation was interrupted
+ * @throws IOException Other IO problems
+ */
+ private static void copyFile(final OBSFileSystem owner, final String srcKey,
+ final String dstKey, final long size)
+ throws IOException, InterruptedIOException {
+ for (int retryTime = 1;
+ retryTime < OBSCommonUtils.MAX_RETRY_TIME; retryTime++) {
+ try {
+ innerCopyFile(owner, srcKey, dstKey, size);
+ return;
+ } catch (InterruptedIOException e) {
+ throw e;
+ } catch (IOException e) {
+ LOG.warn(
+ "Failed to copy file from [{}] to [{}] with size [{}], "
+ + "retry time [{}], exception [{}]", srcKey, dstKey,
+ size, retryTime, e);
+ try {
+ Thread.sleep(OBSCommonUtils.DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ innerCopyFile(owner, srcKey, dstKey, size);
+ }
+
+ private static void innerCopyFile(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey, final long size)
+ throws IOException {
+ LOG.debug("copyFile {} -> {} ", srcKey, dstKey);
+ try {
+ // 100MB per part
+ if (size > owner.getCopyPartSize()) {
+ // initial copy part task
+ InitiateMultipartUploadRequest request
+ = new InitiateMultipartUploadRequest(owner.getBucket(),
+ dstKey);
+ request.setAcl(owner.getCannedACL());
+ if (owner.getSse().isSseCEnable()) {
+ request.setSseCHeader(owner.getSse().getSseCHeader());
+ } else if (owner.getSse().isSseKmsEnable()) {
+ request.setSseKmsHeader(owner.getSse().getSseKmsHeader());
+ }
+ InitiateMultipartUploadResult result = owner.getObsClient()
+ .initiateMultipartUpload(request);
+
+ final String uploadId = result.getUploadId();
+ LOG.debug("Multipart copy file, uploadId: {}", uploadId);
+ // count the parts
+ long partCount = calPartCount(owner.getCopyPartSize(), size);
+
+ final List<PartEtag> partEtags =
+ getCopyFilePartEtags(owner, srcKey, dstKey, size, uploadId,
+ partCount);
+ // merge the copy parts
+ CompleteMultipartUploadRequest completeMultipartUploadRequest =
+ new CompleteMultipartUploadRequest(owner.getBucket(),
+ dstKey, uploadId, partEtags);
+ owner.getObsClient()
+ .completeMultipartUpload(completeMultipartUploadRequest);
+ } else {
+ ObjectMetadata srcom = getObjectMetadata(owner, srcKey);
+ ObjectMetadata dstom = cloneObjectMetadata(srcom);
+ final CopyObjectRequest copyObjectRequest =
+ new CopyObjectRequest(owner.getBucket(), srcKey,
+ owner.getBucket(), dstKey);
+ copyObjectRequest.setAcl(owner.getCannedACL());
+ copyObjectRequest.setNewObjectMetadata(dstom);
+ if (owner.getSse().isSseCEnable()) {
+ copyObjectRequest.setSseCHeader(
+ owner.getSse().getSseCHeader());
+ copyObjectRequest.setSseCHeaderSource(
+ owner.getSse().getSseCHeader());
+ } else if (owner.getSse().isSseKmsEnable()) {
+ copyObjectRequest.setSseKmsHeader(
+ owner.getSse().getSseKmsHeader());
+ }
+ owner.getObsClient().copyObject(copyObjectRequest);
+ }
+
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(
+ "copyFile(" + srcKey + ", " + dstKey + ")", srcKey, e);
+ }
+ }
+
+ static int calPartCount(final long partSize, final long cloudSize) {
+ // get user setting of per-copy part size, default is 100MB
+ // calculate the part count
+ long partCount = cloudSize % partSize == 0
+ ? cloudSize / partSize
+ : cloudSize / partSize + 1;
+ return (int) partCount;
+ }
+
+ static List<PartEtag> getCopyFilePartEtags(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey,
+ final long objectSize,
+ final String uploadId,
+ final long partCount)
+ throws IOException {
+ final List<PartEtag> partEtags = Collections.synchronizedList(
+ new ArrayList<>());
+ final List<Future<?>> partCopyFutures = new ArrayList<>();
+ submitCopyPartTasks(owner, srcKey, dstKey, objectSize, uploadId,
+ partCount, partEtags, partCopyFutures);
+
+ // wait the tasks for completing
+ try {
+ for (Future<?> partCopyFuture : partCopyFutures) {
+ partCopyFuture.get();
+ }
+ } catch (InterruptedException e) {
+ LOG.warn("Interrupted while copying objects (copy)");
+ throw new InterruptedIOException(
+ "Interrupted while copying objects (copy)");
+ } catch (ExecutionException e) {
+ LOG.error("Multipart copy file exception.", e);
+ for (Future<?> future : partCopyFutures) {
+ future.cancel(true);
+ }
+
+ owner.getObsClient()
+ .abortMultipartUpload(
+ new AbortMultipartUploadRequest(owner.getBucket(), dstKey,
+ uploadId));
+
+ throw OBSCommonUtils.extractException(
+ "Multi-part copy with id '" + uploadId + "' from " + srcKey
+ + "to " + dstKey, dstKey, e);
+ }
+
+ // Make part numbers in ascending order
+ partEtags.sort(Comparator.comparingInt(PartEtag::getPartNumber));
+ return partEtags;
+ }
+
+ @SuppressWarnings("checkstyle:parameternumber")
+ private static void submitCopyPartTasks(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey,
+ final long objectSize,
+ final String uploadId,
+ final long partCount,
+ final List<PartEtag> partEtags,
+ final List<Future<?>> partCopyFutures) {
+ for (int i = 0; i < partCount; i++) {
+ final long rangeStart = i * owner.getCopyPartSize();
+ final long rangeEnd = (i + 1 == partCount)
+ ? objectSize - 1
+ : rangeStart + owner.getCopyPartSize() - 1;
+ final int partNumber = i + 1;
+ partCopyFutures.add(
+ owner.getBoundedCopyPartThreadPool().submit(() -> {
+ CopyPartRequest request = new CopyPartRequest();
+ request.setUploadId(uploadId);
+ request.setSourceBucketName(owner.getBucket());
+ request.setSourceObjectKey(srcKey);
+ request.setDestinationBucketName(owner.getBucket());
+ request.setDestinationObjectKey(dstKey);
+ request.setByteRangeStart(rangeStart);
+ request.setByteRangeEnd(rangeEnd);
+ request.setPartNumber(partNumber);
+ if (owner.getSse().isSseCEnable()) {
+ request.setSseCHeaderSource(
+ owner.getSse().getSseCHeader());
+ request.setSseCHeaderDestination(
+ owner.getSse().getSseCHeader());
+ }
+ CopyPartResult result = owner.getObsClient()
+ .copyPart(request);
+ partEtags.add(
+ new PartEtag(result.getEtag(), result.getPartNumber()));
+ LOG.debug(
+ "Multipart copy file, uploadId: {}, Part#{} done.",
+ uploadId, partNumber);
+ }));
+ }
+ }
+
+ /**
+ * Creates a copy of the passed {@link ObjectMetadata}. Does so without using
+ * the {@link ObjectMetadata#clone()} method, to avoid copying unnecessary
+ * headers.
+ *
+ * @param source the {@link ObjectMetadata} to copy
+ * @return a copy of {@link ObjectMetadata} with only relevant attributes
+ */
+ private static ObjectMetadata cloneObjectMetadata(
+ final ObjectMetadata source) {
+ // This approach may be too brittle, especially if
+ // in future there are new attributes added to ObjectMetadata
+ // that we do not explicitly call to set here
+ ObjectMetadata ret = newObjectMetadata(source.getContentLength());
+
+ if (source.getContentEncoding() != null) {
+ ret.setContentEncoding(source.getContentEncoding());
+ }
+ return ret;
+ }
+
+ static OBSFileStatus innerGetObjectStatus(final OBSFileSystem owner,
+ final Path f)
+ throws IOException {
+ final Path path = OBSCommonUtils.qualify(owner, f);
+ String key = OBSCommonUtils.pathToKey(owner, path);
+ LOG.debug("Getting path status for {} ({})", path, key);
+ if (!StringUtils.isEmpty(key)) {
+ try {
+ ObjectMetadata meta = getObjectMetadata(owner, key);
+
+ if (OBSCommonUtils.objectRepresentsDirectory(key,
+ meta.getContentLength())) {
+ LOG.debug("Found exact file: fake directory");
+ return new OBSFileStatus(path, owner.getUsername());
+ } else {
+ LOG.debug("Found exact file: normal file");
+ return new OBSFileStatus(meta.getContentLength(),
+ OBSCommonUtils.dateToLong(meta.getLastModified()),
+ path, owner.getDefaultBlockSize(path),
+ owner.getUsername());
+ }
+ } catch (ObsException e) {
+ if (e.getResponseCode() != OBSCommonUtils.NOT_FOUND_CODE) {
+ throw OBSCommonUtils.translateException("getFileStatus",
+ path, e);
+ }
+ }
+
+ if (!key.endsWith("/")) {
+ String newKey = key + "/";
+ try {
+ ObjectMetadata meta = getObjectMetadata(owner, newKey);
+
+ if (OBSCommonUtils.objectRepresentsDirectory(newKey,
+ meta.getContentLength())) {
+ LOG.debug("Found file (with /): fake directory");
+ return new OBSFileStatus(path, owner.getUsername());
+ } else {
+ LOG.debug(
+ "Found file (with /): real file? should not "
+ + "happen: {}",
+ key);
+
+ return new OBSFileStatus(meta.getContentLength(),
+ OBSCommonUtils.dateToLong(meta.getLastModified()),
+ path,
+ owner.getDefaultBlockSize(path),
+ owner.getUsername());
+ }
+ } catch (ObsException e) {
+ if (e.getResponseCode() != OBSCommonUtils.NOT_FOUND_CODE) {
+ throw OBSCommonUtils.translateException("getFileStatus",
+ newKey, e);
+ }
+ }
+ }
+ }
+
+ try {
+ boolean isEmpty = OBSCommonUtils.innerIsFolderEmpty(owner, key);
+ LOG.debug("Is dir ({}) empty? {}", path, isEmpty);
+ return new OBSFileStatus(path, owner.getUsername());
+ } catch (ObsException e) {
+ if (e.getResponseCode() != OBSCommonUtils.NOT_FOUND_CODE) {
+ throw OBSCommonUtils.translateException("getFileStatus", key,
+ e);
+ }
+ }
+
+ LOG.debug("Not Found: {}", path);
+ throw new FileNotFoundException("No such file or directory: " + path);
+ }
+
+ static ContentSummary getDirectoryContentSummary(final OBSFileSystem owner,
+ final String key) throws IOException {
+ String newKey = key;
+ newKey = OBSCommonUtils.maybeAddTrailingSlash(newKey);
+ long[] summary = {0, 0, 1};
+ LOG.debug("Summary key {}", newKey);
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setBucketName(owner.getBucket());
+ request.setPrefix(newKey);
+ Set<String> directories = new TreeSet<>();
+ request.setMaxKeys(owner.getMaxKeys());
+ ObjectListing objects = OBSCommonUtils.listObjects(owner, request);
+ while (true) {
+ if (!objects.getCommonPrefixes().isEmpty() || !objects.getObjects()
+ .isEmpty()) {
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Found path as directory (with /): {}/{}",
+ objects.getCommonPrefixes().size(),
+ objects.getObjects().size());
+ }
+ for (String prefix : objects.getCommonPrefixes()) {
+ LOG.debug("Objects in folder [" + prefix + "]:");
+ getDirectories(prefix, newKey, directories);
+ }
+
+ for (ObsObject obj : objects.getObjects()) {
+ LOG.debug("Summary: {} {}", obj.getObjectKey(),
+ obj.getMetadata().getContentLength());
+ if (!obj.getObjectKey().endsWith("/")) {
+ summary[0] += obj.getMetadata().getContentLength();
+ summary[1] += 1;
+ }
+ getDirectories(obj.getObjectKey(), newKey, directories);
+ }
+ }
+ if (!objects.isTruncated()) {
+ break;
+ }
+ objects = OBSCommonUtils.continueListObjects(owner, objects);
+ }
+ summary[2] += directories.size();
+ LOG.debug(String.format(
+ "file size [%d] - file count [%d] - directory count [%d] - "
+ + "file path [%s]",
+ summary[0],
+ summary[1], summary[2], newKey));
+ return new ContentSummary.Builder().length(summary[0])
+ .fileCount(summary[1]).directoryCount(summary[2])
+ .spaceConsumed(summary[0]).build();
+ }
+
+ private static void getDirectories(final String key, final String sourceKey,
+ final Set<String> directories) {
+ Path p = new Path(key);
+ Path sourcePath = new Path(sourceKey);
+ // the directory itself must be added first
+ if (key.endsWith("/") && p.compareTo(sourcePath) > 0) {
+ directories.add(p.toString());
+ }
+ while (p.compareTo(sourcePath) > 0) {
+ Optional<Path> parent = p.getOptionalParentPath();
+ if (!parent.isPresent()) {
+ break;
+ }
+ p = parent.get();
+ if (p.compareTo(sourcePath) == 0) {
+ break;
+ }
+ directories.add(p.toString());
+ }
+ }
+
+ private static Future<?> copyFileAsync(
+ final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey, final long size) {
+ return owner.getBoundedCopyThreadPool().submit(() -> {
+ copyFile(owner, srcKey, dstKey, size);
+ return null;
+ });
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSPosixBucketUtils.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSPosixBucketUtils.java
new file mode 100644
index 0000000000000..d6afd456969d5
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSPosixBucketUtils.java
@@ -0,0 +1,745 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.KeyAndVersion;
+import com.obs.services.model.ListObjectsRequest;
+import com.obs.services.model.ObjectListing;
+import com.obs.services.model.ObsObject;
+import com.obs.services.model.fs.GetAttributeRequest;
+import com.obs.services.model.fs.NewFolderRequest;
+import com.obs.services.model.fs.ObsFSAttribute;
+import com.obs.services.model.fs.RenameRequest;
+
+import org.apache.hadoop.fs.ContentSummary;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.ParentNotDirectoryException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Date;
+import java.util.List;
+
+/**
+ * Posix bucket specific utils for {@link OBSFileSystem}.
+ */
+final class OBSPosixBucketUtils {
+ /**
+ * Class logger.
+ */
+ private static final Logger LOG = LoggerFactory.getLogger(
+ OBSPosixBucketUtils.class);
+
+ private OBSPosixBucketUtils() {
+ }
+
+ /**
+ * Get the depth of an absolute path, that is the number of '/' in the path.
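+ * For example, "dir1/dir2/file" has depth 2 and "dir1/dir2/" has depth 1
+ * (a trailing '/' is not counted).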
+ *
+ * @param key object key
+ * @return depth
+ */
+ static int fsGetObjectKeyDepth(final String key) {
+ int depth = 0;
+ for (int idx = key.indexOf('/');
+ idx >= 0; idx = key.indexOf('/', idx + 1)) {
+ depth++;
+ }
+ return key.endsWith("/") ? depth - 1 : depth;
+ }
+
+ /**
+ * Used to judge whether an object is a file or a folder.
+ *
+ * @param attr posix object attribute
+ * @return is posix folder
+ */
+ static boolean fsIsFolder(final ObsFSAttribute attr) {
+ final int ifDir = 0x004000;
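+ // 0x4000 is the S_IFDIR bit of a POSIX file mode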
+ int mode = attr.getMode();
+ // object mode is -1 when the object is migrated from
+ // object bucket to posix bucket.
+ // a mode of -1 is treated as a file, not a folder.
+ if (mode < 0) {
+ return false;
+ }
+
+ return (mode & ifDir) != 0;
+ }
+
+ /**
+ * The inner rename operation based on Posix bucket.
+ *
+ * @param owner OBS File System instance
+ * @param src source path to be renamed from
+ * @param dst destination path to be renamed to
+ * @return boolean
+ * @throws RenameFailedException if some criteria for a state changing rename
+ * was not met. This means work didn't happen;
+ * it's not something which is reported upstream
+ * to the FileSystem APIs, for which the
+ * semantics of "false" are pretty vague.
+ * @throws IOException on IO failure.
+ */
+ static boolean renameBasedOnPosix(final OBSFileSystem owner, final Path src,
+ final Path dst) throws IOException {
+ Path dstPath = dst;
+ String srcKey = OBSCommonUtils.pathToKey(owner, src);
+ String dstKey = OBSCommonUtils.pathToKey(owner, dstPath);
+
+ if (srcKey.isEmpty()) {
+ LOG.error("rename: src [{}] is root directory", src);
+ return false;
+ }
+
+ try {
+ FileStatus dstStatus = owner.getFileStatus(dstPath);
+ if (dstStatus.isDirectory()) {
+ String newDstString = OBSCommonUtils.maybeAddTrailingSlash(
+ dstPath.toString());
+ String filename = srcKey.substring(
+ OBSCommonUtils.pathToKey(owner, src.getParent())
+ .length() + 1);
+ dstPath = new Path(newDstString + filename);
+ dstKey = OBSCommonUtils.pathToKey(owner, dstPath);
+ LOG.debug(
+ "rename: dest is an existing directory and will be "
+ + "changed to [{}]", dstPath);
+
+ if (owner.exists(dstPath)) {
+ LOG.error("rename: failed to rename " + src + " to "
+ + dstPath
+ + " because destination exists");
+ return false;
+ }
+ } else {
+ if (srcKey.equals(dstKey)) {
+ LOG.warn(
+ "rename: src and dest refer to the same "
+ + "file or directory: {}", dstPath);
+ return true;
+ } else {
+ LOG.error("rename: failed to rename " + src + " to "
+ + dstPath
+ + " because destination exists");
+ return false;
+ }
+ }
+ } catch (FileNotFoundException e) {
+ // if destination does not exist, do not change the
+ // destination key, and just do rename.
+ LOG.debug("rename: dest [{}] does not exist", dstPath);
+ } catch (FileConflictException e) {
+ Path parent = dstPath.getParent();
+ if (!OBSCommonUtils.pathToKey(owner, parent).isEmpty()) {
+ FileStatus dstParentStatus = owner.getFileStatus(parent);
+ if (!dstParentStatus.isDirectory()) {
+ throw new ParentNotDirectoryException(
+ parent + " is not a directory");
+ }
+ }
+ }
+
+ if (dstKey.startsWith(srcKey) && (dstKey.equals(srcKey)
+ || dstKey.charAt(srcKey.length()) == Path.SEPARATOR_CHAR)) {
+ LOG.error("rename: dest [{}] cannot be a descendant of src [{}]",
+ dstPath, src);
+ return false;
+ }
+
+ return innerFsRenameWithRetry(owner, src, dstPath, srcKey, dstKey);
+ }
+
+ private static boolean innerFsRenameWithRetry(final OBSFileSystem owner,
+ final Path src,
+ final Path dst, final String srcKey, final String dstKey)
+ throws IOException {
+ boolean renameResult = true;
+ int retryTime = 1;
+ while (retryTime <= OBSCommonUtils.MAX_RETRY_TIME) {
+ try {
+ LOG.debug("rename: {}-st rename from [{}] to [{}] ...",
+ retryTime, srcKey, dstKey);
+ innerFsRenameFile(owner, srcKey, dstKey);
+ renameResult = true;
+ break;
+ } catch (FileNotFoundException e) {
+ if (owner.exists(dst)) {
+ LOG.warn(
+ "rename: successfully {}-st rename src [{}] "
+ + "to dest [{}] with SDK retry",
+ retryTime, src, dst, e);
+ renameResult = true;
+ } else {
+ LOG.error(
+ "rename: failed {}-st rename src [{}] to dest [{}]",
+ retryTime, src, dst, e);
+ renameResult = false;
+ }
+ break;
+ } catch (IOException e) {
+ if (retryTime == OBSCommonUtils.MAX_RETRY_TIME) {
+ LOG.error(
+ "rename: failed {}-st rename src [{}] to dest [{}]",
+ retryTime, src, dst, e);
+ throw e;
+ } else {
+ LOG.warn(
+ "rename: failed {}-st rename src [{}] to dest [{}]",
+ retryTime, src, dst, e);
+ if (owner.exists(dst) && owner.exists(src)) {
+ LOG.warn(
+ "rename: failed {}-st rename src [{}] to "
+ + "dest [{}] with SDK retry", retryTime, src,
+ dst, e);
+ renameResult = false;
+ break;
+ }
+
+ try {
+ Thread.sleep(OBSCommonUtils.DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ retryTime++;
+ }
+
+ return renameResult;
+ }
+
+ /**
+ * Used to rename a source folder to a destination folder that does not
+ * exist before the rename.
+ *
+ * @param owner OBS File System instance
+ * @param src source folder key
+ * @param dst destination folder key that does not exist before the rename
+ * @throws IOException any io exception
+ * @throws ObsException any obs operation exception
+ */
+ static void fsRenameToNewFolder(final OBSFileSystem owner, final String src,
+ final String dst)
+ throws IOException, ObsException {
+ LOG.debug("RenameFolder path {} to {}", src, dst);
+
+ try {
+ RenameRequest renameObjectRequest = new RenameRequest();
+ renameObjectRequest.setBucketName(owner.getBucket());
+ renameObjectRequest.setObjectKey(src);
+ renameObjectRequest.setNewObjectKey(dst);
+ owner.getObsClient().renameFolder(renameObjectRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException(
+ "renameFile(" + src + ", " + dst + ")", src, e);
+ }
+ }
+
+ static void innerFsRenameFile(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey) throws IOException {
+ LOG.debug("RenameFile path {} to {}", srcKey, dstKey);
+
+ try {
+ final RenameRequest renameObjectRequest = new RenameRequest();
+ renameObjectRequest.setBucketName(owner.getBucket());
+ renameObjectRequest.setObjectKey(srcKey);
+ renameObjectRequest.setNewObjectKey(dstKey);
+ owner.getObsClient().renameFile(renameObjectRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ } catch (ObsException e) {
+ if (e.getResponseCode() == OBSCommonUtils.NOT_FOUND_CODE) {
+ throw new FileNotFoundException(
+ "No such file or directory: " + srcKey);
+ }
+ if (e.getResponseCode() == OBSCommonUtils.CONFLICT_CODE) {
+ throw new FileConflictException(
+ "File conflicts during rename, " + e.getResponseStatus());
+ }
+ throw OBSCommonUtils.translateException(
+ "renameFile(" + srcKey + ", " + dstKey + ")", srcKey, e);
+ }
+ }
+
+ /**
+ * Used to rename a source object to a destination object which does not
+ * exist before the rename.
+ *
+ * @param owner OBS File System instance
+ * @param srcKey source object key
+ * @param dstKey destination object key
+ * @throws IOException io exception
+ */
+ static void fsRenameToNewObject(final OBSFileSystem owner,
+ final String srcKey,
+ final String dstKey) throws IOException {
+ String newSrcKey = srcKey;
+ String newdstKey = dstKey;
+ newSrcKey = OBSCommonUtils.maybeDeleteBeginningSlash(newSrcKey);
+ newdstKey = OBSCommonUtils.maybeDeleteBeginningSlash(newdstKey);
+ if (newSrcKey.endsWith("/")) {
+ // Rename folder.
+ fsRenameToNewFolder(owner, newSrcKey, newdstKey);
+ } else {
+ // Rename file.
+ innerFsRenameFile(owner, newSrcKey, newdstKey);
+ }
+ }
+
+ // Delete a file.
+ private static int fsRemoveFile(final OBSFileSystem owner,
+ final String sonObjectKey,
+ final List<KeyAndVersion> files)
+ throws IOException {
+ files.add(new KeyAndVersion(sonObjectKey));
+ if (files.size() == owner.getMaxEntriesToDelete()) {
+ // batch delete files.
+ OBSCommonUtils.removeKeys(owner, files, true, false);
+ return owner.getMaxEntriesToDelete();
+ }
+ return 0;
+ }
+
+ // Recursively delete a folder that might not be empty.
+ static boolean fsDelete(final OBSFileSystem owner, final FileStatus status,
+ final boolean recursive)
+ throws IOException, ObsException {
+ long startTime = System.currentTimeMillis();
+ long threadId = Thread.currentThread().getId();
+ Path f = status.getPath();
+ String key = OBSCommonUtils.pathToKey(owner, f);
+
+ if (!status.isDirectory()) {
+ LOG.debug("delete: Path is a file");
+ trashObjectIfNeed(owner, key);
+ } else {
+ LOG.debug("delete: Path is a directory: {} - recursive {}", f,
+ recursive);
+ key = OBSCommonUtils.maybeAddTrailingSlash(key);
+ boolean isEmptyDir = OBSCommonUtils.isFolderEmpty(owner, key);
+ if (key.equals("")) {
+ return OBSCommonUtils.rejectRootDirectoryDelete(
+ owner.getBucket(), isEmptyDir, recursive);
+ }
+ if (!recursive && !isEmptyDir) {
+ LOG.warn("delete: Path is not empty: {} - recursive {}", f,
+ recursive);
+ throw new PathIsNotEmptyDirectoryException(f.toString());
+ }
+ if (isEmptyDir) {
+ LOG.debug(
+ "delete: Deleting fake empty directory {} - recursive {}",
+ f, recursive);
+ OBSCommonUtils.deleteObject(owner, key);
+ } else {
+ LOG.debug(
+ "delete: Deleting objects for directory prefix {} to "
+ + "delete - recursive {}", f, recursive);
+ trashFolderIfNeed(owner, key, f);
+ }
+ }
+
+ long endTime = System.currentTimeMillis();
+ LOG.debug("delete Path:{} thread:{}, timeUsedInMilliSec:{}", f,
+ threadId, endTime - startTime);
+ return true;
+ }
+
+ private static void trashObjectIfNeed(final OBSFileSystem owner,
+ final String key)
+ throws ObsException, IOException {
+ if (needToTrash(owner, key)) {
+ mkTrash(owner, key);
+ StringBuilder sb = new StringBuilder(owner.getTrashDir());
+ sb.append(key);
+ if (owner.exists(new Path(sb.toString()))) {
+ SimpleDateFormat df = new SimpleDateFormat("-yyyyMMddHHmmss");
+ sb.append(df.format(new Date()));
+ }
+ fsRenameToNewObject(owner, key, sb.toString());
+ LOG.debug("Moved: '" + key + "' to trash at: " + sb.toString());
+ } else {
+ OBSCommonUtils.deleteObject(owner, key);
+ }
+ }
+
+ private static void trashFolderIfNeed(final OBSFileSystem owner,
+ final String key,
+ final Path f) throws ObsException, IOException {
+ if (needToTrash(owner, key)) {
+ mkTrash(owner, key);
+ StringBuilder sb = new StringBuilder(owner.getTrashDir());
+ String subKey = OBSCommonUtils.maybeAddTrailingSlash(key);
+ sb.append(subKey);
+ if (owner.exists(new Path(sb.toString()))) {
+ SimpleDateFormat df = new SimpleDateFormat("-yyyyMMddHHmmss");
+ sb.insert(sb.length() - 1, df.format(new Date()));
+ }
+
+ String srcKey = OBSCommonUtils.maybeDeleteBeginningSlash(key);
+ String dstKey = OBSCommonUtils.maybeDeleteBeginningSlash(
+ sb.toString());
+ fsRenameToNewFolder(owner, srcKey, dstKey);
+ LOG.debug("Moved: '" + key + "' to trash at: " + sb.toString());
+ } else {
+ if (owner.isEnableMultiObjectDeleteRecursion()) {
+ long delNum = fsRecursivelyDeleteDir(owner, key, true);
+ LOG.debug("Recursively delete {} files/dirs when deleting {}",
+ delNum, key);
+ } else {
+ fsNonRecursivelyDelete(owner, f);
+ }
+ }
+ }
+
+ static long fsRecursivelyDeleteDir(final OBSFileSystem owner,
+ final String parentKey,
+ final boolean deleteParent) throws IOException {
+ long delNum = 0;
+ List<KeyAndVersion> subdirList = new ArrayList<>(
+ owner.getMaxEntriesToDelete());
+ List<KeyAndVersion> fileList = new ArrayList<>(
+ owner.getMaxEntriesToDelete());
+
+ ListObjectsRequest request = OBSCommonUtils.createListObjectsRequest(
+ owner, parentKey, "/", owner.getMaxKeys());
+ ObjectListing objects = OBSCommonUtils.listObjects(owner, request);
+ while (true) {
+ for (String commonPrefix : objects.getCommonPrefixes()) {
+ if (commonPrefix.equals(parentKey)) {
+ // skip prefix itself
+ continue;
+ }
+
+ delNum += fsRemoveSubdir(owner, commonPrefix, subdirList);
+ }
+
+ for (ObsObject sonObject : objects.getObjects()) {
+ String sonObjectKey = sonObject.getObjectKey();
+
+ if (sonObjectKey.equals(parentKey)) {
+ // skip prefix itself
+ continue;
+ }
+
+ if (!sonObjectKey.endsWith("/")) {
+ delNum += fsRemoveFile(owner, sonObjectKey, fileList);
+ } else {
+ delNum += fsRemoveSubdir(owner, sonObjectKey, subdirList);
+ }
+ }
+
+ if (!objects.isTruncated()) {
+ break;
+ }
+
+ objects = OBSCommonUtils.continueListObjects(owner, objects);
+ }
+
+ delNum += fileList.size();
+ OBSCommonUtils.removeKeys(owner, fileList, true, false);
+
+ delNum += subdirList.size();
+ OBSCommonUtils.removeKeys(owner, subdirList, true, false);
+
+ if (deleteParent) {
+ OBSCommonUtils.deleteObject(owner, parentKey);
+ delNum++;
+ }
+
+ return delNum;
+ }
+
+ private static boolean needToTrash(final OBSFileSystem owner,
+ final String key) {
+ String newKey = key;
+ newKey = OBSCommonUtils.maybeDeleteBeginningSlash(newKey);
+ if (owner.isEnableTrash() && newKey.startsWith(owner.getTrashDir())) {
+ return false;
+ }
+ return owner.isEnableTrash();
+ }
+
+ // Delete a sub dir.
+ private static int fsRemoveSubdir(final OBSFileSystem owner,
+ final String subdirKey,
+ final List<KeyAndVersion> subdirList)
+ throws IOException {
+ fsRecursivelyDeleteDir(owner, subdirKey, false);
+
+ subdirList.add(new KeyAndVersion(subdirKey));
+ if (subdirList.size() == owner.getMaxEntriesToDelete()) {
+ // batch delete subdirs.
+ OBSCommonUtils.removeKeys(owner, subdirList, true, false);
+ return owner.getMaxEntriesToDelete();
+ }
+
+ return 0;
+ }
+
+ private static void mkTrash(final OBSFileSystem owner, final String key)
+ throws ObsException, IOException {
+ String newKey = key;
+ StringBuilder sb = new StringBuilder(owner.getTrashDir());
+ newKey = OBSCommonUtils.maybeAddTrailingSlash(newKey);
+ sb.append(newKey);
+ sb.deleteCharAt(sb.length() - 1);
+ sb.delete(sb.lastIndexOf("/"), sb.length());
+ Path fastDeleteRecycleDirPath = new Path(sb.toString());
+ // ensure the parent directory of the target path exists
+ if (!owner.exists(fastDeleteRecycleDirPath)) {
+ owner.mkdirs(fastDeleteRecycleDirPath);
+ }
+ }
+
+ // List all sub objects first, then delete them in batches.
+ private static void fsNonRecursivelyDelete(final OBSFileSystem owner,
+ final Path parent)
+ throws IOException, ObsException {
+ // List sub objects sorted by path depth.
+ FileStatus[] arFileStatus = OBSCommonUtils.innerListStatus(owner,
+ parent, true);
+ // Remove sub objects depth by depth so that parents and children
+ // never end up in the same batch.
+ fsRemoveKeys(owner, arFileStatus);
+ // Delete the parent folder, which should now be empty.
+ OBSCommonUtils.deleteObject(owner,
+ OBSCommonUtils.pathToKey(owner, parent));
+ }
+
+ // Remove sub objects of each depth one by one so that parents and
+ // children are never in the same batch.
+ private static void fsRemoveKeys(final OBSFileSystem owner,
+ final FileStatus[] arFileStatus)
+ throws ObsException, IOException {
+ if (arFileStatus.length <= 0) {
+ // exit fast if there are no keys to delete
+ return;
+ }
+
+ String key;
+ for (FileStatus fileStatus : arFileStatus) {
+ key = OBSCommonUtils.pathToKey(owner, fileStatus.getPath());
+ OBSCommonUtils.blockRootDelete(owner.getBucket(), key);
+ }
+
+ fsRemoveKeysByDepth(owner, arFileStatus);
+ }
+
+ // Batch delete sub objects depth by depth so that parents and children
+ // are never in the same batch. A batch deletion might be split into
+ // several concurrent deletions to improve performance, but that cannot
+ // guarantee that an object is deleted before its children.
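+ // For example, given [a, a/b, a/b/c] the method walks backwards from
+ // a/b/c (the deepest key) to a, batching keys of equal depth together.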
+ private static void fsRemoveKeysByDepth(final OBSFileSystem owner,
+ final FileStatus[] arFileStatus)
+ throws ObsException, IOException {
+ if (arFileStatus.length <= 0) {
+ // exit fast if there are no keys to delete
+ return;
+ }
+
+ // Find all leaf keys in the list.
+ String key;
+ int depth = Integer.MAX_VALUE;
+ List<KeyAndVersion> leafKeys = new ArrayList<>(
+ owner.getMaxEntriesToDelete());
+ for (int idx = arFileStatus.length - 1; idx >= 0; idx--) {
+ if (leafKeys.size() >= owner.getMaxEntriesToDelete()) {
+ OBSCommonUtils.removeKeys(owner, leafKeys, true, false);
+ }
+
+ key = OBSCommonUtils.pathToKey(owner, arFileStatus[idx].getPath());
+
+ // Check file.
+ if (!arFileStatus[idx].isDirectory()) {
+ // A file must be a leaf.
+ leafKeys.add(new KeyAndVersion(key, null));
+ continue;
+ }
+
+ // Check leaf folder at current depth.
+ int keyDepth = fsGetObjectKeyDepth(key);
+ if (keyDepth == depth) {
+ // Any key at current depth must be a leaf.
+ leafKeys.add(new KeyAndVersion(key, null));
+ continue;
+ }
+ if (keyDepth < depth) {
+ // The last batch delete at current depth.
+ OBSCommonUtils.removeKeys(owner, leafKeys, true, false);
+ // Go on at the upper depth.
+ depth = keyDepth;
+ leafKeys.add(new KeyAndVersion(key, null));
+ continue;
+ }
+ LOG.warn(
+ "The objects list is invalid because it isn't sorted by"
+ + " path depth.");
+ throw new ObsException("System failure");
+ }
+
+ // The last batch delete at the minimum depth of all keys.
+ OBSCommonUtils.removeKeys(owner, leafKeys, true, false);
+ }
+
+ // Used to create a folder
+ static void fsCreateFolder(final OBSFileSystem owner,
+ final String objectName)
+ throws ObsException {
+ for (int retryTime = 1;
+ retryTime < OBSCommonUtils.MAX_RETRY_TIME; retryTime++) {
+ try {
+ innerFsCreateFolder(owner, objectName);
+ return;
+ } catch (ObsException e) {
+ LOG.warn("Failed to create folder [{}], retry time [{}], "
+ + "exception [{}]", objectName, retryTime, e);
+ try {
+ Thread.sleep(OBSCommonUtils.DELAY_TIME);
+ } catch (InterruptedException ie) {
+ throw e;
+ }
+ }
+ }
+
+ innerFsCreateFolder(owner, objectName);
+ }
+
+ private static void innerFsCreateFolder(final OBSFileSystem owner,
+ final String objectName)
+ throws ObsException {
+ final NewFolderRequest newFolderRequest = new NewFolderRequest(
+ owner.getBucket(), objectName);
+ newFolderRequest.setAcl(owner.getCannedACL());
+ long len = newFolderRequest.getObjectKey().length();
+ owner.getObsClient().newFolder(newFolderRequest);
+ owner.getSchemeStatistics().incrementWriteOps(1);
+ owner.getSchemeStatistics().incrementBytesWritten(len);
+ }
+
+ // Used to get the status of a file or folder in a file-gateway bucket.
+ static OBSFileStatus innerFsGetObjectStatus(final OBSFileSystem owner,
+ final Path f) throws IOException {
+ final Path path = OBSCommonUtils.qualify(owner, f);
+ String key = OBSCommonUtils.pathToKey(owner, path);
+ LOG.debug("Getting path status for {} ({})", path, key);
+
+ if (key.isEmpty()) {
+ LOG.debug("Found root directory");
+ return new OBSFileStatus(path, owner.getUsername());
+ }
+
+ try {
+ final GetAttributeRequest getAttrRequest = new GetAttributeRequest(
+ owner.getBucket(), key);
+ ObsFSAttribute meta = owner.getObsClient()
+ .getAttribute(getAttrRequest);
+ owner.getSchemeStatistics().incrementReadOps(1);
+ if (fsIsFolder(meta)) {
+ LOG.debug("Found file (with /): fake directory");
+ return new OBSFileStatus(path,
+ OBSCommonUtils.dateToLong(meta.getLastModified()),
+ owner.getUsername());
+ } else {
+ LOG.debug(
+ "Found file (with /): real file? should not happen: {}",
+ key);
+ return new OBSFileStatus(
+ meta.getContentLength(),
+ OBSCommonUtils.dateToLong(meta.getLastModified()),
+ path,
+ owner.getDefaultBlockSize(path),
+ owner.getUsername());
+ }
+ } catch (ObsException e) {
+ if (e.getResponseCode() == OBSCommonUtils.NOT_FOUND_CODE) {
+ LOG.debug("Not Found: {}", path);
+ throw new FileNotFoundException(
+ "No such file or directory: " + path);
+ }
+ if (e.getResponseCode() == OBSCommonUtils.CONFLICT_CODE) {
+ throw new FileConflictException(
+ "file conflicts: " + e.getResponseStatus());
+ }
+ throw OBSCommonUtils.translateException("getFileStatus", path, e);
+ }
+ }
+
+ static ContentSummary fsGetDirectoryContentSummary(
+ final OBSFileSystem owner,
+ final String key) throws IOException {
+ String newKey = key;
+ newKey = OBSCommonUtils.maybeAddTrailingSlash(newKey);
+ long[] summary = {0, 0, 1};
+ LOG.debug("Summary key {}", newKey);
+ ListObjectsRequest request = new ListObjectsRequest();
+ request.setBucketName(owner.getBucket());
+ request.setPrefix(newKey);
+ request.setMaxKeys(owner.getMaxKeys());
+ ObjectListing objects = OBSCommonUtils.listObjects(owner, request);
+ while (true) {
+ if (!objects.getCommonPrefixes().isEmpty() || !objects.getObjects()
+ .isEmpty()) {
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Found path as directory (with /): {}/{}",
+ objects.getCommonPrefixes().size(),
+ objects.getObjects().size());
+ }
+ for (String prefix : objects.getCommonPrefixes()) {
+ if (!prefix.equals(newKey)) {
+ summary[2]++;
+ }
+ }
+
+ for (ObsObject obj : objects.getObjects()) {
+ if (!obj.getObjectKey().endsWith("/")) {
+ summary[0] += obj.getMetadata().getContentLength();
+ summary[1] += 1;
+ } else if (!obj.getObjectKey().equals(newKey)) {
+ summary[2]++;
+ }
+ }
+ }
+ if (!objects.isTruncated()) {
+ break;
+ }
+ objects = OBSCommonUtils.continueListObjects(owner, objects);
+ }
+ LOG.debug(String.format(
+ "file size [%d] - file count [%d] - directory count [%d] - "
+ + "file path [%s]",
+ summary[0], summary[1], summary[2], newKey));
+ return new ContentSummary.Builder().length(summary[0])
+ .fileCount(summary[1]).directoryCount(summary[2])
+ .spaceConsumed(summary[0]).build();
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSWriteOperationHelper.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSWriteOperationHelper.java
new file mode 100644
index 0000000000000..2b02f962a0598
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/OBSWriteOperationHelper.java
@@ -0,0 +1,310 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.util.Preconditions;
+import com.obs.services.ObsClient;
+import com.obs.services.exception.ObsException;
+import com.obs.services.model.AbortMultipartUploadRequest;
+import com.obs.services.model.CompleteMultipartUploadRequest;
+import com.obs.services.model.CompleteMultipartUploadResult;
+import com.obs.services.model.InitiateMultipartUploadRequest;
+import com.obs.services.model.ObjectMetadata;
+import com.obs.services.model.PartEtag;
+import com.obs.services.model.PutObjectRequest;
+import com.obs.services.model.PutObjectResult;
+import com.obs.services.model.UploadPartRequest;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Helper for an ongoing write operation.
+ *
+ * <p>It hides direct access to the OBS API from the output stream, and is a
+ * location where the object upload process can be evolved/enhanced.
+ *
+ * <p>Features
+ *
+ * <ul>
+ * <li>Methods to create and submit requests to OBS, so avoiding all direct
+ * interaction with the OBS APIs.</li>
+ * <li>Some extra preflight checks of arguments, so failing fast on errors.</li>
+ * <li>Callbacks to let the FS know of events in the output stream upload
+ * process.</li>
+ * </ul>
+ *
+ * <p>Each instance of this state is unique to a single output stream.
+ */
+class OBSWriteOperationHelper {
+ /**
+ * Class logger.
+ */
+ public static final Logger LOG = LoggerFactory.getLogger(
+ OBSWriteOperationHelper.class);
+
+ /**
+ * Maximum part number allowed in a multipart upload task.
+ */
+ static final int PART_NUMBER = 10000;
+
+ /**
+ * Owning filesystem.
+ */
+ private final OBSFileSystem owner;
+
+ /**
+ * Bucket of the owner FS.
+ */
+ private final String bucket;
+
+ /**
+ * Define obs client.
+ */
+ private final ObsClient obs;
+
+ protected OBSWriteOperationHelper(final OBSFileSystem fs) {
+ this.owner = fs;
+ this.bucket = fs.getBucket();
+ this.obs = fs.getObsClient();
+ }
+
+ /**
+ * Create a {@link PutObjectRequest} request. If {@code length} is set, the
+ * metadata is configured with the size of the upload.
+ *
+ * @param destKey key of object
+ * @param inputStream source data
+ * @param length size, if known. Use -1 for not known
+ * @return the request
+ */
+ PutObjectRequest newPutRequest(final String destKey,
+ final InputStream inputStream,
+ final long length) {
+ return OBSCommonUtils.newPutObjectRequest(owner, destKey,
+ newObjectMetadata(length), inputStream);
+ }
+
+ /**
+ * Create a {@link PutObjectRequest} request to upload a file.
+ *
+ * @param destKey object key for request
+ * @param sourceFile source file
+ * @return the request
+ */
+ PutObjectRequest newPutRequest(final String destKey,
+ final File sourceFile) {
+ long length = sourceFile.length();
+ return OBSCommonUtils.newPutObjectRequest(owner, destKey,
+ newObjectMetadata(length), sourceFile);
+ }
+
+ /**
+ * Callback on a successful write.
+ *
+ * @param destKey object key
+ */
+ void writeSuccessful(final String destKey) {
+ LOG.debug("Finished write to {}", destKey);
+ }
+
+ /**
+ * Create a new object metadata instance. Any standard metadata headers are
+ * added here, for example: encryption.
+ *
+ * @param length size, if known. Use -1 for not known
+ * @return a new metadata instance
+ */
+ public ObjectMetadata newObjectMetadata(final long length) {
+ return OBSObjectBucketUtils.newObjectMetadata(length);
+ }
+
+ /**
+ * Start the multipart upload process.
+ *
+ * @param destKey object key
+ * @return the upload result containing the ID
+ * @throws IOException IO problem
+ */
+ String initiateMultiPartUpload(final String destKey) throws IOException {
+ LOG.debug("Initiating Multipart upload");
+ final InitiateMultipartUploadRequest initiateMPURequest =
+ new InitiateMultipartUploadRequest(bucket, destKey);
+ initiateMPURequest.setAcl(owner.getCannedACL());
+ initiateMPURequest.setMetadata(newObjectMetadata(-1));
+ if (owner.getSse().isSseCEnable()) {
+ initiateMPURequest.setSseCHeader(owner.getSse().getSseCHeader());
+ } else if (owner.getSse().isSseKmsEnable()) {
+ initiateMPURequest.setSseKmsHeader(
+ owner.getSse().getSseKmsHeader());
+ }
+ try {
+ return obs.initiateMultipartUpload(initiateMPURequest)
+ .getUploadId();
+ } catch (ObsException ace) {
+ throw OBSCommonUtils.translateException("Initiate MultiPartUpload",
+ destKey, ace);
+ }
+ }
+
+ /**
+ * Complete a multipart upload operation.
+ *
+ * @param destKey Object key
+ * @param uploadId multipart operation Id
+ * @param partETags list of partial uploads
+ * @return the result
+ * @throws ObsException on problems.
+ */
+ CompleteMultipartUploadResult completeMultipartUpload(
+ final String destKey, final String uploadId,
+ final List<PartEtag> partETags)
+ throws ObsException {
+ Preconditions.checkNotNull(uploadId);
+ Preconditions.checkNotNull(partETags);
+ Preconditions.checkArgument(!partETags.isEmpty(),
+ "No partitions have been uploaded");
+ LOG.debug("Completing multipart upload {} with {} parts", uploadId,
+ partETags.size());
+ // a copy of the list is required, so that the OBS SDK doesn't
+ // attempt to sort an unmodifiable list.
+ return obs.completeMultipartUpload(
+ new CompleteMultipartUploadRequest(bucket, destKey, uploadId,
+ new ArrayList<>(partETags)));
+ }
+
+ /**
+ * Abort a multipart upload operation.
+ *
+ * @param destKey object key
+ * @param uploadId multipart operation Id
+ * @throws ObsException on problems; the abort request is sent immediately.
+ */
+ void abortMultipartUpload(final String destKey, final String uploadId)
+ throws ObsException {
+ LOG.debug("Aborting multipart upload {}", uploadId);
+ obs.abortMultipartUpload(
+ new AbortMultipartUploadRequest(bucket, destKey, uploadId));
+ }
+
+ /**
+ * Create request for uploading one part of a multipart task.
+ *
+ * @param destKey destination object key
+ * @param uploadId upload id
+ * @param partNumber part number
+ * @param size data size
+ * @param sourceFile source file to be uploaded
+ * @return part upload request
+ */
+ UploadPartRequest newUploadPartRequest(
+ final String destKey,
+ final String uploadId,
+ final int partNumber,
+ final int size,
+ final File sourceFile) {
+ Preconditions.checkNotNull(uploadId);
+
+ Preconditions.checkArgument(sourceFile != null, "Data source");
+ Preconditions.checkArgument(size > 0, "Invalid partition size %s",
+ size);
+ Preconditions.checkArgument(
+ partNumber > 0 && partNumber <= PART_NUMBER);
+
+ LOG.debug("Creating part upload request for {} #{} size {}", uploadId,
+ partNumber, size);
+ UploadPartRequest request = new UploadPartRequest();
+ request.setUploadId(uploadId);
+ request.setBucketName(bucket);
+ request.setObjectKey(destKey);
+ request.setPartSize((long) size);
+ request.setPartNumber(partNumber);
+ request.setFile(sourceFile);
+ if (owner.getSse().isSseCEnable()) {
+ request.setSseCHeader(owner.getSse().getSseCHeader());
+ }
+ return request;
+ }
+
+ /**
+ * Create request for uploading one part of a multipart task.
+ *
+ * @param destKey destination object key
+ * @param uploadId upload id
+ * @param partNumber part number
+ * @param size data size
+ * @param uploadStream upload stream for the part
+ * @return part upload request
+ */
+ UploadPartRequest newUploadPartRequest(
+ final String destKey,
+ final String uploadId,
+ final int partNumber,
+ final int size,
+ final InputStream uploadStream) {
+ Preconditions.checkNotNull(uploadId);
+
+ Preconditions.checkArgument(uploadStream != null, "Data source");
+ Preconditions.checkArgument(size > 0, "Invalid partition size %s",
+ size);
+ Preconditions.checkArgument(
+ partNumber > 0 && partNumber <= PART_NUMBER);
+
+ LOG.debug("Creating part upload request for {} #{} size {}", uploadId,
+ partNumber, size);
+ UploadPartRequest request = new UploadPartRequest();
+ request.setUploadId(uploadId);
+ request.setBucketName(bucket);
+ request.setObjectKey(destKey);
+ request.setPartSize((long) size);
+ request.setPartNumber(partNumber);
+ request.setInput(uploadStream);
+ if (owner.getSse().isSseCEnable()) {
+ request.setSseCHeader(owner.getSse().getSseCHeader());
+ }
+ return request;
+ }
+
+ public String toString(final String destKey) {
+ return "{bucket=" + bucket + ", key='" + destKey + '\'' + '}';
+ }
+
+ /**
+ * PUT an object directly (i.e. not via the transfer manager).
+ *
+ * @param putObjectRequest the request
+ * @return the upload initiated
+ * @throws IOException on problems
+ */
+ PutObjectResult putObject(final PutObjectRequest putObjectRequest)
+ throws IOException {
+ try {
+ return OBSCommonUtils.putObjectDirect(owner, putObjectRequest);
+ } catch (ObsException e) {
+ throw OBSCommonUtils.translateException("put",
+ putObjectRequest.getObjectKey(), e);
+ }
+ }
+}
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MetadataPersistenceException.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/RenameFailedException.java
similarity index 51%
rename from hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MetadataPersistenceException.java
rename to hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/RenameFailedException.java
index e55b7e8a5b188..b7f7965ebe215 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MetadataPersistenceException.java
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/RenameFailedException.java
@@ -16,25 +16,42 @@
* limitations under the License.
*/
-package org.apache.hadoop.fs.s3a;
+package org.apache.hadoop.fs.obs;
+import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathIOException;
/**
- * Indicates the metadata associated with the given Path could not be persisted
- * to the metadata store (e.g. S3Guard / DynamoDB). When this occurs, the
- * file itself has been successfully written to S3, but the metadata may be out
- * of sync. The metadata can be corrected with the "s3guard import" command
- * provided by {@link org.apache.hadoop.fs.s3a.s3guard.S3GuardTool}.
+ * Exception to indicate a specific rename failure. The exit code defines the
+ * value returned by {@link OBSFileSystem#rename(Path, Path)}.
*/
-public class MetadataPersistenceException extends PathIOException {
+class RenameFailedException extends PathIOException {
+ /**
+ * Exit code to be returned.
+ */
+ private boolean exitCode = false;
+
+ RenameFailedException(final Path src, final Path optionalDest,
+ final String error) {
+ super(src.toString(), error);
+ setOperation("rename");
+ if (optionalDest != null) {
+ setTargetPath(optionalDest.toString());
+ }
+ }
+
+ public boolean getExitCode() {
+ return exitCode;
+ }
/**
- * Constructs a MetadataPersistenceException.
- * @param path path of the affected file
- * @param cause cause of the issue
+ * Set the exit code.
+ *
+ * @param code exit code to raise
+ * @return the exception
*/
- public MetadataPersistenceException(String path, Throwable cause) {
- super(path, cause);
+ public RenameFailedException withExitCode(final boolean code) {
+ this.exitCode = code;
+ return this;
}
}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/SseWrapper.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/SseWrapper.java
new file mode 100644
index 0000000000000..d14479c2d85e3
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/SseWrapper.java
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import static org.apache.hadoop.fs.obs.OBSConstants.SSE_KEY;
+import static org.apache.hadoop.fs.obs.OBSConstants.SSE_TYPE;
+
+import com.obs.services.model.SseCHeader;
+import com.obs.services.model.SseKmsHeader;
+
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Wrapper for Server-Side Encryption (SSE).
+ */
+class SseWrapper {
+ /**
+ * SSE-KMS: Server-Side Encryption with Key Management Service.
+ */
+ private static final String SSE_KMS = "sse-kms";
+
+ /**
+ * SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys.
+ */
+ private static final String SSE_C = "sse-c";
+
+ /**
+ * SSE-C header.
+ */
+ private SseCHeader sseCHeader;
+
+ /**
+ * SSE-KMS header.
+ */
+ private SseKmsHeader sseKmsHeader;
+
+ @SuppressWarnings("deprecation")
+ SseWrapper(final Configuration conf) {
+ String sseType = conf.getTrimmed(SSE_TYPE);
+ if (null != sseType) {
+ String sseKey = conf.getTrimmed(SSE_KEY);
+ if (sseType.equalsIgnoreCase(SSE_C) && null != sseKey) {
+ sseCHeader = new SseCHeader();
+ sseCHeader.setSseCKeyBase64(sseKey);
+ sseCHeader.setAlgorithm(
+ com.obs.services.model.ServerAlgorithm.AES256);
+ } else if (sseType.equalsIgnoreCase(SSE_KMS)) {
+ sseKmsHeader = new SseKmsHeader();
+ sseKmsHeader.setEncryption(
+ com.obs.services.model.ServerEncryption.OBS_KMS);
+ sseKmsHeader.setKmsKeyId(sseKey);
+ }
+ }
+ }
+
+ boolean isSseCEnable() {
+ return sseCHeader != null;
+ }
+
+ boolean isSseKmsEnable() {
+ return sseKmsHeader != null;
+ }
+
+ SseCHeader getSseCHeader() {
+ return sseCHeader;
+ }
+
+ SseKmsHeader getSseKmsHeader() {
+ return sseKmsHeader;
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/package-info.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/package-info.java
new file mode 100644
index 0000000000000..9e198d3205744
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/java/org/apache/hadoop/fs/obs/package-info.java
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Package for supporting
+ * HuaweiCloud
+ * Object Storage Service (OBS) as a backend filesystem in Hadoop.
+ *
+ * OBS supports two kinds of buckets: object bucket and posix bucket. Posix
+ * bucket provides more POSIX-like semantics than object bucket, and is
+ * recommended for Hadoop. Object bucket is deprecated for Hadoop.
+ */
+
+package org.apache.hadoop.fs.obs;
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
new file mode 100644
index 0000000000000..e77425ab52989
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+org.apache.hadoop.fs.obs.OBSFileSystem
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/markdown/index.md b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/markdown/index.md
new file mode 100644
index 0000000000000..723da89e2beb2
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/markdown/index.md
@@ -0,0 +1,370 @@
+
+
+# OBSA: HuaweiCloud OBS Adapter for Hadoop Support
+
+
+
+## Introduction
+
+The `hadoop-huaweicloud` module provides support for integration with the
+[HuaweiCloud Object Storage Service (OBS)](https://www.huaweicloud.com/en-us/product/obs.html).
+This support comes via the JAR file `hadoop-huaweicloud.jar`.
+
+## Features
+
+* Read and write data stored in a HuaweiCloud OBS account.
+* Reference file system paths using URLs with the `obs` scheme.
+* Present a hierarchical file system view by implementing the standard Hadoop `FileSystem` interface.
+* Support multipart upload for a large file.
+* Can act as a source of data in a MapReduce job, or a sink.
+* Uses HuaweiCloud OBS’s Java SDK, with support for the latest OBS features and authentication schemes.
+* Tested for scale.
+
+## Limitations
+
+Partial or no support for the following operations:
+
+* Symbolic link operations.
+* Proxy users.
+* File truncate.
+* File concat.
+* File checksum.
+* File replication factor.
+* Extended Attributes (XAttrs) operations.
+* Snapshot operations.
+* Storage policy.
+* Quota.
+* POSIX ACL.
+* Delegation token operations.
+
+## Getting Started
+
+### Packages
+
+OBSA depends upon two JARs, alongside `hadoop-common` and its dependencies.
+
+* `hadoop-huaweicloud` JAR.
+* `esdk-obs-java` JAR.
+
+The versions of `hadoop-common` and `hadoop-huaweicloud` must be identical.
+
+To import the libraries into a Maven build, add `hadoop-huaweicloud` JAR to the
+build dependencies; it will pull in a compatible `esdk-obs-java` JAR.
+
+The `hadoop-huaweicloud` JAR *does not* declare any dependencies other than the
+dependency unique to it, the OBS SDK JAR. This is to simplify excluding/tuning
+Hadoop dependency JARs in downstream applications. The `hadoop-client` or
+`hadoop-common` dependency must be declared.
+
+
+```xml
+<properties>
+  <hadoop.version>3.4.0</hadoop.version>
+</properties>
+
+<dependencies>
+  <dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>${hadoop.version}</version>
+  </dependency>
+  <dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-huaweicloud</artifactId>
+    <version>${hadoop.version}</version>
+  </dependency>
+</dependencies>
+```
+### Accessing OBS URLs
+Before accessing a URL, the OBS implementation classes of FileSystem/AbstractFileSystem and
+the region endpoint where the bucket is located should be configured as follows:
+```xml
+<property>
+  <name>fs.obs.impl</name>
+  <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
+  <description>The OBS implementation class of the Filesystem.</description>
+</property>
+
+<property>
+  <name>fs.AbstractFileSystem.obs.impl</name>
+  <value>org.apache.hadoop.fs.obs.OBS</value>
+  <description>The OBS implementation class of the AbstractFileSystem.</description>
+</property>
+
+<property>
+  <name>fs.obs.endpoint</name>
+  <value>obs.region.myhuaweicloud.com</value>
+  <description>OBS region endpoint where a bucket is located.</description>
+</property>
+```
+
+OBS URLs can then be accessed as follows:
+
+```
+obs://<bucket_name>/path
+```
+The scheme `obs` identifies a URL on a Hadoop-compatible file system `OBSFileSystem`
+backed by HuaweiCloud OBS.
+For example, the following
+[FileSystem Shell](../hadoop-project-dist/hadoop-common/FileSystemShell.html)
+commands demonstrate access to a bucket named `mybucket`.
+```bash
+hadoop fs -mkdir obs://mybucket/testDir
+
+hadoop fs -put testFile obs://mybucket/testDir/testFile
+
+hadoop fs -cat obs://mybucket/testDir/testFile
+test file content
+```
+
+For details on how to create a bucket, see
+[**Help Center > Object Storage Service > Getting Started > Basic Operation Procedure**](https://support.huaweicloud.com/intl/en-us/qs-obs/obs_qs_0003.html)
+
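+Applications can also reach the same paths through the standard Hadoop
+`FileSystem` API. The following is only an illustrative sketch: the bucket
+name and paths are placeholders, and it assumes the implementation class,
+endpoint and credential settings shown in this document are already present
+in the client configuration.
+
+```java
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+
+public class ObsQuickStart {
+  public static void main(String[] args) throws Exception {
+    // Loads core-site.xml (with the fs.obs.* settings) from the classpath.
+    Configuration conf = new Configuration();
+    try (FileSystem fs = FileSystem.get(URI.create("obs://mybucket/"), conf)) {
+      Path file = new Path("obs://mybucket/testDir/testFile");
+      // Write a small file.
+      try (FSDataOutputStream out = fs.create(file, true)) {
+        out.write("test file content".getBytes(StandardCharsets.UTF_8));
+      }
+      // Read it back to stdout.
+      try (FSDataInputStream in = fs.open(file)) {
+        IOUtils.copyBytes(in, System.out, 4096, false);
+      }
+    }
+  }
+}
+```
+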
+### Authenticating with OBS
+Except when interacting with public OBS buckets, the OBSA client
+needs credentials to interact with buckets.
+The client supports multiple authentication mechanisms. The simplest mechanism is
+to provide the OBS access key and secret key as follows.
+```xml
+<property>
+  <name>fs.obs.access.key</name>
+  <description>OBS access key.
+    Omit for provider-based authentication.</description>
+</property>
+
+<property>
+  <name>fs.obs.secret.key</name>
+  <description>OBS secret key.
+    Omit for provider-based authentication.</description>
+</property>
+```
+
+**Do not share access key, secret key, and session token. They must be kept secret.**
+
+Custom implementations
+of `com.obs.services.IObsCredentialsProvider` (see [**Creating an Instance of ObsClient**](https://support.huaweicloud.com/intl/en-us/sdk-java-devg-obs/en-us_topic_0142815570.html)) or
+`org.apache.hadoop.fs.obs.BasicSessionCredential` may also be used for authentication.
+
+```xml
+<property>
+  <name>fs.obs.security.provider</name>
+  <description>
+    Class name of the security provider class which implements
+    com.obs.services.IObsCredentialsProvider, which will
+    be used to construct an OBS client instance as an input parameter.
+  </description>
+</property>
+
+<property>
+  <name>fs.obs.credentials.provider</name>
+  <description>
+    Class name of the credential provider class which implements
+    org.apache.hadoop.fs.obs.BasicSessionCredential, which must
+    override three APIs: getOBSAccessKeyId(),
+    getOBSSecretKey(), and getSessionToken().
+  </description>
+</property>
+```
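+
+For illustration only, a custom session-credential provider might look like the
+sketch below. It assumes `BasicSessionCredential` is an interface exposing
+exactly the three methods named above (if it is an abstract class, extend it
+instead); the package, class name and the environment variables used to obtain
+the keys are placeholders.
+
+```java
+package com.example.obs; // hypothetical package
+
+import org.apache.hadoop.fs.obs.BasicSessionCredential;
+
+/**
+ * Sketch of a provider returning temporary (session) credentials.
+ */
+public class MySessionCredentialProvider implements BasicSessionCredential {
+
+  @Override
+  public String getOBSAccessKeyId() {
+    return System.getenv("OBS_ACCESS_KEY");    // placeholder source
+  }
+
+  @Override
+  public String getOBSSecretKey() {
+    return System.getenv("OBS_SECRET_KEY");    // placeholder source
+  }
+
+  @Override
+  public String getSessionToken() {
+    return System.getenv("OBS_SESSION_TOKEN"); // placeholder source
+  }
+}
+```
+
+The fully qualified class name would then be set in `fs.obs.credentials.provider`.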
+
+## General OBSA Client Configuration
+
+All OBSA client options are configured with the prefix `fs.obs.`.
+
+```xml
+<property>
+  <name>fs.obs.connection.ssl.enabled</name>
+  <value>false</value>
+  <description>Enable or disable SSL connections to OBS.</description>
+</property>
+
+<property>
+  <name>fs.obs.connection.maximum</name>
+  <value>1000</value>
+  <description>Maximum number of simultaneous connections to OBS.</description>
+</property>
+
+<property>
+  <name>fs.obs.connection.establish.timeout</name>
+  <value>120000</value>
+  <description>Socket connection setup timeout in milliseconds.</description>
+</property>
+
+<property>
+  <name>fs.obs.connection.timeout</name>
+  <value>120000</value>
+  <description>Socket connection timeout in milliseconds.</description>
+</property>
+
+<property>
+  <name>fs.obs.idle.connection.time</name>
+  <value>30000</value>
+  <description>Socket idle connection time.</description>
+</property>
+
+<property>
+  <name>fs.obs.max.idle.connections</name>
+  <value>1000</value>
+  <description>Maximum number of socket idle connections.</description>
+</property>
+
+<property>
+  <name>fs.obs.socket.send.buffer</name>
+  <value>256 * 1024</value>
+  <description>Socket send buffer to be used in OBS SDK. Represented in bytes.</description>
+</property>
+
+<property>
+  <name>fs.obs.socket.recv.buffer</name>
+  <value>256 * 1024</value>
+  <description>Socket receive buffer to be used in OBS SDK. Represented in bytes.</description>
+</property>
+
+<property>
+  <name>fs.obs.threads.keepalivetime</name>
+  <value>60</value>
+  <description>Number of seconds a thread can be idle before being
+    terminated in the thread pool.</description>
+</property>
+
+<property>
+  <name>fs.obs.threads.max</name>
+  <value>20</value>
+  <description>Maximum number of concurrent active (part) uploads,
+    each of which uses a thread from the thread pool.</description>
+</property>
+
+<property>
+  <name>fs.obs.max.total.tasks</name>
+  <value>20</value>
+  <description>Number of (part) uploads allowed to the queue before
+    blocking additional uploads.</description>
+</property>
+
+<property>
+  <name>fs.obs.delete.threads.max</name>
+  <value>20</value>
+  <description>Max number of delete threads.</description>
+</property>
+
+<property>
+  <name>fs.obs.multipart.size</name>
+  <value>104857600</value>
+  <description>Part size for multipart upload.</description>
+</property>
+
+<property>
+  <name>fs.obs.multiobjectdelete.maximum</name>
+  <value>1000</value>
+  <description>Max number of objects in one multi-object delete call.</description>
+</property>
+
+<property>
+  <name>fs.obs.fast.upload.buffer</name>
+  <value>disk</value>
+  <description>Which buffer to use. Default is `disk`; the value may be
+    `disk` | `array` | `bytebuffer`.</description>
+</property>
+
+<property>
+  <name>fs.obs.buffer.dir</name>
+  <value>dir1,dir2,dir3</value>
+  <description>Comma separated list of directories that will be used to buffer file
+    uploads to. This option takes effect only when the option 'fs.obs.fast.upload.buffer'
+    is set to 'disk'.</description>
+</property>
+
+<property>
+  <name>fs.obs.fast.upload.active.blocks</name>
+  <value>4</value>
+  <description>Maximum number of blocks a single output stream can have active
+    (uploading, or queued to the central FileSystem instance's pool of queued
+    operations).</description>
+</property>
+
+<property>
+  <name>fs.obs.readahead.range</name>
+  <value>1024 * 1024</value>
+  <description>Bytes to read ahead during a seek() before closing and
+    re-opening the OBS HTTP connection.</description>
+</property>
+
+<property>
+  <name>fs.obs.read.transform.enable</name>
+  <value>true</value>
+  <description>Flag indicating if socket connections can be reused by
+    position read. Set `false` only for HBase.</description>
+</property>
+
+<property>
+  <name>fs.obs.list.threads.core</name>
+  <value>30</value>
+  <description>Number of core list threads.</description>
+</property>
+
+<property>
+  <name>fs.obs.list.threads.max</name>
+  <value>60</value>
+  <description>Maximum number of list threads.</description>
+</property>
+
+<property>
+  <name>fs.obs.list.workqueue.capacity</name>
+  <value>1024</value>
+  <description>Capacity of the list work queue.</description>
+</property>
+
+<property>
+  <name>fs.obs.list.parallel.factor</name>
+  <value>30</value>
+  <description>List parallel factor.</description>
+</property>
+
+<property>
+  <name>fs.obs.trash.enable</name>
+  <value>false</value>
+  <description>Switch for the fast delete.</description>
+</property>
+
+<property>
+  <name>fs.obs.trash.dir</name>
+  <description>The fast delete recycle directory.</description>
+</property>
+
+<property>
+  <name>fs.obs.block.size</name>
+  <value>128 * 1024 * 1024</value>
+  <description>Default block size for OBS FileSystem.</description>
+</property>
+```
+
+## Testing the hadoop-huaweicloud Module
+The `hadoop-huaweicloud` module includes a full suite of unit tests.
+Most of the tests will run against HuaweiCloud OBS. To run these
+tests, please create `src/test/resources/auth-keys.xml` with OBS account
+information mentioned in the above sections and the following properties.
+
+```xml
+<property>
+  <name>fs.contract.test.fs.obs</name>
+  <value>obs://obsfilesystem-bucket</value>
+</property>
+```
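+
+A fuller `auth-keys.xml` typically also carries the endpoint and credentials
+described earlier. This is an illustrative sketch only; the bucket, endpoint
+and key values are placeholders.
+
+```xml
+<configuration>
+  <property>
+    <name>fs.contract.test.fs.obs</name>
+    <value>obs://obsfilesystem-bucket</value>
+  </property>
+  <property>
+    <name>fs.obs.endpoint</name>
+    <value>obs.region.myhuaweicloud.com</value>
+  </property>
+  <property>
+    <name>fs.obs.access.key</name>
+    <value>YOUR_ACCESS_KEY</value>
+  </property>
+  <property>
+    <name>fs.obs.secret.key</name>
+    <value>YOUR_SECRET_KEY</value>
+  </property>
+</configuration>
+```
+
+With that file in place, the tests can be run from the module directory with
+Maven, e.g. `mvn clean verify`.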
\ No newline at end of file
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/resources/css/site.css b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/resources/css/site.css
new file mode 100644
index 0000000000000..7315db31e53ca
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/site/resources/css/site.css
@@ -0,0 +1,29 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#banner {
+ height: 93px;
+ background: none;
+}
+
+#bannerLeft img {
+ margin-left: 30px;
+ margin-top: 10px;
+}
+
+#bannerRight img {
+ margin: 17px;
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSContract.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSContract.java
new file mode 100644
index 0000000000000..ab9d6dae4cc1d
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSContract.java
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.contract.AbstractBondedFSContract;
+
+/**
+ * The contract of OBS: only enabled if the test bucket is provided.
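+ *
+ * A minimal usage sketch (illustrative only; the contract binds to the
+ * bucket named by the fs.contract.test.fs.obs property):
+ * <pre>{@code
+ *   Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ *   AbstractFSContract contract = new OBSContract(OBSContract.getConfiguration());
+ * }</pre>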
+ */
+public class OBSContract extends AbstractBondedFSContract {
+
+ public static final String CONTRACT_XML = "contract/obs.xml";
+
+ private static final String CONTRACT_ENABLE_KEY =
+ "fs.obs.test.contract.enable";
+
+ private static final boolean CONTRACT_ENABLE_DEFAULT = false;
+
+ public OBSContract(Configuration conf) {
+ super(conf);
+ //insert the base features
+ addConfResource(CONTRACT_XML);
+ }
+
+ @Override
+ public String getScheme() {
+ return "obs";
+ }
+
+ @Override
+ public Path getTestPath() {
+ return OBSTestUtils.createTestPath(super.getTestPath());
+ }
+
+ public synchronized static boolean isContractTestEnabled() {
+ // The contract tests are enabled only when a test bucket is configured.
+ Configuration conf = getConfiguration();
+ String fileSystem = conf.get(OBSTestConstants.TEST_FS_OBS_NAME);
+ return fileSystem != null && !fileSystem.trim().isEmpty();
+ }
+
+ public synchronized static Configuration getConfiguration() {
+ Configuration newConf = new Configuration();
+ newConf.addResource(CONTRACT_XML);
+ return newConf;
+ }
+}
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/TableDeleteTimeoutException.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestConstants.java
similarity index 67%
rename from hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/TableDeleteTimeoutException.java
rename to hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestConstants.java
index 7969332139220..4fcff35b9c96f 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/TableDeleteTimeoutException.java
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestConstants.java
@@ -16,19 +16,25 @@
* limitations under the License.
*/
-package org.apache.hadoop.fs.s3a.s3guard;
-
-import org.apache.hadoop.fs.PathIOException;
+package org.apache.hadoop.fs.obs;
/**
- * An exception raised when a table being deleted is still present after
- * the wait time is exceeded.
+ * Constants for OBS Testing.
*/
-public class TableDeleteTimeoutException extends PathIOException {
- TableDeleteTimeoutException(final String path,
- final String error,
- final Throwable cause) {
- super(path, error, cause);
+final class OBSTestConstants {
+
+ private OBSTestConstants(){
}
+
+ /**
+ * Name of the test filesystem.
+ */
+ static final String TEST_FS_OBS_NAME = "fs.contract.test.fs.obs";
+
+ /**
+ * Fork ID passed down from maven if the test is running in parallel.
+ */
+ static final String TEST_UNIQUE_FORK_ID = "test.unique.fork.id";
+
}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestUtils.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestUtils.java
new file mode 100644
index 0000000000000..9496617256ae1
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/OBSTestUtils.java
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.junit.internal.AssumptionViolatedException;
+
+import java.io.IOException;
+import java.net.URI;
+
+import static org.apache.hadoop.fs.obs.OBSTestConstants.*;
+import static org.apache.hadoop.fs.obs.OBSConstants.*;
+
+/**
+ * Utilities for the OBS tests.
+ */
+public final class OBSTestUtils {
+
+ /**
+ * Create the test filesystem.
+ *
+ * If the fs.contract.test.fs.obs property is not set, this will trigger a
+ * JUnit assumption violation and the test will be skipped.
+ *
+ * Multipart purging is not enabled.
+ *
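+ * A typical usage sketch (illustrative only; it assumes the
+ * fs.contract.test.fs.obs property points at a test bucket):
+ * <pre>{@code
+ *   Configuration conf = new Configuration();
+ *   conf.addResource(OBSContract.CONTRACT_XML);
+ *   OBSFileSystem fs = OBSTestUtils.createTestFileSystem(conf);
+ * }</pre>
+ *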
+ * @param conf configuration
+ * @return the FS
+ * @throws IOException IO Problems
+ * @throws AssumptionViolatedException if the FS is not named
+ */
+ public static OBSFileSystem createTestFileSystem(Configuration conf)
+ throws IOException {
+ return createTestFileSystem(conf, false);
+ }
+
+ /**
+ * Create the test filesystem with or without multipart purging
+ *
+ * If the fs.contract.test.fs.obs property is not set, this will trigger a
+ * JUnit assumption violation and the test will be skipped.
+ *
+ * @param conf configuration
+ * @param purge flag to enable Multipart purging
+ * @return the FS
+ * @throws IOException IO Problems
+ * @throws AssumptionViolatedException if the FS is not named
+ */
+ @SuppressWarnings("deprecation")
+ public static OBSFileSystem createTestFileSystem(Configuration conf,
+ boolean purge)
+ throws IOException {
+
+ String fsname = conf.getTrimmed(TEST_FS_OBS_NAME, "");
+
+ boolean liveTest = !StringUtils.isEmpty(fsname);
+ URI testURI = null;
+ if (liveTest) {
+ testURI = URI.create(fsname);
+ liveTest = testURI.getScheme().equals(OBSConstants.OBS_SCHEME);
+ }
+ if (!liveTest) {
+ // This doesn't work with our JUnit 3 style test cases, so instead we'll
+ // make this whole class not run by default
+ throw new AssumptionViolatedException(
+ "No test filesystem in " + TEST_FS_OBS_NAME);
+ }
+ OBSFileSystem fs1 = new OBSFileSystem();
+ //enable purging in tests
+ if (purge) {
+ conf.setBoolean(PURGE_EXISTING_MULTIPART, true);
+ // but a long delay so that parallel multipart tests don't
+ // suddenly start timing out
+ conf.setInt(PURGE_EXISTING_MULTIPART_AGE, 30 * 60);
+ }
+ fs1.initialize(testURI, conf);
+ return fs1;
+ }
+
+ /**
+ * Create a test path, using the value of
+ * {@link OBSTestConstants#TEST_UNIQUE_FORK_ID}
+ * if it is set.
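+ *
+ * For example (illustrative values): if the JVM is started with
+ * -Dtest.unique.fork.id=fork-0001, this returns the path
+ * {@code /fork-0001/test}; otherwise {@code defVal} is returned unchanged.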
+ *
+ * @param defVal default value
+ * @return a path
+ */
+ public static Path createTestPath(Path defVal) {
+ String testUniqueForkId = System.getProperty(
+ OBSTestConstants.TEST_UNIQUE_FORK_ID);
+ return testUniqueForkId == null ? defVal :
+ new Path("/" + testUniqueForkId, "test");
+ }
+
+ /**
+ * This class should not be instantiated.
+ */
+ private OBSTestUtils() {
+ }
+
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractAppend.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractAppend.java
new file mode 100644
index 0000000000000..a4fb8153e7ca4
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractAppend.java
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractAppendTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+import org.junit.Assume;
+
+/**
+ * Append test cases on obs file system.
+ */
+public class TestOBSContractAppend extends AbstractContractAppendTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+
+ @Override
+ public void testRenameFileBeingAppended() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksumCompositeCrc.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractCreate.java
similarity index 58%
rename from hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksumCompositeCrc.java
rename to hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractCreate.java
index 87fb7da6e2e6f..d3966a13b95ff 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileChecksumCompositeCrc.java
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractCreate.java
@@ -1,4 +1,4 @@
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
@@ -15,33 +15,31 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.hadoop.hdfs;
+
+package org.apache.hadoop.fs.obs;
import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.fs.contract.AbstractContractCreateTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+import org.junit.Assume;
/**
- * End-to-end tests for COMPOSITE_CRC combine mode.
+ * Create test cases on obs file system.
*/
-public class TestFileChecksumCompositeCrc extends TestFileChecksum {
- @Override
- protected void customizeConf(Configuration conf) {
- conf.set(
- HdfsClientConfigKeys.DFS_CHECKSUM_COMBINE_MODE_KEY, "COMPOSITE_CRC");
- }
+public class TestOBSContractCreate extends AbstractContractCreateTest {
@Override
- protected boolean expectComparableStripedAndReplicatedFiles() {
- return true;
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
}
@Override
- protected boolean expectComparableDifferentBlockSizeReplicatedFiles() {
- return true;
+ public void testCreatedFileIsImmediatelyVisible() {
+ Assume.assumeTrue("unsupport.", false);
}
@Override
- protected boolean expectSupportForSingleFileMixedBytesPerChecksum() {
- return true;
+ public void testCreatedFileIsVisibleOnFlush() {
+ Assume.assumeTrue("unsupport", false);
}
}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractDelete.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractDelete.java
new file mode 100644
index 0000000000000..9dd67ad779beb
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractDelete.java
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractDeleteTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Delete test cases on obs file system.
+ */
+public class TestOBSContractDelete extends AbstractContractDeleteTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractGetFileStatus.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractGetFileStatus.java
new file mode 100644
index 0000000000000..15ffd97e0904c
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractGetFileStatus.java
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractGetFileStatusTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Get file status test cases on obs file system.
+ */
+public class TestOBSContractGetFileStatus extends
+ AbstractContractGetFileStatusTest {
+
+ @Override
+ protected AbstractFSContract createContract(
+ final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractMkdir.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractMkdir.java
new file mode 100644
index 0000000000000..e06ad860e21aa
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractMkdir.java
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractMkdirTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Mkdir test cases on obs file system.
+ */
+public class TestOBSContractMkdir extends AbstractContractMkdirTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractOpen.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractOpen.java
new file mode 100644
index 0000000000000..c8641dfd627c6
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractOpen.java
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractOpenTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Open test cases on obs file system.
+ */
+public class TestOBSContractOpen extends AbstractContractOpenTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRename.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRename.java
new file mode 100644
index 0000000000000..25502a23f27d8
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRename.java
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractRenameTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+import org.junit.Assume;
+
+/**
+ * Rename test cases on obs file system.
+ */
+public class TestOBSContractRename extends AbstractContractRenameTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+
+ @Override
+ public void testRenameFileUnderFileSubdir() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+
+ @Override
+ public void testRenameFileUnderFile() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRootDir.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRootDir.java
new file mode 100644
index 0000000000000..ba961a300efb3
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractRootDir.java
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Root directory test cases on obs file system.
+ */
+public class TestOBSContractRootDir extends AbstractContractRootDirectoryTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractSeek.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractSeek.java
new file mode 100644
index 0000000000000..48751ea669698
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSContractSeek.java
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.contract.AbstractContractSeekTest;
+import org.apache.hadoop.fs.contract.AbstractFSContract;
+
+/**
+ * Seek test cases on obs file system.
+ */
+public class TestOBSContractSeek extends AbstractContractSeekTest {
+
+ @Override
+ protected AbstractFSContract createContract(final Configuration conf) {
+ return new OBSContract(conf);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFSMainOperations.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFSMainOperations.java
new file mode 100644
index 0000000000000..b62023b642486
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFSMainOperations.java
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem;
+import org.junit.After;
+import org.junit.Assume;
+import org.junit.Before;
+
+/**
+ * A collection of tests for the {@link FileSystem}. This test should be used
+ * for testing an instance of FileSystem that has been initialized to a
+ * specific default FileSystem such as LocalFileSystem, HDFS, OBS, etc.
+ *
+ * To test a given {@link FileSystem} implementation, create a subclass of
+ * this test and override {@link #setUp()} to initialize the fSys
+ * {@link FileSystem} instance variable.
+ *
+ * Since this is a JUnit 4 test, a single setup can also be done before the
+ * start of any tests, e.g. in a {@code @BeforeClass} method.
+ */
+public class TestOBSFSMainOperations extends
+ TestFSMainOperationsLocalFileSystem {
+
+ @Override
+ @Before
+ public void setUp() throws Exception {
+ skipTestCheck();
+ Configuration conf = new Configuration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ fSys = OBSTestUtils.createTestFileSystem(conf);
+ }
+
+ @Override
+ public void testWorkingDirectory() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ public void testListStatusThrowsExceptionForUnreadableDir() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ public void testRenameDirectoryToItself() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ public void testGlobStatusThrowsExceptionForUnreadableDir() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ public void testRenameFileToItself() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ @After
+ public void tearDown() throws Exception {
+ if(fSys != null) {
+ super.tearDown();
+ }
+ }
+
+ public void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextCreateMkdir.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextCreateMkdir.java
new file mode 100644
index 0000000000000..7860f356aa3ee
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextCreateMkdir.java
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.DelegateToFileSystem;
+import org.apache.hadoop.fs.FileContext;
+import org.apache.hadoop.fs.FileContextCreateMkdirBaseTest;
+import org.apache.hadoop.fs.FileContextTestHelper;
+import org.apache.hadoop.fs.FileSystem;
+import org.junit.Assume;
+import org.junit.BeforeClass;
+
+import java.net.URI;
+import java.util.UUID;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
+/**
+ * File context create mkdir test cases on obs file system.
+ */
+public class TestOBSFileContextCreateMkdir extends
+ FileContextCreateMkdirBaseTest {
+
+ @BeforeClass
+ public static void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+
+
+ @SuppressFBWarnings("ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD")
+ @Override
+ public void setUp() throws Exception {
+ Configuration conf = OBSContract.getConfiguration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ String fileSystem = conf.get(OBSTestConstants.TEST_FS_OBS_NAME);
+ if (fileSystem == null || fileSystem.trim().length() == 0) {
+ throw new Exception("Default file system not configured.");
+ }
+
+ URI uri = new URI(fileSystem);
+ FileSystem fs = OBSTestUtils.createTestFileSystem(conf);
+ if (fc == null) {
+ this.fc = FileContext.getFileContext(new DelegateToFileSystem(uri, fs,
+ conf, fs.getScheme(), false) {
+ }, conf);
+ }
+ super.setUp();
+ }
+
+ @Override
+ protected FileContextTestHelper createFileContextHelper() {
+ // On Windows, the root directory path is derived from the local working
+ // directory. OBS does not support ':' as part of the path, which would
+ // result in a failure, so use a random UUID as the test root instead.
+ return new FileContextTestHelper(UUID.randomUUID().toString());
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextMainOperations.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextMainOperations.java
new file mode 100644
index 0000000000000..ef6d31215f7a8
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextMainOperations.java
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.DelegateToFileSystem;
+import org.apache.hadoop.fs.FileContext;
+import org.apache.hadoop.fs.FileContextMainOperationsBaseTest;
+import org.apache.hadoop.fs.FileSystem;
+import org.junit.Assume;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.net.URI;
+
+/**
+ * File context main operations test cases on obs file system.
+ */
+public class TestOBSFileContextMainOperations extends
+ FileContextMainOperationsBaseTest {
+
+ @BeforeClass
+ public static void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+
+ @edu.umd.cs.findbugs.annotations.SuppressFBWarnings(
+ "ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD")
+ @Override
+ public void setUp() throws Exception {
+ Configuration conf = new Configuration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ String fileSystem = conf.get(OBSTestConstants.TEST_FS_OBS_NAME);
+ if (fileSystem == null || fileSystem.trim().length() == 0) {
+ throw new Exception("Default file system not configured.");
+ }
+
+ URI uri = new URI(fileSystem);
+ FileSystem fs = OBSTestUtils.createTestFileSystem(conf);
+ fc = FileContext.getFileContext(new DelegateToFileSystem(uri, fs,
+ conf, fs.getScheme(), false) {
+ }, conf);
+ super.setUp();
+ }
+
+ @Override
+ protected boolean listCorruptedBlocksSupported() {
+ return false;
+ }
+
+ @Override
+ @Test
+ public void testSetVerifyChecksum() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+
+ @Override
+ public void testMkdirsFailsForSubdirectoryOfExistingFile() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextURI.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextURI.java
new file mode 100644
index 0000000000000..b3f523092a924
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextURI.java
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.DelegateToFileSystem;
+import org.apache.hadoop.fs.FileContext;
+import org.apache.hadoop.fs.FileContextURIBase;
+import org.apache.hadoop.fs.FileSystem;
+import org.junit.Assume;
+import org.junit.BeforeClass;
+
+import java.net.URI;
+
+/**
+ * A collection of tests for the {@link FileContext} to test path names passed
+ * as URIs. This test should be used for testing an instance of FileContext
+ * that has been initialized to a specific default FileSystem such as
+ * LocalFileSystem, HDFS, OBS, etc., and where path names are passed that are
+ * URIs in a different FileSystem.
+ *
+ * To test a given {@link FileSystem} implementation, create a subclass of
+ * this test and override {@link #setUp()} to initialize the fc1 and fc2
+ * {@link FileContext} instance variables.
+ *
+ * The tests will do operations on fc1 that use a URI in fc2.
+ *
+ */
+public class TestOBSFileContextURI extends FileContextURIBase {
+
+ @BeforeClass
+ public static void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+
+ @Override
+ public void setUp() throws Exception {
+ Configuration conf = new Configuration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ String fileSystem = conf.get(OBSTestConstants.TEST_FS_OBS_NAME);
+ if (fileSystem == null || fileSystem.trim().length() == 0) {
+ throw new Exception("Default file system not configured.");
+ }
+
+ URI uri = new URI(fileSystem);
+ FileSystem fs = OBSTestUtils.createTestFileSystem(conf);
+ fc1 = FileContext.getFileContext(new DelegateToFileSystem(uri, fs,
+ conf, fs.getScheme(), false) {
+ }, conf);
+
+ fc2 = FileContext.getFileContext(new DelegateToFileSystem(uri, fs,
+ conf, fs.getScheme(), false) {
+ }, conf);
+ super.setUp();
+ }
+
+ @Override
+ public void testMkdirsFailsForSubdirectoryOfExistingFile() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+
+ @Override
+ public void testFileStatus() {
+ Assume.assumeTrue("unsupport.", false);
+ }
+
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextUtil.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextUtil.java
new file mode 100644
index 0000000000000..1404e06a45227
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileContextUtil.java
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.DelegateToFileSystem;
+import org.apache.hadoop.fs.FileContext;
+import org.apache.hadoop.fs.FileContextUtilBase;
+import org.apache.hadoop.fs.FileSystem;
+import org.junit.Assume;
+import org.junit.BeforeClass;
+
+import java.net.URI;
+
+/**
+ * A collection of Util tests for {@link FileContext#util()}. This test
+ * should be used for testing an instance of {@link FileContext#util()} that
+ * has been initialized to a specific default FileSystem such as
+ * LocalFileSystem, HDFS, OBS, etc.
+ *
+ * To test a given {@link FileSystem} implementation, create a subclass of
+ * this test and override {@link #setUp()} to initialize the fc
+ * {@link FileContext} instance variable.
+ *
+ */
+public class TestOBSFileContextUtil extends FileContextUtilBase {
+
+ @BeforeClass
+ public static void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+
+ @Override
+ public void setUp() throws Exception {
+ Configuration conf = new Configuration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ String fileSystem = conf.get(OBSTestConstants.TEST_FS_OBS_NAME);
+ if (fileSystem == null || fileSystem.trim().length() == 0) {
+ throw new Exception("Default file system not configured.");
+ }
+
+ URI uri = new URI(fileSystem);
+ FileSystem fs = OBSTestUtils.createTestFileSystem(conf);
+ fc = FileContext.getFileContext(new DelegateToFileSystem(uri, fs,
+ conf, fs.getScheme(), false) {
+ }, conf);
+ super.setUp();
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileSystemContract.java b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileSystemContract.java
new file mode 100644
index 0000000000000..defd3ba40f2aa
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/java/org/apache/hadoop/fs/obs/TestOBSFileSystemContract.java
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.obs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystemContractBaseTest;
+import org.junit.Assume;
+import org.junit.Before;
+
+
+/**
+ * Tests a live OBS system. If your keys and bucket aren't specified, all
+ * tests are marked as passed.
+ *
+ * This uses BlockJUnit4ClassRunner because FileSystemContractBaseTest
+ * derives from TestCase, which uses the old JUnit 3 runner that doesn't
+ * ignore assumptions properly, making it impossible to skip the tests if we
+ * don't have a valid bucket.
+ */
+public class TestOBSFileSystemContract extends FileSystemContractBaseTest {
+
+ @Before
+ public void setUp() throws Exception {
+ skipTestCheck();
+ Configuration conf = new Configuration();
+ conf.addResource(OBSContract.CONTRACT_XML);
+ fs = OBSTestUtils.createTestFileSystem(conf);
+ }
+
+ @Override
+ public void testMkdirsWithUmask() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ @Override
+ public void testRenameRootDirForbidden() {
+ Assume.assumeTrue("unspport.", false);
+ }
+
+ public void skipTestCheck() {
+ Assume.assumeTrue(OBSContract.isContractTestEnabled());
+ }
+}
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/contract/obs.xml b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/contract/obs.xml
new file mode 100644
index 0000000000000..30b2cf04234d9
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/contract/obs.xml
@@ -0,0 +1,139 @@
+<?xml version="1.0"?>
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<configuration>
+
+  <property>
+    <name>fs.contract.test.root-tests-enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.test.supports-concat</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-returns-false-if-source-missing</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.test.random-seek-count</name>
+    <value>10</value>
+  </property>
+
+  <property>
+    <name>fs.contract.is-case-sensitive</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-returns-true-if-dest-exists</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-returns-true-if-source-missing</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-creates-dest-dirs</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-remove-dest-if-empty-dir</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-settimes</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-append</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-atomic-directory-delete</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-atomic-rename</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-block-locality</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-concat</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-seek</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-seek-on-closed-file</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rejects-seek-past-eof</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-available-on-closed-file</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-strict-exceptions</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-unix-permissions</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.rename-overwrites-dest</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-append</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.contract.supports-getfilestatus</name>
+    <value>true</value>
+  </property>
+
+</configuration>
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/core-site.xml b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/core-site.xml
new file mode 100644
index 0000000000000..2058293646e3b
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/core-site.xml
@@ -0,0 +1,136 @@
+<?xml version="1.0"?>
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<configuration>
+
+  <property>
+    <name>hadoop.tmp.dir</name>
+    <value>target/build/test</value>
+    <description>A base for other temporary directories.</description>
+    <final>true</final>
+  </property>
+
+  <property>
+    <name>hadoop.security.authentication</name>
+    <value>simple</value>
+  </property>
+
+  <property>
+    <name>fs.obs.impl</name>
+    <value>org.apache.hadoop.fs.obs.OBSFileSystem</value>
+    <description>The implementation class of the obs Filesystem</description>
+  </property>
+
+  <property>
+    <name>fs.obs.connection.establish.timeout</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <name>fs.obs.connection.timeout</name>
+    <value>60000</value>
+  </property>
+
+  <property>
+    <name>fs.obs.idle.connection.time</name>
+    <value>30000</value>
+  </property>
+
+  <property>
+    <name>fs.obs.max.idle.connections</name>
+    <value>10</value>
+  </property>
+
+  <property>
+    <name>fs.obs.connection.maximum</name>
+    <value>1000</value>
+  </property>
+
+  <property>
+    <name>fs.obs.attempts.maximum</name>
+    <value>5</value>
+  </property>
+
+  <property>
+    <name>fs.obs.upload.stream.retry.buffer.size</name>
+    <value>524288</value>
+  </property>
+
+  <property>
+    <name>fs.obs.read.buffer.size</name>
+    <value>8192</value>
+  </property>
+
+  <property>
+    <name>fs.obs.write.buffer.size</name>
+    <value>8192</value>
+  </property>
+
+  <property>
+    <name>fs.obs.socket.recv.buffer</name>
+    <value>-1</value>
+  </property>
+
+  <property>
+    <name>fs.obs.socket.send.buffer</name>
+    <value>-1</value>
+  </property>
+
+  <property>
+    <name>fs.obs.keep.alive</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.obs.validate.certificate</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.obs.verify.response.content.type</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.obs.strict.hostname.verification</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.obs.cname</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>fs.obs.test.local.path</name>
+    <value>/uplod_file</value>
+  </property>
+
+  <property>
+    <name>fs.obs.fast.upload</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>fs.obs.multipart.size</name>
+    <value>10485760</value>
+  </property>
+
+  <property>
+    <name>fs.obs.experimental.input.fadvise</name>
+    <value>random</value>
+  </property>
+
+  <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml">
+    <fallback/>
+  </include>
+
+</configuration>
diff --git a/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/log4j.properties b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/log4j.properties
new file mode 100644
index 0000000000000..6c0829f4ee68b
--- /dev/null
+++ b/hadoop-cloud-storage-project/hadoop-huaweicloud/src/test/resources/log4j.properties
@@ -0,0 +1,23 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# log4j configuration used during build and unit tests
+
+log4j.rootLogger=error,stdout
+log4j.threshold=ALL
+log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{2} (%F:%M(%L)) - %m%n
+
+log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
+
+# for debugging low level obs operations, uncomment this line
+log4j.logger.org.apache.hadoop.fs.obs=ERROR
diff --git a/hadoop-cloud-storage-project/pom.xml b/hadoop-cloud-storage-project/pom.xml
index da0d88a8117b8..8df6bb41e9080 100644
--- a/hadoop-cloud-storage-project/pom.xml
+++ b/hadoop-cloud-storage-project/pom.xml
@@ -32,6 +32,7 @@
     <module>hadoop-cloud-storage</module>
     <module>hadoop-cos</module>
+    <module>hadoop-huaweicloud</module>
diff --git a/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/VisibleForTesting.java b/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/VisibleForTesting.java
new file mode 100644
index 0000000000000..6b405ae972922
--- /dev/null
+++ b/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/VisibleForTesting.java
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.classification;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotates a program element that exists, or is more widely visible than
+ * otherwise necessary, specifically for use in test code.
+ * More precisely, test code within the hadoop-* modules.
+ * Moreover, this gives the implicit scope and stability of
+ * {@code InterfaceAudience.Private} and {@code InterfaceStability.Unstable}.
+ *
+ * If external modules need to access/override these methods, then
+ * they MUST be re-scoped as public/limited private.
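+ *
+ * Illustrative usage (the method shown is hypothetical):
+ * <pre>
+ *   {@literal @}VisibleForTesting
+ *   static int parseRetryCount(String value) { ... }
+ * </pre>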
+ */
+@Retention(RetentionPolicy.CLASS)
+@Target({ ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR })
+@Documented
+public @interface VisibleForTesting {
+}
diff --git a/hadoop-common-project/hadoop-auth/pom.xml b/hadoop-common-project/hadoop-auth/pom.xml
index 4761945c6080d..6eaa4fdfce5b4 100644
--- a/hadoop-common-project/hadoop-auth/pom.xml
+++ b/hadoop-common-project/hadoop-auth/pom.xml
@@ -128,6 +128,15 @@
       <groupId>org.apache.zookeeper</groupId>
       <artifactId>zookeeper</artifactId>
     </dependency>
+    <dependency>
+      <groupId>io.dropwizard.metrics</groupId>
+      <artifactId>metrics-core</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.xerial.snappy</groupId>
+      <artifactId>snappy-java</artifactId>
+      <scope>provided</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.curator</groupId>
       <artifactId>curator-framework</artifactId>
@@ -193,7 +202,7 @@
       <artifactId>guava</artifactId>
       <scope>test</scope>
-
+
@@ -233,8 +242,8 @@
-        <groupId>org.codehaus.mojo</groupId>
-        <artifactId>findbugs-maven-plugin</artifactId>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
         <configuration>
           <excludeFilterFile>${basedir}/dev-support/findbugsExcludeFile.xml</excludeFilterFile>
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/AuthenticatedURL.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/AuthenticatedURL.java
index 488400647cf06..32f4edfbc5710 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/AuthenticatedURL.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/AuthenticatedURL.java
@@ -153,7 +153,6 @@ private synchronized void setAuthCookie(HttpCookie cookie) {
cookieHeaders = new HashMap<>();
cookieHeaders.put("Cookie", Arrays.asList(cookie.toString()));
}
- LOG.trace("Setting token value to {} ({})", authCookie, oldCookie);
}
private void setAuthCookieValue(String value) {
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/KerberosAuthenticator.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/KerberosAuthenticator.java
index c035dd44ce036..06b63c1b9916c 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/KerberosAuthenticator.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/KerberosAuthenticator.java
@@ -13,7 +13,7 @@
*/
package org.apache.hadoop.security.authentication.client;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import java.lang.reflect.Constructor;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.security.authentication.server.HttpConstants;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java
index 94d11f48cf2a9..3658bd8b8ec01 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java
@@ -237,8 +237,8 @@ public static SignerSecretProvider constructSecretProvider(
provider.init(config, ctx, validity);
} catch (Exception e) {
if (!disallowFallbackToRandomSecretProvider) {
- LOG.info("Unable to initialize FileSignerSecretProvider, " +
- "falling back to use random secrets.");
+ LOG.warn("Unable to initialize FileSignerSecretProvider, " +
+ "falling back to use random secrets. Reason: " + e.getMessage());
provider = new RandomSignerSecretProvider();
provider.init(config, ctx, validity);
} else {
@@ -619,11 +619,17 @@ && getMaxInactiveInterval() > 0) {
KerberosAuthenticator.WWW_AUTHENTICATE))) {
errCode = HttpServletResponse.SC_FORBIDDEN;
}
+ // After Jetty 9.4.21, sendError() no longer allows a custom message.
+ // Use setStatus() to set a custom reason message instead.
+ String reason;
if (authenticationEx == null) {
- httpResponse.sendError(errCode, "Authentication required");
+ reason = "Authentication required";
} else {
- httpResponse.sendError(errCode, authenticationEx.getMessage());
+ reason = authenticationEx.getMessage();
}
+
+ httpResponse.setStatus(errCode, reason);
+ httpResponse.sendError(errCode, reason);
}
}
}
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandlerUtil.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandlerUtil.java
index 79739a487b431..e86dc3ffaf6ee 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandlerUtil.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandlerUtil.java
@@ -20,8 +20,6 @@
import java.util.Locale;
-import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
-
/**
* This is a utility class designed to provide functionality related to
* {@link AuthenticationHandler}.
@@ -44,8 +42,10 @@ private AuthenticationHandlerUtil() {
* @return an instance of AuthenticationHandler implementation.
*/
public static String getAuthenticationHandlerClassName(String authHandler) {
- String handlerName =
- Preconditions.checkNotNull(authHandler).toLowerCase(Locale.ENGLISH);
+ if (authHandler == null) {
+ throw new NullPointerException();
+ }
+ String handlerName = authHandler.toLowerCase(Locale.ENGLISH);
String authHandlerClassName = null;
@@ -98,8 +98,14 @@ public static String checkAuthScheme(String scheme) {
* specified authentication scheme false Otherwise.
*/
public static boolean matchAuthScheme(String scheme, String auth) {
- scheme = Preconditions.checkNotNull(scheme).trim();
- auth = Preconditions.checkNotNull(auth).trim();
+ if (scheme == null) {
+ throw new NullPointerException();
+ }
+ scheme = scheme.trim();
+ if (auth == null) {
+ throw new NullPointerException();
+ }
+ auth = auth.trim();
return auth.regionMatches(true, 0, scheme, 0, scheme.length());
}
}
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/JWTRedirectAuthenticationHandler.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/JWTRedirectAuthenticationHandler.java
index 5e4b0e844275a..2dcb60836b5e4 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/JWTRedirectAuthenticationHandler.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/JWTRedirectAuthenticationHandler.java
@@ -28,7 +28,7 @@
import java.security.interfaces.RSAPublicKey;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.util.CertificateUtil;
import org.slf4j.Logger;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/KerberosAuthenticationHandler.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/KerberosAuthenticationHandler.java
index 703842f3e3915..110d706008acd 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/KerberosAuthenticationHandler.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/KerberosAuthenticationHandler.java
@@ -13,7 +13,7 @@
*/
package org.apache.hadoop.security.authentication.server;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.client.KerberosAuthenticator;
import org.apache.commons.codec.binary.Base64;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/LdapAuthenticationHandler.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/LdapAuthenticationHandler.java
index 94ed5d44d2a68..60a62f1a102b5 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/LdapAuthenticationHandler.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/LdapAuthenticationHandler.java
@@ -38,8 +38,7 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
-import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
+import org.apache.hadoop.classification.VisibleForTesting;
/**
* The {@link LdapAuthenticationHandler} implements the BASIC authentication
@@ -144,15 +143,20 @@ public void init(Properties config) throws ServletException {
this.enableStartTls =
Boolean.valueOf(config.getProperty(ENABLE_START_TLS, "false"));
- Preconditions
- .checkNotNull(this.providerUrl, "The LDAP URI can not be null");
- Preconditions.checkArgument((this.baseDN == null)
- ^ (this.ldapDomain == null),
- "Either LDAP base DN or LDAP domain value needs to be specified");
+ if (this.providerUrl == null) {
+ throw new NullPointerException("The LDAP URI can not be null");
+ }
+ if (!((this.baseDN == null)
+ ^ (this.ldapDomain == null))) {
+ throw new IllegalArgumentException(
+ "Either LDAP base DN or LDAP domain value needs to be specified");
+ }
if (this.enableStartTls) {
String tmp = this.providerUrl.toLowerCase();
- Preconditions.checkArgument(!tmp.startsWith("ldaps"),
- "Can not use ldaps and StartTLS option at the same time");
+ if (tmp.startsWith("ldaps")) {
+ throw new IllegalArgumentException(
+ "Can not use ldaps and StartTLS option at the same time");
+ }
}
}
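
The init() validation above relies on XOR over null checks so that exactly one of the base DN and the LDAP domain is configured. The same exactly-one-of pattern, isolated into a hedged helper sketch (names are illustrative):

```java
/** Sketch: require exactly one of two optional settings to be present. */
public final class ExactlyOneOf {

  private ExactlyOneOf() {
  }

  public static void requireExactlyOne(Object a, Object b, String message) {
    // XOR over "is null" is true only when exactly one value is present.
    if (!((a == null) ^ (b == null))) {
      throw new IllegalArgumentException(message);
    }
  }
}
```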
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/MultiSchemeAuthenticationHandler.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/MultiSchemeAuthenticationHandler.java
index b2499ff734bbe..a9c9754a9fd77 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/MultiSchemeAuthenticationHandler.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/MultiSchemeAuthenticationHandler.java
@@ -30,7 +30,6 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
import org.apache.hadoop.thirdparty.com.google.common.base.Splitter;
/**
@@ -114,10 +113,10 @@ public void init(Properties config) throws ServletException {
}
this.types.clear();
-
- String schemesProperty =
- Preconditions.checkNotNull(config.getProperty(SCHEMES_PROPERTY),
- "%s system property is not specified.", SCHEMES_PROPERTY);
+ if (config.getProperty(SCHEMES_PROPERTY) == null) {
+ throw new NullPointerException(SCHEMES_PROPERTY + " system property is not specified.");
+ }
+ String schemesProperty = config.getProperty(SCHEMES_PROPERTY);
for (String scheme : STR_SPLITTER.split(schemesProperty)) {
scheme = AuthenticationHandlerUtil.checkAuthScheme(scheme);
if (schemeToAuthHandlerMapping.containsKey(scheme)) {
@@ -128,8 +127,10 @@ public void init(Properties config) throws ServletException {
String authHandlerPropName =
String.format(AUTH_HANDLER_PROPERTY, scheme).toLowerCase();
String authHandlerName = config.getProperty(authHandlerPropName);
- Preconditions.checkNotNull(authHandlerName,
- "No auth handler configured for scheme %s.", scheme);
+ if (authHandlerName == null) {
+ throw new NullPointerException(
+ "No auth handler configured for scheme " + scheme);
+ }
String authHandlerClassName =
AuthenticationHandlerUtil
@@ -145,7 +146,9 @@ public void init(Properties config) throws ServletException {
protected AuthenticationHandler initializeAuthHandler(
String authHandlerClassName, Properties config) throws ServletException {
try {
- Preconditions.checkNotNull(authHandlerClassName);
+ if (authHandlerClassName == null) {
+ throw new NullPointerException();
+ }
logger.debug("Initializing Authentication handler of type "
+ authHandlerClassName);
 Class<?> klass =
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/FileSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/FileSignerSecretProvider.java
index c03703732cf08..2a8a712b595ba 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/FileSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/FileSignerSecretProvider.java
@@ -13,15 +13,15 @@
*/
package org.apache.hadoop.security.authentication.util;
-import org.apache.hadoop.thirdparty.com.google.common.base.Charsets;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.security.authentication.server.AuthenticationFilter;
-import org.apache.hadoop.security.authentication.util.SignerSecretProvider;
import javax.servlet.ServletContext;
import java.io.*;
-import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Paths;
import java.util.Properties;
/**
@@ -43,29 +43,24 @@ public void init(Properties config, ServletContext servletContext,
String signatureSecretFile = config.getProperty(
AuthenticationFilter.SIGNATURE_SECRET_FILE, null);
- Reader reader = null;
if (signatureSecretFile != null) {
- try {
+ try (Reader reader = new InputStreamReader(Files.newInputStream(
+ Paths.get(signatureSecretFile)), StandardCharsets.UTF_8)) {
StringBuilder sb = new StringBuilder();
- reader = new InputStreamReader(
- new FileInputStream(signatureSecretFile), Charsets.UTF_8);
int c = reader.read();
while (c > -1) {
sb.append((char) c);
c = reader.read();
}
- secret = sb.toString().getBytes(Charset.forName("UTF-8"));
+
+ secret = sb.toString().getBytes(StandardCharsets.UTF_8);
+ if (secret.length == 0) {
+ throw new RuntimeException("No secret in signature secret file: "
+ + signatureSecretFile);
+ }
} catch (IOException ex) {
throw new RuntimeException("Could not read signature secret file: " +
signatureSecretFile);
- } finally {
- if (reader != null) {
- try {
- reader.close();
- } catch (IOException e) {
- // nothing to do
- }
- }
}
}
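
The FileSignerSecretProvider rewrite above moves to try-with-resources and java.nio, and rejects an empty secret file. A compact sketch of the same read-a-small-file-as-UTF-8 idiom, independent of the provider class (method and exception choices are illustrative):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: read a small secret file into bytes, rejecting empty content. */
public final class SecretFileReader {

  private SecretFileReader() {
  }

  public static byte[] readSecret(String path) {
    try {
      // Read everything, then round-trip through a String to keep the
      // bytes valid UTF-8, mirroring the reader-based code in the patch.
      byte[] raw = Files.readAllBytes(Paths.get(path));
      byte[] secret = new String(raw, StandardCharsets.UTF_8)
          .getBytes(StandardCharsets.UTF_8);
      if (secret.length == 0) {
        throw new IllegalStateException(
            "No secret in signature secret file: " + path);
      }
      return secret;
    } catch (IOException ex) {
      throw new UncheckedIOException(
          "Could not read signature secret file: " + path, ex);
    }
  }
}
```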
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosName.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosName.java
index a308cef190396..76a723a4d72c7 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosName.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosName.java
@@ -26,7 +26,7 @@
import java.util.regex.Matcher;
import java.util.regex.Pattern;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.slf4j.Logger;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosUtil.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosUtil.java
index 4319aa5b0df98..fc6f957b9622e 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosUtil.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/KerberosUtil.java
@@ -22,7 +22,6 @@
import java.io.File;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
-import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.net.InetAddress;
import java.net.UnknownHostException;
@@ -73,21 +72,34 @@ private static Oid getNumericOidInstance(String oidName) {
}
}
- public static Oid getOidInstance(String oidName)
+ /**
+ * Returns the Oid instance from string oidName.
+ * Use {@link GSS_SPNEGO_MECH_OID}, {@link GSS_KRB5_MECH_OID},
+ * or {@link NT_GSS_KRB5_PRINCIPAL_OID} instead.
+ *
+ * @return Oid instance
+ * @param oidName The oid Name
+ * @throws ClassNotFoundException for backward compatibility.
+ * @throws GSSException for backward compatibility.
+ * @throws NoSuchFieldException if the input is not supported.
+ * @throws IllegalAccessException for backward compatibility.
+ *
+ */
+ @Deprecated
+ public static Oid getOidInstance(String oidName)
throws ClassNotFoundException, GSSException, NoSuchFieldException,
IllegalAccessException {
- Class> oidClass;
- if (IBM_JAVA) {
- if ("NT_GSS_KRB5_PRINCIPAL".equals(oidName)) {
- // IBM JDK GSSUtil class does not have field for krb5 principal oid
- return new Oid("1.2.840.113554.1.2.2.1");
- }
- oidClass = Class.forName("com.ibm.security.jgss.GSSUtil");
- } else {
- oidClass = Class.forName("sun.security.jgss.GSSUtil");
+ switch (oidName) {
+ case "GSS_SPNEGO_MECH_OID":
+ return GSS_SPNEGO_MECH_OID;
+ case "GSS_KRB5_MECH_OID":
+ return GSS_KRB5_MECH_OID;
+ case "NT_GSS_KRB5_PRINCIPAL":
+ return NT_GSS_KRB5_PRINCIPAL_OID;
+ default:
+ throw new NoSuchFieldException(
+ "oidName: " + oidName + " is not supported.");
}
- Field oidField = oidClass.getDeclaredField(oidName);
- return (Oid)oidField.get(oidClass);
}
/**
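
The KerberosUtil change above deprecates the reflective, string-keyed getOidInstance() in favour of the exported Oid constants (as the test hunk further down also shows). A hedged sketch of the caller-side migration — the GSSManager usage is illustrative:

```java
import org.apache.hadoop.security.authentication.util.KerberosUtil;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;

/** Sketch: prefer the Oid constants over the deprecated string lookup. */
public final class OidMigration {

  private OidMigration() {
  }

  public static GSSName serviceName(GSSManager manager, String principal)
      throws GSSException {
    // Before: KerberosUtil.getOidInstance("NT_GSS_KRB5_PRINCIPAL")
    // After:  use the exported constant directly.
    return manager.createName(principal,
        KerberosUtil.NT_GSS_KRB5_PRINCIPAL_OID);
  }
}
```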
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RandomSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RandomSignerSecretProvider.java
index a57b744c2be0d..fe15310c53aca 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RandomSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RandomSignerSecretProvider.java
@@ -13,7 +13,7 @@
*/
package org.apache.hadoop.security.authentication.util;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import java.security.SecureRandom;
import java.util.Random;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java
index 69a09c189be27..ca95272cf9fd6 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/RolloverSignerSecretProvider.java
@@ -18,7 +18,7 @@
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContext;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.slf4j.Logger;
diff --git a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/ZKSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/ZKSignerSecretProvider.java
index a1cd6de8e5933..374f4a5665796 100644
--- a/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/ZKSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/util/ZKSignerSecretProvider.java
@@ -13,7 +13,7 @@
*/
package org.apache.hadoop.security.authentication.util;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Collections;
diff --git a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestAuthenticationFilter.java b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestAuthenticationFilter.java
index 20c0343f957b7..4f4a4521b2f0c 100644
--- a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestAuthenticationFilter.java
+++ b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestAuthenticationFilter.java
@@ -305,6 +305,34 @@ public byte[][] getAllSecrets() {
filter.destroy();
}
}
+
+ @Test
+ public void testEmptySecretFileFallbacksToRandomSecret() throws Exception {
+ AuthenticationFilter filter = new AuthenticationFilter();
+ try {
+ FilterConfig config = Mockito.mock(FilterConfig.class);
+ Mockito.when(config.getInitParameter(
+ AuthenticationFilter.AUTH_TYPE)).thenReturn("simple");
+ File secretFile = File.createTempFile("test_empty_secret", ".txt");
+ secretFile.deleteOnExit();
+ Assert.assertTrue(secretFile.exists());
+ Mockito.when(config.getInitParameter(
+ AuthenticationFilter.SIGNATURE_SECRET_FILE))
+ .thenReturn(secretFile.getAbsolutePath());
+ Mockito.when(config.getInitParameterNames()).thenReturn(
+ new Vector<>(Arrays.asList(AuthenticationFilter.AUTH_TYPE,
+ AuthenticationFilter.SIGNATURE_SECRET_FILE)).elements());
+ ServletContext context = Mockito.mock(ServletContext.class);
+ Mockito.when(context.getAttribute(
+ AuthenticationFilter.SIGNER_SECRET_PROVIDER_ATTRIBUTE))
+ .thenReturn(null);
+ Mockito.when(config.getServletContext()).thenReturn(context);
+ filter.init(config);
+ Assert.assertTrue(filter.isRandomSecret());
+ } finally {
+ filter.destroy();
+ }
+ }
@Test
public void testInitCaseSensitivity() throws Exception {
diff --git a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestKerberosAuthenticationHandler.java b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestKerberosAuthenticationHandler.java
index 629b68bffbbd9..f10371b925758 100644
--- a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestKerberosAuthenticationHandler.java
+++ b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/server/TestKerberosAuthenticationHandler.java
@@ -301,11 +301,10 @@ public String call() throws Exception {
GSSContext gssContext = null;
try {
String servicePrincipal = KerberosTestUtils.getServerPrincipal();
- Oid oid =
- KerberosUtil.getOidInstance("NT_GSS_KRB5_PRINCIPAL");
+ Oid oid = KerberosUtil.NT_GSS_KRB5_PRINCIPAL_OID;
GSSName serviceName = gssManager.createName(servicePrincipal,
oid);
- oid = KerberosUtil.getOidInstance("GSS_KRB5_MECH_OID");
+ oid = KerberosUtil.GSS_KRB5_MECH_OID;
gssContext = gssManager.createContext(serviceName, oid, null,
GSSContext.DEFAULT_LIFETIME);
gssContext.requestCredDeleg(true);
diff --git a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProvider.java
index a7747398eec46..5582c923ae0e7 100644
--- a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProvider.java
@@ -17,7 +17,7 @@
import java.util.Properties;
import javax.servlet.ServletContext;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.security.authentication.server.AuthenticationFilter;
diff --git a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProviderCreator.java b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProviderCreator.java
index cb59c2099fc2c..d471cea8a6d74 100644
--- a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProviderCreator.java
+++ b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/StringSignerSecretProviderCreator.java
@@ -13,7 +13,7 @@
*/
package org.apache.hadoop.security.authentication.util;
-import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.classification.InterfaceStability;
/**
diff --git a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestFileSignerSecretProvider.java b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestFileSignerSecretProvider.java
index 1856410fd2943..5d4aabfc7c7a3 100644
--- a/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestFileSignerSecretProvider.java
+++ b/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/util/TestFileSignerSecretProvider.java
@@ -16,12 +16,16 @@
import org.apache.hadoop.security.authentication.server.AuthenticationFilter;
import org.junit.Assert;
import org.junit.Test;
+import org.junit.function.ThrowingRunnable;
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.Properties;
+import static org.junit.Assert.assertThrows;
+import static org.junit.Assert.assertTrue;
+
public class TestFileSignerSecretProvider {
@Test
@@ -48,4 +52,27 @@ public void testGetSecrets() throws Exception {
Assert.assertEquals(1, allSecrets.length);
Assert.assertArrayEquals(secretValue.getBytes(), allSecrets[0]);
}
+
+ @Test
+ public void testEmptySecretFileThrows() throws Exception {
+ File secretFile = File.createTempFile("test_empty_secret", ".txt");
+ assertTrue(secretFile.exists());
+
+ FileSignerSecretProvider secretProvider
+ = new FileSignerSecretProvider();
+ Properties secretProviderProps = new Properties();
+ secretProviderProps.setProperty(
+ AuthenticationFilter.SIGNATURE_SECRET_FILE,
+ secretFile.getAbsolutePath());
+
+ Exception exception =
+ assertThrows(RuntimeException.class, new ThrowingRunnable() {
+ @Override
+ public void run() throws Throwable {
+ secretProvider.init(secretProviderProps, null, -1);
+ }
+ });
+ assertTrue(exception.getMessage().startsWith(
+ "No secret in signature secret file:"));
+ }
}
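
The new test above uses JUnit 4's assertThrows with an anonymous ThrowingRunnable. On a Java 8+ source level the same assertion reads more compactly as a lambda; a minimal sketch with an illustrative method under test:

```java
import static org.junit.Assert.assertThrows;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/** Sketch: lambda form of JUnit 4's assertThrows. */
public class AssertThrowsSketchTest {

  @Test
  public void rejectsNegativeInput() {
    IllegalArgumentException ex = assertThrows(
        IllegalArgumentException.class, () -> validate(-1));
    assertTrue(ex.getMessage().startsWith("value must be non-negative"));
  }

  private static void validate(int value) {
    if (value < 0) {
      throw new IllegalArgumentException(
          "value must be non-negative: " + value);
    }
  }
}
```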
diff --git a/hadoop-common-project/hadoop-common/dev-support/jdiff/Apache_Hadoop_Common_3.2.2.xml b/hadoop-common-project/hadoop-common/dev-support/jdiff/Apache_Hadoop_Common_3.2.2.xml
new file mode 100644
index 0000000000000..40bea21f378fe
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/dev-support/jdiff/Apache_Hadoop_Common_3.2.2.xml
@@ -0,0 +1,35381 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ UnsupportedOperationException
+
+ If a key is deprecated in favor of multiple keys, they are all treated as
+ aliases of each other, and setting any one of them resets all the others
+ to the new value.
+
+ If you have multiple deprecation entries to add, it is more efficient to
+ use #addDeprecations(DeprecationDelta[] deltas) instead.
+
+ @param key
+ @param newKeys
+ @param customMessage
+ @deprecated use {@link #addDeprecation(String key, String newKey,
+ String customMessage)} instead]]>
+
+
+
+
+
+
+
+ UnsupportedOperationException
+
+ If you have multiple deprecation entries to add, it is more efficient to
+ use #addDeprecations(DeprecationDelta[] deltas) instead.
+
+ @param key
+ @param newKey
+ @param customMessage]]>
+
+
+
+
+
+
+ UnsupportedOperationException
+
+ If a key is deprecated in favor of multiple keys, they are all treated as
+ aliases of each other, and setting any one of them resets all the others
+ to the new value.
+
+ If you have multiple deprecation entries to add, it is more efficient to
+ use #addDeprecations(DeprecationDelta[] deltas) instead.
+
+ @param key Key that is to be deprecated
+ @param newKeys list of keys that take up the values of deprecated key
+ @deprecated use {@link #addDeprecation(String key, String newKey)} instead]]>
+
+
+
+
+
+
+ UnsupportedOperationException
+
+ If you have multiple deprecation entries to add, it is more efficient to
+ use #addDeprecations(DeprecationDelta[] deltas) instead.
+
+ @param key Key that is to be deprecated
+ @param newKey key that takes up the value of deprecated key]]>
+
+
+
+
+
+ key is deprecated.
+
+ @param key the parameter which is to be checked for deprecation
+ @return true if the key is deprecated and
+ false otherwise.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ final.
+
+ @param name resource to be added, the classpath is examined for a file
+ with that name.]]>
+
+
+
+
+
+
+
+
+
+ final.
+
+ @param url url of the resource to be added, the local filesystem is
+ examined directly to find the resource, without referring to
+ the classpath.]]>
+
+
+
+
+
+
+
+
+
+ final.
+
+ @param file file-path of resource to be added, the local filesystem is
+ examined directly to find the resource, without referring to
+ the classpath.]]>
+
+
+
+
+
+
+
+
+
+ final.
+
+ WARNING: The contents of the InputStream will be cached, by this method.
+ So use this sparingly because it does increase the memory consumption.
+
+ @param in InputStream to deserialize the object from. In will be read from
+ when a get or set is called next. After it is read the stream will be
+ closed.]]>
+
+
+
+
+
+
+
+
+
+
+ final.
+
+ @param in InputStream to deserialize the object from.
+ @param name the name of the resource because InputStream.toString is not
+ very descriptive some times.]]>
+
+
+
+
+
+
+
+
+
+
+ final.
+
+ @param conf Configuration object from which to load properties]]>
+
+
+
+
+
+
+
+
+
+
+ name property, null if
+ no such property exists. If the key is deprecated, it returns the value of
+ the first key which replaces the deprecated key and is not null.
+
+ Values are processed for variable expansion
+ before being returned.
+
+ @param name the property name, will be trimmed before get value.
+ @return the value of the name or its replacing property,
+ or null if no such property exists.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name property, but only for
+ names which have no valid value, usually non-existent or commented
+ out in XML.
+
+ @param name the property name
+ @return true if the property name exists without value]]>
+
+
+
+
+
+ name property as a trimmed String,
+ null if no such property exists.
+ If the key is deprecated, it returns the value of
+ the first key which replaces the deprecated key and is not null
+
+ Values are processed for variable expansion
+ before being returned.
+
+ @param name the property name.
+ @return the value of the name or its replacing property,
+ or null if no such property exists.]]>
+
+
+
+
+
+
+ name property as a trimmed String,
+ defaultValue if no such property exists.
+ See @{Configuration#getTrimmed} for more details.
+
+ @param name the property name.
+ @param defaultValue the property default value.
+ @return the value of the name or defaultValue
+ if it is not set.]]>
+
+
+
+
+
+ name property, without doing
+ variable expansion.If the key is
+ deprecated, it returns the value of the first key which replaces
+ the deprecated key and is not null.
+
+ @param name the property name.
+ @return the value of the name property or
+ its replacing property and null if no such property exists.]]>
+
+
+
+
+
+
+ value of the name property. If
+ name is deprecated or there is a deprecated name associated to it,
+ it sets the value to both names. Name will be trimmed before put into
+ configuration.
+
+ @param name property name.
+ @param value property value.]]>
+
+
+
+
+
+
+
+ value of the name property. If
+ name is deprecated, it also sets the value to
+ the keys that replace the deprecated key. Name will be trimmed before put
+ into configuration.
+
+ @param name property name.
+ @param value property value.
+ @param source the place that this configuration value came from
+ (For debugging).
+ @throws IllegalArgumentException when the value or name is null.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name. If the key is deprecated,
+ it returns the value of the first key which replaces the deprecated key
+ and is not null.
+ If no such property exists,
+ then defaultValue is returned.
+
+ @param name property name, will be trimmed before get value.
+ @param defaultValue default value.
+ @return property value, or defaultValue if the property
+ doesn't exist.]]>
+
+
+
+
+
+
+ name property as an int.
+
+ If no such property exists, the provided default value is returned,
+ or if the specified value is not a valid int,
+ then an error is thrown.
+
+ @param name property name.
+ @param defaultValue default value.
+ @throws NumberFormatException when the value is invalid
+ @return property value as an int,
+ or defaultValue.]]>
+
+
+
+
+
+ name property as a set of comma-delimited
+ int values.
+
+ If no such property exists, an empty array is returned.
+
+ @param name property name
+ @return property value interpreted as an array of comma-delimited
+ int values]]>
+
+
+
+
+
+
+ name property to an int.
+
+ @param name property name.
+ @param value int value of the property.]]>
+
+
+
+
+
+
+ name property as a long.
+ If no such property exists, the provided default value is returned,
+ or if the specified value is not a valid long,
+ then an error is thrown.
+
+ @param name property name.
+ @param defaultValue default value.
+ @throws NumberFormatException when the value is invalid
+ @return property value as a long,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property as a long or
+ human readable format. If no such property exists, the provided default
+ value is returned, or if the specified value is not a valid
+ long or human readable format, then an error is thrown. You
+ can use the following suffix (case insensitive): k(kilo), m(mega), g(giga),
+ t(tera), p(peta), e(exa)
+
+ @param name property name.
+ @param defaultValue default value.
+ @throws NumberFormatException when the value is invalid
+ @return property value as a long,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property to a long.
+
+ @param name property name.
+ @param value long value of the property.]]>
+
+
+
+
+
+
+ name property as a float.
+ If no such property exists, the provided default value is returned,
+ or if the specified value is not a valid float,
+ then an error is thrown.
+
+ @param name property name.
+ @param defaultValue default value.
+ @throws NumberFormatException when the value is invalid
+ @return property value as a float,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property to a float.
+
+ @param name property name.
+ @param value property value.]]>
+
+
+
+
+
+
+ name property as a double.
+ If no such property exists, the provided default value is returned,
+ or if the specified value is not a valid double,
+ then an error is thrown.
+
+ @param name property name.
+ @param defaultValue default value.
+ @throws NumberFormatException when the value is invalid
+ @return property value as a double,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property to a double.
+
+ @param name property name.
+ @param value property value.]]>
+
+
+
+
+
+
+ name property as a boolean.
+ If no such property is specified, or if the specified value is not a valid
+ boolean, then defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value as a boolean,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property to a boolean.
+
+ @param name property name.
+ @param value boolean value of the property.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name property to the given type. This
+ is equivalent to set(<name>, value.toString()).
+ @param name property name
+ @param value new value]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name to the given time duration. This
+ is equivalent to set(<name>, value + <time suffix>).
+ @param name Property name
+ @param value Time duration
+ @param unit Unit of time]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name property as a Pattern.
+ If no such property is specified, or if the specified value is not a valid
+ Pattern, then DefaultValue is returned.
+ Note that the returned value is NOT trimmed by this method.
+
+ @param name property name
+ @param defaultValue default value
+ @return property value as a compiled Pattern, or defaultValue]]>
+
+
+
+
+
+
+ Pattern.
+ If the pattern is passed as null, sets the empty pattern which results in
+ further calls to getPattern(...) returning the default value.
+
+ @param name property name
+ @param pattern new value]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name property as
+ a collection of Strings.
+ If no such property is specified then empty collection is returned.
+
+ This is an optimized version of {@link #getStrings(String)}
+
+ @param name property name.
+ @return property value as a collection of Strings.]]>
+
+
+
+
+
+ name property as
+ an array of Strings.
+ If no such property is specified then null is returned.
+
+ @param name property name.
+ @return property value as an array of Strings,
+ or null.]]>
+
+
+
+
+
+
+ name property as
+ an array of Strings.
+ If no such property is specified then default value is returned.
+
+ @param name property name.
+ @param defaultValue The default value
+ @return property value as an array of Strings,
+ or default value.]]>
+
+
+
+
+
+ name property as
+ a collection of Strings, trimmed of the leading and trailing whitespace.
+ If no such property is specified then empty Collection is returned.
+
+ @param name property name.
+ @return property value as a collection of Strings, or empty Collection]]>
+
+
+
+
+
+ name property as
+ an array of Strings, trimmed of the leading and trailing whitespace.
+ If no such property is specified then an empty array is returned.
+
+ @param name property name.
+ @return property value as an array of trimmed Strings,
+ or empty array.]]>
+
+
+
+
+
+
+ name property as
+ an array of Strings, trimmed of the leading and trailing whitespace.
+ If no such property is specified then default value is returned.
+
+ @param name property name.
+ @param defaultValue The default value
+ @return property value as an array of trimmed Strings,
+ or default value.]]>
+
+
+
+
+
+
+ name property as
+ as comma delimited values.
+
+ @param name property name.
+ @param values The values]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ hostProperty as a
+ InetSocketAddress. If hostProperty is
+ null, addressProperty will be used. This
+ is useful for cases where we want to differentiate between host
+ bind address and address clients should use to establish connection.
+
+ @param hostProperty bind host property name.
+ @param addressProperty address property name.
+ @param defaultAddressValue the default value
+ @param defaultPort the default port
+ @return InetSocketAddress]]>
+
+
+
+
+
+
+
+ name property as a
+ InetSocketAddress.
+ @param name property name.
+ @param defaultAddress the default value
+ @param defaultPort the default port
+ @return InetSocketAddress]]>
+
+
+
+
+
+
+ name property as
+ a host:port.]]>
+
+
+
+
+
+
+
+
+ name property as a host:port. The wildcard
+ address is replaced with the local host's address. If the host and address
+ properties are configured the host component of the address will be combined
+ with the port component of the addr to generate the address. This is to allow
+ optional control over which host name is used in multi-home bind-host
+ cases where a host can have multiple names
+ @param hostProperty the bind-host configuration name
+ @param addressProperty the service address configuration name
+ @param defaultAddressValue the service default address configuration value
+ @param addr InetSocketAddress of the service listener
+ @return InetSocketAddress for clients to connect]]>
+
+
+
+
+
+
+ name property as a host:port. The wildcard
+ address is replaced with the local host's address.
+ @param name property name.
+ @param addr InetSocketAddress of a listener to store in the given property
+ @return InetSocketAddress for clients to connect]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ name property
+ as an array of Class.
+ The value of the property specifies a list of comma separated class names.
+ If no such property is specified, then defaultValue is
+ returned.
+
+ @param name the property name.
+ @param defaultValue default value.
+ @return property value as a Class[],
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property as a Class.
+ If no such property is specified, then defaultValue is
+ returned.
+
+ @param name the class name.
+ @param defaultValue default value.
+ @return property value as a Class,
+ or defaultValue.]]>
+
+
+
+
+
+
+
+ name property as a Class
+ implementing the interface specified by xface.
+
+ If no such property is specified, then defaultValue is
+ returned.
+
+ An exception is thrown if the returned class does not implement the named
+ interface.
+
+ @param name the class name.
+ @param defaultValue default value.
+ @param xface the interface implemented by the named class.
+ @return property value as a Class,
+ or defaultValue.]]>
+
+
+
+
+
+
+ name property as a List
+ of objects implementing the interface specified by xface.
+
+ An exception is thrown if any of the classes does not exist, or if it does
+ not implement the named interface.
+
+ @param name the property name.
+ @param xface the interface implemented by the classes named by
+ name.
+ @return a List of objects implementing xface.]]>
+
+
+
+
+
+
+
+ name property to the name of a
+ theClass implementing the given interface xface.
+
+ An exception is thrown if theClass does not implement the
+ interface xface.
+
+ @param name property name.
+ @param theClass property value.
+ @param xface the interface implemented by the named class.]]>
+
+
+
+
+
+
+
+ dirsProp with
+ the given path. If dirsProp contains multiple directories,
+ then one is chosen based on path's hash code. If the selected
+ directory does not exist, an attempt is made to create it.
+
+ @param dirsProp directory in which to locate the file.
+ @param path file-path.
+ @return local file under the directory with the given path.]]>
+
+
+
+
+
+
+
+ dirsProp with
+ the given path. If dirsProp contains multiple directories,
+ then one is chosen based on path's hash code. If the selected
+ directory does not exist, an attempt is made to create it.
+
+ @param dirsProp directory in which to locate the file.
+ @param path file-path.
+ @return local file under the directory with the given path.]]>
+
+
+
+
+
+
+
+
+
+
+
+ name.
+
+ @param name configuration resource name.
+ @return an input stream attached to the resource.]]>
+
+
+
+
+
+ name.
+
+ @param name configuration resource name.
+ @return a reader attached to the resource.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ String
+ key-value pairs in the configuration.
+
+ @return an iterator over the entries.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ When property name is not empty and the property exists in the
+ configuration, this method writes the property and its attributes
+ to the {@link Writer}.
+
+
+
+
+ When property name is null or empty, this method writes all the
+ configuration properties and their attributes to the {@link Writer}.
+
+
+
+
+ When property name is not empty but the property doesn't exist in
+ the configuration, this method throws an {@link IllegalArgumentException}.
+
+
+ @param out the writer to write to.]]>
+
+
+
+
+
+
+
+
+
+ When propertyName is not empty, and the property exists
+ in the configuration, the format of the output would be,
+
+ When propertyName is not empty, and the property is not
+ found in the configuration, this method will throw an
+ {@link IllegalArgumentException}.
+
+
+ @param config the configuration
+ @param propertyName property name
+ @param out the Writer to write to
+ @throws IOException
+ @throws IllegalArgumentException when property name is not
+ empty and the property is not found in configuration]]>
+
+
+
+
+
+
+
+
+ { "properties" :
+ [ { key : "key1",
+ value : "value1",
+ isFinal : "key1.isFinal",
+ resource : "key1.resource" },
+ { key : "key2",
+ value : "value2",
+ isFinal : "ke2.isFinal",
+ resource : "key2.resource" }
+ ]
+ }
+
+
+ It does not output the properties of the configuration object which
+ is loaded from an input stream.
+
+
+ @param config the configuration
+ @param out the Writer to write to
+ @throws IOException]]>
+
Configurations are specified by resources. A resource contains a set of
+ name/value pairs as XML data. Each resource is named by either a
+ String or by a {@link Path}. If named by a String,
+ then the classpath is examined for a file with that name. If named by a
+ Path, then the local filesystem is examined directly, without
+ referring to the classpath.
+
+
Unless explicitly turned off, Hadoop by default specifies two
+ resources, loaded in-order from the classpath:
core-site.xml: Site-specific configuration for a given hadoop
+ installation.
+
+ Applications may add additional resources, which are loaded
+ subsequent to these resources in the order they are added.
+
+
Final Parameters
+
+
Configuration parameters may be declared final.
+ Once a resource declares a value final, no subsequently-loaded
+ resource can alter that value.
+ For example, one might define a final parameter with:
+
When conf.get("tempdir") is called, then ${basedir}
+ will be resolved to another property in this Configuration, while
+ ${user.name} would then ordinarily be resolved to the value
+ of the System property with that name.
+
When conf.get("otherdir") is called, then ${env.BASE_DIR}
+ will be resolved to the value of the ${BASE_DIR} environment variable.
+ It supports ${env.NAME:-default} and ${env.NAME-default} notations.
+ The former is resolved to "default" if ${NAME} environment variable is undefined
+ or its value is empty.
+ The latter behaves the same way only if ${NAME} is undefined.
+
By default, warnings will be given to any deprecated configuration
+ parameters and these are suppressible by configuring
+ log4j.logger.org.apache.hadoop.conf.Configuration.deprecation in
+ log4j.properties file.
+
+
Tags
+
+
Optionally we can tag related properties together by using tag
+ attributes. System tags are defined by hadoop.tags.system property. Users
+ can define there own custom tags in hadoop.tags.custom property.
+
+
Properties marked with tags can be retrieved with conf
+ .getAllPropertiesByTag("HDFS") or conf.getAllPropertiesByTags
+ (Arrays.asList("YARN","SECURITY")).
]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This implementation generates the key material and calls the
+ {@link #createKey(String, byte[], Options)} method.
+
+ @param name the base name of the key
+ @param options the options for the new key.
+ @return the version name of the first version of the key.
+ @throws IOException
+ @throws NoSuchAlgorithmException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This implementation generates the key material and calls the
+ {@link #rollNewVersion(String, byte[])} method.
+
+ @param name the basename of the key
+ @return the name of the new version of the key
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ KeyProvider implementations must be thread safe.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ NULL if
+ a provider for the specified URI scheme could not be found.
+ @throws IOException thrown if the provider failed to initialize.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ uri has syntax error]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ uri is
+ not found]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ uri
+ determines a configuration property name,
+ fs.AbstractFileSystem.scheme.impl whose value names the
+ AbstractFileSystem class.
+
+ The entire URI and conf is passed to the AbstractFileSystem factory method.
+
+ @param uri for the file system to be created.
+ @param conf which is passed to the file system impl.
+
+ @return file system for the given URI.
+
+ @throws UnsupportedFileSystemException if the file system for
+ uri is not supported.]]>
+
+
+
+
+
+
+
+
+
+
+
+ default port;]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ In some FileSystem implementations such as HDFS metadata
+ synchronization is essential to guarantee consistency of read requests
+ particularly in HA setting.
+ @throws IOException
+ @throws UnsupportedOperationException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ describing modifications
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+
+ describing entries to remove
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ describing modifications, must include entries
+ for user, group, and others for compatibility with permission bits.
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+ which returns each AclStatus
+ @throws IOException if an ACL could not be read]]>
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @param flag xattr set flag
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attribute
+ @param name xattr name.
+ @return byte[] xattr value.
+ @throws IOException]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @param names XAttr names.
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to remove extended attribute
+ @param name xattr name
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ BlockLocation(offset: 0, length: BLOCK_SIZE,
+ hosts: {"host1:9866", "host2:9866, host3:9866"})
+
+
+ And if the file is erasure-coded, each BlockLocation represents a logical
+ block groups. Value offset is the offset of a block group in the file and
+ value length is the total length of a block group. Hosts of a BlockLocation
+ are the datanodes that holding all the data blocks and parity blocks of a
+ block group.
+ Suppose we have a RS_3_2 coded file (3 data units and 2 parity units).
+ A BlockLocation example will be like:
+
CREATE - to create a file if it does not exist,
+ else throw FileAlreadyExists.
+
APPEND - to append to a file if it exists,
+ else throw FileNotFoundException.
+
OVERWRITE - to truncate a file if it exists,
+ else throw FileNotFoundException.
+
CREATE|APPEND - to create a file if it does not exist,
+ else append to an existing file.
+
CREATE|OVERWRITE - to create a file if it does not exist,
+ else overwrite an existing file.
+
SYNC_BLOCK - to force closed blocks to the disk device.
+ In addition {@link Syncable#hsync()} should be called after each write,
+ if true synchronous behavior is required.
+
LAZY_PERSIST - Create the block on transient storage (RAM) if
+ available.
+
APPEND_NEWBLOCK - Append data to a new block instead of end of the last
+ partial block.
+
+
+ Following combinations are not valid and will result in
+ {@link HadoopIllegalArgumentException}:
+
+
]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws AccessControlException if access denied
+ @throws IOException If an IO Error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server
+
+ RuntimeExceptions:
+ @throws InvalidPathException If path f is not valid]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Progress - to report progress on the operation - default null
+
Permission - umask is applied against permission: default is
+ FsPermissions:getDefault()
+
+
CreateParent - create missing parent path; default is to not
+ to create parents
+
The defaults for the following are SS defaults of the file
+ server implementing the target path. Not all parameters make sense
+ for all kinds of file system - eg. localFS ignores Blocksize,
+ replication, checksum
+
+
BufferSize - buffersize used in FSDataOutputStream
+
Blocksize - block size for file blocks
+
ReplicationFactor - replication for blocks
+
ChecksumParam - Checksum parameters. server default is used
+ if not specified.
+
+
+
+ @return {@link FSDataOutputStream} for created file
+
+ @throws AccessControlException If access is denied
+ @throws FileAlreadyExistsException If file f already exists
+ @throws FileNotFoundException If parent of f does not exist
+ and createParent is false
+ @throws ParentNotDirectoryException If parent of f is not a
+ directory.
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server
+
+ RuntimeExceptions:
+ @throws InvalidPathException If path f is not valid]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ dir already
+ exists
+ @throws FileNotFoundException If parent of dir does not exist
+ and createParent is false
+ @throws ParentNotDirectoryException If parent of dir is not a
+ directory
+ @throws UnsupportedFileSystemException If file system for dir
+ is not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server
+
+ RuntimeExceptions:
+ @throws InvalidPathException If path dir is not valid]]>
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server
+
+ RuntimeExceptions:
+ @throws InvalidPathException If path f is invalid]]>
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f
+ is not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
Fails if path is a directory.
+
Fails if path does not exist.
+
Fails if path is not closed.
+
Fails if new size is greater than current size.
+
+ @param f The path to the file to be truncated
+ @param newLength The size the file is to be truncated to
+
+ @return true if the file has been truncated to the desired
+ newLength and is immediately available to be reused for
+ write operations such as append, or
+ false if a background process of adjusting the length of
+ the last block has been started, and clients should wait for it to
+ complete before proceeding with further file updates.
+
+ @throws AccessControlException If access is denied
+ @throws FileNotFoundException If file f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Fails if src is a file and dst is a directory.
+
Fails if src is a directory and dst is a file.
+
Fails if the parent of dst does not exist or is a file.
+
+
+ If OVERWRITE option is not passed as an argument, rename fails if the dst
+ already exists.
+
+ If OVERWRITE option is passed as an argument, rename overwrites the dst if
+ it is a file or an empty directory. Rename fails if dst is a non-empty
+ directory.
+
+ Note that atomicity of rename is dependent on the file system
+ implementation. Please refer to the file system documentation for details
+
+
+ @param src path to be renamed
+ @param dst new path after rename
+
+ @throws AccessControlException If access is denied
+ @throws FileAlreadyExistsException If dst already exists and
+ options has {@link Options.Rename#OVERWRITE}
+ option false.
+ @throws FileNotFoundException If src does not exist
+ @throws ParentNotDirectoryException If parent of dst is not a
+ directory
+ @throws UnsupportedFileSystemException If file system for src
+ and dst is not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f
+ is not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server
+
+ RuntimeExceptions:
+ @throws HadoopIllegalArgumentException If username or
+ groupname is invalid.]]>
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred]]>
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If the given path does not refer to a symlink
+ or an I/O error occurred]]>
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Given a path referring to a symlink of form:
+
+ <---X--->
+ fs://host/A/B/link
+ <-----Y----->
+
+ In this path X is the scheme and authority that identify the file system,
+ and Y is the path leading up to the final path component "link". If Y is
+ a symlink itself then let Y' be the target of Y and X' be the scheme and
+ authority of Y'. Symlink targets may be:
+
+ 1. Fully qualified URIs
+
+ fs://hostX/A/B/file Resolved according to the target file system.
+
+ 2. Partially qualified URIs (e.g. scheme but no host)
+
+ fs:///A/B/file Resolved according to the target file system. E.g. resolving
+ a symlink to hdfs:///A results in an exception because
+ HDFS URIs must be fully qualified, while a symlink to
+ file:///A will not since Hadoop's local file systems
+ require partially qualified URIs.
+
+ 3. Relative paths
+
path Resolves to [Y'][path]. E.g. if Y resolves to hdfs://host/A and path
+ is "../B/file" then [Y'][path] is hdfs://host/B/file
+
+ 4. Absolute paths
+
path Resolves to [X'][path]. E.g. if Y resolves to hdfs://host/A/B and path
+ is "/file" then [X'][path] is hdfs://host/file
+
+
+ @param target the target of the symbolic link
+ @param link the path to be created that points to target
+ @param createParent if true then missing parent dirs are created if
+ false then parent must exist
+
+
+ @throws AccessControlException If access is denied
+ @throws FileAlreadyExistsException If file link already exists
+ @throws FileNotFoundException If target does not exist
+ @throws ParentNotDirectoryException If parent of link is not a
+ directory.
+ @throws UnsupportedFileSystemException If file system for
+ target or link is not supported
+ @throws IOException If an I/O error occurred]]>
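+
+ A minimal usage sketch (the paths are illustrative; the target file system must support symlinks):
+
+   FileContext fc = FileContext.getFileContext();
+   // point /user/alice/latest at /data/current, creating missing parents of the link
+   fc.createSymlink(new Path("/data/current"), new Path("/user/alice/latest"), true);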
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws UnsupportedFileSystemException If file system for f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+ f is
+ not supported
+ @throws IOException If an I/O error occurred
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ describing modifications
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+
+ describing entries to remove
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ describing modifications, must include entries
+ for user, group, and others for compatibility with permission bits.
+ @throws IOException if an ACL could not be modified]]>
+
+
+
+
+
+
+ which returns each AclStatus
+ @throws IOException if an ACL could not be read]]>
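+
+ A minimal sketch of building and applying one ACL entry (the user name and permissions
+ are illustrative; fc is a FileContext or FileSystem instance already in scope):
+
+   AclEntry entry = new AclEntry.Builder()
+       .setScope(AclEntryScope.ACCESS)
+       .setType(AclEntryType.USER)
+       .setName("alice")
+       .setPermission(FsAction.READ_WRITE)
+       .build();
+   fc.modifyAclEntries(path, Collections.singletonList(entry));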
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @param flag xattr set flag
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attribute
+ @param name xattr name.
+ @return byte[] xattr value.
+ @throws IOException]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @param names XAttr names.
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to remove extended attribute
+ @param name xattr name
+ @throws IOException]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return List of the XAttr names of the file or directory
+ @throws IOException]]>
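+
+ A minimal sketch of setting and reading back an extended attribute (the attribute name
+ and value are illustrative; fc is a FileContext or FileSystem instance already in scope):
+
+   fc.setXAttr(path, "user.checksum",
+       "0123456789abcdef".getBytes(StandardCharsets.UTF_8));
+   byte[] checksum = fc.getXAttr(path, "user.checksum");
+   List<String> names = fc.listXAttrs(path);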
+
+
+
+
+
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+ Exceptions applicable to file systems accessed over RPC:
+ @throws RpcClientException If an exception occurred in the RPC client
+ @throws RpcServerException If an exception occurred in the RPC server
+ @throws UnexpectedServerException If server implementation throws
+ undeclared exception to RPC server]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Path Names
+
+ The Hadoop file system supports a URI namespace and URI names. This enables
+ multiple types of file systems to be referenced using fully-qualified URIs.
+ Two common Hadoop file system implementations are
+
+
the local file system: file:///path
+
the HDFS file system: hdfs://nnAddress:nnPort/path
+
+
+ The Hadoop file system also supports additional naming schemes besides URIs.
+ Hadoop has the concept of a default file system, which implies a
+ default URI scheme and authority. This enables slash-relative names
+ relative to the default FS, which are more convenient for users and
+ application writers. The default FS is typically set by the user's
+ environment, though it can also be manually specified.
+
+
+ Hadoop also supports working-directory-relative names, which are paths
+ relative to the current working directory (similar to Unix). The working
+ directory can be in a different file system than the default FS.
+
+ Thus, Hadoop path names can be specified as one of the following:
+
+
a fully-qualified URI: scheme://authority/path (e.g.
+ hdfs://nnAddress:nnPort/foo/bar)
+
a slash-relative name: path relative to the default file system (e.g.
+ /foo/bar)
+
a working-directory-relative name: path relative to the working dir (e.g.
+ foo/bar)
+
+ Relative paths with scheme (scheme:foo/bar) are illegal.
+
+
Role of FileContext and Configuration Defaults
+
+ The FileContext is the analogue of per-process file-related state in Unix. It
+ contains two properties:
+
+
+
the default file system (for resolving slash-relative names)
+
the umask (for file permissions)
+
+ In general, these properties are obtained from the default configuration file
+ in the user's environment (see {@link Configuration}).
+
+ Further file system properties are specified on the server-side. File system
+ operations default to using these server-side defaults unless otherwise
+ specified.
+
+ The file system related server-side defaults are:
+
+
the home directory (default is "/user/userName")
+
the initial working directory (only for the local fs)
+
replication factor
+
block size
+
buffer size
+
encryptDataTransfer
+
checksum option. (checksumType and bytesPerChecksum)
+
+
+
Example Usage
+
+ Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
+ Unspecified values come from core-defaults.xml in the release jar.
+
+
myFContext = FileContext.getFileContext(); // uses the default config
+ // which has your default FS
+
myFContext.create(path, ...);
+
myFContext.setWorkingDir(path);
+
myFContext.open (path, ...);
+
...
+
+ Example 2: Get a FileContext with a specific URI as the default FS
+
+
myFContext = FileContext.getFileContext(URI);
+
myFContext.create(path, ...);
+
...
+
+ Example 3: FileContext with local file system as the default
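+ (a sketch, assuming the {@code getLocalFSFileContext()} factory method; the calls
+ mirror Examples 1 and 2)
+
+   myFContext = FileContext.getLocalFSFileContext();
+   myFContext.create(path, ...);
+   ...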
+
+ If the configuration has the property
+ {@code "fs.$SCHEME.impl.disable.cache"} set to true,
+ a new instance will be created, initialized with the supplied URI and
+ configuration, then returned without being cached.
+
+
+ If there is a cached FS instance matching the same URI, it will
+ be returned.
+
+
+ Otherwise: a new FS instance will be created, initialized with the
+ configuration and URI, cached and returned to the caller.
+
+
+ @throws IOException if the FileSystem cannot be instantiated.]]>
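+
+ A minimal sketch of bypassing the cache for one scheme (the scheme and URI are illustrative):
+
+   Configuration conf = new Configuration();
+   conf.setBoolean("fs.hdfs.impl.disable.cache", true);
+   // returns a new, uncached instance for this URI and configuration
+   FileSystem fs = FileSystem.get(URI.create("hdfs://nn:8020/"), conf);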
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ if f == null :
+ result = null
+ elif f.getLen() <= start:
+ result = []
+ else result = [ locations(FS, b) for b in blocks(FS, f, start, start+len) ]
+
+ This call is most helpful with a distributed filesystem
+ where the hostnames of machines that contain blocks of the given file
+ can be determined.
+
+ The default implementation returns an array containing one element:
+
+
+ If a file is erasure-coded, the returned BlockLocations represent logical
+ block groups.
+
+ Suppose we have a RS_3_2 coded file (3 data units and 2 parity units).
+ 1. If the file size is less than one stripe size, say 2 * CELL_SIZE, then
+ there will be one BlockLocation returned, with 0 offset, actual file size
+ and 4 hosts (2 data blocks and 2 parity blocks) hosting the actual blocks.
+ 2. If the file size is less than one group size but greater than one
+ stripe size, then there will be one BlockLocation returned, with 0 offset,
+ actual file size with 5 hosts (3 data blocks and 2 parity blocks) hosting
+ the actual blocks.
+ 3. If the file size is greater than one group size, 3 * BLOCK_SIZE + 123
+ for example, then the result will be like:
+
Fails if the parent of dst does not exist or is a file.
+
+
+ If OVERWRITE option is not passed as an argument, rename fails
+ if the dst already exists.
+
+ If OVERWRITE option is passed as an argument, rename overwrites
+ the dst if it is a file or an empty directory. Rename fails if dst is
+ a non-empty directory.
+
+ Note that atomicity of rename is dependent on the file system
+ implementation. Please refer to the file system documentation for
+ details. This default implementation is not atomic.
+
+ This method is deprecated since it is a temporary method added to
+ support the transition from FileSystem to FileContext for user
+ applications.
+
+ @param src path to be renamed
+ @param dst new path after rename
+ @throws FileNotFoundException src path does not exist, or the parent
+ path of dst does not exist.
+ @throws FileAlreadyExistsException dest path exists and is a file
+ @throws ParentNotDirectoryException if the parent path of dest is not
+ a directory
+ @throws IOException on failure]]>
+
+
+
+
+
+
+
+
+
Fails if path is a directory.
+
Fails if path does not exist.
+
Fails if path is not closed.
+
Fails if new size is greater than current size.
+
+ @param f The path to the file to be truncated
+ @param newLength The size the file is to be truncated to
+
+ @return true if the file has been truncated to the desired
+ newLength and is immediately available to be reused for
+ write operations such as append, or
+ false if a background process of adjusting the length of
+ the last block has been started, and clients should wait for it to
+ complete before proceeding with further file updates.
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default).]]>
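+
+ A minimal usage sketch (fs, path and newLength are assumed to be in scope):
+
+   boolean done = fs.truncate(path, newLength);
+   if (!done) {
+     // truncation ended mid-block: wait for block recovery to complete
+     // before appending to or re-opening the file for write
+   }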
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Clean shutdown of the JVM cannot be guaranteed.
+
The time to shut down a FileSystem depends on the number of
+ files to delete. For filesystems where the cost of checking
+ for the existence of a file/directory and the actual delete operation
+ (for example: object stores) is high, the time to shut down the JVM can be
+ significantly extended by over-use of this feature.
+
Connectivity problems with a remote filesystem may delay shutdown
+ further, and may cause the files to not be deleted.
+
+ @param f the path to delete.
+ @return true if deleteOnExit is successful, otherwise false.
+ @throws IOException IO failure]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Does not guarantee to return the List of files/directories status in a
+ sorted order.
+
+ Will not return null. Expect IOException upon access error.
+ @param f given path
+ @return the statuses of the files/directories in the given path
+ @throws FileNotFoundException when the path does not exist
+ @throws IOException see specific implementation]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Does not guarantee to return the List of files/directories status in a
+ sorted order.
+
+ @param f
+ a path name
+ @param filter
+ the user-supplied path filter
+ @return an array of FileStatus objects for the files under the given path
+ after applying the filter
+ @throws FileNotFoundException when the path does not exist
+ @throws IOException see specific implementation]]>
+
+
+
+
+
+
+
+
+ Does not guarantee to return the List of files/directories status in a
+ sorted order.
+
+ @param files
+ a list of paths
+ @return a list of statuses for the files under the given paths after
+ applying the filter default Path filter
+ @throws FileNotFoundException when the path does not exist
+ @throws IOException see specific implementation]]>
+
+
+
+
+
+
+
+
+
+ Does not guarantee to return the List of files/directories status in a
+ sorted order.
+
+ @param files
+ a list of paths
+ @param filter
+ the user-supplied path filter
+ @return a list of statuses for the files under the given paths after
+ applying the filter
+ @throws FileNotFoundException when the path does not exist
+ @throws IOException see specific implementation]]>
+
+
+
+
+
+
+ Return all the files that match filePattern and are not checksum
+ files. Results are sorted by their names.
+
+
+ A filename pattern is composed of regular characters and
+ special pattern matching characters, which are:
+
+
+
+
+
+
?
+
Matches any single character.
+
+
+
*
+
Matches zero or more characters.
+
+
+
[abc]
+
Matches a single character from character set
+ {a,b,c}.
+
+
+
[a-b]
+
Matches a single character from the character range
+ {a...b}. Note that character a must be
+ lexicographically less than or equal to character b.
+
+
+
[^a]
+
Matches a single character that is not from character set or range
+ {a}. Note that the ^ character must occur
+ immediately to the right of the opening bracket.
+
+
+
\c
+
Removes (escapes) any special meaning of character c.
+
+
+
{ab,cd}
+
Matches a string from the string set {ab, cd}
+
+
+
{ab,c{de,fh}}
+
Matches a string from the string set {ab, cde, cfh}
+
+
+
+
+
+ @param pathPattern a glob specifying a path pattern
+
+ @return an array of paths that match the path pattern
+ @throws IOException IO failure]]>
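+
+ A minimal usage sketch (the pattern is illustrative):
+
+   // match all part files under the date-partitioned directories
+   FileStatus[] matches = fs.globStatus(new Path("/data/2021-*/part-[0-9]*"));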
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws IOException If an I/O error occurred]]>
+
+
+
+
+
+
+
+
+ f does not exist
+ @throws IOException if any I/O error occurred]]>
+
+
+
+
+
+
+
+ p does not exist
+ @throws IOException if any I/O error occurred]]>
+
+
+
+
+
+
+
+
+
+ If the path is a directory,
+ if recursive is false, returns files in the directory;
+ if recursive is true, returns files in the subtree rooted at the path.
+ If the path is a file, returns the file's status and block locations.
+
+ @param f is the path
+ @param recursive if the subdirectories need to be traversed recursively
+
+ @return an iterator that traverses statuses of the files
+
+ @throws FileNotFoundException when the path does not exist;
+ @throws IOException see specific implementation]]>
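+
+ A minimal usage sketch (the directory is illustrative):
+
+   RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/data"), true);
+   while (it.hasNext()) {
+     LocatedFileStatus status = it.next();
+     System.out.println(status.getPath() + " has "
+         + status.getBlockLocations().length + " block location(s)");
+   }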
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ undefined.
+ @throws IOException IO failure]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ In some FileSystem implementations such as HDFS, metadata
+ synchronization is essential to guarantee consistency of read requests,
+ particularly in an HA setting.
+ @throws IOException
+ @throws UnsupportedOperationException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ describing modifications
+ @throws IOException if an ACL could not be modified
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to modify
+ @param name xattr name.
+ @param value xattr value.
+ @param flag xattr set flag
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attribute
+ @param name xattr name.
+ @return byte[] xattr value.
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @param names XAttr names.
+ @return Map describing the XAttrs of the file or directory
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to get extended attributes
+ @return List of the XAttr names of the file or directory
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+ Refer to the HDFS extended attributes user documentation for details.
+
+ @param path Path to remove extended attribute
+ @param name xattr name
+ @throws IOException IO failure
+ @throws UnsupportedOperationException if the operation is unsupported
+ (default outcome).]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This is a default method which is intended to be overridden by
+ subclasses. The default implementation returns an empty storage statistics
+ object.
+
+ @return The StorageStatistics for this FileSystem instance.
+ Will never be null.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ All user code that may potentially use the Hadoop Distributed
+ File System should be written to use a FileSystem object or its
+ successor, {@link FileContext}.
+
+
+ The local implementation is {@link LocalFileSystem} and the distributed
+ implementation is DistributedFileSystem. There are other implementations
+ for object stores and, outside the Apache Hadoop codebase,
+ third-party filesystems.
+
+ Notes
+
+
The behaviour of the filesystem is
+
+ specified in the Hadoop documentation.
+ However, the normative specification of the behavior of this class is
+ actually HDFS: if HDFS does not behave the way these Javadocs or
+ the specification in the Hadoop documentation defines, assume that
+ the documentation is incorrect.
+
+
The term {@code FileSystem} refers to an instance of this class.
+
The acronym "FS" is used as an abbreviation of FileSystem.
+
The term {@code filesystem} refers to the distributed/local filesystem
+ itself, rather than the class used to interact with it.
+
The term "file" refers to a file in the remote filesystem,
+ rather than instances of {@code java.io.File}.
+ Fencing is configured by the operator as an ordered list of methods to
+ attempt. Each method will be tried in turn, and the next in the list
+ will only be attempted if the previous one fails. See {@link NodeFencer}
+ for more information.
+
+ If an implementation also implements {@link Configurable} then its
+ setConf method will be called upon instantiation.]]>
+
StaticUserWebFilter - An authorization plugin that maps all
+users to a statically configured user.
+
]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ public class IntArrayWritable extends ArrayWritable {
+ public IntArrayWritable() {
+ super(IntWritable.class);
+ }
+ }
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a ByteWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ the class of the item
+ @param conf the configuration to store
+ @param item the object to be stored
+ @param keyName the name of the key to use
+ @throws IOException : forwards Exceptions from the underlying
+ {@link Serialization} classes.]]>
+
+
+
+
+
+
+
+
+ the class of the item
+ @param conf the configuration to use
+ @param keyName the name of the key to use
+ @param itemClass the class of the item
+ @return restored object
+ @throws IOException : forwards Exceptions from the underlying
+ {@link Serialization} classes.]]>
+
+
+
+
+
+
+
+
+ the class of the item
+ @param conf the configuration to use
+ @param items the objects to be stored
+ @param keyName the name of the key to use
+ @throws IndexOutOfBoundsException if the items array is empty
+ @throws IOException : forwards Exceptions from the underlying
+ {@link Serialization} classes.]]>
+
+
+
+
+
+
+
+
+ the class of the item
+ @param conf the configuration to use
+ @param keyName the name of the key to use
+ @param itemClass the class of the item
+ @return restored object
+ @throws IOException : forwards Exceptions from the underlying
+ {@link Serialization} classes.]]>
+
+
+
+
+ DefaultStringifier offers convenience methods to store/load objects to/from
+ the configuration.
+
+ @param the class of the objects to stringify]]>
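+
+ A minimal usage sketch (the key name is illustrative; the stored type must have a
+ registered serialization, e.g. a Writable):
+
+   Configuration conf = new Configuration();
+   DefaultStringifier.store(conf, new Text("payload"), "my.app.item");
+   Text restored = DefaultStringifier.load(conf, "my.app.item", Text.class);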
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a DoubleWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ value argument is null or
+ its size is zero, the elementType argument must not be null. If
+ the argument value's size is bigger than zero, the argument
+ elementType is not used.
+
+ @param value
+ @param elementType]]>
+
+
+
+
+ value should not be null
+ or empty.
+
+ @param value]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ value and elementType. If the value argument
+ is null or its size is zero, the elementType argument must not be
+ null. If the argument value's size is bigger than zero, the
+ argument elementType is not used.
+
+ @param value
+ @param elementType]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is an EnumSetWritable with the same value,
+ or both are null.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a FloatWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ When two sequence files, which have the same Key type but different Value
+ types, are mapped out to reduce, multiple Value types are not allowed.
+ In this case, this class can help you wrap instances with different types.
+
+
+
+ Compared with ObjectWritable, this class is much more efficient,
+ because ObjectWritable will append the class declaration as a String
+ into the output file in every Key-Value pair.
+
+
+
+ Generic Writable implements {@link Configurable} interface, so that it will be
+ configured by the framework. The configuration is passed to the wrapped objects
+ implementing {@link Configurable} interface before deserialization.
+
+
+ How to use it:
+ 1. Write your own class, such as GenericObject, which extends GenericWritable.
+ 2. Implement the abstract method getTypes(), which defines
+ the classes that will be wrapped in GenericObject in the application.
+ Note: the classes defined in the getTypes() method must
+ implement the Writable interface. A sketch is shown below.
+
+
+ @since Nov 8, 2006]]>
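+
+ A minimal sketch of such a subclass (the wrapped types are illustrative):
+
+   public class GenericObject extends GenericWritable {
+
+     @SuppressWarnings("unchecked")
+     private static final Class<? extends Writable>[] TYPES =
+         new Class[] { IntWritable.class, Text.class };
+
+     @Override
+     protected Class<? extends Writable>[] getTypes() {
+       return TYPES;
+     }
+   }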
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a IntWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ closes the input and output streams
+ at the end.
+
+ @param in InputStream to read from
+ @param out OutputStream to write to
+ @param conf the Configuration object]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ignore any {@link Throwable} or
+ null pointers. Must only be used for cleanup in exception handlers.
+
+ @param log the log to record problems to at debug level. Can be null.
+ @param closeables the objects to close
+ @deprecated use {@link #cleanupWithLogger(Logger, java.io.Closeable...)}
+ instead]]>
+
+
+
+
+
+
+ ignore any {@link Throwable} or
+ null pointers. Must only be used for cleanup in exception handlers.
+
+ @param logger the log to record problems to at debug level. Can be null.
+ @param closeables the objects to close]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This is better than File#listDir because it does not ignore IOExceptions.
+
+ @param dir The directory to list.
+ @param filter If non-null, the filter to use when listing
+ this directory.
+ @return The list of files in the directory.
+
+ @throws IOException On I/O error]]>
+
+
+
+
+
+
+
+ Borrowed from Uwe Schindler in LUCENE-5588
+ @param fileToSync the file to fsync]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a LongWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ A map is a directory containing two files, the data file,
+ containing all keys and values in the map, and a smaller index
+ file, containing a fraction of the keys. The fraction is determined by
+ {@link Writer#getIndexInterval()}.
+
+
The index file is read entirely into memory. Thus key implementations
+ should try to keep themselves small.
+
+
Map files are created by adding entries in-order. To maintain a large
+ database, perform updates by copying the previous version of a database and
+ merging in a sorted change list, to create a new version of the database in
+ a new file. Sorting large change lists can be done with {@link
+ SequenceFile.Sorter}.]]>
+
SequenceFile provides {@link SequenceFile.Writer},
+ {@link SequenceFile.Reader} and {@link Sorter} classes for writing,
+ reading and sorting respectively.
+
+ There are three SequenceFileWriters based on the
+ {@link CompressionType} used to compress key/value pairs:
+
+
+ Writer : Uncompressed records.
+
+
+ RecordCompressWriter : Record-compressed files, only compress
+ values.
+
+
+ BlockCompressWriter : Block-compressed files, both keys &
+ values are collected in 'blocks'
+ separately and compressed. The size of
+ the 'block' is configurable.
+
+
+
The actual compression algorithm used to compress key and/or values can be
+ specified by using the appropriate {@link CompressionCodec}.
+
+
The recommended way is to use the static createWriter methods
+ provided by the SequenceFile to choose the preferred format.
+
+
The {@link SequenceFile.Reader} acts as the bridge and can read any of the
+ above SequenceFile formats.
+
+
SequenceFile Formats
+
+
Essentially there are 3 different formats for SequenceFiles
+ depending on the CompressionType specified. All of them share a
+ common header described below.
+
+
SequenceFile Header
+
+
+ version - 3 bytes of magic header SEQ, followed by 1 byte of actual
+ version number (e.g. SEQ4 or SEQ6)
+
+
+ keyClassName -key class
+
+
+ valueClassName - value class
+
+
+ compression - A boolean which specifies if compression is turned on for
+ keys/values in this file.
+
+
+ blockCompression - A boolean which specifies if block-compression is
+ turned on for keys/values in this file.
+
+
+ compression codec - CompressionCodec class which is used for
+ compression of keys and/or values (if compression is
+ enabled).
+
+
+ metadata - {@link Metadata} for this file.
+
+
+ sync - A sync marker to denote end of the header.
+
The compressed blocks of key lengths and value lengths consist of the
+ actual lengths of individual keys/values encoded in ZeroCompressedInteger
+ format.
+
+ @see CompressionCodec]]>
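+
+ A minimal sketch of the recommended createWriter usage (the path, key and value
+ types are illustrative):
+
+   Configuration conf = new Configuration();
+   Path file = new Path("/tmp/sample.seq");
+   SequenceFile.Writer writer = SequenceFile.createWriter(conf,
+       SequenceFile.Writer.file(file),
+       SequenceFile.Writer.keyClass(Text.class),
+       SequenceFile.Writer.valueClass(IntWritable.class),
+       SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK));
+   try {
+     writer.append(new Text("key-1"), new IntWritable(1));
+   } finally {
+     writer.close();
+   }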
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a ShortWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ the class of the objects to stringify]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ position. Note that this
+ method avoids using the converter or doing String instantiation
+ @return the Unicode scalar value at position or -1
+ if the position is invalid or points to a
+ trailing byte]]>
+
+
+
+
+
+
+
+
+
+ what in the backing
+ buffer, starting at position start. The starting
+ position is measured in bytes and the return value is in
+ terms of byte position in the buffer. The backing buffer is
+ not converted to a string for this operation.
+ @return byte position of the first occurrence of the search
+ string in the UTF-8 buffer or -1 if not found]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Note: For performance reasons, this call does not clear the
+ underlying byte array that is retrievable via {@link #getBytes()}.
+ In order to free the byte-array memory, call {@link #set(byte[])}
+ with an empty byte array (For example, new byte[0]).]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a Text with the same contents.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ replace is true, then
+ malformed input is replaced with the
+ substitution character, which is U+FFFD. Otherwise the
+ method throws a MalformedInputException.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ replace is true, then
+ malformed input is replaced with the
+ substitution character, which is U+FFFD. Otherwise the
+ method throws a MalformedInputException.
+ @return ByteBuffer: bytes stores at ByteBuffer.array()
+ and length is ByteBuffer.limit()]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ In
+ addition, it provides methods for string traversal without converting the
+ byte array to a string.
Also includes utilities for
+ serializing/deserializing a string, coding/decoding a string, checking if a
+ byte array contains valid UTF8 code, calculating the length of an encoded
+ string.]]>
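+
+ A minimal usage sketch:
+
+   Text t = new Text("caf\u00e9");       // stored internally as UTF-8 bytes
+   int byteLength = t.getLength();       // length in bytes, not characters
+   int pos = t.find("f");                // byte offset of the search string, or -1
+   String s = t.toString();              // decode back to a java.lang.String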
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This is useful when a class may evolve, so that instances written by the
+ old version of the class may still be processed by the new version. To
+ handle this situation, {@link #readFields(DataInput)}
+ implementations should catch {@link VersionMismatchException}.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a VIntWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ o is a VLongWritable with the same value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ out.
+
+ @param out DataOutput to serialize this object into.
+ @throws IOException]]>
+
+
+
+
+
+
+ in.
+
+
For efficiency, implementations should attempt to re-use storage in the
+ existing object where possible.
+
+ @param in DataInput to deserialize this object from.
+ @throws IOException]]>
+
+
+
+ Any key or value type in the Hadoop Map-Reduce
+ framework implements this interface.
+
+
Implementations typically implement a static read(DataInput)
+ method which constructs a new instance, calls {@link #readFields(DataInput)}
+ and returns the instance.
+
+
Example:
+
+ public class MyWritable implements Writable {
+ // Some data
+ private int counter;
+ private long timestamp;
+
+ // Default constructor to allow (de)serialization
+ MyWritable() { }
+
+ public void write(DataOutput out) throws IOException {
+ out.writeInt(counter);
+ out.writeLong(timestamp);
+ }
+
+ public void readFields(DataInput in) throws IOException {
+ counter = in.readInt();
+ timestamp = in.readLong();
+ }
+
+ public static MyWritable read(DataInput in) throws IOException {
+ MyWritable w = new MyWritable();
+ w.readFields(in);
+ return w;
+ }
+ }
+
]]>
+
+
+
+
+
+
+
+
+ WritableComparables can be compared to each other, typically
+ via Comparators. Any type which is to be used as a
+ key in the Hadoop Map-Reduce framework should implement this
+ interface.
+
+
Note that hashCode() is frequently used in Hadoop to partition
+ keys. It's important that your implementation of hashCode() returns the same
+ result across different instances of the JVM. Note also that the default
+ hashCode() implementation in Object does not
+ satisfy this property.
+
+
Example:
+
+ public class MyWritableComparable implements WritableComparable {
+ // Some data
+ private int counter;
+ private long timestamp;
+
+ public void write(DataOutput out) throws IOException {
+ out.writeInt(counter);
+ out.writeLong(timestamp);
+ }
+
+ public void readFields(DataInput in) throws IOException {
+ counter = in.readInt();
+ timestamp = in.readLong();
+ }
+
+ public int compareTo(MyWritableComparable o) {
+ int thisValue = this.value;
+ int thatValue = o.value;
+ return (thisValue < thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
+ }
+
+ public int hashCode() {
+ final int prime = 31;
+ int result = 1;
+ result = prime * result + counter;
+ result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
+ return result;
+ }
+ }
+
One may optimize compare-intensive operations by overriding
+ {@link #compare(byte[],int,int,byte[],int,int)}. Static utility methods are
+ provided to assist in optimized implementations of this method.]]>
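+
+ A sketch of such an optimized comparator for the example above (it assumes the
+ int counter field is serialized first, as in write() shown earlier):
+
+   public static class Comparator extends WritableComparator {
+     public Comparator() {
+       super(MyWritableComparable.class);
+     }
+
+     @Override
+     public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
+       int thisCounter = readInt(b1, s1);   // counter is the first serialized field
+       int thatCounter = readInt(b2, s2);
+       return Integer.compare(thisCounter, thatCounter);
+     }
+   }
+
+   static {
+     // register the raw comparator for the type
+     WritableComparator.define(MyWritableComparable.class, new Comparator());
+   }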
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Enum type
+ @param in DataInput to read from
+ @param enumType Class type of Enum
+ @return Enum represented by String read from DataInput
+ @throws IOException]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ len number of bytes in input stream in
+ @param in input stream
+ @param len number of bytes to skip
+ @throws IOException when fewer than len bytes could be skipped]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ CompressionCodec for which to get the
+ Compressor
+ @param conf the Configuration object which contains confs for creating or reinit the compressor
+ @return Compressor for the given
+ CompressionCodec from the pool or a new one]]>
+
+
+
+
+
+
+
+
+ CompressionCodec for which to get the
+ Decompressor
+ @return Decompressor for the given
+ CompressionCodec the pool or a new one]]>
+
+
+
+
+
+ Compressor to be returned to the pool]]>
+
+
+
+
+
+ Decompressor to be returned to the
+ pool]]>
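+
+ A minimal usage sketch (codec and rawOut are assumed to be in scope):
+
+   Compressor compressor = CodecPool.getCompressor(codec, conf);
+   try {
+     CompressionOutputStream out = codec.createOutputStream(rawOut, compressor);
+     // ... write the compressed data ...
+     out.finish();
+   } finally {
+     CodecPool.returnCompressor(compressor);
+   }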
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Codec aliases are case insensitive.
+
+ The codec alias is the short class name (without the package name).
+ If the short class name ends with 'Codec', then there are two aliases for
+ the codec: the complete short class name and the short class name without
+ the 'Codec' ending. For example, for the 'GzipCodec' codec class name the
+ aliases are 'gzip' and 'gzipcodec'.
+
+ @param codecName the canonical class name of the codec
+ @return the codec object]]>
+
+
+
+
+
+
+ Codec aliases are case insensitive.
+
+ The codec alias is the short class name (without the package name).
+ If the short class name ends with 'Codec', then there are two aliases for
+ the codec: the complete short class name and the short class name without
+ the 'Codec' ending. For example, for the 'GzipCodec' codec class name the
+ aliases are 'gzip' and 'gzipcodec'.
+
+ @param codecName the canonical class name of the codec
+ @return the codec class]]>
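+
+ A minimal sketch of alias resolution:
+
+   CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
+   CompressionCodec byAlias = factory.getCodecByName("gzip");
+   CompressionCodec byClassName = factory.getCodecByName("GzipCodec");
+   // both resolve to org.apache.hadoop.io.compress.GzipCodec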
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Implementations are assumed to be buffered. This permits clients to
+ reposition the underlying input stream then call {@link #resetState()},
+ without having to also synchronize client buffers.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ true indicating that more input data is required.
+
+ @param b Input data
+ @param off Start offset
+ @param len Length]]>
+
+
+
+
+ true if the input data buffer is empty and
+ #setInput() should be called in order to provide more input.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ true if the end of the compressed
+ data output stream has been reached.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ true indicating that more input data is required.
+ (Both native and non-native versions of various Decompressors require
+ that the data passed in via b[] remain unmodified until
+ the caller is explicitly notified--via {@link #needsInput()}--that the
+ buffer may be safely modified. With this requirement, an extra
+ buffer-copy can be avoided.)
+
+ @param b Input data
+ @param off Start offset
+ @param len Length]]>
+
+
+
+
+ true if the input data buffer is empty and
+ {@link #setInput(byte[], int, int)} should be called to
+ provide more input.
+
+ @return true if the input data buffer is empty and
+ {@link #setInput(byte[], int, int)} should be called in
+ order to provide more input.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+ true if a preset dictionary is needed for decompression.
+ @return true if a preset dictionary is needed for decompression]]>
+
+
+
+
+ true if the end of the decompressed
+ data output stream has been reached. Indicates a concatenated data stream
+ when finished() returns true and {@link #getRemaining()}
+ returns a positive value. finished() will be reset with the
+ {@link #reset()} method.
+ @return true if the end of the decompressed
+ data output stream has been reached.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ true and getRemaining() returns a positive value. If
+ {@link #finished()} returns true and getRemaining() returns
+ a zero value, indicates that the end of data stream has been reached and
+ is not a concatenated data stream.
+ @return The number of bytes remaining in the compressed data buffer.]]>
+
+
+
+
+ true and {@link #getRemaining()} returns a positive value,
+ reset() is called before processing of the next data stream in the
+ concatenated data stream. {@link #finished()} will be reset and will
+ return false when reset() is called.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Seek by key or by file offset.
+
+ The memory footprint of a TFile includes the following:
+
+
Some constant overhead of reading or writing a compressed block.
+
+
Each compressed block requires one compression/decompression codec for
+ I/O.
+
Temporary space to buffer the key.
+
Temporary space to buffer the value (for TFile.Writer only). Values are
+ chunk encoded, so that we buffer at most one chunk of user data. By default,
+ the chunk buffer is 1MB. Reading chunked value does not require additional
+ memory.
+
+
TFile index, which is proportional to the total number of Data Blocks.
+ The total amount of memory needed to hold the index can be estimated as
+ (56+AvgKeySize)*NumBlocks.
+
MetaBlock index, which is proportional to the total number of Meta
+ Blocks. The total amount of memory needed to hold the index for Meta Blocks
+ can be estimated as (40+AvgMetaBlockName)*NumMetaBlock.
+
+
+ The behavior of TFile can be customized by the following variables through
+ Configuration:
+
+
tfile.io.chunk.size: Value chunk size. Integer (in bytes). Defaults
+ to 1MB. Values whose length is less than the chunk size are guaranteed to have
+ a known value length at read time (see
+ {@link TFile.Reader.Scanner.Entry#isValueLengthKnown()}).
+
tfile.fs.output.buffer.size: Buffer size used for
+ FSDataOutputStream. Integer (in bytes). Defaults to 256KB.
+
tfile.fs.input.buffer.size: Buffer size used for
+ FSDataInputStream. Integer (in bytes). Defaults to 256KB.
+
+
+ Suggestions on performance optimization.
+
+
Minimum block size. We recommend a setting of minimum block size between
+ 256KB and 1MB for general usage. A larger block size is preferred if files are
+ primarily for sequential access. However, it would lead to inefficient random
+ access (because there is more data to decompress). Smaller blocks are good
+ for random access, but require more memory to hold the block index, and may
+ be slower to create (because we must flush the compressor stream at the
+ conclusion of each data block, which leads to an FS I/O flush). Further, due
+ to the internal caching in Compression codec, the smallest possible block
+ size would be around 20KB-30KB.
+
The current implementation does not offer true multi-threading for
+ reading. The implementation uses FSDataInputStream seek()+read(), which is
+ shown to be much faster than positioned-read call in single thread mode.
+ However, it also means that if multiple threads attempt to access the same
+ TFile (using multiple scanners) simultaneously, the actual I/O is carried out
+ sequentially even if they access different DFS blocks.
+
Compression codec. Use "none" if the data is not very compressible (by
+ compressible, we mean a compression ratio of at least 2:1). Generally, use "lzo"
+ as the starting point for experimenting. "gz" offers a slightly better
+ compression ratio than "lzo" but requires 4x CPU to compress and 2x CPU to
+ decompress, compared to "lzo".
+
File system buffering. If the underlying FSDataInputStream and
+ FSDataOutputStream are already adequately buffered, or if applications
+ read/write keys and values in large buffers, we can reduce the sizes of
+ input/output buffering in the TFile layer by setting the configuration parameters
+ "tfile.fs.input.buffer.size" and "tfile.fs.output.buffer.size".
+
+
+ Some design rationale behind TFile can be found at Hadoop-3315.]]>
+
+
+
+
+
+
+
+
+
+
+ Utils#writeVLong(out, n).
+
+ @param out
+ output stream
+ @param n
+ The integer to be encoded
+ @throws IOException
+ @see Utils#writeVLong(DataOutput, long)]]>
+
+
+
+
+
+
+
+
+
if n in [-32, 127): encode in one byte with the actual value.
+ Otherwise,
+
if n in [-20*2^8, 20*2^8): encode in two bytes: byte[0] = n/256 - 52;
+ byte[1]=n&0xff. Otherwise,
+
if n in [-16*2^16, 16*2^16): encode in three bytes: byte[0]=n/2^16 -
+ 88; byte[1]=(n>>8)&0xff; byte[2]=n&0xff. Otherwise,
+
if n in [-8*2^24, 8*2^24): encode in four bytes: byte[0]=n/2^24 - 112;
+ byte[1] = (n>>16)&0xff; byte[2] = (n>>8)&0xff; byte[3]=n&0xff. Otherwise:
+
if n in [-2^31, 2^31): encode in five bytes: byte[0]=-125; byte[1] =
+ (n>>24)&0xff; byte[2]=(n>>16)&0xff; byte[3]=(n>>8)&0xff; byte[4]=n&0xff;
+
if n in [-2^39, 2^39): encode in six bytes: byte[0]=-124; byte[1] =
+ (n>>32)&0xff; byte[2]=(n>>24)&0xff; byte[3]=(n>>16)&0xff;
+ byte[4]=(n>>8)&0xff; byte[5]=n&0xff
+
if n in [-2^47, 2^47): encode in seven bytes: byte[0]=-123; byte[1] =
+ (n>>40)&0xff; byte[2]=(n>>32)&0xff; byte[3]=(n>>24)&0xff;
+ byte[4]=(n>>16)&0xff; byte[5]=(n>>8)&0xff; byte[6]=n&0xff;
+
if n in [-2^55, 2^55): encode in eight bytes: byte[0]=-122; byte[1] =
+ (n>>48)&0xff; byte[2] = (n>>40)&0xff; byte[3]=(n>>32)&0xff;
+ byte[4]=(n>>24)&0xff; byte[5]=(n>>16)&0xff; byte[6]=(n>>8)&0xff;
+ byte[7]=n&0xff;
+
if n in [-2^63, 2^63): encode in nine bytes: byte[0]=-121; byte[1] =
+ (n>>56)&0xff; byte[2] = (n>>48)&0xff; byte[3] = (n>>40)&0xff;
+ byte[4]=(n>>32)&0xff; byte[5]=(n>>24)&0xff; byte[6]=(n>>16)&0xff;
+ byte[7]=(n>>8)&0xff; byte[8]=n&0xff;
+
+
+ @param out
+ output stream
+ @param n
+ the integer number
+ @throws IOException]]>
+
if (FB in [-72, -33]), return (FB+52)<<8 + NB[0]&0xff;
+
if (FB in [-104, -73]), return (FB+88)<<16 + (NB[0]&0xff)<<8 +
+ NB[1]&0xff;
+
if (FB in [-120, -105]), return (FB+112)<<24 + (NB[0]&0xff)<<16 +
+ (NB[1]&0xff)<<8 + NB[2]&0xff;
+
if (FB in [-128, -121]), return interpret NB[FB+129] as a signed
+ big-endian integer.
+
+ @param in
+ input stream
+ @return the decoded long integer.
+ @throws IOException]]>
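+
+ Worked example, following the rules above: n = 1000 falls in [-20*2^8, 20*2^8),
+ so it is written in two bytes: byte[0] = 1000/256 - 52 = -49 and
+ byte[1] = 1000 & 0xff = 232 (0xE8). Decoding reverses this: FB = -49 is in
+ [-72, -33], so the value is (FB + 52) << 8 + (NB[0] & 0xff) = (3 << 8) + 232 = 1000.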
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Type of the input key.
+ @param list
+ The list
+ @param key
+ The input key.
+ @param cmp
+ Comparator for the key.
+ @return The index to the desired element if it exists; or list.size()
+ otherwise.]]>
+
+
+
+
+
+
+
+
+ Type of the input key.
+ @param list
+ The list
+ @param key
+ The input key.
+ @param cmp
+ Comparator for the key.
+ @return The index to the desired element if it exists; or list.size()
+ otherwise.]]>
+
+
+
+
+
+
+
+ Type of the input key.
+ @param list
+ The list
+ @param key
+ The input key.
+ @return The index to the desired element if it exists; or list.size()
+ otherwise.]]>
+
+
+
+
+
+
+
+ Type of the input key.
+ @param list
+ The list
+ @param key
+ The input key.
+ @return The index to the desired element if it exists; or list.size()
+ otherwise.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ An experimental {@link Serialization} for Java {@link Serializable} classes.
+
+ @see JavaSerializationComparator]]>
+
+
+
+
+
+
+
+
+
+
+ A {@link RawComparator} that uses a {@link JavaSerialization}
+ {@link Deserializer} to deserialize objects that are then compared via
+ their {@link Comparable} interfaces.
+
+ @param
+ @see JavaSerialization]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+This package provides a mechanism for using different serialization frameworks
+in Hadoop. The property "io.serializations" defines a list of
+{@link org.apache.hadoop.io.serializer.Serialization}s that know how to create
+{@link org.apache.hadoop.io.serializer.Serializer}s and
+{@link org.apache.hadoop.io.serializer.Deserializer}s.
+
+
+
+To add a new serialization framework write an implementation of
+{@link org.apache.hadoop.io.serializer.Serialization} and add its name to the
+"io.serializations" property.
+
]]>
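+
+ A minimal sketch of registering serialization frameworks (the set shown is illustrative):
+
+   Configuration conf = new Configuration();
+   conf.setStrings("io.serializations",
+       "org.apache.hadoop.io.serializer.WritableSerialization",
+       "org.apache.hadoop.io.serializer.JavaSerialization");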
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ avro.reflect.pkgs or implement
+ {@link AvroReflectSerializable} interface.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+This package provides Avro serialization in Hadoop. This can be used to
+serialize/deserialize Avro types in Hadoop.
+
+
+
+Use {@link org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization} for
+serialization of classes generated by Avro's 'specific' compiler.
+
+
+
+Use {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization} for
+other classes.
+{@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization} work for
+any class which is either in the package list configured via
+{@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization#AVRO_REFLECT_PACKAGES}
+or implement {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerializable}
+interface.
+
{@link MetricsSource} generates and updates metrics information.
+
{@link MetricsSink} consumes the metrics information.
+
+
+ {@link MetricsSource} and {@link MetricsSink} register with the metrics
+ system. Implementations of {@link MetricsSystem} poll the
+ {@link MetricsSource}s periodically and pass the {@link MetricsRecord}s to
+ {@link MetricsSink}.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ (aggregate).
+ Filter out entries that don't have at least minSamples.
+
+ @return a map of peer DataNode Id to the average latency to that
+ node seen over the measurement period.]]>
+
+
+
+
+ This class maintains a group of rolling average metrics. It implements the
+ algorithm of rolling average, i.e. a number of sliding windows are kept to
+ roll over and evict old subsets of samples. Each window has a subset of
+ samples in a stream, where sub-sum and sub-total are collected. All sub-sums
+ and sub-totals in all windows will be aggregated to final-sum and final-total
+ used to compute final average, which is called rolling average.
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This class is a metrics sink that uses
+ {@link org.apache.hadoop.fs.FileSystem} to write the metrics logs. Every
+ roll interval a new directory will be created under the path specified by the
+ basepath property. All metrics will be logged to a file in the
+ current interval's directory in a file named <hostname>.log, where
+ <hostname> is the name of the host on which the metrics logging
+ process is running. The base path is set by the
+ <prefix>.sink.<instance>.basepath property. The
+ time zone used to create the current interval's directory name is GMT. If
+ the basepath property isn't specified, it will default to
+ "/tmp", which is the temp directory on whatever default file
+ system is configured for the cluster.
+
+
The <prefix>.sink.<instance>.ignore-error
+ property controls whether an exception is thrown when an error is encountered
+ writing a log file. The default value is true. When set to
+ false, file errors are quietly swallowed.
+
+
The roll-interval property sets the amount of time before
+ rolling the directory. The default value is 1 hour. The roll interval may
+ not be less than 1 minute. The property's value should be given as
+ number unit, where number is an integer value, and
+ unit is a valid unit. Valid units are minute, hour,
+ and day. The units are case insensitive and may be abbreviated or
+ plural. If no units are specified, hours are assumed. For example,
+ "2", "2h", "2 hour", and
+ "2 hours" are all valid ways to specify two hours.
+
+
The roll-offset-interval-millis property sets the upper
+ bound on a random time interval (in milliseconds) that is used to delay
+ before the initial roll. All subsequent rolls will happen an integer
+ number of roll intervals after the initial roll, hence retaining the original
+ offset. The purpose of this property is to insert some variance in the roll
+ times so that large clusters using this sink on every node don't cause a
+ performance impact on HDFS by rolling simultaneously. The default value is
+ 30000 (30s). When writing to HDFS, as a rule of thumb, the roll offset in
+ millis should be no less than the number of sink instances times 5.
+
+
The primary use of this class is for logging to HDFS. As it uses
+ {@link org.apache.hadoop.fs.FileSystem} to access the target file system,
+ however, it can be used to write to the local file system, Amazon S3, or any
+ other supported file system. The base path for the sink will determine the
+ file system used. An unqualified path will write to the default file system
+ set by the configuration.
+
+
Not all file systems support the ability to append to files. In file
+ systems without the ability to append to files, only one writer can write to
+ a file at a time. To allow for concurrent writes from multiple daemons on a
+ single host, the source property is used to set unique headers
+ for the log files. The property should be set to the name of
+ the source daemon, e.g. namenode. The value of the
+ source property should typically be the same as the property's
+ prefix. If this property is not set, the source is taken to be
+ unknown.
+
+
Instead of appending to an existing file, by default the sink
+ will create a new file with a suffix of ".<n>", where
+ n is the next lowest integer that isn't already used in a file name,
+ similar to the Hadoop daemon logs. NOTE: the file with the highest
+ sequence number is the newest file, unlike the Hadoop daemon logs.
+
+
For file systems that allow append, the sink supports appending to the
+ existing file instead. If the allow-append property is set to
+ true, the sink will instead append to the existing file on file systems that
+ support appends. By default, the allow-append property is
+ false.
+
+
Note that when writing to HDFS with allow-append set to true,
+ there is a minimum acceptable number of data nodes. If the number of data
+ nodes drops below that minimum, the append will succeed, but reading the
+ data will fail with an IOException in the DataStreamer class. The minimum
+ number of data nodes required for a successful append is generally 2 or
+ 3.
+
+
Note also that when writing to HDFS, the file size information is not
+ updated until the file is closed (at the end of the interval) even though
+ the data is being written successfully. This is a known HDFS limitation that
+ exists because of the performance cost of updating the metadata. See
+ HDFS-5478.
+
+
When using this sink in a secure (Kerberos) environment, two additional
+ properties must be set: keytab-key and
+ principal-key. keytab-key should contain the key by
+ which the keytab file can be found in the configuration, for example,
+ yarn.nodemanager.keytab. principal-key should
+ contain the key by which the principal can be found in the configuration,
+ for example, yarn.nodemanager.principal.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ CollectD StatsD plugin).
+
+ To configure this plugin, you will need to add the following
+ entries to your hadoop-metrics2.properties file:
+
+
+ *.sink.statsd.class=org.apache.hadoop.metrics2.sink.StatsDSink
+ [prefix].sink.statsd.server.host=
+ [prefix].sink.statsd.server.port=
+ [prefix].sink.statsd.skip.hostname=true|false (optional)
+ [prefix].sink.statsd.service.name=NameNode (name you want for service)
+
]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Register the MBean using our standard MBeanName format
+ "hadoop:service=<serviceName>,name=<nameName>",
+ where <serviceName> and <nameName> are the supplied parameters.
+
+ @param serviceName
+ @param nameName
+ @param theMbean - the MBean to register
+ @return the name used to register the MBean]]>
+
+
+
+
+
+
+
+
+ Register the MBean using our standard MBeanName format
+ "hadoop:service=<serviceName>,name=<nameName>",
+ where <serviceName> and <nameName> are the supplied parameters.
+
+ @param serviceName
+ @param nameName
+ @param properties - Key value pairs to define additional JMX ObjectName
+ properties.
+ @param theMbean - the MBean to register
+ @return the name used to register the MBean]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ hostname or hostname:port. If
+ the specs string is null, defaults to localhost:defaultPort.
+
+ @param specs server specs (see description)
+ @param defaultPort the default port if not specified
+ @return a list of InetSocketAddress objects.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This method is used when parts of Hadoop need to know whether to apply
+ single rack vs multi-rack policies, such as during block placement.
+ Such algorithms behave differently if they are on multi-switch systems.
+
+
+ @return true if the mapping thinks that it is on a single switch]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This predicate simply assumes that all mappings not derived from
+ this class are multi-switch.
+ @param mapping the mapping to query
+ @return true if the base class says it is single switch, or the mapping
+ is not derived from this class.]]>
+
+
+
+ It is not mandatory to
+ derive {@link DNSToSwitchMapping} implementations from it, but it is strongly
+ recommended, as it makes it easy for the Hadoop developers to add new methods
+ to this base class that are automatically picked up by all implementations.
+
+
+ This class does not extend the Configured
+ base class, and should not be changed to do so, as it causes problems
+ for subclasses. The constructor of the Configured calls
+ the {@link #setConf(Configuration)} method, which will call into the
+ subclasses before they have been fully constructed.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ If a name cannot be resolved to a rack, the implementation
+ should return {@link NetworkTopology#DEFAULT_RACK}. This
+ is what the bundled implementations do, though it is not a formal requirement.
+
+ @param names the list of hosts to resolve (can be empty)
+ @return list of resolved network paths.
+ If names is empty, the returned list is also empty]]>
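+
+ A minimal sketch of this contract, assuming a hypothetical mapping that
+ answers from an in-memory table and falls back to the default rack for
+ unknown hosts, might look like:
+
+     import java.util.ArrayList;
+     import java.util.List;
+     import java.util.Map;
+     import java.util.concurrent.ConcurrentHashMap;
+     import org.apache.hadoop.net.DNSToSwitchMapping;
+     import org.apache.hadoop.net.NetworkTopology;
+
+     public class InMemoryRackMapping implements DNSToSwitchMapping {
+       private final Map<String, String> table = new ConcurrentHashMap<>();
+
+       public void put(String host, String rack) {
+         table.put(host, rack);
+       }
+
+       @Override
+       public List<String> resolve(List<String> names) {
+         List<String> racks = new ArrayList<>(names.size());
+         for (String name : names) {
+           // Unknown hosts resolve to NetworkTopology.DEFAULT_RACK, as recommended.
+           racks.add(table.getOrDefault(name, NetworkTopology.DEFAULT_RACK));
+         }
+         return racks;
+       }
+
+       @Override
+       public void reloadCachedMappings() {
+         // nothing cached in this sketch
+       }
+
+       @Override
+       public void reloadCachedMappings(List<String> names) {
+         // nothing cached in this sketch
+       }
+     }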
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Calling {@link #setConf(Configuration)} will trigger a
+ re-evaluation of the configuration settings and so be used to
+ set up the mapping script.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This will get called in the superclass constructor, so a check is needed
+ to ensure that the raw mapping is defined before trying to relay a null
+ configuration.
+ @param conf]]>
+
+
+
+
+
+
+
+
+
+ It contains a static class RawScriptBasedMapping that performs
+ the work: reading the configuration parameters, executing any defined
+ script, handling errors and such like. The outer
+ class extends {@link CachedDNSToSwitchMapping} to cache the delegated
+ queries.
+
+ This DNS mapper's {@link #isSingleSwitch()} predicate returns
+ true if and only if a script is defined.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Simple {@link DNSToSwitchMapping} implementation that reads a 2 column text
+ file. The columns are separated by whitespace. The first column is a DNS or
+ IP address and the second column specifies the rack where the address maps.
+
+
+ This class uses the configuration parameter {@code
+ net.topology.table.file.name} to locate the mapping file.
+
+
+ Calls to {@link #resolve(List)} will look up the address as defined in the
+ mapping file. If no entry corresponding to the address is found, the value
+ {@code /default-rack} is returned.
+
]]>
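+
+ A hedged sketch of wiring a topology table into a job or daemon
+ configuration; the file path below is a placeholder and the keys are
+ written out literally:
+
+     import org.apache.hadoop.conf.Configuration;
+     import org.apache.hadoop.net.DNSToSwitchMapping;
+     import org.apache.hadoop.net.TableMapping;
+
+     public class TableMappingExample {
+       public static void main(String[] args) {
+         Configuration conf = new Configuration();
+         // Point the mapping at a two-column "host rack" file (hypothetical path).
+         conf.set("net.topology.table.file.name", "/etc/hadoop/topology.table");
+         // Select TableMapping as the DNS-to-switch mapping implementation.
+         conf.setClass("net.topology.node.switch.mapping.impl",
+             TableMapping.class, DNSToSwitchMapping.class);
+       }
+     }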
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ (cause==null ? null : cause.toString()) (which
+ typically contains the class and detail message of cause).
+ @param cause the cause (which is saved for later retrieval by the
+ {@link #getCause()} method). (A null value is
+ permitted, and indicates that the cause is nonexistent or
+ unknown.)]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ mapping
+ and mapping]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ /host@realm.
+ @param principalName principal name of format as described above
+ @return host name if the string conforms to the above format, else null]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ "jack"
+
+ @param userName
+ @return userName without login method]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ the return type of the run method
+ @param action the method to execute
+ @return the value from the run method]]>
+
+
+
+
+
+
+
+ the return type of the run method
+ @param action the method to execute
+ @return the value from the run method
+ @throws IOException if the action throws an IOException
+ @throws Error if the action throws an Error
+ @throws RuntimeException if the action throws a RuntimeException
+ @throws InterruptedException if the action throws an InterruptedException
+ @throws UndeclaredThrowableException if the action throws something else]]>
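+
+ By way of illustration, a typical doAs call (assuming the proxy-user
+ configuration is already in place; the user name and path are invented)
+ looks roughly like this:
+
+     import java.security.PrivilegedExceptionAction;
+     import org.apache.hadoop.conf.Configuration;
+     import org.apache.hadoop.fs.FileSystem;
+     import org.apache.hadoop.fs.Path;
+     import org.apache.hadoop.security.UserGroupInformation;
+
+     public class DoAsExample {
+       public static void main(String[] args) throws Exception {
+         Configuration conf = new Configuration();
+         // Run the action as a proxy user on top of the current (real) user.
+         UserGroupInformation proxy = UserGroupInformation.createProxyUser(
+             "alice", UserGroupInformation.getCurrentUser());
+         boolean exists = proxy.doAs(new PrivilegedExceptionAction<Boolean>() {
+           @Override
+           public Boolean run() throws Exception {
+             // Everything here executes with "alice" as the effective user.
+             FileSystem fs = FileSystem.get(conf);
+             return fs.exists(new Path("/user/alice"));
+           }
+         });
+         System.out.println("exists = " + exists);
+       }
+     }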
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ CredentialProvider implementations must be thread safe.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ (cause==null ? null : cause.toString()) (which
+ typically contains the class and detail message of cause).
+ @param cause the cause (which is saved for later retrieval by the
+ {@link #getCause()} method). (A null value is
+ permitted, and indicates that the cause is nonexistent or
+ unknown.)]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ does not provide the stack trace for security purposes.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ A User-Agent String is considered to be a browser if it matches
+ any of the regex patterns from browser-useragent-regex; the default
+ behavior is to consider everything a browser that matches the following:
+ "^Mozilla.*,^Opera.*". Subclasses can optionally override
+ this method to use different behavior.
+
+ @param userAgent The User-Agent String, or null if there isn't one
+ @return true if the User-Agent String refers to a browser, false if not]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ The type of the token identifier]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ T extends TokenIdentifier]]>
+
+
+
+
+
+
+
+
+
+ DelegationTokenAuthenticatedURL.
+
+ An instance of the default {@link DelegationTokenAuthenticator} will be
+ used.]]>
+
+
+
+
+ DelegationTokenAuthenticatedURL.
+
+ @param authenticator the {@link DelegationTokenAuthenticator} instance to
+ use, if null the default one will be used.]]>
+
+
+
+
+ DelegationTokenAuthenticatedURL using the default
+ {@link DelegationTokenAuthenticator} class.
+
+ @param connConfigurator a connection configurator.]]>
+
+
+
+
+ DelegationTokenAuthenticatedURL.
+
+ @param authenticator the {@link DelegationTokenAuthenticator} instance to
+ use, if null the default one will be used.
+ @param connConfigurator a connection configurator.]]>
+
+
+
+
+
+
+
+
+
+
+
+ The default class is {@link KerberosDelegationTokenAuthenticator}
+
+ @return the delegation token authenticator class to use as default.]]>
+
+
+
+
+
+
+ This method is provided to enable WebHDFS backwards compatibility.
+
+ @param useQueryString TRUE if the token is transmitted in the
+ URL query string, FALSE if the delegation token is transmitted
+ using the {@link DelegationTokenAuthenticator#DELEGATION_TOKEN_HEADER} HTTP
+ header.]]>
+
+
+
+
+ TRUE if the token is transmitted in the URL query
+ string, FALSE if the delegation token is transmitted using the
+ {@link DelegationTokenAuthenticator#DELEGATION_TOKEN_HEADER} HTTP header.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Authenticator.
+
+ @param url the URL to connect to. Only HTTP/S URLs are supported.
+ @param token the authentication token being used for the user.
+ @return an authenticated {@link HttpURLConnection}.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+ Authenticator. If the doAs parameter is not NULL,
+ the request will be done on behalf of the specified doAs user.
+
+ @param url the URL to connect to. Only HTTP/S URLs are supported.
+ @param token the authentication token being used for the user.
+ @param doAs user to do the request on behalf of, if NULL the request is
+ as self.
+ @return an authenticated {@link HttpURLConnection}.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+ Authenticator
+ for authentication.
+
+ @param url the URL to get the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token being used for the user where the
+ Delegation token will be stored.
+ @param renewer the renewer user.
+ @return a delegation token.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+
+ Authenticator
+ for authentication.
+
+ @param url the URL to get the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token being used for the user where the
+ Delegation token will be stored.
+ @param renewer the renewer user.
+ @param doAsUser the user to do as, which will be the token owner.
+ @return a delegation token.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+ Authenticator for authentication.
+
+ @param url the URL to renew the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token with the Delegation Token to renew.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+ Authenticator for authentication.
+
+ @param url the URL to renew the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token with the Delegation Token to renew.
+ @param doAsUser the user to do as, which will be the token owner.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+ Authenticator.
+
+ @param url the URL to cancel the delegation token from. Only HTTP/S URLs
+ are supported.
+ @param token the authentication token with the Delegation Token to cancel.
+ @throws IOException if an IO error occurred.]]>
+
+
+
+
+
+
+
+
+ Authenticator.
+
+ @param url the URL to cancel the delegation token from. Only HTTP/S URLs
+ are supported.
+ @param token the authentication token with the Delegation Token to cancel.
+ @param doAsUser the user to do as, which will be the token owner.
+ @throws IOException if an IO error occurred.]]>
+
+
+
+ DelegationTokenAuthenticatedURL is a
+ {@link AuthenticatedURL} sub-class with built-in Hadoop Delegation Token
+ functionality.
+
+ The authentication mechanisms supported by default are Hadoop Simple
+ authentication (also known as pseudo authentication) and Kerberos SPNEGO
+ authentication.
+
+ Additional authentication mechanisms can be supported via {@link
+ DelegationTokenAuthenticator} implementations.
+
+ The default {@link DelegationTokenAuthenticator} is the {@link
+ KerberosDelegationTokenAuthenticator} class which supports
+ automatic fallback from Kerberos SPNEGO to Hadoop Simple authentication via
+ the {@link PseudoDelegationTokenAuthenticator} class.
+
+ AuthenticatedURL instances are not thread-safe.]]>
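+
+ A hedged usage sketch (the URL is a placeholder; error handling omitted):
+
+     import java.net.HttpURLConnection;
+     import java.net.URL;
+     import org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL;
+
+     public class DtUrlExample {
+       public static void main(String[] args) throws Exception {
+         DelegationTokenAuthenticatedURL.Token token =
+             new DelegationTokenAuthenticatedURL.Token();
+         DelegationTokenAuthenticatedURL aUrl = new DelegationTokenAuthenticatedURL();
+
+         URL url = new URL("http://host:14000/webhdfs/v1/?op=GETHOMEDIRECTORY");
+         // Authenticated connection; the token side of the handshake is kept in 'token'.
+         HttpURLConnection conn = aUrl.openConnection(url, token);
+         System.out.println("HTTP " + conn.getResponseCode());
+         conn.disconnect();
+       }
+     }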
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Authenticator
+ for authentication.
+
+ @param url the URL to get the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token being used for the user where the
+ Delegation token will be stored.
+ @param renewer the renewer user.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+
+ Authenticator
+ for authentication.
+
+ @param url the URL to get the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token being used for the user where the
+ Delegation token will be stored.
+ @param renewer the renewer user.
+ @param doAsUser the user to do as, which will be the token owner.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+ Authenticator for authentication.
+
+ @param url the URL to renew the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token with the Delegation Token to renew.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+
+
+ Authenticator for authentication.
+
+ @param url the URL to renew the delegation token from. Only HTTP/S URLs are
+ supported.
+ @param token the authentication token with the Delegation Token to renew.
+ @param doAsUser the user to do as, which will be the token owner.
+ @throws IOException if an IO error occurred.
+ @throws AuthenticationException if an authentication exception occurred.]]>
+
+
+
+
+
+
+
+
+ Authenticator.
+
+ @param url the URL to cancel the delegation token from. Only HTTP/S URLs
+ are supported.
+ @param token the authentication token with the Delegation Token to cancel.
+ @throws IOException if an IO error occurred.]]>
+
+
+
+
+
+
+
+
+
+ Authenticator.
+
+ @param url the URL to cancel the delegation token from. Only HTTP/S URLs
+ are supported.
+ @param token the authentication token with the Delegation Token to cancel.
+ @param doAsUser the user to do as, which will be the token owner.
+ @throws IOException if an IO error occurred.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ KerberosDelegationTokenAuthenticator provides support for
+ Kerberos SPNEGO authentication mechanism and support for Hadoop Delegation
+ Token operations.
+
+ It falls back to the {@link PseudoDelegationTokenAuthenticator} if the HTTP
+ endpoint does not trigger a SPNEGO authentication.]]>
+
+
+
+
+
+
+
+
+ PseudoDelegationTokenAuthenticator provides support for
+ Hadoop's pseudo authentication mechanism that accepts
+ the user name specified as a query string parameter and support for Hadoop
+ Delegation Token operations.
+
+ This mimics the model of Hadoop Simple authentication trusting the
+ {@link UserGroupInformation#getCurrentUser()} value.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ live.
+ @return a (snapshotted) map of blocker name->description values]]>
+
+
+
+
+
+
+
+
+
+
+
+
+ Do nothing if the service is null or not
+ in a state in which it can be/needs to be stopped.
+
+ The service state is checked before the operation begins.
+ This process is not thread safe.
+ @param service a service or null]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Any long-lived operation here will prevent the service state
+ change from completing in a timely manner.
+
If another thread is somehow invoked from the listener, and
+ that thread invokes the methods of the service (including
+ subclass-specific methods), there is a risk of a deadlock.
+
+
+
+ @param service the service that has changed.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ The base implementation logs all arguments at the debug level,
+ then returns the passed in config unchanged.]]>
+
+
+
+
+
+
+ The action is to signal success by returning the exit code 0.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This method is called before {@link #init(Configuration)};
+ Any non-null configuration that is returned from this operation
+ becomes the one that is passed on to that {@link #init(Configuration)}
+ operation.
+
+ This permits implementations to change the configuration before
+ the init operation. As the ServiceLauncher only creates
+ an instance of the base {@link Configuration} class, it is
+ recommended to instantiate any subclass (such as YarnConfiguration)
+ that injects new resources.
+
+ @param config the initial configuration build up by the
+ service launcher.
+ @param args list of arguments passed to the command line
+ after any launcher-specific commands have been stripped.
+ @return the configuration to init the service with.
+ Recommended: pass down the config parameter with any changes
+ @throws Exception any problem]]>
+
+
+
+
+
+
+ The return value becomes the exit code of the launched process.
+
+ If an exception is raised, the policy is:
+
+
Any subclass of {@link org.apache.hadoop.util.ExitUtil.ExitException}:
+ the exception is passed up unmodified.
+
+
Any exception which implements
+ {@link org.apache.hadoop.util.ExitCodeProvider}:
+ A new {@link ServiceLaunchException} is created with the exit code
+ and message of the thrown exception; the thrown exception becomes the
+ cause.
+
Any other exception: a new {@link ServiceLaunchException} is created
+ with the exit code {@link LauncherExitCodes#EXIT_EXCEPTION_THROWN} and
+ the message of the original exception (which becomes the cause).
+
+ @return the exit code
+ @throws org.apache.hadoop.util.ExitUtil.ExitException an exception passed
+ up as the exit code and error text.
+ @throws Exception any exception to report. If it provides an exit code
+ this is used in a wrapping exception.]]>
+
+
+
+
+ The command line options will be passed down before the
+ {@link Service#init(Configuration)} operation is invoked via an
+ invocation of {@link LaunchableService#bindArgs(Configuration, List)}
+ After the service has been successfully started via {@link Service#start()}
+ the {@link LaunchableService#execute()} method is called to execute the
+ service. When this method returns, the service launcher will exit, using
+ the return code from the method as its exit option.]]>
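+
+ A minimal sketch of a launchable service, assuming the interface methods
+ described above (the class name and behaviour are invented):
+
+     import java.util.List;
+     import org.apache.hadoop.conf.Configuration;
+     import org.apache.hadoop.service.AbstractService;
+     import org.apache.hadoop.service.launcher.LaunchableService;
+     import org.apache.hadoop.service.launcher.LauncherExitCodes;
+
+     public class HelloService extends AbstractService implements LaunchableService {
+
+       public HelloService() {
+         super("HelloService");
+       }
+
+       @Override
+       public Configuration bindArgs(Configuration config, List<String> args)
+           throws Exception {
+         // Inspect command-line arguments before init(); return the config to init with.
+         return config;
+       }
+
+       @Override
+       public int execute() throws Exception {
+         // Do the work; the return value becomes the process exit code.
+         System.out.println("hello from the service launcher");
+         return LauncherExitCodes.EXIT_SUCCESS;
+       }
+     }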
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 400 Bad Request}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 401 Unauthorized}]]>
+
+
+
+
+
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 403: Forbidden}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 404: Not Found}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 405: Not allowed}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 406: Not Acceptable}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 408: Request Timeout}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 409: Conflict}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 500 Internal Server Error}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 501: Not Implemented}]]>
+
+
+
+
+
+ Approximate HTTP equivalent: {@code 503 Service Unavailable}]]>
+
+
+
+
+
+ If raised, this is expected to be raised server-side and likely due
+ to client/server version incompatibilities.
+
+ Approximate HTTP equivalent: {@code 505: Version Not Supported}]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Codes with a YARN prefix are YARN-related.
+
+ Many of the exit codes are designed to resemble HTTP error codes,
+ squashed into a single byte; e.g. 44, "not found", is the equivalent
+ of 404. The various 2XX HTTP success codes aren't mirrored;
+ the Unix standard of "0" for success is used.
+
+ 0-10: general command issues
+ 30-39: equivalent to the 3XX responses, where those responses are
+ considered errors by the application.
+ 40-49: client-side/CLI/config problems
+ 50-59: service-side problems.
+ 60+ : application specific error codes
+
]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ This uses {@link String#format(String, Object...)}
+ to build the formatted exception in the ENGLISH locale.
+
+ If the last argument is a throwable, it becomes the cause of the exception.
+ It will also be used as a parameter for the format.
+ @param exitCode exit code
+ @param format format for message to use in exception
+ @param args list of arguments]]>
+
+
+
+
+ When caught by the ServiceLauncher, it will convert that
+ into a process exit code.
+
+ The {@link #ServiceLaunchException(int, String, Object...)} constructor
+ generates formatted exceptions.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Clients and/or applications can use the provided Progressable
+ to explicitly report progress to the Hadoop framework. This is especially
+ important for operations which take a significant amount of time since,
+ in lieu of the reported progress, the framework has to assume that an error
+ has occurred and time-out the operation.]]>
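+
+ As a sketch, a long-running loop can report liveness through the callback
+ like this (the unit of work is a placeholder):
+
+     import org.apache.hadoop.util.Progressable;
+
+     public class ProgressExample {
+       // Periodically call progress() so the framework does not time out the operation.
+       static void copyManyRecords(long records, Progressable progress) {
+         for (long i = 0; i < records; i++) {
+           // ... do one unit of work here ...
+           if (i % 1000 == 0) {
+             progress.progress();
+           }
+         }
+       }
+
+       public static void main(String[] args) {
+         copyManyRecords(10_000, () -> System.out.println("still alive"));
+       }
+     }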
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Class is to be obtained
+ @return the correctly typed Class of the given object.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ kill -0 command or equivalent]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ".cmd" on Windows, or ".sh" otherwise.
+
+ @param parent File parent directory
+ @param basename String script file basename
+ @return File referencing the script in the directory]]>
+
+
+
+
+
+ ".cmd" on Windows, or ".sh" otherwise.
+
+ @param basename String script file basename
+ @return String script file name]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ IOException.
+ @return the path to {@link #WINUTILS_EXE}
+ @throws RuntimeException if the path is not resolvable]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Shell.
+ @return the thread that ran runCommand() that spawned this shell
+ or null if no thread is waiting for this shell to complete]]>
+
+
+
+
+
+
+
+
+
+
+
+ Shell interface.
+ @param cmd shell command to execute.
+ @return the output of the executed command.]]>
+
+
+
+
+
+
+
+
+ Shell interface.
+ @param env the map of environment key=value
+ @param cmd shell command to execute.
+ @param timeout time in milliseconds after which the script should be marked as timed out
+ @return the output of the executed command.
+ @throws IOException on any problem.]]>
+
+
+
+
+
+
+
+ Shell interface.
+ @param env the map of environment key=value
+ @param cmd shell command to execute.
+ @return the output of the executed command.
+ @throws IOException on any problem.]]>
+
+
+
+
+ Shell processes.
+ Iterates through a map of all currently running Shell
+ processes and destroys them one by one. This method is thread safe.]]>
+
+
+
+
+ Shell objects.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ CreateProcess synchronization object.]]>
+
+
+
+
+ os.name property.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Important: caller must check for this value being null.
+ The lack of such checks has led to many support issues being raised.
+
+ @deprecated use one of the exception-raising getter methods,
+ specifically {@link #getWinUtilsPath()} or {@link #getWinUtilsFile()}]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Shell can be used to run shell commands like du or
+ df. It also offers facilities to gate commands by
+ time-intervals.]]>
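+
+ A small hedged example of running a command through the class (the command
+ is arbitrary):
+
+     import java.io.IOException;
+     import org.apache.hadoop.util.Shell;
+
+     public class ShellExample {
+       public static void main(String[] args) throws IOException {
+         // Run a simple command and capture its output.
+         String output = Shell.execCommand("echo", "hello");
+         System.out.print(output);
+       }
+     }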
+
+
+
+
+
+
+
+ ShutdownHookManager singleton.
+
+ @return ShutdownHookManager singleton.]]>
+
+
+
+
+
+
+ Runnable
+ @param priority priority of the shutdownHook.]]>
+
+
+
+
+
+
+
+
+ Runnable
+ @param priority priority of the shutdownHook
+ @param timeout timeout of the shutdownHook
+ @param unit unit of the timeout TimeUnit]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ShutdownHookManager enables running shutdownHook
+ in a deterministic order, higher priority first.
+
+ The JVM runs ShutdownHooks in a non-deterministic order or in parallel.
+ This class registers a single JVM shutdownHook and run all the
+ shutdownHooks registered to it (to this class) in order based on their
+ priority.
+
+ Unless a hook was registered with a shutdown timeout explicitly set through
+ {@link #addShutdownHook(Runnable, int, long, TimeUnit)},
+ the shutdown time allocated to it is set by the configuration option
+ {@link CommonConfigurationKeysPublic#SERVICE_SHUTDOWN_TIMEOUT} in
+ {@code core-site.xml}, with a default value of
+ {@link CommonConfigurationKeysPublic#SERVICE_SHUTDOWN_TIMEOUT_DEFAULT}
+ seconds.]]>
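+
+ For example, a hook can be registered with a priority and an explicit
+ timeout roughly as follows (the priority and timeout values are arbitrary):
+
+     import java.util.concurrent.TimeUnit;
+     import org.apache.hadoop.util.ShutdownHookManager;
+
+     public class ShutdownHookExample {
+       public static void main(String[] args) {
+         // Higher-priority hooks run first; this one is allowed up to 30 seconds.
+         ShutdownHookManager.get().addShutdownHook(
+             () -> System.out.println("cleaning up"), 10, 30, TimeUnit.SECONDS);
+       }
+     }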
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Tool, is the standard for any Map-Reduce tool/application.
+ The tool/application should delegate the handling of
+
+ standard command-line options to {@link ToolRunner#run(Tool, String[])}
+ and only handle its custom arguments.
+
+
Here is how a typical Tool is implemented:
+
+ public class MyApp extends Configured implements Tool {
+
+ public int run(String[] args) throws Exception {
+ // Configuration processed by ToolRunner
+ Configuration conf = getConf();
+
+ // Create a JobConf using the processed conf
+ JobConf job = new JobConf(conf, MyApp.class);
+
+ // Process custom command-line options
+ Path in = new Path(args[1]);
+ Path out = new Path(args[2]);
+
+ // Specify various job-specific parameters
+ job.setJobName("my-app");
+ job.setInputPath(in);
+ job.setOutputPath(out);
+ job.setMapperClass(MyMapper.class);
+ job.setReducerClass(MyReducer.class);
+
+ // Submit the job, then poll for progress until the job is complete
+ RunningJob runningJob = JobClient.runJob(job);
+ if (runningJob.isSuccessful()) {
+ return 0;
+ } else {
+ return 1;
+ }
+ }
+
+ public static void main(String[] args) throws Exception {
+ // Let ToolRunner handle generic command-line options
+ int res = ToolRunner.run(new Configuration(), new MyApp(), args);
+
+ System.exit(res);
+ }
+ }
+
+
+ @see GenericOptionsParser
+ @see ToolRunner]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Tool by {@link Tool#run(String[])}, after
+ parsing with the given generic arguments. Uses the given
+ Configuration, or builds one if null.
+
+ Sets the Tool's configuration with the possibly modified
+ version of the conf.
+
+ @param conf Configuration for the Tool.
+ @param tool Tool to run.
+ @param args command-line arguments to the tool.
+ @return exit code of the {@link Tool#run(String[])} method.]]>
+
+
+
+
+
+
+
+ Tool with its Configuration.
+
+ Equivalent to run(tool.getConf(), tool, args).
+
+ @param tool Tool to run.
+ @param args command-line arguments to the tool.
+ @return exit code of the {@link Tool#run(String[])} method.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ToolRunner can be used to run classes implementing
+ the Tool interface. It works in conjunction with
+ {@link GenericOptionsParser} to parse the
+
+ generic hadoop command line arguments and modifies the
+ Configuration of the Tool. The
+ application-specific options are passed along without being modified.
+
+
+ @see Tool
+ @see GenericOptionsParser]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ this filter.
+ @param nbHash The number of hash functions to consider.
+ @param hashType type of the hashing function (see
+ {@link org.apache.hadoop.util.hash.Hash}).]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Bloom filter, as defined by Bloom in 1970.
+
+ The Bloom filter is a data structure that was introduced in 1970 and that has been adopted by
+ the networking research community in the past decade thanks to the bandwidth efficiencies that it
+ offers for the transmission of set membership information between networked hosts. A sender encodes
+ the information into a bit vector, the Bloom filter, that is more compact than a conventional
+ representation. Computation and space costs for construction are linear in the number of elements.
+ The receiver uses the filter to test whether various elements are members of the set. Though the
+ filter will occasionally return a false positive, it will never return a false negative. When creating
+ the filter, the sender can choose its desired point in a trade-off between the false positive rate and the size.
+
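+
+ A small hedged example of the filter API (the sizes and hash choice are
+ arbitrary):
+
+     import org.apache.hadoop.util.bloom.BloomFilter;
+     import org.apache.hadoop.util.bloom.Key;
+     import org.apache.hadoop.util.hash.Hash;
+
+     public class BloomFilterExample {
+       public static void main(String[] args) {
+         // 1024-bit vector, 3 hash functions, Murmur hashing.
+         BloomFilter filter = new BloomFilter(1024, 3, Hash.MURMUR_HASH);
+         filter.add(new Key("alpha".getBytes()));
+
+         System.out.println(filter.membershipTest(new Key("alpha".getBytes()))); // true
+         System.out.println(filter.membershipTest(new Key("beta".getBytes())));  // probably false
+       }
+     }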
+
+
+
+
+
+
+
+
+
+
+
+
+ this filter.
+ @param nbHash The number of hash functions to consider.
+ @param hashType type of the hashing function (see
+ {@link org.apache.hadoop.util.hash.Hash}).]]>
+
+
+
+
+
+
+
+
+ this counting Bloom filter.
+
+ Invariant: nothing happens if the specified key does not belong to this counting Bloom filter.
+ @param key The key to remove.]]>
+
+
+
+
+
+
+
+
+
+
+
+ key -> count map.
+
NOTE: due to the bucket size of this filter, inserting the same
+ key more than 15 times will cause an overflow at all filter positions
+ associated with this key, and it will significantly increase the error
+ rate for this and other keys. For this reason the filter can only be
+ used to store small count values 0 <= N << 15.
+ @param key key to be tested
+ @return 0 if the key is not present. Otherwise, a positive value v will
+ be returned such that v == count with probability equal to the
+ error rate of this filter, and v > count otherwise.
+ Additionally, if the filter experienced an underflow as a result of
+ {@link #delete(Key)} operation, the return value may be lower than the
+ count with the probability of the false negative rate of such
+ filter.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ counting Bloom filter, as defined by Fan et al. in a ToN
+ 2000 paper.
+
+ A counting Bloom filter is an improvement to a standard Bloom filter as it
+ allows dynamic additions and deletions of set membership information. This
+ is achieved through the use of a counting vector instead of a bit vector.
+
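+
+ A hedged sketch of the additional delete/count operations (sizes are
+ arbitrary; approximateCount is probabilistic, as described above):
+
+     import org.apache.hadoop.util.bloom.CountingBloomFilter;
+     import org.apache.hadoop.util.bloom.Key;
+     import org.apache.hadoop.util.hash.Hash;
+
+     public class CountingBloomFilterExample {
+       public static void main(String[] args) {
+         CountingBloomFilter filter = new CountingBloomFilter(1024, 3, Hash.MURMUR_HASH);
+         Key key = new Key("alpha".getBytes());
+
+         filter.add(key);
+         filter.add(key);
+         System.out.println(filter.approximateCount(key)); // likely 2
+         filter.delete(key);                               // dynamic deletion
+         System.out.println(filter.membershipTest(key));   // still true (count 1)
+       }
+     }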
+
+
+
+
+
+
+
+
+
+
+
+
+ Builds an empty Dynamic Bloom filter.
+ @param vectorSize The number of bits in the vector.
+ @param nbHash The number of hash functions to consider.
+ @param hashType type of the hashing function (see
+ {@link org.apache.hadoop.util.hash.Hash}).
+ @param nr The threshold for the maximum number of keys to record in a
+ dynamic Bloom filter row.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ dynamic Bloom filter, as defined in the INFOCOM 2006 paper.
+
+ A dynamic Bloom filter (DBF) makes use of a s * m bit matrix but
+ each of the s rows is a standard Bloom filter. The creation
+ process of a DBF is iterative. At the start, the DBF is a 1 * m
+ bit matrix, i.e., it is composed of a single standard Bloom filter.
+ It assumes that nr elements are recorded in the
+ initial bit vector, where nr <= n (n is
+ the cardinality of the set A to record in the filter).
+
+ As the size of A grows during the execution of the application,
+ several keys must be inserted in the DBF. When inserting a key into the DBF,
+ one must first get an active Bloom filter in the matrix. A Bloom filter is
+ active when the number of recorded keys, nr, is
+ strictly less than the current cardinality of A, n.
+ If an active Bloom filter is found, the key is inserted and
+ nr is incremented by one. On the other hand, if there
+ is no active Bloom filter, a new one is created (i.e., a new row is added to
+ the matrix) according to the current size of A and the element
+ is added in this new Bloom filter and the nr value of
+ this new Bloom filter is set to one. A given key is said to belong to the
+ DBF if the k positions are set to one in one of the matrix rows.
+
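+
+ A hedged usage sketch of the constructor documented above (the sizes and
+ the per-row key threshold are arbitrary):
+
+     import org.apache.hadoop.util.bloom.DynamicBloomFilter;
+     import org.apache.hadoop.util.bloom.Key;
+     import org.apache.hadoop.util.hash.Hash;
+
+     public class DynamicBloomFilterExample {
+       public static void main(String[] args) {
+         // Each row is a 1024-bit Bloom filter; a new row is added after ~100 keys.
+         DynamicBloomFilter filter =
+             new DynamicBloomFilter(1024, 3, Hash.MURMUR_HASH, 100);
+         for (int i = 0; i < 1000; i++) {
+           filter.add(new Key(("key-" + i).getBytes()));
+         }
+         System.out.println(filter.membershipTest(new Key("key-42".getBytes())));
+       }
+     }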
+
+
+
+
+
+
+
+ Builds a hash function that must obey a given maximum number of returned values and a highest value.
+ @param maxValue The maximum highest returned value.
+ @param nbHash The number of resulting hashed values.
+ @param hashType type of the hashing function (see {@link Hash}).]]>
+
+
+
+
+ this hash function. A NOOP]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ The idea is to randomly select a bit to reset.]]>
+
+
+
+
+
+ The idea is to select the bit to reset that will generate the minimum
+ number of false negatives.]]>
+
+
+
+
+
+ The idea is to select the bit to reset that will remove the maximum number
+ of false positives.]]>
+
+
+
+
+
+ The idea is to select the bit to reset that will, at the same time, remove
+ the maximum number of false positives while minimizing the number of false
+ negatives generated.]]>
+
+
+
+
+ Originally created by
+ European Commission One-Lab Project 034819.]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+ this filter.
+ @param nbHash The number of hash functions to consider.
+ @param hashType type of the hashing function (see
+ {@link org.apache.hadoop.util.hash.Hash}).]]>
+
+
+
+
+
+
+
+
+ this retouched Bloom filter.
+
+ Invariant: if the false positive is null, nothing happens.
+ @param key The false positive key to add.]]>
+
+
+
+
+
+ this retouched Bloom filter.
+ @param coll The collection of false positives.]]>
+
+
+
+
+
+ this retouched Bloom filter.
+ @param keys The list of false positives.]]>
+
+
+
+
+
+ this retouched Bloom filter.
+ @param keys The array of false positives.]]>
+
+
+
+
+
+
+ this retouched Bloom filter.
+ @param scheme The selective clearing scheme to apply.]]>
+
+
+
+
+
+
+
+
+
+
+
+ retouched Bloom filter, as defined in the CoNEXT 2006 paper.
+
+ It allows the removal of selected false positives at the cost of introducing
+ random false negatives, and with the benefit of eliminating some random false
+ positives at the same time.
+
+